On Mon, Nov 29, 2010 at 1:06 PM, Alexander Klimetschek <[email protected]> wrote:
>>The only drawback is that the current jr lucene impl does not fit the
>>InfinispanDirectory (infinispan lucene dir). It is because of the
>>multi-index and never re-open setup in jr: It was state of the art
>>against lucene 1.4, but now mostly redundant.
>
> Just one node doing the indexing sounds interesting. But I would then
> think we store the index inside the repository (as a randomly-accessible
> binary), so that you can use any persistence manager and the
> implementation is simpler (no need to adapt to the various databases).

You are completely right! Good point. :-)

>
> We had some plans to do something like this with additional indexes
> (calling them "collections") that are created by the application side, but
> store inside the repository. And implemented by Lucene (especially for the
> full-text part).

Hmmm... personally, I wouldn't go this route. I think your next line covers my point better:

>
> The idea here is to overcome the problem of the single-big index for the
> entire repository that is mandated by the JCR spec. You often want indexes

And this is a big burden! I think we could keep a single big index for the JCR spec implementation. But I wouldn't solve this by having more small indexes, as collections. I would like to have an option for XPath, something like 'simpleXPath=true', that limits some of the possibilities: not all JCR spec queries are available, but the ones that are stay efficient and fast (at Hippo we limit ourselves to efficient XPath queries only). If you do not store all properties by default, and do not have to support complex path constraints (only simple ones), then you wouldn't have to worry that much about one single Lucene index. Lucene 4.0 will be so blisteringly fast and efficient... the volumes we need to index with Jackrabbit are peanuts for Lucene. *If* we improve indexing, a couple of hundred million nodes is a no-brainer!
We should not be thinking about problems that are a result of the current implementation and its shortcomings (they stem from the fact that it needed to work against Lucene 1.4; this is no criticism, to be sure!).

> that are only for part of a repository (e.g. /content/siteA) and are
> asynchronous (not blocking other repository writes) and can be more easily
> thrown away, updated etc. without breaking core repository functionality.

Asynchronous indexing is already part of JSR 283 AFAIK, and is allowed, certainly for binary content.

>
>>Anyway, in due time we need to pick this up at the dev list
>
> Of course. To be continued :-)

Regards Ard

>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> Developer // Adobe (Day) // Berlin - Basel

--
Hippo Europe • Amsterdam Oosteinde 11 • 1017 WT Amsterdam • +31 (0)20 522 4466
USA • San Francisco 185 H Street Suite B • Petaluma CA 94952-5100 • +1 (707) 773 4646
Canada • Montréal 5369 Boulevard St-Laurent • Montréal QC H2T 1S5 • +1 (514) 316 8966
www.onehippo.com • www.onehippo.org • [email protected]
