[ https://issues.apache.org/jira/browse/OAK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965442#comment-13965442 ]
Alex Parvulescu commented on OAK-1702:
--------------------------------------

bq. For example, given Lucene's field cache, the compression overhead we're seeing now because of repeated initializations of the IndexSearcher might simply vanish once the key problem is addressed.

I think we already had numbers showing that only sharing the readers doesn't help that much. This also ignores the latest numbers published by Thomas, which already had the new codec in. We can easily run the benchmark again without the codec and verify this assumption.

Digging even deeper, I think the decompression is actually caused by the fact that we are now storing everything in the index. Looking at that code from a distance, it looks like the decompression happens only on stored fields, so why not take it out again? Then maybe Lucene doesn't need to decompress anything.

There is another aspect of this decompression issue. It looks like Lucene will decompress all the existing fields before accessing the one you need. We're only looking for the path, so why pay the price each time? I think it should be smart enough to skip certain bits of information more efficiently, especially in the case where I'm putting the single field I'm interested in at the very beginning of the document and I'm only asking for this one field.
Part of this could be optimized with a custom field visitor, and I wanted to address this, but the decompression still remains an open issue.

> Create a benchmark for Full text search
> ---------------------------------------
>
>                 Key: OAK-1702
>                 URL: https://issues.apache.org/jira/browse/OAK-1702
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: bench
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1
>
>         Attachments: OAK-1702-hack.patch, OAK-1702-lazy-cursor.patch,
> OAK-1702-shared-indexer.patch, OAK-1702.oakcodec.patch, OAK-1702.patch
>
>
> To compare the performance of Full text search between Jackrabbit 2 and Oak a
> benchmark should be added.
> To start with the benchmark would do following
> * Would be based on WikipediaImport benchmark. So it would import the
> wikipedia dump and perform full text query on that
> * Should be able to run on both JR2 and Oak. Need to account for maven setup
> to handle different Lucene version as JR2 uses 3.6.0 and Oak use 4.x
> Later we can add concurrent version



--
This message was sent by Atlassian JIRA
(v6.2#6252)