[jira] Resolved: (LUCENE-1172) Small speedups to DocumentsWriter
[ https://issues.apache.org/jira/browse/LUCENE-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1172.
    Resolution: Fixed

> Small speedups to DocumentsWriter
>
> Key: LUCENE-1172
> URL: https://issues.apache.org/jira/browse/LUCENE-1172
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1172.patch
>
> Some small fixes that I found while profiling indexing Wikipedia,
> mainly using our own quickSort instead of Arrays.sort.
>
> Testing first 200K docs of Wikipedia shows a speedup from 274.6
> seconds to 270.2 seconds.
>
> I'll commit in a day or two.
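For readers wondering why a hand-rolled quickSort helps here: Arrays.sort on an Object[] copies the array and merge-sorts the copy, while an in-place quickSort can work directly on the populated slice with no allocation. Below is a minimal sketch of that idea; the Posting class and compare() method are hypothetical stand-ins for illustration and are not taken from LUCENE-1172.patch.

    // Sketch only: in-place quickSort over p[lo..hi] inclusive, no temporary copy.
    final class PostingSorterSketch {

      static final class Posting {
        int textStart;                        // hypothetical sort key
        Posting(int textStart) { this.textStart = textStart; }
      }

      static void quickSort(Posting[] p, int lo, int hi) {
        while (lo < hi) {
          // small slices: insertion sort is cheaper than further partitioning
          if (hi - lo < 8) {
            for (int i = lo + 1; i <= hi; i++) {
              Posting v = p[i];
              int j = i - 1;
              while (j >= lo && compare(p[j], v) > 0) { p[j + 1] = p[j]; j--; }
              p[j + 1] = v;
            }
            return;
          }
          Posting pivot = p[(lo + hi) >>> 1];
          int i = lo, j = hi;
          while (i <= j) {                    // classic bidirectional partition
            while (compare(p[i], pivot) < 0) i++;
            while (compare(p[j], pivot) > 0) j--;
            if (i <= j) { Posting t = p[i]; p[i] = p[j]; p[j] = t; i++; j--; }
          }
          // recurse into the smaller half, iterate on the larger one
          if (j - lo < hi - i) { quickSort(p, lo, j); lo = i; }
          else                 { quickSort(p, i, hi); hi = j; }
        }
      }

      // Hypothetical comparator; the real DocumentsWriter compares term text.
      static int compare(Posting a, Posting b) { return a.textStart - b.textStart; }
    }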
[jira] Created: (LUCENE-1178) Hits does not use MultiSearcher's createWeight
Hits does not use MultiSearcher's createWeight
----------------------------------------------

Key: LUCENE-1178
URL: https://issues.apache.org/jira/browse/LUCENE-1178
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.3
Reporter: Israel Tsadok
Attachments: hits.diff

I am developing a distributed index, using MultiSearcher and RemoteSearcher.
When investigating some performance issues, I noticed that there is a lot of
back-and-forth traffic between the servers during the weight calculation.
Although MultiSearcher has a method called createWeight that minimizes the
calls to the sub-searchers, this method never actually gets called when I
call search(query).

From what I can tell, this is fixable by changing the following line in
Hits.java:

    weight = q.weight(s);

to:

    weight = s.createWeight(q);
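As a minimal sketch of the difference (assuming the Lucene 2.3 layout, where Hits, Query, and Searcher all live in org.apache.lucene.search, so the package-visible weight-building methods can be called), the change is only which object is asked to build the Weight:

    package org.apache.lucene.search;   // same package as Hits, so both calls compile

    import java.io.IOException;

    // Illustration only, not the attached hits.diff. With a plain IndexSearcher
    // the two calls behave the same, but MultiSearcher overrides createWeight()
    // to gather term statistics from all sub-searchers up front, so routing the
    // call through the Searcher avoids the per-term round trips seen with
    // RemoteSearcher.
    final class WeightCreationSketch {

      static Weight current(Query q, Searcher s) throws IOException {
        return q.weight(s);           // what Hits.java does today
      }

      static Weight proposed(Query q, Searcher s) throws IOException {
        return s.createWeight(q);     // the suggested replacement line
      }
    }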
[jira] Updated: (LUCENE-1178) Hits does not use MultiSearcher's createWeight
[ https://issues.apache.org/jira/browse/LUCENE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Israel Tsadok updated LUCENE-1178:
    Attachment: hits.diff

Adding a patch for the suggested solution (it's just two lines, really).

> Hits does not use MultiSearcher's createWeight
>
> Key: LUCENE-1178
> URL: https://issues.apache.org/jira/browse/LUCENE-1178
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Affects Versions: 2.3
> Reporter: Israel Tsadok
> Attachments: hits.diff
>
> I am developing a distributed index, using MultiSearcher and RemoteSearcher.
> When investigating some performance issues, I noticed that there is a lot of
> back-and-forth traffic between the servers during the weight calculation.
> Although MultiSearcher has a method called createWeight that minimizes the
> calls to the sub-searchers, this method never actually gets called when I
> call search(query).
>
> From what I can tell, this is fixable by changing the following line in
> Hits.java:
>
>     weight = q.weight(s);
>
> to:
>
>     weight = s.createWeight(q);
[jira] Resolved: (LUCENE-1177) IW.optimize() can do too many merges at the very end
[ https://issues.apache.org/jira/browse/LUCENE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1177.
    Resolution: Fixed

Committed to 2.3 (and was already fixed on trunk).

> IW.optimize() can do too many merges at the very end
>
> Key: LUCENE-1177
> URL: https://issues.apache.org/jira/browse/LUCENE-1177
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1177.patch
>
> This was fixed on trunk in LUCENE-1044, but I'd like to separately
> backport it to 2.3.
>
> With ConcurrentMergeScheduler there is a bug, only when CFS is on,
> whereby after the final merge of an optimize has finished and while
> it's building its CFS, the merge policy may incorrectly ask for
> another merge to collapse that segment into a compound file. The net
> effect is that optimize can spend many extra iterations unnecessarily
> merging a single segment just to collapse it into a compound file.
>
> I believe the case is rare (hard to hit), and maybe only happens if you
> have multiple threads calling optimize at once (the TestThreadedOptimize
> test can hit it), but it's a low-risk fix so I plan to commit to 2.3
> shortly.
[jira] Resolved: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1173.
    Resolution: Fixed
    Fix Version/s: 2.3
                   2.4

Committed to 2.3 & trunk.

> index corruption autoCommit=false
>
> Key: LUCENE-1173
> URL: https://issues.apache.org/jira/browse/LUCENE-1173
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Yonik Seeley
> Assignee: Michael McCandless
> Priority: Critical
> Fix For: 2.4, 2.3
> Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when
> autoCommit=false.
[jira] Updated: (LUCENE-1166) A tokenfilter to decompose compound words
[ https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Peuss updated LUCENE-1166:
    Attachment: CompoundTokenFilter.patch

Updated version:

* New dumb decomposition filter:
  ** Uses a brute-force approach by generating substrings and checking them
     against the dictionary (a rough sketch of the idea follows at the end of
     this message).
  ** Seems to work better for languages that have no patterns file with a lot
     of special cases.
  ** Is roughly 3 times slower than the decomposition filter that uses
     hyphenation patterns.
  ** No licensing problems, because it does not need the hyphenation pattern
     files.
* Refactoring to have all methods used by both decomposition filters in one
  place.
* Minor performance improvements.

> A tokenfilter to decompose compound words
>
> Key: LUCENE-1166
> URL: https://issues.apache.org/jira/browse/LUCENE-1166
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Thomas Peuss
> Attachments: CompoundTokenFilter.patch, CompoundTokenFilter.patch,
>              CompoundTokenFilter.patch, de.xml, hyphenation.dtd
>
> A tokenfilter to decompose the compound words found in many Germanic
> languages (like German, Swedish, ...) into single tokens.
>
> An example: Donaudampfschiff would be decomposed to Donau, dampf, schiff, so
> that you can find the word even when you only enter "Schiff".
>
> I use the hyphenation code from the Apache XML project FOP
> (http://xmlgraphics.apache.org/fop/) to do the first step of decomposition.
> Currently I use the FOP jars directly. I only use a handful of classes from
> the FOP project.
>
> My question now: would it be OK to copy these classes over to the Lucene
> project (renaming the packages, of course), or should I stick with the
> dependency on the FOP jars? The FOP code uses the ASF V2 license as well.
>
> What do you think?
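A rough sketch of the brute-force idea described in the first bullet above. The class and method names and the min/max subword lengths are assumptions for illustration, not code taken from CompoundTokenFilter.patch; the real filter would wrap this logic in a TokenFilter and emit the matches as additional tokens.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    // Sketch of the "dumb" decomposer: slide a window over the compound word
    // and keep every substring that appears in the dictionary.
    final class BruteForceDecomposerSketch {

      private final Set<String> dictionary;   // lower-cased dictionary words
      private final int minSubwordSize;       // assumed tuning knob, e.g. 2
      private final int maxSubwordSize;       // assumed tuning knob, e.g. 15

      BruteForceDecomposerSketch(Set<String> dictionary,
                                 int minSubwordSize, int maxSubwordSize) {
        this.dictionary = dictionary;
        this.minSubwordSize = minSubwordSize;
        this.maxSubwordSize = maxSubwordSize;
      }

      List<String> decompose(String compound) {
        String lower = compound.toLowerCase();
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
          int maxEnd = Math.min(lower.length(), start + maxSubwordSize);
          for (int end = start + minSubwordSize; end <= maxEnd; end++) {
            String candidate = lower.substring(start, end);
            if (dictionary.contains(candidate)) {
              parts.add(candidate);            // e.g. "donau", "dampf", "schiff"
            }
          }
        }
        return parts;
      }
    }

For example, with a dictionary containing "donau", "dampf", and "schiff", decompose("Donaudampfschiff") yields exactly those three parts.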
[jira] Updated: (LUCENE-1177) IW.optimize() can do too many merges at the very end
[ https://issues.apache.org/jira/browse/LUCENE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1177:
    Fix Version/s: 2.3.1

> IW.optimize() can do too many merges at the very end
>
> Key: LUCENE-1177
> URL: https://issues.apache.org/jira/browse/LUCENE-1177
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.3.1, 2.4
> Attachments: LUCENE-1177.patch
>
> This was fixed on trunk in LUCENE-1044, but I'd like to separately
> backport it to 2.3.
>
> With ConcurrentMergeScheduler there is a bug, only when CFS is on,
> whereby after the final merge of an optimize has finished and while
> it's building its CFS, the merge policy may incorrectly ask for
> another merge to collapse that segment into a compound file. The net
> effect is that optimize can spend many extra iterations unnecessarily
> merging a single segment just to collapse it into a compound file.
>
> I believe the case is rare (hard to hit), and maybe only happens if you
> have multiple threads calling optimize at once (the TestThreadedOptimize
> test can hit it), but it's a low-risk fix so I plan to commit to 2.3
> shortly.
Re: cleaning up Jira versions
Thanks Hoss!

Mike

Chris Hostetter wrote:
> I added a version 2.3.1 into Jira, since mikemccand has been backporting
> several bug fixes onto that branch for an impending 2.3.1 release.
>
> While doing so I noticed that Jira has a "2.0.1" release listed -- to the
> best of my knowledge no release with that name ever actually happened,
> however there are 22 issues that indicate they were fixed in "2.0.1".
>
> Unless anyone objects, I will (try to remember to) "merge" 2.0.1 with 2.1
> (assuming Jira's merge versions feature does what I expect it to do).
>
> -Hoss
[jira] Updated: (LUCENE-1173) index corruption autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1173:
    Fix Version/s: (was: 2.3)
                   2.3.1

> index corruption autoCommit=false
>
> Key: LUCENE-1173
> URL: https://issues.apache.org/jira/browse/LUCENE-1173
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Yonik Seeley
> Assignee: Michael McCandless
> Priority: Critical
> Fix For: 2.3.1, 2.4
> Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when
> autoCommit=false.
cleaning up Jira versions
I added a version 2.3.1 into Jira, since mikemccand has been backporting
several bug fixes onto that branch for an impending 2.3.1 release.

While doing so I noticed that Jira has a "2.0.1" release listed -- to the
best of my knowledge no release with that name ever actually happened,
however there are 22 issues that indicate they were fixed in "2.0.1".

Unless anyone objects, I will (try to remember to) "merge" 2.0.1 with 2.1
(assuming Jira's merge versions feature does what I expect it to do).

-Hoss
[jira] Updated: (LUCENE-1176) TermVectors corruption case when autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1176:
    Affects Version/s: 2.3
    Fix Version/s: 2.3.1

> TermVectors corruption case when autoCommit=false
>
> Key: LUCENE-1176
> URL: https://issues.apache.org/jira/browse/LUCENE-1176
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 2.3.1, 2.4
> Attachments: LUCENE-1176.patch, LUCENE-1176.take2.patch
>
> I took Yonik's awesome test case (TestStressIndexing2) and extended it to
> also compare term vectors, and it's failing.
>
> I still need to track down why, but it seems likely to be a separate issue.
[jira] Updated: (LUCENE-1168) TermVectors index files can become corrupt when autoCommit=false
[ https://issues.apache.org/jira/browse/LUCENE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1168:
    Fix Version/s: 2.3.1

> TermVectors index files can become corrupt when autoCommit=false
>
> Key: LUCENE-1168
> URL: https://issues.apache.org/jira/browse/LUCENE-1168
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 2.3.1, 2.4
> Attachments: LUCENE-1168.patch
>
> Spinoff from this thread:
> http://www.gossamer-threads.com/lists/lucene/java-dev/55951
>
> There are actually two separate cases here, both happening only when
> autoCommit=false:
>
> * The first issue was caused by LUCENE-843 (sigh): if you add a bunch of
>   docs with no term vectors, such that one or more flushes happen, and then
>   add docs that do have term vectors, the tvx file will not have enough
>   entries (= corruption).
>
> * The second issue was caused by bulk merging of term vectors (LUCENE-1120
>   -- only in trunk) and bulk merging of stored fields (LUCENE-1043, in 2.3),
>   and only shows when autoCommit=false and the bulk merging optimization
>   runs. In this case, the code that reads the rawDocs tries to read too far
>   in the tvx/fdx files (it's not really index corruption but rather a bug in
>   the rawDocs reading).