[jira] Resolved: (LUCENE-1172) Small speedups to DocumentsWriter

2008-02-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1172.


Resolution: Fixed

> Small speedups to DocumentsWriter
> -
>
> Key: LUCENE-1172
> URL: https://issues.apache.org/jira/browse/LUCENE-1172
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1172.patch
>
>
> Some small fixes that I found while profiling indexing Wikipedia,
> mainly using our own quickSort instead of Arrays.sort.
> Testing first 200K docs of Wikipedia shows a speedup from 274.6
> seconds to 270.2 seconds.
> I'll commit in a day or two.
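
For readers unfamiliar with the technique mentioned above, here is a minimal sketch of a hand-written, in-place quicksort over a primitive int array. It is not the code from LUCENE-1172.patch, which is not shown in this message; the class name IntQuickSort and the choice of Hoare partitioning with a middle pivot are assumptions made purely for illustration.

```java
import java.util.Arrays;

/** Illustrative only: a hand-written in-place quicksort (hypothetical class, not from the patch). */
final class IntQuickSort {
    private IntQuickSort() {}

    /** Sorts the whole array in ascending order. */
    static void sort(int[] a) {
        quickSort(a, 0, a.length - 1);
    }

    private static void quickSort(int[] a, int lo, int hi) {
        // Recurse into the smaller half and loop on the larger one,
        // which keeps the recursion depth logarithmic.
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p - lo < hi - p) {
                quickSort(a, lo, p);
                lo = p + 1;
            } else {
                quickSort(a, p + 1, hi);
                hi = p;
            }
        }
    }

    /** Hoare partition around the middle element; returns the last index of the left part. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + ((hi - lo) >>> 1)];
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 5, 0, -2};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Whether a routine like this beats Arrays.sort depends on the data and the JVM, so any speedup should be measured on the real workload, as the description above does.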

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-1178) Hits does not use MultiSearcher's createWeight

2008-02-14 Thread Israel Tsadok (JIRA)
Hits does not use MultiSearcher's createWeight
--

Key: LUCENE-1178
URL: https://issues.apache.org/jira/browse/LUCENE-1178
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.3
Reporter: Israel Tsadok
Attachments: hits.diff

I am developing a distributed index, using MultiSearcher and RemoteSearcher. 
When investigating some performance issues, I noticed that there is a lot of 
back-and-forth traffic between the servers during the weight calculation.
Although MultiSearcher has a method called createWeight that minimizes the 
calls to the sub-searchers, this method never actually gets called when I call 
search(query).

From what I can tell, this is fixable by changing the following line in Hits.java:
weight = q.weight(s);
to:
weight = s.createWeight(q);
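
The dispatch problem described above can be sketched with simplified stand-in classes. None of these are Lucene's real classes or signatures: Searcher, BatchingSearcher, Query and the calls counter are hypothetical names, and the "one call per term" versus "one batched call" costs are assumptions chosen only to make the difference visible.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-ins only; these are NOT Lucene's classes. */
final class WeightDispatchSketch {
    private WeightDispatchSketch() {}

    static class Searcher {
        int calls = 0; // simulated round trips to sub-searchers

        int docFreq(String term) {
            calls++;
            return 1;
        }

        // Default behaviour: let the query build its own weight.
        String createWeight(Query q) {
            return q.weight(this);
        }
    }

    /** Plays the role of a searcher that overrides createWeight to batch its lookups. */
    static class BatchingSearcher extends Searcher {
        boolean createWeightUsed = false;

        @Override
        String createWeight(Query q) {
            createWeightUsed = true;
            calls++; // assumption: one batched round trip covers all terms
            return "weight(" + q.terms.size() + " terms, batched)";
        }
    }

    static class Query {
        final List<String> terms;

        Query(List<String> terms) {
            this.terms = terms;
        }

        // Calling this directly never reaches the searcher's createWeight override.
        String weight(Searcher s) {
            int df = 0;
            for (String term : terms) {
                df += s.docFreq(term);
            }
            return "weight(df=" + df + ")";
        }
    }

    public static void main(String[] args) {
        Query q = new Query(Arrays.asList("a", "b", "c"));

        BatchingSearcher viaQuery = new BatchingSearcher();
        q.weight(viaQuery);
        System.out.println("q.weight(s): calls=" + viaQuery.calls);

        BatchingSearcher viaSearcher = new BatchingSearcher();
        viaSearcher.createWeight(q);
        System.out.println("s.createWeight(q): calls=" + viaSearcher.calls);
    }
}
```

In this toy model, q.weight(s) costs one simulated call per term and bypasses the override, while s.createWeight(q) reaches the override and costs a single call.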




[jira] Updated: (LUCENE-1178) Hits does not use MultiSearcher's createWeight

2008-02-14 Thread Israel Tsadok (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Israel Tsadok updated LUCENE-1178:
--

Attachment: hits.diff

Adding a patch for the suggested solution (it's just two lines, really).

> Hits does not use MultiSearcher's createWeight
> --
>
> Key: LUCENE-1178
> URL: https://issues.apache.org/jira/browse/LUCENE-1178
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Affects Versions: 2.3
> Reporter: Israel Tsadok
> Attachments: hits.diff
>
>
> I am developing a distributed index, using MultiSearcher and RemoteSearcher. 
> When investigating some performance issues, I noticed that there is a lot of 
> back-and-forth traffic between the servers during the weight calculation.
> Although MultiSearcher has a method called createWeight that minimizes the 
> calls to the sub-searchers, this method never actually gets called when I 
> call search(query).
> From what I can tell, this is fixable by changing in Hits.java the line:
> weight = q.weight(s);
> to:
> weight = s.createWeight(q);




[jira] Resolved: (LUCENE-1177) IW.optimize() can do too many merges at the very end

2008-02-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1177.


Resolution: Fixed

Committed to 2.3 (and was already fixed on trunk).

> IW.optimize() can do too many merges at the very end
> 
>
> Key: LUCENE-1177
> URL: https://issues.apache.org/jira/browse/LUCENE-1177
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1177.patch
>
>
> This was fixed on trunk in LUCENE-1044 but I'd like to separately
> backport it to 2.3.
> With ConcurrentMergeScheduler there is a bug, only when CFS is on,
> whereby after the final merge of an optimize has finished and while
> it's building its CFS, the merge policy may incorrectly ask for
> another merge to collapse that segment into a compound file.  The net
> effect is optimize can spend many extra iterations unnecessarily
> merging a single segment to collapse it to a compound file.
> I believe the case is rare (hard to hit), and maybe only if you have
> multiple threads calling optimize at once (the TestThreadedOptimize
> test can hit it), but it's a low-risk fix so I plan to commit to 2.3
> shortly.




[jira] Resolved: (LUCENE-1173) index corruption autoCommit=false

2008-02-14 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1173.


Resolution: Fixed
Fix Version/s: 2.3, 2.4

Committed to 2.3 & trunk.

> index corruption autoCommit=false
> -
>
> Key: LUCENE-1173
> URL: https://issues.apache.org/jira/browse/LUCENE-1173
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Yonik Seeley
> Assignee: Michael McCandless
> Priority: Critical
> Fix For: 2.4, 2.3
>
> Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when 
> autoCommit=false




[jira] Updated: (LUCENE-1166) A tokenfilter to decompose compound words

2008-02-14 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated LUCENE-1166:
-

Attachment: CompoundTokenFilter.patch

Updated version:
* new dumb decomposition filter
** uses a brute-force approach by generating substrings and checking them 
against the dictionary
** seems to work better for languages that have no patterns file with a lot of 
special cases
** Is roughly 3 times slower than the decomposition filter using hyphenation 
patterns
** No licensing problems because of the hyphenation pattern files
* Refactoring to have all methods used by both decomposition filters in one 
place
* Minor performance improvements
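
The brute-force approach described in the list above (generate substrings, check each against the dictionary) can be sketched roughly as follows. This is not the code from CompoundTokenFilter.patch: the class name BruteForceDecompounder, the minSubwordLength parameter, and the lower-casing of both the word and the dictionary are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: brute-force compound splitting by dictionary lookup (hypothetical class). */
final class BruteForceDecompounder {
    private final Set<String> dictionary;
    private final int minSubwordLength;

    BruteForceDecompounder(Set<String> dictionary, int minSubwordLength) {
        if (minSubwordLength < 1) {
            throw new IllegalArgumentException("minSubwordLength must be >= 1");
        }
        // Store a lower-cased copy so lookups are case-insensitive.
        this.dictionary = new HashSet<>();
        for (String entry : dictionary) {
            this.dictionary.add(entry.toLowerCase(Locale.ROOT));
        }
        this.minSubwordLength = minSubwordLength;
    }

    /** Returns every substring of the word that appears in the dictionary, in scan order. */
    List<String> decompose(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        // Try every start position and every end position at least minSubwordLength away.
        for (int start = 0; start + minSubwordLength <= lower.length(); start++) {
            for (int end = start + minSubwordLength; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("Donau", "Dampf", "Schiff"));
        BruteForceDecompounder decompounder = new BruteForceDecompounder(dict, 3);
        System.out.println(decompounder.decompose("Donaudampfschiff"));
    }
}
```

The number of candidate substrings grows quadratically with the word length, which may be one reason a brute-force filter is slower than a pattern-based one.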

> A tokenfilter to decompose compound words
> -
>
> Key: LUCENE-1166
> URL: https://issues.apache.org/jira/browse/LUCENE-1166
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Thomas Peuss
> Attachments: CompoundTokenFilter.patch, CompoundTokenFilter.patch, 
> CompoundTokenFilter.patch, de.xml, hyphenation.dtd
>
>
> A tokenfilter to decompose compound words you find in many Germanic languages 
> (like German, Swedish, ...) into single tokens.
> An example: Donaudampfschiff would be decomposed to Donau, dampf, schiff so 
> that you can find the word even when you only enter "Schiff".
> I use the hyphenation code from the Apache XML project FOP 
> (http://xmlgraphics.apache.org/fop/) to do the first step of decomposition. 
> Currently I use the FOP jars directly. I only use a handful of classes from 
> the FOP project.
> My question now:
> Would it be OK to copy these classes over to the Lucene project (renaming the 
> packages of course) or should I stick with the dependency on the FOP jars? 
> The FOP code uses the ASF V2 license as well.
> What do you think?




[jira] Updated: (LUCENE-1177) IW.optimize() can do too many merges at the very end

2008-02-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1177:
-

Fix Version/s: 2.3.1

> IW.optimize() can do too many merges at the very end
> 
>
> Key: LUCENE-1177
> URL: https://issues.apache.org/jira/browse/LUCENE-1177
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.3.1, 2.4
>
> Attachments: LUCENE-1177.patch
>
>
> This was fixed on trunk in LUCENE-1044 but I'd like to separately
> backport it to 2.3.
> With ConcurrentMergeScheduler there is a bug, only when CFS is on,
> whereby after the final merge of an optimize has finished and while
> it's building its CFS, the merge policy may incorrectly ask for
> another merge to collapse that segment into a compound file.  The net
> effect is optimize can spend many extra iterations unnecessarily
> merging a single segment to collapse it to a compound file.
> I believe the case is rare (hard to hit), and maybe only if you have
> multiple threads calling optimize at once (the TestThreadedOptimize
> test can hit it), but it's a low-risk fix so I plan to commit to 2.3
> shortly.




Re: cleaning up Jira versions

2008-02-14 Thread Michael McCandless


Thanks Hoss!

Mike

Chris Hostetter wrote:



i added a version 2.3.1 into Jira since mikemccand has been  
backporting several bug fixes onto that branch for an impending  
2.3.1 release.


While doing so i noticed that Jira has a "2.0.1" release listed --  
to the best of my knowledge no release with that name ever actually  
happened, however there are 22 issues that indicate they were fixed  
in "2.0.1"


Unless anyone objects i will (try to remember to) "merge" 2.0.1  
with 2.1 (assuming Jira's merge versions feature does what i expect  
it to do)




-Hoss





[jira] Updated: (LUCENE-1173) index corruption autoCommit=false

2008-02-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1173:
-

Fix Version/s: 2.3.1 (was: 2.3)

> index corruption autoCommit=false
> -
>
> Key: LUCENE-1173
> URL: https://issues.apache.org/jira/browse/LUCENE-1173
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Yonik Seeley
> Assignee: Michael McCandless
> Priority: Critical
> Fix For: 2.3.1, 2.4
>
> Attachments: indexstress.patch, indexstress.patch, LUCENE-1173.patch
>
>
> In both Lucene 2.3 and trunk, the index becomes corrupted when 
> autoCommit=false




cleaning up Jira versions

2008-02-14 Thread Chris Hostetter


i added a version 2.3.1 into Jira since mikemccand has been backporting 
several bug fixes onto that branch for an impending 2.3.1 release.


While doing so i noticed that Jira has a "2.0.1" release listed -- to the 
best of my knowledge no release with that name ever actually happened, 
however there are 22 issues that indicate they were fixed in "2.0.1"


Unless anyone objects i will (try to remember to) "merge" 2.0.1 with 2.1 
(assuming Jira's merge versions feature does what i expect it to do)




-Hoss





[jira] Updated: (LUCENE-1176) TermVectors corruption case when autoCommit=false

2008-02-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1176:
-

Affects Version/s: 2.3
Fix Version/s: 2.3.1

> TermVectors corruption case when autoCommit=false
> -
>
> Key: LUCENE-1176
> URL: https://issues.apache.org/jira/browse/LUCENE-1176
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 2.3.1, 2.4
>
> Attachments: LUCENE-1176.patch, LUCENE-1176.take2.patch
>
>
> I took Yonik's awesome test case (TestStressIndexing2) and extended it to 
> also compare term vectors, and it's failing.
> I still need to track down why, but it seems likely to be a separate issue.




[jira] Updated: (LUCENE-1168) TermVectors index files can become corrupt when autoCommit=false

2008-02-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-1168:
-

Fix Version/s: 2.3.1

> TermVectors index files can become corrupt when autoCommit=false
> 
>
> Key: LUCENE-1168
> URL: https://issues.apache.org/jira/browse/LUCENE-1168
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 2.3.1, 2.4
>
> Attachments: LUCENE-1168.patch
>
>
> Spinoff from this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/55951
> There are actually 2 separate cases here, both only happening when
> autoCommit=false:
>   * First issue was caused by LUCENE-843 (sigh): if you add a bunch of
> docs with no term vectors, such that 1 or more flushes happen;
> then you add docs that do have term vectors, the tvx file will not
> have enough entries (= corruption).
>   * Second issue was caused by bulk merging of term vectors
> (LUCENE-1120 -- only in trunk) and bulk merging of stored fields
> (LUCENE-1043, in 2.3), and only shows up when autoCommit=false and
> the bulk merging optimization runs.  In this case, the code that
> reads the rawDocs tries to read too far in the tvx/fdx files (it's
> not really index corruption but rather a bug in the rawDocs
> reading).
