Build failed in Hudson: Lucene-trunk #1064

2010-01-15 Thread Apache Hudson Server
See Changes: [uschindler] move changes.txt entry into contrib [uschindler] LUCENE-2211: Fix various missing clearAttributes() and improve BaseTokenStreamTestCase to check for this trap ---

[jira] Updated: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

2010-01-15 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2183: Attachment: LUCENE-2183.patch This version "duplicates" the incrementToken method to preve
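For readers unfamiliar with the underlying problem: the patch itself is not shown in this listing, so the following is only a minimal sketch of why supplementary characters need special handling, using plain java.lang APIs rather than Lucene code. The class and method names (CodePointDemo, countLettersByChar, countLettersByCodePoint) are hypothetical and exist only for illustration. A character outside the Basic Multilingual Plane occupies two UTF-16 chars (a surrogate pair), so a per-char test can miss it, while a per-code-point test does not.

```java
// Hypothetical illustration only; this is not the LUCENE-2183 patch.
class CodePointDemo {

    /** Counts letters by testing each UTF-16 char separately; surrogate halves are not letters. */
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    /** Counts letters by testing whole code points, advancing one or two chars at a time. */
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp); // 2 for supplementary characters, otherwise 1
        }
        return n;
    }

    public static void main(String[] args) {
        // "a" followed by U+10400 (DESERET CAPITAL LETTER LONG I), written as a surrogate pair.
        String s = "a\uD801\uDC00";
        System.out.println("by char: " + countLettersByChar(s));
        System.out.println("by code point: " + countLettersByCodePoint(s));
    }
}
```

The per-code-point loop is the general technique; how the actual patch keeps the tokenizer fast while doing this is discussed in the issue, not here.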

[jira] Commented: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801027#action_12801027 ] Uwe Schindler commented on LUCENE-2183: --- +1, because this is very speed-sensitive.

[jira] Updated: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

2010-01-15 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2183: Attachment: LUCENE-2183.patch Uwe, using an interface doesn't work though as I can not red

[jira] Commented: (LUCENE-2215) paging collector

2010-01-15 Thread Aaron McCurry (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800981#action_12800981 ] Aaron McCurry commented on LUCENE-2215: --- :) I will try to get my patch ready this w

[jira] Commented: (LUCENE-2127) Improved large result handling

2010-01-15 Thread Adam Heinz (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800965#action_12800965 ] Adam Heinz commented on LUCENE-2127: Entered a placeholder issue for Aaron's patch: LU

[jira] Created: (LUCENE-2215) paging collector

2010-01-15 Thread Adam Heinz (JIRA)
paging collector Key: LUCENE-2215 URL: https://issues.apache.org/jira/browse/LUCENE-2215 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: 3.0, 2.4 Reporter: Adam H

[jira] Resolved: (LUCENE-2185) add @Deprecated annotations

2010-01-15 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2185. - Resolution: Fixed Committed revision 899831 to flex > add @Deprecated annotations > ---

[jira] Commented: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800939#action_12800939 ] Uwe Schindler commented on LUCENE-2183: --- Have not looked detailed into it yet, but i

[jira] Created: (LUCENE-2214) Remove deprecated StemExclusionSet setters in contrib/analyzers

2010-01-15 Thread Simon Willnauer (JIRA)
Remove deprecated StemExclusionSet setters in contrib/analyzers --- Key: LUCENE-2214 URL: https://issues.apache.org/jira/browse/LUCENE-2214 Project: Lucene - Java Issue Type: Task

[jira] Updated: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

2010-01-15 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2183: Attachment: LUCENE-2183.patch I updated the patch to make use of the nice reflection utils

[jira] Updated: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2213: --- Attachment: LUCENE-2213.patch > Small improvements to ArrayUtil.getNextSize > --

[jira] Commented: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800901#action_12800901 ] Michael McCandless commented on LUCENE-2213: Thanks Marvin, I fixed the typo.

[jira] Commented: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-15 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800828#action_12800828 ] Marvin Humphrey commented on LUCENE-2213: - Algorithm looks good. The addition of

[jira] Updated: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-15 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2213: --- Attachment: LUCENE-2213.patch > Small improvements to ArrayUtil.getNextSize > --

[jira] Created: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-15 Thread Michael McCandless (JIRA)
Small improvements to ArrayUtil.getNextSize --- Key: LUCENE-2213 URL: https://issues.apache.org/jira/browse/LUCENE-2213 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCa

[jira] Updated: (LUCENE-1939) IndexOutOfBoundsException at ShingleMatrixFilter's Iterator#hasNext method

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1939: -- Fix Version/s: 2.9.2 > IndexOutOfBoundsException at ShingleMatrixFilter's Iterator#hasNext met

[jira] Resolved: (LUCENE-2211) Improve BaseTokenStreamTestCase to uses a fake attribute to check if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2211. --- Resolution: Fixed Committed into 2.9 branch revision: 899681 > Improve BaseTokenStreamTestC

[jira] Commented: (LUCENE-1939) IndexOutOfBoundsException at ShingleMatrixFilter's Iterator#hasNext method

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800740#action_12800740 ] Uwe Schindler commented on LUCENE-1939: --- Committed into 2.9 branch revision: 899681

[jira] Updated: (LUCENE-2211) Improve BaseTokenStreamTestCase to uses a fake attribute to check if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211-branch29.patch There was a new bug in the 2.9 ShingleMatrixFilter beca

[jira] Updated: (LUCENE-2211) Improve BaseTokenStreamTestCase to uses a fake attribute to check if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Summary: Improve BaseTokenStreamTestCase to uses a fake attribute to check if clearAttributes(

[jira] Assigned: (LUCENE-2211) Improve BaseTokenStreamTestCase to uses a fake attribute to check if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-2211: - Assignee: Uwe Schindler > Improve BaseTokenStreamTestCase to uses a fake attribute to ch

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211-branch29.patch Patch for 2.9 branch. Tests are running... > Advances

[jira] Commented: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800709#action_12800709 ] Uwe Schindler commented on LUCENE-2211: --- Fixed in Lucene 3.0 revision: 899639 > Adv

[jira] Updated: (LUCENE-2185) add @Deprecated annotations

2010-01-15 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2185: Attachment: LUCENE-2185_flex.patch patch for flex, also RegexTermsEnum is undeprecated as it was a

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211-branch30.patch Patch for 3.0 branch > Advances BaseTokenStreamTestCas

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211.patch I'll commit attached patch with some fixes for typos etc. > Ad

[jira] Commented: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800698#action_12800698 ] Uwe Schindler commented on LUCENE-2211: --- Fixed in trunk revision: 899627 > Advances

Re: Finding frequency of regex query match in a field

2010-01-15 Thread Erick Erickson
Could I ask you to re-post this on the java user's list? This list is for *internal* Lucene development discussion. Thanks Erick On Fri, Jan 15, 2010 at 8:28 AM, Altimatic wrote: > > Hi All, > > I have an application that has to count the frequency that a specific > regular expression is matched

Finding frequency of regex query match in a field

2010-01-15 Thread Altimatic
Hi All, I have an application that has to count how often a specific regular expression matches a particular field, for each document in an indexed directory. For example, let's say I have 10 documents in the directory and each document has 3 fields, "table", "column" and "data".
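The counting step described in this question can be sketched without any Lucene API at all, assuming the field text of each document has already been loaded into memory. The class name RegexFrequency, its methods, and the sample document ids and values below are all hypothetical; how the field values are read from the index is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; field values are assumed to be already loaded as strings.
class RegexFrequency {

    /** Counts the non-overlapping matches of the pattern in one field value. */
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Maps each document id to the number of matches found in its field value. */
    static Map<String, Integer> countPerDocument(Pattern pattern, Map<String, String> fieldValueById) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldValueById.entrySet()) {
            counts.put(e.getKey(), countMatches(pattern, e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the "data" field of two documents.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("doc1", "abc 123 def 456");
        data.put("doc2", "no digits here");
        System.out.println(countPerDocument(Pattern.compile("\\d+"), data));
    }
}
```

Compiling the Pattern once and reusing it across documents avoids recompiling the expression for every field value.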

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2211: Attachment: LUCENE-2211.patch I reviewed all code with incrementToken() and found 3 more problems,

Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-15 Thread Sanne Grinovero
A common error I see is that people assume the IndexWriter is not thread-safe and open several different instances. You should use just one IndexWriter, keep it open and flush periodically (not commit at each add operation), and read the Lucene wiki pages about the IndexWriter settings like ramB

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211.patch More updates to TeeSink and also BaseTokenStreamTestCase to stil

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2211: -- Attachment: LUCENE-2211.patch Some updates to TeeSink test. Changes.txt. > Advances BaseToken