[jira] Commented: (LUCENE-2676) TestIndexWriter fails for SimpleTextCodec

2010-10-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917163#action_12917163
 ] 

Simon Willnauer commented on LUCENE-2676:
-

Mike, what was the solution to this?

> TestIndexWriter fails for SimpleTextCodec
> --
>
> Key: LUCENE-2676
> URL: https://issues.apache.org/jira/browse/LUCENE-2676
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Index
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> I just ran into this failure; SimpleText obviously takes a lot of disk 
> space.
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testCommitOnCloseDiskUsage(org.apache.lucene.index.TestIndexWriter):FAILED
> [junit] writer used too much space while adding documents: mid=608162 
> start=5293 end=634214
> [junit] junit.framework.AssertionFailedError: writer used too much space 
> while adding documents: mid=608162 start=5293 end=634214
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.index.TestIndexWriter.testCommitOnCloseDiskUsage(TestIndexWriter.java:1047)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 3.281 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testCommitOnCloseDiskUsage 
> -Dtests.seed=-7526585723238322940:-1609544650150801239
> [junit] NOTE: test params are: codec=SimpleText, locale=th_TH, 
> timezone=UCT
> [junit] -  ---
> [junit] Test org.apache.lucene.index.TestIndexWriter FAILED
> {noformat}
> I did not look into SimpleText, but I guess we need to either change the 
> threshold for this test or exclude SimpleText from it.
> Any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2677) Tests failing when run with tests.iter > 1

2010-10-01 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2677.
-

Resolution: Fixed

Committed in rev. 1003747.

> Tests failing when run with tests.iter > 1
> --
>
> Key: LUCENE-2677
> URL: https://issues.apache.org/jira/browse/LUCENE-2677
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2677.patch
>
>
> TestMultiLevelSkipList and TestFieldsReader fail when run with 
> -Dtests.iter > 1: not all values are reset between runs.
> I will attach a patch in a second.




[jira] Commented: (LUCENE-2677) Tests failing when run with tests.iter > 1

2010-10-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917161#action_12917161
 ] 

Simon Willnauer commented on LUCENE-2677:
-

The same is true for org.apache.lucene.search.TestRemoteCachingWrapperFilter; I 
will fix it.

> Tests failing when run with tests.iter > 1
> --
>
> Key: LUCENE-2677
> URL: https://issues.apache.org/jira/browse/LUCENE-2677
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2677.patch
>
>
> TestMultiLevelSkipList and TestFieldsReader fail when run with 
> -Dtests.iter > 1: not all values are reset between runs.
> I will attach a patch in a second.




[jira] Reopened: (LUCENE-2677) Tests failing when run with tests.iter > 1

2010-10-01 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reopened LUCENE-2677:
-


> Tests failing when run with tests.iter > 1
> --
>
> Key: LUCENE-2677
> URL: https://issues.apache.org/jira/browse/LUCENE-2677
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2677.patch
>
>
> TestMultiLevelSkipList and TestFieldsReader fail when run with 
> -Dtests.iter > 1: not all values are reset between runs.
> I will attach a patch in a second.




[jira] Commented: (LUCENE-2678) TestCachingSpanFilter sometimes fails

2010-10-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917160#action_12917160
 ] 

Simon Willnauer commented on LUCENE-2678:
-

Awesome, thanks Mike!

> TestCachingSpanFilter sometimes fails
> -
>
> Key: LUCENE-2678
> URL: https://issues.apache.org/jira/browse/LUCENE-2678
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> if I run 
> {noformat} 
> ant test -Dtestcase=TestCachingSpanFilter -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146 -Dtests.iter=100
> {noformat} 
> I get two failures on my machine against current trunk
> {noformat} 
> junit-sequential:
> [junit] Testsuite: org.apache.lucene.search.TestCachingSpanFilter
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Tests run: 100, Failures: 2, Errors: 0, Time elapsed: 2.297 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: test params are: 
> codec=MockVariableIntBlock(baseBlockSize=43), locale=fr, 
> timezone=Africa/Bangui
> [junit] -  ---
> [junit] Test org.apache.lucene.search.TestCachingSpanFilter FAILED
> {noformat}
> not sure what it is but it seems likely to be a WeakRef / GC issue in the 
> cache. 
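The suspected WeakRef / GC interaction can be illustrated with a minimal, hypothetical sketch (plain Java, not Lucene's actual CachingSpanFilter code; class and method names below are invented). A cache keyed by weak references can drop entries whenever a collection happens to run, so a test that asserts an exact miss count (expected:<2>) can see one extra miss (was:<3>) if GC fires mid-test:

```java
import java.util.WeakHashMap;

public class WeakCacheSketch {
    // Put a value under 'key' and report the cache size while a strong
    // reference to the key is still held: deterministically 1.
    static int sizeWhileStronglyReferenced(WeakHashMap<Object, String> cache, Object key) {
        cache.put(key, "cached filter bits");
        return cache.size();
    }

    public static void main(String[] args) {
        WeakHashMap<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        System.out.println(sizeWhileStronglyReferenced(cache, key));  // 1
        key = null;        // drop the only strong reference to the key
        System.gc();       // a hint only; the entry MAY now vanish
        // From here on, cache.size() is 0 or 1 depending on whether GC actually
        // collected the key: exactly the nondeterminism that makes a strict
        // miss-count assertion flaky.
        System.out.println(cache.size() <= 1);
    }
}
```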




[jira] Updated: (LUCENE-2529) always apply position increment gap between values

2010-10-01 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-2529:
-

Attachment: LUCENE-2529_skip_posIncr_for_1st_token.patch

Always adding the position increment is good, but insufficient to solve my 
problem.

A new patch rectifies the follow-up situation I inadvertently reported on 
LUCENE-2668 but should have reported here.  The gist is that DocInverterPerField 
_conditionally_ decrements the position and then always increments it, and this 
is problematic when attempting to keep position increments aligned across several 
multi-valued fields (using an analyzer that sets posIncr to 0) when the 
first value generates no tokens (either blank or all stop words).  Mike McCandless 
pointed out that the unfortunate existing logic had to do with preventing the 
position from becoming -1, which doesn't work with payloads (LUCENE-1542).

My new patch has neither a pre-decrement nor a post-increment, and thus 
I find the code easier to follow.  It ignores the provided position increment 
of the first token (typically 1), removing the need to shift positions back and 
forth.  There is one oddity included here: I always add 1 to the position 
increment _gap_ (i.e. between values).  With this oddity included, all 
the tests pass (except for the test for this very issue, which I correct in 
this patch).  Yay!  Without this oddity, a handful of tests failed that 
depended on the first token adding one to the position.  My +1 up at the value 
loop can be seen as enforcing that the first token's position is 1, 
and also adding a +1 when there is no token for a value (critical for 
aligning multiple fields).  Perhaps this +1 should happen at a different line 
to be less confusing, but the end result should be the same.

I expect this is very confusing for many people, especially if you're not knee 
deep in this subject as I am presently.  Mike, hopefully you're following 
what I'm up to here.  The tests pass, remember.

> always apply position increment gap between values
> --
>
> Key: LUCENE-2529
> URL: https://issues.apache.org/jira/browse/LUCENE-2529
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
> Environment: (I don't know which version to say this affects since 
> it's some quasi trunk release and the new versioning scheme confuses me.)
>Reporter: David Smiley
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: 
> LUCENE-2529_always_apply_position_increment_gap_between_values.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term 
> positions.  I discovered that the position increment gap on indexing is only 
> applied between values when there are existing terms indexed for the 
> document.  I suspect this logic wasn't deliberate; it's just how it's always 
> been, for no particular reason.  I think the gap should always be applied 
> between values.  Reference DocInverterPerField.java line 82:
> {noformat}
> if (fieldState.length > 0)
>   fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> {noformat}
> This checks fieldState.length.  I think the condition should simply be: if (i > 0).
> I don't think this change will affect anyone at all but it will certainly 
> help me.  Presently, I can either change this line in Lucene, or I can put in 
> a hack so that the first value for the document is some dummy value which is 
> wasteful.
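The two conditions under discussion can be simulated in a self-contained sketch (illustrative plain Java, not DocInverterPerField itself; the position bookkeeping is simplified). With the existing fieldState.length > 0 check, a first value that produces no tokens swallows the gap; with the proposed i > 0 check, the gap survives:

```java
import java.util.ArrayList;
import java.util.List;

public class PosIncrGapSketch {
    // valueTokens: for each field value, the posIncr of each token it yields.
    // gapOnlyAfterTokens=true models the existing "fieldState.length > 0"
    // condition; false models the proposed "i > 0" condition.
    static List<Integer> positions(List<List<Integer>> valueTokens, int gap,
                                   boolean gapOnlyAfterTokens) {
        List<Integer> out = new ArrayList<>();
        int position = 0;   // last assigned position
        int length = 0;     // tokens emitted so far (fieldState.length)
        for (int i = 0; i < valueTokens.size(); i++) {
            boolean applyGap = gapOnlyAfterTokens ? length > 0 : i > 0;
            if (applyGap) position += gap;
            for (int posIncr : valueTokens.get(i)) {
                position += posIncr;
                out.add(position);
                length++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First value produces no tokens (e.g. all stop words), second yields two.
        List<List<Integer>> tokens = List.of(List.of(), List.of(1, 1));
        System.out.println(positions(tokens, 10, true));   // [1, 2]: gap lost
        System.out.println(positions(tokens, 10, false));  // [11, 12]: gap kept
    }
}
```

The second output is what span queries that rely on cross-value alignment would need.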




[jira] Updated: (SOLR-792) Pivot (ie: Decision Tree) Faceting Component

2010-10-01 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-792:
--

Summary: Pivot (ie: Decision Tree) Faceting Component  (was: Tree Faceting 
Component)

JIRA summary updated based on the consensus on what this type of functionality 
should be called.

> Pivot (ie: Decision Tree) Faceting Component
> 
>
> Key: SOLR-792
> URL: https://issues.apache.org/jira/browse/SOLR-792
> Project: Solr
>  Issue Type: New Feature
>Reporter: Erik Hatcher
>Assignee: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
> SOLR-792.patch, SOLR-792.patch, SOLR-792.patch
>
>
> A component to do multi-level faceting.




[jira] Resolved: (SOLR-2133) ability to parse multiple value sources

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2133.
--

Fix Version/s: 3.1
   Resolution: Fixed

branch_3x: Committed revision 1003741.

> ability to parse multiple value sources
> ---
>
> Key: SOLR-2133
> URL: https://issues.apache.org/jira/browse/SOLR-2133
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2133.patch, SOLR-2133.patch
>
>
> To enable things like this:
> q=dist($pt)&pt=10,20
> The function query parser needs to have the option of parsing a list of value 
> sources rather than just one.




[jira] Reopened: (SOLR-2133) ability to parse multiple value sources

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-2133:
--


Reopening to incorporate this into 3x.

> ability to parse multiple value sources
> ---
>
> Key: SOLR-2133
> URL: https://issues.apache.org/jira/browse/SOLR-2133
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-2133.patch, SOLR-2133.patch
>
>
> To enable things like this:
> q=dist($pt)&pt=10,20
> The function query parser needs to have the option of parsing a list of value 
> sources rather than just one.




[jira] Resolved: (SOLR-2128) full parameter dereferencing for function queries

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2128.
--

Fix Version/s: 3.1
   Resolution: Fixed

branch_3x: Committed revision 1003739.

> full parameter dereferencing for function queries
> -
>
> Key: SOLR-2128
> URL: https://issues.apache.org/jira/browse/SOLR-2128
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2128.patch
>
>
> We should be able to specify function parameters as $foo (where foo is 
> another request parameter).
> Ideally the parameter could itself be a full function.




[jira] Reopened: (SOLR-2128) full parameter dereferencing for function queries

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-2128:
--


Reopening to incorporate this into 3x.

> full parameter dereferencing for function queries
> -
>
> Key: SOLR-2128
> URL: https://issues.apache.org/jira/browse/SOLR-2128
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 4.0
>
> Attachments: SOLR-2128.patch
>
>
> We should be able to specify function parameters as $foo (where foo is 
> another request parameter).
> Ideally the parameter could itself be a full function.




[jira] Commented: (SOLR-792) Tree Faceting Component

2010-10-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917144#action_12917144
 ] 

Yonik Seeley commented on SOLR-792:
---

One thing I noticed is that the "value" is always a string.  Example:  
"value":"6" as opposed to "value":6 when pivoting by popularity.

Result grouping, on the other hand, does use the native value type:
http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity

One way to think about it is that the labels for faceting normally use string 
values, but that's because they must for something like JSON keys.  A different 
way of thinking about it is that whenever we have values (as opposed to keys) we 
should use "native" types: boolean, int, float, etc.

Thoughts?
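Concretely, the two shapes under discussion look like this (a hypothetical, trimmed pivot entry; only the "value" forms come from the discussion above, the surrounding layout is an assumption):

{noformat}
"value":"6"   <- pivot faceting today: always a string
"value":6     <- native type, as result grouping returns it
{noformat}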

> Tree Faceting Component
> ---
>
> Key: SOLR-792
> URL: https://issues.apache.org/jira/browse/SOLR-792
> Project: Solr
>  Issue Type: New Feature
>Reporter: Erik Hatcher
>Assignee: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
> SOLR-792.patch, SOLR-792.patch, SOLR-792.patch
>
>
> A component to do multi-level faceting.




Build failed in Hudson: Lucene-trunk #1311

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[uschindler] LUCENE-2507: Fix Java 1.5 violation thanks to hudson with 1.5 :-)

--
[...truncated 14027 lines...]
  [javadoc] Building index for all the packages and classes...
  [javadoc] 
/usr:44:
 warning - Tag @link: reference not found: Directory
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/usr
  [javadoc] Note: Custom tags that were not seen:  @lucene.internal
  [javadoc] 5 warnings
  [jar] Building jar: 
/usr
 [echo] Building queries...

javadocs:
[mkdir] Created dir: 
/usr
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package org.apache.lucene.search.regex...
  [javadoc] Loading source files for package org.apache.lucene.search.similar...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.5.0_16-p9
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
/usr:35:
 warning - Tag @link: can't find prefix in 
org.apache.lucene.search.regex.JakartaRegexpCapabilities
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:33:
 warning - Tag @link: can't find prefix in 
org.apache.lucene.search.regex.JavaUtilRegexCapabilities
  [javadoc] 
/usr:33:
 warning - Tag @link: can't find match in 
org.apache.lucene.search.regex.JavaUtilRegexCapabilities
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:44:
 warning - @param argument "string" is not a parameter name.
  [javadoc] 
/usr:34:
 warning - Tag @see: reference not found: RegexTermEnum
  [javadoc] 
/usr:526:
 warning - Tag @see: reference not found: 
org.apache.lucene.analysis.StopFilter#makeStopSet StopFilter.makeStopSet()
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] Building index for all the packages and classes...
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/usr

[jira] Created: (SOLR-2140) Distributed search treats "score" as multivalued if schema has matching multivalued dynamicField

2010-10-01 Thread Hoss Man (JIRA)
Distributed search treats "score" as multivalued if schema has matching 
multivalued dynamicField


 Key: SOLR-2140
 URL: https://issues.apache.org/jira/browse/SOLR-2140
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4.1
Reporter: Hoss Man


http://search.lucidimagination.com/search/document/e8d10e56ee3ac24b/solr_with_example_jetty_and_score_problem

{noformat}
: But when I issue the query with shard(two instances), the response XML will
: be like following.
: as you can see, that score has bee tranfer to a element  of 
...
: 
: 1.9808292
: 

The root cause of these seems to be your catchall dynamic field
declaration...

:

...that line (specificly the fact that it's multiValued="true") seems to
be confusing the results aggregation code.  my guess is that it's
looping over all the fields, and looking them up in the schema to see if
they are single/multi valued but not recognizing that "score" is
special.
{noformat}

This is trivial to reproduce using the example schema, just add a dynamicField 
type like this...

{noformat}

{noformat}

Load up some data, and then hit this URL...
http://localhost:8983/solr/select?q=*:*&fl=score,id&shards=localhost:8983/solr/




Build failed in Hudson: Lucene-trunk #1310

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[rmuir] fix test failure in TestUTF32ToUTF8 (the random regex-generator 
generates invalid ranges)

[rmuir] make tests deterministic

[uschindler] Enable unchecked warnings: we have now only some violations in 
contrib and the recently introduced ones by Ryan. As we want to get rid of them 
(the contrib ones seem to be easy), I switch it globally on.

[rmuir] LUCENE-2507: Add automaton spellchecker

[mikemccand] LUCENE-2678: prevent false failure due to fast GC

[uschindler] for clover runs we can disable the test repetitions, its simply 
too slow. For clover the number of runs is simply useless, as the code coverage 
(should) not really change.
Its better to have more loops in the main tests when clover runs fine.

[mikemccand] LUCENE-2676: disable reader pooling for this test case since that 
causes added disk usage

--
[...truncated 4809 lines...]
[javac] return new FieldsQuery(q, fieldNames, fieldOperator);
[javac]   ^
[javac] 
/usr:67:
 warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: 
java.util.List
[javac] return new OrQuery(queries, infix, orToken.image);
[javac]^
[javac] 
/usr:71:
 warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: 
java.util.List
[javac] return new AndQuery( queries, infix, andToken.image);
[javac]  ^
[javac] 
/usr:75:
 warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: 
java.util.List
[javac] return new NotQuery( queries, notToken.image);
[javac]  ^
[javac] 
/usr:98:
 warning: [unchecked] unchecked conversion
[javac] found   : java.util.List
[javac] required: 
java.util.List
[javac] DistanceQuery dq = new DistanceQuery(queries,
[javac]  ^
[javac] 
/usr:170:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac]   fieldNames.add(fieldName.image);
[javac] ^
[javac] 
/usr:195:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac] queries.add(q);
[javac]^
[javac] 
/usr:198:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac]   queries.add(q);
[javac]  ^
[javac] 
/usr:223:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac] queries.add(q);
[javac]^
[javac] 
/usr:226:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac]   queries.add(q);
[javac]  ^
[javac] 
/usr:251:
 warning: [unchecked] unchecked call to add(E) as a member of the raw type 
java.util.ArrayList
[javac] queries.add(q);
[javac]^
[javac] 
/usr:254:
 warning: [unchecked] unchecked call to add(E) as a member of the ra

[jira] Resolved: (SOLR-2135) ConcurrentLRUCache fails if getLatestAccessedItems(0) called

2010-10-01 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2135.


 Assignee: Hoss Man
Fix Version/s: 3.1
   4.0
   Resolution: Fixed

Thanks for the patch, David (especially for the test).

trunk: Committed revision 1003703.

3x: Committed revision 1003707.


> ConcurrentLRUCache fails if getLatestAccessedItems(0) called
> 
>
> Key: SOLR-2135
> URL: https://issues.apache.org/jira/browse/SOLR-2135
> Project: Solr
>  Issue Type: Bug
>Reporter: David Smiley
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2135.patch
>
>
> I'll add a patch which adds a test which demonstrates this.
> ERROR  13:26:39 [o.a.s.core.SolrCore] - java.util.NoSuchElementException
>   at java.util.TreeMap.key(TreeMap.java:1206)
>   at java.util.TreeMap.lastKey(TreeMap.java:274)
>   at java.util.TreeSet.last(TreeSet.java:384)
>   at 
> org.apache.solr.common.util.ConcurrentLRUCache.getLatestAccessedItems(ConcurrentLRUCache.java:437)
>   at org.apache.solr.search.FastLRUCache.warm(FastLRUCache.java:158)
>   at 
> org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1490)
>   at org.apache.solr.core.SolrCore$2.call(SolrCore.java:1127)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:637)
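The stack trace bottoms out in TreeSet.last(), which throws NoSuchElementException on an empty set. A minimal sketch of the failure mode and the obvious guard (illustrative plain Java, not Solr's ConcurrentLRUCache; names below are invented):

```java
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class EmptyLastSketch {

    // Guarded variant: never calls last() on an empty set, so asking for
    // zero latest-accessed items cannot throw.
    static Integer lastOrNull(TreeSet<Integer> items) {
        return items.isEmpty() ? null : items.last();
    }

    public static void main(String[] args) {
        TreeSet<Integer> empty = new TreeSet<>();
        try {
            empty.last();  // the call that blows up inside the cache
        } catch (NoSuchElementException expected) {
            System.out.println("last() on an empty TreeSet throws, as in the trace");
        }
        System.out.println(lastOrNull(empty));  // null: the guarded path
    }
}
```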




[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917076#action_12917076
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

OK, I had been stuck on (and excited about) not having to use or understand the 
remap-docids method, because it's hard to debug.  However, I see what you're 
saying, and why remap-docids exists.  I'll push the DWPT buffered deletes to the 
flushed deletes.

bq. we'll pay huge cost opening that massive grandaddy segment 

This large cost is from loading the terms index and deleted docs?  When those 
large segments are merged, though, the IO cost is so substantial that loading 
tii or del into RAM probably doesn't account for much of the aggregate IO; 
it's probably in the noise?  Or are you referring to the NRT apply-deletes 
flush?  However, that is on a presumably pooled reader.  Or are you just saying 
that today we apply deletes across the board to all segments prior to a 
merge, regardless of whether or not they're even involved in the merge?  It 
seems like that is changeable?

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Created: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-10-01 Thread Michael McCandless (JIRA)
Improve how IndexWriter flushes deletes against existing segments
-

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0


IndexWriter buffers up all deletes (by Term and Query) and only
applies them when 1) commit or NRT getReader() is called, or 2) a merge
is about to kick off.

We do this because, for a large index, it's very costly to open a
SegmentReader for every segment in the index.  So we defer as long as
we can.  We do it just before merge so that the merge can eliminate
the deleted docs.

But, most merges are small, yet in a big index we apply deletes to all
of the segments, which is really very wasteful.

Instead, we should only apply the buffered deletes to the segments
that are about to be merged, and keep the buffer around for the
remaining segments.

I think it's not so hard to do; we'd have to have generations of
pending deletions, because the newly merged segment doesn't need the
same buffered deletions applied again.  So every time a merge kicks
off, we pinch off the current set of buffered deletions, open a new
set (the next generation), and record which segment was created as of
which generation.

This should be a very sizable gain for large indices that mix in
deletes, though less so in flex, since opening the terms index is much
faster.
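The generation scheme described above can be sketched in plain Java (a hedged illustration; the class, method names, and the term-set representation are invented, not Lucene's internals):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteGenerations {
    // One frozen ("pinched off") buffer of deleted terms, tagged with the
    // generation at which it was frozen.
    private static final class Frozen {
        final long gen;
        final Set<String> terms;
        Frozen(long gen, Set<String> terms) { this.gen = gen; this.terms = terms; }
    }

    private final List<Frozen> frozen = new ArrayList<>();
    private Set<String> current = new HashSet<>();
    private long generation = 0;

    public void bufferDelete(String term) { current.add(term); }

    // Merge kickoff: freeze the current buffer and start the next generation.
    // The newly merged segment records the returned generation, so the deletes
    // frozen just now (an older generation) are never re-applied to it.
    public long onMergeKickoff() {
        frozen.add(new Frozen(generation, new HashSet<>(current)));
        current = new HashSet<>();
        return ++generation;
    }

    // All buffered deletes that still apply to a segment created at segmentGen.
    public Set<String> pendingFor(long segmentGen) {
        Set<String> result = new HashSet<>(current);
        for (Frozen f : frozen)
            if (f.gen >= segmentGen) result.addAll(f.terms);
        return result;
    }
}
```

A segment created before any merge (generation 0) still sees every buffered delete, while a freshly merged segment only sees deletes buffered after its merge kicked off.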





[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917055#action_12917055
 ] 

Michael McCandless commented on LUCENE-2655:


bq. We've implied an additional change to the way deletes are flushed: today 
they're flushed in applyDeletes when segments are merged, whereas with 
flush-DWPT we're applying deletes after flushing the DWPT segment.

Why is that change needed again?  Deferring until merge kickoff is a win; e.g. 
for a non-N/R/T (heh) app, it means 1/10th the reader open cost (w/ default 
mergeFactor=10).  Opening/closing readers can be costly for a large index.

Really, some day, we ought to only apply deletes to those segments about to be 
merged (and keep the buffer for the rest of the segments). Eg most merges are 
small... yet we'll pay huge cost opening that massive grandaddy segment every 
time these small merges kick off.  But that's another issue...

Why can't we just do what we do today?  Ie push the DWPT buffered deletes into 
the flushed deletes?

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917046#action_12917046
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

We've implied an additional change to the way deletes are flushed: today 
they're flushed in applyDeletes when segments are merged, whereas with 
flush-DWPT we're applying deletes after flushing the DWPT segment.

Also we'll have a globalesque buffered-deletes structure, presumably located in 
IW, that buffers deletes for the existing segments, and these should [as today] 
be applied only when segments are merged or getReader is called?

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Resolved: (LUCENE-2507) automaton spellchecker

2010-10-01 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2507.
-

Resolution: Fixed

Committed revision 1003642.

> automaton spellchecker
> --
>
> Key: LUCENE-2507
> URL: https://issues.apache.org/jira/browse/LUCENE-2507
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/spellchecker
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch, 
> LUCENE-2507.patch, LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries 
> this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an 
> algorithm such as Levenshtein.
> Alternatively, we could just do a levenshtein query directly against the 
> index, then we wouldn't need
> a separate index to rebuild.
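The re-ranking measure involved is classic Levenshtein edit distance, sketched below as the textbook two-row dynamic program. (Lucene's automaton approach instead compiles the distance constraint into a DFA and intersects it with the terms index, avoiding a per-term computation; this function only illustrates what is being measured.)

```java
// Levenshtein edit distance: minimum number of single-character insertions,
// deletions, and substitutions turning string a into string b.
public class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // edits from ""
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,  // insertion
                                           prev[j] + 1),    // deletion
                                  prev[j - 1] + cost);      // substitution
            }
            int[] tmp = prev; prev = cur; cur = tmp;        // roll rows
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```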




RE: Hudson build is back to normal : Lucene-3.x #131

2010-10-01 Thread Uwe Schindler
The hudson build on the FreeBSD jail seems to work now after installing a 
recent openjdk6 (which works on FreeBSD).

But as no JDK5 is available on FreeBSD without the linux-compatibility package, 
I also pointed the latest1.5 symlink at this openjdk (so the build will 
compile), but we cannot test real 1.5 builds. I asked infra about enabling the 
linuxprocfs, so that we can use Sun JDK 1.5 (which works only with the linux 
compatibility layer *and* procfs).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
> Sent: Friday, October 01, 2010 10:26 PM
> To: dev@lucene.apache.org
> Subject: Hudson build is back to normal : Lucene-3.x #131
> 
> See 
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org






[jira] Updated: (LUCENE-2676) TestIndexWriter failes for SimpleTextCodec

2010-10-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2676:
---

Fix Version/s: 3.1

> TestIndexWriter failes for SimpleTextCodec
> --
>
> Key: LUCENE-2676
> URL: https://issues.apache.org/jira/browse/LUCENE-2676
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Index
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> I just ran into this failure, since SimpleText obviously takes a lot of disk 
> space.
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testCommitOnCloseDiskUsage(org.apache.lucene.index.TestIndexWriter):FAILED
> [junit] writer used too much space while adding documents: mid=608162 
> start=5293 end=634214
> [junit] junit.framework.AssertionFailedError: writer used too much space 
> while adding documents: mid=608162 start=5293 end=634214
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.index.TestIndexWriter.testCommitOnCloseDiskUsage(TestIndexWriter.java:1047)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 3.281 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testCommitOnCloseDiskUsage 
> -Dtests.seed=-7526585723238322940:-1609544650150801239
> [junit] NOTE: test params are: codec=SimpleText, locale=th_TH, 
> timezone=UCT
> [junit] -  ---
> [junit] Test org.apache.lucene.index.TestIndexWriter FAILED
> {noformat}
> I did not look into SimpleText but I guess we need either change the 
> threshold for this test or exclude SimpleText from it.
> any ideas?




Hudson build is back to normal : Lucene-3.x #131

2010-10-01 Thread Apache Hudson Server
See 






[jira] Resolved: (LUCENE-2678) TestCachingSpanFilter sometimes fails

2010-10-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2678.


Fix Version/s: 3.1
   Resolution: Fixed

> TestCachingSpanFilter sometimes fails
> -
>
> Key: LUCENE-2678
> URL: https://issues.apache.org/jira/browse/LUCENE-2678
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> if I run 
> {noformat} 
> ant test -Dtestcase=TestCachingSpanFilter -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146 -Dtests.iter=100
> {noformat} 
> I get two failures on my machine against current trunk
> {noformat} 
> junit-sequential:
> [junit] Testsuite: org.apache.lucene.search.TestCachingSpanFilter
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Tests run: 100, Failures: 2, Errors: 0, Time elapsed: 2.297 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: test params are: 
> codec=MockVariableIntBlock(baseBlockSize=43), locale=fr, 
> timezone=Africa/Bangui
> [junit] -  ---
> [junit] Test org.apache.lucene.search.TestCachingSpanFilter FAILED
> {noformat}
> not sure what it is but it seems likely to be a WeakRef / GC issue in the 
> cache. 




[jira] Commented: (LUCENE-2671) Add sort missing first/last ability to SortField and ValueComparator

2010-10-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917023#action_12917023
 ] 

Uwe Schindler commented on LUCENE-2671:
---

Additionally, the source code formatting is not Lucene-conformant (there should 
be no newline before {, and no extra spaces around method parameters).

> Add sort missing first/last ability to SortField and ValueComparator
> 
>
> Key: LUCENE-2671
> URL: https://issues.apache.org/jira/browse/LUCENE-2671
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 4.0
>
> Attachments: LUCENE-2671-SortMissingLast.patch
>
>
> When SortField and ValueComparator use EntryCreators (from LUCENE-2649) they 
> use a special sort value when the field is missing.
> This enables lucene to implement 'sort missing last' or 'sort missing first' 
> for numeric values from the FieldCache.




[jira] Reopened: (LUCENE-2671) Add sort missing first/last ability to SortField and ValueComparator

2010-10-01 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-2671:
---


Hi Ryan,

this patch causes tons of unchecked warnings; without it, Lucene compiles 
without any.

The generics policeman does not understand this code, so he cannot fix it 
himself:

{noformat}
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:209:
 warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: T
[javac] key.creator.validate( (T)value, reader);
[javac]  ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:278:
 warning: [unchecked] unchecked call to 
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator) as a 
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (ByteValues)caches.get(Byte.TYPE).get(reader, new 
Entry(field, creator));
[javac]  ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:278:
 warning: [unchecked] unchecked call to 
get(org.apache.lucene.index.IndexReader,org.apache.lucene.search.FieldCacheImpl.Entry)
 as a member of the raw type org.apache.lucene.search.FieldCacheImpl.Cache
[javac] return (ByteValues)caches.get(Byte.TYPE).get(reader, new 
Entry(field, creator));
[javac] ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:293:
 warning: [unchecked] unchecked call to 
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator) as a 
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (ShortValues)caches.get(Short.TYPE).get(reader, new 
Entry(field, creator));
[javac]^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:293:
 warning: [unchecked] unchecked call to 
get(org.apache.lucene.index.IndexReader,org.apache.lucene.search.FieldCacheImpl.Entry)
 as a member of the raw type org.apache.lucene.search.FieldCacheImpl.Cache
[javac] return (ShortValues)caches.get(Short.TYPE).get(reader, new 
Entry(field, creator));
[javac]   ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:308:
 warning: [unchecked] unchecked call to 
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator) as a 
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (IntValues)caches.get(Integer.TYPE).get(reader, new 
Entry(field, creator));
[javac]^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:308:
 warning: [unchecked] unchecked call to 
get(org.apache.lucene.index.IndexReader,org.apache.lucene.search.FieldCacheImpl.Entry)
 as a member of the raw type org.apache.lucene.search.FieldCacheImpl.Cache
[javac] return (IntValues)caches.get(Integer.TYPE).get(reader, new 
Entry(field, creator));
[javac]   ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:323:
 warning: [unchecked] unchecked call to 
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator) as a 
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (FloatValues)caches.get(Float.TYPE).get(reader, new 
Entry(field, creator));
[javac]^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:323:
 warning: [unchecked] unchecked call to 
get(org.apache.lucene.index.IndexReader,org.apache.lucene.search.FieldCacheImpl.Entry)
 as a member of the raw type org.apache.lucene.search.FieldCacheImpl.Cache
[javac] return (FloatValues)caches.get(Float.TYPE).get(reader, new 
Entry(field, creator));
[javac]   ^
[javac] C:\Users\Uwe 
Schindler\Projects\lucene\trunk-solr\lucene\src\java\org\apache\lucene\search\FieldCacheImpl.java:337:
 warning: [unchecked] unchecked call to 
Entry(java.lang.String,org.apache.lucene.search.cache.EntryCreator) as a 
member of the raw type org.apache.lucene.search.FieldCacheImpl.Entry
[javac] return (LongValues)caches.ge
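For illustration only (a hypothetical `Entry`, not Lucene's FieldCacheImpl): the warnings above are the standard result of calling a generic constructor through its raw type, which prevents javac from verifying the type arguments. Supplying the type parameters restores the check:

```java
// Minimal sketch of raw vs. parameterized use of a generic class.
public class RawVsGeneric {

    static class Entry<T> {
        final String field;
        final T creator;
        Entry(String field, T creator) {
            this.field = field;
            this.creator = creator;
        }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static Entry rawEntry(String f, Object c) {
        // Raw use: without the suppression, javac emits
        // "[unchecked] unchecked call to Entry(...)" here.
        return new Entry(f, c);
    }

    static <T> Entry<T> typedEntry(String f, T c) {
        // Parameterized use: compiles warning-free, cast is checked.
        return new Entry<>(f, c);
    }

    public static void main(String[] args) {
        Entry<Integer> e = typedEntry("field", 42);
        System.out.println(e.field + "=" + e.creator); // prints field=42
    }
}
```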

[jira] Commented: (LUCENE-2678) TestCachingSpanFilter sometimes fails

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917019#action_12917019
 ] 

Michael McCandless commented on LUCENE-2678:


We hit this same issue in TestCachingWrapperFilter... I forgot to fix 
TestCachingSpanFilter too. Will fix shortly.

> TestCachingSpanFilter sometimes fails
> -
>
> Key: LUCENE-2678
> URL: https://issues.apache.org/jira/browse/LUCENE-2678
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
>
> if I run 
> {noformat} 
> ant test -Dtestcase=TestCachingSpanFilter -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146 -Dtests.iter=100
> {noformat} 
> I get two failures on my machine against current trunk
> {noformat} 
> junit-sequential:
> [junit] Testsuite: org.apache.lucene.search.TestCachingSpanFilter
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Tests run: 100, Failures: 2, Errors: 0, Time elapsed: 2.297 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: test params are: 
> codec=MockVariableIntBlock(baseBlockSize=43), locale=fr, 
> timezone=Africa/Bangui
> [junit] -  ---
> [junit] Test org.apache.lucene.search.TestCachingSpanFilter FAILED
> {noformat}
> not sure what it is but it seems likely to be a WeakRef / GC issue in the 
> cache. 




[jira] Assigned: (LUCENE-2678) TestCachingSpanFilter sometimes fails

2010-10-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2678:
--

Assignee: Michael McCandless

> TestCachingSpanFilter sometimes fails
> -
>
> Key: LUCENE-2678
> URL: https://issues.apache.org/jira/browse/LUCENE-2678
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Search
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
>
> if I run 
> {noformat} 
> ant test -Dtestcase=TestCachingSpanFilter -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146 -Dtests.iter=100
> {noformat} 
> I get two failures on my machine against current trunk
> {noformat} 
> junit-sequential:
> [junit] Testsuite: org.apache.lucene.search.TestCachingSpanFilter
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Testcase: 
> testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter):   FAILED
> [junit] expected:<2> but was:<3>
> [junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
> [junit] 
> [junit] 
> [junit] Tests run: 100, Failures: 2, Errors: 0, Time elapsed: 2.297 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
> -Dtestmethod=testEnforceDeletions 
> -Dtests.seed=5015158121350221714:-3342860915127740146
> [junit] NOTE: test params are: 
> codec=MockVariableIntBlock(baseBlockSize=43), locale=fr, 
> timezone=Africa/Bangui
> [junit] -  ---
> [junit] Test org.apache.lucene.search.TestCachingSpanFilter FAILED
> {noformat}
> not sure what it is but it seems likely to be a WeakRef / GC issue in the 
> cache. 




[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917018#action_12917018
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

bq. go back to how trunk buffers deletes (maps to docID), but per-DWPT.

I don't think we need seq ids for the flush-by-DWPT merge to trunk.  I was 
getting confused about the docid being a start-from rather than a stop-at.  
I'll implement a map of query/term -> docid-upto per DWPT.
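Roughly, that per-DWPT map might look like the sketch below (names are hypothetical; trunk's BufferedDeletes is the real analogue):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-DWPT buffered-deletes map: each delete term maps to
// "docid-upto", the number of docs this DWPT had buffered when the delete
// arrived.  On flush, the delete applies only to buffered docs with
// docID < upto, which preserves interleaving of adds and deletes.
public class DWPTBufferedDeletes {

    private final Map<String, Integer> termToDocIdUpto = new HashMap<>();
    private int numBufferedDocs = 0;

    int addDocument() {
        return numBufferedDocs++; // returns the new doc's in-DWPT docID
    }

    void deleteTerm(String term) {
        // A later delete of the same term widens its reach.
        termToDocIdUpto.put(term, numBufferedDocs);
    }

    // Consulted at flush time for each buffered doc matching 'term'.
    boolean isDeleted(String term, int docId) {
        Integer upto = termToDocIdUpto.get(term);
        return upto != null && docId < upto;
    }

    public static void main(String[] args) {
        DWPTBufferedDeletes dwpt = new DWPTBufferedDeletes();
        int doc0 = dwpt.addDocument();       // doc 0 matches "id:7"
        dwpt.deleteTerm("id:7");             // upto = 1
        int doc1 = dwpt.addDocument();       // doc 1 also matches "id:7"
        System.out.println(dwpt.isDeleted("id:7", doc0)); // true
        System.out.println(dwpt.isDeleted("id:7", doc1)); // false
    }
}
```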

bq. Basically, the DWPT change, alone, is hugely valuable and (I think) 
decouple-able from the trickier/riskier/RAM-consuminger RT changes.

Yes indeed!

bq. I'll try to use the right name

NRT -> RT.  The next one will need to be either R or T; shall we decide now 
or later?




> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




Build failed in Hudson: Lucene-trunk #1309

2010-10-01 Thread Apache Hudson Server
See 

--
[...truncated 13443 lines...]
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.index...
  [javadoc] Loading source files for package 
org.apache.lucene.index.codecs.appending...
  [javadoc] Loading source files for package org.apache.lucene.misc...
  [javadoc] Loading source files for package org.apache.lucene.store...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/usr
  [javadoc] Note: Custom tags that were not seen:  @lucene.internal
  [jar] Building jar: 
/usr
 [echo] Building queries...

javadocs:
[mkdir] Created dir: 
/usr
  [javadoc] Generating Javadoc
  [javadoc] Javadoc execution
  [javadoc] Loading source files for package org.apache.lucene.search...
  [javadoc] Loading source files for package org.apache.lucene.search.regex...
  [javadoc] Loading source files for package org.apache.lucene.search.similar...
  [javadoc] Constructing Javadoc information...
  [javadoc] Standard Doclet version 1.6.0
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
/usr:35:
 warning - Tag @link: can't find prefix in 
org.apache.lucene.search.regex.JakartaRegexpCapabilities
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:33:
 warning - Tag @link: can't find prefix in 
org.apache.lucene.search.regex.JavaUtilRegexCapabilities
  [javadoc] 
/usr:33:
 warning - Tag @link: can't find match in 
org.apache.lucene.search.regex.JavaUtilRegexCapabilities
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] 
/usr:44:
 warning - @param argument "string" is not a parameter name.
  [javadoc] 
/usr:34:
 warning - Tag @see: reference not found: RegexTermEnum
  [javadoc] 
/usr:526:
 warning - Tag @see: reference not found: 
org.apache.lucene.analysis.StopFilter#makeStopSet StopFilter.makeStopSet()
  [javadoc] 
/usr:36:
 warning - Tag @link: reference not found: RegexTermEnum
  [javadoc] Building index for all the packages and classes...
  [javadoc] 
/usr:36:
 warning 

[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917004#action_12917004
 ] 

Michael McCandless commented on LUCENE-2655:


bq. I'm guessing you mean RT?

Duh sorry yes... I'll try to use the right name :)

bq. In the current RT revision, the deletes are held in one map in DW, guess we 
need to change that.

Right, one map that maps the del term to the long sequence ID, right?  I'm 
thinking we revert just this part and go back to how trunk buffers deletes 
(maps to docID), but per-DWPT.

bq. However if we do, why do we need to keep the seq id or docid as the value 
in the map? When the delete arrives into the DWPT, we know that any buffered 
docs with that term/query need to be deleted on flush?  (ie, lets not worry 
about the RT search use case, yet). ie2, we can simply add the terms/queries to 
a set, and apply them on flush, ala LUCENE-2679?

For the interleaved case.  Ie, LUCENE-2679 is optional -- we still must handle 
the interleaved case correctly (and, I think, by default).  But if an app uses 
the optimization in LUCENE-2679 then we only need a single Set.

Basically, the DWPT change, alone, is hugely valuable and (I think) 
decouple-able from the trickier/riskier/RAM-consuminger RT changes.

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Resolved: (LUCENE-2676) TestIndexWriter failes for SimpleTextCodec

2010-10-01 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2676.


Resolution: Fixed

> TestIndexWriter failes for SimpleTextCodec
> --
>
> Key: LUCENE-2676
> URL: https://issues.apache.org/jira/browse/LUCENE-2676
> Project: Lucene - Java
>  Issue Type: Test
>  Components: Index
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
>
> I just ran into this failure, since SimpleText obviously takes a lot of disk 
> space.
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testCommitOnCloseDiskUsage(org.apache.lucene.index.TestIndexWriter):FAILED
> [junit] writer used too much space while adding documents: mid=608162 
> start=5293 end=634214
> [junit] junit.framework.AssertionFailedError: writer used too much space 
> while adding documents: mid=608162 start=5293 end=634214
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
> [junit]   at 
> org.apache.lucene.index.TestIndexWriter.testCommitOnCloseDiskUsage(TestIndexWriter.java:1047)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 3.281 sec
> [junit] 
> [junit] - Standard Output ---
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testCommitOnCloseDiskUsage 
> -Dtests.seed=-7526585723238322940:-1609544650150801239
> [junit] NOTE: test params are: codec=SimpleText, locale=th_TH, 
> timezone=UCT
> [junit] -  ---
> [junit] Test org.apache.lucene.index.TestIndexWriter FAILED
> {noformat}
> I did not look into SimpleText but I guess we need either change the 
> threshold for this test or exclude SimpleText from it.
> any ideas?




[jira] Commented: (LUCENE-2668) offset gap should be added regardless of existence of tokens in DocInverterPerField

2010-10-01 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916997#action_12916997
 ] 

David Smiley commented on LUCENE-2668:
--

Apologies; I meant to post to LUCENE-2529

> offset gap should be added regardless of existence of tokens in 
> DocInverterPerField
> ---
>
> Key: LUCENE-2668
> URL: https://issues.apache.org/jira/browse/LUCENE-2668
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Index
>Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2668.patch, LUCENE-2668.patch, LUCENE-2668.patch, 
> Test.java
>
>
> Problem: if a multiValued field has a value consisting only of a stop word 
> (e.g. "will" in the sample below) and is analyzed by StopAnalyzer at 
> indexing time, the offsets of the subsequent tokens are incorrect.
> {code:title=indexing a multiValued field}
> doc.add( new Field( F, "Mike", Store.YES, Index.ANALYZED, 
> TermVector.WITH_OFFSETS ) );
> doc.add( new Field( F, "will", Store.YES, Index.ANALYZED, 
> TermVector.WITH_OFFSETS ) );
> doc.add( new Field( F, "use", Store.YES, Index.ANALYZED, 
> TermVector.WITH_OFFSETS ) );
> doc.add( new Field( F, "Lucene", Store.YES, Index.ANALYZED, 
> TermVector.WITH_OFFSETS ) );
> {code}
> In this program (soon to be attached), WhitespaceAnalyzer yields the 
> offsets (start,end) use(10,13) and Lucene(14,20), but StopAnalyzer yields 
> use(9,12) and lucene(13,19). Since the searcher cannot know which analyzer 
> was used at indexing time, this mismatch throws FVH out of alignment.
> Cause of the problem: StopAnalyzer filters out "will", so the anyToken flag 
> is set to false and the offset gap is not added in DocInverterPerField:
> {code:title=DocInverterPerField.java}
> if (anyToken)
>   fieldState.offset += docState.analyzer.getOffsetGap(field);
> {code}
> I don't understand why the condition is there... If the gap is always 
> added, things stay simple.
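The effect of that conditional can be sketched outside Lucene. The simulation below is hypothetical (it is not the actual DocInverterPerField code; it assumes an offset gap of 1 between field values and hard-codes "will" as the stopped value), but it reproduces the offsets reported above for both behaviors:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetGapDemo {
    static final int OFFSET_GAP = 1; // assumed gap between multiValued values

    /** Start offsets of the tokens that survive analysis, one value at a
     *  time.  If alwaysAddGap is false, the offset gap is skipped for values
     *  whose tokens were all filtered out -- the buggy behavior. */
    static List<Integer> startOffsets(String[] values, boolean alwaysAddGap) {
        List<Integer> starts = new ArrayList<>();
        int offset = 0;
        for (String v : values) {
            boolean anyToken = !v.equals("will"); // stand-in stop filter
            if (anyToken) {
                starts.add(offset); // the surviving token starts here
            }
            offset += v.length();   // the tokenizer consumed the value's chars
            if (alwaysAddGap || anyToken) {
                offset += OFFSET_GAP; // the fix: add the gap unconditionally
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String[] values = {"Mike", "will", "use", "Lucene"};
        System.out.println(startOffsets(values, true));  // [0, 10, 14]
        System.out.println(startOffsets(values, false)); // [0, 9, 13] -- shifted
    }
}
```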




Re: PatternAnalyzer not implemented?

2010-10-01 Thread Andi Vajda


On Fri, 1 Oct 2010, Roman Chyla wrote:


I tried to use the PatternAnalyzer, but am getting NotImplementedError
- in case it is not available, shall I rather use PythonAnalyzer and
implement the regex pattern analyzer with that?

using version: 2.9.3

In [44]: import lucene
In [45]: import pyjama #<-- this package contains java.util.regex.Pattern
In [46]: p = pyjama.Pattern.compile("\\s")
In [47]: p
Out[47]: 
In [48]: import lucene.collections as col
In [49]: s = col.JavaSet([])
In [50]: s
Out[50]: 
In [51]: pa = lucene.PatternAnalyzer(p,True,s)
---
NotImplementedError   Traceback (most recent call last)

/Users/rca/ in ()

NotImplementedError: ('instantiating java class', )


This is because no constructors were generated for PatternAnalyzer. That in 
turn is because the java.util.regex package is missing from the JCC command 
line in PyLucene's Makefile, causing methods and constructors using classes 
in that package to be skipped.


To fix this, add
    --package java.util.regex \
around line 214 of PyLucene's Makefile.

It is also strongly recommended that you rebuild pyjama with --import lucene
on the JCC command line, so that JCC doesn't generate wrappers again for
classes that are shared between pyjama and lucene.


Andi..


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916988#action_12916988
 ] 

Michael McCandless commented on LUCENE-2662:


I instrumented trunk & the patch to see how many times we do new 
byte[bufferSize] while building a 5M-doc index, and they both allocate the 
same number of byte[] from the BBA.  So I don't think we have a memory 
issue...

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186




[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916979#action_12916979
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

{quote}when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab 
its current docID, and enroll Term/Query -> docID into that DWPT's pending 
deletes map.{quote}

Ok, that's the change you're referring to.  In the current RT revision the 
deletes are held in one map in DW; I guess we need to change that.  However, 
if we do, why do we need to keep the seq id or docid as the value in the map? 
When the delete arrives at the DWPT, we know that any buffered docs with that 
term/query need to be deleted on flush (i.e., let's *not* worry about the RT 
search use case yet).  In other words, can we simply add the terms/queries to 
a set and apply them on flush, ala LUCENE-2679?

bq. NRT improvements

We're referring to LUCENE-1516 as NRT and LUCENE-2312 as 'RT'. I'm guessing you 
mean RT? 

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916965#action_12916965
 ] 

Michael McCandless commented on LUCENE-2662:


I also ran a test w/ 5 threads -- they are close (22,402 docs/sec for the 
patch, 22,868 docs/sec for trunk), and this time avgUsedMem is closer (811 MB 
for trunk, 965 MB for the patch).

I don't think avgUsedMem is that meaningful -- it takes the max of 
Runtime.totalMemory() - Runtime.freeMemory() (which I think includes garbage) 
after each completed task, and then averages across all tasks.  In my case I 
think it's averaging one measure per thread, so it's really sort of measuring 
how much garbage there happened to be at the time.
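A minimal sketch of the measurement being described (sampling used heap, garbage included, once per completed task, then averaging; the task body here is a stand-in allocation, not the benchmark's actual work):

```java
public class UsedMemSample {
    /** Used heap right now -- includes objects that are garbage but
     *  not yet collected, which is why the metric is noisy. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long total = 0;
        int tasks = 4;
        for (int i = 0; i < tasks; i++) {
            byte[] work = new byte[1 << 20]; // stand-in for a task's allocations
            work[0] = 1;
            total += usedBytes();            // sample after the task completes
        }
        System.out.println("avgUsedMem=" + (total / tasks) + " bytes");
    }
}
```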

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186




Build failed in Hudson: Lucene-trunk #1308

2010-10-01 Thread Apache Hudson Server
See 

--
[...truncated 2884 lines...]
AU
analysis/common/src/java/org/apache/lucene/analysis/in/IndicNormalizer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/in/package.html
A analysis/common/src/java/org/apache/lucene/analysis/wikipedia
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.java
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.jflex
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/package.html
A analysis/common/src/java/org/apache/lucene/analysis/cjk
AU
analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKAnalyzer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/cjk/package.html
A analysis/common/src/java/org/apache/lucene/analysis/es
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/es/package.html
A analysis/common/src/java/org/apache/lucene/analysis/eu
AU
analysis/common/src/java/org/apache/lucene/analysis/eu/BasqueAnalyzer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/eu/package.html
A analysis/common/src/java/org/apache/lucene/analysis/it
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/it/package.html
A analysis/common/src/java/org/apache/lucene/analysis/cz
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/cz/package.html
A analysis/common/src/java/org/apache/lucene/analysis/synonym
AU
analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java
A analysis/common/src/java/org/apache/lucene/analysis/util
AU
analysis/common/src/java/org/apache/lucene/analysis/util/StopwordAnalyzerBase.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/ReusableAnalyzerBase.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/CharArraySet.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/CharArrayMap.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/StemmerUtil.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/WordlistLoader.java
A analysis/common/src/java/org/apache/lucene/collation
AU
analysis/common/src/java/org/apache/lucene/collation/CollationKeyAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/collation/CollationKeyFilter.java
AUanalysis/common/src/java/org/apache/lucene/collation/package.html
A analysis/common/src/java/org/tartarus
A analysis/common/src/java/org/tartarus/snowball
AUanalysis/common/src/java/org/tartarus/snowball/TestApp.java
A analysis/common/src/java/org/tartarus/snowball/ext
AU
analysis/common/src/java/org/tartarus/snowball/ext/PortugueseStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/CatalanStemmer.java
AU
analysis/common/src/java/org/tartarus/snowball/ext/RomanianStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/SpanishStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/FrenchStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/SwedishStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/DanishStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/DutchStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/GermanStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/KpStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/LovinsStemmer.java
AUanalysis/common/src/java/org/tartarus/snowball/ext/PorterStemmer.java
AU
analysis/common/src/java/org/tartarus/snowball/ext/HungarianStemmer.java
AUanalysis/

Build failed in Hudson: Lucene-trunk #1307

2010-10-01 Thread Apache Hudson Server
See 

--
[...truncated 2862 lines...]

Build failed in Hudson: Lucene-trunk #1306

2010-10-01 Thread Apache Hudson Server
See 

--
[...truncated 3586 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 46.691 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScoreCachingWrappingScorer
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.224 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestScorerPerf
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.22 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSetNorm
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimilarity
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSimpleExplanations
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 2.027 sec
[junit] 
[junit] Testsuite: 
org.apache.lucene.search.TestSimpleExplanationsOfNonMatches
[junit] Tests run: 53, Failures: 0, Errors: 0, Time elapsed: 0.098 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSloppyPhraseQuery
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.45 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSort
[junit] Tests run: 27, Failures: 0, Errors: 0, Time elapsed: 3.593 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSpanQueryFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.019 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestSubScorerFreqs
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.021 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 5.559 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermRangeQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.046 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermScorer
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.01 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTermVectors
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.34 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestThreadSafe
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.603 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.525 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.015 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.017 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.TestWildcardRandom
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.479 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.cache.TestEntryCreators
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.314 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 8.123 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.029 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.297 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.085 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.function.TestValueSource
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.017 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.646 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.646 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestBasics
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 3.807 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.408 sec
[junit] 
[junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOr

Build failed in Hudson: Lucene-trunk #1305

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[uschindler] prepare hudson on new machine, maven support is included in ant 
installation, so dir not needed (TODO: remove)

--
[...truncated 2861 lines...]

[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916946#action_12916946
 ] 

Michael McCandless commented on LUCENE-2655:


bq.  Are you saying we should simply make deletes work (as is, no BytesRefHash 
conversion) then cleanup the RT branch as a merge to trunk of the DWPT changes?

Yes!  Just keep using the Map we now use, but now it must be per-DWPT.  And 
when IW.deleteDocs(Term/Query) is called, we must go to each DWPT, grab its 
current docID, and enroll Term/Query -> docID into that DWPT's pending deletes 
map.

bq. Also, from what I've seen, deletes seem to work, I'm not sure what exactly 
Michael is referring to.

But this is using the long sequenceID right?  (adds 8 bytes per buffered 
docID).  I think we wanted to explore ways to reduce that?  Or, if we can make 
it modal (you only spend these 8 bytes if IW is open in NRT mode), then that's 
OK?

I was hoping to cleanly separate the DWPT cutover (solves the nasty flush 
bottleneck) from the NRT improvements.
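A hypothetical sketch of that bookkeeping (the names are illustrative, not the RT branch's actual classes): each DWPT keeps its own pending-deletes map, and a buffered delete records the DWPT's current docID so it applies only to docs buffered before it:

```java
import java.util.HashMap;
import java.util.Map;

public class PerThreadDeletesSketch {
    /** Stand-in for a DocumentsWriterPerThread with its own deletes map. */
    static class DWPT {
        int nextDocID;  // how many docs this DWPT has buffered so far
        final Map<String, Integer> pendingDeletes = new HashMap<>();

        void addDocument(String term) {
            nextDocID++;  // buffering a doc advances the per-thread docID
        }

        void bufferDelete(String term) {
            // enroll term -> current docID; the delete applies to
            // buffered docIDs strictly below this limit
            pendingDeletes.put(term, nextDocID);
        }

        /** On flush: is this buffered doc covered by a pending delete? */
        boolean isDeleted(String term, int docID) {
            Integer limit = pendingDeletes.get(term);
            return limit != null && docID < limit;
        }
    }

    public static void main(String[] args) {
        DWPT dwpt = new DWPT();
        dwpt.addDocument("id:1");   // docID 0
        dwpt.bufferDelete("id:1");  // deletes docID 0 only
        dwpt.addDocument("id:1");   // docID 1, re-added after the delete
        System.out.println(dwpt.isDeleted("id:1", 0)); // true
        System.out.println(dwpt.isDeleted("id:1", 1)); // false
    }
}
```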

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Created: (LUCENE-2679) IndexWriter.deleteDocuments should have option to not apply to docs indexed in the current IW session

2010-10-01 Thread Michael McCandless (JIRA)
IndexWriter.deleteDocuments should have option to not apply to docs indexed in 
the current IW session
-

 Key: LUCENE-2679
 URL: https://issues.apache.org/jira/browse/LUCENE-2679
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless


In LUCENE-2655 we are struggling with how to handle buffered deletes,
with the new per-thread RAM buffers (DWPT).

But, the only reason why we must maintain a map of del term -> current
docID (or sequence ID) is to correctly handle the interleaved adds &
deletes case.

However, I suspect that for many apps that interleaving never happens.
Ie, most apps delete only docs from *before* the last commit or NRT
reopen.  For such apps, we don't need a Map... we just need a Set of
all del terms to apply to past segments but not to the currently
buffered docs.

And, importantly, with LUCENE-2655, this would be a single Set, not
one per DWPT.  It should be a healthy RAM reduction on buffered
deletes, and should make the deletes call faster (add to one set instead of
N maps).

We of course must still support the interleaved case, and I think it
should be the default, but I think we should provide the option for
the common-case apps to take advantage of much less RAM usage.
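A hypothetical sketch of the proposed option (illustrative names, not a patch): a single shared Set of delete terms, applied on flush only to segments from before the current session, never to the currently buffered docs:

```java
import java.util.HashSet;
import java.util.Set;

public class DeleteSetSketch {
    // one shared set instead of one term -> docID map per DWPT
    private final Set<String> bufferedDeleteTerms = new HashSet<>();

    void deleteDocuments(String term) {
        bufferedDeleteTerms.add(term);  // O(1), no per-DWPT bookkeeping
    }

    /** On flush/commit: these terms are applied to past segments only;
     *  docs buffered in the current session are untouched. */
    Set<String> termsToApplyToPastSegments() {
        return bufferedDeleteTerms;
    }

    public static void main(String[] args) {
        DeleteSetSketch w = new DeleteSetSketch();
        w.deleteDocuments("id:17");
        w.deleteDocuments("id:17");  // duplicates collapse in the set
        System.out.println(w.termsToApplyToPastSegments().size()); // 1
    }
}
```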





[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916933#action_12916933
 ] 

Jason Rutherglen commented on LUCENE-2655:
--

Are you saying we should simply make deletes work (as is, no BytesRefHash 
conversion), then clean up the RT branch as a merge to trunk of the DWPT 
changes?  I was thinking along those lines.  I can spend time making the rest 
of the unit tests work on the existing RT revision, though this should 
probably happen in conjunction with a merge from trunk.

Or simply make the tests pass, and merge RT -> trunk afterwards?

Also, from what I've seen, deletes seem to work; I'm not sure exactly what 
Michael is referring to.  I'll run the full 'suite' of unit tests and make 
each one pass?

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.




[jira] Updated: (LUCENE-2507) automaton spellchecker

2010-10-01 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2507:


Attachment: LUCENE-2507.patch

here's the improved docs and tests.

I'd like to commit this one and we can iterate as discussed, hopefully improve 
both spellcheckers.

> automaton spellchecker
> --
>
> Key: LUCENE-2507
> URL: https://issues.apache.org/jira/browse/LUCENE-2507
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/spellchecker
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch, 
> LUCENE-2507.patch, LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries 
> this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an 
> algorithm such as Levenshtein.
> Alternatively, we could just do a levenshtein query directly against the 
> index, then we wouldn't need
> a separate index to rebuild.


[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-10-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916914#action_12916914
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

bq. I fear a copy-on-write check per-term is going to be a sizable perf hit.

For indexing?  The byte[] buffers are also using a page-based system.  I think 
we'll need to measure the performance difference.  We can always shift the cost 
to getReader by copying from a writable (indexing-based) tf array into a 
per-reader tf of paged-ints.  While this'd be a complete iteration over the 
terms, the CPU cache could make it extremely fast (because each page 
would be cached, and we'd be iterating sequentially over an array, methinks).

The other cost is the lookup of the upto when iterating the postings; however, 
that'd be one time per term-docs instantiation, ie, negligible.  
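The getReader-time copy described above can be sketched roughly as follows. This is a hypothetical illustration, not actual Lucene code: the class and method names, and the fixed page size, are all assumptions.

```java
// A sketch of copying a writable (indexing-side) per-term tf array into an
// immutable page-based snapshot for a reader. The sequential page-by-page
// copy is the part expected to be cache-friendly.
class PagedIntsSnapshot {

    // Copy a writable int array into fixed-size immutable pages.
    static int[][] snapshot(int[] writable, int pageSize) {
        int numPages = (writable.length + pageSize - 1) / pageSize;
        int[][] pages = new int[numPages][];
        for (int p = 0; p < numPages; p++) {
            int start = p * pageSize;
            int len = Math.min(pageSize, writable.length - start);
            pages[p] = new int[len];
            System.arraycopy(writable, start, pages[p], 0, len);
        }
        return pages;
    }

    // Reader-side lookup into the snapshot.
    static int get(int[][] pages, int pageSize, int index) {
        return pages[index / pageSize][index % pageSize];
    }
}
```

The snapshot is immutable once built, so readers never see partial writes from the indexing thread.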



> Concurrent byte and int block implementations
> -
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916913#action_12916913
 ] 

Michael McCandless commented on LUCENE-2662:


OK my 2nd indexing test (10M wikipedia docs, flush @ 256 MB ram used) finished 
and trunk/patch are essentially the same throughput, and, all flushes happened 
at identical points.  So I think we are good to go...

Nice work Simon!

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Commented: (SOLR-792) Tree Faceting Component

2010-10-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916897#action_12916897
 ] 

Yonik Seeley commented on SOLR-792:
---

Hey guys - I was planning on sticking this into my "Lucene Revolution" 
presentation... but I'm not seeing any docs on it.
Could someone take a shot at adding a section on pivot faceting to 
http://wiki.apache.org/solr/SimpleFacetParameters ?

> Tree Faceting Component
> ---
>
> Key: SOLR-792
> URL: https://issues.apache.org/jira/browse/SOLR-792
> Project: Solr
>  Issue Type: New Feature
>Reporter: Erik Hatcher
>Assignee: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
> SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
> SOLR-792.patch, SOLR-792.patch, SOLR-792.patch
>
>
> A component to do multi-level faceting.


[jira] Created: (LUCENE-2678) TestCachingSpanFilter sometimes fails

2010-10-01 Thread Simon Willnauer (JIRA)
TestCachingSpanFilter sometimes fails
-

 Key: LUCENE-2678
 URL: https://issues.apache.org/jira/browse/LUCENE-2678
 Project: Lucene - Java
  Issue Type: Test
  Components: Search
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.0


if I run 
{noformat} 
ant test -Dtestcase=TestCachingSpanFilter -Dtestmethod=testEnforceDeletions 
-Dtests.seed=5015158121350221714:-3342860915127740146 -Dtests.iter=100
{noformat} 

I get two failures on my machine against current trunk
{noformat} 

junit-sequential:
[junit] Testsuite: org.apache.lucene.search.TestCachingSpanFilter
[junit] Testcase: 
testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter): FAILED
[junit] expected:<2> but was:<3>
[junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
[junit] at 
org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
[junit] 
[junit] 
[junit] Testcase: 
testEnforceDeletions(org.apache.lucene.search.TestCachingSpanFilter): FAILED
[junit] expected:<2> but was:<3>
[junit] junit.framework.AssertionFailedError: expected:<2> but was:<3>
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:795)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:768)
[junit] at 
org.apache.lucene.search.TestCachingSpanFilter.testEnforceDeletions(TestCachingSpanFilter.java:101)
[junit] 
[junit] 
[junit] Tests run: 100, Failures: 2, Errors: 0, Time elapsed: 2.297 sec
[junit] 
[junit] - Standard Output ---
[junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
-Dtestmethod=testEnforceDeletions 
-Dtests.seed=5015158121350221714:-3342860915127740146
[junit] NOTE: reproduce with: ant test -Dtestcase=TestCachingSpanFilter 
-Dtestmethod=testEnforceDeletions 
-Dtests.seed=5015158121350221714:-3342860915127740146
[junit] NOTE: test params are: 
codec=MockVariableIntBlock(baseBlockSize=43), locale=fr, timezone=Africa/Bangui
[junit] -  ---
[junit] Test org.apache.lucene.search.TestCachingSpanFilter FAILED
{noformat}

not sure what it is but it seems likely to be a WeakRef / GC issue in the 
cache. 


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916885#action_12916885
 ] 

Simon Willnauer commented on LUCENE-2662:
-

bq. Simon, thank you for renaming the 'utf8' variables here.
YW :)

bq. In RecyclingByteBlockAllocator.recycleByteBlocks, if we cannot recycle all 
of the blocks (ie because it exceeds maxBufferedBlocks), we are failing to null 
out the entries in the incoming array?
Ahh you are right - I will fix. 

bq. Also maybe rename pos -> freeCount? (pos is a little too generic?)
I mean, it's internal, but I see your point.

thanks for reviewing it closely. 

{quote}
The avgUsedMem was quite a bit higher (1.5GB vs 1.0GB), but, I'm not sure this 
stat is trustworthy; I'll re-run w/ infoStream enabled to see if anything 
looks suspicious (eg, we are somehow not tracking bytes used correctly).
{quote}

hmm I will dig once I get back to my workstation.

simon

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Commented: (LUCENE-2507) automaton spellchecker

2010-10-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916883#action_12916883
 ] 

Robert Muir commented on LUCENE-2507:
-

I'll work on cleaning up tests and doc, i think we can then commit this with 
the functionality it has.

bq. It's great that it leverages the absurd speedups we've made to FuzzyQuery 
in 4.0.

Yes, if you read that scary fuzzy paper, it seems that's its original use-case 
all along (we just did FuzzyQuery first, and re-used it here):
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652

Along the same lines, I think we can then later improve both spellcheckers in 
easy ways. For example, it would be good to add a concept of a 
"SpellCheckFilter" that can return true/false depending on whether a word is 
correctly spelled.

Docfreq-based stuff helps, but if you know the language, something like 
hunspell could go a long way here toward preventing either spellchecker from 
trying to correct an already-correctly-spelled word, and from suggesting 
misspellings.
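The "SpellCheckFilter" idea could look roughly like the sketch below. The interface name comes from the comment above; the dictionary-backed implementation and its case-folding behavior are purely illustrative assumptions, not an actual Lucene API.

```java
import java.util.Set;

// A pluggable yes/no check that either spellchecker could consult before
// attempting a correction or emitting a suggestion.
interface SpellCheckFilter {
    boolean isCorrectlySpelled(String word);
}

// Illustrative implementation backed by a word set; in practice this might
// wrap hunspell or a docfreq threshold, as suggested above.
class DictionarySpellCheckFilter implements SpellCheckFilter {
    private final Set<String> knownWords;

    DictionarySpellCheckFilter(Set<String> knownWords) {
        this.knownWords = knownWords;
    }

    @Override
    public boolean isCorrectlySpelled(String word) {
        return knownWords.contains(word.toLowerCase());
    }
}
```

A spellchecker would skip correction when the filter returns true, and drop candidate suggestions for which it returns false.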


> automaton spellchecker
> --
>
> Key: LUCENE-2507
> URL: https://issues.apache.org/jira/browse/LUCENE-2507
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/spellchecker
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch, 
> LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries 
> this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an 
> algorithm such as Levenshtein.
> Alternatively, we could just do a levenshtein query directly against the 
> index, then we wouldn't need
> a separate index to rebuild.


PatternAnalyzer not implemented?

2010-10-01 Thread Roman Chyla
Hello,

I tried to use the PatternAnalyzer, but am getting NotImplementedError
- in case it is not available, shall I rather use PythonAnalyzer and
implement the regex pattern analyzer with that?

using version: 2.9.3

In [44]: import lucene
In [45]: import pyjama #<-- this package contains java.util.regex.Pattern
In [46]: p = pyjama.Pattern.compile("\\s")
In [47]: p
Out[47]: 
In [48]: import lucene.collections as col
In [49]: s = col.JavaSet([])
In [50]: s
Out[50]: 
In [51]: pa = lucene.PatternAnalyzer(p,True,s)
---
NotImplementedError   Traceback (most recent call last)

/Users/rca/ in ()

NotImplementedError: ('instantiating java class', )

In [52]:

Kind regards,

  roman


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916882#action_12916882
 ] 

Robert Muir commented on LUCENE-2662:
-

Simon, thank you for renaming the 'utf8' variables here. 


> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916875#action_12916875
 ] 

Michael McCandless commented on LUCENE-2662:


In RecyclingByteBlockAllocator.recycleByteBlocks, if we cannot recycle all of 
the blocks (ie because it exceeds maxBufferedBlocks), we are failing to null 
out the entries in the incoming array?

Also maybe rename pos -> freeCount?  (pos is a little too generic?)
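The null-out concern above can be sketched as follows. This is a hypothetical simplification of the recycle path, not the actual RecyclingByteBlockAllocator code; the field names (freeBlocks, freeCount, maxBufferedBlocks) are borrowed from the discussion for illustration.

```java
// Sketch: blocks beyond maxBufferedBlocks cannot be buffered for reuse,
// but every slot in the incoming array must still be nulled so the caller
// releases its references and the blocks become garbage-collectable.
class RecyclingAllocatorSketch {
    private final byte[][] freeBlocks;
    private int freeCount;
    private final int maxBufferedBlocks;

    RecyclingAllocatorSketch(int maxBufferedBlocks) {
        this.maxBufferedBlocks = maxBufferedBlocks;
        this.freeBlocks = new byte[maxBufferedBlocks][];
    }

    void recycleByteBlocks(byte[][] blocks, int start, int end) {
        for (int i = start; i < end; i++) {
            if (freeCount < maxBufferedBlocks) {
                freeBlocks[freeCount++] = blocks[i]; // keep for reuse
            }
            blocks[i] = null; // always null out, even when not buffered
        }
    }

    int freeCount() { return freeCount; }
}
```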

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916873#action_12916873
 ] 

Michael McCandless commented on LUCENE-2662:


bq. Still, the resulting indices had identical structure (ie we seem to flush 
at exactly the same points), so I think bytes used is properly tracked.

Sorry, scratch that -- I was inadvertently flushing by doc count, not by RAM 
usage.  I'm re-running w/ flush-by-RAM to verify we flush at exactly the same 
points as trunk.

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Commented: (LUCENE-2662) BytesHash

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916872#action_12916872
 ] 

Michael McCandless commented on LUCENE-2662:


I indexed 10M 1KB wikipedia docs, single threaded, and also see things a bit 
faster w/ the patch (10,353 docs/sec vs 10,182 docs/sec).  Nice to have a 
refactor improve performance for a change, heh.

The avgUsedMem was quite a bit higher (1.5GB vs 1.0GB), but, I'm not sure this 
stat is trustworthy; I'll re-run w/ infoStream enabled to see if anything 
looks suspicious (eg, we are somehow not tracking bytes used correctly).

Still, the resulting indices had identical structure (ie we seem to flush at 
exactly the same points), so I think bytes used is properly tracked.

> BytesHash
> -
>
> Key: LUCENE-2662
> URL: https://issues.apache.org/jira/browse/LUCENE-2662
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch, 4.0
>Reporter: Jason Rutherglen
>Assignee: Simon Willnauer
>Priority: Minor
> Fix For: Realtime Branch, 4.0
>
> Attachments: LUCENE-2662.patch, LUCENE-2662.patch, LUCENE-2662.patch, 
> LUCENE-2662.patch, LUCENE-2662.patch
>
>
> This issue will have the BytesHash separated out from LUCENE-2186


[jira] Created: (SOLR-2139) Wrong cast from string to float

2010-10-01 Thread Igor Rodionov (JIRA)
Wrong cast from string to float
---

 Key: SOLR-2139
 URL: https://issues.apache.org/jira/browse/SOLR-2139
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
 Environment: Ubuntu 9.04 
java.runtime.name = Java(TM) SE Runtime Environment
sun.boot.library.path = /usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/i386
java.vm.version = 16.3-b01
java.vm.vendor = Sun Microsystems Inc.
java.vendor.url = http://java.sun.com/
path.separator = :
java.vm.name = Java HotSpot(TM) Server VM
file.encoding.pkg = sun.io
user.country = RU
sun.java.launcher = SUN_STANDARD
sun.os.patch.level = unknown
java.vm.specification.name = Java Virtual Machine Specification
user.dir = /media/data/soft/apache-solr/drupal
java.runtime.version = 1.6.0_20-b02
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.endorsed.dirs = /usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/endorsed
os.arch = i386
java.io.tmpdir = /tmp
line.separator = 

java.vm.specification.vendor = Sun Microsystems Inc.
os.name = Linux
sun.jnu.encoding = UTF-8
java.library.path = 
/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/i386/server:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/i386:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
java.specification.name = Java Platform API Specification
java.class.version = 50.0
jetty.home = /media/data/soft/apache-solr/drupal
sun.management.compiler = HotSpot Tiered Compilers
os.version = 2.6.28-19-generic
user.home = /root
user.timezone = Asia/Omsk
org.mortbay.jetty.Request.maxFormContentSize = 100
java.awt.printerjob = sun.print.PSPrinterJob
file.encoding = UTF-8
java.specification.version = 1.6
user.name = root
java.class.path = 
/media/data/soft/apache-solr/drupal/lib/jetty-6.1.22.jar:/media/data/soft/apache-solr/drupal/lib/jetty-util-6.1.22.jar:/media/data/soft/apache-solr/drupal/lib/servlet-api-2.5-20081211.jar:/media/data/soft/apache-solr/drupal/lib/jsp-2.1/core-3.1.1.jar:/media/data/soft/apache-solr/drupal/lib/jsp-2.1/jsp-2.1-glassfish-9.1.1.B60.25.p2.jar:/media/data/soft/apache-solr/drupal/lib/jsp-2.1/jsp-2.1-jetty-6.1.22.jar:/media/data/soft/apache-solr/drupal/lib/jsp-2.1/jsp-api-2.1-glassfish-9.1.1.B60.25.p2.jar:/usr/share/ant/lib/ant.jar
java.vm.specification.version = 1.0
sun.arch.data.model = 32
java.home = /usr/lib/jvm/java-6-sun-1.6.0.20/jre
java.specification.vendor = Sun Microsystems Inc.
user.language = ru
java.vm.info = mixed mode
java.version = 1.6.0_20
java.ext.dirs = 
/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/ext:/usr/java/packages/lib/ext
sun.boot.class.path = 
/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/resources.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/rt.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/jsse.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/jce.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/charsets.jar:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/classes
java.vendor = Sun Microsystems Inc.
file.separator = /
java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
sun.cpu.endian = little
sun.io.unicode.encoding = UnicodeLittle
sun.cpu.isalist = 

Reporter: Igor Rodionov


Class org.apache.solr.spelling.suggest.Suggester, line 60: invalid cast from 
String to Float.

 threshold = config.get(THRESHOLD_TOKEN_FREQUENCY) == null ? 0.0f
 : (Float) config.get(THRESHOLD_TOKEN_FREQUENCY);

config.get returns a String, but the code simply casts it to Float, which 
fails at runtime.

The correct code should be something like:
 threshold = config.get(THRESHOLD_TOKEN_FREQUENCY) == null ? 0.0f
 : 
Float.valueOf(config.get(THRESHOLD_TOKEN_FREQUENCY).trim()).floatValue();
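The failure mode can be reproduced outside Solr with a plain Map standing in for the Suggester config (the class and constant names below are illustrative, not the real Solr code): a String value cannot be cast to Float, but parsing it works.

```java
import java.util.Map;

// Sketch of the reported bug and its fix: config values arrive as Strings,
// so (Float) config.get(...) would throw ClassCastException at runtime.
// Parsing the String, as the reporter suggests, succeeds.
class ThresholdSketch {
    static final String THRESHOLD_TOKEN_FREQUENCY = "thresholdTokenFrequency";

    static float readThreshold(Map<String, Object> config) {
        Object raw = config.get(THRESHOLD_TOKEN_FREQUENCY);
        // Parse instead of casting; trim to tolerate stray whitespace.
        return raw == null ? 0.0f : Float.parseFloat(((String) raw).trim());
    }
}
```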




[jira] Updated: (SOLR-2110) Distributed faceting can create invalid refinement requests when "key" is complex

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2110:
-

Fix Version/s: 3.1

> Distributed faceting can create invalid refinement requests when "key" is 
> complex
> -
>
> Key: SOLR-2110
> URL: https://issues.apache.org/jira/browse/SOLR-2110
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2110.patch
>
>
> The NPE described here: 
> http://search.lucidimagination.com/search/document/a4c217d153635301
> is due to 2 issues.  An exception in a refinement request isn't checked for, 
> and a NPE results, masking the original exception.
> The original exception is caused by Solr using local param dereferencing with 
> a param name derived from the "key" for that facet.
> If the key is something like "a/b/c" the request fails.


[jira] Resolved: (SOLR-2110) Distributed faceting can create invalid refinement requests when "key" is complex

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2110.
--

Resolution: Fixed

branch_3x: Committed revision 1003506.

> Distributed faceting can create invalid refinement requests when "key" is 
> complex
> -
>
> Key: SOLR-2110
> URL: https://issues.apache.org/jira/browse/SOLR-2110
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2110.patch
>
>
> The NPE described here: 
> http://search.lucidimagination.com/search/document/a4c217d153635301
> is due to 2 issues.  An exception in a refinement request isn't checked for, 
> and a NPE results, masking the original exception.
> The original exception is caused by Solr using local param dereferencing with 
> a param name derived from the "key" for that facet.
> If the key is something like "a/b/c" the request fails.


[jira] Reopened: (SOLR-2110) Distributed faceting can create invalid refinement requests when "key" is complex

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-2110:
--

  Assignee: Yonik Seeley

I need this fix for branch_3x.

> Distributed faceting can create invalid refinement requests when "key" is 
> complex
> -
>
> Key: SOLR-2110
> URL: https://issues.apache.org/jira/browse/SOLR-2110
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2110.patch
>
>
> The NPE described here: 
> http://search.lucidimagination.com/search/document/a4c217d153635301
> is due to 2 issues.  An exception in a refinement request isn't checked for, 
> and a NPE results, masking the original exception.
> The original exception is caused by Solr using local param dereferencing with 
> a param name derived from the "key" for that facet.
> If the key is something like "a/b/c" the request fails.


[jira] Updated: (SOLR-2111) treat facet exceptions consistently

2010-10-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2111:
-

Fix Version/s: 3.1

back port to branch_3x. 

> treat facet exceptions consistently
> ---
>
> Key: SOLR-2111
> URL: https://issues.apache.org/jira/browse/SOLR-2111
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2111.patch
>
>
> Right now, faceting on a non-existent field will just add an "exception" to 
> the facet_counts.
> In distrib mode, it will cause the whole request to fail.


[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916853#action_12916853
 ] 

Michael McCandless commented on LUCENE-2575:


bq. The posting-upto should stop the reader prior to reaching a byte element 
whose value is 0, ie, it should never happen.

OK but then we are making a full copy of postings upto (int per term) on every 
reopen?  Or will we try to make this copy-on-write as well?

So now we need copy-on-write per-term int for tf and for posting upto?  
Anything else?

I fear a copy-on-write check per-term is going to be a sizable perf hit.

> Concurrent byte and int block implementations
> -
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?


[jira] Commented: (LUCENE-2655) Get deletes working in the realtime branch

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916851#action_12916851
 ] 

Michael McCandless commented on LUCENE-2655:


Stepping back here...

There are two different goals mixed into the RT branch effort, I
think:

# Make thread states fully independent so flushing is no longer sync'd (plus 
it's a nice simplification, eg no more *PerThread in the indexing chain)
# Enable direct searching on the thread states RAM buffer, for awesome
  NRT performance

It seems to me like the first one is not so far off?  Ie we nearly
have it already (LUCENE-2324)... it's just that we don't have the
deletes working?

Whereas the 2nd one is a much bigger change, and still iterating under
LUCENE-2475.

Is it possible to decouple #1 from #2?  Ie, bring it to a committable
state and land it on trunk and let it bake some?

Eg, on deletes, what if we simply have each thread state buffer its own
delete term -> thread's docID, privately?  We know this approach
will "work" (it does today), right?  It's just wasteful of RAM (though
cutover to BytesRefHash should help a lot here), and makes
deletes somewhat slower, since you must now enroll the del term
into each thread state.

It wouldn't actually be that wasteful of RAM, since the BytesRef instance
would be shared across all the maps.  Also, if we wanted (separately)
we could make a more efficient buffer when the app deletes many terms
at once, or, many calls to delete-by-single-term with no adds in between
(much like how Amazon optimizes N one-click purchases in a row...).
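The per-thread buffering described above can be sketched roughly as below. This is a hypothetical simplification: the class and method names are illustrative, and the real DWPT code would key on BytesRef terms rather than Strings.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each thread state privately maps a deleted term to the
// thread-local docID upto at the time of the delete, so the delete applies
// only to documents that thread indexed earlier.
class ThreadStateSketch {
    // term -> highest thread-local docID the delete applies to
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();
    private int docIdUpto;

    int addDocument() { return docIdUpto++; }

    void bufferDeleteTerm(String term) {
        // a later delete of the same term supersedes the earlier entry
        bufferedDeleteTerms.put(term, docIdUpto);
    }

    Integer deleteLimit(String term) { return bufferedDeleteTerms.get(term); }
}
```

The RAM cost noted above comes from each thread state holding its own map; sharing the term key object across the maps would keep that cost down.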

I really want to re-run my 6-thread indexing test on beast and see the
indexing throughput double!!

> Get deletes working in the realtime branch
> --
>
> Key: LUCENE-2655
> URL: https://issues.apache.org/jira/browse/LUCENE-2655
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: Realtime Branch
>Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.


RE: Build failed in Hudson: Lucene-trunk #1304

2010-10-01 Thread Uwe Schindler
Sorry, this was a test of the FreeBSD jail!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
> Sent: Friday, October 01, 2010 12:12 PM
> To: dev@lucene.apache.org
> Subject: Build failed in Hudson: Lucene-trunk #1304
> 
> See 
> 
> --
> [...truncated 2861 lines...]

Build failed in Hudson: Lucene-trunk #1304

2010-10-01 Thread Apache Hudson Server
See 

--
[...truncated 2861 lines...]
AUanalysis/common/src/java/org/apache/lucene/analysis/el/package.html
A analysis/common/src/java/org/apache/lucene/analysis/ar
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicStemFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicNormalizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicLetterTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/ar/ArabicStemmer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/ar/package.html
A analysis/common/src/java/org/apache/lucene/analysis/en
AU
analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemFilter.java
A 
analysis/common/src/java/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/en/package.html
A analysis/common/src/java/org/apache/lucene/analysis/position
AU
analysis/common/src/java/org/apache/lucene/analysis/position/PositionFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/position/package.html
A analysis/common/src/java/org/apache/lucene/analysis/in
AU
analysis/common/src/java/org/apache/lucene/analysis/in/IndicTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/in/IndicNormalizationFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/in/IndicNormalizer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/in/package.html
A analysis/common/src/java/org/apache/lucene/analysis/wikipedia
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.java
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/WikipediaTokenizerImpl.jflex
AU
analysis/common/src/java/org/apache/lucene/analysis/wikipedia/package.html
A analysis/common/src/java/org/apache/lucene/analysis/cjk
AU
analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKTokenizer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKAnalyzer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/cjk/package.html
A analysis/common/src/java/org/apache/lucene/analysis/es
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/es/package.html
A analysis/common/src/java/org/apache/lucene/analysis/eu
AU
analysis/common/src/java/org/apache/lucene/analysis/eu/BasqueAnalyzer.java
AUanalysis/common/src/java/org/apache/lucene/analysis/eu/package.html
A analysis/common/src/java/org/apache/lucene/analysis/it
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/it/package.html
A analysis/common/src/java/org/apache/lucene/analysis/cz
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechAnalyzer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemmer.java
AU
analysis/common/src/java/org/apache/lucene/analysis/cz/CzechStemFilter.java
AUanalysis/common/src/java/org/apache/lucene/analysis/cz/package.html
A analysis/common/src/java/org/apache/lucene/analysis/synonym
AU
analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymFilter.java
AU
analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java
A analysis/common/src/java/org/apache/lucene/analysis/util
AU
analysis/common/src/java/org/apache/lucene/analysis/util/StopwordAnalyzerBase.java
AU
analysis/common/src/java/org/apache/lucene/analysis/util/ReusableAnalyzerBase.java
AU
analysis/common/

[jira] Commented: (LUCENE-2507) automaton spellchecker

2010-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916841#action_12916841
 ] 

Michael McCandless commented on LUCENE-2507:


This is an awesome step forward!

It requires no parallel index, and it gets better accuracy (if your metric is
edit-distance-like) at a negligible perf hit.

It's great that it leverages the absurd speedups we've made to FuzzyQuery in 
4.0.
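
For context, the metric the old n-gram spellchecker re-ranks candidates with is plain Levenshtein edit distance; the automaton approach instead intersects a Levenshtein automaton with the term dictionary directly, skipping the separate n-gram index entirely. A minimal standalone sketch of that distance metric (illustration only, not the Lucene automaton code):

```java
// Classic two-row Levenshtein DP: the kind of edit-distance metric used
// to re-rank the candidates returned by the old n-gram query.
class Levenshtein {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                // min of substitution, deletion, insertion
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}
```

The win Mike describes is that the automaton form of this computation runs against the terms dictionary itself, so no auxiliary index has to be built or rebuilt.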

> automaton spellchecker
> --
>
> Key: LUCENE-2507
> URL: https://issues.apache.org/jira/browse/LUCENE-2507
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/spellchecker
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-2507.patch, LUCENE-2507.patch, LUCENE-2507.patch, 
> LUCENE-2507.patch
>
>
> The current spellchecker makes an n-gram index of your terms, and queries 
> this for spellchecking.
> The terms that come back from the n-gram query are then re-ranked by an 
> algorithm such as Levenshtein.
> Alternatively, we could just do a levenshtein query directly against the 
> index, then we wouldn't need
> a separate index to rebuild.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Hudson: Solr-trunk #1264

2010-10-01 Thread Apache Hudson Server
See 

Changes:

[yonik] SOLR-1568: add field queries and range queries to LatLonType

[rmuir] make TestRegexpRandom2 harder

[yonik] SOLR-1297: fix sort by function parsing

--
[...truncated 4985 lines...]
AU
lucene/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/GeoHashDistanceFilter.java
AU
lucene/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/GeoHashUtils.java
AU
lucene/contrib/spatial/src/java/org/apache/lucene/spatial/geohash/package.html
AUlucene/contrib/spatial/src/java/overview.html
AUlucene/contrib/spatial/build.xml
A lucene/contrib/highlighter
AUlucene/contrib/highlighter/pom.xml.template
A lucene/contrib/highlighter/src
A lucene/contrib/highlighter/src/test
A lucene/contrib/highlighter/src/test/org
A lucene/contrib/highlighter/src/test/org/apache
A lucene/contrib/highlighter/src/test/org/apache/lucene
A lucene/contrib/highlighter/src/test/org/apache/lucene/search
A lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterPhraseTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/highlight/HighlighterTest.java
A 
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/AbstractTestCase.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/FieldTermStackTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/FieldPhraseListTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/IndexTimeSynonymTest.java
A 
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/SingleFragListBuilderTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/ScoreOrderFragmentsBuilderTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragmentsBuilderTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/FieldQueryTest.java
AU
lucene/contrib/highlighter/src/test/org/apache/lucene/search/vectorhighlight/SimpleFragListBuilderTest.java
A lucene/contrib/highlighter/src/java
A lucene/contrib/highlighter/src/java/org
A lucene/contrib/highlighter/src/java/org/apache
A lucene/contrib/highlighter/src/java/org/apache/lucene
A lucene/contrib/highlighter/src/java/org/apache/lucene/search
A lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/InvalidTokenOffsetsException.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/SimpleHTMLFormatter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/SpanGradientFormatter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Formatter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/SimpleFragmenter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TextFragment.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTerm.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryTermScorer.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/package.html
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/SimpleHTMLEncoder.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/Encoder.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenStreamFromTermPositionVector.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/GradientFormatter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/QueryScorer.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/DefaultEncoder.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/TokenSources.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/NullFragmenter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/SimpleSpanFragmenter.java
AU
lucene/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedTerm.java
AU