[jira] [Reopened] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-10 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reopened LUCENE-4590:
-

Lucene Fields:   (was: New)

Reopening the issue to make the method that returns the categories file name, 
categoriesLineFile(), public, so that the naming can easily be changed in the 
future without breaking application logic.
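
To illustrate why a public helper is useful here, the following is a minimal sketch, not the committed code. The class name CategoriesFileNames, the File parameter, and the "categories-" prefix are all assumptions made for the example; only the method name categoriesLineFile() comes from the comment above.

```java
import java.io.File;

/** Illustrative only: derives a companion "categories" file from a line file. */
class CategoriesFileNames {

  /**
   * Hypothetical naming scheme: same directory as the line file, with a
   * "categories-" prefix on the file name. Applications call this method
   * instead of building the name themselves, so the scheme can change later.
   */
  public static File categoriesLineFile(File lineFile) {
    File dir = lineFile.getAbsoluteFile().getParentFile();
    return new File(dir, "categories-" + lineFile.getName());
  }

  public static void main(String[] args) {
    File lineFile = new File("work/enwiki.lines.txt");
    System.out.println(categoriesLineFile(lineFile).getName());
  }
}
```

An application that hard-codes the categories file name breaks as soon as the scheme changes; one that calls the helper does not.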

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4590.patch


 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-10 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4590.
-

Resolution: Fixed

done.

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4590.patch


 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527399#comment-13527399
 ] 

Doron Cohen commented on LUCENE-4588:
-

Two more commits to trunk (missed by the bot due to incorrect message format):
- [r1417871|http://svn.apache.org/viewvc?rev=1417871&view=rev] -- LUCENE-4588 
(cont): (EnwikiContentSource fixes) avoid using the forbidden 
StringBufferInputStream.
- [r1417921|http://svn.apache.org/viewvc?rev=1417921&view=rev] -- LUCENE-4588 
(cont): simplify test input stream creation.

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4588.patch


 Last wiki doc is never returned




[jira] [Resolved] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-09 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4588.
-

   Resolution: Fixed
Lucene Fields:   (was: New)

Fixed.

As a side note, merging benchmark changes to 4x is so much easier than it used 
to be in 3x, now that trunk and branch are structured the same! Now if only 
'precommit' would run 60 times faster (that would be 12 seconds here)... 
wouldn't that be great? :) 

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4588.patch


 Last wiki doc is never returned




[jira] [Resolved] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-09 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4595.
-

   Resolution: Fixed
Lucene Fields:   (was: New)

Fixed.

Seems the tag bot missed the trunk commit for this one,
so here they are both:

- trunk: [r1418281|http://svn.apache.org/viewvc?view=revision&revision=1418281]
- 4x: [r1418925|http://svn.apache.org/viewvc?view=revision&revision=1418925]

 EnwikiContentSource thread safety problem (NPE) in 'forever' mode
 -

 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4595.patch


 If close() is invoked just as an additional input stream reader is being
 recreated for the 'forever' behavior, an uncaught NPE might occur.
 This bug was probably always there, and is only exposed now by the
 EnwikiContentSourceTest added in LUCENE-4588.




[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-09 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4590.
-

Resolution: Fixed

Done.

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4590.patch


 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-07 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526326#comment-13526326
 ] 

Doron Cohen commented on LUCENE-4595:
-

Thanks for verifying Robert.
Committed the fix, let's see if the build becomes stable again.
Issue remains open for porting to 4x.

 EnwikiContentSource thread safety problem (NPE) in 'forever' mode
 -

 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4595.patch


 If close() is invoked just as an additional input stream reader is being
 recreated for the 'forever' behavior, an uncaught NPE might occur.
 This bug was probably always there, and is only exposed now by the
 EnwikiContentSourceTest added in LUCENE-4588.




[jira] [Assigned] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-4588:
---

Assignee: Doron Cohen

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4588.patch


 Last wiki doc is never returned




[jira] [Created] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)
Doron Cohen created LUCENE-4590:
---

 Summary: WriteEnwikiLineDoc which writes Wikipedia category pages 
to a separate file
 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


It may be convenient to split Wikipedia's line file into two separate files: 
category-pages and non-category ones. 
It is possible to split the original line file with grep or such.
It is more efficient to do it in advance.




[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4590:


Component/s: modules/benchmark

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511262#comment-13511262
 ] 

Doron Cohen commented on LUCENE-4590:
-

bq. Do you think perhaps that EnwikiContentSource should let the caller know 
whether the returned DocData represents a content page or category page?

That's what I planned at start, but decided to leave WriteLineDoc intact 
because it is general, that is, not aware of the unique structure of Wikipedia 
data, where some of the pages represent categories.

bq. So maybe, if someone wants to generate a line file from the pages only... 
flexibility that I think you are trying to achieve...

Actually I am after the two files... :) These category pages are (unique) 
taxonomy node names, but without the taxonomy structure, which can be deduced 
from the (parent) categories of the category pages. Having this separate 
category pages can be useful for deducing that taxonomy.
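
The two-file idea discussed above can be sketched roughly as follows. This is not the code in the patch: the class CategorySplitter, the tab-separated line layout, and the assumption that category pages are recognised by a "Category:" title prefix are all illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Illustrative only: routes each doc line to one of two outputs by title. */
class CategorySplitter {

  // Assumption: a category page is identified by this title prefix.
  static final String CATEGORY_PREFIX = "Category:";

  static boolean isCategoryPage(String title) {
    return title != null && title.startsWith(CATEGORY_PREFIX);
  }

  /** Writes one line per doc, to the categories writer or the regular one. */
  static void writeLine(String title, String body,
                        PrintWriter docs, PrintWriter categories) {
    PrintWriter out = isCategoryPage(title) ? categories : docs;
    out.print(title + "\t" + body + "\n");
  }

  public static void main(String[] args) {
    StringWriter docs = new StringWriter();
    StringWriter cats = new StringWriter();
    PrintWriter docsOut = new PrintWriter(docs);
    PrintWriter catsOut = new PrintWriter(cats);
    writeLine("Apache Lucene", "a search library", docsOut, catsOut);
    writeLine("Category:Search engines", "parent categories here", docsOut, catsOut);
    docsOut.flush();
    catsOut.flush();
    System.out.print("docs file:\n" + docs + "categories file:\n" + cats);
  }
}
```

Doing the split while the line file is written costs one string comparison per page, which is the "more efficient to do it in advance" point from the issue description, compared with a second pass over the whole file with grep.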

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Created] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)
Doron Cohen created LUCENE-4595:
---

 Summary: EnwikiContentSource thread safety problem (NPE) in 
'forever' mode
 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


If close() is invoked just as an additional input stream reader is being
recreated for the 'forever' behavior, an uncaught NPE might occur.
This bug was probably always there, and is only exposed now by the
EnwikiContentSourceTest added in LUCENE-4588.
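
The kind of race described above, and one common way to guard against it, can be sketched as below. This is not the attached patch and not the real EnwikiContentSource: the class ForeverSource, its fields, and the in-memory stream are hypothetical stand-ins. The idea is that close() and the stream re-creation take the same lock, and that the parser side checks a closed flag before wrapping the stream in a reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a source that reopens its input in 'forever' mode. */
class ForeverSource {
  private final Object lock = new Object();
  private InputStream is;          // guarded by lock
  private boolean closed = false;  // guarded by lock

  /** Called by the parser thread each time it wraps around to the start. */
  Reader reopen() {
    synchronized (lock) {
      if (closed) {
        return null; // caller should stop, not wrap a null stream in a reader
      }
      closeStream();
      is = new ByteArrayInputStream("<doc/>".getBytes(StandardCharsets.UTF_8));
      return new InputStreamReader(is, StandardCharsets.UTF_8);
    }
  }

  /** May be called from another thread at any time; safe to call twice. */
  void close() {
    synchronized (lock) {
      closed = true;
      closeStream();
    }
  }

  // Must be called with the lock held.
  private void closeStream() {
    if (is != null) {
      try {
        is.close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      } finally {
        is = null;
      }
    }
  }

  public static void main(String[] args) {
    ForeverSource src = new ForeverSource();
    System.out.println("before close: " + (src.reopen() != null));
    src.close();
    System.out.println("after close: " + (src.reopen() != null));
  }
}
```

Without the shared lock and the closed check, the parser thread could pass a stream that close() has just set to null into the InputStreamReader constructor, which throws a NullPointerException.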




[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512113#comment-13512113
 ] 

Doron Cohen commented on LUCENE-4595:
-

Jenkins reproduce params and error log:
{noformat}
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3093/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest.testForever

Error Message:
Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]
        at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
Caused by: java.lang.NullPointerException
        at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
        at java.io.Reader.<init>(Reader.java:61)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
        at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
        at java.lang.Thread.run(Thread.java:662)

Build Log:
[...truncated 5173 lines...]
[junit4:junit4] Suite: org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest
[junit4:junit4]   2> 7 Δεκ 2012 6:39:53 πμ com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Thread-2,5,TGRP-EnwikiContentSourceTest]
[junit4:junit4]   2> java.lang.NullPointerException
[junit4:junit4]   2>    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4]   2>    at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4]   2>    at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4]   2>    at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4]   2>    at java.lang.Thread.run(Thread.java:662)
[junit4:junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=EnwikiContentSourceTest -Dtests.method=testForever -Dtests.seed=EF7AF10441351C3B -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=el -Dtests.timezone=SST -Dtests.file.encoding=UTF-8
[junit4:junit4] ERROR   0.07s J1 | EnwikiContentSourceTest.testForever <<<
[junit4:junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]
[junit4:junit4]    >    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
[junit4:junit4]    > Caused by: java.lang.NullPointerException
[junit4:junit4]    >    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4]    >    at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4]    >    at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4]    >    at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4]    >    at java.lang.Thread.run(Thread.java:662)
[junit4:junit4]   2> NOTE: test params are: codec=Lucene41: {}, sim=DefaultSimilarity, locale=el, timezone=SST
[junit4:junit4]   2> NOTE: Linux 3.2.0-34-generic i386/Sun Microsystems Inc. 1.6.0_37 (32-bit)/cpus=8,threads=1,free=47084536,total=64946176
[junit4:junit4]   2> NOTE: All tests run in this JVM: [TrecContentSourceTest, TestConfig, DocMakerTest, SearchWithSortTaskTest, StreamUtilsTest, WriteLineDocTaskTest, CreateIndexTaskTest, TestQualityRun, LineDocSourceTest, TestPerfTasksParse, AddIndexesTaskTest, PerfTaskTest, AltPackageTaskTest, EnwikiContentSourceTest]
[junit4:junit4] Completed on J1 in 0.30s, 3 tests, 1 error <<< FAILURES!
{noformat}

 EnwikiContentSource thread safety problem (NPE) in 'forever' mode
 -

 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 If close() is invoked just as an additional input stream reader is being
 recreated for the 'forever' behavior, an uncaught NPE might occur.
 This bug was probably always there, and is only exposed now by the
 EnwikiContentSourceTest added in LUCENE-4588.


[jira] [Updated] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4595:


Attachment: LUCENE-4595.patch

This patch is supposed to fix the problem.
However, I was not able to reproduce the bug, so I could not actually test the fix.

 EnwikiContentSource thread safety problem (NPE) in 'forever' mode
 -

 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4595.patch


 If close() is invoked just as an additional input stream reader is being
 recreated for the 'forever' behavior, an uncaught NPE might occur.
 This bug was probably always there, and is only exposed now by the
 EnwikiContentSourceTest added in LUCENE-4588.




[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514644#comment-13514644
 ] 

Doron Cohen commented on LUCENE-4588:
-

Thanks for the review Shai, changed as you suggested and committed (while jira 
was down...)

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4588.patch


 Last wiki doc is never returned




[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514649#comment-13514649
 ] 

Doron Cohen commented on LUCENE-4590:
-

Now I see what you mean. Spooky, it is as if you were looking into the patch I 
did not post here... How did you know I chose not to modify EnwikiContentSource?

I agree that if someone wishes to index just the non-category pages, the new 
WriteEnwikiLineDoc would create the category pages file for no use. Also, if 
indexing is conducted straight away, not through a line file first, categories 
will be indexed. But then anyone could check the title and decide not to index 
those docs. So I see the advantage, just not tempted to add this at the moment, 
but it can be added.

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4590:


Attachment: LUCENE-4590.patch

Patch with the new task and a test.

 WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
 ---

 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4590.patch


 It may be convenient to split Wikipedia's line file into two separate files: 
 category-pages and non-category ones. 
 It is possible to split the original line file with grep or such.
 It is more efficient to do it in advance.




[jira] [Created] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)
Doron Cohen created LUCENE-4588:
---

 Summary: EnwikiContentSource silently swallows the last wiki doc
 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Priority: Minor


Last wiki doc is never returned




[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510774#comment-13510774
 ] 

Doron Cohen commented on LUCENE-4588:
-

In addition, there's a thread leak in 'forever' mode.

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Priority: Minor

 Last wiki doc is never returned




[jira] [Updated] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4588:


Attachment: LUCENE-4588.patch

Patch adds a test for enwiki-content-source and fixes both the last doc problem 
and the thread leak.

 EnwikiContentSource silently swallows the last wiki doc
 ---

 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Priority: Minor
 Attachments: LUCENE-4588.patch


 Last wiki doc is never returned




[jira] [Commented] (LUCENE-3454) rename optimize to a less cool-sounding name

2011-09-26 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114617#comment-13114617
 ] 

Doron Cohen commented on LUCENE-3454:
-

To me, merge(num) doing nothing because there are already no more than num 
segments is as fine as close() doing nothing because it was already closed, 
so +1 for merge(num).


 rename optimize to a less cool-sounding name
 

 Key: LUCENE-3454
 URL: https://issues.apache.org/jira/browse/LUCENE-3454
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Robert Muir

 I think users see the name optimize and feel they must do this, because who 
 wants a suboptimal system? but this probably just results in wasted time and 
 resources.
 maybe rename to collapseSegments or something?




[jira] [Commented] (LUCENE-3464) Rename IndexReader.reopen to make it clear that reopen may not happen

2011-09-26 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114656#comment-13114656
 ] 

Doron Cohen commented on LUCENE-3464:
-

I liked reopen()... (but also like returning null in case there's nothing 
newer...)

If the name is going to change, two additional names to consider:
* newest()
* newer()

For newest(), I think the current behavior of returning 'this' makes sense when 
'this' is already the newest.
For newer(), returning null in that case seems right.

One problem I have with these names is that they both seem to hide the fact 
that things are going on down there, when it is required to open a new reader...
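
The difference between the two proposed names can be sketched with a toy class. VersionedReader below is hypothetical and is not the IndexReader API; the latestVersion parameter stands in for whatever the real reader would discover by looking at the index.

```java
/** Illustrative only: a toy reader identified by the version it was opened on. */
class VersionedReader {
  final long version;

  VersionedReader(long version) {
    this.version = version;
  }

  /** Returns a reader on latestVersion, or this when already current. */
  VersionedReader newest(long latestVersion) {
    return latestVersion > version ? new VersionedReader(latestVersion) : this;
  }

  /** Returns a reader on latestVersion, or null when there is nothing newer. */
  VersionedReader newer(long latestVersion) {
    return latestVersion > version ? new VersionedReader(latestVersion) : null;
  }

  public static void main(String[] args) {
    VersionedReader reader = new VersionedReader(1);

    // With newest(), the caller must compare references before closing the old one.
    VersionedReader a = reader.newest(1);
    System.out.println("newest() returned same instance: " + (a == reader));

    // With newer(), a null check is all that is needed.
    VersionedReader b = reader.newer(1);
    System.out.println("newer() returned null: " + (b == null));
  }
}
```

With the null-returning variant, a caller that forgets the check fails fast on the next use, whereas with the this-returning variant a caller that unconditionally closes the old reader ends up closing the one it is about to use.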

 Rename IndexReader.reopen to make it clear that reopen may not happen
 -

 Key: LUCENE-3464
 URL: https://issues.apache.org/jira/browse/LUCENE-3464
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.5, 4.0


 Spinoff from LUCENE-3454 where Shai noted this inconsistency.
 IR.reopen sounds like an unconditional operation, which has trapped users in 
 the past into always closing the old reader instead of only closing it if the 
 returned reader is new.
 I think this hidden maybe-ness is trappy and we should rename it 
 (maybeReopen?  reopenIfNeeded?).
 In addition, instead of returning this when the reopen didn't happen, I 
 think we should return null to enforce proper usage of the maybe-ness of this 
 API.




[jira] [Created] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)
Upgrade to commons-compress 1.2
---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0


Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
benchmark's StreamUtils is no longer required. Compress is also used in solr. 
Replace with new jar in both benchmark and solr and get rid of that workaround.




[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3457:


Attachment: LUCENE-3457.patch

Attached simple patch with the fix.
After applying the patch, you also need to download commons-compress-1.2.jar and 
place it under module/benchmark/lib and under solr/contrib/extraction/lib. 

Currently several Solr tests fail for me with this patch. This is probably not 
related to replacing the compress jar, as they pass when run alone (-Dtestcase).

 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114213#comment-13114213
 ] 

Doron Cohen commented on LUCENE-3457:
-

hmmm, this is strange.

These are the tests that failed with compress-1.2 for 'ant clean test' under 
solr:

- org.apache.solr.handler.TestReplicationHandler
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 39.968 sec
- org.apache.solr.handler.component.DebugComponentTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.219 sec
- org.apache.solr.handler.component.TermVectorComponentTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.request.JSONWriterTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.75 sec
- org.apache.solr.response.TestCSVResponseWriter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.719 sec
- org.apache.solr.schema.BadIndexSchemaTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.187 sec
- org.apache.solr.search.TestQueryUtils
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.14 sec
- org.apache.solr.search.similarities.TestBM25SimilarityFactory
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.187 sec
- org.apache.solr.servlet.DirectSolrConnectionTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.344 sec
- org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 3.984 sec

I put 1.1 back and they all passed. 
However, I then switched to compress-1.2 again and now they all passed as well.

I now see that I am on r1174072. I'll update and try again.


 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Issue Comment Edited] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114219#comment-13114219
 ] 

Doron Cohen edited comment on LUCENE-3457 at 9/25/11 11:44 AM:
---

Thanks Chris, almost sure I did a clean, will try again.

  was (Author: doronc):
Thanks Chriss, almost sure I did a clean, will try again.
  
 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3457:


Attachment: test.out.gz

It still fails. This time, running 'clean test' from trunk, all Lucene tests 
pass but some of the Solr tests failed:

- org.apache.solr.handler.TestReplicationHandler
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 43.703 sec
- org.apache.solr.handler.component.DebugComponentTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.handler.component.TermVectorComponentTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.375 sec
- org.apache.solr.request.JSONWriterTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.078 sec
- org.apache.solr.schema.BadIndexSchemaTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.266 sec
- org.apache.solr.schema.RequiredFieldsTest
[junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.422 sec
- org.apache.solr.search.QueryParsingTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.641 sec
- org.apache.solr.search.SpatialFilterTest
[junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.438 sec
- org.apache.solr.search.TestQueryTypes
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.953 sec
- org.apache.solr.servlet.CacheHeaderTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 0.984 sec
- org.apache.solr.spelling.SpellCheckCollatorTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.281 sec
- org.apache.solr.update.DocumentBuilderTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 0.734 sec
- org.apache.solr.util.SolrPluginUtilsTest
[junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 0.766 sec

Running alone, TestReplicationHandler for example passes.
Same for DebugComponentTest.
I am not sure what is happening here.
Attaching the test output in case someone wants to take a look.

 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch, test.out.gz


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114302#comment-13114302
 ] 

Doron Cohen commented on LUCENE-3457:
-

OK great, thanks Robert, so this has nothing to do with the compress jar update.
I'll commit shortly.

 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch, test.out.gz


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Resolved] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3457.
-

Resolution: Fixed

Fixed:
- 1175475 - trunk
- 1175528 - 3x

 Upgrade to commons-compress 1.2
 ---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3457.patch, test.out.gz


 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
 benchmark's StreamUtils is no longer required. Compress is also used in solr. 
 Replace with new jar in both benchmark and solr and get rid of that 
 workaround.




[jira] [Resolved] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-22 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3215.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.5

Fixed
- r1173961 - trunk
- r1174002 - 3x

Prior to committing I compared the performance of sloppy phrase queries 
with and without repeats for large documents with many candidate matches and 
did not see the anticipated speedup, though at least there was no degradation either.

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, 
 LUCENE-3215.patch, LUCENE-3215_test.patch, LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454
 ] 

Doron Cohen commented on LUCENE-3390:
-

I wrote a small test that should fail with the bug Uwe fixed here and pass with 
the fix. For some reason it is still failing even with that fix. I tried this 
with the previous patch and will now try with the latest one, though I think it 
should pass with the previous one as well. I'll give it another try.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?
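
The example scenario above can be reproduced without Lucene. The sketch below is a plain-Java illustration, not Lucene's sort implementation: a null entry stands for a document with no value, and the class and method names are made up for this example. Substituting 0 for the missing value gives the reported order 0, 2, 1, while substituting negative infinity (one possible "always last in descending order" choice) gives 0, 1, 2.

```java
import java.util.Arrays;
import java.util.Comparator;

public class MissingValueSortSketch {
    /** Returns doc ids sorted by value, descending; null values are replaced by `missing`. */
    public static int[] sortDescending(Double[] values, double missing) {
        Integer[] docIds = new Integer[values.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        // Compare b against a to get descending order.
        Comparator<Integer> descending = (a, b) -> Double.compare(
                resolve(values[b], missing), resolve(values[a], missing));
        Arrays.sort(docIds, descending);
        int[] result = new int[docIds.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = docIds[i];
        }
        return result;
    }

    /** A null value means the document has no value for the sort field. */
    private static double resolve(Double value, double missing) {
        return value == null ? missing : value;
    }
}
```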




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3390:


Attachment: LUCENE-3390-BitsInterface.patch

Attached patch with a test that fails before this fix (otherwise patch same as 
previous).

The test uses 4 collectors simultaneously, each with different missing values.

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-BitsInterface.patch, 
 LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-20 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108490#comment-13108490
 ] 

Doron Cohen commented on LUCENE-3390:
-

Hi Uwe, thanks for catching this. 
I agree that this is a bug, and needs to be fixed.
Just to make sure that we agree on what the problem is, let me describe it 
again: in the current 3x code, in setNextReader(), we extract the values from the 
cache, e.g. by {code}FieldCache.DEFAULT.getDoubles(reader, field, 
parser);{code} and, if a missing value was set, we iterate over the unvalued docs 
and set them to that missing value. However, this assignment is made in the 
same array just obtained from the cache, and so it is (1) inefficient, as it 
will happen again in the next sort on the same field, (2) incorrect, since two 
sorts on the *same* field with different missing values will collide, and (3) 
unsafe, as you indicated.
I was very happy with the reuse of the cache for caching the missing values so 
I would like to try to solve this with that frame... More later...
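
The collision described in (2) can be shown with a small stand-alone sketch. Everything here is hypothetical and written for illustration: the class is not Lucene's FieldCache, and the separate boolean "present" array is an assumed way of tracking which docs have a value, not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedCacheSketch {
    // Hypothetical per-field cache: every caller receives the SAME array instance.
    private final Map<String, double[]> values = new HashMap<>();
    private final Map<String, boolean[]> present = new HashMap<>();

    public void put(String field, double[] fieldValues, boolean[] hasValue) {
        values.put(field, fieldValues);
        present.put(field, hasValue);
    }

    /** Problematic variant: writes the missing value into the shared cached array. */
    public double[] valuesFilledInPlace(String field, double missing) {
        double[] cached = values.get(field);
        boolean[] hasValue = present.get(field);
        for (int doc = 0; doc < cached.length; doc++) {
            if (!hasValue[doc]) {
                cached[doc] = missing;
            }
        }
        return cached;
    }

    /** Safer variant: never mutates the cache; the missing value is resolved on read. */
    public double valueAt(String field, int doc, double missing) {
        return present.get(field)[doc] ? values.get(field)[doc] : missing;
    }
}
```

Two sorts that call valuesFilledInPlace with different missing values end up sharing one array, so the second call overwrites what the first one sees. The read-time variant keeps each sort's missing value local to that sort.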

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
 LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209
 ] 

Doron Cohen commented on LUCENE-3215:
-

OK I think I have a fix for this.

While looking at it, I realized that PhraseScorer (the one that used to be the 
base of both the Exact & Sloppy phrase scorers but now is the base of only the 
sloppy phrase scorer) is way too complicated and inefficient. All those sort 
calls after each matching doc can be avoided. 

So I am modifying PhraseScorer to not have a phrase-queue at all - just the 
sorted linked list, which is always kept sorted by advancing last beyond first. 
Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic 
allows more efficient manipulation of it. 

With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The 
queue size is set at the first candidate document. In order to handle 
repetitions (the same term at different query offsets) it will contain only 
some of the pps: those that either have no repetitions, or are the first (lowest 
query offset) in a repeating group. A linked list of repeating pps was added, so 
PhrasePositions has a new member: nextRepeating.

Detection of repeating pps and creation of that list is done once per scorer: 
at the first candidate doc.

To solve the bugs reported here, in addition to the initialization of 'end' as 
explained in the previous comment, advanceRepeatingPPs now also updates two values:
- end, in case one of the repeating pps is far ahead (larger)
- position of the first pp in a repeating list (the one that is in the queue), 
in case the repeating pp is far behind (smaller). This can happen when there 
are holes in the query, as position = tpPos - offset. It fixes the problem of 
false negative distances which caused this bug. It is tricky: it relies on the 
fact that PhrasePositions.nextPosition() ignores pp.position and just calls 
positions.nextPosition(). But it is correct, as the modified position is used 
to replace pp in the queue.

Last, I think that the test added with holes had one wrong assert: It added 
four docs:
- drug drug
- drug druggy drug
- drug druggy druggy drug
- drug druggy drug druggy drug
defined this query (number is the offset):
- drug(1) drug(3)
and expected that with slop=1 the first doc would not be found.
I think it should be found, as the slop operates in both directions.
So modified the query to: drug(1) drug(3)

Patch to follow.
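
The claim about the first doc can be checked by hand with the formula position = tpPos - offset from above. The sketch below is a brute-force illustration for a two-term query that repeats one term. It is not the scorer's algorithm, and it assumes a match length defined as the distance between the two offset-adjusted positions, with a match when that length is at most the slop.

```java
public class SlopSketch {
    /**
     * Smallest |(p1 - offset1) - (p2 - offset2)| over all pairs of distinct
     * occurrences p1 != p2 of the repeated term in one document.
     */
    public static int minMatchLength(int[] termPositions, int offset1, int offset2) {
        int best = Integer.MAX_VALUE;
        for (int p1 : termPositions) {
            for (int p2 : termPositions) {
                if (p1 == p2) {
                    continue; // two query terms cannot consume the same occurrence
                }
                int length = Math.abs((p1 - offset1) - (p2 - offset2));
                best = Math.min(best, length);
            }
        }
        return best;
    }
}
```

For the query drug(1) drug(3), the doc "drug drug" has "drug" at positions 0 and 1, which gives a minimum match length of 1, so under these assumptions it is found with slop=1. The doc "drug druggy drug" (positions 0 and 2) gives 0, an exact match.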

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
 LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Issue Comment Edited] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209
 ] 

Doron Cohen edited comment on LUCENE-3215 at 9/17/11 6:56 PM:
--

OK I think I have a fix for this.

While looking at it, I realized that PhraseScorer (the one that used to be the 
base of both the Exact & Sloppy phrase scorers but now is the base of only the 
sloppy phrase scorer) is way too complicated and inefficient. All those sort 
calls after each matching doc can be avoided. 

So I am modifying PhraseScorer to not have a phrase-queue at all - just the 
sorted linked list, which is always kept sorted by advancing last beyond first. 
Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic 
allows more efficient manipulation of it. 

With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The 
queue size is set at the first candidate document. In order to handle 
repetitions (the same term at different query offsets) it will contain only 
some of the pps: those that either have no repetitions, or are the first (lowest 
query offset) in a repeating group. A linked list of repeating pps was added, so 
PhrasePositions has a new member: nextRepeating.

Detection of repeating pps and creation of that list is done once per scorer: 
at the first candidate doc.

To solve the bugs reported here, in addition to the initialization of 'end' as 
explained in the previous comment, advanceRepeatingPPs now also updates two values:
- end, in case one of the repeating pps is far ahead (larger)
- position of the first pp in a repeating list (the one that is in the queue), 
in case the repeating pp is far behind (smaller). This can happen when there 
are holes in the query, as position = tpPos - offset. It fixes the problem of 
false negative distances which caused this bug. It is tricky: it relies on the 
fact that PhrasePositions.nextPosition() ignores pp.position and just calls 
positions.nextPosition(). But it is correct, as the modified position is used 
to replace pp in the queue.

Last, I think that the test added with holes had one wrong assert: It added 
four docs:
- drug drug
- drug druggy drug
- drug druggy druggy drug
- drug druggy drug druggy drug

defined this query (number is the offset):
- drug(1) drug(3)

and expected that with slop=1 the first doc would not be found.
I think it should be found, as the slop operates in both directions.
So modified the query to: drug(1) drug(3)

Patch to follow.

  
 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 

[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3215:


Attachment: LUCENE-3215.patch

Attached patch is based on r1166541 - before recent changes to scorers. Will 
merge with recent changes tomorrow or so. All tests pass.
I believe that sloppy scoring performance should improve with this change but 
did not check this.

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: LUCENE-3215.patch, LUCENE-3215.patch, 
 LUCENE-3215_test.patch, LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3215:


Attachment: LUCENE-3215.patch

Updated patch for current trunk r1172055.

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, 
 LUCENE-3215_test.patch, LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Commented] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-08 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100098#comment-13100098
 ] 

Doron Cohen commented on LUCENE-3412:
-

Thanks Michael for verifying this, I'll go ahead and commit.

 SloppyPhraseScorer returns non-deterministic results for queries with many 
 repeats
 --

 Key: LUCENE-3412
 URL: https://issues.apache.org/jira/browse/LUCENE-3412
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1, 3.2, 3.3, 4.0
Reporter: Michael Ryan
Assignee: Doron Cohen
 Attachments: LUCENE-3412.patch, LUCENE-3412.patch


 Proximity queries with many repeats (four or more, based on my testing) 
 return non-deterministic results. I run the same query multiple times with 
 the same data set and get different results.
 So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
 trunk.
 Steps to reproduce (using the Solr example):
 1) In solrconfig.xml, set queryResultCache size to 0.
 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
 http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
 3) Do a "dog dog dog dog"~1 query. 
 http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
 4) Repeat step 3 many times.
 Expected results: The document with id 2 should be returned.
 Actual results: The document with id 2 is always returned. The document with 
 id 1 is sometimes returned.
 Different proximity values show the same bug - dog dog dog dog~5, dog dog 
 dog dog~100, etc show the same behavior.
 So far I've traced it down to the repeats array in 
 SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
 elements in this array, the document may or may not match. I think the 
 HashSet may be to blame, but I'm not sure - that at least seems to be where 
 the non-determinism is coming from.




[jira] [Resolved] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-08 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3412.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.5
Lucene Fields:   (was: [New])

Fix committed:
- r1166541 - trunk
- r1166563 - 3x

(fix not included in 3.4 RC, therefore marked as 3.5 above)

 SloppyPhraseScorer returns non-deterministic results for queries with many 
 repeats
 --

 Key: LUCENE-3412
 URL: https://issues.apache.org/jira/browse/LUCENE-3412
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1, 3.2, 3.3, 4.0
Reporter: Michael Ryan
Assignee: Doron Cohen
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3412.patch, LUCENE-3412.patch


 Proximity queries with many repeats (four or more, based on my testing) 
 return non-deterministic results. I run the same query multiple times with 
 the same data set and get different results.
 So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
 trunk.
 Steps to reproduce (using the Solr example):
 1) In solrconfig.xml, set queryResultCache size to 0.
 2) Add some documents with text dog dog dog and dog dog dog dog. 
 http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
 3) Do a dog dog dog dog~1 query. 
 http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
 4) Repeat step 3 many times.
 Expected results: The document with id 2 should be returned.
 Actual results: The document with id 2 is always returned. The document with 
 id 1 is sometimes returned.
 Different proximity values show the same bug - dog dog dog dog~5, dog dog 
 dog dog~100, etc show the same behavior.
 So far I've traced it down to the repeats array in 
 SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
 elements in this array, the document may or may not match. I think the 
 HashSet may be to blame, but I'm not sure - that at least seems to be where 
 the non-determinism is coming from.




[jira] [Assigned] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-08 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3215:
---

Assignee: Doron Cohen

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
 LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-08 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100182#comment-13100182
 ] 

Doron Cohen commented on LUCENE-3215:
-

An update on this...

This is not related to LUCENE-3412 - the latter was fixed but this one still 
fails.

The patch, which takes the absolute value ('abs') of the distance, does avoid 
the infinite score problem, but I was not 100% comfortable with it - how can 
the distance be non-positive?

Digging into it shows a wrong assumption in SloppyPhraseScorer:

{code}
private int initPhrasePositions() throws IOException {
int end = 0;
{code}

The initial value of end assumes that all positions will be nonnegative.
But this is wrong, as PP position is computed as 

{code}
  position = postings.nextPosition() - offset
{code}

So, whenever the query term appears in the doc in a position smaller than its 
offset in the query, the computed position is negative. The correct 
initialization for end is therefore:

{code}
private int initPhrasePositions() throws IOException {
int end = Integer.MIN_VALUE;
{code}

You would expect this bug to have surfaced sooner...

Anyhow, for the 3 tests that Robert added, this only resolves 
testInfiniteFreq1(); the other two tests still fail. Investigating...
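
To make the initialization issue concrete, here is a minimal standalone sketch. It is not the actual SloppyPhraseScorer code: the class EndInitSketch and its methods are hypothetical, and the positions and offsets are made-up inputs. It only shows that a running maximum started at 0 reports a maximum that none of the (negative) computed positions actually has.

```java
import java.util.Arrays;

/** Standalone illustration only; names here are hypothetical, not Lucene's. */
class EndInitSketch {

    /** Computes position - offset for each query term, as described in the comment. */
    static int[] phrasePositions(int[] docPositions, int[] queryOffsets) {
        int[] result = new int[docPositions.length];
        for (int i = 0; i < docPositions.length; i++) {
            result[i] = docPositions[i] - queryOffsets[i];
        }
        return result;
    }

    /** Running maximum over positions, starting from the given initial value of end. */
    static int maxPosition(int[] positions, int initialEnd) {
        int end = initialEnd;
        for (int p : positions) {
            if (p > end) {
                end = p;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        // Assumed input: both terms occur in the doc before their offset in the query.
        int[] positions = phrasePositions(new int[] {0, 1}, new int[] {2, 4});
        System.out.println(Arrays.toString(positions));                // [-2, -3]
        System.out.println(maxPosition(positions, 0));                 // 0 (not a real position)
        System.out.println(maxPosition(positions, Integer.MIN_VALUE)); // -2
    }
}
```

With end starting at 0 the scorer would work with an end that is larger than every real position, which is consistent with the odd distances that the 'abs' patch was hiding.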

 SloppyPhraseScorer sometimes computes Infinite freq
 ---

 Key: LUCENE-3215
 URL: https://issues.apache.org/jira/browse/LUCENE-3215
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
 LUCENE-3215_test.patch


 reported on user list:
 http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query




[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-07 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3412:


Attachment: LUCENE-3412.patch

Attached patch with fix to this bug.

The fix is rather simple: just process PPs in offset order. That is, when 
avoiding conflicts (a conflict means that more than one query PP lands on 
the same doc TP), make sure to handle PPs in a specific order: from first in 
query to last in query. 

This is crucial because the check for conflicts returns the PP with greater 
offset, and that one is advanced.

It was pretty quick to fix this, but took longer to justify the fix.

I added some explanations in the code so that next time justification would be 
faster :) and also renamed termPositionsDiffer() to termPositionsConflict() 
which more accurately describes the logic of that method.

Now I need to see whether this fix is also related to LUCENE-3215.
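
A minimal sketch of the ordering idea, assuming a simplified stand-in for a phrase position. The PP class and inOffsetOrder() below are hypothetical and are not taken from the patch. The repeating PPs are collected in a set, then sorted by query offset before any conflict handling, so the result no longer depends on the set's iteration order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

/** Standalone illustration only; not the code from LUCENE-3412.patch. */
class OffsetOrderSketch {

    /** Hypothetical stand-in for a phrase position: only the query offset matters here. */
    static final class PP {
        final int offset;

        PP(int offset) {
            this.offset = offset;
        }
    }

    /** Returns the PPs ordered from first in query to last in query. */
    static PP[] inOffsetOrder(Set<PP> repeats) {
        PP[] ordered = repeats.toArray(new PP[0]);
        Arrays.sort(ordered, Comparator.comparingInt((PP pp) -> pp.offset));
        return ordered;
    }

    public static void main(String[] args) {
        // PP does not override hashCode(), so the HashSet iteration order may vary per run.
        Set<PP> repeats = new HashSet<>();
        for (int offset : new int[] {3, 0, 2, 1}) {
            repeats.add(new PP(offset));
        }
        for (PP pp : inOffsetOrder(repeats)) {
            System.out.print(pp.offset + " "); // always 0 1 2 3
        }
        System.out.println();
    }
}
```

A set of objects with identity hash codes is one plausible source of the varying order that the reporter suspected. Sorting by offset removes that dependency regardless of where it comes from.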

 SloppyPhraseScorer returns non-deterministic results for queries with many 
 repeats
 --

 Key: LUCENE-3412
 URL: https://issues.apache.org/jira/browse/LUCENE-3412
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1, 3.2, 3.3, 4.0
Reporter: Michael Ryan
Assignee: Doron Cohen
 Attachments: LUCENE-3412.patch, LUCENE-3412.patch


 Proximity queries with many repeats (four or more, based on my testing) 
 return non-deterministic results. I run the same query multiple times with 
 the same data set and get different results.
 So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
 trunk.
 Steps to reproduce (using the Solr example):
 1) In solrconfig.xml, set queryResultCache size to 0.
 2) Add some documents with text dog dog dog and dog dog dog dog. 
 http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
 3) Do a dog dog dog dog~1 query. 
 http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
 4) Repeat step 3 many times.
 Expected results: The document with id 2 should be returned.
 Actual results: The document with id 2 is always returned. The document with 
 id 1 is sometimes returned.
 Different proximity values show the same bug - dog dog dog dog~5, dog dog 
 dog dog~100, etc show the same behavior.
 So far I've traced it down to the repeats array in 
 SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
 elements in this array, the document may or may not match. I think the 
 HashSet may be to blame, but I'm not sure - that at least seems to be where 
 the non-determinism is coming from.




[jira] [Assigned] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3412:
---

Assignee: Doron Cohen

 SloppyPhraseScorer returns non-deterministic results for queries with many 
 repeats
 --

 Key: LUCENE-3412
 URL: https://issues.apache.org/jira/browse/LUCENE-3412
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1, 3.2, 3.3, 4.0
Reporter: Michael Ryan
Assignee: Doron Cohen

 Proximity queries with many repeats (four or more, based on my testing) 
 return non-deterministic results. I run the same query multiple times with 
 the same data set and get different results.
 So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
 trunk.
 Steps to reproduce (using the Solr example):
 1) In solrconfig.xml, set queryResultCache size to 0.
 2) Add some documents with text dog dog dog and dog dog dog dog. 
 http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
 3) Do a dog dog dog dog~1 query. 
 http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
 4) Repeat step 3 many times.
 Expected results: The document with id 2 should be returned.
 Actual results: The document with id 2 is always returned. The document with 
 id 1 is sometimes returned.
 Different proximity values show the same bug - dog dog dog dog~5, dog dog 
 dog dog~100, etc show the same behavior.
 So far I've traced it down to the repeats array in 
 SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
 elements in this array, the document may or may not match. I think the 
 HashSet may be to blame, but I'm not sure - that at least seems to be where 
 the non-determinism is coming from.




[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3412:


Attachment: LUCENE-3412.patch

I am able to see this inconsistent behavior!

The attached patch contains a test that fails on this. The test currently prints 
the trial number; the first loop always passes in all 30 trials (as expected), 
while the second loop always fails (for me) but is inconsistent about when it 
fails. Sometimes it fails on the first iteration; other times it fails on 
the 3rd, 9th, etc.

Quite peculiar... investigating...

 SloppyPhraseScorer returns non-deterministic results for queries with many 
 repeats
 --

 Key: LUCENE-3412
 URL: https://issues.apache.org/jira/browse/LUCENE-3412
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.1, 3.2, 3.3, 4.0
Reporter: Michael Ryan
Assignee: Doron Cohen
 Attachments: LUCENE-3412.patch


 Proximity queries with many repeats (four or more, based on my testing) 
 return non-deterministic results. I run the same query multiple times with 
 the same data set and get different results.
 So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
 trunk.
 Steps to reproduce (using the Solr example):
 1) In solrconfig.xml, set queryResultCache size to 0.
 2) Add some documents with text dog dog dog and dog dog dog dog. 
 http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
 3) Do a dog dog dog dog~1 query. 
 http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
 4) Repeat step 3 many times.
 Expected results: The document with id 2 should be returned.
 Actual results: The document with id 2 is always returned. The document with 
 id 1 is sometimes returned.
 Different proximity values show the same bug - dog dog dog dog~5, dog dog 
 dog dog~100, etc show the same behavior.
 So far I've traced it down to the repeats array in 
 SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
 elements in this array, the document may or may not match. I think the 
 HashSet may be to blame, but I'm not sure - that at least seems to be where 
 the non-determinism is coming from.




[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-02 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3390.
-

   Resolution: Fixed
Fix Version/s: 3.4
Lucene Fields: [Patch Available]  (was: [New])

Fixed in 3.x r1164794.
Thanks Gilad!

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Fix For: 3.4

 Attachments: LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095409#comment-13095409
 ] 

Doron Cohen commented on LUCENE-3390:
-

I think it may be useful to solve this in 3x as well - without the 
cached-array-creators of the trunk, but with a similar concept: an 
additional cache type would cache the docs that are missing a value for a 
given field, and would allow using the default value that apps assign by 
calling setMissingValue(), as in trunk. Gilad and I looked at this and will 
post a patch shortly for review...
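
The effect of a configurable missing value can be sketched without Lucene. The class MissingValueSortSketch below is hypothetical and only mimics the idea, it is not the 3x patch or the trunk implementation. It uses the example scenario from the issue description (values 3.5 and -10, plus one doc with no value) and sorts doc ids in descending order, substituting an application-chosen default for the missing value.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Standalone illustration only; does not use Lucene's FieldCache or SortField. */
class MissingValueSortSketch {

    /** Sorts doc ids by value, descending; a null value is replaced by missingValue. */
    static Integer[] sortDescending(Double[] values, double missingValue) {
        Integer[] docIds = new Integer[values.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Comparator<Integer> byValue = Comparator.comparingDouble(
            (Integer doc) -> values[doc] == null ? missingValue : values[doc]);
        Arrays.sort(docIds, byValue.reversed());
        return docIds;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null}; // doc 2 has no value for the sort field

        // Missing treated as 0: the doc without a value lands in the middle.
        System.out.println(Arrays.toString(sortDescending(values, 0.0))); // [0, 2, 1]

        // Missing treated as the smallest possible value: it sorts last when descending.
        System.out.println(Arrays.toString(
            sortDescending(values, Double.NEGATIVE_INFINITY)));           // [0, 1, 2]
    }
}
```

The first call reproduces the 0, 2, 1 order reported in the issue. The second shows how an application-supplied default moves the doc without a value to the end.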

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Attachments: SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3390:


Attachment: LUCENE-3390.patch

Attached patch fixing this bug. 
TestSort was enhanced to test the new setMissingValue() method - actually 
merging the test from trunk r1002460 (LUCENE-2671).

All search tests passed (running the rest now...)

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Attachments: LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3390:
---

Assignee: Doron Cohen

 Incorrect sort by Numeric values for documents missing the sorting field
 

 Key: LUCENE-3390
 URL: https://issues.apache.org/jira/browse/LUCENE-3390
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.3
Reporter: Gilad Barkai
Assignee: Doron Cohen
Priority: Minor
  Labels: double, float, int, long, numeric, sort
 Attachments: LUCENE-3390.patch, SortByDouble.java


 While sorting results over a numeric field, documents which do not contain a 
 value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
 against Double, Float, Int & Long numeric fields ascending and descending 
 order).
 This behavior is unexpected, as zero is comparable to the rest of the 
 values. A better solution would either be allowing the user to define such a 
 non-value default, or always bring those document results as the last ones.
 Example scenario:
 Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
 value.
 Searching with MatchAllDocsQuery, with sort over that field in descending 
 order yields the docid results of 0, 2, 1.
 Asking for the top 2 documents brings the document without any value as the 
 2nd result - which seems as a bug?




[jira] [Resolved] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-06-30 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3142.
-

Resolution: Fixed

r1141465: trunk
r1141468: 3x

 benchmark/stats package is obsolete and unused - remove it
 --

 Key: LUCENE-3142
 URL: https://issues.apache.org/jira/browse/LUCENE-3142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 This seems like a leftover from the original benchmark implementation and can 
 thus be removed.




[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already

2011-05-31 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041461#comment-13041461
 ] 

Doron Cohen commented on LUCENE-3153:
-

I was not clear enough.

I meant that if the consistency of the requested NORMS state is decided by 
relying only on committed data, then add/update requests are handled in a 
best-effort manner, while the handling at commit is complete.

So, for this example:

* Index does not contain field F
* doc1 is added with F set to NO NORMS
* doc2 is added with F set to WITH NORMS

I was not sure about the ability to tell that F in doc2 is inconsistent, 
because of relying on committed data, and, perhaps, especially with DWPT.

At commit, it is definitely possible to check this.

Similarly, this scenario has the same problem:

* Index contains (committed) field F WITH NORMS
* doc1 is added with F set to NO NORMS
* doc2 is added with F set to WITH NORMS

Again, F in doc2, while consistent with F as committed in the index, is 
inconsistent with previously added F in doc1.

In this situation, the exception for an inconsistency might have to be thrown 
late in some scenarios (at commit), which is unacceptable IMO. At the least, 
such a behavior should be specifically requested by the application, e.g. by 
setting a STRICT_NORMS mode or something like that in iwcfg. 

I am not convinced going that far is justified.
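
For illustration only, an early check of the kind discussed above could look like the following sketch. It assumes a single in-memory map shared by all add requests, which is exactly the part that is uncertain with DWPT. StrictNormsSketch and addField() are hypothetical names, and STRICT_NORMS is not an existing IndexWriterConfig option.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone illustration only; not IndexWriter code. */
class StrictNormsSketch {

    // Field name -> norms setting seen on the first add (committed or not).
    private final Map<String, Boolean> normsByField = new HashMap<>();

    /** Records the norms setting of a field and rejects a contradicting later add. */
    void addField(String field, boolean withNorms) {
        Boolean previous = normsByField.putIfAbsent(field, withNorms);
        if (previous != null && previous != withNorms) {
            throw new IllegalStateException("field " + field + " was already added "
                + (previous ? "with" : "without") + " norms");
        }
    }

    public static void main(String[] args) {
        StrictNormsSketch writer = new StrictNormsSketch();
        writer.addField("F", false);    // doc1: F with NO NORMS
        try {
            writer.addField("F", true); // doc2: F WITH NORMS, inconsistent with doc1
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without such shared state at add time, the same inconsistency can only be detected when the per-thread data comes together, i.e. at flush or commit.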

 Adding field w/ norms should fail if same field was added w/o norms already
 ---

 Key: LUCENE-3153
 URL: https://issues.apache.org/jira/browse/LUCENE-3153
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
 Fix For: 4.0


 A spinoff from LUCENE-3146. Consider the following two scenarios, according 
 to how 4.0 currently works:
 * Field a is added w/ norms. Sometime later field a is added to a 
 document w/o norms -- norms are disabled for field a, for all docs.
 * Field a is added w/o norms - norms are disabled for field a. Sometime 
 later field a is added to a document w/ norms -- app thinks norms were 
 added, while in fact they are dropped.
 This is a bug and case #2 should fail on add/updateDocument - app should know 
 norms were not added. While case #1 isn't great either, it's the only way an 
 app can choose to disable norms for field a, after instances of it already 
 contain norms, so we should support that scenario.
 In order to detect that early, we should track norms info in .fnx, as Mike 
 describes at LUCENE-3146. Since this changes the index format, we should also 
 update the file format page after we do it.
 Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially 
 they won't have .fnx file, so no central norms information exist to detect 
 the cases I've described above. Over time, as segments are merged, .fnx will 
 include information from more and more segments, but there's always a chance 
 few segments will still contain the norms for field a. I'm not very 
 familiar w/ that part of the code, but I think that:
 * If .fnx says no norms for field a, then we ignore any norms information 
 that may or may not exist in segments.
 * If .fnx says norms for field a, then we need to make up some norms values 
 for (old) segments w/ no norms? We need to make up values during segment 
 merge and search?




[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already

2011-05-30 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041403#comment-13041403
 ] 

Doron Cohen commented on LUCENE-3153:
-

Can this be checked before any commit (/flush)?

Assume 10 docs were added without norms to a fresh index. Now, without a commit 
or even a flush, a document is added with norms. Is the info required for 
checking the configuration for that field available at that time?

If it is not, this is still just a best effort check.

 Adding field w/ norms should fail if same field was added w/o norms already
 ---

 Key: LUCENE-3153
 URL: https://issues.apache.org/jira/browse/LUCENE-3153
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
 Fix For: 4.0


 A spinoff from LUCENE-3146. Consider the following two scenarios, according 
 to how 4.0 currently works:
 * Field a is added w/ norms. Sometime later field a is added to a 
 document w/o norms -- norms are disabled for field a, for all docs.
 * Field a is added w/o norms - norms are disabled for field a. Sometime 
 later field a is added to a document w/ norms -- app thinks norms were 
 added, while in fact they are dropped.
 This is a bug and case #2 should fail on add/updateDocument - app should know 
 norms were not added. While case #1 isn't great either, it's the only way an 
 app can choose to disable norms for field a, after instances of it already 
 contain norms, so we should support that scenario.
 In order to detect that early, we should track norms info in .fnx, as Mike 
 describes at LUCENE-3146. Since this changes the index format, we should also 
 update the file format page after we do it.
 Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially 
 they won't have .fnx file, so no central norms information exist to detect 
 the cases I've described above. Over time, as segments are merged, .fnx will 
 include information from more and more segments, but there's always a chance 
 few segments will still contain the norms for field a. I'm not very 
 familiar w/ that part of the code, but I think that:
 * If .fnx says no norms for field a, then we ignore any norms information 
 that may or may not exist in segments.
 * If .fnx says norms for field a, then we need to make up some norms values 
 for (old) segments w/ no norms? We need to make up values during segment 
 merge and search?




[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041409#comment-13041409
 ] 

Doron Cohen commented on LUCENE-3164:
-

Specifically, current files are:

lucene:
- CHANGES.txt
- contrib/benchmark/CHANGES.txt
- contrib/CHANGES.txt
- contrib/grouping/CHANGES.txt

solr:
- CHANGES.txt
- client/ruby/flare/vendor/plugins/engines/CHANGELOG (?)
- client/ruby/solr-ruby/CHANGES.yml (?)
- contrib/analysis-extras/CHANGES.txt
- contrib/clustering/CHANGES.txt
- contrib/dataimporthandler/CHANGES.txt
- solr/contrib/extraction/CHANGES.txt
- solr/contrib/uima/CHANGES.txt

In favor of this, all changes would become more easily readable for users in 
the HTML format.

There is a risk that changes in contribs/modules would clutter the core 
changes. For example, today even small changes in contrib/benchmark are listed 
in its changes file. Once this becomes part of the global changes file, I am 
not sure that all benchmark changes would be worth listing there.

 consolidate various CHANGES.txt into two files: lucene and solr
 ---

 Key: LUCENE-3164
 URL: https://issues.apache.org/jira/browse/LUCENE-3164
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir

 There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
 benchmark package has its own CHANGES.txt, in trunk all the modules have 
 their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
 I propose we merge these files into a CHANGES.txt for each product we make. 
 so that means lucene/CHANGES.txt and solr/CHANGES.txt




[jira] [Commented] (LUCENE-3161) consider warnings from the source compilation

2011-05-30 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041418#comment-13041418
 ] 

Doron Cohen commented on LUCENE-3161:
-

bq. And, I don't think we should in general hide any warnings, even to users 
for the reasons i mentioned above.

+1 for not hiding!

 consider warnings from the source compilation
 -

 Key: LUCENE-3161
 URL: https://issues.apache.org/jira/browse/LUCENE-3161
 Project: Lucene - Java
  Issue Type: Task
  Components: general/build
Reporter: Robert Muir
  Labels: maybe32blocker
 Fix For: 3.3, 4.0


 As Doron mentioned in his review: When compiling, various warnings are 
 printed; I think it would be more reassuring for downloaders if the build ran 
 without warnings. These warnings are not a stopper.
 we could conditionalize these warnings so that they don't display when 
 compiling from actual releases, but I have to wonder if we should hide 
 these... being open source I think we should display all our warts, maybe 
 some contributor sees these warnings and decides they want to submit a patch 
 to fix some of them.




[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041425#comment-13041425
 ] 

Doron Cohen commented on LUCENE-3164:
-

I agree that with frequent releases this is less of an issue.

What are your thoughts about trunk in this regard - would you like 3 changes 
files there, i.e. keep a separate one for modules?

 consolidate various CHANGES.txt into two files: lucene and solr
 ---

 Key: LUCENE-3164
 URL: https://issues.apache.org/jira/browse/LUCENE-3164
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir

 There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
 benchmark package has its own CHANGES.txt, in trunk all the modules have 
 their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
 I propose we merge these files into a CHANGES.txt for each product we make. 
 so that means lucene/CHANGES.txt and solr/CHANGES.txt




[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041439#comment-13041439
 ] 

Doron Cohen commented on LUCENE-3164:
-

Agreed, 3 for now, and then we'll see...

 consolidate various CHANGES.txt into two files: lucene and solr
 ---

 Key: LUCENE-3164
 URL: https://issues.apache.org/jira/browse/LUCENE-3164
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir

 There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
 benchmark package has its own CHANGES.txt, in trunk all the modules have 
 their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
 I propose we merge these files into a CHANGES.txt for each product we make. 
 so that means lucene/CHANGES.txt and solr/CHANGES.txt




[jira] [Resolved] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-25 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-929.


Resolution: Fixed

bq. Doron, that's fine to open a new issue and close this one, but it was this 
issue's fix that introduced the bug.

Thanks for clarifying!
Okay, so I will fix this in LUCENE-3137 (it makes sense to me at this time 
since this one was resolved 4 months ago and fixed something else) and resolve 
this one.

 contrib/benchmark build doesn't handle checking if content is properly 
 extracted
 

 Key: LUCENE-929
 URL: https://issues.apache.org/jira/browse/LUCENE-929
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.0, 3.1


 The contrib/benchmark build does not properly handle checking to see if the 
 content (such as Reuters coll.) is properly extracted.  It only checks to see 
 if the directory exists.  Thus, it is possible that the directory gets 
 created and the extraction fails.  Then, the next time it is run, it skips 
 the extraction part and tries to continue on running the benchmark.
 The workaround is to manually delete the extraction directory.
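
The failure mode described above (a directory that exists but holds an incomplete extraction) can be avoided by making the final directory appear only after extraction succeeds. The following is a minimal sketch of that idea, not the patch that was committed for this issue: it assumes the temporary directory lives next to the target on the same filesystem, and the names SafeExtract, Extractor and extractIfNeeded are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the target directory only ever exists in a complete state.
final class SafeExtract {

    // Hypothetical callback standing in for the real extraction step.
    interface Extractor {
        void extractInto(Path dir) throws IOException;
    }

    // Returns true if extraction ran, false if a completed extraction was already present.
    static boolean extractIfNeeded(Path target, Extractor extractor) throws IOException {
        if (Files.isDirectory(target)) {
            return false; // target is created only by the rename below, so it is complete
        }
        Path tmp = target.resolveSibling(target.getFileName() + "-tmp");
        deleteRecursively(tmp);      // drop leftovers from a previously failed run
        Files.createDirectories(tmp);
        extractor.extractInto(tmp);  // if this throws, target is never created
        Files.move(tmp, target);     // a rename when both paths are on one filesystem
        return true;
    }

    private static void deleteRecursively(Path path) throws IOException {
        if (!Files.exists(path)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse order puts children before their parent directories.
            List<Path> entries = walk.sorted(Comparator.reverseOrder())
                                     .collect(Collectors.toList());
            for (Path entry : entries) {
                Files.delete(entry);
            }
        }
    }
}
```

With this shape, a failed run leaves only the "-tmp" directory behind, so the next run extracts again instead of skipping the step.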




[jira] [Resolved] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-25 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3137.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Trunk: r1127436
3x: r1127466

 Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
 param ends by slash
 ---

 Key: LUCENE-3137
 URL: https://issues.apache.org/jira/browse/LUCENE-3137
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Affects Versions: 3.2, 4.0
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3137.patch


 See LUCENE-929 for context.
 As a result, it might fail to create the temp dir at all.




[jira] [Created] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)
benchmark/stats package is obsolete and unused - remove it
--

 Key: LUCENE-3142
 URL: https://issues.apache.org/jira/browse/LUCENE-3142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


This seems like a leftover from the original benchmark implementation and can 
thus be removed.





[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039066#comment-13039066
 ] 

Doron Cohen commented on LUCENE-3142:
-

Does anyone see why this should remain? (I will wait ~2 days before actually 
removing it)

 benchmark/stats package is obsolete and unused - remove it
 --

 Key: LUCENE-3142
 URL: https://issues.apache.org/jira/browse/LUCENE-3142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 This seems like a leftover from the original benchmark implementation and can 
 thus be removed.




[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039073#comment-13039073
 ] 

Doron Cohen commented on LUCENE-3142:
-

Just to make sure this is clear, the package in question is: 
o.a.l.benchmark.stats

 benchmark/stats package is obsolete and unused - remove it
 --

 Key: LUCENE-3142
 URL: https://issues.apache.org/jira/browse/LUCENE-3142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

 This seems like a leftover from the original benchmark implementation and can 
 thus be removed.




[jira] [Created] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-24 Thread Doron Cohen (JIRA)
Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
param ends by slash
---

 Key: LUCENE-3137
 URL: https://issues.apache.org/jira/browse/LUCENE-3137
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Affects Versions: 3.2, 4.0
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


See LUCENE-929 for context.
As a result, it might fail to create the temp dir at all.
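
One plausible way a trailing slash can break a temp dir name is plain string concatenation on the raw parameter. The sketch below only illustrates that pitfall and one way around it; it is an assumption about the general class of bug, not a copy of ExtractReuters or of the attached patch, and TempDirName with its two methods is hypothetical.

```java
import java.io.File;

// Hypothetical illustration of deriving a sibling temp dir from an out-dir parameter.
final class TempDirName {

    // Naive variant: "work/out/" becomes "work/out/-tmp", a child of the out dir.
    static String naive(String outDir) {
        return outDir + "-tmp";
    }

    // java.io.File normalizes the pathname, dropping a trailing separator,
    // so "work/out" and "work/out/" yield the same sibling directory.
    static File normalized(String outDir) {
        File out = new File(outDir);
        return new File(out.getParentFile(), out.getName() + "-tmp");
    }
}
```

Going through File (or java.nio.file.Path) before appending the suffix keeps the result independent of how the caller typed the directory.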




[jira] [Updated] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-24 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3137:


Attachment: LUCENE-3137.patch

Simple patch solving this slash problem.

 Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
 param ends by slash
 ---

 Key: LUCENE-3137
 URL: https://issues.apache.org/jira/browse/LUCENE-3137
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Affects Versions: 3.2, 4.0
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3137.patch


 See LUCENE-929 for context.
 As a result, it might fail to create the temp dir at all.




[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-24 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038498#comment-13038498
 ] 

Doron Cohen commented on LUCENE-929:


bq. Note, this fix doesn't work if the output dir has a trailing slash

I think this is a separate issue - I mean not handling a trailing slash. 
Created LUCENE-3137 for handling this.

 contrib/benchmark build doesn't handle checking if content is properly 
 extracted
 

 Key: LUCENE-929
 URL: https://issues.apache.org/jira/browse/LUCENE-929
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1, 4.0


 The contrib/benchmark build does not properly handle checking to see if the 
 content (such as Reuters coll.) is properly extracted.  It only checks to see 
 if the directory exists.  Thus, it is possible that the directory gets 
 created and the extraction fails.  Then, the next time it is run, it skips 
 the extraction part and tries to continue on running the benchmark.
 The workaround is to manually delete the extraction directory.




[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-24 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038502#comment-13038502
 ] 

Doron Cohen commented on LUCENE-929:


There's now a simple patch for this in LUCENE-3137. 
I think this one can be closed?

 contrib/benchmark build doesn't handle checking if content is properly 
 extracted
 

 Key: LUCENE-929
 URL: https://issues.apache.org/jira/browse/LUCENE-929
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1, 4.0


 The contrib/benchmark build does not properly handle checking to see if the 
 content (such as Reuters coll.) is properly extracted.  It only checks to see 
 if the directory exists.  Thus, it is possible that the directory gets 
 created and the extraction fails.  Then, the next time it is run, it skips 
 the extraction part and tries to continue on running the benchmark.
 The workaround is to manually delete the extraction directory.




[jira] [Assigned] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0

2011-05-22 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned SOLR-2500:
-

Assignee: Doron Cohen

 TestSolrProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Doron Cohen
 Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
 solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Resolved] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0

2011-05-22 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved SOLR-2500.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

fixed in trunk: r1125932.
merged to 3x: r1125942.

 TestSolrProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Doron Cohen
 Fix For: 3.2, 4.0

 Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
 solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Created] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)
span query matches too many docs when two query terms are the same unless 
inOrder=true
--

 Key: LUCENE-3120
 URL: https://issues.apache.org/jira/browse/LUCENE-3120
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0


spinoff of user list discussion - [SpanNearQuery - inOrder 
parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].

With 3 documents:
*  a b x c d
*  a b b d
*  a b x b y d

Here are a few queries (the number in parentheses indicates expected #hits):


These ones work *as expected*:
* (1)  in-order, slop=0, b, x, b
* (1)  in-order, slop=0, b, b
* (2)  in-order, slop=1, b, b

These ones match *too many* hits:
* (1)  any-order, slop=0, b, x, b
* (1)  any-order, slop=1, b, x, b
* (1)  any-order, slop=2, b, x, b
* (1)  any-order, slop=3, b, x, b

These ones match *too many* hits as well:
* (1)  any-order, slop=0, b, b
* (2)  any-order, slop=1, b, b

Each of the above passes when using a phrase query (applying the slop, no 
in-order indication in phrase query).

This seems related to a known overlapping spans issue - [non-overlapping Span 
queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
so we might decide to close this bug after all, but I would like to at least 
have the junit that exposes the behavior in JIRA.
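
The expected hit counts listed above can be reproduced with a tiny brute-force reference model. This is only a sketch of the intended semantics, not Lucene's span or phrase matching code: it assumes that every query term must be bound to its own distinct token position, and that slop is the number of extra positions inside the matched window. The class ExpectedSpanHits and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reference model for the expected hit counts in this issue.
final class ExpectedSpanHits {

    // The three documents from the issue description.
    static final List<String[]> DOCS = Arrays.asList(
            "a b x c d".split(" "),
            "a b b d".split(" "),
            "a b x b y d".split(" "));

    // Number of documents matching the given terms under the simplified model.
    static int hits(boolean inOrder, int slop, String... terms) {
        int count = 0;
        for (String[] doc : DOCS) {
            if (search(doc, terms, slop, inOrder, new int[terms.length], 0)) {
                count++;
            }
        }
        return count;
    }

    // Tries every assignment of query term k, k+1, ... to a token position.
    private static boolean search(String[] doc, String[] terms, int slop,
                                  boolean inOrder, int[] chosen, int k) {
        if (k == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int pos : chosen) {
                min = Math.min(min, pos);
                max = Math.max(max, pos);
            }
            // Extra positions in the window beyond the terms themselves.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int pos = 0; pos < doc.length; pos++) {
            if (!doc[pos].equals(terms[k])) continue;
            if (inOrder && k > 0 && pos <= chosen[k - 1]) continue;
            if (alreadyUsed(chosen, k, pos)) continue; // one position per query term
            chosen[k] = pos;
            if (search(doc, terms, slop, inOrder, chosen, k + 1)) return true;
        }
        return false;
    }

    private static boolean alreadyUsed(int[] chosen, int k, int pos) {
        for (int i = 0; i < k; i++) {
            if (chosen[i] == pos) return true;
        }
        return false;
    }
}
```

For example, ExpectedSpanHits.hits(false, 0, "b", "b") counts only the second document, because the single b in the first document cannot serve both query terms in this model. Dropping the distinct-position check would let one b satisfy both terms, which is one way to picture the over-matching reported here.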




[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3120:


Attachment: LUCENE-3120.patch

Attached test case demonstrating the bug.

 span query matches too many docs when two query terms are the same unless 
 inOrder=true
 --

 Key: LUCENE-3120
 URL: https://issues.apache.org/jira/browse/LUCENE-3120
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3120.patch


 spinoff of user list discussion - [SpanNearQuery - inOrder 
 parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
 With 3 documents:
 *  a b x c d
 *  a b b d
 *  a b x b y d
 Here are a few queries (the number in parentheses indicates expected #hits):
 These ones work *as expected*:
 * (1)  in-order, slop=0, b, x, b
 * (1)  in-order, slop=0, b, b
 * (2)  in-order, slop=1, b, b
 These ones match *too many* hits:
 * (1)  any-order, slop=0, b, x, b
 * (1)  any-order, slop=1, b, x, b
 * (1)  any-order, slop=2, b, x, b
 * (1)  any-order, slop=3, b, x, b
 These ones match *too many* hits as well:
 * (1)  any-order, slop=0, b, b
 * (2)  any-order, slop=1, b, b
 Each of the above passes when using a phrase query (applying the slop, no 
 in-order indication in phrase query).
 This seems related to a known overlapping spans issue - [non-overlapping Span 
 queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
 so we might decide to close this bug after all, but I would like to at least 
 have the junit that exposes the behavior in JIRA.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036107#comment-13036107
 ] 

Doron Cohen commented on LUCENE-3068:
-

Looking at http://people.apache.org/~mikemccand/lucenebench/SloppyPhrase.html 
(Mike, this is a great tool!) I see no particular slowdown in the latest runs.

A thought about these benchmarks: it would be helpful if the checked revision 
were shown, perhaps as part of the hover text when hovering the mouse over a 
graph point...

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036111#comment-13036111
 ] 

Doron Cohen commented on LUCENE-3068:
-

bq. Note that if you go back to the root page, and click on a given day, it 
tells you the svn rev and also hg ref (of luceneutil)

Great, thanks!

So, this commit to trunk in r1124293 falls between these two:

- Tue 17/05/2011 Lucene/Solr trunk rev 1104671
- Wed 18/05/2011 Lucene/Solr trunk rev 1124524

... No measurable degradation, good!

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Created] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)
TestIndexWriter.testBackgroundOptimize fails with too many open files
-

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen


Recreate with this line:

ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
-Dtests.seed=-3981504507637360146:51354004663342240

Might be related to LUCENE-2873?




[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036163#comment-13036163
 ] 

Doron Cohen commented on LUCENE-3123:
-

This is on Ubuntu btw.

Run log:
{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize -Dtests.seed=-3981504507637360146:51354004663342240
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize -Dtests.seed=-3981504507637360146:51354004663342240
The following exceptions were thrown by threads:
*** Thread: Lucene Merge Thread #0 ***
org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /tmp/test4907593285402510583tmp/_51_0.sd (Too many open files)
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
Caused by: java.io.FileNotFoundException: /tmp/test4907593285402510583tmp/_51_0.sd (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
    at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:56)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:337)
    at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:402)
    at org.apache.lucene.index.codecs.mockrandom.MockRandomCodec.fieldsProducer(MockRandomCodec.java:236)
    at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:113)
    at org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:210)
    at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:131)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:495)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:635)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3260)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2930)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
NOTE: test params are: codec=RandomCodecProvider: {field=MockRandom}, locale=nl_NL, timezone=Turkey
NOTE: all tests run in this JVM:
[TestIndexWriter]
NOTE: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 1.6.0_20 (32-bit)/cpus=1,threads=2,free=26480072,total=33468416
{noformat}

 TestIndexWriter.testBackgroundOptimize fails with too many open files
 -

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
 1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen

 Recreate with this line:
 ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
 -Dtests.seed=-3981504507637360146:51354004663342240
 Might be related to LUCENE-2873 ?




[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036242#comment-13036242
 ] 

Doron Cohen commented on SOLR-2500:
---

From Eclipse (XP), passed at 1st attempt, failed at the 2nd!

I am not familiar with this part of the code so it would be too much work to 
track it all the way myself, but I think I can now provide sufficient 
information for solving it.

In Eclipse, after cleaning the project the test passes, and then it starts 
failing in all successive runs. 
So I assume that when you run it in isolation you also do a clean, which covers 
Eclipse's clean (and more). 

I compared the contents of the relevant directory, (trunk/)bin/solr, right 
after cleaning and again after the test. Only one file differs between the 
runs: bin/solr/shared/solr.xml.

Not sure if this is a bug in the test not cleaning after itself or a bug in the 
code that reads the configuration...

I'll attach the two files here so that you can compare them.
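
The before/after comparison described above can be automated with a small directory snapshot. This is a minimal sketch, assuming that hashing file contents is enough to spot what a test run left behind; DirSnapshot and its methods are hypothetical helpers and not part of Solr or its test framework.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper: records a content hash per file so two runs can be compared.
final class DirSnapshot {

    // Maps each regular file (path relative to root) to a SHA-256 hash of its bytes.
    static Map<String, String> snapshot(Path root)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> hashes = new TreeMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path path : (Iterable<Path>) walk::iterator) {
                if (!Files.isRegularFile(path)) continue;
                MessageDigest sha = MessageDigest.getInstance("SHA-256");
                byte[] digest = sha.digest(Files.readAllBytes(path));
                hashes.put(root.relativize(path).toString(),
                           Base64.getEncoder().encodeToString(digest));
            }
        }
        return hashes;
    }

    // Files that were added, removed, or modified between the two snapshots.
    static Set<String> changed(Map<String, String> before, Map<String, String> after) {
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        Set<String> diff = new TreeSet<>();
        for (String name : names) {
            if (!Objects.equals(before.get(name), after.get(name))) {
                diff.add(name);
            }
        }
        return diff;
    }
}
```

Taking one snapshot of the directory after a clean and another after the first test run, then calling changed(before, after), would list the files a test modified without cleaning up.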


 TestSolrCoreProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir

 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

2011-05-19 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated SOLR-2500:
--

Attachment: solr-after-1st-run.xml
solr-clean.xml

solr.xml files from trunk/bin/solr/shared:
- clean - with which the test passes.
- after-1st-run - with which it fails.

 TestSolrCoreProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036288#comment-13036288
 ] 

Doron Cohen commented on SOLR-2500:
---

FWIW, even the first clean run fails if the test's tearDown() is modified like 
this:

{noformat}
-persistedFile.delete();
+assertTrue("could not delete " + persistedFile, persistedFile.delete());
{noformat}

For some reason it fails to remove that file - in both Linux and Windows.
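
The assertion above only reports that the delete failed, not why. As a minimal sketch (not part of the patch, and assuming a Java 7+ runtime, which the test at the time did not require), java.nio.file.Files.delete throws an exception naming the cause instead of returning false. The class and method names below are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteCheck {
  /** Deletes the file; returns null on success, or a short description of the failure. */
  static String deleteOrExplain(Path file) {
    try {
      Files.delete(file); // throws on failure instead of returning false
      return null;
    } catch (IOException e) {
      return e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }

  /** Deletes a fresh temp file twice: the first call succeeds, the second reports why it failed. */
  static String[] demo() {
    try {
      Path tmp = Files.createTempFile("persisted", ".xml");
      return new String[] { deleteOrExplain(tmp), deleteOrExplain(tmp) };
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (String result : demo()) {
      System.out.println(result);
    }
  }
}
```

In a test, the returned message could be passed to assertNull so that the failure reason shows up in the report.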

 TestSolrCoreProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036300#comment-13036300
 ] 

Doron Cohen commented on SOLR-2500:
---

Oops, just noticed that all this time I was testing TestSolrProperties and not 
TestSolrCoreProperties, and, because the error message was the same as in the 
issue description (*No such core: core0*), I was sure that this was the same 
test... Now this is confusing...

Hmmm.. the original exception reported above is 
[junit] at 
org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)

So perhaps I was working on the correct bug after all and just the JIRA issue 
title is inaccurate?
Or I need to call it a day... :)

Anyhow, TestSolrProperties consistently behaves as I described here, while 
TestSolrCoreProperties consistently passes (when run in standalone mode).

 TestSolrCoreProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036322#comment-13036322
 ] 

Doron Cohen commented on LUCENE-3123:
-

Yes, thanks, now it passes (trunk) - with this seed, as well as quite a few times 
without specifying a seed. 
I'll now verify on 3x.

 TestIndexWriter.testBackgroundOptimize fails with too many open files
 -

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
 1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen

 Recreate with this line:
 ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
 -Dtests.seed=-3981504507637360146:51354004663342240
 Might be related to LUCENE-2873 ?




[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036331#comment-13036331
 ] 

Doron Cohen commented on LUCENE-3123:
-

In fact, in 3x this is not reproducible with the same seed (expected, as Robert once 
explained) and I was not able to reproduce it with no seed; tried with 
-Dtest.iter=100 as well (though I am not sure, would a new seed be created in 
each iteration? Need to verify this...)
Anyhow, in 3x the test also passes after svn up with this fix.
So I think this can be resolved...

 TestIndexWriter.testBackgroundOptimize fails with too many open files
 -

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
 1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen

 Recreate with this line:
 ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
 -Dtests.seed=-3981504507637360146:51354004663342240
 Might be related to LUCENE-2873 ?




[jira] [Resolved] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3123.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Fixed by Mike, thanks Mike!

 TestIndexWriter.testBackgroundOptimize fails with too many open files
 -

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
 1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen
 Fix For: 3.2, 4.0


 Recreate with this line:
 ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
 -Dtests.seed=-3981504507637360146:51354004663342240
 Might be related to LUCENE-2873 ?




[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with no such core: core0

2011-05-19 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated SOLR-2500:
--

Attachment: SOLR-2500.patch

Attached patch, test passes now in both IDE and cmd line:

* at setup() copies solr.xml to a private file. 

* uses that private file as its solr.solr.home.

* erases that file at tearDown(), though not erasing it
  should not affect further (re)tests.

* fixes the deletion at tearDown() to look at 
  solr.solr.home rather than solr.home.
  (I think this was a bug on top of a bug in this test - it used the
  original file at s.s.h but for cleanup 
  attempted to remove files from just s.h.)

This debugging took place in pure darkness, so better review...
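
The isolation idea in the list above can be sketched in plain Java. This is not the attached patch: the class name, the temp-directory layout and the demo() helper are all hypothetical, and no Solr classes are used. It only illustrates copying a shared config into a private location at setup and cleaning up from that same location at teardown:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class PrivateHomeFixture {
  private Path privateHome; // hypothetical stand-in for the directory used as solr.solr.home
  private Path privateCopy;

  /** Setup step: copy the shared config into a fresh directory owned by this test. */
  Path setUp(Path sharedConfig) {
    try {
      privateHome = Files.createTempDirectory("private-solr-home");
      privateCopy = privateHome.resolve("solr.xml");
      Files.copy(sharedConfig, privateCopy);
      return privateCopy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Teardown step: delete from the same directory that setUp() used, not a different one. */
  void tearDown() {
    try {
      Files.delete(privateCopy);
      Files.delete(privateHome);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Simulates one test run; true if the shared file is untouched and the private dir is gone. */
  static boolean demo() {
    try {
      Path shared = Files.createTempFile("shared-solr", ".xml");
      byte[] original = "<solr persistent=\"true\"/>".getBytes(StandardCharsets.UTF_8);
      Files.write(shared, original);

      PrivateHomeFixture fixture = new PrivateHomeFixture();
      Path copy = fixture.setUp(shared);
      // Simulate the code under test persisting a changed configuration.
      Files.write(copy, "<solr><!-- modified --></solr>".getBytes(StandardCharsets.UTF_8));
      fixture.tearDown();

      boolean sharedUntouched = Arrays.equals(original, Files.readAllBytes(shared));
      boolean cleaned = !Files.exists(fixture.privateHome);
      Files.delete(shared);
      return sharedUntouched && cleaned;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because every run starts from an unmodified copy, a leftover file from an earlier run cannot change the outcome of the next one.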

 TestSolrCoreProperties sometimes fails with no such core: core0
 -

 Key: SOLR-2500
 URL: https://issues.apache.org/jira/browse/SOLR-2500
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
 solr-after-1st-run.xml, solr-clean.xml


 [junit] Testsuite: 
 org.apache.solr.client.solrj.embedded.TestSolrProperties
 [junit] Testcase: 
 testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
 Caused an ERROR
 [junit] No such core: core0
 [junit] org.apache.solr.common.SolrException: No such core: core0
 [junit] at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
 [junit] at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
 [junit] at 
 org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
 [junit] at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)




[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3120:


Attachment: LUCENE-3120.patch

Updated patch with fixed test to not depend on analysis module.

 span query matches too many docs when two query terms are the same unless 
 inOrder=true
 --

 Key: LUCENE-3120
 URL: https://issues.apache.org/jira/browse/LUCENE-3120
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3120.patch, LUCENE-3120.patch


 spinoff of user list discussion - [SpanNearQuery - inOrder 
 parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
 With 3 documents:
 *  a b x c d
 *  a b b d
 *  a b x b y d
 Here are a few queries (the number in parenthesis indicates expected #hits):
 These ones work *as expected*:
 * (1)  in-order, slop=0, b, x, b
 * (1)  in-order, slop=0, b, b
 * (2)  in-order, slop=1, b, b
 These ones match *too many* hits:
 * (1)  any-order, slop=0, b, x, b
 * (1)  any-order, slop=1, b, x, b
 * (1)  any-order, slop=2, b, x, b
 * (1)  any-order, slop=3, b, x, b
 These ones match *too many* hits as well:
 * (1)  any-order, slop=0, b, b
 * (2)  any-order, slop=1, b, b
 Each of the above passes when using a phrase query (applying the slop, no 
 in-order indication in phrase query).
 This seems related to a known overlapping spans issue - [non-overlapping Span 
 queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
 so we might decide to close this bug after all, but I would like to at least 
 have the junit that exposes the behavior in JIRA.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035422#comment-13035422
 ] 

Doron Cohen commented on LUCENE-3068:
-

fixed in trunk in r1124293.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Resolved] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3068.
-

Resolution: Fixed

fix merged to 3x in r1124302.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035643#comment-13035643
 ] 

Doron Cohen commented on LUCENE-3068:
-

I wonder if this should also be fixed in the 3.1 branch?
Probably only if we make a 3.1.1, but not needed if it's gonna be a 3.2. 
What's the best practice then? Reopen until decision?
Or rely on rescanning all 3.2 changes in case it's gonna be 3.1.1?

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Commented] (LUCENE-2736) Wrong implementation of DocIdSetIterator.advance

2011-05-17 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034618#comment-13034618
 ] 

Doron Cohen commented on LUCENE-2736:
-

Shai, with the modified text the NOTE on implementations' freedom to not 
advance beyond in some situations becomes strange... I think that the original 
text stresses the fact that the real intended behavior is to advance beyond 
current, just that for performance reasons the decision whether to advance 
beyond in some situations is left to the implementation, and so, if the 
caller provides a target which is not greater than current, it should be aware 
of this possibility. 

So I think it is perhaps better to either not modify this at all, or at most, 
to add "(see NOTE below)" just after "beyond":

{noformat}
-   * Advances to the first beyond the current whose document number is greater
+   * Advances to the first beyond (see NOTE below) the current whose document number is greater
{noformat}

This would prevent the confusion I think?
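
To make the two readings concrete, here is a minimal sketch in plain Java. It does not use Lucene's DocIdSetIterator; the class name and the strict flag are hypothetical, and only the doc ids 0, 5, 6, 10 are taken from the issue description. With strict=true, advance always moves beyond the current doc; with strict=false, it may stay on the current doc when the target is not greater than current:

```java
class AdvanceDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Iterator over sorted doc ids; 'strict' selects which reading of advance() is used. */
  static final class SortedIntsIterator {
    private final int[] docs;
    private final boolean strict;
    private int idx = -1; // -1 means not positioned yet

    SortedIntsIterator(int[] docs, boolean strict) {
      this.docs = docs;
      this.strict = strict;
    }

    int nextDoc() {
      idx++;
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int advance(int target) {
      // Strict reading: always step past the current doc before scanning.
      // Lenient reading: scan starting at the current doc (or at the first doc if unpositioned).
      if (strict || idx < 0) {
        idx++;
      }
      while (idx < docs.length && docs[idx] < target) {
        idx++;
      }
      return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }
  }

  /** Calls nextDoc() twice (reaching doc 5), then advance(5), and returns the result. */
  static int advanceToCurrent(boolean strict) {
    SortedIntsIterator it = new SortedIntsIterator(new int[] {0, 5, 6, 10}, strict);
    it.nextDoc();
    it.nextDoc();
    return it.advance(5);
  }

  public static void main(String[] args) {
    System.out.println("strict:  " + advanceToCurrent(true));  // 6
    System.out.println("lenient: " + advanceToCurrent(false)); // 5
  }
}
```

A caller that never passes a target at or below the current doc sees identical results from both variants, which is why the NOTE matters only for that edge case.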

 Wrong implementation of DocIdSetIterator.advance 
 -

 Key: LUCENE-2736
 URL: https://issues.apache.org/jira/browse/LUCENE-2736
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.2, 4.0
Reporter: Hardy Ferentschik
Assignee: Shai Erera
 Attachments: LUCENE-2736.patch


 Implementations of {{DocIdSetIterator}} behave differently when advanced is 
 called. Taking the following test for {{OpenBitSet}}, {{DocIdBitSet}} and 
 {{SortedVIntList}} only {{SortedVIntList}} passes the test:
 {code:title=org.apache.lucene.search.TestDocIdSet.java|borderStyle=solid}
 ...
   public void testAdvanceWithOpenBitSet() throws IOException {
    DocIdSet idSet = new OpenBitSet( new long[] { 1121 }, 1 );  // bits 0, 5, 6, 10
   assertAdvance( idSet );
   }
   public void testAdvanceDocIdBitSet() throws IOException {
   BitSet bitSet = new BitSet();
   bitSet.set( 0 );
   bitSet.set( 5 );
   bitSet.set( 6 );
   bitSet.set( 10 );
   DocIdSet idSet = new DocIdBitSet(bitSet);
   assertAdvance( idSet );
   }
   public void testAdvanceWithSortedVIntList() throws IOException {
   DocIdSet idSet = new SortedVIntList( 0, 5, 6, 10 );
   assertAdvance( idSet );
   }   
   private void assertAdvance(DocIdSet idSet) throws IOException {
   DocIdSetIterator iter = idSet.iterator();
   int docId = iter.nextDoc();
    assertEquals( "First doc id should be 0", 0, docId );
    docId = iter.nextDoc();
    assertEquals( "Second doc id should be 5", 5, docId );
    docId = iter.advance( 5 );
    assertEquals( "Advancing iterator should return the next doc id", 6, docId );
   }
 {code}
 The javadoc for {{advance}} says:
 {quote}
 Advances to the first *beyond* the current whose document number is greater 
 than or equal to _target_.
 {quote}
 This seems to indicate that {{SortedVIntList}} behaves correctly, whereas the 
 other two don't. 
 Just looking at the {{DocIdBitSet}} implementation advance is implemented as:
 {code}
 bitSet.nextSetBit(target);
 {code}
 where the docs of {{nextSetBit}} say:
 {quote}
 Returns the index of the first bit that is set to true that occurs *on or 
 after* the specified starting index
 {quote}




[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.

2011-05-12 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032453#comment-13032453
 ] 

Doron Cohen commented on LUCENE-3034:
-

Hi Mark, could you add an example algorithm with this behavior?

Also, this is from the package javadocs:

{code}
# multi val params are iterated by NewRound's, added to reports, start with column name.
merge.factor=mrg:10:20
max.buffered=buf:100:1000
{code}

Is it possible to work around the problem by specifying a sufficiently long 
column name as the first value, that is, replacing e.g. 'mrg' or 'buf' in the 
above?
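
For illustration only, this is one way a report could size a column so that long per-round values still line up. It is a sketch, not the benchmark module's report code; the class and method names are hypothetical, and the only thing taken from the javadocs above is the name:value:value setting format:

```java
import java.util.Arrays;

class ReportColumns {
  /** Splits a "name:v1:v2" multi-value setting; element 0 is the column name. */
  static String[] parse(String setting) {
    return setting.split(":");
  }

  /** Width that fits the column name and every per-round value. */
  static int width(String name, String... values) {
    int w = name.length();
    for (String v : values) {
      w = Math.max(w, v.length());
    }
    return w;
  }

  /** Left-aligns a cell and pads it with spaces to the given width. */
  static String pad(String cell, int width) {
    return String.format("%-" + width + "s", cell);
  }

  public static void main(String[] args) {
    // Hypothetical setting whose values are fully qualified class names.
    String[] parts =
        parse("dir:org.apache.lucene.store.RAMDirectory:org.apache.lucene.store.NIOFSDirectory");
    int w = width(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    for (String cell : parts) {
      System.out.println("|" + pad(cell, w) + "|");
    }
  }
}
```

Padding the column name by hand, as asked about above, has the same effect as long as the name is at least as long as the longest value.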

 If you vary a setting per round and that setting is a long string, the report 
 padding/columns break down.
 -

 Key: LUCENE-3034
 URL: https://issues.apache.org/jira/browse/LUCENE-3034
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Trivial
 Fix For: 3.1.1, 4.0


 This is especially noticeable if you vary a setting where the value is a 
 fully specified class name - in this case, it would be nice if columns in 
 each row still lined up.




[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.

2011-05-12 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032499#comment-13032499
 ] 

Doron Cohen commented on LUCENE-3034:
-

bq. My original workaround was to simply pad the column name

Yeah that's what I meant, so ok, better formatting will help.

 If you vary a setting per round and that setting is a long string, the report 
 padding/columns break down.
 -

 Key: LUCENE-3034
 URL: https://issues.apache.org/jira/browse/LUCENE-3034
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Trivial
 Fix For: 3.1.1, 4.0


 This is especially noticeable if you vary a setting where the value is a 
 fully specified class name - in this case, it would be nice if columns in 
 each row still lined up.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150
 ] 

Doron Cohen commented on LUCENE-3068:
-

This is more complex than I originally thought.

# QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) 
query positions is a multi-term.
# MPQ has an implicit OR behavior - it is used, e.g., for wildcarding a phrase 
query.
# PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
# A PQ with several terms in the same position cannot be created by parsing with 
a QP, only manually.
  Manually created, it has AND semantics: only docs with ALL the 
terms in pos N should match.
  In other words, assume doc D terms and positions are: 
  a:0 b:1 c:1 d:2
  MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics)
  PQ for (a,b):0 d:1 should not match D, because it does not contain 'a' and 
'b' in the same position (AND semantics).


Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the 
AND logic assumed when creating the PQ this way with the OR logic assumed in 
MPQ. 

{code:title=TestPositionIncrement.testSetPosition has a test for this case exactly}
// phrase query should fail for non existing searched term 
// even if there exist another searched terms in the same searched position. 
q = new PhraseQuery();
q.add(new Term("field", "3"),0);
q.add(new Term("field", "9"),0);
hits = searcher.search(q, null, 1000).scoreDocs;
assertEquals(0, hits.length);
{code}

Although QP by default will not create this PQ, I think we need to support it, 
for applications that need to be strict with the search results, with slop. 

So fixing this would need to take place inside SloppyPhraseScorer, digging further...
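
The AND versus OR distinction for terms sharing a query position can be shown with a small self-contained sketch. It does not use Lucene's PhraseQuery or MultiPhraseQuery and ignores slop; the class, the map-based representation and the requireAll flag are hypothetical. The document and query are the ones from the example above (doc D: a:0 b:1 c:1 d:2, query (a,b):0 d:1):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SamePositionSemantics {
  static Set<String> terms(String... t) {
    return new HashSet<>(Arrays.asList(t));
  }

  /** Doc D from the comment: a:0 b:1 c:1 d:2 (position -> terms at that position). */
  static Map<Integer, Set<String>> docD() {
    Map<Integer, Set<String>> doc = new HashMap<>();
    doc.put(0, terms("a"));
    doc.put(1, terms("b", "c"));
    doc.put(2, terms("d"));
    return doc;
  }

  /** Query (a,b):0 d:1 (query offset -> terms at that offset). */
  static Map<Integer, Set<String>> queryAbThenD() {
    Map<Integer, Set<String>> query = new HashMap<>();
    query.put(0, terms("a", "b"));
    query.put(1, terms("d"));
    return query;
  }

  /**
   * Exact (slop 0) phrase match. With requireAll=true every term listed at a query
   * offset must occur at the matching doc position (AND); otherwise one is enough (OR).
   */
  static boolean matches(
      Map<Integer, Set<String>> doc, Map<Integer, Set<String>> query, boolean requireAll) {
    for (int start : doc.keySet()) {
      boolean ok = true;
      for (Map.Entry<Integer, Set<String>> e : query.entrySet()) {
        Set<String> atPos =
            doc.getOrDefault(start + e.getKey(), Collections.<String>emptySet());
        boolean hit = requireAll
            ? atPos.containsAll(e.getValue())
            : !Collections.disjoint(atPos, e.getValue());
        if (!hit) {
          ok = false;
          break;
        }
      }
      if (ok) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("OR  (MPQ-like): " + matches(docD(), queryAbThenD(), false)); // true
    System.out.println("AND (PQ-like):  " + matches(docD(), queryAbThenD(), true));  // false
  }
}
```

With OR semantics the match starts at position 1 (b:1 d:2); with AND semantics no position holds both 'a' and 'b', so there is no match.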

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Attached patch fixes this bug by excluding from the repeats check those PPs 
that originated from the same offset in the query. 

This allows stricter phrase queries: strict on terms in the same position (AND 
logic) but still sloppy.

All tests pass, this is ready to go in (unless there are reservations).

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029274#comment-13029274
 ] 

Doron Cohen commented on LUCENE-3068:
-

Thanks for reviewing, Shai!
I updated the patch with random newDirectory and newICFG - not the focus 
here, but may improve coverage anyhow.
I added tests for the combined case - some AND, some OR - that is, using MPQ, 
some add() with a single term (AND), some with an array longer than 1 (OR). 
Also refactored the tests a bit so that now there's a small test method for 
each test case.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Patch with more test cases - AND/OR logic for MPQ is combined, and test code 
made simpler.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
 LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Assigned] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3068:
---

Assignee: Doron Cohen

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895
 ] 

Doron Cohen commented on LUCENE-3068:
-

bq. specifically when the doc itself has tokens at the same position.

I am not yet convinced that there is a bug here - I think the code does allow 
this? 

There is another assumption in the code, that any two different PPs are in 
different TPs - which underlies the assumption that originally each PP differs 
in position. This seems a valid assumption, because QP will create an MPQ if 
there are two terms in the (phrase) query with the same position. 

bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite 
to a MultiPhraseQuery and let it handle the same positions...? Is there any 
downside to that?

I think this is the correct behavior - in particular this will be the query 
that a QP will create. The only way to create a PQ (not MPQ) for PPs in the 
same positions is to create it manually. But why would anyone do that? And if 
they did, wouldn't such a rewrite be a surprise to them?

A patch will follow with a revised version of this test - one that uses the QP. 
In this patch the QP indeed creates an MPQ, and I am as yet unable to make it 
fail. Still trying.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Attached a modified version of the test - one that invokes the query parser to 
create an MPQ. The test passes.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.




[jira] [Updated] (LUCENE-3010) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains

2011-04-11 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3010:


Description: I would like to be able to use the Lucene Benchmark code in 
Lucene contrib with Solr to run some indexing tests.  It would be nice if 
Lucene Benchmark could read my Solr configuration rather than having to 
translate my filter chain and other parameters into Lucene java code.  This 
relates to LUCENE-2845,   (was: I would like to be able to use the Lucene 
Benchmark code in Lucene contrib with Solr to run some indexing tests.  It 
would be nice if Lucene Benchmark could read my Solr configuration rather than 
having to translate my filter chain and other parameters into Lucene java code. 
 This relates to Lucene 2845, )

 Add the ability for the  Lucene Benchmark code to read Solr configuration 
 information for testing Analyzer/Filter Chains
 

 Key: LUCENE-3010
 URL: https://issues.apache.org/jira/browse/LUCENE-3010
 Project: Lucene - Java
  Issue Type: Wish
  Components: contrib/benchmark
Reporter: Tom Burton-West
Priority: Trivial

 I would like to be able to use the Lucene Benchmark code in Lucene contrib 
 with Solr to run some indexing tests.  It would be nice if Lucene Benchmark 
 could read my Solr configuration rather than having to translate my filter 
 chain and other parameters into Lucene java code.  This relates to 
 LUCENE-2845, 




[jira] [Commented] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity

2011-03-24 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010570#comment-13010570
 ] 

Doron Cohen commented on LUCENE-2986:
-

+1 for this change (I did not remember discussing this, but apart from my 
memory I am consistent :))

Patch looks very clean.

Minor technical comments - concerning just some tests:

- some of the DSP implementations are still named xyzSimilarity - I think it 
would be clearer to name them xyzSimilarityProvider:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.solr.schema.MockConfigurableSimilarity
-- o.a.l.index.TestIndexWriterConfig.MySimilarity
-- o.a.l.index.TestIndexReaderCloneNorms.SimilarityOne
-- o.a.l.index.TestNorms.SimilarityOne
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity

- for a few of the above it is not only the name - they are still playing both 
roles: {code}extends DefaultSimilarity implements SimilarityProvider{code}:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity
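
The distinction between the two roles can be sketched as follows. This is an 
illustration only: ToySimilarity and ToySimilarityProvider are hypothetical, 
heavily simplified stand-ins and not the real Lucene types, and tf() is the 
only scoring method modeled.

```java
// Hypothetical, simplified stand-ins; not the real Lucene interfaces.
interface ToySimilarity {
    float tf(int freq);
}

interface ToySimilarityProvider {
    ToySimilarity get(String field);
}

// Mixed roles: one class is both the scoring logic and its own factory,
// which is the pattern the comment above suggests avoiding.
final class MixedSimilarity implements ToySimilarity, ToySimilarityProvider {
    @Override
    public float tf(int freq) {
        return freq;
    }

    @Override
    public ToySimilarity get(String field) {
        return this;
    }
}

// Split roles: the similarity only scores...
final class SimpleSimilarity implements ToySimilarity {
    @Override
    public float tf(int freq) {
        return freq;
    }
}

// ...and the provider only decides which similarity a field gets.
final class SimpleSimilarityProvider implements ToySimilarityProvider {
    private final ToySimilarity similarity = new SimpleSimilarity();

    @Override
    public ToySimilarity get(String field) {
        return similarity;
    }
}

final class SimilaritySplitDemo {
    public static void main(String[] args) {
        ToySimilarityProvider provider = new SimpleSimilarityProvider();
        System.out.println(provider.get("body").tf(3));
        System.out.println(provider instanceof ToySimilarity);
    }
}
```

With the split version, a test that only needs a custom tf() subclasses the 
similarity, and the name xyzSimilarityProvider is reserved for classes that 
actually choose a similarity per field.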

Other than that I think it is good to go in.

Also, tests from trunk/lucene and trunk/solr passed.
(I am seeing problems running all trunk tests, at least on Windows, but I'll 
send a separate mail to the list about that.)

 divorce defaultsimilarityprovider from defaultsimilarity
 

 Key: LUCENE-2986
 URL: https://issues.apache.org/jira/browse/LUCENE-2986
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2986.patch


 In LUCENE-2236 as a start, we made DefaultSimilarity which implements the 
 factory interface (SimilarityProvider), and also extends Similarity.
 Its factory interface just returns itself always by default.
 Doron mentioned it would be cleaner to split the two, and I thought it would 
 be good to revisit it later.
 Today as I was looking at SOLR-2338, it became pretty clear that we should do 
 this; it makes things a lot cleaner. I think it is currently confusing to 
 users to see the two APIs mixed if they are trying to subclass.




[jira] [Created] (LUCENE-2988) trunk 'ant test' hangs

2011-03-24 Thread Doron Cohen (JIRA)
trunk 'ant test' hangs
--

 Key: LUCENE-2988
 URL: https://issues.apache.org/jira/browse/LUCENE-2988
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
 Environment: inspected so far on XP within Cygwin using IBM JDK 6
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 4.0


Running 'ant test' from trunk on XP in a Cygwin shell hangs, taking 100% CPU.
There was no progress in the console for a long time, so I stopped the program.
Before stopping it, I created 5 consecutive thread dumps to see where the code 
is.
It is not clear what is going on - I think it does not look like Lucene code, 
but I am not sure.
Opening this issue to keep an eye on this - I will try with other JDKs to see 
if this is persistent.
Also, when I first saw this I had local changes for two issues: LUCENE-2986 and 
LUCENE-2977 - I think the changes in these issues are related but will repeat 
the tests without these changes.





[jira] [Updated] (LUCENE-2988) trunk 'ant test' hangs

2011-03-24 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2988:


Attachment: 5-java-dumps.zip

The attached zip contains 5 consecutive Java dumps taken while the tests were 
hanging, in the hope that comparing the thread states across the files will 
reveal the problem.

 trunk 'ant test' hangs
 --

 Key: LUCENE-2988
 URL: https://issues.apache.org/jira/browse/LUCENE-2988
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
 Environment: inspected so far on XP within Cygwin using IBM JDK 6
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 4.0

 Attachments: 5-java-dumps.zip


 Running 'ant test' from trunk on XP in a Cygwin shell hangs, taking 100% CPU.
 There was no progress in the console for a long time, so I stopped the 
 program.
 Before stopping it, I created 5 consecutive thread dumps to see where the code 
 is.
 It is not clear what is going on - I think it does not look like Lucene code, 
 but I am not sure.
 Opening this issue to keep an eye on this - I will try with other JDKs to see 
 if this is persistent.
 Also, when I first saw this I had local changes for two issues: LUCENE-2986 
 and LUCENE-2977 - I think the changes in these issues are related but will 
 repeat the tests without these changes.




[jira] [Updated] (LUCENE-2988) trunk 'ant test' hangs

2011-03-24 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-2988:


Description: 
Running 'ant test' from trunk on XP in a Cygwin shell hangs.
There was no progress in the console for a long time, so I stopped the program.
Before stopping it, I created 5 consecutive thread dumps to see where the code 
is.
It is not clear what is going on - I think it does not look like Lucene code, 
but I am not sure.
Opening this issue to keep an eye on this - I will try with other JDKs to see 
if this is persistent.
Also, when I first saw this I had local changes for two issues: LUCENE-2986 and 
LUCENE-2977 - I think the changes in these issues are related but will repeat 
the tests without these changes.


  was:
Running 'ant test' from trunk on XP in a Cygwin shell hangs, taking 100% CPU.
There was no progress in the console for a long time, so I stopped the program.
Before stopping it, I created 5 consecutive thread dumps to see where the code 
is.
It is not clear what is going on - I think it does not look like Lucene code, 
but I am not sure.
Opening this issue to keep an eye on this - I will try with other JDKs to see 
if this is persistent.
Also, when I first saw this I had local changes for two issues: LUCENE-2986 and 
LUCENE-2977 - I think the changes in these issues are related but will repeat 
the tests without these changes.



Updated the description - removed the part about 100% CPU - apparently the CPU 
consumption was unrelated (though the tests do hang).

 trunk 'ant test' hangs
 --

 Key: LUCENE-2988
 URL: https://issues.apache.org/jira/browse/LUCENE-2988
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
 Environment: inspected so far on XP within Cygwin using IBM JDK 6
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 4.0

 Attachments: 5-java-dumps.zip


 Running 'ant test' from trunk on XP in a Cygwin shell hangs.
 There was no progress in the console for a long time, so I stopped the 
 program.
 Before stopping it, I created 5 consecutive thread dumps to see where the code 
 is.
 It is not clear what is going on - I think it does not look like Lucene code, 
 but I am not sure.
 Opening this issue to keep an eye on this - I will try with other JDKs to see 
 if this is persistent.
 Also, when I first saw this I had local changes for two issues: LUCENE-2986 
 and LUCENE-2977 - I think the changes in these issues are related but will 
 repeat the tests without these changes.




[jira] [Resolved] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name

2011-03-24 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-2977.
-

   Resolution: Fixed
Lucene Fields:   (was: [New])

Committed:
- r084929 - trunk
- r1084941 - 3x

Thanks Shai!

 WriteLineDocTask should write gzip/bzip2/txt according to the extension of 
 specified output file name
 -

 Key: LUCENE-2977
 URL: https://issues.apache.org/jira/browse/LUCENE-2977
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-2977.patch, LUCENE-2977.patch


 Since the readers behave this way, it would be nice and handy if this line 
 writer did too.
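
 The idea of choosing the output format from the file name can be sketched as 
 below. This is a minimal illustration under stated assumptions, not the code 
 from the patch: LineFileOutput, formatOf() and open() are hypothetical names, 
 the recognized extensions (.gz, .bz2) are assumed, and because the JDK has no 
 bzip2 codec the sketch only detects that format and refuses to open it.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: picks an output stream from the file extension.
final class LineFileOutput {
    enum Format { GZIP, BZIP2, TEXT }

    // Maps a file name to a format; anything unrecognized is plain text.
    static Format formatOf(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Format.GZIP;
        }
        if (lower.endsWith(".bz2")) {
            return Format.BZIP2;
        }
        return Format.TEXT;
    }

    // Opens the file for writing, wrapping it in a compressor when needed.
    static OutputStream open(Path file) throws IOException {
        Format format = formatOf(file.getFileName().toString());
        if (format == Format.BZIP2) {
            // Assumption: a real implementation would plug in a bzip2 library here.
            throw new UnsupportedOperationException("bzip2 needs a third-party codec");
        }
        OutputStream raw = new BufferedOutputStream(Files.newOutputStream(file));
        return format == Format.GZIP ? new GZIPOutputStream(raw) : raw;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt.gz");
        try (OutputStream out = open(file)) {
            out.write("title\tdate\tbody\n".getBytes(StandardCharsets.UTF_8));
        }
        // Read the line back through gzip to confirm the format was applied.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
        Files.delete(file);
    }
}
```

 Keeping the extension check in one small method means a reader and a writer 
 can share it, so a file written under a given name is read back the same way.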



