[jira] [Commented] (SOLR-1804) Upgrade Carrot2 to 3.2.0
[ https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047047#comment-13047047 ] Dawid Weiss commented on SOLR-1804: --- I am not considering changes to my code to be a personal attack, so no worries. And, since I'm a former assembly guy, straightforward code flow is what I understand and relate to -- I use Guava from time to time to catch up with the young and hip buzzwords (closures, abstraction, you know the like :). Seriously, though, 99% of our Guava use is for instantiating lists and maps without repeating the generics (something the next Java release will be able to infer from the code anyway -- at least if I'm reading the javac commit logs right). The remaining 1% is for cascading list filters and sort orders (which, once you get used to them a little, work out and read pretty nicely). I'm by no means saying we should switch to Guava; I used it because I saw it was in the global lib/ directory (and this happened after the patch to Carrot2, I believe).
> Upgrade Carrot2 to 3.2.0 > > > Key: SOLR-1804 > URL: https://issues.apache.org/jira/browse/SOLR-1804 > Project: Solr > Issue Type: Improvement > Components: contrib - Clustering >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll > Fix For: 3.1, 4.0 > > Attachments: SOLR-1804-carrot2-3.4.0-dev-trunk.patch, > SOLR-1804-carrot2-3.4.0-dev.patch, SOLR-1804-carrot2-3.4.0-libs.zip, > SOLR-1804.patch, carrot2-core-3.4.0-jdk1.5.jar > > > http://project.carrot2.org/release-3.2.0-notes.html > Carrot2 is now LGPL free, which means we should be able to bundle the binary!
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
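The Guava idiom Dawid mentions (static factories that spare you from repeating type arguments) and the compiler inference that later replaced it can be sketched as follows. The `newHashMap` helper below is a hypothetical stand-in for Guava's factory, written against the plain JDK:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondDemo {
    // Hypothetical stand-in for Guava's Maps.newHashMap(): the caller's
    // declared type drives inference, so the type arguments are written once.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        // Pre-Java-7 style: type arguments repeated on both sides.
        Map<String, List<Integer>> verbose = new HashMap<String, List<Integer>>();
        // Guava-style factory: arguments inferred from the declaration.
        Map<String, List<Integer>> viaFactory = newHashMap();
        // Java 7 "diamond": the compiler infers them directly.
        Map<String, List<Integer>> diamond = new HashMap<>();
        diamond.put("ids", new ArrayList<Integer>());
        System.out.println(diamond.keySet());
    }
}
```

The diamond form is the "new Java release" inference referred to above; once available, it makes the factory-method workaround unnecessary.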
[jira] [Resolved] (SOLR-2529) DIH update trouble with sql field name "pk"
[ https://issues.apache.org/jira/browse/SOLR-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Gambier resolved SOLR-2529. -- Resolution: Fixed Fix Version/s: 4.0 I've made some tests on the unreleased 4.0 version and it works well with a deltaQuery like: "SELECT pk AS id FROM ...", where "id" is the name of my primary key in the DIH config. TY
> DIH update trouble with sql field name "pk" > --- > > Key: SOLR-2529 > URL: https://issues.apache.org/jira/browse/SOLR-2529 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 3.1, 3.2 > Environment: Debian Lenny, JRE 6 >Reporter: Thomas Gambier >Priority: Blocker > Fix For: 4.0 > >
> We are unable to use the DIH when the database primary-key column is named "pk".
> The reported Solr error is:
> "deltaQuery has no column to resolve to declared primary key pk='pk'"
> We have done some investigation and found that the DIH goes wrong when it looks for the primary key in the row's column list.
> private String findMatchingPkColumn(String pk, Map<String, Object> row) {
>   if (row.containsKey(pk))
>     throw new IllegalArgumentException(
>         String.format("deltaQuery returned a row with null for primary key %s", pk));
>   String resolvedPk = null;
>   for (String columnName : row.keySet()) {
>     if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
>       if (resolvedPk != null)
>         throw new IllegalArgumentException(
>             String.format(
>                 "deltaQuery has more than one column (%s and %s) that might resolve to declared primary key pk='%s'",
>                 resolvedPk, columnName, pk));
>       resolvedPk = columnName;
>     }
>   }
>   if (resolvedPk == null)
>     throw new IllegalArgumentException(
>         String.format("deltaQuery has no column to resolve to declared primary key pk='%s'", pk));
>   LOG.info(String.format("Resolving deltaQuery column '%s' to match entity's declared pk '%s'", resolvedPk, pk));
>   return resolvedPk;
> }
-- This message is automatically generated by JIRA.
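The suffix-matching fallback quoted above can be reproduced standalone to see the reported failure mode. This is a sketch of the same algorithm, not the Solr source, and the column names used in the examples are hypothetical:

```java
import java.util.Set;

public class PkResolver {
    // Sketch of DIH's fallback resolution (not the Solr source): it is only
    // reached when no column matches the declared pk exactly, and it accepts
    // a qualified name such as "ITEM.pk" on either side of the dot.
    static String resolve(String pk, Set<String> columnNames) {
        String resolvedPk = null;
        for (String columnName : columnNames) {
            if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
                if (resolvedPk != null) {
                    throw new IllegalArgumentException("more than one column ("
                        + resolvedPk + " and " + columnName
                        + ") might resolve to declared primary key pk='" + pk + "'");
                }
                resolvedPk = columnName;
            }
        }
        if (resolvedPk == null) {
            throw new IllegalArgumentException(
                "no column to resolve to declared primary key pk='" + pk + "'");
        }
        return resolvedPk;
    }
}
```

With a qualified column, `resolve("pk", Set.of("ITEM.pk", "title"))` returns `"ITEM.pk"`; with no matching column at all, the "no column to resolve" error from the report is raised. Aliasing the column to the declared pk name ("SELECT pk AS id ...", as in the resolution comment) makes the exact-match path succeed so this fallback is never reached.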
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8743 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8743/
2 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety
Error Message: Error occurred in thread Thread-122: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-122: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)

REGRESSION: org.apache.lucene.index.TestIndexWriterReader.testMergeWarmer
Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test3290832921tmp/_8e_3.doc (Too many open files in system)
Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test3290832921tmp/_8e_3.doc (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:416)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293)
at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375)
at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec$MockIntFactory.createOutput(MockFixedIntBlockCodec.java:101)
at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:121)
at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:107)
at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec.fieldsConsumer(MockFixedIntBlockCodec.java:125)
at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:67)
at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:55)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:75)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:457)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421)
at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:313)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:385)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1233)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1214)
at org.apache.lucene.index.TestIndexWriterReader.testMergeWarmer(TestIndexWriterReader.java:663)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)

Build Log (for compile errors): [...truncated 3393 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8748 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8748/ No tests ran. Build Log (for compile errors): [...truncated 4186 lines...]
[jira] [Created] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index - Key: LUCENE-3188 URL: https://issues.apache.org/jira/browse/LUCENE-3188 Project: Lucene - Java Issue Type: Bug Components: modules/other Affects Versions: 3.2, 3.0 Environment: Bug is present for all environments. I used in this case Windows Server 2003 and the Java HotSpot Virtual Machine. Reporter: Ivan Dimitrov Vasilev Priority: Minor Fix For: 3.2, 3.0 When using the method IndexSplitter.split(File destDir, String[] segs) from the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), it creates an index whose segments descriptor file contains wrong data. Namely, the number representing the name of the segment that would be created next in this index is wrong. If one of the segments of the index already has this name, this results either in the impossibility of creating a new segment or in the creation of a corrupted segment.
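The invariant the report describes (the "next segment name" counter in the segments descriptor must stay ahead of every existing segment's number, or a newly created segment collides with an existing one) can be shown with a small sketch. This illustrates the counter rule only; it is not the Lucene source:

```java
import java.util.List;

public class NextSegmentName {
    // Sketch of the invariant: the counter written to the segments file must
    // exceed every existing segment number, so the next segment gets a name
    // no existing segment already uses. A splitter that copies the counter
    // from only some of the source segments can violate this.
    static int nextCounter(List<Integer> existingSegmentNumbers) {
        int max = -1;
        for (int n : existingSegmentNumbers) {
            max = Math.max(max, n);
        }
        return max + 1;
    }
}
```

For segments numbered 0, 3, and 1 the counter must be at least 4; writing anything smaller reproduces the collision the report describes.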
[jira] [Updated] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Dimitrov Vasilev updated LUCENE-3188: -- Attachment: IndexSplitter.java TestIndexSplitter.java The attached file TestIndexSplitter.java contains a test that shows the bug (when run against the contrib IndexSplitter) and the fix (when run against the IndexSplitter attached here as a patch).
[jira] [Created] (LUCENE-3189) TestIndexWriter.testThreadInterruptDeadlock failed (can't reproduce)
TestIndexWriter.testThreadInterruptDeadlock failed (can't reproduce) Key: LUCENE-3189 URL: https://issues.apache.org/jira/browse/LUCENE-3189 Project: Lucene - Java Issue Type: Bug Reporter: selckin trunk: r1134163. Ran it a few times with tests.iter=200 and couldn't reproduce, but I believe you'd like an issue anyway.
{code}
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:1203)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
[junit]
[junit] Tests run: 40, Failures: 1, Errors: 0, Time elapsed: 23.79 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] ERROR: could not read any segments file in directory
[junit] java.io.FileNotFoundException: segments_2w
[junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:407)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.openInput(DefaultSegmentInfosReader.java:112)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.read(DefaultSegmentInfosReader.java:45)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:257)
[junit] at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:287)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:698)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:283)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:311)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1154)
[junit]
[junit] CheckIndex FAILED: unexpected exception
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1154)
[junit] IndexReader.open FAILED: unexpected exception
[junit] java.io.FileNotFoundException: segments_2w
[junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:407)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.openInput(DefaultSegmentInfosReader.java:112)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.read(DefaultSegmentInfosReader.java:45)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:257)
[junit] at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:88)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:698)
[junit] at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84)
[junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:500)
[junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:293)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1161)
[junit] - ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=6733070832417768606:3130345095020099096
[junit] NOTE: test params are: codec=RandomCodecProvider: {=MockRandom, f6=SimpleText, f7=MockRandom, f8=MockSep, f9=Standard, f1=SimpleText, f0=Standard, f3=Standard, f2=MockSep, f5=Pulsing(freqCutoff=12), f4=MockFixedIntBlock(blockSize=552), c=MockVariableIntBlock(baseBlockSize=43), d9=MockVariableIntBlock(baseBlockSize=43), d8=MockRandom, d5=MockSep, d4=Pulsing(freqCutoff=12), d7=MockFixedIntBlock(blockSize=552), d6=MockVariableIntBlock(baseBlockSize=43), d25=MockSep, d0=MockVariableIntBlock(baseBlockSize=43), c29=MockFixedIntBlock(blockSize=552), d24=Pulsing(freqCutoff=12), d1=MockFixedIntBlock(blockSize=552), c28=Standard, d23=SimpleText,
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-152: Attachment: LUCENE-152_optimization.patch very minor optimization to avoid a char[] allocation per stemmed word.
> [PATCH] KStem for Lucene > > > Key: LUCENE-152 > URL: https://issues.apache.org/jira/browse/LUCENE-152 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis > Environment: Operating System: other > Platform: Other >Reporter: Otis Gospodnetic >Assignee: Robert Muir >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: LUCENE-152.patch, LUCENE-152_optimization.patch, > kstemTestData.zip, lucid_kstem.tgz > > > September 10th 2003 contribution from "Sergio Guzman-Lara" > > Original email: > Hi all, > I have ported the kstem stemmer to Java and incorporated it into > Lucene. You can get the source code (Kstem.jar) from the following website: > http://ciir.cs.umass.edu/downloads/ > Just click on "KStem Java Implementation" (you will need to register > your e-mail, for free of course, with the CIIR -- Center for Intelligent > Information Retrieval, UMass -- and get an access code). > Content of Kstem.jar: > java/org/apache/lucene/analysis/KStemData1.java > java/org/apache/lucene/analysis/KStemData2.java > java/org/apache/lucene/analysis/KStemData3.java > java/org/apache/lucene/analysis/KStemData4.java > java/org/apache/lucene/analysis/KStemData5.java > java/org/apache/lucene/analysis/KStemData6.java > java/org/apache/lucene/analysis/KStemData7.java > java/org/apache/lucene/analysis/KStemData8.java > java/org/apache/lucene/analysis/KStemFilter.java > java/org/apache/lucene/analysis/KStemmer.java > KStemData1.java, ..., KStemData8.java contain several lists of words > used by Kstem. > KStemmer.java implements the Kstem algorithm. > KStemFilter.java extends TokenFilter, applying Kstem. > To compile, > unjar the file Kstem.jar into Lucene's "src" directory and compile it > there.
> What is Kstem? > A stemmer designed by Bob Krovetz (for more information see > http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). > Copyright issues > This is open source. The actual license agreement is included at the > top of every source file. > Any comments/questions/suggestions are welcome, > Sergio Guzman-Lara > Senior Research Fellow > CIIR UMass
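Yonik's "avoid a char[] allocation per stemmed word" optimization is the classic grow-only scratch-buffer pattern. The sketch below illustrates the idea in isolation; it is not the KStemmer code, and the class and method names are made up:

```java
public class ScratchBuffer {
    private char[] buf = new char[32];

    // Copy a word into a reused, grow-only buffer instead of allocating a
    // fresh char[] per word. Growing at least doubles the capacity, so after
    // warm-up repeated calls perform no allocation at all -- the shape of
    // the per-word-allocation optimization discussed above.
    public char[] fill(String word) {
        if (buf.length < word.length()) {
            buf = new char[Math.max(word.length(), buf.length * 2)];
        }
        word.getChars(0, word.length(), buf, 0);
        return buf;
    }
}
```

Callers must treat the returned array as valid only until the next `fill` call, which is the usual trade-off of buffer reuse; a token filter that writes straight into the token's term buffer follows the same principle.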
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #148: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/148/ No tests ran. Build Log (for compile errors): [...truncated 8340 lines...]
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-152: --- Attachment: LUCENE-152_alt.patch why create strings either?
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047160#comment-13047160 ] Simon Willnauer commented on SOLR-1431: --- I think this patch looks good, Mark; I think we should commit this soon. Simon
> CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Noble Paul > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue.
[jira] [Assigned] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-1431: - Assignee: Mark Miller (was: Noble Paul)
[jira] [Updated] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1431: -- Fix Version/s: 4.0
[jira] [Commented] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047164#comment-13047164 ] Steven Rowe commented on LUCENE-3188: - Hi Ivan, Your submissions should be in the form of a patch (for an explanation see e.g. http://en.wikipedia.org/wiki/Patch_%28computing%29). To generate a patch, after you make modifications in a locally checked-out Subversion working copy, use the shell command "svn diff" at the top level and redirect its output to a file named for the issue you want to attach it to, with the extension ".patch", e.g.: {{svn diff > ../LUCENE-3188.patch}}. Also, when you attached the two files to this issue, you did not click the radio button next to the text "Grant license to ASF for inclusion in ASF works (as per the Apache License §5)". You must do this for the Lucene project to be able to use code you contribute. When you attach your patch, please click the radio button indicating you grant license to the ASF. (I haven't looked at your code yet for this reason.) Steve
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1260 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-docvalues-branch/1260/
2 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety
Error Message: Error occurred in thread Thread-175: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-175: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)

REGRESSION: org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory
Error Message: IndexSearcher hit unexpected exceptions
Stack Trace:
junit.framework.AssertionFailedError: IndexSearcher hit unexpected exceptions
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
at org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:165)
at org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory(TestLockFactory.java:144)

Build Log (for compile errors): [...truncated 3361 lines...]
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-152: Attachment: LUCENE-152_optimization.patch bq. why create strings either? Good point. I assume you mean something like this patch?
[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047168#comment-13047168 ] Robert Muir commented on LUCENE-152: It looks good... I think it's the same as the patch I uploaded (_alt.patch)... only I used the .append syntactic sugar.
[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047171#comment-13047171 ] Yonik Seeley commented on LUCENE-152: - bq. i think its the same as the patch i uploaded D'oh! I hate that the "All" tab in JIRA isn't selected by default (and hence one doesn't see stuff like file uploads ;-) > [PATCH] KStem for Lucene > > > Key: LUCENE-152 > URL: https://issues.apache.org/jira/browse/LUCENE-152 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis > Environment: Operating System: other > Platform: Other >Reporter: Otis Gospodnetic >Assignee: Robert Muir >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: LUCENE-152.patch, LUCENE-152_alt.patch, > LUCENE-152_optimization.patch, LUCENE-152_optimization.patch, > kstemTestData.zip, lucid_kstem.tgz > > > September 10th 2003 contributionn from "Sergio Guzman-Lara" > > Original email: > Hi all, > I have ported the kstem stemmer to Java and incorporated it to > Lucene. You can get the source code (Kstem.jar) from the following website: > http://ciir.cs.umass.edu/downloads/ > Just click on "KStem Java Implementation" (you will need to register > your e-mail, for free of course, with the CIIR --Center for Intelligent > Information Retrieval, UMass -- and get an access code). 
[jira] [Resolved] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3108. - Resolution: Fixed Reintegrated, Tested, Committed to trunk in revision 1134311 thanks guys, its in eventually! > Land DocValues on trunk > --- > > Key: LUCENE-3108 > URL: https://issues.apache.org/jira/browse/LUCENE-3108 > Project: Lucene - Java > Issue Type: Task > Components: core/index, core/search, core/store >Affects Versions: CSF branch, 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.0 > > Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, > LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch > > > Its time to move another feature from branch to trunk. I want to start this > process now while still a couple of issues remain on the branch. Currently I > am down to a single nocommit (javadocs on DocValues.java) and a couple of > testing TODOs (explicit multithreaded tests and unoptimized with deletions) > but I think those are not worth separate issues so we can resolve them as we > go. > The already created issues (LUCENE-3075 and LUCENE-3074) should not block > this process here IMO, we can fix them once we are on trunk. > Here is a quick feature overview of what has been implemented: > * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, > Bytes (fixed / variable size each in sorted, straight and deref variations) > * Integration into Flex-API, Codec provides a > PerDocConsumer->DocValuesConsumer (write) / PerDocValues->DocValues (read) > * By-Default enabled in all codecs except of PreFlex > * Follows other flex-API patterns like non-segment reader throw UOE forcing > MultiPerDocValues if on DirReader etc. > * Integration into IndexWriter, FieldInfos etc. 
> * Random-testing enabled via RandomIW - injecting random DocValues into > documents > * Basic checks in CheckIndex (which runs after each test) > * FieldComparator for int and float variants (Sorting, currently directly > integrated into SortField, this might go into a separate DocValuesSortField > eventually) > * Extended TestSort for DocValues > * RAM-Resident random access API plus on-disk DocValuesEnum (currently only > sequential access) -> Source.java / DocValuesEnum.java > * Extensible Cache implementation for RAM-Resident DocValues (by-default > loaded into RAM only once and freed once IR is closed) -> SourceCache.java > > PS: Currently the RAM resident API is named Source (Source.java) which seems > too generic. I think we should rename it into RamDocValues or something like > that, suggestion welcome! > Any comments, questions (rants :)) are very much appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047177#comment-13047177 ] Mark Miller commented on SOLR-1431: --- I've got to look a little closer here - there was a conflict on trunk - naively just fixed it to compile and now I'm getting errors that are perhaps ip6 related? Need to investigate. {quote} java.lang.IllegalArgumentException: Invalid uri 'http://[::1]:2/solr|localhost:53574/solr/select': escaped absolute path not valid {quote} > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
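The IllegalArgumentException above comes from the pipe-separated shards value being handled as one URI: '|' is not a legal URI character, while a bracketed IPv6 literal like [::1] is. A minimal stdlib-only sketch of that distinction (helper names are hypothetical, and java.net.URI is used here instead of the HttpClient class that threw the original exception):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Shows why "http://[::1]:2/solr|localhost:53574/solr/select" fails as a
// single URI while its '|'-separated parts are individually parseable.
public class ShardUriDemo {

    // Split a Solr-style shards value on '|' and check each part parses as a URI.
    static boolean allPartsValid(String shards) {
        for (String part : shards.split("\\|")) {
            // shard entries may omit the scheme; assume http for validation
            String candidate = part.contains("://") ? part : "http://" + part;
            if (!isValidUri(candidate)) return false;
        }
        return true;
    }

    static boolean isValidUri(String s) {
        try {
            new URI(s);   // enforces RFC 3986; '|' in the path is rejected
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shards = "http://[::1]:2/solr|localhost:53574/solr/select";
        System.out.println(isValidUri(shards));     // false: '|' is illegal
        System.out.println(allPartsValid(shards));  // true: each part is fine
        System.out.println(isValidUri("http://[::1]:2/solr")); // true: IPv6 ok
    }
}
```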
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047179#comment-13047179 ] Digy commented on LUCENENET-423: You are right, I used a different date string. .Net seems to parse the date-strings better. I would leave it as is. DIGY > QueryParser differences between Java and .NET > - > > Key: LUCENENET-423 > URL: https://issues.apache.org/jira/browse/LUCENENET-423 > Project: Lucene.Net > Issue Type: Bug >Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g >Reporter: Christopher Currens > > When trying to do a RangeQuery that uses dates in a certain format, .NET > behaves differently from its Java counterpart. The code is the same between > them, but as far as I can tell, it appears that it is a difference in the way > Java parses dates vs how .NET parses dates. To reproduce: > {code:java} > var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, > "FullText", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)); > var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]"); > {code} > You'll notice that query looks like the old DateField format (eg > "0g1d64542"). If you do the same query in Java (or Luke), you'll notice the > query gets parsed as if it were a RangeQuery of string. AFAIK, Java cannot > parse a string formatted in that way. If you change the string to use / > instead of - in the java, you'll get one that uses DateResolutions and > DateTools.DateToString(). > It seems an appropriate fix for this, if we wanted to keep this behavior > similar to Java, would be to write our own DateTime parser that behaved the > same way to Java's parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047185#comment-13047185 ] Mark Miller commented on SOLR-1431: --- So my bad - looks like this patch is for 3.x - need to do it for 4 and port back. > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I think I have successfully threaded IOContext to the codecs and index package wherever required. There might be instances where I have used Context.Default wrongly. I'll begin adding documentation. In the NRTCachingDir.doCacheWrite method, where IOContext is used, might it lead to a bug if the context's OnceMergeInfo field is null? Should cases like those be added to the docs? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
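The IOContext idea quoted above (callers describe why an open/create happens, and the Directory picks buffer sizes or flags from that hint) can be sketched without any Lucene code. Everything below is illustrative only: the names and the concrete buffer sizes are assumptions, not the API that was actually committed:

```java
// Illustrative sketch of an IOContext-style hint object: callers state the
// purpose of an I/O operation; the directory chooses tuning parameters.
public class IOContextSketch {

    enum Purpose { DEFAULT, MERGE, FLUSH }

    static final class Context {
        final Purpose purpose;
        final boolean sequential;   // candidate for O_DIRECT / readahead hints
        Context(Purpose purpose, boolean sequential) {
            this.purpose = purpose;
            this.sequential = sequential;
        }
    }

    // A directory implementation could pick a read buffer size from the hint:
    static int readBufferSize(Context ctx) {
        switch (ctx.purpose) {
            case MERGE: return 4096;   // larger sequential reads while merging
            default:    return 1024;   // smaller buffers for random search access
        }
    }

    public static void main(String[] args) {
        System.out.println(readBufferSize(new Context(Purpose.MERGE, true)));
        System.out.println(readBufferSize(new Context(Purpose.DEFAULT, false)));
    }
}
```

The point of the pattern is that a reader opened for merging and one opened for searching stop sharing tuning decisions, even if they touch the same files.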
Heads Up - Index File Format Change on Trunk
Hey folks, I just committed LUCENE-3108 (Landing DocValues on Trunk), which adds a byte to FieldInfo. If you are running on trunk you must (or at least should) re-index any trunk indexes once you update to the latest trunk. It's likely that if you open up old trunk (4.0) indexes, you will get an exception related to reading past EOF. Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047208#comment-13047208 ] Simon Willnauer commented on LUCENE-2878: - It doesn't seem that this issue is worth staying so tightly coupled to the bulkpostings branch. I originally did this on bulk postings since it had support for positions in TermScorer (new bulk API), whereas on trunk we don't have positions there at all. Yet, I think bulk postings should be sorted out separately and we should rather move this over to trunk. On trunk we can get rid of all the low-level bulk API hacks in the patch. The only thing that is missing here is a TermScorer that can score based on positions / payloads. I think, since we have ScorerContext and given how this works here in the patch, we can simply implement a TermScorer that works on DocsEnumAndPositions and swap it in once positions are requested. I think I can move this over to trunk soon. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: Bulk Postings branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch > > > Currently we have two somewhat separate types of queries: the ones which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do, and at the end > of the day they duplicate a lot of code all over Lucene. Span*Queries are > also limited to other Span*Query instances, such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anything like that. 
> Besides the Span*Query limitation, other queries lack a quite interesting > feature: they cannot score based on term proximity, since scorers don't > expose any positional information. All those problems bugged me for a while > now, so I started working on that using the bulkpostings API. I would have done > that first cut on trunk, but there TermScorer works on a BlockReader that does not > expose positions, while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer; to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > ScorerContext#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and others simply > return null instead. > To show that the API really works, and that our BulkPostings work fine with > positions too, I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice side effect of this was that the Position > BulkReading implementation got some exercise, which now :) all works with > positions, while payloads for bulk reading are kind of experimental in the > patch and only work with the Standard codec. > So all spans now work on top of TermScorer (I truly hate spans since today), > including the ones that need payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go on with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk > first, but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #145: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/145/ No tests ran. Build Log (for compile errors): [...truncated 7512 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2582) Use uniqueKey for error log in UIMAUpdateRequestProcessor
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2582: - Priority: Minor (was: Major) Fix Version/s: 4.0 Assignee: Koji Sekiguchi Issue Type: Improvement (was: Bug) Summary: Use uniqueKey for error log in UIMAUpdateRequestProcessor (was: UIMAUpdateRequestProcessor error handling with small texts) Changed the issue type to improvement because the "bug part" of this issue is duplicate of SOLR-2579, which has been fixed. > Use uniqueKey for error log in UIMAUpdateRequestProcessor > -- > > Key: SOLR-2582 > URL: https://issues.apache.org/jira/browse/SOLR-2582 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.2 >Reporter: Tommaso Teofili >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.3, 4.0 > > > In UIMAUpdateRequestProcessor the catch block in processAdd() method can have > a StringIndexOutOfBoundsException while composing the error message if the > logging field is not set and the text being processed is shorter than 100 > chars (...append(text.substring(0, 100))...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
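The exception described above is the usual substring pitfall: text.substring(0, 100) throws StringIndexOutOfBoundsException whenever the text is shorter than 100 characters. A minimal sketch of the guard (a hypothetical helper, not the patch attached to this issue):

```java
// Safe truncation for log messages: never ask substring() for more
// characters than the string actually has.
public class SafeTruncate {

    static String truncate(String text, int max) {
        // clamp the end index to the string's own length
        return text.substring(0, Math.min(max, text.length()));
    }

    public static void main(String[] args) {
        System.out.println(truncate("short", 100));              // "short", no exception
        System.out.println(truncate("x".repeat(200), 100).length()); // 100
    }
}
```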
[jira] [Updated] (SOLR-2582) Use uniqueKey for error log in UIMAUpdateRequestProcessor
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2582: - Attachment: SOLR-2582.patch > Use uniqueKey for error log in UIMAUpdateRequestProcessor > -- > > Key: SOLR-2582 > URL: https://issues.apache.org/jira/browse/SOLR-2582 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.2 >Reporter: Tommaso Teofili >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: SOLR-2582.patch > > > In UIMAUpdateRequestProcessor the catch block in processAdd() method can have > a StringIndexOutOfBoundsException while composing the error message if the > logging field is not set and the text being processed is shorter than 100 > chars (...append(text.substring(0, 100))...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I had messed up the patch using eclipse. This should be ok. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047234#comment-13047234 ] Simon Willnauer commented on LUCENE-2793: - Varun, your patch doesn't apply cleanly to my latest trunk. I think you should update your local copy again! I have trouble understanding how you use MergeInfo etc. I figure there might be a misunderstanding, so here is roughly what I had in mind:
{code}
Index: lucene/src/java/org/apache/lucene/index/MergePolicy.java
===
--- lucene/src/java/org/apache/lucene/index/MergePolicy.java (revision 1134335)
+++ lucene/src/java/org/apache/lucene/index/MergePolicy.java (working copy)
@@ -64,7 +64,7 @@
    * subset of segments to be merged as well as whether the
    * new segment should use the compound file format. */
-  public static class OneMerge {
+  public static class OneMerge extends MergeInfo {
     SegmentInfo info;                // used by IndexWriter
     boolean optimize;                // used by IndexWriter
@@ -72,25 +72,26 @@
     long mergeGen;                   // used by IndexWriter
     boolean isExternal;              // used by IndexWriter
     int maxNumSegmentsOptimize;      // used by IndexWriter
-    public long estimatedMergeBytes; // used by IndexWriter
     List readers;                    // used by IndexWriter
     List readerClones;               // used by IndexWriter
-    public final List segments;
-    public final int totalDocCount;
+    public final List segments = new ArrayList();
     boolean aborted;
     Throwable error;
     boolean paused;

     public OneMerge(List segments) {
+      super(getSegments(segments));
+    }
+
+    private static int getSegments(List segments) {
       if (0 == segments.size())
         throw new RuntimeException("segments must include at least one segment");
       // clone the list, as the in list may be based off original SegmentInfos and may be modified
-      this.segments = new ArrayList(segments);
       int count = 0;
       for(SegmentInfo info : segments) {
         count += info.docCount;
       }
-      totalDocCount = count;
+      return count;
     }

     /** Record that an exception occurred while executing
{code} > Directory createOutput and
openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3188) The class from the contrib directory, org.apache.lucene.index.IndexSplitter, creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047245#comment-13047245 ] Steven Rowe commented on LUCENE-3188: - This is a better Wikipedia article on (source code) patching than the one I gave above: http://en.wikipedia.org/wiki/Patch_%28Unix%29 > The class from the contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: The bug is present in all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created > next in this index. > If some segment of the index already has this name, this results > either in the impossibility of creating a new segment or in the creation of a corrupted > segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
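The counter invariant at issue can be illustrated in isolation: Lucene segment names encode a number in base 36 after an underscore ("_0", "_a", ...), and the descriptor's next-segment counter must exceed every number still in use, or a new segment is created under a name that already exists. A stdlib-only sketch of that invariant (a hypothetical helper, not the actual IndexSplitter code):

```java
// Segment names encode a number in base 36 after an underscore.
// The descriptor's counter must exceed all of them so new names never collide.
public class SegmentCounter {

    static int nextCounter(String[] segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            // strip the leading '_' and parse the base-36 suffix
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // keeping only segments _3 and _7 of a split index: the new descriptor's
        // counter must be 8, not whatever the source index's counter happened to be
        System.out.println(nextCounter(new String[] {"_3", "_7"})); // 8
        System.out.println(nextCounter(new String[] {"_a"}));       // 11
    }
}
```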
Re: Distributed search capability
> but if instead you just run the query against your subs (rewrite()ing > etc locally), and you merge search results... then you shouldn't see > these issues Maybe we should bring back the 'merge results' part of multi searcher without the query rewrite. On Thu, Jun 9, 2011 at 6:43 PM, Robert Muir wrote: > See https://issues.apache.org/jira/browse/LUCENE-2756 for an example > test case (this sort of thing was reported by users several times) > > in this example, the problem is how the queries are executed, the > thing would rewrite against the individual subs, and then call > Query.combine() to form a "super query", and then run this against all > the subs. > > but if instead you just run the query against your subs (rewrite()ing > etc locally), and you merge search results... then you shouldn't see > these issues. > > On Thu, Jun 9, 2011 at 8:03 PM, Jason Rutherglen > wrote: >>> Yes, and rightfully so - it didn't handle properly some query types, so you >>> would actually get wrong results. >> >> That's bad! >> >>> "roll your own (and contribute it back!)" if you are more advanced ;) >> >> Wouldn't "roll your own" basically mean resurrecting the previous >> implementation of MultiSearcher? Ie, what would be different? >> >> On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki wrote: >>> On 6/10/11 12:10 AM, Jason Rutherglen wrote: Right, if that's not around, one needs to use multi searcher, that's gone too? >>> >>> Yes, and rightfully so - it didn't handle properly some query types, so you >>> would actually get wrong results. >>> >>> For now the answer is "use Solr" if you are less advanced, or "roll your own >>> (and contribute it back!)" if you are more advanced ;) >>> >>> -- >>> Best regards, >>> Andrzej Bialecki <>< >>> ___. 
___ ___ ___ _ _ __ >>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web >>> ___|||__|| \| || | Embedded Unix, System Integration >>> http://www.sigram.com Contact: info at sigram dot com >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
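The "run the query against your subs and merge search results" approach above is, at its core, a k-way merge of per-sub top-N hit lists by score. A self-contained sketch of just the merge step (hypothetical types; not the MultiSearcher code under discussion):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Merge per-sub-searcher top-N hit lists into one global top-N by score.
// Each sub runs (and rewrites) the query locally; only the results are merged.
public class MergeTopDocs {

    static final class Hit {
        final int shard, doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard; this.doc = doc; this.score = score;
        }
    }

    static List<Hit> merge(List<List<Hit>> perShard, int n) {
        // max-heap on score: poll() yields the highest-scoring remaining hit
        PriorityQueue<Hit> pq =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> -h.score));
        for (List<Hit> hits : perShard) pq.addAll(hits);
        List<Hit> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < n) out.add(pq.poll());
        return out;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit(0, 5, 2.0f), new Hit(0, 9, 1.0f)),
            List.of(new Hit(1, 2, 3.0f)));
        List<Hit> top = merge(shards, 2);
        System.out.println(top.get(0).score + " " + top.get(1).score); // 3.0 2.0
    }
}
```

Note this sidesteps the rewrite problem entirely: no "super query" is ever built, so per-sub term statistics and query types never have to be reconciled at the query level.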
Re: Distributed search capability
On Fri, Jun 10, 2011 at 12:04 PM, Jason Rutherglen wrote: > > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. > and how exactly? If this was easy, we wouldn't have to do https://issues.apache.org/jira/browse/LUCENE-2837 I looked at this stupid multisearcher problem for way too much time myself and I totally agreed the only proper bugfix was the "nuclear" option (ridding of multisearcher entirely). - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] Score(collector) called for each subReader - but not what I need
As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process the incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially what happens now if I have a search is that it collects the same documents N times, once for each segment. Is there any way to combine different Scorers/Collectors such that I can control when collection is enumerated by multiple sub-readers and when it is not? This all worked in a previous version of Lucene because enumerating sub-indexes (segments) was pushed to a lower level inside the Lucene API, and now it is elevated to a higher level. Thanks Bob On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > I found the problem. The problem is that I have a custom "query optimizer" > that replaces certain TermQuerys within a Boolean query with a custom > Query, and this query has its own weight/scorer that retrieves matching > documents from an in-memory cache (that is not Lucene backed). But it > looks like my custom hit collectors are now wrapped in a HitCollectorWrapper > which assumes Collect() needs to be called for multiple segments - so it is adding > a start offset to the doc ID that comes from my custom query implementation. > I looked at the new Collector class and it seems it works the same way > (assumes it needs to set the next index reader with some offset). How can I > make my custom query work with the new API (so that there is basically a > single "segment" in RAM that my query uses, but other query clauses in the > same Boolean query still use multiple Lucene segments)? I am sure that is not > clear and will try to provide more detail soon. 
> > Thanks > Bob > > > On Jun 9, 2011, at 1:48 PM, Digy wrote: > >> Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect the >> problem. >> DIGY >> >> -Original Message- >> From: Robert Stewart [mailto:robert_stew...@epam.com] >> Sent: Thursday, June 09, 2011 8:40 PM >> To: >> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >> >> I tried converting index using IndexWriter as follows: >> >> Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath+"_2.9", >> new Lucene.Net.Analysis.KeywordAnalyzer()); >> >> writer.SetMaxBufferedDocs(2); >> writer.SetMaxMergeDocs(100); >> writer.SetMergeFactor(2); >> >> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new >> Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); >> >> writer.Commit(); >> >> >> That seems to work (I get what looks like a valid index directory at least). >> >> But still when I run some tests using IndexSearcher I get the same problem >> (I get documents in Collect() which are larger than IndexReader.MaxDoc()). >> Any idea what the problem could be? >> >> BTW, this is a problem because I lookup some fields (date ranges, etc.) in >> some custom collectors which filter out documents, and it assumes I dont get >> any documents larger than maxDoc. >> >> Thanks, >> Bob >> >> >> On Jun 9, 2011, at 12:37 PM, Digy wrote: >> >>> One more point, some write operations using Lucene.Net 2.9.2 (add, delete, >>> optimize etc.) upgrades automatically your index to 2.9.2. >>> But if your index is somehow corrupted(eg, due to some bug in 1.9) this >> may >>> result in data loss. >>> >>> DIGY >>> >>> -Original Message- >>> From: Robert Stewart [mailto:robert_stew...@epam.com] >>> Sent: Thursday, June 09, 2011 7:06 PM >>> To: lucene-net-...@lucene.apache.org >>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >>> >>> I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment >>> index (non-optimized). 
When I run Lucene.Net 2.9.2 on top of that index, >> I >>> get IndexOutOfRange exceptions in my collectors. It is giving me document >>> IDs that are larger than maxDoc. >>> >>> My index contains 377831 documents, and IndexReader.MaxDoc() is returning >>> 377831, but I get documents from Collect() with large values (for instance >>> 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If >>> not, is there some way I can convert it (in production we have many >> indexes >>> containing about 200 million docs so I'd rather convert existing indexes >>> than rebuild them). >>> >>> Thanks >>> Bob >>> >> >
Re: Distributed search capability
I think we only need to resurrect the merge score/field-docs code, in its own class. E.g., each sub-node is expected to create its own score/field-docs; then the merge code is centralized. > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. This is how Solr works today. On Fri, Jun 10, 2011 at 9:09 AM, Robert Muir wrote: > On Fri, Jun 10, 2011 at 12:04 PM, Jason Rutherglen > wrote: >> >> Maybe we should bring back the 'merge results' part of multi searcher >> without the query rewrite. >> > > and how exactly? If this was easy, we wouldn't have to do > https://issues.apache.org/jira/browse/LUCENE-2837 > > I looked at this stupid multisearcher problem for way too much time > myself and I totally agreed the only proper bugfix was the "nuclear" > option (ridding of multisearcher entirely). > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen wrote: > I think we only need to resurrect the merge score/field-docs code, in > it's own class. Eg, each sub-node is expected to create it's own > score/field-docs, then the merge code is centralized. > >> Maybe we should bring back the 'merge results' part of multi searcher >> without the query rewrite. > > This is how Solr works today. > no its not, the problem with multisearcher was it was too low of a level. its fine to have some higher-level class to support this crap, but it shouldnt be some transparent searcher. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Heads Up - Index File Format Change on Trunk
Simon can you also email java-user@ and solr-user@? Seems good to over-communicate when trunk index format changes... Mike McCandless http://blog.mikemccandless.com On Fri, Jun 10, 2011 at 10:01 AM, Simon Willnauer wrote: > Hey folks, > > I just committed LUCENE-3108 (Landing DocValues on Trunk) which adds a > byte to FieldInfo. > If you are running on trunk you must / should re-index any trunk > indexes once you update to the latest trunk. > > its likely if you open up old trunk (4.0) indexes, you will get an > exception related to Read Past EOF. > > Simon > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3188) The class from contrib directory org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Dimitrov Vasilev updated LUCENE-3188: -- Attachment: LUCENE-3188.patch The file LUCENE-3188.patch contains the changes to IndexSplitter needed to fix this issue. > The class from contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created next in > this index is wrong. > If one of the index's existing segments already has this name, the result is > either the impossibility of creating a new segment or the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
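For context on the invariant the patch has to restore: Lucene segment names are a leading underscore followed by a base-36 counter, and the "next segment name" counter stored in the segments descriptor must point past every existing segment. A hedged sketch of that rule (SegmentCounterDemo is an illustrative name, not code from the patch):

```java
// Illustrative sketch, not the IndexSplitter patch itself: the counter
// written to the segments file must exceed every existing segment number,
// otherwise a future flush reuses an existing segment name.
import java.util.Arrays;
import java.util.List;

public class SegmentCounterDemo {
    // Lucene segment names look like "_0", "_a", "_b1" (base-36 after "_").
    static int nextCounter(List<String> segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            // decode the base-36 part after the leading underscore
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;  // first unused segment number
    }

    public static void main(String[] args) {
        // "_b1" decodes to 11*36 + 1 = 397, so the next counter must be 398.
        System.out.println(nextCounter(Arrays.asList("_0", "_a", "_b1")));  // 398
    }
}
```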
Re: Distributed search capability
> its fine to have some higher-level class to support this crap, but it > shouldnt be some transparent searcher. I'll create a patch and post to a Jira. On a side note, for multi threaded calls I noticed there's a lock on the PQ in IndexSearcher, is the performance of that OK? On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: > On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen > wrote: >> I think we only need to resurrect the merge score/field-docs code, in >> it's own class. Eg, each sub-node is expected to create it's own >> score/field-docs, then the merge code is centralized. >> >>> Maybe we should bring back the 'merge results' part of multi searcher >>> without the query rewrite. >> >> This is how Solr works today. >> > > no its not, the problem with multisearcher was it was too low of a level. > > its fine to have some higher-level class to support this crap, but it > shouldnt be some transparent searcher. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
I'm actually working on something like this, basically a utility method to merge N TopDocs into 1. I want to do this for grouping as well to make it easy to do grouping across shards. Mike McCandless http://blog.mikemccandless.com On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen wrote: >> its fine to have some higher-level class to support this crap, but it >> shouldnt be some transparent searcher. > > I'll create a patch and post to a Jira. > > On a side note, for multi threaded calls I noticed there's a lock on > the PQ in IndexSearcher, is the performance of that OK? > > On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: >> On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen >> wrote: >>> I think we only need to resurrect the merge score/field-docs code, in >>> it's own class. Eg, each sub-node is expected to create it's own >>> score/field-docs, then the merge code is centralized. >>> Maybe we should bring back the 'merge results' part of multi searcher without the query rewrite. >>> >>> This is how Solr works today. >>> >> >> no its not, the problem with multisearcher was it was too low of a level. >> >> its fine to have some higher-level class to support this crap, but it >> shouldnt be some transparent searcher. >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
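The centralized merge being discussed can be sketched with plain collections; the classes below are simplified stand-ins for ScoreDoc/TopDocs, not Mike's actual utility:

```java
// Self-contained sketch of merging per-shard top hits into one global
// top-N (hypothetical ShardHit type; not Lucene's TopDocs API).
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeTopDocs {
    record ShardHit(int shard, int doc, float score) {}

    // Each shard's list is assumed already sorted by descending score.
    static List<ShardHit> merge(int topN, List<List<ShardHit>> shards) {
        // Queue entries are {shardIndex, positionInShard}; highest score first,
        // ties broken by shard index so the merge is deterministic.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
            float sa = shards.get(a[0]).get(a[1]).score();
            float sb = shards.get(b[0]).get(b[1]).score();
            return sa != sb ? Float.compare(sb, sa) : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) pq.add(new int[] {s, 0});
        }
        List<ShardHit> out = new ArrayList<>();
        while (out.size() < topN && !pq.isEmpty()) {
            int[] head = pq.poll();
            out.add(shards.get(head[0]).get(head[1]));
            if (head[1] + 1 < shards.get(head[0]).size()) {
                pq.add(new int[] {head[0], head[1] + 1});  // advance that shard
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ShardHit> s0 = List.of(new ShardHit(0, 5, 3.0f), new ShardHit(0, 9, 1.0f));
        List<ShardHit> s1 = List.of(new ShardHit(1, 2, 2.5f), new ShardHit(1, 7, 2.0f));
        System.out.println(merge(3, List.of(s0, s1)));
    }
}
```

Each shard returns its own sorted top hits; the priority queue then interleaves them by score until the global top N is filled, which is the "merge code is centralized" part of the proposal.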
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047275#comment-13047275 ] Jason Rutherglen commented on SOLR-1431: I just downloaded http://svn.apache.org/repos/asf/lucene/dev/trunk and applied the patch, and test-core passed. However the patch command mentioned specific hunks, though there was no .rej file. > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [Lucene.Net] Score(collector) called for each subReader - but not what I need
Have you tried to use Lucene.Net as is, before working on optimizing your code? There are a lot of speed improvements in it since 1.9. There is also a Faceted Search project in contrib. (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search ) DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Friday, June 10, 2011 7:14 PM To: Subject: [Lucene.Net] Score(collector) called for each subReader - but not what I need As I previously tried to explain, I have custom query for some pre-cached terms, which I load into RAM in efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially what happens now if I have a search is it collects the same documents N times, 1 time for each segment. Is there anyway to combine different Scorers/Collectors such that I can control when it enumerates collection by multiple sub-readers, and when not to? This all worked in previous version of Lucene because enumerating sub-indexes (segments) was pushed to a lower level inside Lucene API and not it is elevated to a higher level. Thanks Bob On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > I found the problem. The problem is that I have a custom "query optimizer", and that replaces certain TermQuery's within a Boolean query with a custom Query and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (and that is not Lucene backed). But it looks like my custom hitcollectors are now wrapped in a HitCollectorWrapper which assumes Collect() needs called for multiple segments - so it is adding a start offset to the doc ID that comes from my custom query implementation. 
[jira] [Commented] (LUCENE-3188) The class from contrib directory org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047279#comment-13047279 ] Ivan Dimitrov Vasilev commented on LUCENE-3188: --- Hi Steve, I attached the patch to this issue as required by Apache (or at least I think so :) ). I do not have a lot of time right now to read the Apache contribution docs in depth, but I saw on the wiki that I should also provide test cases that show the bug, and that they should be in the form of unit tests. I provided a test case that is not in JUnit form but still works. As I saw when submitting the patch, the tests do not need a license grant, so you can use it. I do not have much time now because we are just before a release and this was one of the bugs that I had to fix (and I did). I guess you are the one who discovered this Splitter. Thank you very much for this; you saved me a lot of hard work, because in our previous releases we used a class that generated the segments descriptor file out of given segments, and getting the content of this file right was very difficult. > The class from contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created next in > this index is wrong. 
> If one of the index's existing segments already has this name, the result is > either the impossibility of creating a new segment or the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
Ok sounds good. Grouping is an interesting distrib case. On Fri, Jun 10, 2011 at 9:27 AM, Michael McCandless wrote: > I'm actually working on something like this, basically a utility > method to merge N TopDocs into 1. I want to do this for grouping as > well to make it easy to do grouping across shards. > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen > wrote: >>> its fine to have some higher-level class to support this crap, but it >>> shouldnt be some transparent searcher. >> >> I'll create a patch and post to a Jira. >> >> On a side note, for multi threaded calls I noticed there's a lock on >> the PQ in IndexSearcher, is the performance of that OK? >> >> On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: >>> On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen >>> wrote: I think we only need to resurrect the merge score/field-docs code, in it's own class. Eg, each sub-node is expected to create it's own score/field-docs, then the merge code is centralized. > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. This is how Solr works today. >>> >>> no its not, the problem with multisearcher was it was too low of a level. >>> >>> its fine to have some higher-level class to support this crap, but it >>> shouldnt be some transparent searcher. >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047291#comment-13047291 ] Uwe Schindler commented on LUCENE-1768: --- Hi Vinicius, if you want the code to be committed later, you should check the license box ("Grant license to ASF for inclusion in ASF works (as per the Apache License §5)"); otherwise we will not be able to commit it to the main repository. If you want us to commit the patch only at the end of GSoC, it's enough to check this box on your final submission, but it should be noted that we may commit minor parts of the work even before then (once you are at a state where it is 'usable' and passes existing tests). A second commit could, e.g., add sophisticated tests, and so on. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > future, to automatically create NumericRangeQuery for different numeric > types? It would then be possible to index a numeric value > (double,float,long,int) using NumericField and then the query parser knows, > which type of field this is and so it correctly creates a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract if a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. 
> The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) conversion of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is an MTQ). > Another thing is a change in Date semantics. There are some strange flags in > the current parser that tell it how to handle dates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
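The conversion step the issue describes (turning a user-entered range string into numeric bounds via Number.valueOf before handing them to a NumericRangeQuery) might look roughly like this; the class, record, and exact syntax handling are illustrative assumptions, not the actual query parser code:

```java
// Hedged sketch of parsing range strings like "[1.567..*]" or
// "(1.787..19.5]" into bounds; NumericRangeParse/Range are made-up names.
public class NumericRangeParse {
    record Range(Double min, Double max, boolean minInclusive, boolean maxInclusive) {}

    static Range parse(String s) {
        boolean minIncl = s.charAt(0) == '[';               // '[' inclusive, '(' exclusive
        boolean maxIncl = s.charAt(s.length() - 1) == ']';
        // split the inner "min..max" part on the literal ".." separator
        String[] parts = s.substring(1, s.length() - 1).split("\\.\\.");
        // "*" means an open bound; otherwise the Number.valueOf-style conversion
        Double min = parts[0].equals("*") ? null : Double.valueOf(parts[0]);
        Double max = parts[1].equals("*") ? null : Double.valueOf(parts[1]);
        return new Range(min, max, minIncl, maxIncl);
    }

    public static void main(String[] args) {
        System.out.println(parse("(1.787..19.5]"));
    }
}
```

Once the bounds and inclusiveness flags are known, instantiating the right query type for the field is the only remaining difference from the plain RangeQuery path.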
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047315#comment-13047315 ] Uwe Schindler commented on LUCENE-1768: --- One small thing I have seen after applying your patch: The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > future, to automatically create NumericRangeQuery for different numeric > types? It would then be possible to index a numeric value > (double,float,long,int) using NumericField and then the query parser knows, > which type of field this is and so it correctly creates a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract if a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. > The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) conversion of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is an MTQ). > Another thing is a change in Date semantics. There are some strange flags in > the current parser that tell it how to handle dates. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On 6/10/11 6:27 PM, Michael McCandless wrote: I'm actually working on something like this, basically a utility method to merge N TopDocs into 1. I want to do this for grouping as well to make it easy to do grouping across shards. Mike, The straightforward merge that is used in Solr suffers from incomparable scores (due to the lack of global IDF). See my slides from the Buzzwords. Since we can handle global IDF in local searchers more easily that in Solr then we can reuse that DfCache trick from MultiSearcher. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
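Andrzej's point about incomparable scores can be made concrete with Lucene's classic idf formula, idf(t) = 1 + ln(N / (df + 1)); the document counts below are invented purely for illustration:

```java
// Numeric illustration (made-up counts): the same term can get very
// different IDFs on two shards, so per-shard scores are not directly
// comparable; one global df and N restore a single shared IDF.
public class GlobalIdfDemo {
    // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        double idfA = idf(1_000_000, 10);        // shard A: term is rare
        double idfB = idf(1_000_000, 100_000);   // shard B: term is common
        // Global view: sum the dfs and doc counts, then compute one idf.
        double idfGlobal = idf(2_000_000, 100_010);
        System.out.printf("local A=%.2f  local B=%.2f  global=%.2f%n",
                          idfA, idfB, idfGlobal);
    }
}
```

With these numbers the term scores with idf of roughly 12.4 on the shard where it is rare and roughly 3.3 where it is common; computing one idf from the merged statistics (about 4.0 here) is exactly the DfCache-style fix, since every shard then scores with the same value.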
[jira] [Issue Comment Edited] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047315#comment-13047315 ] Uwe Schindler edited comment on LUCENE-1768 at 6/10/11 5:48 PM: One small thing I have seen after applying your patch: The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it. Also you are using Java 6 interface overrides, so the code does not compile with Java 5 (unfortunately this is a bug in Java 6's javac, as it does not complain when in "-source 1.5" mode). In Java 5 compatible code it is not allowed to add @Override to methods implemented for interfaces: {noformat} common.compile-core: [mkdir] Created dir: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] Compiling 175 source files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:182: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:187: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\config\NumericFieldConfigListener.java:21: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:17: method does 
not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:32: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:79: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:20: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:25: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:35: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:52: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:57: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\parser\JavaCharStream.java:367: 
warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] public int getEndColumn() { [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getColumn(); [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:41: warning: [dep-a
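The incompatibility Uwe hit boils down to a small pattern; this stand-alone example (not code from the patch) compiles on Java 6 and later, while a strict Java 5 compiler rejects the annotation:

```java
// @Override on a method that implements an interface method is legal
// from Java 6 on, but Java 5 compilers reject it; javac in "-source 1.5"
// mode fails to complain, which is the javac bug Uwe mentions.
public class OverrideDemo {
    interface Greeter {
        String greet(String name);
    }

    static class EnglishGreeter implements Greeter {
        @Override  // fine on Java 6+; a real Java 5 compiler errors here
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    public static void main(String[] args) {
        System.out.println(new EnglishGreeter().greet("Lucene"));  // Hello, Lucene
    }
}
```

The fix for Java 5 compatible code is simply to drop @Override from methods that implement interface (rather than superclass) methods.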
Re: Distributed search capability
Out of curiosity, how is DF handled with the new automaton [regex] queries? On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki wrote: > On 6/10/11 6:27 PM, Michael McCandless wrote: >> >> I'm actually working on something like this, basically a utility >> method to merge N TopDocs into 1. I want to do this for grouping as >> well to make it easy to do grouping across shards. > > Mike, > > The straightforward merge that is used in Solr suffers from incomparable > scores (due to the lack of global IDF). See my slides from the Buzzwords. > Since we can handle global IDF in local searchers more easily that in Solr > then we can reuse that DfCache trick from MultiSearcher. > > > -- > Best regards, > Andrzej Bialecki <>< > ___. ___ ___ ___ _ _ __ > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > ___|||__|| \| || | Embedded Unix, System Integration > http://www.sigram.com Contact: info at sigram dot com > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Heads Up - Index File Format Change on Trunk
On Fri, Jun 10, 2011 at 6:22 PM, Michael McCandless wrote: > Simon can you also email java-user@ and solr-user@? Seems good to > over-communicate when trunk index format changes... good point, done! > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jun 10, 2011 at 10:01 AM, Simon Willnauer > wrote: >> Hey folks, >> >> I just committed LUCENE-3108 (Landing DocValues on Trunk) which adds a >> byte to FieldInfo. >> If you are running on trunk you must / should re-index any trunk >> indexes once you update to the latest trunk. >> >> its likely if you open up old trunk (4.0) indexes, you will get an >> exception related to Read Past EOF. >> >> Simon >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On 6/10/11 7:51 PM, Jason Rutherglen wrote: Out of curiosity, how is DF handled with the new automaton [regex] queries? Automaton is eventually resolved into a list of terms, and the IDF for each term is obtained in the usual way. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
Hi Jason, Standard MTQ queries have no scoring at all (using ConstantScoreRewrite by default). Exception is FuzzyQuery which has two modes: One using standard BQ TermQuery scoring multiplied with factor calculated from levensthein distance and another one with all TermQueries made constant score and only boosted by levensthein distance. For all MTQ queries you can change the rewrite mode (so you can even rewrite a WildCard query using fuzzy scoring, but that makes no sense at all, because all boost are 1.0). You can also make FuzzyQ constant and respecting all terms that match somehow if you like, the standard is to use a PQ. This is the same in Lucene 3.x. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: Friday, June 10, 2011 7:52 PM > To: dev@lucene.apache.org > Subject: Re: Distributed search capability > > Out of curiosity, how is DF handled with the new automaton [regex] queries? > > On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki wrote: > > On 6/10/11 6:27 PM, Michael McCandless wrote: > >> > >> I'm actually working on something like this, basically a utility > >> method to merge N TopDocs into 1. I want to do this for grouping as > >> well to make it easy to do grouping across shards. > > > > Mike, > > > > The straightforward merge that is used in Solr suffers from > > incomparable scores (due to the lack of global IDF). See my slides from the > Buzzwords. > > Since we can handle global IDF in local searchers more easily that in > > Solr then we can reuse that DfCache trick from MultiSearcher. > > > > > > -- > > Best regards, > > Andrzej Bialecki <>< > > ___. 
___ ___ ___ _ _ __ > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| > > \| || | Embedded Unix, System Integration http://www.sigram.com > > Contact: info at sigram dot com > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For > > additional commands, e-mail: dev-h...@lucene.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
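The distance-based FuzzyQuery mode Uwe describes boosts each matched term by how close it is to the query term. A minimal Python sketch of that idea (the similarity formula mirrors the 3.x FuzzyTermEnum form, similarity = 1 - distance / min(length); function names are hypothetical):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def fuzzy_boost(query_term, matched_term):
    # Boost shrinks as edit distance grows. The constant-score rewrite
    # mode would instead weight every matching term equally.
    d = levenshtein(query_term, matched_term)
    return 1.0 - d / min(len(query_term), len(matched_term))
```

This also shows why rewriting a wildcard query with fuzzy scoring is pointless: every exact wildcard match has distance 0, so all boosts come out 1.0.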
[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2564: --- Priority: Blocker (was: Major) Marking this issue as a blocker for Solr 4.0 per McCandless comment in SOLR-2524... {quote} That said, the plan is definitely to get Solr 4.0 cutover to the grouping module; it's just a matter of time. I don't think we should ship 4.0 until we've done so. {quote} > Integrating grouping module into Solr 4.0 > - > > Key: SOLR-2564 > URL: https://issues.apache.org/jira/browse/SOLR-2564 > Project: Solr > Issue Type: Improvement >Reporter: Martijn van Groningen >Assignee: Martijn van Groningen >Priority: Blocker > Fix For: 4.0 > > Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, > SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch > > > Since work on grouping module is going well. I think it is time to wire this > up in Solr. > Besides the current grouping features Solr provides, Solr will then also > support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3188) The class from cotrub directory org.apache.lucene.index.IndexSplitter creates a non correct index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047354#comment-13047354 ] Steven Rowe commented on LUCENE-3188: - {quote} I attached the patch to this issue as required from Apache (or at least I think so ). I do not have lot of time now to read in depth Apache docs about the procedures when contributing but I saw on the wiki that I should provide also test cases that show the bug and they should be in form of unit tests. I provided a test case that is not in JUnit form but still works. As I saw when submitted patch the tests do not need granting license so you can use it. I do not have much time now because we are before release and this was one of the bugs that I should fix (and I did it). {quote} Thanks for reporting and providing a patch. I can take it from here. bq. I guess you are the one who discovered this Splitter. I think you have me confused with someone else :) - Jason Rutherglen wrote it: LUCENE-1959. > The class from cotrub directory org.apache.lucene.index.IndexSplitter creates > a non correct index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > I used in this case - Windows Server 2003, Java Hot Spot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) > it creates an index with segments descriptor file with wrong data. Namely > wrong is the number representing the name of segment that would be created > next in this index. 
> If some segment of the index already has this name, this results > either in the impossibility of creating a new segment or in the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
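The invariant behind this bug can be illustrated in a few lines: the segments descriptor's next-segment counter must exceed every segment name already present, or a newly flushed segment collides with an existing one. A simplified Python sketch (segment names reduced to Lucene's "_&lt;base36&gt;" convention; the function name is hypothetical):

```python
def next_segment_counter(segment_names):
    """Smallest counter value that cannot collide with any existing
    segment: one past the largest base-36 segment number."""
    nums = [int(name.lstrip("_"), 36) for name in segment_names]
    return max(nums, default=-1) + 1
```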
Re: [Lucene.Net] Score(collector) called for each subReader - but not what I need
No I will try it though. Thanks. Bob On Jun 10, 2011, at 12:37 PM, Digy wrote: > Have you tried to use Lucene.Net as is, before working on optimizing your > code? There are a lot of speed improvements in it since 1.9. > There is also a Faceted Search project in contrib. > (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search > ) > > DIGY > > > > -Original Message- > From: Robert Stewart [mailto:robert_stew...@epam.com] > Sent: Friday, June 10, 2011 7:14 PM > To: > Subject: [Lucene.Net] Score(collector) called for each subReader - but not > what I need > > As I previously tried to explain, I have custom query for some pre-cached > terms, which I load into RAM in efficient compressed form. I need this for > faster searching and also for much faster faceting. So what I do is process > incoming query and replace certain sub-queries with my own "CachedTermQuery" > objects, which extend Query. Since these are not per-segment, I only want > scorer.Score(collector) called once, not once for each segment in my index. > Essentially what happens now if I have a search is it collects the same > documents N times, 1 time for each segment. Is there anyway to combine > different Scorers/Collectors such that I can control when it enumerates > collection by multiple sub-readers, and when not to? This all worked in > previous version of Lucene because enumerating sub-indexes (segments) was > pushed to a lower level inside Lucene API and not it is elevated to a higher > level. > > Thanks > Bob > > > On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > >> I found the problem. The problem is that I have a custom "query > optimizer", and that replaces certain TermQuery's within a Boolean query > with a custom Query and this query has its own weight/scorer that retrieves > matching documents from an in-memory cache (and that is not Lucene backed). 
> But it looks like my custom hitcollectors are now wrapped in a > HitCollectorWrapper which assumes Collect() needs called for multiple > segments - so it is adding a start offset to the doc ID that comes from my > custom query implementation. I looked at the new Collector class and it > seems it works the same way (assumes it needs to set the next index reader > with some offset). How can I make my custom query work with the new API (so > that there is basically a single "segment" in RAM that my query uses, but > still other query clauses in same boolean query use multiple lucene > segments)? I am sure that is not clear and will try to provide more detail > soon. >> >> Thanks >> Bob >> >> >> On Jun 9, 2011, at 1:48 PM, Digy wrote: >> >>> Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect > the >>> problem. >>> DIGY >>> >>> -Original Message- >>> From: Robert Stewart [mailto:robert_stew...@epam.com] >>> Sent: Thursday, June 09, 2011 8:40 PM >>> To: >>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >>> >>> I tried converting index using IndexWriter as follows: >>> >>> Lucene.Net.Index.IndexWriter writer = new > IndexWriter(TestIndexPath+"_2.9", >>> new Lucene.Net.Analysis.KeywordAnalyzer()); >>> >>> writer.SetMaxBufferedDocs(2); >>> writer.SetMaxMergeDocs(100); >>> writer.SetMergeFactor(2); >>> >>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new >>> Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); >>> >>> writer.Commit(); >>> >>> >>> That seems to work (I get what looks like a valid index directory at > least). >>> >>> But still when I run some tests using IndexSearcher I get the same > problem >>> (I get documents in Collect() which are larger than > IndexReader.MaxDoc()). >>> Any idea what the problem could be? >>> >>> BTW, this is a problem because I lookup some fields (date ranges, etc.) 
> in >>> some custom collectors which filter out documents, and it assumes I dont > get >>> any documents larger than maxDoc. >>> >>> Thanks, >>> Bob >>> >>> >>> On Jun 9, 2011, at 12:37 PM, Digy wrote: >>> One more point, some write operations using Lucene.Net 2.9.2 (add, > delete, optimize etc.) upgrades automatically your index to 2.9.2. But if your index is somehow corrupted(eg, due to some bug in 1.9) this >>> may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a > multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that > index, >>> I get IndexOutOfRange exceptions in my collectors. It is giving me > document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is > returning 377831, but I get d
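The per-segment collection contract Bob is running into can be sketched abstractly: since Lucene 2.9, collect() receives segment-local doc ids and the collector is told each segment's start offset (docBase); a query that already emits index-wide ids therefore gets offset a second time and overshoots maxDoc. A simplified Python illustration (class and method names are hypothetical, loosely modeled on Collector.setNextReader):

```python
class GlobalCollector:
    """Accumulates index-wide doc ids from segment-local ones."""
    def __init__(self):
        self.hits = []
        self._base = 0

    def set_next_reader(self, doc_base):
        # Called before each segment, with that segment's start offset.
        self._base = doc_base

    def collect(self, local_doc):
        # local_doc is segment-relative; adding the offset yields the
        # index-wide id. Applying this offset to an id that is already
        # global is what produces ids beyond maxDoc.
        self.hits.append(self._base + local_doc)

def search_segments(segments, collector):
    # segments: list of (max_doc, matching_local_ids) per segment.
    base = 0
    for max_doc, matches in segments:
        collector.set_next_reader(base)
        for d in matches:
            collector.collect(d)
        base += max_doc
```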
[jira] [Created] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
TestStressIndexing2 testMultiConfig failure --- Key: LUCENE-3190 URL: https://issues.apache.org/jira/browse/LUCENE-3190 Project: Lucene - Java Issue Type: Bug Reporter: selckin trunk: r1134311 reproducible {code} [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec [junit] [junit] - Standard Error - [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808 [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) [junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763 [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-0 *** [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808 [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, f57=MockFixedIntBlock(blockSize=649), 
f11=Standard, f41=MockRandom, f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, timezone=Pacific/Palau [junit] NOTE: all tests run in this JVM: [junit] [TestStressIndexing2] [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 [junit] - --- [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] r1.numDocs()=17 vs r2.numDocs()=16 [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs r2.numDocs()=16 [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) [junit] at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) [junit] [junit] [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] Some threads threw uncaught exceptions! [junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) [junit] [junit] [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED {code} -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert -- Key: SOLR-2584 URL: https://issues.apache.org/jira/browse/SOLR-2584 Project: Solr Issue Type: Improvement Affects Versions: 3.3, 4.0 Reporter: Elmer Garduno Priority: Minor Hi folks, I think that UIMAUpdateRequestProcessor should have a parameter to avoid duplicate values on the updated field. A typical use case: if you are using DictionaryAnnotator and there is a term that matches more than once, it will be added twice to the mapped field. I think that we should add a parameter to avoid inserting duplicates, as we are not preserving information on the position of the annotation. What do you think about it? I've already implemented this for branch 3x; I'm writing some tests and I will submit a patch. Regards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
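Since annotation positions are not preserved anyway, the proposed behavior amounts to an order-preserving de-duplication of the values mapped into the field. A minimal Python sketch (function name is hypothetical, not the patch's API):

```python
def add_unique(values):
    """Drop repeated annotation values while keeping first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```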
FYI: How to build and start Apache Solr admin app from source with Maven
Hi guys, FYI: here is a link on how to build and start the Apache Solr admin app from source with Maven, just in case you might be interested: http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html Have fun. YH
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2535: --- Attachment: SOLR-2535_fix_admin_file_handler_for_directory_listings.patch The attached patch fixes this bug and adds new tests for a directory listing and getting a file. This bug was triggered with the introduction of SOLR-2263 in which RawResponseWriter was changed to implement BinaryQueryResponseWriter. This wasn't a problem in and of itself, but the SolrDispatchFilter checks if a response writer is the binary variant and if so calls the write(OutputStream...) variant. But the responses from ShowFileRequestHandler that list directory contents are incompatible with the RawResponseWriter if RawResponseWriter's write(OutputStream...) method is used instead of a character-based stream. The solution was to move the defaulting of the "raw" response type from ShowFileRequestHandler.init() into a condition within handleRequestBody() where it knows the response is a file. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. 
Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
Add TopDocs.merge to merge multiple TopDocs --- Key: LUCENE-3191 URL: https://issues.apache.org/jira/browse/LUCENE-3191 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 3.3, 4.0 It's not easy today to merge TopDocs, eg produced by multiple shards, supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch Patch. The basic idea is simple (use PQ to find top N across all shards), but, I had to add FieldComparator.compare(Comparable, Comparable). Ie, the FieldComparator should be able to compare the Comparables returned by its value method. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
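The "PQ to find top N across all shards" idea can be sketched for the score-sorted case. This is a simplified Python illustration, not the patch itself — the real patch additionally supports arbitrary Sort via the new FieldComparator.compare(Comparable, Comparable); names and the tuple layout are hypothetical.

```python
import heapq

def merge_top_docs(shard_results, n):
    """K-way merge of per-shard hit lists, each already sorted by
    descending score, into the global top n."""
    heap = []  # one candidate per shard: (-score, shard, index, doc)
    for shard, hits in enumerate(shard_results):
        if hits:
            score, doc = hits[0]
            heapq.heappush(heap, (-score, shard, 0, doc))
    out = []
    while heap and len(out) < n:
        neg_score, shard, idx, doc = heapq.heappop(heap)
        out.append((shard, doc, -neg_score))
        nxt = idx + 1
        if nxt < len(shard_results[shard]):
            score, doc = shard_results[shard][nxt]
            heapq.heappush(heap, (-score, shard, nxt, doc))
    return out
```

Ties on score break toward the lower shard index, since the shard number is the next tuple element in the heap ordering.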
[jira] [Assigned] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3191: -- Assignee: Michael McCandless > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 1:48 PM, Andrzej Bialecki wrote: > On 6/10/11 6:27 PM, Michael McCandless wrote: >> >> I'm actually working on something like this, basically a utility >> method to merge N TopDocs into 1. I want to do this for grouping as >> well to make it easy to do grouping across shards. > > Mike, > > The straightforward merge that is used in Solr suffers from incomparable > scores (due to the lack of global IDF). See my slides from the Buzzwords. > Since we can handle global IDF in local searchers more easily that in Solr > then we can reuse that DfCache trick from MultiSearcher. This is cool stuff Andrzej!! But, my patch (LUCENE-3191) is aiming for the lower-level problem of just the mechanics of merging multiple TopDocs ie, something "above" will have to handle "properly" setting scores of the incoming TopDocs (if in fact the search sorts by score). Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047408#comment-13047408 ] Simon Willnauer commented on LUCENE-3190: - I will dig! > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 
460908 expected: 408216 flush mem: 395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-3190: --- Assignee: Simon Willnauer > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 460908 expected: 408216 flush mem: 
395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira -
RE: [Lucene.Net] Faceting
And yes for 1 & 2. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Friday, June 10, 2011 10:36 PM To: Subject: [Lucene.Net] Faceting I took a brief look at the documentation for faceting in contrib. I did not look at code yet. Do you think it can work for these requirements: 1) Needs to compute facets for fields of more than one value per document (for instance a document may have many company names associated to it). 2) Needs to compute facets over any arbitrary query 3) Needs to be fast: a) I have 100 million docs distributed in about 10 indexes (10 million docs each) and use parallel distributed search and merge b) For some facet fields, we have over 100,000 possible unique values (for example, we have 150,000 possible company values) In our case, we pre-cache compressed doc sets in memory for each unique facet value. if # of values < 1/9 size of index, then we use variable byte encoding of integers, otherwise we use BitArray. These doc sets are then sorted in descending order by document frequency (so more frequent facets are counted first) We open new index "snapshots" every couple minutes and pre-load these facet doc sets into ram each time new snapshot is opened in the background. We use about 32 GB of RAM when fully loaded. At search time we gather all the doc IDs matching search into a BitArray. Then we enumerate all the facet doc sets in desc order by overall doc frequency, and count how many docs in search matched each facet. These facet counts are passed into a priority queue to gather top N counts (such that when the next total count < full priority queue min value, it breaks out of loop, that is why we do it in desc order by total doc freq) We also count # of docs per day over date range for each facet. We also compute facets for about 10 fields during search, and get top 10 facets each. Typically search over 100 million docs including facet counts and per-date counts takes about 1300ms. 
Our current solution actually works pretty well - but it is a burden on RAM, time to load new snapshots, and extra pressure on GC during busy times. Do you think your current facet implementation can work as above, and should I try to contribute what I have (it would definitely take a little refactoring)? Thanks, Bob On Jun 10, 2011, at 12:37 PM, Digy wrote: > Have you tried to use Lucene.Net as is, before working on optimizing your > code? There are a lot of speed improvements in it since 1.9. > There is also a Faceted Search project in contrib. > (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search > ) > > DIGY > > > > -Original Message- > From: Robert Stewart [mailto:robert_stew...@epam.com] > Sent: Friday, June 10, 2011 7:14 PM > To: > Subject: [Lucene.Net] Score(collector) called for each subReader - but not > what I need > > As I previously tried to explain, I have a custom query for some pre-cached > terms, which I load into RAM in efficient compressed form. I need this for > faster searching and also for much faster faceting. So what I do is process > the incoming query and replace certain sub-queries with my own "CachedTermQuery" > objects, which extend Query. Since these are not per-segment, I only want > scorer.Score(collector) called once, not once for each segment in my index. > Essentially what happens now if I have a search is that it collects the same > documents N times, 1 time for each segment. Is there any way to combine > different Scorers/Collectors such that I can control when it enumerates > collection by multiple sub-readers, and when not to? This all worked in > previous versions of Lucene because enumerating sub-indexes (segments) was > pushed to a lower level inside the Lucene API and now it is elevated to a higher > level. > > Thanks > Bob > > > On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > >> I found the problem. 
The problem is that I have a custom "query > optimizer", and that replaces certain TermQuery's within a Boolean query > with a custom Query and this query has its own weight/scorer that retrieves > matching documents from an in-memory cache (and that is not Lucene backed). > But it looks like my custom hitcollectors are now wrapped in a > HitCollectorWrapper which assumes Collect() needs called for multiple > segments - so it is adding a start offset to the doc ID that comes from my > custom query implementation. I looked at the new Collector class and it > seems it works the same way (assumes it needs to set the next index reader > with some offset). How can I make my custom query work with the new API (so > that there is basically a single "segment" in RAM that my query uses, but > still other query clauses in same boolean query use multiple lucene > segments)? I am sure that is not clear and will try to provide more detail > soon. >> >> Thanks >> Bob >> >> >> On Jun 9, 2011, at 1:48 PM, Digy wrote: >> >>> Sorry no idea. Maybe optimizing the index wi
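The counting scheme Bob describes earlier in this thread — intersect each facet's cached doc set with the bitmap of search matches, walking facets in descending overall document frequency so the loop can break early once a full top-N priority queue can no longer change — can be sketched with plain JDK collections. Everything below (class and field names, the BitSet representation) is an illustrative reconstruction, not the actual contrib or production code:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

public class FacetCounter {
    static class FacetCount {
        final String value;
        final int count;
        FacetCount(String value, int count) { this.value = value; this.count = count; }
    }

    /**
     * Counts search matches per facet value and returns the top n.
     * facetDocSets must be pre-sorted in descending order of overall
     * document frequency, which enables the early exit: once a facet's
     * total frequency cannot beat the current queue minimum, no later
     * facet can either.
     */
    static List<FacetCount> topFacets(BitSet searchMatches,
                                      String[] facetValues,
                                      BitSet[] facetDocSets,
                                      int n) {
        // min-heap ordered by count, so peek() is the weakest kept entry
        PriorityQueue<FacetCount> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.count, b.count));
        for (int i = 0; i < facetValues.length; i++) {
            int totalFreq = facetDocSets[i].cardinality();
            // early exit: remaining facets have totalFreq <= this one
            if (queue.size() == n && totalFreq <= queue.peek().count) {
                break;
            }
            BitSet intersection = (BitSet) facetDocSets[i].clone();
            intersection.and(searchMatches);   // docs matching both
            int count = intersection.cardinality();
            if (count == 0) continue;
            if (queue.size() < n) {
                queue.add(new FacetCount(facetValues[i], count));
            } else if (count > queue.peek().count) {
                queue.poll();
                queue.add(new FacetCount(facetValues[i], count));
            }
        }
        List<FacetCount> result = new ArrayList<>(queue);
        result.sort((a, b) -> Integer.compare(b.count, a.count)); // best first
        return result;
    }
}
```

The descending-frequency ordering is what makes the early exit safe: a facet whose total document frequency is already at or below the queue minimum cannot produce a higher per-search count.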
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047411#comment-13047411 ] Peter Wolanin commented on SOLR-2535: - Thanks for the patch. Is everything in there related to this bug? Some of it looks like other cleanup. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047416#comment-13047416 ] Uwe Schindler commented on LUCENE-3191: --- bq. The basic idea is simple (use PQ to find top N across all shards), but, I had to add FieldComparator.compare(Comparable, Comparable). That makes no sense to me, because Comparables can always be compared against each other without a separate comparator. The old MultiSearcher did exactly this. This is why it returns Comparable. So instead of FieldComparator.compare(a, b), just use a.compareTo(b). It is the responsibility of the comparator to return a correctly wrapped Comparable. There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in the wrong order. We have no test for this, so it might never cause a test failure. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047420#comment-13047420 ] David Smiley commented on SOLR-2535: The relayout of import statements in SolrDispatchFilter.java is inadvertent. The QueryRequest.java one-liner was a null-check that I felt was an improvement so that I didn't have to pass in an empty params list to QueryRequest's constructor. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047440#comment-13047440 ] Michael McCandless commented on LUCENE-3191: Uwe, you are right! Now why didn't I think of that... The returned Comparable should be expected to properly compare itself to any other Comparable returned from FieldComparator.value... so I'll do that and then the patch is nice and small. And no API change for 3.x. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047472#comment-13047472 ] Michael McCandless commented on LUCENE-3191: So... I started down this path (relying on the returned Comparable from .value to .compareTo themselves, instead of adding a new .compare method to FieldComp), but I'm not sure I like it... I had to add a ReverseFloatComparable inside RelevanceComp, since it sorts opposite to the natural Float sort order by default. But what this then means, for an app that wants to do some sharding, is that suddenly a TopDocs might contain an instance of this class, whereas now it contains plain Java objects (Float, Integer, etc.). I also don't like that this is splitting up the logic of how relevance scores compare to one another across two places (RelevanceComp and this new ReverseFloatComparable). I think it'd be better if we keep simple objects in the TopDocs, to keep it easy for apps to serialize themselves (since we don't impl Serializable anymore), and then the front end would invoke RelevanceComparator locally to properly compare the floats. Ie, really FieldComp.value should never have returned Comparable, I think? > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047477#comment-13047477 ] Uwe Schindler commented on LUCENE-3191: --- This still confuses me: bq. There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in wrong order. We have no test for this, so it might never cause a test failure. In my opinion, it should return a negative Float object. But as far as I know, there was/is already some special case in the collectors merge code used to merge segment's results (FieldValueHitQueue.fillFields copies the values into the collected docs, but I am not sure if this is still used). The good old deprecated FieldDocSortedHitQueue in 3.x (what's the replacement?) contains this special case: {code} } else { c = docA.fields[i].compareTo(docB.fields[i]); if (type == SortField.SCORE) { c = -c; } } {code} In trunk it's gone, so we can maybe fix this stupidness. The Comparable returned by RelevanceComparator (used with SortField.SCORE) should simply be negative? Else we have to add this special case in your TopDocs.merge, too. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
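The merge mechanics under discussion — interleave the already-sorted per-shard hit lists through a priority queue, with the comparison reversed for relevance scores exactly as in the FieldDocSortedHitQueue special case quoted above — can be sketched with plain JDK types. The classes and names below are an illustrative reconstruction, not the LUCENE-3191 patch:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    static class ShardHit {
        final int shard;      // which shard produced the hit
        final int doc;        // shard-local doc id
        final float score;
        ShardHit(int shard, int doc, float score) {
            this.shard = shard; this.doc = doc; this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists (each already sorted best-first) into a
     * single best-first list of size <= n. For scores, "best-first" is the
     * reverse of the natural Float order — the special case that
     * FieldDocSortedHitQueue handled for SortField.SCORE.
     */
    static List<ShardHit> merge(List<List<ShardHit>> shards, int n) {
        // cursor into each shard's list, pointing at its current head
        int[] cursors = new int[shards.size()];
        // heap keyed on score descending: peek() is the best remaining head
        Comparator<ShardHit> bestFirst =
            (a, b) -> Float.compare(b.score, a.score);
        PriorityQueue<ShardHit> heads = new PriorityQueue<>(bestFirst);
        for (List<ShardHit> shard : shards) {
            if (!shard.isEmpty()) heads.add(shard.get(0));
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            ShardHit best = heads.poll();
            merged.add(best);
            // advance the cursor of the shard that supplied the hit
            int next = ++cursors[best.shard];
            List<ShardHit> shard = shards.get(best.shard);
            if (next < shard.size()) heads.add(shard.get(next));
        }
        return merged;
    }
}
```

Because each shard's list is already sorted, the heap only ever holds one candidate per shard, so merging k shards costs O(n log k) regardless of how many hits each shard returned.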
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047478#comment-13047478 ] Michael McCandless commented on SOLR-2564: -- Hmm, I think this only needs to be a 4.0 blocker if we commit SOLR-2524 (3.x Solr grouping) first. But at this point, since we are close on this issue, it looks like we should hold SOLR-2524 until we commit this, then backport & commit to 3.x. > Integrating grouping module into Solr 4.0 > - > > Key: SOLR-2564 > URL: https://issues.apache.org/jira/browse/SOLR-2564 > Project: Solr > Issue Type: Improvement >Reporter: Martijn van Groningen >Assignee: Martijn van Groningen >Priority: Blocker > Fix For: 4.0 > > Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, > SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch > > > Since work on grouping module is going well. I think it is time to wire this > up in Solr. > Besides the current grouping features Solr provides, Solr will then also > support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047479#comment-13047479 ] Uwe Schindler commented on LUCENE-3191: --- By the way, in current trunk the value() method in FieldComparator is obsolete and slows down search, if the field values are not needed. But of course, this patch makes use of it again, but we should correct it. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
Context-Sensitive Spelling Suggestions & Collations --- Key: SOLR-2585 URL: https://issues.apache.org/jira/browse/SOLR-2585 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Solr currently cannot offer what I'm calling here a "context-sensitive" spelling suggestion. That is, if a user enters one or more words that have docFrequency > 0, but nevertheless are misspelled, then no suggestions are offered. Currently, Solr will always consider a word "correctly spelled" if it is in the index and/or dictionary, regardless of context. This issue & patch add support for context-sensitive spelling suggestions. See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical use case for this functionality. This tests using both IndexBasedSpellChecker and DirectSolrSpellChecker. Two new Spelling Parameters were added: - spellcheck.alternativeTermCount - The count of suggestions to return for each query term existing in the index and/or dictionary. Presumably, users will want fewer suggestions for words with docFrequency>0. Also setting this value turns "on" context-sensitive spell suggestions. - spellcheck.maxResultsForSuggest - The maximum number of hits the request can return in order to both generate spelling suggestions and set the "correctlySpelled" element to "false". For example, if this is set to 5 and the user's query returns 5 or fewer results, the spellchecker will report "correctlySpelled=false" and also offer suggestions (and collations if requested). Setting this greater than zero is useful for creating "did-you-mean" suggestions for queries that return a low number of hits. I have also included a test using shards. See additions to DistributedSpellCheckComponentTest. In Lucene, SpellChecker.java can already support this functionality (by passing a null IndexReader and field-name). The DirectSpellChecker, however, needs a minor enhancement. 
This gives the option to allow DirectSpellChecker to return suggestions for all query terms regardless of frequency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
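At request time, the two parameters proposed above would be used roughly like this. The handler path, query terms, and values here are illustrative only, not taken from the patch:

```text
http://localhost:8983/solr/spell?q=text:(flew AND humer)
    &spellcheck=true
    &spellcheck.alternativeTermCount=3    # offer up to 3 alternatives even for
                                          # terms with docFrequency > 0
    &spellcheck.maxResultsForSuggest=5    # still report correctlySpelled=false
                                          # (and suggest) when <= 5 hits return
    &spellcheck.collate=true
```

With a plain spellcheck.count, "humer" would be accepted as correctly spelled if it occurs anywhere in the index; alternativeTermCount is what forces suggestions for such in-index terms, and maxResultsForSuggest gates the behavior on the query returning few hits.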
[jira] [Updated] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
[ https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2585: - Attachment: SOLR-2585.patch > Context-Sensitive Spelling Suggestions & Collations > --- > > Key: SOLR-2585 > URL: https://issues.apache.org/jira/browse/SOLR-2585 > Project: Solr > Issue Type: Improvement > Components: spellchecker >Affects Versions: 4.0 >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2585.patch > > > Solr currently cannot offer what I'm calling here a "context-sensitive" > spelling suggestion. That is, if a user enters one or more words that have > docFrequency > 0, but nevertheless are misspelled, then no suggestions are > offered. Currently, Solr will always consider a word "correctly spelled" if > it is in the index and/or dictionary, regardless of context. This issue & > patch add support for context-sensitive spelling suggestions. > See SpellCheckCollatorTest.testContextSensitiveCollate() for a the typical > use case for this functionality. This tests both using > IndexBasedSepllChecker and DirectSolrSpellChecker. > Two new Spelling Parameters were added: > - spellcheck.alternativeTermCount - The count of suggestions to return for > each query term existing in the index and/or dictionary. Presumably, users > will want fewer suggestions for words with docFrequency>0. Also setting this > value turns "on" context-sensitive spell suggestions. > - spellcheck.maxResultsForSuggest - The maximum number of hits the request > can return in order to both generate spelling suggestions and set the > "correctlySpelled" element to "false". For example, if this is set to 5 and > the user's query returns 5 or fewer results, the spellchecker will report > "correctlySpelled=false" and also offer suggestions (and collations if > requested). Setting this greater than zero is useful for creating > "did-you-mean" suggestions for queries that return a low number of hits. 
> I have also included a test using shards. See additions to > DistributedSpellCheckComponentTest. > In Lucene, SpellChecker.java can already support this functionality (by > passing a null IndexReader and field-name). The DirectSpellChecker, however, > needs a minor enhancement. This gives the option to allow DirectSpellChecker > to return suggestions for all query terms regardless of frequency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
We still have a problem with queries that rewrite depending on index contents - which was the reason for MTQ's deMorgan bug. If two MultiTermQueries rewrite to different queries on two shards, the scores are also not comparable, even with normalized idf. This does not affect WildCard&Co (because they default to constant score), but e.g. Fuzzy will be very broken multi-sharded. MultiSearcher tried to prevent this by combining all rewritten queries into one - and was buggy here. We reinvent MultiSearcher because of this (Mike's code in 3191 is a partial reincarnation of MultiSearcher), only the buggy combine is missing. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, June 10, 2011 9:57 PM > To: dev@lucene.apache.org > Subject: Re: Distributed search capability > > On Fri, Jun 10, 2011 at 1:48 PM, Andrzej Bialecki wrote: > > On 6/10/11 6:27 PM, Michael McCandless wrote: > >> > >> I'm actually working on something like this, basically a utility > >> method to merge N TopDocs into 1. I want to do this for grouping as > >> well to make it easy to do grouping across shards. > > > > Mike, > > > > The straightforward merge that is used in Solr suffers from > > incomparable scores (due to the lack of global IDF). See my slides from the > Buzzwords. > > Since we can handle global IDF in local searchers more easily than in > > Solr then we can reuse that DfCache trick from MultiSearcher. > > This is cool stuff Andrzej!! > > But, my patch (LUCENE-3191) is aiming for the lower-level problem of just > the mechanics of merging multiple TopDocs ie, something "above" will > have to handle "properly" setting scores of the incoming TopDocs (if in fact > the search sorts by score). 
> > Mike McCandless > > http://blog.mikemccandless.com > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 6:06 PM, Uwe Schindler wrote: > We have still a problem with queries that rewrite depending on index > contents - which was the reason for MTQ's deMorgan bug. If two > MultiTermQueries rewrite to different queries on two shards, the scores are > also not comparable, even with normalized idf. This does not affect > WildCard&Co (because default to constant score), but e.g. Fuzzy will be very > broken multi-sharded. MultiSearcher tried to prevent this by combining all > rewritten queries into one - and was buggy here. > Really? because I see your description of the situation as mixing two totally different things: 1. a situation where a distributed case returns scores different than a single node case. Who cares? This should be up to the user to make the appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even different types of caching impls like andrzej hinted at, or whatever)... but its not "wrong". 2. a situation where your query is A NOT B and it then returns B. This was the real problem with MultiSearcher, and this is just wrong. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> Really? because I see your description of the situation as mixing two totally > different things: They are connected because they follow each other. > 1. a situation where a distributed case returns scores different than a single > node case. Who cares? This should be up to the user to make the > appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even > different types of caching impls like andrzej hinted at, or whatever)... but > its > not "wrong". I just mentioned that queries that rewrite to different queries on each node (like MTQs, because the TermsEnums are different) will not produce comparable scores even with a global IDF - that's what I wanted to say. The connection is here: The buggy MultiSearcher tried to prevent this by combining the rewritten queries and that caused the deMorgan bug. > 2. a situation where your query is A NOT B and it then returns B. This was the > real problem with MultiSearcher, and this is just wrong. About the merging; when I look at Mike's code: Except for the global IDF and the bug in MTQ, the merging code is identical to what MultiSearcher did before. In trunk I would even recommend undeleting FieldDocSortedHitQueue, and you have everything you need to merge two TopDocs instances. Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 6:49 PM, Uwe Schindler wrote: >> Really? because I see your description of the situation as mixing two totally >> different things: > > They are connected because they follow each other. > >> 1. a situation where a distributed case returns scores different than a >> single >> node case. Who cares? This should be up to the user to make the >> appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even >> different types of caching impls like andrzej hinted at, or whatever)... but >> its >> not "wrong". > > I just mentioned that for queries that rewrite to different queries on each > node (like MTQ because TermsEnums are different) will even not produce > comparable scores with a global IDF - that’s what I wanted to say. > only 'by default'. but you can configure constant-score-filter-rewrite and they will be completely comparable if for some reason this bothers you, so where is the problem (thats your tradeoff to make, which might be incredibly stupid for some of these MTQs, but you can still do it) > About the merging; when I look at Mikes code: > Except the global IDF and the bug in MTQ, the merging code is identical to > what MultiSearcher did before. I would in trunk even recommend to undelete > FieldDocSortedHitQueue and you have everything you need to merge two TopDocs > instances. > again, where is the bug in MTQ? 'stuff being different without special intervention' in the distributed versus single node case isn't a bug, thats my point. we need to separate what is a bug (flat out wrong, like the multisearcher demorgan thing) from scores being slightly different by default. and if it was really the case that the multisearcher 'flat out wrong bug' was actually created for some theoretical equal-scores-in-all-cases perfection, man what a bad tradeoff that was! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> > About the merging; when I look at Mikes code: > > Except the global IDF and the bug in MTQ, the merging code is identical to > what MultiSearcher did before. I would in trunk even recommend to > undelete FieldDocSortedHitQueue and you have everything you need to > merge two TopDocs instances. > > > > again, where is the bug in MTQ? Sorry a bug in my mail, I meant MultiSearcher, man :( > 'stuff being different without special intervention' in the distributed versus > single node case isn't a bug, thats my point. > we need to separate what is a bug (flat out wrong, like the multisearcher > demorgan thing) from scores being slightly different by default. > > and if was really the case that the multisearcher 'flat out wrong bug' > was actually created for some theoretical equal-scores-in-all-cases > perfection, man what a bad tradeoff that was! I agree. And I still tend to undelete FieldDocSortedHitQueue as it merged TopDocs and LUCENE-3191 will get a very small patch. :-) That's all I wanted to say and already discussed it with Mike on IRC: http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-06-10#l235 All other parts on JIRA. Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047748#comment-13047748 ] Uwe Schindler commented on LUCENE-3191: --- We had some discussions about cleaning this up in IRC: [http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-06-10#l235] > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch New patch: * Changes .value from Comparable (which is trappy because you think you're free to .compareTo them) to a parameterized type passed to FieldComparator. * Renames .compare -> .compareValues, which is now type checked with generics. * Changes FieldDoc.fields from Comparable[] to Object[] Will need to work out how we backport this to 3.x; the change from Comparable to Object is an API break, though... maybe not many apps are using FieldDoc.fields. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
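The type-checked compareValues change can be illustrated with a hedged sketch; the class and method names below are made up for illustration, and the real FieldComparator API differs in detail:

```java
// Hypothetical sketch of a type-parameterized comparator: the value type T
// is checked by generics, so callers cannot hand mismatched value types to
// compareValues.
abstract class SortValueComparator<T> {
    abstract int compareValues(T first, T second);
}

class RelevanceValueComparator extends SortValueComparator<Float> {
    @Override
    int compareValues(Float first, Float second) {
        // Reversed: a higher score should sort ahead of a lower one.
        return second.compareTo(first);
    }
}

class CompareValuesDemo {
    public static void main(String[] args) {
        SortValueComparator<Float> cmp = new RelevanceValueComparator();
        // 2.0f has the higher score, so it compares as "before" 1.0f.
        System.out.println(cmp.compareValues(2.0f, 1.0f) < 0); // true
    }
}
```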
[JENKINS] Lucene-3.x - Build # 401 - Still Failing
Build: https://builds.apache.org/job/Lucene-3.x/401/ No tests ran. Build Log (for compile errors): [...truncated 9153 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 7:08 PM, Uwe Schindler wrote: > And I still tend to undelete FieldDocSortedHitQueue as it merged TopDocs and > LUCENE-3191 will get a very small patch. :-) > just please don't resurrect any Collators here in trunk! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> > And I still tend to undelete FieldDocSortedHitQueue as it merged > > TopDocs and LUCENE-3191 will get a very small patch. :-) > > > > just please don't resurrect any Collators here in trunk! Of course without them :-) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch New patch, adds default impl for FC.compareValues to just cast to Comparable and call .compareTo. All but 2 places just use this default... > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
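The default implementation described above might be sketched like this (assumed shape, not the actual patch):

```java
// Sketch of the described default: cast the value to Comparable and
// delegate to compareTo, so only comparators with a non-natural ordering
// (e.g. reversed relevance) need to override compareValues.
class DefaultCompare {
    @SuppressWarnings("unchecked")
    static <T> int compareValues(T first, T second) {
        return ((Comparable<T>) first).compareTo(second);
    }

    public static void main(String[] args) {
        System.out.println(compareValues("abc", "abd") < 0); // true: "abc" sorts first
        System.out.println(compareValues(42, 7) > 0);        // true: natural int order
    }
}
```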
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047781#comment-13047781 ] Uwe Schindler commented on LUCENE-3191: --- Looks fine, I am happy now :-) The RelevanceComparator should simply use the following to minimize unboxing: {code} +public int compareValues(Float first, Float second) { + return second.compareTo(first); // reverse! +} {code} Will review more closely tomorrow! > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047788#comment-13047788 ] Vinicius Barros commented on LUCENE-1768: - Hi Uwe, Thanks for reviewing the patch again. I will fix the problems you mentioned. I do not think the code is ready to be committed; I am just sending the patches so you can keep track of my progress. I hope to have something usable soon, then you can commit, probably before the end of GSoC. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > the future, to automatically create NumericRangeQuery for different numeric > types. It would then be possible to index a numeric value > (double, float, long, int) using NumericField, and the query parser would know > which type of field this is and so correctly create a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract whether a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. > The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) cast of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is a MTQ). > Another thing is a change in Date semantics. 
There are some strange flags in > the current parser that tell it how to handle dates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
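The conversion step the description mentions (parsing user-entered bounds, then handing them to the right query type) can be sketched roughly as follows. All names here are hypothetical, and the real query parser operates on its query-node tree rather than raw strings:

```java
// Hypothetical sketch of parsing a numeric range term such as "[1.567..*]"
// or "(1.787..19.5]" into typed bounds: '[' / ']' mean inclusive, '(' / ')'
// exclusive, and '*' an open end. Not the actual query-parser code.
class ParsedNumericRange {
    final Double lower, upper;              // null means open-ended ("*")
    final boolean includeLower, includeUpper;

    ParsedNumericRange(String text) {
        includeLower = text.startsWith("[");
        includeUpper = text.endsWith("]");
        // Strip the brackets, then split on the ".." separator.
        String[] bounds = text.substring(1, text.length() - 1).split("\\.\\.");
        lower = bounds[0].equals("*") ? null : Double.valueOf(bounds[0]);
        upper = bounds[1].equals("*") ? null : Double.valueOf(bounds[1]);
    }

    public static void main(String[] args) {
        ParsedNumericRange r = new ParsedNumericRange("(1.787..19.5]");
        System.out.println(r.lower + " " + r.upper + " "
                + r.includeLower + " " + r.includeUpper); // 1.787 19.5 false true
    }
}
```

With a schema saying the field is a double, the parsed bounds would then feed a NumericRangeQuery instead of a plain term range.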
[jira] [Commented] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
[ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047793#comment-13047793 ] Koji Sekiguchi commented on SOLR-2584: -- Or we can implement the function in a new update processor and place it after the UIMA update processor in the chain. Anyway, I wish I could have the function. > Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on > insert > -- > > Key: SOLR-2584 > URL: https://issues.apache.org/jira/browse/SOLR-2584 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.3, 4.0 >Reporter: Elmer Garduno >Priority: Minor > Labels: uima > > Hi folks, > I think that UIMAUpdateRequestProcessor should have a parameter to avoid > duplicate values on the updated field. > A typical use case is: > If you are using DictionaryAnnotator and there is a term that matches more > than once, it will be added twice to the mapped field. I think that we > should add a parameter to avoid inserting duplicates, as we are not preserving > information on the position of the annotation. > What do you think about it? I've already implemented this for branch 3x. I'm > writing some tests and I will submit a patch. > Regards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
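The proposed behavior amounts to order-preserving deduplication, which can be sketched as follows (illustrative only, not the actual patch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of the deduplication being proposed (not the real
// UIMAUpdateRequestProcessor code): drop repeated field values while
// keeping first-occurrence order, which is all that matters here since
// annotation positions are not preserved anyway.
class DedupValues {
    static List<String> dedup(List<String> values) {
        // LinkedHashSet removes duplicates but remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> in = List.of("lucene", "solr", "lucene", "uima", "solr");
        System.out.println(dedup(in)); // [lucene, solr, uima]
    }
}
```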
Re: commit-check target for ant?
: We could compile under 1.6 and have a jenkins (ant) target to check : binary 1.5 compatibility -- bytecode versions and std. API calls. This : way we could use 1.6, @Override annotations, etc. and still ensure 1.5 : support for folks that need it. -0 ... wouldn't that mean that users running *actual* 1.5 JVM installs couldn't compile from source? I think it would be a bad idea to say that our "compile" JVM requirements are different from our "run" JVM requirements. I'd be more in favor of requiring 1.6 than having a weird build where 1.5 folks could use our binary releases but not compile themselves. As far as your original question... : >> javadoc parsing enabled). Can I propose that we add an aggregator : >> target that will aggregate checks happening on jenkins (is there one : >> already)? I'm thinking of a dummy target like: lucene/build.xml has a target called "nightly" but it's not actually used directly. The number of things hudson does has gotten kind of complicated... http://svn.apache.org/repos/asf/lucene/dev/nightly/ http://svn.apache.org/repos/asf/lucene/dev/nightly/hudson-lucene-trunk.sh ...multiple invocations of ant are run, and some things are checked for in the shell script (ie: nocommit). But as is, the "nightly" target should be a superset of all the ant targets that hudson does run (nightly), so it's still usable (although it would be nice to move that nocommit test into it). : >> we could update the "nightly" target to include "clean" ... but I don't know that that's really a good idea. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: commit-check target for ant?
On Fri, Jun 10, 2011 at 8:45 PM, Chris Hostetter wrote: > > : We could compile under 1.6 and have a jenkins (ant) target to check > : binary 1.5 compatibility -- bytecode versions and std. API calls. This > : way we could use 1.6, @Override annotations, etc. and still ensure 1.5 > : support for folks that need it. > > -0 ... wouldn't that mean that users running *actual* 1.5 JVM installs > couldn't compile from source? I think it would be a bad idea to say that > our "compile" JVM requirements are different from our "run" JVM > requirements. I'd be more in favor of requiring 1.6 than having a weird > build where 1.5 folks could use our binary releases but not compile > themselves. I agree with you completely (a rare occasion, I know!). I think it would be better to just require 1.6 if we want to go that route. Personally I find myself often breaking the build because the solr bits require 1.6 but the lucene bits require 1.5 (and see below) > lucene/build.xml has a target called "nightly" but it's not actually used > directly. The number of things hudson does has gotten kind of complicated... > Just for reference, there are two reasons for this: 1. lack of a java 5 JRE on freebsd that is capable of actually compiling+running all of our tests. I've tried every option here, including the linux one under emulation, but it's fair to say without bragging that our tests do actually stress a JRE a little bit, so it's really gotta be solid. 2. using java 6 with -source/-target for java 5 doesn't actually catch invalid java 5, especially @Override on interface methods. However, the native java 6 port to freebsd is quite solid (passes all of our tests), and is actually being maintained and improved. Java 5 is dead technology and I don't think anyone is working on solving #1. Because of this, we must compile the lucene/modules bits with the java 5 COMPILER, but the solr bits with the java 6 COMPILER to catch all compile errors, then run all tests with java 6... 
currently the java5 compiler we have on hudson is really only useful for its "javac". This is why hudson is complicated. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
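The bytecode-version half of the check proposed above needs nothing more than reading the class-file header. A minimal sketch follows; it checks only the major version, while a full 1.5-compatibility check would also have to scan for Java 6-only API calls:

```java
// Minimal sketch of a class-file version check: bytes 0-3 are the
// 0xCAFEBABE magic, bytes 4-5 the minor version, and bytes 6-7 the major
// version (49 = Java 5, 50 = Java 6).
class ClassVersionCheck {
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) return -1;
        long magic = ((long) (classBytes[0] & 0xFF) << 24) | ((classBytes[1] & 0xFF) << 16)
                | ((classBytes[2] & 0xFF) << 8) | (classBytes[3] & 0xFF);
        if (magic != 0xCAFEBABEL) return -1; // not a class file
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] java5Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};
        System.out.println(majorVersion(java5Header)); // 49
    }
}
```

An ant target could run this over every .class file in the build output and fail if any major version exceeds 49.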
Re: commit-check target for ant?
: I agree with you completely (a rare occasion, I know!). I think it I don't think it's that rare -- I suspect we agree 80% of the time, but don't notice due to silent consensus. : This is why hudson is complicated. right .. no complaint from me, just explaining why we have a "nightly" target that isn't used ... but as a "do as close as possible to what the nightly build will do given my current JVM" it should work as is. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8775 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8775/ All tests passed Build Log (for compile errors): [...truncated 15394 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
On Fri, Jun 10, 2011 at 9:31 PM, Chris Hostetter wrote: > > But for bug fixes we really need to deal with this in a better way. We > need to track *every* bug fix made on a branch, even if they were > backported to an earlier branch. > I think we have? Bugfixes are the only case (assuming we go with the plan where we don't go back in time and issue 3.x feature releases after we release 4.x, etc) where we "go backwards". I'll pick LUCENE-3042 as a random one. It's in Lucene 3.2's CHANGES.txt, it's in branch-3.0's CHANGES.txt, and it's in branch-2.9's CHANGES.txt. It's not in trunk's CHANGES.txt, since it's fixed in a non-bugfix release before 4.0 will be released. In short, I don't think there is any problem... and as far back as I can see, this is exactly how we have been handling all bugfixes with the 2.9.x and 3.0.x bugfix releases. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1588 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1588/ No tests ran. Build Log (for compile errors): [...truncated 8001 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8772 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8772/ 4 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReader.testFilesOpenClose Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/TestIndexReader.testFilesOpenClose5529955374tmp/_0_0.tib (Too many open files in system) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/TestIndexReader.testFilesOpenClose5529955374tmp/_0_0.tib (Too many open files in system) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.(RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:416) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375) at org.apache.lucene.index.codecs.BlockTermsWriter.(BlockTermsWriter.java:75) at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsConsumer(MockSepCodec.java:78) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.(PerFieldCodecWrapper.java:73) at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:61) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:79) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:459) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:548) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2791) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2768) at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1050) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1014) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:978) at org.apache.lucene.index.TestIndexReader.testFilesOpenClose(TestIndexReader.java:580) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.index.TestLongPostings.testLongPostings Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/longpostings.-10875160908602353455486954872tmp/_7.tvf (Too many open files in system) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/longpostings.-10875160908602353455486954872tmp/_7.tvf (Too many open files in system) at 
java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:69) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:90) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:91) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:326) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:415) at org.apache.lucene.store.Directory.openInput(Directory.java:118) at org.apache.lucene.index.T
[jira] [Updated] (LUCENE-3188) The contrib class org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3188: Attachment: LUCENE-3188.patch Patch against branch_3x. I converted Ivan's test class into a unit test. Without Ivan's patch, the test fails, and with the patch, it succeeds. Here's the test failure I got without Ivan's patch: {noformat} org.apache.lucene.index.TestIndexSplitter,testDeleteThenOptimize NOTE: reproduce with: ant test -Dtestcase=TestIndexSplitter -Dtestmethod=testDeleteThenOptimize -Dtests.seed=5250008618328265481:-4070453331991284264 WARNING: test class left thread running: merge thread: _0(3.3):c2/1 into _0 [optimize] RESOURCE LEAK: test class left 1 thread(s) running Exception in thread "Lucene Merge Thread #0" NOTE: test params are: locale=es_BO, timezone=Australia/Tasmania org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:515) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:513) ... 1 more NOTE: all tests run in this JVM: [TestIndexSplitter] NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 
1.5.0_22 (64-bit)/cpus=4,threads=2,free=99874080,total=128057344 java.io.IOException: background merge hit exception: _0(3.3):c2/1 into _0 [optimize] at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2536) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2474) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2444) at org.apache.lucene.index.TestIndexSplitter.testDeleteThenOptimize(TestIndexSplitter.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:94) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115) Caused by: java.io.IOException: MockDirectoryWrapper: file "_0.cfs" is still open: cannot overwrite at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:360) at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:167) at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:137) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4242) at org.apache.lucene.index.I
[jira] [Updated] (LUCENE-3188) The contrib class org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3188: Fix Version/s: (was: 3.2) (was: 3.0) 4.0 3.3 Assignee: Steven Rowe > The contrib class org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > I used in this case - Windows Server 2003, Java Hot Spot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Assignee: Steven Rowe >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > LUCENE-3188.patch, TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data. Namely, > the number representing the name of the segment to be created next in this > index is wrong. > If one of the index's existing segments already has this name, the result is > either the inability to create a new segment or the creation of a corrupted > segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
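The invariant the bug violates can be sketched with a hypothetical helper (not the actual IndexSplitter fix): the next-segment counter written into the segments descriptor must exceed every existing segment number.

```java
// Hypothetical sketch of the invariant behind LUCENE-3188: the counter for
// the next segment name must be greater than every existing segment number,
// otherwise a freshly written segment can collide with (and corrupt) an
// existing one. Lucene segment names are "_" followed by a base-36 number.
class NextSegmentCounter {
    static int next(String[] segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // "_a" is segment 10 in base 36, so the next counter must be 11.
        System.out.println(next(new String[] {"_0", "_5", "_a"})); // 11
    }
}
```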
[jira] [Closed] (SOLR-1232) Add spellchecker example
[ https://issues.apache.org/jira/browse/SOLR-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed SOLR-1232. -- Resolution: Fixed This was fixed long ago with the /spell request handler. > Add spellchecker example > > > Key: SOLR-1232 > URL: https://issues.apache.org/jira/browse/SOLR-1232 > Project: Solr > Issue Type: Improvement > Components: documentation >Affects Versions: 1.3 >Reporter: Dominic Mitchell >Priority: Minor > > I got caught out by the wiki documentation last night whilst attempting to > add in spellchecker support. I'm still _relatively_ new, so I didn't quite > get the idea that you had to add it in to a requestHandler in order for it to > become effective. I'd like to propose adding a commented out example to > {{example/solr/conf/solrconfig.xml}} showing that this needs to be done. > {noformat} > diff --git a/example/solr/conf/solrconfig.xml > b/example/solr/conf/solrconfig.xml > index c007d7c..6e42e48 100755 > --- a/example/solr/conf/solrconfig.xml > +++ b/example/solr/conf/solrconfig.xml > @@ -412,20 +412,26 @@ > > > > explicit > > > + > + > > > > >
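For readers hitting the same confusion, the kind of commented-out example being proposed looks roughly like this. This is a hedged sketch of standard Solr spellcheck wiring, not the exact hunk from the issue, whose XML was lost in the archive:

```xml
<!-- Sketch of the proposed commented-out example: the spellcheck search
     component only takes effect once a request handler includes it,
     e.g. via last-components. -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```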