[jira] [Commented] (SOLR-1804) Upgrade Carrot2 to 3.2.0
[ https://issues.apache.org/jira/browse/SOLR-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047047#comment-13047047 ] Dawid Weiss commented on SOLR-1804: --- I am not considering changes to my code to be a personal attack, so no worries. And, since I'm a former assembly guy, straightforward code flow is what I understand and relate to -- I use Guava from time to time to catch up with the young and hip buzzwords (closures, abstraction, you know the like :). Seriously, though, 99% of our Guava use is for instantiating lists and maps without repeating the generics (something the next Java release will be able to infer from the code anyway -- at least if I'm reading the javac commit logs right). The remaining 1% is for cascading list filters and sort orders (which, once you get used to them a little, work out and read pretty nicely). I'm by no means saying we should switch to Guava; I used it because I saw it was in the global lib/ directory (and this happened after the patch to Carrot2, I believe).
> Upgrade Carrot2 to 3.2.0 > > > Key: SOLR-1804 > URL: https://issues.apache.org/jira/browse/SOLR-1804 > Project: Solr > Issue Type: Improvement > Components: contrib - Clustering >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll > Fix For: 3.1, 4.0 > > Attachments: SOLR-1804-carrot2-3.4.0-dev-trunk.patch, > SOLR-1804-carrot2-3.4.0-dev.patch, SOLR-1804-carrot2-3.4.0-libs.zip, > SOLR-1804.patch, carrot2-core-3.4.0-jdk1.5.jar > > > http://project.carrot2.org/release-3.2.0-notes.html > Carrot2 is now LGPL free, which means we should be able to bundle the binary!
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
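The Guava idiom Dawid mentions (static factories that spare you from repeating type arguments) and the compiler inference that later replaced it can be sketched as follows. The `newHashMap` helper below is a hypothetical stand-in for Guava's factory, written against the plain JDK:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondDemo {
    // Hypothetical stand-in for Guava's Maps.newHashMap(): the caller's
    // declared type drives inference, so the type arguments are written once.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }

    public static void main(String[] args) {
        // Pre-Java-7 style: type arguments repeated on both sides.
        Map<String, List<Integer>> verbose = new HashMap<String, List<Integer>>();
        // Guava-style factory: arguments inferred from the declaration.
        Map<String, List<Integer>> viaFactory = newHashMap();
        // Java 7 "diamond": the compiler infers them directly.
        Map<String, List<Integer>> diamond = new HashMap<>();
        diamond.put("ids", new ArrayList<Integer>());
        System.out.println(diamond.keySet());
    }
}
```

The diamond form is the "new Java release" inference referred to above; once available, it makes the factory-method workaround unnecessary.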
[jira] [Resolved] (SOLR-2529) DIH update trouble with sql field name "pk"
[ https://issues.apache.org/jira/browse/SOLR-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Gambier resolved SOLR-2529. -- Resolution: Fixed Fix Version/s: 4.0 I've made some tests on the unreleased 4.0 version and it works well with a deltaQuery like: "SELECT pk AS id FROM ...", where "id" is the name of my primary key in the DIH config. TY
> DIH update trouble with sql field name "pk" > --- > > Key: SOLR-2529 > URL: https://issues.apache.org/jira/browse/SOLR-2529 > Project: Solr > Issue Type: Bug > Components: contrib - DataImportHandler >Affects Versions: 3.1, 3.2 > Environment: Debian Lenny, JRE 6 >Reporter: Thomas Gambier >Priority: Blocker > Fix For: 4.0 > >
> We are unable to use the DIH when the database primary-key column is named "pk".
> The reported Solr error is:
> "deltaQuery has no column to resolve to declared primary key pk='pk'"
> We have done some investigation and found that the DIH goes wrong when it looks for the primary key in the row's column list.
> private String findMatchingPkColumn(String pk, Map<String, Object> row) {
>   if (row.containsKey(pk))
>     throw new IllegalArgumentException(
>         String.format("deltaQuery returned a row with null for primary key %s", pk));
>   String resolvedPk = null;
>   for (String columnName : row.keySet()) {
>     if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
>       if (resolvedPk != null)
>         throw new IllegalArgumentException(
>             String.format(
>                 "deltaQuery has more than one column (%s and %s) that might resolve to declared primary key pk='%s'",
>                 resolvedPk, columnName, pk));
>       resolvedPk = columnName;
>     }
>   }
>   if (resolvedPk == null)
>     throw new IllegalArgumentException(
>         String.format("deltaQuery has no column to resolve to declared primary key pk='%s'", pk));
>   LOG.info(String.format("Resolving deltaQuery column '%s' to match entity's declared pk '%s'", resolvedPk, pk));
>   return resolvedPk;
> }
-- This message is automatically generated by JIRA.
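The suffix-matching fallback quoted above can be reproduced standalone to see the reported failure mode. This is a sketch of the same algorithm, not the Solr source, and the column names used in the examples are hypothetical:

```java
import java.util.Set;

public class PkResolver {
    // Sketch of DIH's fallback resolution (not the Solr source): it is only
    // reached when no column matches the declared pk exactly, and it accepts
    // a qualified name such as "ITEM.pk" on either side of the dot.
    static String resolve(String pk, Set<String> columnNames) {
        String resolvedPk = null;
        for (String columnName : columnNames) {
            if (columnName.endsWith("." + pk) || pk.endsWith("." + columnName)) {
                if (resolvedPk != null) {
                    throw new IllegalArgumentException("more than one column ("
                        + resolvedPk + " and " + columnName
                        + ") might resolve to declared primary key pk='" + pk + "'");
                }
                resolvedPk = columnName;
            }
        }
        if (resolvedPk == null) {
            throw new IllegalArgumentException(
                "no column to resolve to declared primary key pk='" + pk + "'");
        }
        return resolvedPk;
    }
}
```

With a qualified column, `resolve("pk", Set.of("ITEM.pk", "title"))` returns `"ITEM.pk"`; with no matching column at all, the "no column to resolve" error from the report is raised. Aliasing the column to the declared pk name ("SELECT pk AS id ...", as in the resolution comment) makes the exact-match path succeed so this fallback is never reached.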
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8743 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8743/
2 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety
Error Message: Error occurred in thread Thread-122: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-122: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/test2468724706tmp/_b.fnm (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)

REGRESSION: org.apache.lucene.index.TestIndexWriterReader.testMergeWarmer
Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test3290832921tmp/_8e_3.doc (Too many open files in system)
Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/3/test3290832921tmp/_8e_3.doc (Too many open files in system)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:416)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293)
at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375)
at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec$MockIntFactory.createOutput(MockFixedIntBlockCodec.java:101)
at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:121)
at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:107)
at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec.fieldsConsumer(MockFixedIntBlockCodec.java:125)
at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:67)
at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:55)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:75)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:457)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421)
at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:313)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:385)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1233)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1214)
at org.apache.lucene.index.TestIndexWriterReader.testMergeWarmer(TestIndexWriterReader.java:663)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)

Build Log (for compile errors): [...truncated 3393 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8748 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8748/ No tests ran. Build Log (for compile errors): [...truncated 4186 lines...]
[jira] [Created] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index - Key: LUCENE-3188 URL: https://issues.apache.org/jira/browse/LUCENE-3188 Project: Lucene - Java Issue Type: Bug Components: modules/other Affects Versions: 3.2, 3.0 Environment: Bug is present for all environments. I used in this case Windows Server 2003 and the Java HotSpot Virtual Machine. Reporter: Ivan Dimitrov Vasilev Priority: Minor Fix For: 3.2, 3.0 When using the method IndexSplitter.split(File destDir, String[] segs) from the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), it creates an index whose segments descriptor file contains wrong data. Namely, the number representing the name of the segment that would be created next in this index is wrong. If one of the segments of the index already has this name, this results either in the impossibility of creating a new segment or in the creation of a corrupted segment.
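The invariant the report describes (the "next segment name" counter in the segments descriptor must stay ahead of every existing segment's number, or a newly created segment collides with an existing one) can be shown with a small sketch. This illustrates the counter rule only; it is not the Lucene source:

```java
import java.util.List;

public class NextSegmentName {
    // Sketch of the invariant: the counter written to the segments file must
    // exceed every existing segment number, so the next segment gets a name
    // no existing segment already uses. A splitter that copies the counter
    // from only some of the source segments can violate this.
    static int nextCounter(List<Integer> existingSegmentNumbers) {
        int max = -1;
        for (int n : existingSegmentNumbers) {
            max = Math.max(max, n);
        }
        return max + 1;
    }
}
```

For segments numbered 0, 3, and 1 the counter must be at least 4; writing anything smaller reproduces the collision the report describes.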
[jira] [Updated] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Dimitrov Vasilev updated LUCENE-3188: -- Attachment: IndexSplitter.java TestIndexSplitter.java The attached file TestIndexSplitter.java contains a test that shows the bug (when run against the contrib IndexSplitter) and the fix (when run against the IndexSplitter attached here as a patch).
[jira] [Created] (LUCENE-3189) TestIndexWriter.testThreadInterruptDeadlock failed (can't reproduce)
TestIndexWriter.testThreadInterruptDeadlock failed (can't reproduce) Key: LUCENE-3189 URL: https://issues.apache.org/jira/browse/LUCENE-3189 Project: Lucene - Java Issue Type: Bug Reporter: selckin trunk: r1134163. Ran it a few times with tests.iter=200 and couldn't reproduce, but I believe you'd like an issue anyway.
{code}
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:1203)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
[junit]
[junit] Tests run: 40, Failures: 1, Errors: 0, Time elapsed: 23.79 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] ERROR: could not read any segments file in directory
[junit] java.io.FileNotFoundException: segments_2w
[junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:407)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.openInput(DefaultSegmentInfosReader.java:112)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.read(DefaultSegmentInfosReader.java:45)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:257)
[junit] at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:287)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:698)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:283)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:311)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1154)
[junit]
[junit] CheckIndex FAILED: unexpected exception
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1154)
[junit] IndexReader.open FAILED: unexpected exception
[junit] java.io.FileNotFoundException: segments_2w
[junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:407)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.openInput(DefaultSegmentInfosReader.java:112)
[junit] at org.apache.lucene.index.codecs.DefaultSegmentInfosReader.read(DefaultSegmentInfosReader.java:45)
[junit] at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:257)
[junit] at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:88)
[junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:698)
[junit] at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84)
[junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:500)
[junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:293)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:1161)
[junit] - ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=6733070832417768606:3130345095020099096
[junit] NOTE: test params are: codec=RandomCodecProvider: {=MockRandom, f6=SimpleText, f7=MockRandom, f8=MockSep, f9=Standard, f1=SimpleText, f0=Standard, f3=Standard, f2=MockSep, f5=Pulsing(freqCutoff=12), f4=MockFixedIntBlock(blockSize=552), c=MockVariableIntBlock(baseBlockSize=43), d9=MockVariableIntBlock(baseBlockSize=43), d8=MockRandom, d5=MockSep, d4=Pulsing(freqCutoff=12), d7=MockFixedIntBlock(blockSize=552), d6=MockVariableIntBlock(baseBlockSize=43), d25=MockSep, d0=MockVariableIntBlock(baseBlockSize=43), c29=MockFixedIntBlock(blockSize=552), d24=Pulsing(freqCutoff=12), d1=MockFixedIntBlock(blockSize=552), c28=Standard, d23=SimpleText,
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-152: Attachment: LUCENE-152_optimization.patch very minor optimization to avoid a char[] allocation per stemmed word.
> [PATCH] KStem for Lucene > > > Key: LUCENE-152 > URL: https://issues.apache.org/jira/browse/LUCENE-152 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis > Environment: Operating System: other > Platform: Other >Reporter: Otis Gospodnetic >Assignee: Robert Muir >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: LUCENE-152.patch, LUCENE-152_optimization.patch, > kstemTestData.zip, lucid_kstem.tgz > > > September 10th 2003 contribution from "Sergio Guzman-Lara" > > Original email: > Hi all, > I have ported the kstem stemmer to Java and incorporated it into > Lucene. You can get the source code (Kstem.jar) from the following website: > http://ciir.cs.umass.edu/downloads/ > Just click on "KStem Java Implementation" (you will need to register > your e-mail, for free of course, with the CIIR -- Center for Intelligent > Information Retrieval, UMass -- and get an access code). > Content of Kstem.jar: > java/org/apache/lucene/analysis/KStemData1.java > java/org/apache/lucene/analysis/KStemData2.java > java/org/apache/lucene/analysis/KStemData3.java > java/org/apache/lucene/analysis/KStemData4.java > java/org/apache/lucene/analysis/KStemData5.java > java/org/apache/lucene/analysis/KStemData6.java > java/org/apache/lucene/analysis/KStemData7.java > java/org/apache/lucene/analysis/KStemData8.java > java/org/apache/lucene/analysis/KStemFilter.java > java/org/apache/lucene/analysis/KStemmer.java > KStemData1.java, ..., KStemData8.java contain several lists of words > used by Kstem. > KStemmer.java implements the Kstem algorithm. > KStemFilter.java extends TokenFilter, applying Kstem. > To compile, > unjar the file Kstem.jar into Lucene's "src" directory and compile it > there.
> What is Kstem? > A stemmer designed by Bob Krovetz (for more information see > http://ciir.cs.umass.edu/pubfiles/ir-35.pdf). > Copyright issues > This is open source. The actual license agreement is included at the > top of every source file. > Any comments/questions/suggestions are welcome, > Sergio Guzman-Lara > Senior Research Fellow > CIIR UMass
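Yonik's "avoid a char[] allocation per stemmed word" optimization is the classic grow-only scratch-buffer pattern. The sketch below illustrates the idea in isolation; it is not the KStemmer code, and the class and method names are made up:

```java
public class ScratchBuffer {
    private char[] buf = new char[32];

    // Copy a word into a reused, grow-only buffer instead of allocating a
    // fresh char[] per word. Growing at least doubles the capacity, so after
    // warm-up repeated calls perform no allocation at all -- the shape of
    // the per-word-allocation optimization discussed above.
    public char[] fill(String word) {
        if (buf.length < word.length()) {
            buf = new char[Math.max(word.length(), buf.length * 2)];
        }
        word.getChars(0, word.length(), buf, 0);
        return buf;
    }
}
```

Callers must treat the returned array as valid only until the next `fill` call, which is the usual trade-off of buffer reuse; a token filter that writes straight into the token's term buffer follows the same principle.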
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #148: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/148/ No tests ran. Build Log (for compile errors): [...truncated 8340 lines...]
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-152: --- Attachment: LUCENE-152_alt.patch why create strings either?
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047160#comment-13047160 ] Simon Willnauer commented on SOLR-1431: --- I think this patch looks good, Mark; I think we should commit this soon. Simon
> CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Noble Paul > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue.
[jira] [Assigned] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-1431: - Assignee: Mark Miller (was: Noble Paul)
[jira] [Updated] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1431: -- Fix Version/s: 4.0
[jira] [Commented] (LUCENE-3188) The class org.apache.lucene.index.IndexSplitter from the contrib directory creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047164#comment-13047164 ] Steven Rowe commented on LUCENE-3188: - Hi Ivan, Your submissions should be in the form of a patch (for an explanation see e.g. http://en.wikipedia.org/wiki/Patch_%28computing%29). To generate a patch, after you make modifications in a locally checked-out Subversion working copy, use the shell command "svn diff" at the top level and redirect its output to a file named for the issue you want to attach it to, with the extension ".patch", e.g.: {{svn diff > ../LUCENE-3188.patch}}. Also, when you attached the two files to this issue, you did not click the radio button next to the text "Grant license to ASF for inclusion in ASF works (as per the Apache License §5)". You must do this for the Lucene project to be able to use code you contribute. When you attach your patch, please click the radio button indicating you grant license to the ASF. (I haven't looked at your code yet for this reason.) Steve
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1260 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-docvalues-branch/1260/
2 tests failed.

REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety
Error Message: Error occurred in thread Thread-175: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
Stack Trace:
junit.framework.AssertionFailedError: Error occurred in thread Thread-175: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-docvalues-branch/checkout/lucene/build/test/4/test5458054592tmp/_d_3.pos (Too many open files in system)
at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822)

REGRESSION: org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory
Error Message: IndexSearcher hit unexpected exceptions
Stack Trace:
junit.framework.AssertionFailedError: IndexSearcher hit unexpected exceptions
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321)
at org.apache.lucene.store.TestLockFactory._testStressLocks(TestLockFactory.java:165)
at org.apache.lucene.store.TestLockFactory.testStressLocksNativeFSLockFactory(TestLockFactory.java:144)

Build Log (for compile errors): [...truncated 3361 lines...]
[jira] [Updated] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-152: Attachment: LUCENE-152_optimization.patch bq. why create strings either? Good point. I assume you mean something like this patch?
[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047168#comment-13047168 ] Robert Muir commented on LUCENE-152: It looks good... I think it's the same as the patch I uploaded (_alt.patch)... only I used the .append syntactic sugar.
[jira] [Commented] (LUCENE-152) [PATCH] KStem for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047171#comment-13047171 ] Yonik Seeley commented on LUCENE-152: - bq. i think its the same as the patch i uploaded D'oh! I hate that the "All" tab in JIRA isn't selected by default (and hence one doesn't see stuff like file uploads ;-) > [PATCH] KStem for Lucene > > > Key: LUCENE-152 > URL: https://issues.apache.org/jira/browse/LUCENE-152 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/analysis > Environment: Operating System: other > Platform: Other >Reporter: Otis Gospodnetic >Assignee: Robert Muir >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: LUCENE-152.patch, LUCENE-152_alt.patch, > LUCENE-152_optimization.patch, LUCENE-152_optimization.patch, > kstemTestData.zip, lucid_kstem.tgz > > > September 10th 2003 contributionn from "Sergio Guzman-Lara" > > Original email: > Hi all, > I have ported the kstem stemmer to Java and incorporated it to > Lucene. You can get the source code (Kstem.jar) from the following website: > http://ciir.cs.umass.edu/downloads/ > Just click on "KStem Java Implementation" (you will need to register > your e-mail, for free of course, with the CIIR --Center for Intelligent > Information Retrieval, UMass -- and get an access code). 
[jira] [Resolved] (LUCENE-3108) Land DocValues on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3108. - Resolution: Fixed Reintegrated, Tested, Committed to trunk in revision 1134311 thanks guys, its in eventually! > Land DocValues on trunk > --- > > Key: LUCENE-3108 > URL: https://issues.apache.org/jira/browse/LUCENE-3108 > Project: Lucene - Java > Issue Type: Task > Components: core/index, core/search, core/store >Affects Versions: CSF branch, 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 4.0 > > Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch, > LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch > > > Its time to move another feature from branch to trunk. I want to start this > process now while still a couple of issues remain on the branch. Currently I > am down to a single nocommit (javadocs on DocValues.java) and a couple of > testing TODOs (explicit multithreaded tests and unoptimized with deletions) > but I think those are not worth separate issues so we can resolve them as we > go. > The already created issues (LUCENE-3075 and LUCENE-3074) should not block > this process here IMO, we can fix them once we are on trunk. > Here is a quick feature overview of what has been implemented: > * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, > Bytes (fixed / variable size each in sorted, straight and deref variations) > * Integration into Flex-API, Codec provides a > PerDocConsumer->DocValuesConsumer (write) / PerDocValues->DocValues (read) > * By-Default enabled in all codecs except of PreFlex > * Follows other flex-API patterns like non-segment reader throw UOE forcing > MultiPerDocValues if on DirReader etc. > * Integration into IndexWriter, FieldInfos etc. 
> * Random-testing enabled via RandomIW - injecting random DocValues into > documents > * Basic checks in CheckIndex (which runs after each test) > * FieldComparator for int and float variants (Sorting, currently directly > integrated into SortField, this might go into a separate DocValuesSortField > eventually) > * Extended TestSort for DocValues > * RAM-Resident random access API plus on-disk DocValuesEnum (currently only > sequential access) -> Source.java / DocValuesEnum.java > * Extensible Cache implementation for RAM-Resident DocValues (by-default > loaded into RAM only once and freed once IR is closed) -> SourceCache.java > > PS: Currently the RAM resident API is named Source (Source.java) which seems > too generic. I think we should rename it into RamDocValues or something like > that, suggestion welcome! > Any comments, questions (rants :)) are very much appreciated. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047177#comment-13047177 ] Mark Miller commented on SOLR-1431: --- I've got to look a little closer here - there was a conflict on trunk - naively just fixed it to compile and now I'm getting errors that are perhaps ip6 related? Need to investigate. {quote} java.lang.IllegalArgumentException: Invalid uri 'http://[::1]:2/solr|localhost:53574/solr/select': escaped absolute path not valid {quote} > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
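The IllegalArgumentException above comes from the pipe-separated shards value being handled as one URI: '|' is not a legal URI character, while a bracketed IPv6 literal like [::1] is. A minimal stdlib-only sketch of that distinction (helper names are hypothetical, and java.net.URI is used here instead of the HttpClient class that threw the original exception):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Shows why "http://[::1]:2/solr|localhost:53574/solr/select" fails as a
// single URI while its '|'-separated parts are individually parseable.
public class ShardUriDemo {

    // Split a Solr-style shards value on '|' and check each part parses as a URI.
    static boolean allPartsValid(String shards) {
        for (String part : shards.split("\\|")) {
            // shard entries may omit the scheme; assume http for validation
            String candidate = part.contains("://") ? part : "http://" + part;
            if (!isValidUri(candidate)) return false;
        }
        return true;
    }

    static boolean isValidUri(String s) {
        try {
            new URI(s);   // enforces RFC 3986; '|' in the path is rejected
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shards = "http://[::1]:2/solr|localhost:53574/solr/select";
        System.out.println(isValidUri(shards));     // false: '|' is illegal
        System.out.println(allPartsValid(shards));  // true: each part is fine
        System.out.println(isValidUri("http://[::1]:2/solr")); // true: IPv6 ok
    }
}
```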
[Lucene.Net] [jira] [Commented] (LUCENENET-423) QueryParser differences between Java and .NET
[ https://issues.apache.org/jira/browse/LUCENENET-423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047179#comment-13047179 ] Digy commented on LUCENENET-423: You are right, I used a different date string. .Net seems to parse the date-strings better. I would leave it as is. DIGY > QueryParser differences between Java and .NET > - > > Key: LUCENENET-423 > URL: https://issues.apache.org/jira/browse/LUCENENET-423 > Project: Lucene.Net > Issue Type: Bug >Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4, Lucene.Net 2.9.4g >Reporter: Christopher Currens > > When trying to do a RangeQuery that uses dates in a certain format, .NET > behaves differently from its Java counterpart. The code is the same between > them, but as far as I can tell, it appears that it is a difference in the way > Java parses dates vs how .NET parses dates. To reproduce: > {code:java} > var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, > "FullText", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)); > var query = queryParser.Parse("Field:[2001-01-17 TO 2001-01-20]"); > {code} > You'll notice that query looks like the old DateField format (eg > "0g1d64542"). If you do the same query in Java (or Luke), you'll notice the > query gets parsed as if it were a RangeQuery of string. AFAIK, Java cannot > parse a string formatted in that way. If you change the string to use / > instead of - in the java, you'll get one that uses DateResolutions and > DateTools.DateToString(). > It seems an appropriate fix for this, if we wanted to keep this behavior > similar to Java, would be to write our own DateTime parser that behaved the > same way to Java's parser. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047185#comment-13047185 ] Mark Miller commented on SOLR-1431: --- So my bad - looks like this patch is for 3.x - need to do it for 4 and port back. > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I think I have successfully threaded IOContext to the codecs and index package wherever required. There might be instances where I have used Context.Default wrongly. I'll begin adding documentation. In the NRTCachingDir.doCacheWrite method, where IOContext is used, might it lead to a bug if the context's OnceMergeInfo field is null? Should cases like those be added to the docs? > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
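The IOContext idea quoted above (callers describe why an open/create happens, and the Directory picks buffer sizes or flags from that hint) can be sketched without any Lucene code. Everything below is illustrative only: the names and the concrete buffer sizes are assumptions, not the API that was actually committed:

```java
// Illustrative sketch of an IOContext-style hint object: callers state the
// purpose of an I/O operation; the directory chooses tuning parameters.
public class IOContextSketch {

    enum Purpose { DEFAULT, MERGE, FLUSH }

    static final class Context {
        final Purpose purpose;
        final boolean sequential;   // candidate for O_DIRECT / readahead hints
        Context(Purpose purpose, boolean sequential) {
            this.purpose = purpose;
            this.sequential = sequential;
        }
    }

    // A directory implementation could pick a read buffer size from the hint:
    static int readBufferSize(Context ctx) {
        switch (ctx.purpose) {
            case MERGE: return 4096;   // larger sequential reads while merging
            default:    return 1024;   // smaller buffers for random search access
        }
    }

    public static void main(String[] args) {
        System.out.println(readBufferSize(new Context(Purpose.MERGE, true)));
        System.out.println(readBufferSize(new Context(Purpose.DEFAULT, false)));
    }
}
```

The point of the pattern is that a reader opened for merging and one opened for searching stop sharing tuning decisions, even if they touch the same files.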
Heads Up - Index File Format Change on Trunk
Hey folks, I just committed LUCENE-3108 (Landing DocValues on Trunk), which adds a byte to FieldInfo. If you are running on trunk you must (or at least should) re-index any trunk indexes once you update to the latest trunk. It's likely that if you open up old trunk (4.0) indexes, you will get an exception related to reading past EOF. Simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047208#comment-13047208 ] Simon Willnauer commented on LUCENE-2878: - It doesn't seem that this issue is worth staying so tightly coupled to the bulkpostings branch. I originally did this on bulk postings since it had support for positions in TermScorer (new bulk API), whereas on trunk we don't have positions there at all. Yet, I think bulk postings should be sorted out separately and we should rather move this over to trunk. On trunk we can get rid of all the low-level bulk API hacks in the patch. The only thing that is missing here is a TermScorer that can score based on positions / payloads. I think, since we have ScorerContext and given how this works here in the patch, we can simply implement a TermScorer that works on DocsEnumAndPositions and swap it in once positions are requested. I think I can move this over to trunk soon. > Allow Scorer to expose positions and payloads aka. nuke spans > -- > > Key: LUCENE-2878 > URL: https://issues.apache.org/jira/browse/LUCENE-2878 > Project: Lucene - Java > Issue Type: Improvement > Components: core/search >Affects Versions: Bulk Postings branch >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, > LUCENE-2878.patch > > > Currently we have two somewhat separate types of queries: the ones which can > make use of positions (mainly spans) and payloads (spans). Yet Span*Query > doesn't really do scoring comparable to what other queries do, and at the end > of the day they duplicate a lot of code all over Lucene. Span*Queries are > also limited to other Span*Query instances, such that you can not use a > TermQuery or a BooleanQuery with SpanNear or anything like that. 
> Besides the Span*Query limitation, other queries lack a quite interesting > feature: they cannot score based on term proximity, since scorers don't > expose any positional information. All those problems bugged me for a while > now, so I started working on that using the bulkpostings API. I would have done > that first cut on trunk, but there TermScorer works on a BlockReader that does not > expose positions, while the one in this branch does. I started adding a new > Positions class which users can pull from a scorer; to prevent unnecessary > positions enums I added ScorerContext#needsPositions and eventually > ScorerContext#needsPayloads to create the corresponding enum on demand. Yet, > currently only TermQuery / TermScorer implements this API and others simply > return null instead. > To show that the API really works, and that our BulkPostings work fine with > positions too, I cut over TermSpanQuery to use a TermScorer under the hood and > nuked TermSpans entirely. A nice side effect of this was that the Position > BulkReading implementation got some exercise, which now :) all works with > positions, while payloads for bulk reading are kind of experimental in the > patch and only work with the Standard codec. > So all spans now work on top of TermScorer (I truly hate spans since today), > including the ones that need payloads (StandardCodec ONLY)!! I didn't bother > to implement the other codecs yet since I want to get feedback on the API and > on this first cut before I go on with it. I will upload the corresponding > patch in a minute. > I also had to cut over SpanQuery.getSpans(IR) to > SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk > first, but after that pain today I need a break first :). > The patch passes all core tests > (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't > look into the MemoryIndex BulkPostings API yet) -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #145: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/145/ No tests ran. Build Log (for compile errors): [...truncated 7512 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2582) Use uniqueKey for error log in UIMAUpdateRequestProcessor
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2582: - Priority: Minor (was: Major) Fix Version/s: 4.0 Assignee: Koji Sekiguchi Issue Type: Improvement (was: Bug) Summary: Use uniqueKey for error log in UIMAUpdateRequestProcessor (was: UIMAUpdateRequestProcessor error handling with small texts) Changed the issue type to improvement because the "bug part" of this issue is duplicate of SOLR-2579, which has been fixed. > Use uniqueKey for error log in UIMAUpdateRequestProcessor > -- > > Key: SOLR-2582 > URL: https://issues.apache.org/jira/browse/SOLR-2582 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.2 >Reporter: Tommaso Teofili >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.3, 4.0 > > > In UIMAUpdateRequestProcessor the catch block in processAdd() method can have > a StringIndexOutOfBoundsException while composing the error message if the > logging field is not set and the text being processed is shorter than 100 > chars (...append(text.substring(0, 100))...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
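The exception described above is the usual substring pitfall: text.substring(0, 100) throws StringIndexOutOfBoundsException whenever the text is shorter than 100 characters. A minimal sketch of the guard (a hypothetical helper, not the patch attached to this issue):

```java
// Safe truncation for log messages: never ask substring() for more
// characters than the string actually has.
public class SafeTruncate {

    static String truncate(String text, int max) {
        // clamp the end index to the string's own length
        return text.substring(0, Math.min(max, text.length()));
    }

    public static void main(String[] args) {
        System.out.println(truncate("short", 100));              // "short", no exception
        System.out.println(truncate("x".repeat(200), 100).length()); // 100
    }
}
```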
[jira] [Updated] (SOLR-2582) Use uniqueKey for error log in UIMAUpdateRequestProcessor
[ https://issues.apache.org/jira/browse/SOLR-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2582: - Attachment: SOLR-2582.patch > Use uniqueKey for error log in UIMAUpdateRequestProcessor > -- > > Key: SOLR-2582 > URL: https://issues.apache.org/jira/browse/SOLR-2582 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.2 >Reporter: Tommaso Teofili >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: SOLR-2582.patch > > > In UIMAUpdateRequestProcessor the catch block in processAdd() method can have > a StringIndexOutOfBoundsException while composing the error message if the > logging field is not set and the text being processed is shorter than 100 > chars (...append(text.substring(0, 100))...). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2793: -- Attachment: LUCENE-2793.patch I had messed up the patch using eclipse. This should be ok. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047234#comment-13047234 ] Simon Willnauer commented on LUCENE-2793: - Varun, your patch doesn't apply cleanly to my latest trunk. I think you should update your local copy again! I have trouble understanding how you use MergeInfo etc. I figure there might be a misunderstanding, so here is roughly what I had in mind:
{code}
Index: lucene/src/java/org/apache/lucene/index/MergePolicy.java
===
--- lucene/src/java/org/apache/lucene/index/MergePolicy.java (revision 1134335)
+++ lucene/src/java/org/apache/lucene/index/MergePolicy.java (working copy)
@@ -64,7 +64,7 @@
    * subset of segments to be merged as well as whether the
    * new segment should use the compound file format. */
-  public static class OneMerge {
+  public static class OneMerge extends MergeInfo {
     SegmentInfo info;                // used by IndexWriter
     boolean optimize;                // used by IndexWriter
@@ -72,25 +72,26 @@
     long mergeGen;                   // used by IndexWriter
     boolean isExternal;              // used by IndexWriter
     int maxNumSegmentsOptimize;      // used by IndexWriter
-    public long estimatedMergeBytes; // used by IndexWriter
     List readers;                    // used by IndexWriter
     List readerClones;               // used by IndexWriter
-    public final List segments;
-    public final int totalDocCount;
+    public final List segments = new ArrayList();
     boolean aborted;
     Throwable error;
     boolean paused;

     public OneMerge(List segments) {
+      super(getSegments(segments));
+    }
+
+    private static int getSegments(List segments) {
       if (0 == segments.size())
         throw new RuntimeException("segments must include at least one segment");
       // clone the list, as the in list may be based off original SegmentInfos and may be modified
-      this.segments = new ArrayList(segments);
       int count = 0;
       for(SegmentInfo info : segments) {
         count += info.docCount;
       }
-      totalDocCount = count;
+      return count;
     }

     /** Record that an exception occurred while executing
{code} > Directory createOutput and
openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: core/store >Reporter: Michael McCandless >Assignee: Varun Thacker > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, > LUCENE-2793.patch, LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3188) The class from the contrib directory, org.apache.lucene.index.IndexSplitter, creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047245#comment-13047245 ] Steven Rowe commented on LUCENE-3188: - This is a better Wikipedia article on (source code) patching than the one I gave above: http://en.wikipedia.org/wiki/Patch_%28Unix%29 > The class from the contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: The bug is present in all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created > next in this index. > If some segment of the index already has this name, this results > either in the impossibility of creating a new segment or in the creation of a corrupted > segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
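The counter invariant at issue can be illustrated in isolation: Lucene segment names encode a number in base 36 after an underscore ("_0", "_a", ...), and the descriptor's next-segment counter must exceed every number still in use, or a new segment is created under a name that already exists. A stdlib-only sketch of that invariant (a hypothetical helper, not the actual IndexSplitter code):

```java
// Segment names encode a number in base 36 after an underscore.
// The descriptor's counter must exceed all of them so new names never collide.
public class SegmentCounter {

    static int nextCounter(String[] segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            // strip the leading '_' and parse the base-36 suffix
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // keeping only segments _3 and _7 of a split index: the new descriptor's
        // counter must be 8, not whatever the source index's counter happened to be
        System.out.println(nextCounter(new String[] {"_3", "_7"})); // 8
        System.out.println(nextCounter(new String[] {"_a"}));       // 11
    }
}
```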
Re: Distributed search capability
> but if instead you just run the query against your subs (rewrite()ing > etc locally), and you merge search results... then you shouldn't see > these issues Maybe we should bring back the 'merge results' part of multi searcher without the query rewrite. On Thu, Jun 9, 2011 at 6:43 PM, Robert Muir wrote: > See https://issues.apache.org/jira/browse/LUCENE-2756 for an example > test case (this sort of thing was reported by users several times) > > in this example, the problem is how the queries are executed, the > thing would rewrite against the individual subs, and then call > Query.combine() to form a "super query", and then run this against all > the subs. > > but if instead you just run the query against your subs (rewrite()ing > etc locally), and you merge search results... then you shouldn't see > these issues. > > On Thu, Jun 9, 2011 at 8:03 PM, Jason Rutherglen > wrote: >>> Yes, and rightfully so - it didn't handle properly some query types, so you >>> would actually get wrong results. >> >> That's bad! >> >>> "roll your own (and contribute it back!)" if you are more advanced ;) >> >> Wouldn't "roll your own" basically mean resurrecting the previous >> implementation of MultiSearcher? Ie, what would be different? >> >> On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki wrote: >>> On 6/10/11 12:10 AM, Jason Rutherglen wrote: Right, if that's not around, one needs to use multi searcher, that's gone too? >>> >>> Yes, and rightfully so - it didn't handle properly some query types, so you >>> would actually get wrong results. >>> >>> For now the answer is "use Solr" if you are less advanced, or "roll your own >>> (and contribute it back!)" if you are more advanced ;) >>> >>> -- >>> Best regards, >>> Andrzej Bialecki <>< >>> ___. 
___ ___ ___ _ _ __ >>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web >>> ___|||__|| \| || | Embedded Unix, System Integration >>> http://www.sigram.com Contact: info at sigram dot com >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
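The "run the query against your subs and merge search results" approach above is, at its core, a k-way merge of per-sub top-N hit lists by score. A self-contained sketch of just the merge step (hypothetical types; not the MultiSearcher code under discussion):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Merge per-sub-searcher top-N hit lists into one global top-N by score.
// Each sub runs (and rewrites) the query locally; only the results are merged.
public class MergeTopDocs {

    static final class Hit {
        final int shard, doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard; this.doc = doc; this.score = score;
        }
    }

    static List<Hit> merge(List<List<Hit>> perShard, int n) {
        // max-heap on score: poll() yields the highest-scoring remaining hit
        PriorityQueue<Hit> pq =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> -h.score));
        for (List<Hit> hits : perShard) pq.addAll(hits);
        List<Hit> out = new ArrayList<>();
        while (!pq.isEmpty() && out.size() < n) out.add(pq.poll());
        return out;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
            List.of(new Hit(0, 5, 2.0f), new Hit(0, 9, 1.0f)),
            List.of(new Hit(1, 2, 3.0f)));
        List<Hit> top = merge(shards, 2);
        System.out.println(top.get(0).score + " " + top.get(1).score); // 3.0 2.0
    }
}
```

Note this sidesteps the rewrite problem entirely: no "super query" is ever built, so per-sub term statistics and query types never have to be reconciled at the query level.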
Re: Distributed search capability
On Fri, Jun 10, 2011 at 12:04 PM, Jason Rutherglen wrote: > > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. > and how exactly? If this was easy, we wouldn't have to do https://issues.apache.org/jira/browse/LUCENE-2837 I looked at this stupid multisearcher problem for way too much time myself and I totally agreed the only proper bugfix was the "nuclear" option (ridding of multisearcher entirely). - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] Score(collector) called for each subReader - but not what I need
As I previously tried to explain, I have a custom query for some pre-cached terms, which I load into RAM in an efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process the incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially what happens now if I have a search is that it collects the same documents N times, once for each segment. Is there any way to combine different Scorers/Collectors such that I can control when collection is enumerated by multiple sub-readers and when it is not? This all worked in a previous version of Lucene because enumerating sub-indexes (segments) was pushed to a lower level inside the Lucene API, and now it is elevated to a higher level. Thanks Bob On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > I found the problem. The problem is that I have a custom "query optimizer" > that replaces certain TermQuerys within a Boolean query with a custom > Query, and this query has its own weight/scorer that retrieves matching > documents from an in-memory cache (that is not Lucene backed). But it > looks like my custom hit collectors are now wrapped in a HitCollectorWrapper > which assumes Collect() needs to be called for multiple segments - so it is adding > a start offset to the doc ID that comes from my custom query implementation. > I looked at the new Collector class and it seems it works the same way > (assumes it needs to set the next index reader with some offset). How can I > make my custom query work with the new API (so that there is basically a > single "segment" in RAM that my query uses, but other query clauses in the > same Boolean query still use multiple Lucene segments)? I am sure that is not > clear and will try to provide more detail soon. 
> > Thanks > Bob > > > On Jun 9, 2011, at 1:48 PM, Digy wrote: > >> Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect the >> problem. >> DIGY >> >> -Original Message- >> From: Robert Stewart [mailto:robert_stew...@epam.com] >> Sent: Thursday, June 09, 2011 8:40 PM >> To: >> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >> >> I tried converting index using IndexWriter as follows: >> >> Lucene.Net.Index.IndexWriter writer = new IndexWriter(TestIndexPath+"_2.9", >> new Lucene.Net.Analysis.KeywordAnalyzer()); >> >> writer.SetMaxBufferedDocs(2); >> writer.SetMaxMergeDocs(100); >> writer.SetMergeFactor(2); >> >> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new >> Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); >> >> writer.Commit(); >> >> >> That seems to work (I get what looks like a valid index directory at least). >> >> But still when I run some tests using IndexSearcher I get the same problem >> (I get documents in Collect() which are larger than IndexReader.MaxDoc()). >> Any idea what the problem could be? >> >> BTW, this is a problem because I lookup some fields (date ranges, etc.) in >> some custom collectors which filter out documents, and it assumes I dont get >> any documents larger than maxDoc. >> >> Thanks, >> Bob >> >> >> On Jun 9, 2011, at 12:37 PM, Digy wrote: >> >>> One more point, some write operations using Lucene.Net 2.9.2 (add, delete, >>> optimize etc.) upgrades automatically your index to 2.9.2. >>> But if your index is somehow corrupted(eg, due to some bug in 1.9) this >> may >>> result in data loss. >>> >>> DIGY >>> >>> -Original Message- >>> From: Robert Stewart [mailto:robert_stew...@epam.com] >>> Sent: Thursday, June 09, 2011 7:06 PM >>> To: lucene-net-...@lucene.apache.org >>> Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >>> >>> I have a Lucene index created with Lucene.Net 1.9. I have a multi-segment >>> index (non-optimized). 
When I run Lucene.Net 2.9.2 on top of that index, >> I >>> get IndexOutOfRange exceptions in my collectors. It is giving me document >>> IDs that are larger than maxDoc. >>> >>> My index contains 377831 documents, and IndexReader.MaxDoc() is returning >>> 377831, but I get documents from Collect() with large values (for instance >>> 379018). Is an index built with Lucene.Net 1.9 compatible with 2.9.2? If >>> not, is there some way I can convert it (in production we have many >> indexes >>> containing about 200 million docs so I'd rather convert existing indexes >>> than rebuild them). >>> >>> Thanks >>> Bob >>> >> >
Re: Distributed search capability
I think we only need to resurrect the merge score/field-docs code, in its own class. E.g., each sub-node is expected to create its own score/field-docs; then the merge code is centralized. > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. This is how Solr works today. On Fri, Jun 10, 2011 at 9:09 AM, Robert Muir wrote: > On Fri, Jun 10, 2011 at 12:04 PM, Jason Rutherglen > wrote: >> >> Maybe we should bring back the 'merge results' part of multi searcher >> without the query rewrite. >> > > and how exactly? If this was easy, we wouldn't have to do > https://issues.apache.org/jira/browse/LUCENE-2837 > > I looked at this stupid multisearcher problem for way too much time > myself and I totally agreed the only proper bugfix was the "nuclear" > option (ridding of multisearcher entirely). > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen wrote: > I think we only need to resurrect the merge score/field-docs code, in > it's own class. Eg, each sub-node is expected to create it's own > score/field-docs, then the merge code is centralized. > >> Maybe we should bring back the 'merge results' part of multi searcher >> without the query rewrite. > > This is how Solr works today. > no its not, the problem with multisearcher was it was too low of a level. its fine to have some higher-level class to support this crap, but it shouldnt be some transparent searcher. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Heads Up - Index File Format Change on Trunk
Simon can you also email java-user@ and solr-user@? Seems good to over-communicate when trunk index format changes... Mike McCandless http://blog.mikemccandless.com On Fri, Jun 10, 2011 at 10:01 AM, Simon Willnauer wrote: > Hey folks, > > I just committed LUCENE-3108 (Landing DocValues on Trunk) which adds a > byte to FieldInfo. > If you are running on trunk you must / should re-index any trunk > indexes once you update to the latest trunk. > > its likely if you open up old trunk (4.0) indexes, you will get an > exception related to Read Past EOF. > > Simon > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3188) The class from contrib directory org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Dimitrov Vasilev updated LUCENE-3188: -- Attachment: LUCENE-3188.patch The file LUCENE-3188.patch contains the changes to IndexSplitter needed to fix this issue. > The class from contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created next in > this index is wrong. > If one of the index's existing segments already has this name, the result is > either the impossibility of creating a new segment or the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
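For context on the invariant the patch has to restore: Lucene segment names are a leading underscore followed by a base-36 counter, and the "next segment name" counter stored in the segments descriptor must point past every existing segment. A hedged sketch of that rule (SegmentCounterDemo is an illustrative name, not code from the patch):

```java
// Illustrative sketch, not the IndexSplitter patch itself: the counter
// written to the segments file must exceed every existing segment number,
// otherwise a future flush reuses an existing segment name.
import java.util.Arrays;
import java.util.List;

public class SegmentCounterDemo {
    // Lucene segment names look like "_0", "_a", "_b1" (base-36 after "_").
    static int nextCounter(List<String> segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            // decode the base-36 part after the leading underscore
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;  // first unused segment number
    }

    public static void main(String[] args) {
        // "_b1" decodes to 11*36 + 1 = 397, so the next counter must be 398.
        System.out.println(nextCounter(Arrays.asList("_0", "_a", "_b1")));  // 398
    }
}
```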
Re: Distributed search capability
> its fine to have some higher-level class to support this crap, but it > shouldnt be some transparent searcher. I'll create a patch and post to a Jira. On a side note, for multi threaded calls I noticed there's a lock on the PQ in IndexSearcher, is the performance of that OK? On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: > On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen > wrote: >> I think we only need to resurrect the merge score/field-docs code, in >> it's own class. Eg, each sub-node is expected to create it's own >> score/field-docs, then the merge code is centralized. >> >>> Maybe we should bring back the 'merge results' part of multi searcher >>> without the query rewrite. >> >> This is how Solr works today. >> > > no its not, the problem with multisearcher was it was too low of a level. > > its fine to have some higher-level class to support this crap, but it > shouldnt be some transparent searcher. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
I'm actually working on something like this, basically a utility method to merge N TopDocs into 1. I want to do this for grouping as well to make it easy to do grouping across shards. Mike McCandless http://blog.mikemccandless.com On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen wrote: >> its fine to have some higher-level class to support this crap, but it >> shouldnt be some transparent searcher. > > I'll create a patch and post to a Jira. > > On a side note, for multi threaded calls I noticed there's a lock on > the PQ in IndexSearcher, is the performance of that OK? > > On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: >> On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen >> wrote: >>> I think we only need to resurrect the merge score/field-docs code, in >>> it's own class. Eg, each sub-node is expected to create it's own >>> score/field-docs, then the merge code is centralized. >>> Maybe we should bring back the 'merge results' part of multi searcher without the query rewrite. >>> >>> This is how Solr works today. >>> >> >> no its not, the problem with multisearcher was it was too low of a level. >> >> its fine to have some higher-level class to support this crap, but it >> shouldnt be some transparent searcher. >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
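The centralized merge being discussed can be sketched with plain collections; the classes below are simplified stand-ins for ScoreDoc/TopDocs, not Mike's actual utility:

```java
// Self-contained sketch of merging per-shard top hits into one global
// top-N (hypothetical ShardHit type; not Lucene's TopDocs API).
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeTopDocs {
    record ShardHit(int shard, int doc, float score) {}

    // Each shard's list is assumed already sorted by descending score.
    static List<ShardHit> merge(int topN, List<List<ShardHit>> shards) {
        // Queue entries are {shardIndex, positionInShard}; highest score first,
        // ties broken by shard index so the merge is deterministic.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
            float sa = shards.get(a[0]).get(a[1]).score();
            float sb = shards.get(b[0]).get(b[1]).score();
            return sa != sb ? Float.compare(sb, sa) : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) pq.add(new int[] {s, 0});
        }
        List<ShardHit> out = new ArrayList<>();
        while (out.size() < topN && !pq.isEmpty()) {
            int[] head = pq.poll();
            out.add(shards.get(head[0]).get(head[1]));
            if (head[1] + 1 < shards.get(head[0]).size()) {
                pq.add(new int[] {head[0], head[1] + 1});  // advance that shard
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ShardHit> s0 = List.of(new ShardHit(0, 5, 3.0f), new ShardHit(0, 9, 1.0f));
        List<ShardHit> s1 = List.of(new ShardHit(1, 2, 2.5f), new ShardHit(1, 7, 2.0f));
        System.out.println(merge(3, List.of(s0, s1)));
    }
}
```

Each shard returns its own sorted top hits; the priority queue then interleaves them by score until the global top N is filled, which is the "merge code is centralized" part of the proposal.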
[jira] [Commented] (SOLR-1431) CommComponent abstracted
[ https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047275#comment-13047275 ] Jason Rutherglen commented on SOLR-1431: I just downloaded http://svn.apache.org/repos/asf/lucene/dev/trunk and applied the patch, and test-core passed. However the patch command mentioned specific hunks, though there was no .rej file. > CommComponent abstracted > > > Key: SOLR-1431 > URL: https://issues.apache.org/jira/browse/SOLR-1431 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 4.0 >Reporter: Jason Rutherglen >Assignee: Mark Miller > Fix For: 4.0 > > Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, > SOLR-1431.patch, SOLR-1431.patch > > > We'll abstract CommComponent in this issue. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [Lucene.Net] Score(collector) called for each subReader - but not what I need
Have you tried to use Lucene.Net as is, before working on optimizing your code? There are a lot of speed improvements in it since 1.9. There is also a Faceted Search project in contrib. (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search ) DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Friday, June 10, 2011 7:14 PM To: Subject: [Lucene.Net] Score(collector) called for each subReader - but not what I need As I previously tried to explain, I have custom query for some pre-cached terms, which I load into RAM in efficient compressed form. I need this for faster searching and also for much faster faceting. So what I do is process incoming query and replace certain sub-queries with my own "CachedTermQuery" objects, which extend Query. Since these are not per-segment, I only want scorer.Score(collector) called once, not once for each segment in my index. Essentially what happens now if I have a search is it collects the same documents N times, 1 time for each segment. Is there anyway to combine different Scorers/Collectors such that I can control when it enumerates collection by multiple sub-readers, and when not to? This all worked in previous version of Lucene because enumerating sub-indexes (segments) was pushed to a lower level inside Lucene API and not it is elevated to a higher level. Thanks Bob On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > I found the problem. The problem is that I have a custom "query optimizer", and that replaces certain TermQuery's within a Boolean query with a custom Query and this query has its own weight/scorer that retrieves matching documents from an in-memory cache (and that is not Lucene backed). But it looks like my custom hitcollectors are now wrapped in a HitCollectorWrapper which assumes Collect() needs called for multiple segments - so it is adding a start offset to the doc ID that comes from my custom query implementation. 
[jira] [Commented] (LUCENE-3188) The class from contrib directory org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047279#comment-13047279 ] Ivan Dimitrov Vasilev commented on LUCENE-3188: --- Hi Steve, I attached the patch to this issue as required by Apache (or at least I think so :) ). I do not have a lot of time right now to read the Apache contribution docs in depth, but I saw on the wiki that I should also provide test cases that show the bug, and that they should be in the form of unit tests. I provided a test case that is not in JUnit form but still works. As I saw when submitting the patch, the tests do not need a license grant, so you can use it. I do not have much time now because we are just before a release and this was one of the bugs that I had to fix (and I did). I guess you are the one who discovered this Splitter. Thank you very much for this; you saved me a lot of hard work, because in our previous releases we used a class that generated the segments descriptor file out of given segments, and getting the content of this file right was very difficult. > The class from contrib directory org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > In this case I used Windows Server 2003 and the Java HotSpot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data: namely, > the number representing the name of the segment that would be created next in > this index is wrong. 
> If one of the index's existing segments already has this name, the result is > either the impossibility of creating a new segment or the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
Ok sounds good. Grouping is an interesting distrib case. On Fri, Jun 10, 2011 at 9:27 AM, Michael McCandless wrote: > I'm actually working on something like this, basically a utility > method to merge N TopDocs into 1. I want to do this for grouping as > well to make it easy to do grouping across shards. > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen > wrote: >>> its fine to have some higher-level class to support this crap, but it >>> shouldnt be some transparent searcher. >> >> I'll create a patch and post to a Jira. >> >> On a side note, for multi threaded calls I noticed there's a lock on >> the PQ in IndexSearcher, is the performance of that OK? >> >> On Fri, Jun 10, 2011 at 9:21 AM, Robert Muir wrote: >>> On Fri, Jun 10, 2011 at 12:18 PM, Jason Rutherglen >>> wrote: I think we only need to resurrect the merge score/field-docs code, in it's own class. Eg, each sub-node is expected to create it's own score/field-docs, then the merge code is centralized. > Maybe we should bring back the 'merge results' part of multi searcher > without the query rewrite. This is how Solr works today. >>> >>> no its not, the problem with multisearcher was it was too low of a level. >>> >>> its fine to have some higher-level class to support this crap, but it >>> shouldnt be some transparent searcher. >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047291#comment-13047291 ] Uwe Schindler commented on LUCENE-1768: --- Hi Vinicius, if you want the code to be committed later, you should check the license box ("Grant license to ASF for inclusion in ASF works (as per the Apache License §5)"); otherwise we will not be able to commit it to the main repository. If you want us to commit the patch only at the end of GSoC, it's enough to check this box on your final submission, but it should be noted that we may commit minor parts of the work even before then (once you are at a state where it is 'usable' and passes existing tests). A second commit could, e.g., add sophisticated tests, and so on. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > future, to automatically create NumericRangeQuery for different numeric > types? It would then be possible to index a numeric value > (double,float,long,int) using NumericField and then the query parser knows, > which type of field this is and so it correctly creates a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract if a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. 
> The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) conversion of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is an MTQ). > Another thing is a change in Date semantics. There are some strange flags in > the current parser that tell it how to handle dates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
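The conversion step the issue describes (turning a user-entered range string into numeric bounds via Number.valueOf before handing them to a NumericRangeQuery) might look roughly like this; the class, record, and exact syntax handling are illustrative assumptions, not the actual query parser code:

```java
// Hedged sketch of parsing range strings like "[1.567..*]" or
// "(1.787..19.5]" into bounds; NumericRangeParse/Range are made-up names.
public class NumericRangeParse {
    record Range(Double min, Double max, boolean minInclusive, boolean maxInclusive) {}

    static Range parse(String s) {
        boolean minIncl = s.charAt(0) == '[';               // '[' inclusive, '(' exclusive
        boolean maxIncl = s.charAt(s.length() - 1) == ']';
        // split the inner "min..max" part on the literal ".." separator
        String[] parts = s.substring(1, s.length() - 1).split("\\.\\.");
        // "*" means an open bound; otherwise the Number.valueOf-style conversion
        Double min = parts[0].equals("*") ? null : Double.valueOf(parts[0]);
        Double max = parts[1].equals("*") ? null : Double.valueOf(parts[1]);
        return new Range(min, max, minIncl, maxIncl);
    }

    public static void main(String[] args) {
        System.out.println(parse("(1.787..19.5]"));
    }
}
```

Once the bounds and inclusiveness flags are known, instantiating the right query type for the field is the only remaining difference from the plain RangeQuery path.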
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047315#comment-13047315 ] Uwe Schindler commented on LUCENE-1768: --- One small thing I have seen after applying your patch: The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > future, to automatically create NumericRangeQuery for different numeric > types? It would then be possible to index a numeric value > (double,float,long,int) using NumericField and then the query parser knows, > which type of field this is and so it correctly creates a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract if a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. > The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) conversion of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is an MTQ). > Another thing is a change in Date semantics. There are some strange flags in > the current parser that tell it how to handle dates. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On 6/10/11 6:27 PM, Michael McCandless wrote: I'm actually working on something like this, basically a utility method to merge N TopDocs into 1. I want to do this for grouping as well to make it easy to do grouping across shards. Mike, The straightforward merge that is used in Solr suffers from incomparable scores (due to the lack of global IDF). See my slides from the Buzzwords. Since we can handle global IDF in local searchers more easily that in Solr then we can reuse that DfCache trick from MultiSearcher. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
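Andrzej's point about incomparable scores can be made concrete with Lucene's classic idf formula, idf(t) = 1 + ln(N / (df + 1)); the document counts below are invented purely for illustration:

```java
// Numeric illustration (made-up counts): the same term can get very
// different IDFs on two shards, so per-shard scores are not directly
// comparable; one global df and N restore a single shared IDF.
public class GlobalIdfDemo {
    // Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        double idfA = idf(1_000_000, 10);        // shard A: term is rare
        double idfB = idf(1_000_000, 100_000);   // shard B: term is common
        // Global view: sum the dfs and doc counts, then compute one idf.
        double idfGlobal = idf(2_000_000, 100_010);
        System.out.printf("local A=%.2f  local B=%.2f  global=%.2f%n",
                          idfA, idfB, idfGlobal);
    }
}
```

With these numbers the term scores with idf of roughly 12.4 on the shard where it is rare and roughly 3.3 where it is common; computing one idf from the merged statistics (about 4.0 here) is exactly the DfCache-style fix, since every shard then scores with the same value.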
[jira] [Issue Comment Edited] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047315#comment-13047315 ] Uwe Schindler edited comment on LUCENE-1768 at 6/10/11 5:48 PM: One small thing I have seen after applying your patch: The code guidelines of Lucene require no TABS but two whitespace to indent. We have a code style available for Eclipse and IDEA in the dev-tools folder (below trunk). You only have to install it. Also you are using Java 6 interface overrides, so the code does not compile with Java 5 (unfortunately this is a bug in Java 6's javac, as it does not complain when in "-source 1.5" mode). In Java 5 compatible code it is not allowed to add @Override to methods implemented for interfaces: {noformat} common.compile-core: [mkdir] Created dir: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] Compiling 175 source files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\build\contrib\queryparser\classes\java [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:182: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\core\nodes\FieldQueryNode.java:187: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\config\NumericFieldConfigListener.java:21: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:17: method does 
not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:32: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\AbstractRangeQueryNode.java:79: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:20: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:25: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:35: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:52: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\nodes\NumericQueryNode.java:57: method does not override a method from its superclass [javac] @Override [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\standard\parser\JavaCharStream.java:367: 
warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] public int getEndColumn() { [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:34: warning: [dep-ann] deprecated name isnt annotated with @Deprecated [javac] int getColumn(); [javac] ^ [javac] C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr2\lucene\contrib\queryparser\src\java\org\apache\lucene\queryParser\surround\parser\CharStream.java:41: warning: [dep-a
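The incompatibility Uwe hit boils down to a small pattern; this stand-alone example (not code from the patch) compiles on Java 6 and later, while a strict Java 5 compiler rejects the annotation:

```java
// @Override on a method that implements an interface method is legal
// from Java 6 on, but Java 5 compilers reject it; javac in "-source 1.5"
// mode fails to complain, which is the javac bug Uwe mentions.
public class OverrideDemo {
    interface Greeter {
        String greet(String name);
    }

    static class EnglishGreeter implements Greeter {
        @Override  // fine on Java 6+; a real Java 5 compiler errors here
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    public static void main(String[] args) {
        System.out.println(new EnglishGreeter().greet("Lucene"));  // Hello, Lucene
    }
}
```

The fix for Java 5 compatible code is simply to drop @Override from methods that implement interface (rather than superclass) methods.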
Re: Distributed search capability
Out of curiosity, how is DF handled with the new automaton [regex] queries? On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki wrote: > On 6/10/11 6:27 PM, Michael McCandless wrote: >> >> I'm actually working on something like this, basically a utility >> method to merge N TopDocs into 1. I want to do this for grouping as >> well to make it easy to do grouping across shards. > > Mike, > > The straightforward merge that is used in Solr suffers from incomparable > scores (due to the lack of global IDF). See my slides from the Buzzwords. > Since we can handle global IDF in local searchers more easily that in Solr > then we can reuse that DfCache trick from MultiSearcher. > > > -- > Best regards, > Andrzej Bialecki <>< > ___. ___ ___ ___ _ _ __ > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > ___|||__|| \| || | Embedded Unix, System Integration > http://www.sigram.com Contact: info at sigram dot com > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Heads Up - Index File Format Change on Trunk
On Fri, Jun 10, 2011 at 6:22 PM, Michael McCandless wrote: > Simon can you also email java-user@ and solr-user@? Seems good to > over-communicate when trunk index format changes... good point, done! > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jun 10, 2011 at 10:01 AM, Simon Willnauer > wrote: >> Hey folks, >> >> I just committed LUCENE-3108 (Landing DocValues on Trunk) which adds a >> byte to FieldInfo. >> If you are running on trunk you must / should re-index any trunk >> indexes once you update to the latest trunk. >> >> its likely if you open up old trunk (4.0) indexes, you will get an >> exception related to Read Past EOF. >> >> Simon >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On 6/10/11 7:51 PM, Jason Rutherglen wrote: Out of curiosity, how is DF handled with the new automaton [regex] queries? Automaton is eventually resolved into a list of terms, and the IDF for each term is obtained in the usual way. -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
Hi Jason, Standard MTQ queries have no scoring at all (using ConstantScoreRewrite by default). Exception is FuzzyQuery which has two modes: One using standard BQ TermQuery scoring multiplied with factor calculated from levensthein distance and another one with all TermQueries made constant score and only boosted by levensthein distance. For all MTQ queries you can change the rewrite mode (so you can even rewrite a WildCard query using fuzzy scoring, but that makes no sense at all, because all boost are 1.0). You can also make FuzzyQ constant and respecting all terms that match somehow if you like, the standard is to use a PQ. This is the same in Lucene 3.x. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: Friday, June 10, 2011 7:52 PM > To: dev@lucene.apache.org > Subject: Re: Distributed search capability > > Out of curiosity, how is DF handled with the new automaton [regex] queries? > > On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki wrote: > > On 6/10/11 6:27 PM, Michael McCandless wrote: > >> > >> I'm actually working on something like this, basically a utility > >> method to merge N TopDocs into 1. I want to do this for grouping as > >> well to make it easy to do grouping across shards. > > > > Mike, > > > > The straightforward merge that is used in Solr suffers from > > incomparable scores (due to the lack of global IDF). See my slides from the > Buzzwords. > > Since we can handle global IDF in local searchers more easily that in > > Solr then we can reuse that DfCache trick from MultiSearcher. > > > > > > -- > > Best regards, > > Andrzej Bialecki <>< > > ___. 
___ ___ ___ _ _ __ > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| > > \| || | Embedded Unix, System Integration http://www.sigram.com > > Contact: info at sigram dot com > > > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For > > additional commands, e-mail: dev-h...@lucene.apache.org > > > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
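The distance-based FuzzyQuery mode Uwe describes boosts each matched term by how close it is to the query term. A minimal Python sketch of that idea (the similarity formula mirrors the 3.x FuzzyTermEnum form, similarity = 1 - distance / min(length); function names are hypothetical):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def fuzzy_boost(query_term, matched_term):
    # Boost shrinks as edit distance grows. The constant-score rewrite
    # mode would instead weight every matching term equally.
    d = levenshtein(query_term, matched_term)
    return 1.0 - d / min(len(query_term), len(matched_term))
```

This also shows why rewriting a wildcard query with fuzzy scoring is pointless: every exact wildcard match has distance 0, so all boosts come out 1.0.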
[jira] [Updated] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2564: --- Priority: Blocker (was: Major) Marking this issue as a blocker for Solr 4.0 per McCandless comment in SOLR-2524... {quote} That said, the plan is definitely to get Solr 4.0 cutover to the grouping module; it's just a matter of time. I don't think we should ship 4.0 until we've done so. {quote} > Integrating grouping module into Solr 4.0 > - > > Key: SOLR-2564 > URL: https://issues.apache.org/jira/browse/SOLR-2564 > Project: Solr > Issue Type: Improvement >Reporter: Martijn van Groningen >Assignee: Martijn van Groningen >Priority: Blocker > Fix For: 4.0 > > Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, > SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch > > > Since work on grouping module is going well. I think it is time to wire this > up in Solr. > Besides the current grouping features Solr provides, Solr will then also > support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3188) The class from cotrub directory org.apache.lucene.index.IndexSplitter creates a non correct index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047354#comment-13047354 ] Steven Rowe commented on LUCENE-3188: - {quote} I attached the patch to this issue as required from Apache (or at least I think so ). I do not have lot of time now to read in depth Apache docs about the procedures when contributing but I saw on the wiki that I should provide also test cases that show the bug and they should be in form of unit tests. I provided a test case that is not in JUnit form but still works. As I saw when submitted patch the tests do not need granting license so you can use it. I do not have much time now because we are before release and this was one of the bugs that I should fix (and I did it). {quote} Thanks for reporting and providing a patch. I can take it from here. bq. I guess you are the one who discovered this Splitter. I think you have me confused with someone else :) - Jason Rutherglen wrote it: LUCENE-1959. > The class from cotrub directory org.apache.lucene.index.IndexSplitter creates > a non correct index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > I used in this case - Windows Server 2003, Java Hot Spot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Priority: Minor > Fix For: 3.0, 3.2 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) > it creates an index with segments descriptor file with wrong data. Namely > wrong is the number representing the name of segment that would be created > next in this index. 
> If some segment of the index already has this name, this results > either in the impossibility of creating a new segment or in the creation of a > corrupted segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
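The invariant behind this bug can be illustrated in a few lines: the segments descriptor's next-segment counter must exceed every segment name already present, or a newly flushed segment collides with an existing one. A simplified Python sketch (segment names reduced to Lucene's "_&lt;base36&gt;" convention; the function name is hypothetical):

```python
def next_segment_counter(segment_names):
    """Smallest counter value that cannot collide with any existing
    segment: one past the largest base-36 segment number."""
    nums = [int(name.lstrip("_"), 36) for name in segment_names]
    return max(nums, default=-1) + 1
```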
Re: [Lucene.Net] Score(collector) called for each subReader - but not what I need
No I will try it though. Thanks. Bob On Jun 10, 2011, at 12:37 PM, Digy wrote: > Have you tried to use Lucene.Net as is, before working on optimizing your > code? There are a lot of speed improvements in it since 1.9. > There is also a Faceted Search project in contrib. > (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search > ) > > DIGY > > > > -Original Message- > From: Robert Stewart [mailto:robert_stew...@epam.com] > Sent: Friday, June 10, 2011 7:14 PM > To: > Subject: [Lucene.Net] Score(collector) called for each subReader - but not > what I need > > As I previously tried to explain, I have custom query for some pre-cached > terms, which I load into RAM in efficient compressed form. I need this for > faster searching and also for much faster faceting. So what I do is process > incoming query and replace certain sub-queries with my own "CachedTermQuery" > objects, which extend Query. Since these are not per-segment, I only want > scorer.Score(collector) called once, not once for each segment in my index. > Essentially what happens now if I have a search is it collects the same > documents N times, 1 time for each segment. Is there anyway to combine > different Scorers/Collectors such that I can control when it enumerates > collection by multiple sub-readers, and when not to? This all worked in > previous version of Lucene because enumerating sub-indexes (segments) was > pushed to a lower level inside Lucene API and not it is elevated to a higher > level. > > Thanks > Bob > > > On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > >> I found the problem. The problem is that I have a custom "query > optimizer", and that replaces certain TermQuery's within a Boolean query > with a custom Query and this query has its own weight/scorer that retrieves > matching documents from an in-memory cache (and that is not Lucene backed). 
> But it looks like my custom hitcollectors are now wrapped in a > HitCollectorWrapper which assumes Collect() needs called for multiple > segments - so it is adding a start offset to the doc ID that comes from my > custom query implementation. I looked at the new Collector class and it > seems it works the same way (assumes it needs to set the next index reader > with some offset). How can I make my custom query work with the new API (so > that there is basically a single "segment" in RAM that my query uses, but > still other query clauses in same boolean query use multiple lucene > segments)? I am sure that is not clear and will try to provide more detail > soon. >> >> Thanks >> Bob >> >> >> On Jun 9, 2011, at 1:48 PM, Digy wrote: >> >>> Sorry no idea. Maybe optimizing the index with 2.9.2 can help to detect > the >>> problem. >>> DIGY >>> >>> -Original Message- >>> From: Robert Stewart [mailto:robert_stew...@epam.com] >>> Sent: Thursday, June 09, 2011 8:40 PM >>> To: >>> Subject: Re: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? >>> >>> I tried converting index using IndexWriter as follows: >>> >>> Lucene.Net.Index.IndexWriter writer = new > IndexWriter(TestIndexPath+"_2.9", >>> new Lucene.Net.Analysis.KeywordAnalyzer()); >>> >>> writer.SetMaxBufferedDocs(2); >>> writer.SetMaxMergeDocs(100); >>> writer.SetMergeFactor(2); >>> >>> writer.AddIndexesNoOptimize(new Lucene.Net.Store.Directory[] { new >>> Lucene.Net.Store.SimpleFSDirectory(new DirectoryInfo(TestIndexPath)) }); >>> >>> writer.Commit(); >>> >>> >>> That seems to work (I get what looks like a valid index directory at > least). >>> >>> But still when I run some tests using IndexSearcher I get the same > problem >>> (I get documents in Collect() which are larger than > IndexReader.MaxDoc()). >>> Any idea what the problem could be? >>> >>> BTW, this is a problem because I lookup some fields (date ranges, etc.) 
> in >>> some custom collectors which filter out documents, and it assumes I dont > get >>> any documents larger than maxDoc. >>> >>> Thanks, >>> Bob >>> >>> >>> On Jun 9, 2011, at 12:37 PM, Digy wrote: >>> One more point, some write operations using Lucene.Net 2.9.2 (add, > delete, optimize etc.) upgrades automatically your index to 2.9.2. But if your index is somehow corrupted(eg, due to some bug in 1.9) this >>> may result in data loss. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Thursday, June 09, 2011 7:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] index version compatibility (1.9 to 2.9.2)? I have a Lucene index created with Lucene.Net 1.9. I have a > multi-segment index (non-optimized). When I run Lucene.Net 2.9.2 on top of that > index, >>> I get IndexOutOfRange exceptions in my collectors. It is giving me > document IDs that are larger than maxDoc. My index contains 377831 documents, and IndexReader.MaxDoc() is > returning 377831, but I get d
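The per-segment collection contract Bob is running into can be sketched abstractly: since Lucene 2.9, collect() receives segment-local doc ids and the collector is told each segment's start offset (docBase); a query that already emits index-wide ids therefore gets offset a second time and overshoots maxDoc. A simplified Python illustration (class and method names are hypothetical, loosely modeled on Collector.setNextReader):

```python
class GlobalCollector:
    """Accumulates index-wide doc ids from segment-local ones."""
    def __init__(self):
        self.hits = []
        self._base = 0

    def set_next_reader(self, doc_base):
        # Called before each segment, with that segment's start offset.
        self._base = doc_base

    def collect(self, local_doc):
        # local_doc is segment-relative; adding the offset yields the
        # index-wide id. Applying this offset to an id that is already
        # global is what produces ids beyond maxDoc.
        self.hits.append(self._base + local_doc)

def search_segments(segments, collector):
    # segments: list of (max_doc, matching_local_ids) per segment.
    base = 0
    for max_doc, matches in segments:
        collector.set_next_reader(base)
        for d in matches:
            collector.collect(d)
        base += max_doc
```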
[jira] [Created] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
TestStressIndexing2 testMultiConfig failure --- Key: LUCENE-3190 URL: https://issues.apache.org/jira/browse/LUCENE-3190 Project: Lucene - Java Issue Type: Bug Reporter: selckin trunk: r1134311 reproducible {code} [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec [junit] [junit] - Standard Error - [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808 [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) [junit] at org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) [junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763 [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 -Dtestmethod=testMultiConfig -Dtests.seed=2571834029692482827:-8116419692655152763 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-0 *** [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: ram was 460908 expected: 408216 flush mem: 395100 active: 65808 [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, f57=MockFixedIntBlock(blockSize=649), 
f11=Standard, f41=MockRandom, f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, timezone=Pacific/Palau [junit] NOTE: all tests run in this JVM: [junit] [TestStressIndexing2] [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 [junit] - --- [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] r1.numDocs()=17 vs r2.numDocs()=16 [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs r2.numDocs()=16 [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) [junit] at org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) [junit] at org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) [junit] [junit] [junit] Testcase: testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED [junit] Some threads threw uncaught exceptions! [junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) [junit] [junit] [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED {code} -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert -- Key: SOLR-2584 URL: https://issues.apache.org/jira/browse/SOLR-2584 Project: Solr Issue Type: Improvement Affects Versions: 3.3, 4.0 Reporter: Elmer Garduno Priority: Minor Hi folks, I think that UIMAUpdateRequestProcessor should have a parameter to avoid duplicate values on the updated field. A typical use case: if you are using DictionaryAnnotator and there is a term that matches more than once, it will be added twice to the mapped field. I think that we should add a parameter to avoid inserting duplicates, as we are not preserving information on the position of the annotation. What do you think about it? I've already implemented this for branch 3x; I'm writing some tests and I will submit a patch. Regards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
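Since annotation positions are not preserved anyway, the proposed behavior amounts to an order-preserving de-duplication of the values mapped into the field. A minimal Python sketch (function name is hypothetical, not the patch's API):

```python
def add_unique(values):
    """Drop repeated annotation values while keeping first-seen order."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```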
FYI: How to build and start Apache Solr admin app from source with Maven
Hi guys, FYI: here is a link on how to build and start the Apache Solr admin app from source with Maven, just in case you might be interested: http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html Have fun. YH
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2535: --- Attachment: SOLR-2535_fix_admin_file_handler_for_directory_listings.patch The attached patch fixes this bug and adds new tests for a directory listing and getting a file. This bug was triggered with the introduction of SOLR-2263 in which RawResponseWriter was changed to implement BinaryQueryResponseWriter. This wasn't a problem in and of itself, but the SolrDispatchFilter checks if a response writer is the binary variant and if so calls the write(OutputStream...) variant. But the responses from ShowFileRequestHandler that list directory contents are incompatible with the RawResponseWriter if RawResponseWriter's write(OutputStream...) method is used instead of a character-based stream. The solution was to move the defaulting of the "raw" response type from ShowFileRequestHandler.init() into a condition within handleRequestBody() where it knows the response is a file. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. 
Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
Add TopDocs.merge to merge multiple TopDocs --- Key: LUCENE-3191 URL: https://issues.apache.org/jira/browse/LUCENE-3191 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 3.3, 4.0 It's not easy today to merge TopDocs, eg produced by multiple shards, supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch Patch. The basic idea is simple (use PQ to find top N across all shards), but, I had to add FieldComparator.compare(Comparable, Comparable). Ie, the FieldComparator should be able to compare the Comparables returned by its value method. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
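The "PQ to find top N across all shards" idea can be sketched for the score-sorted case. This is a simplified Python illustration, not the patch itself — the real patch additionally supports arbitrary Sort via the new FieldComparator.compare(Comparable, Comparable); names and the tuple layout are hypothetical.

```python
import heapq

def merge_top_docs(shard_results, n):
    """K-way merge of per-shard hit lists, each already sorted by
    descending score, into the global top n."""
    heap = []  # one candidate per shard: (-score, shard, index, doc)
    for shard, hits in enumerate(shard_results):
        if hits:
            score, doc = hits[0]
            heapq.heappush(heap, (-score, shard, 0, doc))
    out = []
    while heap and len(out) < n:
        neg_score, shard, idx, doc = heapq.heappop(heap)
        out.append((shard, doc, -neg_score))
        nxt = idx + 1
        if nxt < len(shard_results[shard]):
            score, doc = shard_results[shard][nxt]
            heapq.heappush(heap, (-score, shard, nxt, doc))
    return out
```

Ties on score break toward the lower shard index, since the shard number is the next tuple element in the heap ordering.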
[jira] [Assigned] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3191: -- Assignee: Michael McCandless > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 1:48 PM, Andrzej Bialecki wrote: > On 6/10/11 6:27 PM, Michael McCandless wrote: >> >> I'm actually working on something like this, basically a utility >> method to merge N TopDocs into 1. I want to do this for grouping as >> well to make it easy to do grouping across shards. > > Mike, > > The straightforward merge that is used in Solr suffers from incomparable > scores (due to the lack of global IDF). See my slides from the Buzzwords. > Since we can handle global IDF in local searchers more easily that in Solr > then we can reuse that DfCache trick from MultiSearcher. This is cool stuff Andrzej!! But, my patch (LUCENE-3191) is aiming for the lower-level problem of just the mechanics of merging multiple TopDocs ie, something "above" will have to handle "properly" setting scores of the incoming TopDocs (if in fact the search sorts by score). Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047408#comment-13047408 ] Simon Willnauer commented on LUCENE-3190: - I will dig! > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 
460908 expected: 408216 flush mem: 395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (LUCENE-3190) TestStressIndexing2 testMultiConfig failure
[ https://issues.apache.org/jira/browse/LUCENE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-3190: --- Assignee: Simon Willnauer > TestStressIndexing2 testMultiConfig failure > --- > > Key: LUCENE-3190 > URL: https://issues.apache.org/jira/browse/LUCENE-3190 > Project: Lucene - Java > Issue Type: Bug >Reporter: selckin >Assignee: Simon Willnauer > > trunk: r1134311 > reproducible > {code} > [junit] Testsuite: org.apache.lucene.index.TestStressIndexing2 > [junit] Tests run: 1, Failures: 2, Errors: 0, Time elapsed: 0.882 sec > [junit] > [junit] - Standard Error - > [junit] java.lang.AssertionError: ram was 460908 expected: 408216 flush > mem: 395100 active: 65808 > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.assertMemory(DocumentsWriterFlushControl.java:102) > [junit] at > org.apache.lucene.index.DocumentsWriterFlushControl.doAfterDocument(DocumentsWriterFlushControl.java:164) > [junit] at > org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:380) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.indexDoc(TestStressIndexing2.java:723) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:757) > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestStressIndexing2 > -Dtestmethod=testMultiConfig > -Dtests.seed=2571834029692482827:-8116419692655152763 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-0 *** > [junit] junit.framework.AssertionFailedError: java.lang.AssertionError: > ram was 460908 expected: 408216 flush mem: 
395100 active: 65808 > [junit] at junit.framework.Assert.fail(Assert.java:47) > [junit] at > org.apache.lucene.index.TestStressIndexing2$IndexingThread.run(TestStressIndexing2.java:762) > [junit] NOTE: test params are: codec=RandomCodecProvider: {f33=Standard, > f57=MockFixedIntBlock(blockSize=649), f11=Standard, f41=MockRandom, > f40=Standard, f62=MockRandom, f75=Standard, f73=MockSep, > f29=MockFixedIntBlock(blockSize=649), f83=MockRandom, f66=MockSep, > f49=MockVariableIntBlock(baseBlockSize=9), f72=Pulsing(freqCutoff=7), > f54=Standard, id=MockFixedIntBlock(blockSize=649), f80=MockRandom, > f94=MockSep, f93=Pulsing(freqCutoff=7), f95=Standard}, locale=en_SG, > timezone=Pacific/Palau > [junit] NOTE: all tests run in this JVM: > [junit] [TestStressIndexing2] > [junit] NOTE: Linux 2.6.39-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 > (64-bit)/cpus=8,threads=1,free=133324528,total=158400512 > [junit] - --- > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] r1.numDocs()=17 vs r2.numDocs()=16 > [junit] junit.framework.AssertionFailedError: r1.numDocs()=17 vs > r2.numDocs()=16 > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:308) > [junit] at > org.apache.lucene.index.TestStressIndexing2.verifyEquals(TestStressIndexing2.java:278) > [junit] at > org.apache.lucene.index.TestStressIndexing2.testMultiConfig(TestStressIndexing2.java:124) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Testcase: > testMultiConfig(org.apache.lucene.index.TestStressIndexing2): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:603) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) > [junit] > [junit] > [junit] Test org.apache.lucene.index.TestStressIndexing2 FAILED > {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira -
RE: [Lucene.Net] Faceting
And yes for 1 & 2. DIGY -Original Message- From: Robert Stewart [mailto:robert_stew...@epam.com] Sent: Friday, June 10, 2011 10:36 PM To: Subject: [Lucene.Net] Faceting I took a brief look at the documentation for faceting in contrib. I did not look at code yet. Do you think it can work for these requirements: 1) Needs to compute facets for fields of more than one value per document (for instance a document may have many company names associated to it). 2) Needs to compute facets over any arbitrary query 3) Needs to be fast: a) I have 100 million docs distributed in about 10 indexes (10 million docs each) and use parallel distributed search and merge b) For some facet fields, we have over 100,000 possible unique values (for example, we have 150,000 possible company values) In our case, we pre-cache compressed doc sets in memory for each unique facet value. if # of values < 1/9 size of index, then we use variable byte encoding of integers, otherwise we use BitArray. These doc sets are then sorted in descending order by document frequency (so more frequent facets are counted first) We open new index "snapshots" every couple minutes and pre-load these facet doc sets into ram each time new snapshot is opened in the background. We use about 32 GB of RAM when fully loaded. At search time we gather all the doc IDs matching search into a BitArray. Then we enumerate all the facet doc sets in desc order by overall doc frequency, and count how many docs in search matched each facet. These facet counts are passed into a priority queue to gather top N counts (such that when the next total count < full priority queue min value, it breaks out of loop, that is why we do it in desc order by total doc freq) We also count # of docs per day over date range for each facet. We also compute facets for about 10 fields during search, and get top 10 facets each. Typically search over 100 million docs including facet counts and per-date counts takes about 1300ms. 
Our current solution actually works pretty well - but it is a burden on RAM, time to load new snapshots, and extra pressure on GC during busy times. Do you think your current facet implementation can work as above, and should I try to contribute what I have (it would definitely take a little refactoring)? Thanks, Bob On Jun 10, 2011, at 12:37 PM, Digy wrote: > Have you tried to use Lucene.Net as is, before working on optimizing your > code? There are a lot of speed improvements in it since 1.9. > There is also a Faceted Search project in contrib. > (https://cwiki.apache.org/confluence/display/LUCENENET/Simple+Faceted+Search > ) > > DIGY > > > > -Original Message- > From: Robert Stewart [mailto:robert_stew...@epam.com] > Sent: Friday, June 10, 2011 7:14 PM > To: > Subject: [Lucene.Net] Score(collector) called for each subReader - but not > what I need > > As I previously tried to explain, I have a custom query for some pre-cached > terms, which I load into RAM in efficient compressed form. I need this for > faster searching and also for much faster faceting. So what I do is process > the incoming query and replace certain sub-queries with my own "CachedTermQuery" > objects, which extend Query. Since these are not per-segment, I only want > scorer.Score(collector) called once, not once for each segment in my index. > Essentially what happens now if I have a search is that it collects the same > documents N times, 1 time for each segment. Is there any way to combine > different Scorers/Collectors such that I can control when it enumerates > collection by multiple sub-readers, and when not to? This all worked in > previous versions of Lucene because enumerating sub-indexes (segments) was > pushed to a lower level inside the Lucene API and now it is elevated to a higher > level. > > Thanks > Bob > > > On Jun 9, 2011, at 4:33 PM, Robert Stewart wrote: > >> I found the problem. 
The problem is that I have a custom "query > optimizer", and that replaces certain TermQuery's within a Boolean query > with a custom Query and this query has its own weight/scorer that retrieves > matching documents from an in-memory cache (and that is not Lucene backed). > But it looks like my custom hitcollectors are now wrapped in a > HitCollectorWrapper which assumes Collect() needs called for multiple > segments - so it is adding a start offset to the doc ID that comes from my > custom query implementation. I looked at the new Collector class and it > seems it works the same way (assumes it needs to set the next index reader > with some offset). How can I make my custom query work with the new API (so > that there is basically a single "segment" in RAM that my query uses, but > still other query clauses in same boolean query use multiple lucene > segments)? I am sure that is not clear and will try to provide more detail > soon. >> >> Thanks >> Bob >> >> >> On Jun 9, 2011, at 1:48 PM, Digy wrote: >> >>> Sorry no idea. Maybe optimizing the index wi
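The counting scheme Bob describes earlier in this thread — intersect each facet's cached doc set with the bitmap of search matches, walking facets in descending overall document frequency so the loop can break early once a full top-N priority queue can no longer change — can be sketched with plain JDK collections. Everything below (class and field names, the BitSet representation) is an illustrative reconstruction, not the actual contrib or production code:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

public class FacetCounter {
    static class FacetCount {
        final String value;
        final int count;
        FacetCount(String value, int count) { this.value = value; this.count = count; }
    }

    /**
     * Counts search matches per facet value and returns the top n.
     * facetDocSets must be pre-sorted in descending order of overall
     * document frequency, which enables the early exit: once a facet's
     * total frequency cannot beat the current queue minimum, no later
     * facet can either.
     */
    static List<FacetCount> topFacets(BitSet searchMatches,
                                      String[] facetValues,
                                      BitSet[] facetDocSets,
                                      int n) {
        // min-heap ordered by count, so peek() is the weakest kept entry
        PriorityQueue<FacetCount> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.count, b.count));
        for (int i = 0; i < facetValues.length; i++) {
            int totalFreq = facetDocSets[i].cardinality();
            // early exit: remaining facets have totalFreq <= this one
            if (queue.size() == n && totalFreq <= queue.peek().count) {
                break;
            }
            BitSet intersection = (BitSet) facetDocSets[i].clone();
            intersection.and(searchMatches);   // docs matching both
            int count = intersection.cardinality();
            if (count == 0) continue;
            if (queue.size() < n) {
                queue.add(new FacetCount(facetValues[i], count));
            } else if (count > queue.peek().count) {
                queue.poll();
                queue.add(new FacetCount(facetValues[i], count));
            }
        }
        List<FacetCount> result = new ArrayList<>(queue);
        result.sort((a, b) -> Integer.compare(b.count, a.count)); // best first
        return result;
    }
}
```

The descending-frequency ordering is what makes the early exit safe: a facet whose total document frequency is already at or below the queue minimum cannot produce a higher per-search count.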
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047411#comment-13047411 ] Peter Wolanin commented on SOLR-2535: - Thanks for the patch. Is everything in there related to this bug? Some of it looks like other cleanup. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047416#comment-13047416 ] Uwe Schindler commented on LUCENE-3191: --- bq. The basic idea is simple (use PQ to find top N across all shards), but, I had to add FieldComparator.compare(Comparable, Comparable). That makes no sense to me, because Comparables can always be compared against each other without a separate comparator. The old MultiSearcher did exactly this. This is why it returns Comparable. So instead of FieldComparator.compare(a, b), just use a.compareTo(b). It is the responsibility of the comparator to return a correctly wrapped Comparable. There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in the wrong order. We have no test for this, so it might never cause a test failure. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047420#comment-13047420 ] David Smiley commented on SOLR-2535: The relayout of import statements in SolrDispatchFilter.java is inadvertent. The QueryRequest.java one-liner was a null-check that I felt was an improvement so that I didn't have to pass in an empty params list to QueryRequest's constructor. > In Solr 3.2 and trunk the admin/file handler fails to show directory listings > - > > Key: SOLR-2535 > URL: https://issues.apache.org/jira/browse/SOLR-2535 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1, 3.2, 4.0 > Environment: java 1.6, jetty >Reporter: Peter Wolanin > Fix For: 3.3 > > Attachments: > SOLR-2535_fix_admin_file_handler_for_directory_listings.patch > > > In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted > listing of the conf directory, like: > {noformat} > > 0 name="QTime">1 > > 1274 name="modified">2011-03-06T20:42:54Z > ... > > > {noformat} > I can list the xslt sub-dir using solr/admin/files?file=/xslt > In Solr 3.1.0, both of these fail with a 500 error: > {noformat} > HTTP ERROR 500 > Problem accessing /solr/admin/file/. Reason: > did not find a CONTENT object > java.io.IOException: did not find a CONTENT object > {noformat} > Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 > should still handle directory listings if no file name is given, or if the > file is a directory, so I am filing this as a bug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047440#comment-13047440 ] Michael McCandless commented on LUCENE-3191: Uwe, you are right! Now why didn't I think of that... The returned Comparable should be expected to properly compare itself to any other Comparable returned from FieldComparator.value... so I'll do that and then the patch is nice and small. And no API change for 3.x. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047472#comment-13047472 ] Michael McCandless commented on LUCENE-3191: So... I started down this path (relying on the returned Comparable from .value to .compareTo themselves, instead of adding a new .compare method to FieldComp), but I'm not sure I like it... I had to add a ReverseFloatComparable inside RelevanceComp, since it sorts opposite to the natural Float sort order by default. But what this then means, for an app that wants to do some sharding, is that suddenly a TopDocs might contain an instance of this class, whereas now it contains plain Java objects (Float, Integer, etc.). I also don't like that this is splitting up the logic of how relevance scores compare to one another across two places (RelevanceComp and this new ReverseFloatComparable). I think it'd be better if we keep simple objects in the TopDocs, to keep it easy for apps to serialize themselves (since we don't impl Serializable anymore), and then the front end would invoke RelevanceComparator locally to properly compare the floats. Ie, really FieldComp.value should never have returned Comparable, I think? > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047477#comment-13047477 ] Uwe Schindler commented on LUCENE-3191: --- This still confuses me: bq. There might only be a bug in RelevanceComparator: Its getValue() method returns a comparable that sorts in wrong order. We have no test for this, so it might never cause a test failure. In my opinion, it should return a negative Float object. But as far as I know, there was/is already some special case in the collectors merge code used to merge segment's results (FieldValueHitQueue.fillFields copies the values into the collected docs, but I am not sure if this is still used). The good old deprecated FieldDocSortedHitQueue in 3.x (what's the replacement?) contains this special case: {code} } else { c = docA.fields[i].compareTo(docB.fields[i]); if (type == SortField.SCORE) { c = -c; } } {code} In trunk it's gone, so we can maybe fix this stupidness. The Comparable returned by RelevanceComparator (used with SortField.SCORE) should simply be negative? Else we have to add this special case in your TopDocs.merge, too. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
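The merge mechanics under discussion — interleave the already-sorted per-shard hit lists through a priority queue, with the comparison reversed for relevance scores exactly as in the FieldDocSortedHitQueue special case quoted above — can be sketched with plain JDK types. The classes and names below are an illustrative reconstruction, not the LUCENE-3191 patch:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    static class ShardHit {
        final int shard;      // which shard produced the hit
        final int doc;        // shard-local doc id
        final float score;
        ShardHit(int shard, int doc, float score) {
            this.shard = shard; this.doc = doc; this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists (each already sorted best-first) into a
     * single best-first list of size <= n. For scores, "best-first" is the
     * reverse of the natural Float order — the special case that
     * FieldDocSortedHitQueue handled for SortField.SCORE.
     */
    static List<ShardHit> merge(List<List<ShardHit>> shards, int n) {
        // cursor into each shard's list, pointing at its current head
        int[] cursors = new int[shards.size()];
        // heap keyed on score descending: peek() is the best remaining head
        Comparator<ShardHit> bestFirst =
            (a, b) -> Float.compare(b.score, a.score);
        PriorityQueue<ShardHit> heads = new PriorityQueue<>(bestFirst);
        for (List<ShardHit> shard : shards) {
            if (!shard.isEmpty()) heads.add(shard.get(0));
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            ShardHit best = heads.poll();
            merged.add(best);
            // advance the cursor of the shard that supplied the hit
            int next = ++cursors[best.shard];
            List<ShardHit> shard = shards.get(best.shard);
            if (next < shard.size()) heads.add(shard.get(next));
        }
        return merged;
    }
}
```

Because each shard's list is already sorted, the heap only ever holds one candidate per shard, so merging k shards costs O(n log k) regardless of how many hits each shard returned.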
[jira] [Commented] (SOLR-2564) Integrating grouping module into Solr 4.0
[ https://issues.apache.org/jira/browse/SOLR-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047478#comment-13047478 ] Michael McCandless commented on SOLR-2564: -- Hmm, I think this only needs to be a 4.0 blocker if we commit SOLR-2524 (3.x Solr grouping) first. But at this point, since we are close on this issue, it looks like we should hold SOLR-2524 until we commit this, then backport & commit to 3.x. > Integrating grouping module into Solr 4.0 > - > > Key: SOLR-2564 > URL: https://issues.apache.org/jira/browse/SOLR-2564 > Project: Solr > Issue Type: Improvement >Reporter: Martijn van Groningen >Assignee: Martijn van Groningen >Priority: Blocker > Fix For: 4.0 > > Attachments: LUCENE-2564.patch, SOLR-2564.patch, SOLR-2564.patch, > SOLR-2564.patch, SOLR-2564.patch, SOLR-2564.patch > > > Since work on grouping module is going well. I think it is time to wire this > up in Solr. > Besides the current grouping features Solr provides, Solr will then also > support second pass caching and total count based on groups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047479#comment-13047479 ] Uwe Schindler commented on LUCENE-3191: --- By the way, in current trunk the value() method in FieldComparator is obsolete and slows down search, if the field values are not needed. But of course, this patch makes use of it again, but we should correct it. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
Context-Sensitive Spelling Suggestions & Collations --- Key: SOLR-2585 URL: https://issues.apache.org/jira/browse/SOLR-2585 Project: Solr Issue Type: Improvement Components: spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Solr currently cannot offer what I'm calling here a "context-sensitive" spelling suggestion. That is, if a user enters one or more words that have docFrequency > 0, but nevertheless are misspelled, then no suggestions are offered. Currently, Solr will always consider a word "correctly spelled" if it is in the index and/or dictionary, regardless of context. This issue & patch add support for context-sensitive spelling suggestions. See SpellCheckCollatorTest.testContextSensitiveCollate() for the typical use case for this functionality. This tests using both IndexBasedSpellChecker and DirectSolrSpellChecker. Two new Spelling Parameters were added: - spellcheck.alternativeTermCount - The count of suggestions to return for each query term existing in the index and/or dictionary. Presumably, users will want fewer suggestions for words with docFrequency>0. Also setting this value turns "on" context-sensitive spell suggestions. - spellcheck.maxResultsForSuggest - The maximum number of hits the request can return in order to both generate spelling suggestions and set the "correctlySpelled" element to "false". For example, if this is set to 5 and the user's query returns 5 or fewer results, the spellchecker will report "correctlySpelled=false" and also offer suggestions (and collations if requested). Setting this greater than zero is useful for creating "did-you-mean" suggestions for queries that return a low number of hits. I have also included a test using shards. See additions to DistributedSpellCheckComponentTest. In Lucene, SpellChecker.java can already support this functionality (by passing a null IndexReader and field-name). The DirectSpellChecker, however, needs a minor enhancement. 
This gives the option to allow DirectSpellChecker to return suggestions for all query terms regardless of frequency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
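At request time, the two parameters proposed above would be used roughly like this. The handler path, query terms, and values here are illustrative only, not taken from the patch:

```text
http://localhost:8983/solr/spell?q=text:(flew AND humer)
    &spellcheck=true
    &spellcheck.alternativeTermCount=3    # offer up to 3 alternatives even for
                                          # terms with docFrequency > 0
    &spellcheck.maxResultsForSuggest=5    # still report correctlySpelled=false
                                          # (and suggest) when <= 5 hits return
    &spellcheck.collate=true
```

With a plain spellcheck.count, "humer" would be accepted as correctly spelled if it occurs anywhere in the index; alternativeTermCount is what forces suggestions for such in-index terms, and maxResultsForSuggest gates the behavior on the query returning few hits.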
[jira] [Updated] (SOLR-2585) Context-Sensitive Spelling Suggestions & Collations
[ https://issues.apache.org/jira/browse/SOLR-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer updated SOLR-2585: - Attachment: SOLR-2585.patch > Context-Sensitive Spelling Suggestions & Collations > --- > > Key: SOLR-2585 > URL: https://issues.apache.org/jira/browse/SOLR-2585 > Project: Solr > Issue Type: Improvement > Components: spellchecker >Affects Versions: 4.0 >Reporter: James Dyer >Priority: Minor > Attachments: SOLR-2585.patch > > > Solr currently cannot offer what I'm calling here a "context-sensitive" > spelling suggestion. That is, if a user enters one or more words that have > docFrequency > 0, but nevertheless are misspelled, then no suggestions are > offered. Currently, Solr will always consider a word "correctly spelled" if > it is in the index and/or dictionary, regardless of context. This issue & > patch add support for context-sensitive spelling suggestions. > See SpellCheckCollatorTest.testContextSensitiveCollate() for a the typical > use case for this functionality. This tests both using > IndexBasedSepllChecker and DirectSolrSpellChecker. > Two new Spelling Parameters were added: > - spellcheck.alternativeTermCount - The count of suggestions to return for > each query term existing in the index and/or dictionary. Presumably, users > will want fewer suggestions for words with docFrequency>0. Also setting this > value turns "on" context-sensitive spell suggestions. > - spellcheck.maxResultsForSuggest - The maximum number of hits the request > can return in order to both generate spelling suggestions and set the > "correctlySpelled" element to "false". For example, if this is set to 5 and > the user's query returns 5 or fewer results, the spellchecker will report > "correctlySpelled=false" and also offer suggestions (and collations if > requested). Setting this greater than zero is useful for creating > "did-you-mean" suggestions for queries that return a low number of hits. 
> I have also included a test using shards. See additions to > DistributedSpellCheckComponentTest. > In Lucene, SpellChecker.java can already support this functionality (by > passing a null IndexReader and field-name). The DirectSpellChecker, however, > needs a minor enhancement. This gives the option to allow DirectSpellChecker > to return suggestions for all query terms regardless of frequency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
We still have a problem with queries that rewrite depending on index contents - which was the reason for MTQ's deMorgan bug. If two MultiTermQueries rewrite to different queries on two shards, the scores are also not comparable, even with normalized idf. This does not affect WildCard&Co (because they default to constant score), but e.g. Fuzzy will be very broken multi-sharded. MultiSearcher tried to prevent this by combining all rewritten queries into one - and was buggy here. We reinvent MultiSearcher because of this (Mike's code in 3191 is a partial reincarnation of MultiSearcher), only the buggy combine is missing. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, June 10, 2011 9:57 PM > To: dev@lucene.apache.org > Subject: Re: Distributed search capability > > On Fri, Jun 10, 2011 at 1:48 PM, Andrzej Bialecki wrote: > > On 6/10/11 6:27 PM, Michael McCandless wrote: > >> > >> I'm actually working on something like this, basically a utility > >> method to merge N TopDocs into 1. I want to do this for grouping as > >> well to make it easy to do grouping across shards. > > > > Mike, > > > > The straightforward merge that is used in Solr suffers from > > incomparable scores (due to the lack of global IDF). See my slides from the > Buzzwords. > > Since we can handle global IDF in local searchers more easily than in > > Solr then we can reuse that DfCache trick from MultiSearcher. > > This is cool stuff Andrzej!! > > But, my patch (LUCENE-3191) is aiming for the lower-level problem of just > the mechanics of merging multiple TopDocs ie, something "above" will > have to handle "properly" setting scores of the incoming TopDocs (if in fact > the search sorts by score). 
> > Mike McCandless > > http://blog.mikemccandless.com > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 6:06 PM, Uwe Schindler wrote: > We have still a problem with queries that rewrite depending on index > contents - which was the reason for MTQ's deMorgan bug. If two > MultiTermQueries rewrite to different queries on two shards, the scores are > also not comparable, even with normalized idf. This does not affect > WildCard&Co (because default to constant score), but e.g. Fuzzy will be very > broken multi-sharded. MultiSearcher tried to prevent this by combining all > rewritten queries into one - and was buggy here. > Really? because I see your description of the situation as mixing two totally different things: 1. a situation where a distributed case returns scores different than a single node case. Who cares? This should be up to the user to make the appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even different types of caching impls like andrzej hinted at, or whatever)... but its not "wrong". 2. a situation where your query is A NOT B and it then returns B. This was the real problem with MultiSearcher, and this is just wrong. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> Really? because I see your description of the situation as mixing two totally > different things: They are connected because they follow each other. > 1. a situation where a distributed case returns scores different than a single > node case. Who cares? This should be up to the user to make the > appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even > different types of caching impls like andrzej hinted at, or whatever)... but > its > not "wrong". I just mentioned that queries that rewrite to different queries on each node (like MTQs, because the TermsEnums are different) will not produce comparable scores even with a global IDF - that's what I wanted to say. The connection is here: The buggy MultiSearcher tried to prevent this by combining the rewritten queries and that caused the deMorgan bug. > 2. a situation where your query is A NOT B and it then returns B. This was the > real problem with MultiSearcher, and this is just wrong. About the merging; when I look at Mike's code: Except for the global IDF and the bug in MTQ, the merging code is identical to what MultiSearcher did before. In trunk I would even recommend undeleting FieldDocSortedHitQueue, and you have everything you need to merge two TopDocs instances. Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 6:49 PM, Uwe Schindler wrote: >> Really? because I see your description of the situation as mixing two totally >> different things: > > They are connected because they follow each other. > >> 1. a situation where a distributed case returns scores different than a >> single >> node case. Who cares? This should be up to the user to make the >> appropriate tradeoffs (e.g. deciding to use distributed IDF or not, or even >> different types of caching impls like andrzej hinted at, or whatever)... but >> its >> not "wrong". > > I just mentioned that for queries that rewrite to different queries on each > node (like MTQ because TermsEnums are different) will even not produce > comparable scores with a global IDF - that’s what I wanted to say. > only 'by default'. but you can configure constant-score-filter-rewrite and they will be completely comparable if for some reason this bothers you, so where is the problem (thats your tradeoff to make, which might be incredibly stupid for some of these MTQs, but you can still do it) > About the merging; when I look at Mikes code: > Except the global IDF and the bug in MTQ, the merging code is identical to > what MultiSearcher did before. I would in trunk even recommend to undelete > FieldDocSortedHitQueue and you have everything you need to merge two TopDocs > instances. > again, where is the bug in MTQ? 'stuff being different without special intervention' in the distributed versus single node case isn't a bug, thats my point. we need to separate what is a bug (flat out wrong, like the multisearcher demorgan thing) from scores being slightly different by default. and if it was really the case that the multisearcher 'flat out wrong bug' was actually created for some theoretical equal-scores-in-all-cases perfection, man what a bad tradeoff that was! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> > About the merging; when I look at Mikes code: > > Except the global IDF and the bug in MTQ, the merging code is identical to > what MultiSearcher did before. I would in trunk even recommend to > undelete FieldDocSortedHitQueue and you have everything you need to > merge two TopDocs instances. > > > > again, where is the bug in MTQ? Sorry a bug in my mail, I meant MultiSearcher, man :( > 'stuff being different without special intervention' in the distributed versus > single node case isn't a bug, thats my point. > we need to separate what is a bug (flat out wrong, like the multisearcher > demorgan thing) from scores being slightly different by default. > > and if was really the case that the multisearcher 'flat out wrong bug' > was actually created for some theoretical equal-scores-in-all-cases > perfection, man what a bad tradeoff that was! I agree. And I still tend to undelete FieldDocSortedHitQueue as it merged TopDocs and LUCENE-3191 will get a very small patch. :-) That's all I wanted to say and already discussed it with Mike on IRC: http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-06-10#l235 All other parts on JIRA. Uwe - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047748#comment-13047748 ] Uwe Schindler commented on LUCENE-3191: --- We had some discussions about cleaning this up in IRC: [http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-06-10#l235] > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch New patch: * Changes .value from Comparable (which is trappy because you think you're free to .compareTo them) to a parameterized type passed to FieldComparator. * Renames .compare -> .compareValues, which is now type checked with generics. * Changes FieldDoc.fields from Comparable[] to Object[] Will need to work out how we backport this to 3.x; the change from Comparable to Object is an API break, though... maybe not many apps are using FieldDoc.fields. > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
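The type-checked compareValues change can be illustrated with a hedged sketch; the class and method names below are made up for illustration, and the real FieldComparator API differs in detail:

```java
// Hypothetical sketch of a type-parameterized comparator: the value type T
// is checked by generics, so callers cannot hand mismatched value types to
// compareValues.
abstract class SortValueComparator<T> {
    abstract int compareValues(T first, T second);
}

class RelevanceValueComparator extends SortValueComparator<Float> {
    @Override
    int compareValues(Float first, Float second) {
        // Reversed: a higher score should sort ahead of a lower one.
        return second.compareTo(first);
    }
}

class CompareValuesDemo {
    public static void main(String[] args) {
        SortValueComparator<Float> cmp = new RelevanceValueComparator();
        // 2.0f has the higher score, so it compares as "before" 1.0f.
        System.out.println(cmp.compareValues(2.0f, 1.0f) < 0); // true
    }
}
```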
[JENKINS] Lucene-3.x - Build # 401 - Still Failing
Build: https://builds.apache.org/job/Lucene-3.x/401/ No tests ran. Build Log (for compile errors): [...truncated 9153 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Distributed search capability
On Fri, Jun 10, 2011 at 7:08 PM, Uwe Schindler wrote: > And I still tend to undelete FieldDocSortedHitQueue as it merged TopDocs and > LUCENE-3191 will get a very small patch. :-) > just please don't resurrect any Collators here in trunk! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Distributed search capability
> > And I still tend to undelete FieldDocSortedHitQueue as it merged > > TopDocs and LUCENE-3191 will get a very small patch. :-) > > > > just please don't resurrect any Collators here in trunk! Of course without them :-) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3191: --- Attachment: LUCENE-3191.patch New patch, adds default impl for FC.compareValues to just cast to Comparable and call .compareTo. All but 2 places just use this default... > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
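The default implementation described above might be sketched like this (assumed shape, not the actual patch):

```java
// Sketch of the described default: cast the value to Comparable and
// delegate to compareTo, so only comparators with a non-natural ordering
// (e.g. reversed relevance) need to override compareValues.
class DefaultCompare {
    @SuppressWarnings("unchecked")
    static <T> int compareValues(T first, T second) {
        return ((Comparable<T>) first).compareTo(second);
    }

    public static void main(String[] args) {
        System.out.println(compareValues("abc", "abd") < 0); // true: "abc" sorts first
        System.out.println(compareValues(42, 7) > 0);        // true: natural int order
    }
}
```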
[jira] [Commented] (LUCENE-3191) Add TopDocs.merge to merge multiple TopDocs
[ https://issues.apache.org/jira/browse/LUCENE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047781#comment-13047781 ] Uwe Schindler commented on LUCENE-3191: --- Looks fine, I am happy now :-) The RelevanceComparator should simply use the following to minimize unboxing: {code} +public int compareValues(Float first, Float second) { + return second.compareTo(first); // reverse! +} {code} Will review more closely tomorrow! > Add TopDocs.merge to merge multiple TopDocs > --- > > Key: LUCENE-3191 > URL: https://issues.apache.org/jira/browse/LUCENE-3191 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.3, 4.0 > > Attachments: LUCENE-3191.patch, LUCENE-3191.patch, LUCENE-3191.patch > > > It's not easy today to merge TopDocs, eg produced by multiple shards, > supporting arbitrary Sort. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047788#comment-13047788 ] Vinicius Barros commented on LUCENE-1768: - Hi Uwe, Thanks for reviewing the patch again. I will fix the problems you mentioned. I do not think the code is ready to be committed; I am just sending the patches so you can keep track of my progress. I hope to have something usable soon, then you can commit, probably before the end of GSoC. > NumericRange support for new query parser > - > > Key: LUCENE-1768 > URL: https://issues.apache.org/jira/browse/LUCENE-1768 > Project: Lucene - Java > Issue Type: New Feature > Components: core/queryparser >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Adriano Crestani > Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor > Fix For: 4.0 > > Attachments: week1.patch, week2.patch > > > It would be good to specify some type of "schema" for the query parser in > the future, to automatically create NumericRangeQuery for different numeric > types. It would then be possible to index a numeric value > (double, float, long, int) using NumericField, and the query parser would know > which type of field this is and so correctly create a NumericRangeQuery > for strings like "[1.567..*]" or "(1.787..19.5]". > There is currently no way to extract whether a field is numeric from the index, so > the user will have to configure the FieldConfig objects in the ConfigHandler. > But if this is done, it will not be that difficult to implement the rest. > The only difference from the current handling of RangeQuery is then the > instantiation of the correct Query type and conversion of the entered numeric > values (a simple Number.valueOf(...) cast of the user-entered numbers). > Everything else is identical; NumericRangeQuery also supports the MTQ > rewrite modes (as it is a MTQ). > Another thing is a change in Date semantics. 
There are some strange flags in > the current parser that tell it how to handle dates. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
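The conversion step the description mentions (parsing user-entered bounds, then handing them to the right query type) can be sketched roughly as follows. All names here are hypothetical, and the real query parser operates on its query-node tree rather than raw strings:

```java
// Hypothetical sketch of parsing a numeric range term such as "[1.567..*]"
// or "(1.787..19.5]" into typed bounds: '[' / ']' mean inclusive, '(' / ')'
// exclusive, and '*' an open end. Not the actual query-parser code.
class ParsedNumericRange {
    final Double lower, upper;              // null means open-ended ("*")
    final boolean includeLower, includeUpper;

    ParsedNumericRange(String text) {
        includeLower = text.startsWith("[");
        includeUpper = text.endsWith("]");
        // Strip the brackets, then split on the ".." separator.
        String[] bounds = text.substring(1, text.length() - 1).split("\\.\\.");
        lower = bounds[0].equals("*") ? null : Double.valueOf(bounds[0]);
        upper = bounds[1].equals("*") ? null : Double.valueOf(bounds[1]);
    }

    public static void main(String[] args) {
        ParsedNumericRange r = new ParsedNumericRange("(1.787..19.5]");
        System.out.println(r.lower + " " + r.upper + " "
                + r.includeLower + " " + r.includeUpper); // 1.787 19.5 false true
    }
}
```

With a schema saying the field is a double, the parsed bounds would then feed a NumericRangeQuery instead of a plain term range.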
[jira] [Commented] (SOLR-2584) Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on insert
[ https://issues.apache.org/jira/browse/SOLR-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047793#comment-13047793 ] Koji Sekiguchi commented on SOLR-2584: -- Or we can implement the function in a new update processor and place it after the UIMA update processor in the chain. Anyway, I wish I could have the function. > Add a parameter in UIMAUpdateRequestProcessor to avoid duplicated values on > insert > -- > > Key: SOLR-2584 > URL: https://issues.apache.org/jira/browse/SOLR-2584 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.3, 4.0 >Reporter: Elmer Garduno >Priority: Minor > Labels: uima > > Hi folks, > I think that UIMAUpdateRequestProcessor should have a parameter to avoid > duplicate values on the updated field. > A typical use case is: > If you are using DictionaryAnnotator and there is a term that matches more > than once, it will be added twice to the mapped field. I think that we > should add a parameter to avoid inserting duplicates, as we are not preserving > information on the position of the annotation. > What do you think about it? I've already implemented this for branch 3x. I'm > writing some tests and I will submit a patch. > Regards -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
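The proposed behavior amounts to order-preserving deduplication, which can be sketched as follows (illustrative only, not the actual patch):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of the deduplication being proposed (not the real
// UIMAUpdateRequestProcessor code): drop repeated field values while
// keeping first-occurrence order, which is all that matters here since
// annotation positions are not preserved anyway.
class DedupValues {
    static List<String> dedup(List<String> values) {
        // LinkedHashSet removes duplicates but remembers insertion order.
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> in = List.of("lucene", "solr", "lucene", "uima", "solr");
        System.out.println(dedup(in)); // [lucene, solr, uima]
    }
}
```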
Re: commit-check target for ant?
: We could compile under 1.6 and have a jenkins (ant) target to check : binary 1.5 compatibility -- bytecode versions and std. API calls. This : way we could use 1.6, @Override annotations, etc. and still ensure 1.5 : support for folks that need it. -0 ... wouldn't that mean that users running *actual* 1.5 JVM installs couldn't compile from source? I think it would be a bad idea to say that our "compile" JVM requirements are different from our "run" JVM requirements. I'd be more in favor of requiring 1.6 than having a weird build where 1.5 folks could use our binary releases but not compile themselves. As far as your original question... : >> javadoc parsing enabled). Can I propose that we add an aggregator : >> target that will aggregate checks happening on jenkins (is there one : >> already)? I'm thinking of a dummy target like: lucene/build.xml has a target called "nightly" but it's not actually used directly. The number of things hudson does has gotten kind of complicated... http://svn.apache.org/repos/asf/lucene/dev/nightly/ http://svn.apache.org/repos/asf/lucene/dev/nightly/hudson-lucene-trunk.sh ...multiple invocations of ant are run, and some things are checked for in the shell script (ie: nocommit). But as is, the "nightly" target should be a superset of all the ant targets that hudson does run (nightly), so it's still usable (although it would be nice to move that nocommit test into it). : >> we could update the "nightly" target to include "clean" ... but I don't know that that's really a good idea. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: commit-check target for ant?
On Fri, Jun 10, 2011 at 8:45 PM, Chris Hostetter wrote: > > : We could compile under 1.6 and have a jenkins (ant) target to check > : binary 1.5 compatibility -- bytecode versions and std. API calls. This > : way we could use 1.6, @Override annotations, etc. and still ensure 1.5 > : support for folks that need it. > > -0 ... wouldn't that mean that users running *actual* 1.5 JVM installs > couldn't compile from source? I think it would be a bad idea to say that > our "compile" JVM requirements are different from our "run" JVM > requirements. I'd be more in favor of requiring 1.6 than having a weird > build where 1.5 folks could use our binary releases but not compile > themselves. I agree with you completely (a rare occasion, I know!). I think it would be better to just require 1.6 if we want to go that route. Personally I find myself often breaking the build because the solr bits require 1.6 but the lucene bits require 1.5 (and see below) > lucene/build.xml has a target called "nightly" but it's not actually used > directly. The number of things hudson does has gotten kind of complicated... > Just for reference, there are two reasons for this: 1. lack of a java 5 JRE on freebsd that is capable of actually compiling+running all of our tests. I've tried every option here, including the linux one under emulation, but it's fair to say without bragging that our tests do actually stress a JRE a little bit, so it's really gotta be solid. 2. using java 6 with -source/-target for java 5 doesn't actually catch invalid java 5, especially @Override on interface methods. However, the native java 6 port to freebsd is quite solid (passes all of our tests), and is actually being maintained and improved. Java 5 is dead technology and I don't think anyone is working on solving #1. Because of this, we must compile the lucene/modules bits with the java 5 COMPILER, but the solr bits with the java 6 COMPILER to catch all compile errors, then run all tests with java 6... 
currently the java5 compiler we have on hudson is really only useful for its "javac". This is why hudson is complicated. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
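The bytecode-version half of the check proposed above needs nothing more than reading the class-file header. A minimal sketch follows; it checks only the major version, while a full 1.5-compatibility check would also have to scan for Java 6-only API calls:

```java
// Minimal sketch of a class-file version check: bytes 0-3 are the
// 0xCAFEBABE magic, bytes 4-5 the minor version, and bytes 6-7 the major
// version (49 = Java 5, 50 = Java 6).
class ClassVersionCheck {
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) return -1;
        long magic = ((long) (classBytes[0] & 0xFF) << 24) | ((classBytes[1] & 0xFF) << 16)
                | ((classBytes[2] & 0xFF) << 8) | (classBytes[3] & 0xFF);
        if (magic != 0xCAFEBABEL) return -1; // not a class file
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] java5Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};
        System.out.println(majorVersion(java5Header)); // 49
    }
}
```

An ant target could run this over every .class file in the build output and fail if any major version exceeds 49.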
Re: commit-check target for ant?
: I agree with you completely (a rare occasion, I know!). I think it I don't think it's that rare -- I suspect we agree 80% of the time, but don't notice due to silent consensus. : This is why hudson is complicated. right .. no complaint from me, just explaining why we have a "nightly" target that isn't used ... but as a "do as close as possible to what the nightly build will do given my current JVM" it should work as is. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8775 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/8775/ All tests passed Build Log (for compile errors): [...truncated 15394 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: managing CHANGES.txt?
On Fri, Jun 10, 2011 at 9:31 PM, Chris Hostetter wrote: > > But for bug fixes we really need to deal with this in a better way. We > need to track *every* bug fix made on a branch, even if they were > backported to an earlier branch. > I think we have? Bugfixes are the only case (assuming we go with the plan where we don't go back in time and issue 3.x feature releases after we release 4.x, etc) where we "go backwards". I'll pick LUCENE-3042 as a random one. It's in Lucene 3.2's CHANGES.txt, it's in branch-3.0's CHANGES.txt, and it's in branch-2.9's CHANGES.txt. It's not in trunk's CHANGES.txt, since it's fixed in a non-bugfix release before 4.0 will be released. In short, I don't think there is any problem... and as far back as I can see, this is exactly how we have been handling all bugfixes with the 2.9.x and 3.0.x bugfix releases. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1588 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1588/ No tests ran. Build Log (for compile errors): [...truncated 8001 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8772 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/8772/ 4 tests failed. REGRESSION: org.apache.lucene.index.TestIndexReader.testFilesOpenClose Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/TestIndexReader.testFilesOpenClose5529955374tmp/_0_0.tib (Too many open files in system) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/5/TestIndexReader.testFilesOpenClose5529955374tmp/_0_0.tib (Too many open files in system) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.(RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:416) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:293) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:375) at org.apache.lucene.index.codecs.BlockTermsWriter.(BlockTermsWriter.java:75) at org.apache.lucene.index.codecs.mocksep.MockSepCodec.fieldsConsumer(MockSepCodec.java:78) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.(PerFieldCodecWrapper.java:73) at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:61) at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:58) at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117) at org.apache.lucene.index.DocInverter.flush(DocInverter.java:80) at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:79) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:459) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:421) at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:548) at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2791) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2768) at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1050) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1014) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:978) at org.apache.lucene.index.TestIndexReader.testFilesOpenClose(TestIndexReader.java:580) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) REGRESSION: org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety Error Message: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) Stack Trace: junit.framework.AssertionFailedError: Error occurred in thread Thread-73: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1403) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1321) /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/2/test6032260336tmp/_y_0.skp (Too many open files in system) at org.apache.lucene.index.TestIndexReaderReopen.testThreadSafety(TestIndexReaderReopen.java:822) REGRESSION: org.apache.lucene.index.TestLongPostings.testLongPostings Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/longpostings.-10875160908602353455486954872tmp/_7.tvf (Too many open files in system) Stack Trace: java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/6/longpostings.-10875160908602353455486954872tmp/_7.tvf (Too many open files in system) at 
java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.(SimpleFSDirectory.java:69) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.(SimpleFSDirectory.java:90) at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:91) at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:326) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:415) at org.apache.lucene.store.Directory.openInput(Directory.java:118) at org.apache.lucene.index.T
[jira] [Updated] (LUCENE-3188) The contrib class org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3188: Attachment: LUCENE-3188.patch Patch against branch_3x. I converted Ivan's test class into a unit test. Without Ivan's patch, the test fails, and with the patch, it succeeds. Here's the test failure I got without Ivan's patch: {noformat} org.apache.lucene.index.TestIndexSplitter,testDeleteThenOptimize NOTE: reproduce with: ant test -Dtestcase=TestIndexSplitter -Dtestmethod=testDeleteThenOptimize -Dtests.seed=5250008618328265481:-4070453331991284264 WARNING: test class left thread running: merge thread: _0(3.3):c2/1 into _0 [optimize] RESOURCE LEAK: test class left 1 thread(s) running Exception in thread "Lucene Merge Thread #0" NOTE: test params are: locale=es_BO, timezone=Australia/Tasmania org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:515) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:513) ... 1 more NOTE: all tests run in this JVM: [TestIndexSplitter] NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 
1.5.0_22 (64-bit)/cpus=4,threads=2,free=99874080,total=128057344 java.io.IOException: background merge hit exception: _0(3.3):c2/1 into _0 [optimize] at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2536) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2474) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2444) at org.apache.lucene.index.TestIndexSplitter.testDeleteThenOptimize(TestIndexSplitter.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.junit.runner.JUnitCore.run(JUnitCore.java:157) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:94) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115) Caused by: java.io.IOException: MockDirectoryWrapper: file "_0.cfs" is still open: cannot overwrite at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:360) at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:167) at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:137) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4242) at org.apache.lucene.index.I
[jira] [Updated] (LUCENE-3188) The contrib class org.apache.lucene.index.IndexSplitter creates an incorrect index
[ https://issues.apache.org/jira/browse/LUCENE-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-3188: Fix Version/s: (was: 3.2) (was: 3.0) 4.0 3.3 Assignee: Steven Rowe > The contrib class org.apache.lucene.index.IndexSplitter creates > an incorrect index > - > > Key: LUCENE-3188 > URL: https://issues.apache.org/jira/browse/LUCENE-3188 > Project: Lucene - Java > Issue Type: Bug > Components: modules/other >Affects Versions: 3.0, 3.2 > Environment: Bug is present for all environments. > I used in this case - Windows Server 2003, Java Hot Spot Virtual Machine. >Reporter: Ivan Dimitrov Vasilev >Assignee: Steven Rowe >Priority: Minor > Fix For: 3.3, 4.0 > > Attachments: IndexSplitter.java, LUCENE-3188.patch, > LUCENE-3188.patch, TestIndexSplitter.java > > > When using the method IndexSplitter.split(File destDir, String[] segs) from > the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index), > it creates an index whose segments descriptor file contains wrong data. Namely, > the number representing the name of the segment to be created next in this > index is wrong. > If one of the index's existing segments already has this name, the result is > either the inability to create a new segment or the creation of a corrupted > segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
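The invariant the bug violates can be sketched with a hypothetical helper (not the actual IndexSplitter fix): the next-segment counter written into the segments descriptor must exceed every existing segment number.

```java
// Hypothetical sketch of the invariant behind LUCENE-3188: the counter for
// the next segment name must be greater than every existing segment number,
// otherwise a freshly written segment can collide with (and corrupt) an
// existing one. Lucene segment names are "_" followed by a base-36 number.
class NextSegmentCounter {
    static int next(String[] segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            max = Math.max(max, Integer.parseInt(name.substring(1), 36));
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // "_a" is segment 10 in base 36, so the next counter must be 11.
        System.out.println(next(new String[] {"_0", "_5", "_a"})); // 11
    }
}
```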
[jira] [Closed] (SOLR-1232) Add spellchecker example
[ https://issues.apache.org/jira/browse/SOLR-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed SOLR-1232. -- Resolution: Fixed This was fixed long ago with the /spell request handler. > Add spellchecker example > > > Key: SOLR-1232 > URL: https://issues.apache.org/jira/browse/SOLR-1232 > Project: Solr > Issue Type: Improvement > Components: documentation >Affects Versions: 1.3 >Reporter: Dominic Mitchell >Priority: Minor > > I got caught out by the wiki documentation last night whilst attempting to > add in spellchecker support. I'm still _relatively_ new, so I didn't quite > get the idea that you had to add it in to a requestHandler in order for it to > become effective. I'd like to propose adding a commented out example to > {{example/solr/conf/solrconfig.xml}} showing that this needs to be done. > {noformat} > diff --git a/example/solr/conf/solrconfig.xml > b/example/solr/conf/solrconfig.xml > index c007d7c..6e42e48 100755 > --- a/example/solr/conf/solrconfig.xml > +++ b/example/solr/conf/solrconfig.xml > @@ -412,20 +412,26 @@ > > > > explicit > > > + > + > > > > >
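For readers hitting the same confusion, the kind of commented-out example being proposed looks roughly like this. This is a hedged sketch of standard Solr spellcheck wiring, not the exact hunk from the issue, whose XML was lost in the archive:

```xml
<!-- Sketch of the proposed commented-out example: the spellcheck search
     component only takes effect once a request handler includes it,
     e.g. via last-components. -->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```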