[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133872#comment-13133872 ]

Simon Willnauer commented on LUCENE-3509:
-----------------------------------------

We should expose this via the low-level DocValues implementation, and maybe not via IWC. I think a consistent way would be enabling this in MemoryCodec and using the more RAM-efficient variant by default. This is just like CFS, which is disabled in SepCodec.

> Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-3509
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3509
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: LUCENE-3509.patch
>
> Spinoff from LUCENE-3496: we are seeing much better performance if the required bits for PackedInts are rounded up to 8/16/32/64. We should add this option to IWC and default to rounding up, i.e. more RAM, faster lookups.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
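The rounding described in the issue can be sketched as follows. This is an illustration of the RAM/CPU trade-off being discussed, not the actual PackedInts implementation; the helper name is hypothetical:

```java
// Sketch of the bits-per-value rounding discussed above: trade extra RAM
// for faster lookups by widening each packed value to a machine-friendly
// size (byte/short/int/long) instead of the minimal bit width.
public class BitsRounding {
    /** Round the minimally required bits per value up to 8, 16, 32, or 64. */
    static int roundUpBits(int requiredBits) {
        if (requiredBits <= 8) return 8;
        if (requiredBits <= 16) return 16;
        if (requiredBits <= 32) return 32;
        return 64;
    }

    public static void main(String[] args) {
        // e.g. values needing 13 bits would be stored in 16-bit slots:
        System.out.println(roundUpBits(13)); // prints 16
        System.out.println(roundUpBits(3));  // prints 8
    }
}
```

With rounding disabled, a value needing 13 bits is packed into exactly 13 bits (less RAM, more shifting/masking per lookup); rounded up to 16 bits, each lookup becomes a simple aligned read.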
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 720 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/720/

1 tests failed.

REGRESSION:  org.apache.lucene.index.TestIndexWriterReader.testAddIndexesAndDoDeletesThreads

Error Message:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/test/5/test3469534702tmp/_59_4.tim (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/test/5/test3469534702tmp/_59_4.tim (Too many open files in system)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:99)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:122)
	at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58)
	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:433)
	at org.apache.lucene.index.codecs.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:119)
	at org.apache.lucene.index.codecs.pulsing.PulsingCodec.fieldsProducer(PulsingCodec.java:114)
	at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:114)
	at org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:182)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:91)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:112)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:91)
	at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:123)
	at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:89)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:699)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:536)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:329)
	at org.apache.lucene.index.TestIndexWriterReader.testAddIndexesAndDoDeletesThreads(TestIndexWriterReader.java:395)
	at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:610)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)

Build Log (for compile errors):
[...truncated 1879 lines...]
Re: [JENKINS] Lucene-trunk - Build # 1709 - Failure
I committed a fix.

Mike McCandless

http://blog.mikemccandless.com

On Sun, Oct 23, 2011 at 11:54 PM, Apache Jenkins Server
<jenk...@builds.apache.org> wrote:
> Build: https://builds.apache.org/job/Lucene-trunk/1709/
>
> 1 tests failed.
>
> REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting
>
> Error Message:
> GC overhead limit exceeded
>
> Stack Trace:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at java.lang.Integer.toString(Integer.java:329)
>        at org.apache.lucene.index.codecs.simpletext.SimpleTextFieldsWriter$SimpleTextPostingsWriter.addPosition(SimpleTextFieldsWriter.java:147)
>        at org.apache.lucene.index.codecs.PostingsConsumer.merge(PostingsConsumer.java:97)
>        at org.apache.lucene.index.codecs.TermsConsumer.merge(TermsConsumer.java:114)
>        at org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:53)
>        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:573)
>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
>        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3679)
>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3315)
>        at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1969)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1963)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1959)
>        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1566)
>        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1318)
>        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1299)
>        at org.apache.lucene.index.TestIndexWriterDelete.testIndexingThenDeleting(TestIndexWriterDelete.java:924)
>        at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:610)
>        at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
>        at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
>
> Build Log (for compile errors):
> [...truncated 13042 lines...]
RE: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
Mike,

We have an annotation for this... No assume needed anymore. :-)

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: mikemcc...@apache.org [mailto:mikemcc...@apache.org]
> Sent: Monday, October 24, 2011 1:00 PM
> To: comm...@lucene.apache.org
> Subject: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> Author: mikemccand
> Date: Mon Oct 24 11:00:29 2011
> New Revision: 1188089
>
> URL: http://svn.apache.org/viewvc?rev=1188089&view=rev
> Log: don't use MemCodec/SimpleText for this test
>
> Modified:
>     lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java?rev=1188089&r1=1188088&r2=1188089&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (original)
> +++ lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java Mon Oct 24 11:00:29 2011
> @@ -23,14 +23,15 @@
>  import java.util.ArrayList;
>  import java.util.Collections;
>  import java.util.List;
>  import java.util.Random;
> -import java.util.concurrent.atomic.AtomicInteger;
>  import java.util.concurrent.atomic.AtomicBoolean;
> +import java.util.concurrent.atomic.AtomicInteger;
>
>  import org.apache.lucene.analysis.*;
>  import org.apache.lucene.document.Document;
>  import org.apache.lucene.document.FieldType;
>  import org.apache.lucene.document.StringField;
>  import org.apache.lucene.document.TextField;
> +import org.apache.lucene.index.codecs.CodecProvider;
>  import org.apache.lucene.search.IndexSearcher;
>  import org.apache.lucene.search.ScoreDoc;
>  import org.apache.lucene.search.TermQuery;
> @@ -896,6 +897,8 @@ public class TestIndexWriterDelete exten
>    }
>
>    public void testIndexingThenDeleting() throws Exception {
> +    assumeFalse("This test cannot run with Memory codec", CodecProvider.getDefault().getFieldCodec("field").equals("Memory"));
> +    assumeFalse("This test cannot run with SimpleText codec", CodecProvider.getDefault().getFieldCodec("field").equals("SimpleText"));
>      final Random r = random;
>      Directory dir = newDirectory();
>      // note this test explicitly disables payloads
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133990#comment-13133990 ]

Uwe Schindler commented on LUCENE-1536:
---------------------------------------

I will commit this tomorrow, if nobody objects, and then we will work on further issues to improve the Weight.scorer() API, CachingWrapperFilter, ... There is no slowdown, only speedups, with room to improve.

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
>                      LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
>                      LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch, luceneutil.patch
>
> I ran some performance tests, comparing applying a filter via the random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to an iterator was a very sizable performance hit.
>
> Some notes on the test:
> * Index is the first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad-core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> * I test across multiple queries. "1-X" means an OR query, e.g. "1-4" means 1 OR 2 OR 3 OR 4, whereas "+1-4" is an AND query, i.e. 1 AND 2 AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)).
> * Method "high" means I use the random-access filter API in IndexSearcher's main loop. Method "low" means I use the random-access filter API down in SegmentTermDocs (just like deleted docs today).
> * Baseline (QPS) is current trunk, where the filter is applied as an iterator up high (i.e. in IndexSearcher's search loop).
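The two strategies being benchmarked can be sketched as follows. The interfaces here are deliberately simplified stand-ins (a `BitSet` playing the role of the filter), not Lucene's actual Filter/DocIdSet API:

```java
import java.util.BitSet;

// Sketch of the two filter-application strategies compared above.
// Illustrative only; Lucene's real API uses DocIdSet/DocIdSetIterator.
public class FilterStrategies {
    /** Iterator style: advance a cursor over the filter's set bits for each candidate. */
    static int countHitsIterator(int[] matchingDocs, BitSet filter) {
        int hits = 0;
        for (int doc : matchingDocs) {
            // nextSetBit stands in for DocIdSetIterator.advance(doc)
            int next = filter.nextSetBit(doc);
            if (next == doc) hits++;
        }
        return hits;
    }

    /** Random-access style: a single O(1) bit check per candidate doc. */
    static int countHitsRandomAccess(int[] matchingDocs, BitSet filter) {
        int hits = 0;
        for (int doc : matchingDocs) {
            if (filter.get(doc)) hits++;  // cheap, especially for dense filters
        }
        return hits;
    }
}
```

Both return the same counts; the benchmark question is purely which access pattern is faster at a given filter density.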
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133994#comment-13133994 ]

Robert Muir commented on LUCENE-1536:
-------------------------------------

+1, let's commit this one and make progress here.
Re: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
But then I should break the test into a new class, right?

/me was being lazy... and this test only uses the one field...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Oct 24, 2011 at 7:29 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Mike,
>
> We have an annotation for this... No assume needed anymore. :-)
>
> Uwe
[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectivly
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134031#comment-13134031 ]

Michael McCandless commented on LUCENE-3509:
--------------------------------------------

I think enabling this at the codec impl level makes sense. But I'd prefer the defaulting to match what we do for FieldCache, i.e. default to fasterButMoreRAM.
[jira] [Resolved] (LUCENE-3501) random sampler is not random (and so facet SamplingWrapperTest occasionally fails)
[ https://issues.apache.org/jira/browse/LUCENE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-3501.
---------------------------------

    Resolution: Fixed

Fix merged to 3x: 1188129.
Thanks Gilad and Shai for helping to fix this.

> random sampler is not random (and so facet SamplingWrapperTest occasionally fails)
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3501
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3501
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/facet
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>            Priority: Minor
>         Attachments: LUCENE-3501.patch
>
> RandomSample is not random at all: it does not even import java.util.Random, and its behavior is deterministic. In addition, the test testCountUsingSamping() never retries as it was supposed to (for taking care of the hoped-for randomness).
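A sampler that is actually random needs a java.util.Random source, ideally seedable so a failing run can be reproduced. A minimal sketch (hypothetical class and method names, not the facet module's RandomSample):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sketch of random sampling with java.util.Random, the ingredient
// the issue above notes was missing. Illustrative; not Lucene's RandomSample.
public class SimpleSampler {
    /** Keep each doc id independently with probability sampleRatio. */
    static List<Integer> sample(int[] docIds, double sampleRatio, long seed) {
        Random random = new Random(seed);  // seedable, so tests can retry reproducibly
        List<Integer> sampled = new ArrayList<>();
        for (int doc : docIds) {
            if (random.nextDouble() < sampleRatio) {
                sampled.add(doc);
            }
        }
        return sampled;
    }
}
```

A deterministic sampler (no Random at all) always picks the same subset, which is why a test that was written to "retry for randomness" could never actually see a different sample.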
RE: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
That's right, this is still an open issue :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Monday, October 24, 2011 2:43 PM
> To: dev@lucene.apache.org
> Subject: Re: svn commit: r1188089 - /lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
>
> But then I should break the test into a new class, right?
>
> /me was being lazy... and this test only uses the one field...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1536:
----------------------------------

    Attachment: LUCENE-1536.patch

Here is the updated patch after some changes in trunk. It also adds the missCount checks back to the Caching*Filters; I lost them during cleanup.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10981 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10981/

1 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.update.AutoCommitTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:469)
	at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:527)
	at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:437)

Build Log (for compile errors):
[...truncated 7846 lines...]
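The leak check that fires here enforces a simple discipline: every directory a test opens must be closed before the suite ends. A minimal sketch of that discipline, with a hypothetical stand-in class for Lucene's Closeable Directory:

```java
// Sketch of the resource discipline the failing test is missing: close
// what you open, even when the body throws. DummyDirectory is a stand-in
// for Lucene's Directory; the afterClass leak check flags any left open.
public class CloseDiscipline {
    static class DummyDirectory implements AutoCloseable {
        boolean open = true;
        @Override public void close() { open = false; }
    }

    static boolean runTest() {
        DummyDirectory dir = new DummyDirectory();
        try {
            // ... exercise the index through dir ...
            return dir.open;
        } finally {
            dir.close();  // without this, the leak check reports the test
        }
    }
}
```

The try/finally (or, on Java 7+, try-with-resources) guarantees the close runs on every exit path, which is exactly what the `afterClassLuceneTestCaseJ4` check verifies.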
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3526:
--------------------------------

    Attachment: LUCENE-3526_test.patch

Updated set of tests: I changed TestRegexpRandom2 to sometimes use an empty field name for better testing. This seems to trigger its own problems:

{noformat}
[junit] Testcase: testRegexps(org.apache.lucene.search.TestRegexpRandom2): FAILED
[junit] Terms are out of order: field= (number 0) lastField= (number -1) text= lastText=
[junit] junit.framework.AssertionFailedError: Terms are out of order: field= (number 0) lastField= (number -1) text= lastText=
[junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:149)
[junit] 	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:51)
[junit] 	at org.apache.lucene.index.codecs.preflexrw.TermInfosWriter.add(TermInfosWriter.java:213)
[junit] 	at org.apache.lucene.index.codecs.preflexrw.PreFlexFieldsWriter$PreFlexTermsWriter.finishTerm(PreFlexFieldsWriter.java:192)
[junit] 	at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:409)
[junit] 	at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
{noformat}

I had thought to work around the original issue with this hack-patch, but I still get that failure... perhaps it's a bad assert/something unrelated?

{noformat}
Index: src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java
===================================================================
--- src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java (revision 1188010)
+++ src/java/org/apache/lucene/index/codecs/preflex/PreFlexFields.java (working copy)
@@ -711,7 +711,12 @@
       } else {
         getTermsDict().seekEnum(termEnum, term, true);
       }
-      skipNext = true;
+      if (internedFieldName == "") {
+        // hackedy-hack: we aren't actually positioned yet
+        skipNext = false;
+      } else {
+        skipNext = true;
+      }
       unicodeSortOrder = sortTermsByUnicode();
{noformat}

> preflex codec returns wrong terms if you use an empty field name
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3526
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch
>
> Spinoff from LUCENE-3473. I have a standalone test for this... the termsenum is returning a bogus extra empty-term (I assume it has no postings, I didn't try). This causes the checkindex test in LUCENE-3473 to fail, because there are 4 terms instead of 3.
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3526:
--------------------------------

    Attachment: LUCENE-3526_test.patch

OK, here's a patch... all tests pass now.

The assert failure in the writer was a bad assert; we previously had:

{noformat}
// If there is a field named "" (empty string) then we
// will get 0 on this comparison, yet, it's OK.  But
// it's not OK if two different field numbers map to
// the same name.
if (cmp != 0 || lastFieldNumber != -1)
  return cmp;
{noformat}

which is nice, but it doesn't cover the case of an empty term PLUS an empty string field: Term("", ""). In this case we would fall through and return 0, which is wrong.
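The underlying invariant is that terms must be ordered by field name first, then by term text, and the empty string must get no special early exit. A minimal sketch of such a comparison (illustrative only, not the actual 3.x TermInfosWriter assert):

```java
// Sketch of a field-then-term ordering check that handles Term("", "")
// correctly: compare field names first, and break ties on the term text
// instead of returning 0 prematurely for the empty field name.
public class TermOrderCheck {
    static int compareTermOrder(String lastField, String lastText,
                                String field, String text) {
        int cmp = lastField.compareTo(field);
        if (cmp != 0) {
            return cmp;  // different fields decide the order outright
        }
        // Same field name (possibly ""): fall through to the term text.
        return lastText.compareTo(text);
    }
}
```

With this shape, `compareTermOrder("", "", "", "")` is 0 only because both field and text genuinely match; a guard like the buggy one above, which bailed out before reaching the text comparison, could report 0 for pairs that are actually out of order.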
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3526: Attachment: LUCENE-3526.patch Oops, wrong patch. Here is the correct one.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134071#comment-13134071 ] Robert Muir commented on LUCENE-3526: - I will add an additional test to 3.x for Term("", "") and see if it has any bad asserts like this, and add it to the patch.
IndexableField(Type) interfaces, abstract classes and back compat.
Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces only for simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. On the one side, interfaces are cleaner design-wise, but if we wish to add new methods later, they make back compatibility hard to support. Abstract classes allow for back compat, but they are perhaps a bit less clean b/c they often tie an implementation to the broader API. In the past, we've been bitten by interfaces b/c, let's face it, we can't predict the future (Fieldable is the most notorious -- and this stuff has a very Fieldable feel to it -- but there are others; please see the archives for past discussions). I think in an ideal world, interfaces are kept quite compact, you use multiple of them, and then you provide a base abstract class that implements said interfaces and provides most of the implementation for most people. Logically, this doesn't always work out. An alternative is to mark it all as experimental and punt for now. In the end, I just want to make sure we have the discussion about it so that we don't find ourselves having to wait until 5.x in order to add a new method to one of these interfaces. Alternatively, perhaps we won't need to at all, or perhaps we think no one other than core Lucene will implement these. Just trying to avoid past pain and headaches in the future. -Grant
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think it's good you brought this up, Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before, thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. -- lucidimagination.com
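The delegator trap Robert describes can be reproduced in a few lines of plain Java. This is only a sketch with made-up class names (not actual Lucene code): a filter/delegator written against an older version of an abstract class silently inherits a method added later, so the wrapped instance's override is bypassed at runtime, whereas the same method added to an interface would break the delegator at compile time and force the author to forward it.

```java
// Hypothetical classes illustrating the LUCENE-2828-style trap; not Lucene code.
public class DelegatorTrap {
    static abstract class Reader {
        abstract String term();
        // Method added in a later release, with a default delegating impl:
        String termText() { return term(); }
    }

    static class CustomReader extends Reader {
        String term() { return "custom"; }
        // Overrides the new method with special behavior:
        String termText() { return "CUSTOM"; }
    }

    // Written before termText() existed, so it forwards only term():
    static class FilterReader extends Reader {
        final Reader in;
        FilterReader(Reader in) { this.in = in; }
        String term() { return in.term(); }
    }

    public static void main(String[] args) {
        Reader r = new FilterReader(new CustomReader());
        // The inherited default calls FilterReader.term(), silently skipping
        // the wrapped CustomReader.termText() override:
        System.out.println(r.termText());  // prints "custom", not "CUSTOM"
    }
}
```

Had `Reader` been an interface gaining a `termText()` method, `FilterReader` would simply stop compiling, which is the "compile-time break" being argued for.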
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, its asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on interfaces in minor versions? My personal take is that if we do, we primarily just need to communicate it ahead of time -- ideally at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134083#comment-13134083 ] Robert Muir commented on LUCENE-3526: - There are more serious problems in 3.x here.
* If you create new Field("", ""), you get an IllegalArgumentException from Field's ctor: name and value cannot both be empty.
* But there are tons of other ways to index an empty term for the empty field (for example, initially make it garbage and then .setValue(), or via a tokenstream).
* If you do this and you have assertions enabled, you will trip the same assert bug I fixed in trunk here.
* If you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
So we need to figure out what the semantics should be for 3.x: is Term("", "") allowed or not?
RE: IndexableField(Type) interfaces, abstract classes and back compat.
Hi, Beyond that, we should add the final modifier to all methods that simply delegate to other methods of the same class. This is another trap when trying to be backwards compatible. An easy-to-use method that simply supplies some defaults for specific parameters of a telescopic other one should always be final. If somebody subclasses, he can then only override the large extended telescope and doesn't need to take care of the easy-to-use methods. I revised lots of classes for that, but there are still some worse cases, e.g. in IndexReader. If we don't make such delegating methods final, we also have the same backwards compatibility problem as with tokenStream or FilteredIndexReader. This is just meant as an additional comment about stuff that easily goes wrong when making APIs. Make everything final that's not intended to be modified in subclasses (or make the whole class final). Most methods don't need to be overridden; only open them up for subclassing when there is *really* a use case! We can remove final later easily, but initially we should prevent subclassing. This would remove lots of VirtualMethod usages in 3.x (my abstraction for the TokenStream backwards layer). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Monday, October 24, 2011 4:02 PM To: dev@lucene.apache.org Subject: Re: IndexableField(Type) interfaces, abstract classes and back compat. On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. 
I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, its asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
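Uwe's rule about final convenience overloads can be sketched like this (a hypothetical class, not an actual Lucene API): the easy-to-use overload is final and only fills in defaults, so subclasses have exactly one extension point and the shortcut can never drift out of sync with the full "telescopic" method.

```java
// Hypothetical example of the "final telescoping method" pattern Uwe describes.
public class TelescopingExample {
    static class Searcher {
        // Easy-to-use overload: final, just supplies a default and delegates.
        public final String search(String query) {
            return search(query, 10);
        }
        // The full "telescopic" method is the single point subclasses override.
        public String search(String query, int limit) {
            return query + " (limit=" + limit + ")";
        }
    }

    static class LoggingSearcher extends Searcher {
        // Overriding only the full method automatically changes the behavior
        // of the final search(String) shortcut as well:
        @Override
        public String search(String query, int limit) {
            return "[logged] " + super.search(query, limit);
        }
    }

    public static void main(String[] args) {
        System.out.println(new LoggingSearcher().search("lucene"));
        // prints "[logged] lucene (limit=10)"
    }
}
```

If `search(String)` were not final, a subclass could override just the shortcut and the two entry points would silently diverge, which is exactly the tokenStream/FilteredIndexReader style of back-compat bug.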
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134121#comment-13134121 ] James Dyer commented on SOLR-2848: -- Robert, I think your first suggestion (moving configuration and response formatting out of the *SolrSpellChecker) is desirable and doable, but I wanted to keep this issue focused on increasing test coverage and on making DirectSolrSpellChecker mirror what AbstractLuceneSpellChecker already does so that it can pass. Obviously, if every SpellChecker plug-in implemented or extended something that had a getStringDistance or getAccuracy method, then we wouldn't need to do instanceof and cast. Once again, a big structural change like this seems inappropriate in a bug fix, especially as we are not introducing these checks for the first time. This is a long-standing problem. It looks to me like the internal levenshtein is just a dummy class designed to technically meet the API requirements while not actually doing anything. But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. This is how it can know how to order the varying suggestions from multiple shards after the fact. I see from the notes in DirectSpellChecker that using the internal StringDistance yields performance improvements over using a pluggable one. I did not look enough to determine whether the internal levenshtein could be modified to re-compute these internally-generated distance calculations and be usable externally, without sacrificing the performance gain. If possible, this would probably be our best bet, eliminating the Exception hack and any possible discrepancies that using 2 different StringDistance classes would cause. Do you agree? 
DirectSolrSpellChecker fails in distributed environment --- Key: SOLR-2848 URL: https://issues.apache.org/jira/browse/SOLR-2848 Project: Solr Issue Type: Bug Components: SolrCloud, spellchecker Affects Versions: 4.0 Reporter: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-2848.patch While working on SOLR-2585, it was brought to my attention that DirectSolrSpellChecker has no test coverage involving a distributed environment. Here I am adding a random element to DistributedSpellCheckComponentTest to alternate between the IndexBased and Direct spell checkers. Doing so revealed bugs in using DirectSolrSpellChecker in a distributed environment. The fixes here roughly mirror those made to the IndexBased spell checker with SOLR-2083.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134123#comment-13134123 ] Robert Muir commented on SOLR-2848: --- {quote} But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. {quote} This is the part I don't understand... we already have the scores in the results, so why recompute?
[jira] [Commented] (SOLR-2804) Logging error causes entire DIH process to fail
[ https://issues.apache.org/jira/browse/SOLR-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134141#comment-13134141 ] Adam Neal commented on SOLR-2804: - Are you using the multithreading in the DIH? I have the same problem, but when I remove the maxthreads attribute the indexing completes successfully.
Logging error causes entire DIH process to fail --- Key: SOLR-2804 URL: https://issues.apache.org/jira/browse/SOLR-2804 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0
Environment: java version 1.6.0_26, Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425), Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode); MacBook Pro (MacBookPro8,2), Intel Core i7 2.2 GHz, 1 processor / 4 cores, L2 cache (per core): 256 KB, L3 cache: 6 MB, 4 GB memory; Mac OS X 10.6.8 (10K549), Darwin kernel 10.8.0
Reporter: Pulkit Singhal Labels: dih Original Estimate: 48h Remaining Estimate: 48h
SEVERE: Full Import failed: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
at org.apache.solr.common.util.NamedList.getName(NamedList.java:127)
at org.apache.solr.common.util.NamedList.toString(NamedList.java:263)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:188)
at org.apache.solr.handler.dataimport.SolrWriter.close(SolrWriter.java:57)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:265)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:372)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:440)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:421)
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134142#comment-13134142 ] James Dyer commented on SOLR-2848: -- finishStage() is being run on the master shard. It receives spelling results from all of the shards and then has to integrate them together. Solr doesn't return the scores with spelling suggestions back to the client. I suppose the authors of SOLR-785 could have modified the response Solr sends back to its clients. However, it probably seemed inexpensive enough to just re-compute the string distance after the fact (indeed, Lucene in Action, 2nd ed., sect. 8.5.3 mentions doing the same thing, so I take it this is a common thing to do). The problem we have now is that we've got a spellchecker that doesn't fully implement a StringDistance all the time. I'd imagine the best bet is to try and change that. Possibly the slight discrepancies our current patch leaves are not serious enough to fix? If neither option is good, then we'd probably have to modify the Solr response to include scores.
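The "re-compute the distance after the fact" approach James describes can be sketched standalone. All names here are hypothetical (Solr's real classes differ); the point is that the coordinating node receives bare suggestions from each shard with no scores on the wire, so it re-scores them locally with a Levenshtein-based similarity before merging:

```java
import java.util.*;

// Hypothetical sketch of re-scoring per-shard suggestions on the master shard.
public class MergeSuggestions {
    // Plain Levenshtein distance, normalized to a 0..1 similarity
    // (higher is closer), in the style of Lucene's LevensteinDistance.
    static float similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return 1.0f - ((float) d[a.length()][b.length()]
                / Math.max(a.length(), b.length()));
    }

    @SafeVarargs
    static List<String> merge(String query, List<String>... shardSuggestions) {
        // Dedupe suggestions coming back from every shard...
        Set<String> all = new TreeSet<>();
        for (List<String> s : shardSuggestions) all.addAll(s);
        List<String> merged = new ArrayList<>(all);
        // ...then order by re-computed similarity to the original query:
        merged.sort(Comparator.comparingDouble((String s) -> -similarity(query, s)));
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge("lucene",
                Arrays.asList("lucere", "lucent"),
                Arrays.asList("lucene1", "lumen")));
    }
}
```

This is also where the discrepancy James mentions comes from: if the distance used here differs from the one the shards used to generate suggestions, the merged ordering can disagree with any single shard's ordering.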
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134150#comment-13134150 ] Robert Muir commented on SOLR-2848: --- {quote} I'd imagine the best bet is to try and change that. {quote} OK, let's do this, such that the distance impl is a real one computing Levenshtein like Lucene does and not a fake one. Then it's one less hack. Want to open a LUCENE issue for this? I can help if you want.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134160#comment-13134160 ] Robert Muir commented on SOLR-2848: --- {quote} The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time. {quote} We should fix that hack as I mentioned (it's just a hack, caused by me, sorry!). But then we should think about how to make sure that SpellChecker subclasses always work correctly distributed if we aren't going to change the wire format. Rather than instanceof/StringDistance, maybe we could add a merge() method that would be more general?
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134182#comment-13134182 ] James Dyer commented on SOLR-2848: -- {quote} OK, Lets do this, such that the distance impl is a real one computing levenshtein like Lucene does {quote} I opened LUCENE-3527. {quote} Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general? {quote} Are you thinking each *SolrSpellChecker should have a merge() that finishStage() calls? This sounds reasonable to me.
[jira] [Commented] (SOLR-2848) DirectSolrSpellChecker fails in distributed environment
[ https://issues.apache.org/jira/browse/SOLR-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134187#comment-13134187 ] Robert Muir commented on SOLR-2848: --- Yeah, this way a spellchecker can decide how it merges results (since we aren't going to put any 'score' in the wire format or require it). So, for example, the default impl of AbstractLuceneSpellChecker's merge() would use getComparator() and such (we can just put this in the abstract class).
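The merge() hook being discussed could look roughly like this. All signatures here are hypothetical (Solr's actual SolrSpellChecker differs): the component's finishStage() would call merge() and each checker would bring its own ordering policy, so no instanceof/StringDistance casting is needed in the component.

```java
import java.util.*;

// Hypothetical sketch of a per-spellchecker merge() hook; not actual Solr API.
public class MergeHookSketch {
    static abstract class SpellChecker {
        // finishStage() on the coordinating node would call this instead of
        // doing instanceof checks and casting to obtain a StringDistance.
        abstract List<String> merge(List<List<String>> shardSuggestions, int count);
    }

    // A default policy a base class could supply: dedupe, then order with a
    // comparator the checker owns (TreeSet ordering stands in for getComparator()).
    static class LexicalSpellChecker extends SpellChecker {
        @Override
        List<String> merge(List<List<String>> shardSuggestions, int count) {
            TreeSet<String> all = new TreeSet<>();
            for (List<String> s : shardSuggestions) all.addAll(s);
            List<String> out = new ArrayList<>(all);
            return out.subList(0, Math.min(count, out.size()));
        }
    }

    public static void main(String[] args) {
        SpellChecker sc = new LexicalSpellChecker();
        System.out.println(sc.merge(Arrays.asList(
                Arrays.asList("lucent", "lucid"),
                Arrays.asList("lucid", "lucene")), 2));
    }
}
```

A distance-based checker would override merge() to re-score suggestions instead, which keeps the wire format unchanged while letting every subclass behave correctly in distributed mode.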
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134192#comment-13134192 ] Michael McCandless commented on LUCENE-3183: I think the hack is actually correct, but maybe change it to check termEnum.position = 0? So this was a case we missed from LUCENE-3183 (maybe there are more!?), where we decided for the corner case of empty field and term text, the caller must handle that the returned enum is unpositioned (in exchange for not adding an if per next). And maybe add the same comment about LUCENE-3183 on top of that logic? TestIndexWriter failure: AIOOBE --- Key: LUCENE-3183 URL: https://issues.apache.org/jira/browse/LUCENE-3183 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: selckin Assignee: Michael McCandless Attachments: LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183.patch, LUCENE-3183_test.patch trunk: r1133486 {code} [junit] Testsuite: org.apache.lucene.index.TestIndexWriter [junit] Testcase: testEmptyFieldName(org.apache.lucene.index.TestIndexWriter): Caused an ERROR [junit] CheckIndex failed [junit] java.lang.RuntimeException: CheckIndex failed [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:158) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1362) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1280) [junit] [junit] [junit] Tests run: 39, Failures: 0, Errors: 1, Time elapsed: 17.634 sec [junit] [junit] - Standard Output --- [junit] CheckIndex failed [junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0] [junit] 1 of 1: name=_0 docCount=1 
[junit] codec=SegmentCodecs [codecs=[PreFlex], provider=org.apache.lucene.index.codecs.CoreCodecProvider@3f78807] [junit] compound=false [junit] hasProx=true [junit] numFiles=8 [junit] size (MB)=0 [junit] diagnostics = {os.version=2.6.39-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=flush, os.arch=amd64, java.version=1.6.0_25, java.vendor=Sun Microsystems Inc.} [junit] no deletions [junit] test: open reader.OK [junit] test: fields..OK [1 fields] [junit] test: field norms.OK [1 fields] [junit] test: terms, freq, prox...ERROR: java.lang.ArrayIndexOutOfBoundsException: -1 [junit] java.lang.ArrayIndexOutOfBoundsException: -1 [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:212) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.seekEnum(TermInfosReader.java:301) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.get(TermInfosReader.java:234) [junit] at org.apache.lucene.index.codecs.preflex.TermInfosReader.terms(TermInfosReader.java:371) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTermsEnum.reset(PreFlexFields.java:719) [junit] at org.apache.lucene.index.codecs.preflex.PreFlexFields$PreTerms.iterator(PreFlexFields.java:249) [junit] at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader$FieldsIterator.terms(PerFieldCodecWrapper.java:147) [junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:610) [junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:154) [junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:144) [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:477) [junit] at org.apache.lucene.index.TestIndexWriter.testEmptyFieldName(TestIndexWriter.java:857) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134205#comment-13134205 ] Michael McCandless commented on LUCENE-3526: I think the hack is actually correct, but maybe change it to check termEnum.position = 0? So this was a case we missed from LUCENE-3183 (maybe there are more!?), where we decided for the corner case of empty field and term text, the caller must handle that the returned enum is unpositioned (in exchange for not adding an if per next). And maybe add the same comment about LUCENE-3183 on top of that logic? preflex codec returns wrong terms if you use an empty field name Key: LUCENE-3526 URL: https://issues.apache.org/jira/browse/LUCENE-3526 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-3526.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch, LUCENE-3526_test.patch spinoff from LUCENE-3473. I have a standalone test for this... the termsenum is returning a bogus extra empty-term (I assume it has no postings, i didnt try). This causes the checkindex test in LUCENE-3473 to fail, because there are 4 terms instead of 3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3183) TestIndexWriter failure: AIOOBE
[ https://issues.apache.org/jira/browse/LUCENE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134206#comment-13134206 ] Michael McCandless commented on LUCENE-3183: Woops, above comment was meant for LUCENE-3526.
[jira] [Updated] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3526: --- Attachment: LUCENE-3526.patch Patch, putting back the safer-but-an-if-per-scan check from LUCENE-3183; this fixed another test failure.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134211#comment-13134211 ] Robert Muir commented on LUCENE-3526: - +1, i'm running the tests a lot, this seems solid.
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134238#comment-13134238 ] Robert Muir commented on LUCENE-3526: - I committed this, thanks Mike! Now to figure out wtf to do for 3.x...
Re: IndexableField(Type) interfaces, abstract classes and back compat.
Thanks for raising this Grant. My feeling is we can stick with an interface here, and mark it @experimental. This is a very-low-level-very-expert API. Most users will use the sugar field impls (TextField, BinaryField, NumericField, etc.). Expert users will build their own FieldType and pass that to Field. Waaay expert users will skip our user-space Document/Field/FieldType entirely and code directly to this low level minimal indexing API. For example maybe their app sucks streamed bytes off a socket, parses out fields and immediately hands that data off to IndexWriter for indexing (never making FieldTypes/Fields/Documents). So I think such way-expert users can handle hard breaks on the API, and would likely want to see the hard break so they know there's something to fix / new to add to indexing. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 24, 2011 at 10:02 AM, Grant Ingersoll gsing...@apache.org wrote: On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think it's good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. 
Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant
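Robert's delegator argument can be made concrete with a toy example (hypothetical classes, not Lucene's): when an abstract base class later gains a method with a default body, an existing delegator still compiles but silently stops delegating; had the base been an interface gaining an abstract method, the delegator would fail to compile until its author handled the new method.

```java
// Toy illustration of the delegator-over-abstract-class hazard
// (hypothetical types; not Lucene's actual classes).
abstract class AbstractReader {
    abstract String name();
    // Method added in a later release, with a default body: existing
    // delegators keep compiling, but do NOT forward this call.
    String version() { return "base"; }
}

class FilterReader extends AbstractReader {
    private final AbstractReader in;
    FilterReader(AbstractReader in) { this.in = in; }
    @Override String name() { return in.name(); }
    // version() is silently inherited: it answers "base" instead of
    // delegating to `in` -- the LUCENE-2828 style of bug. If
    // AbstractReader were an interface and version() a new abstract
    // method, this class would fail to compile, forcing a decision.
}
```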
[jira] [Commented] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134247#comment-13134247 ] Uwe Schindler commented on LUCENE-3473: --- Robert: In your patch there is an additional test for CheckIndex on the old indexes. This is implicitly already done by testSearchOldIndex, which calls _TestUtil's checkIndex as a first step. So this test is a duplicate and slows things down, right? CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms --- Key: LUCENE-3473 URL: https://issues.apache.org/jira/browse/LUCENE-3473 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.4, 4.0 Reporter: Robert Muir Attachments: LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true case maybe (which seems to be testing something else)? It would be nice to verify this also for terms dicts that don't support ord. We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex
[jira] [Commented] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134248#comment-13134248 ] Robert Muir commented on LUCENE-3473: - Uwe yes: i was actually adding this test only for debugging... I'll remove it (it does not give us any additional test coverage)
[jira] [Updated] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3473: Attachment: LUCENE-3473.patch updated patch, now that LUCENE-3526 is fixed, all tests passed. * removed the useless TestBackwardsCompatibility test (i was just debugging) * fixed TestRollingUpdates to not combine PreFlexCodec and MemoryCodec in PerFieldCodecWrapper (this is stupid, and causes my assert to trip)
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11003 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11003/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest.testMultiCore Error Message: Index directory exists after core unload with deleteIndex=true Stack Trace: junit.framework.AssertionFailedError: Index directory exists after core unload with deleteIndex=true at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.solr.client.solrj.MultiCoreExampleTestBase.testMultiCore(MultiCoreExampleTestBase.java:163) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435) Build Log (for compile errors): [...truncated 1 lines...]
[jira] [Commented] (LUCENE-3526) preflex codec returns wrong terms if you use an empty field name
[ https://issues.apache.org/jira/browse/LUCENE-3526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134278#comment-13134278 ] Robert Muir commented on LUCENE-3526: - I'm gonna close this issue and open a separate issue for Term("", "") on 3.x...
[jira] [Commented] (LUCENE-3509) Add settings to IWC to optimize IDV indices for CPU or RAM respectively
[ https://issues.apache.org/jira/browse/LUCENE-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134293#comment-13134293 ] Martijn van Groningen commented on LUCENE-3509: --- I also prefer to have a default that matches the FieldCache. I will change the patch so that the option is at the codec impl level (DefaultDocValuesConsumer). Add settings to IWC to optimize IDV indices for CPU or RAM respectively -- Key: LUCENE-3509 URL: https://issues.apache.org/jira/browse/LUCENE-3509 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Priority: Minor Fix For: 4.0 Attachments: LUCENE-3509.patch Spinoff from LUCENE-3496: we are seeing much better performance if the required bits for PackedInts are rounded up to 8/16/32/64. We should add this option to IWC and default to rounding up, i.e. more RAM, faster lookups.
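The rounding discussed in LUCENE-3509 can be sketched with a small helper (hypothetical code, not Lucene's PackedInts API): a value that needs, say, 20 bits gets stored in 32, trading RAM for reads aligned to native integer widths.

```java
// Toy sketch of rounding a required per-value bit count up to the next
// of 8/16/32/64, as described in the issue. Not Lucene's actual API.
class BitsRounding {
    /** Hypothetical helper: smallest of 8, 16, 32, 64 that fits requiredBits. */
    static int roundUpBits(int requiredBits) {
        if (requiredBits <= 8)  return 8;
        if (requiredBits <= 16) return 16;
        if (requiredBits <= 32) return 32;
        return 64;
    }

    public static void main(String[] args) {
        // 20 bits/value packed tightly saves RAM; rounding to 32 makes
        // each value a plain int read -- the CPU-vs-RAM trade-off.
        System.out.println(roundUpBits(20)); // prints 32
    }
}
```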
Re: IndexableField(Type) interfaces, abstract classes and back compat.
On Oct 24, 2011, at 1:01 PM, Michael McCandless wrote: Thanks for raising this Grant. My feeling is we can stick with an interface here, and mark it @experimental. This is a very-low-level-very-expert API. :-) We thought the same of Fieldable once upon a time! At any rate, +1 on all of this. I think we are much more sane about this stuff now! Most users will use the sugar field impls (TextField, BinaryField, NumericField, etc.). Expert users will build their own FieldType and pass that to Field. Waaay expert users will skip our user-space Document/Field/FieldType entirely and code directly to this low level minimal indexing API. For example maybe their app sucks streamed bytes off a socket, parses out fields and immediately hands that data off to IndexWriter for indexing (never making FieldTypes/Fields/Documents). So I think such way-expert users can handle hard breaks on the API, and would likely want to see the hard break so they know they're something to fix / new to add to indexing. Mike McCandless http://blog.mikemccandless.com On Mon, Oct 24, 2011 at 10:02 AM, Grant Ingersoll gsing...@apache.org wrote: On Oct 24, 2011, at 9:56 AM, Robert Muir wrote: On Mon, Oct 24, 2011 at 9:52 AM, Grant Ingersoll gsing...@apache.org wrote: Hi, I was perusing trunk code on the way back from Eurocon and noticed the new FieldType stuff has some interfaces in it. In the past we've tried to stick to interfaces for only simple ones (i.e. one or two methods that aren't likely to change at all) and instead used abstract classes for bigger classes that may be subject to change more often. I think its good you brought this up Grant. I wanted to mention this: as far as interfaces versus abstract classes, in my opinion Lucene was under a false sense of security before thinking that abstract classes actually solve these back compat problems. In fact they can create serious problems like https://issues.apache.org/jira/browse/LUCENE-2828. 
In other words, if someone writes a delegator over an abstract class, it's asking for trouble. On the other hand, delegators over interfaces are safe because they (and we) get a compile-time break for the new methods. Good point. Basically, going down this line, are we saying that we would still allow new methods on minor versions on Interfaces? My personal take is that if we do, we primarily just need to communicate it ahead of time. Ideally, at least one release ahead, but maybe it is just an email. We just want to avoid surprises for people where possible. -Grant Grant Ingersoll http://www.lucidimagination.com
[jira] [Commented] (LUCENE-3528) TestNRTManager hang
[ https://issues.apache.org/jira/browse/LUCENE-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134328#comment-13134328 ] Robert Muir commented on LUCENE-3528: - {noformat} [junit] 2011-10-24 14:28:25 [junit] Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode): [junit] [junit] Thread-2 daemon prio=10 tid=0x7fe66005b800 nid=0x22d6 waiting on condition [0x7fe66d854000] [junit]java.lang.Thread.State: WAITING (parking) [junit] at sun.misc.Unsafe.park(Native Method) [junit] - parking to wait for 0xe0002120 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) [junit] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) [junit] at org.apache.lucene.index.NRTManager.waitOnGenCondition(NRTManager.java:251) [junit] at org.apache.lucene.index.NRTManager.waitForGeneration(NRTManager.java:232) [junit] at org.apache.lucene.index.NRTManager.waitForGeneration(NRTManager.java:196) [junit] at org.apache.lucene.index.TestNRTManager.addDocuments(TestNRTManager.java:95) [junit] at org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1.run(ThreadedIndexingAndSearchingTestCase.java:223) [junit] [junit] NRT Reopen Thread daemon prio=10 tid=0x7fe660024800 nid=0x22d5 in Object.wait() [0x7fe66d955000] [junit]java.lang.Thread.State: TIMED_WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.Object.wait(Object.java:443) [junit] at org.apache.lucene.index.NRTManagerReopenThread.run(NRTManagerReopenThread.java:162) [junit] - locked 0xe0006000 (a org.apache.lucene.index.NRTManagerReopenThread) [junit] [junit] Low Memory Detector daemon prio=10 tid=0x7fe668001000 nid=0x22ba runnable [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread1 daemon prio=10 tid=0x40f8b000 nid=0x22b7 waiting on condition [0x] 
[junit]java.lang.Thread.State: RUNNABLE [junit] [junit] CompilerThread0 daemon prio=10 tid=0x40f88000 nid=0x22b5 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Signal Dispatcher daemon prio=10 tid=0x40f86000 nid=0x22b2 waiting on condition [0x] [junit]java.lang.Thread.State: RUNNABLE [junit] [junit] Finalizer daemon prio=10 tid=0x40f69000 nid=0x229e in Object.wait() [0x7fe66e7d7000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) [junit] - locked 0xe0002528 (a java.lang.ref.ReferenceQueue$Lock) [junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) [junit] at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) [junit] [junit] Reference Handler daemon prio=10 tid=0x40f62000 nid=0x2298 in Object.wait() [0x7fe66e8d8000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] at java.lang.Object.wait(Object.java:485) [junit] at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) [junit] - locked 0xe0006090 (a java.lang.ref.Reference$Lock) [junit] [junit] main prio=10 tid=0x40ef6000 nid=0x2240 in Object.wait() [0x7fe673e7d000] [junit]java.lang.Thread.State: WAITING (on object monitor) [junit] at java.lang.Object.wait(Native Method) [junit] - waiting on 0xe0002090 (a org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1) [junit] at java.lang.Thread.join(Thread.java:1186) [junit] - locked 0xe0002090 (a org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase$1) [junit] at java.lang.Thread.join(Thread.java:1239) [junit] at org.apache.lucene.index.ThreadedIndexingAndSearchingTestCase.runTest(ThreadedIndexingAndSearchingTestCase.java:524) [junit] at org.apache.lucene.index.TestNRTManager.testNRTManager(TestNRTManager.java:37) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at
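The parked thread in the dump above is blocked inside a generation-wait. A minimal sketch of that pattern (toy code under assumed semantics, not NRTManager's actual implementation) shows how a waiter hangs exactly as in the trace if the reopen side never advances the generation or the signal is missed:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Toy generation tracker: a searcher thread parks on a Condition until
// the reopen thread has caught up to a target generation.
class GenerationTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long currentGen = 0;

    void waitForGeneration(long targetGen) throws InterruptedException {
        lock.lock();
        try {
            while (currentGen < targetGen) {
                advanced.await(); // parks, as in the thread dump above
            }
        } finally {
            lock.unlock();
        }
    }

    void advanceTo(long gen) {
        lock.lock();
        try {
            if (gen > currentGen) {
                currentGen = gen;
                advanced.signalAll(); // wake every waiter at or below gen
            }
        } finally {
            lock.unlock();
        }
    }
}
```

If `advanceTo` is never called (a stalled reopen thread), `waitForGeneration` parks forever, which is the WAITING state the dump captures.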
[jira] [Created] (LUCENE-3528) TestNRTManager hang
TestNRTManager hang --- Key: LUCENE-3528 URL: https://issues.apache.org/jira/browse/LUCENE-3528 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir didn't check 3.x yet, just encountered this one running the tests
[jira] [Created] (LUCENE-3529) creating empty field + empty term leads to invalid index
creating empty field + empty term leads to invalid index Key: LUCENE-3529 URL: https://issues.apache.org/jira/browse/LUCENE-3529 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.4 Reporter: Robert Muir Spinoff from LUCENE-3526. * if you create new Field("", ""), you get IllegalArgumentException from Field's ctor: name and value cannot both be empty * But there are tons of other ways to index an empty term for the empty field (for example initially make it garbage then .setValue(""), or via tokenstream). * If you do this, and you have assertions enabled, you will trip an assert (the assert is fixed in trunk, in LUCENE-3526) * But if you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
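The CheckIndex error quoted above fires when a term's stored docFreq disagrees with the postings actually found for it. A hedged sketch of that invariant, with illustrative types rather than Lucene's real data structures, looks like:

```java
import java.util.List;
import java.util.Map;

// Toy version of the consistency check: for every term, the stored
// docFreq must equal the number of documents seen when walking its
// postings. An empty term carrying docFreq=1 but no postings fails it.
class PostingsChecker {
    /** postings: term -> docIDs seen; docFreqs: term -> stored docFreq. */
    static boolean verify(Map<String, List<Integer>> postings,
                          Map<String, Integer> docFreqs) {
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            List<Integer> docs = postings.getOrDefault(e.getKey(), List.of());
            if (e.getValue() != docs.size()) {
                return false; // docFreq != num docs seen -> corrupt index
            }
        }
        return true;
    }
}
```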
[jira] [Updated] (LUCENE-3529) creating empty field + empty term leads to invalid index
[ https://issues.apache.org/jira/browse/LUCENE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3529: Attachment: LUCENE-3529_test.patch attached is a test (committed to trunk). I also fixed the assert and removed the bogus check in Field's ctor. But the checkIndex fails (as it does before, if you index this term with assertions disabled). So next step is to figure out a fix...
[jira] [Created] (SOLR-2849) Solr maven dependencies: logging
Solr maven dependencies: logging Key: SOLR-2849 URL: https://issues.apache.org/jira/browse/SOLR-2849 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 4.0 Reporter: David Smiley Priority: Trivial I was looking at my Maven-based project's solr-core dependencies (trunk) and observed some issues that I think should be fixed in Solr's Maven poms. I ran {{mvn dependency:tree}} -- the output is further below. There are three changes I see needed, related to logging: * slf4j-jdk14 should be runtime scope, and optional. * httpclient depends on commons-logging. Exclude this dependency from the httpclient dependency, and add a dependency on jcl-over-slf4j with compile scope. * Zookeeper depends on Log4j, unfortunately. There is an issue to change this to SLF4J: ZOOKEEPER-850. In the meantime we should exclude it and use log4j-over-slf4j with compile scope, at the solrj pom. As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588 Steve, if you'd like me to create the patch, I will.
{code}
[INFO] +- org.apache.solr:solr-core:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.solr:solr-solrj:jar:4.0-SNAPSHOT:compile
[INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.3.3:compile
[INFO] |  |     +- log4j:log4j:jar:1.2.15:compile
[INFO] |  |     |  \- javax.mail:mail:jar:1.4:compile
[INFO] |  |     |     \- javax.activation:activation:jar:1.1:compile
[INFO] |  |     \- jline:jline:jar:0.9.94:compile
[INFO] |  +- org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-highlighter:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-memory:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-misc:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-queryparser:jar:4.0-SNAPSHOT:compile
[INFO] |  |  \- org.apache.lucene:lucene-sandbox:jar:4.0-SNAPSHOT:compile
[INFO] |  |     \- jakarta-regexp:jakarta-regexp:jar:1.4:compile
[INFO] |  +- org.apache.lucene:lucene-spatial:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-suggest:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-grouping:jar:4.0-SNAPSHOT:compile
[INFO] |  +- org.apache.solr:solr-commons-csv:jar:4.0-SNAPSHOT:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  \- commons-logging:commons-logging:jar:1.0.4:compile
[INFO] |  +- commons-io:commons-io:jar:1.4:compile
[INFO] |  +- org.apache.velocity:velocity:jar:1.6.4:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  \- oro:oro:jar:2.0.8:compile
[INFO] |  +- org.apache.velocity:velocity-tools:jar:2.0:compile
[INFO] |  |  +- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  +- commons-chain:commons-chain:jar:1.1:compile
[INFO] |  |  +- commons-validator:commons-validator:jar:1.3.1:compile
[INFO] |  |  +- dom4j:dom4j:jar:1.1:compile
[INFO] |  |  +- sslext:sslext:jar:1.2-0:compile
[INFO] |  |  +- org.apache.struts:struts-core:jar:1.3.8:compile
[INFO] |  |  |  \- antlr:antlr:jar:2.7.2:compile
[INFO] |  |  +- org.apache.struts:struts-taglib:jar:1.3.8:compile
[INFO] |  |  \- org.apache.struts:struts-tiles:jar:1.3.8:compile
[INFO] |  +- org.slf4j:slf4j-jdk14:jar:1.6.1:compile
[INFO] |  \- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
{code}
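The httpclient change proposed above would look roughly like the following pom fragment. This is a sketch for illustration: coordinates and versions are taken from the dependency tree above, and the jcl-over-slf4j version is assumed to match the slf4j-jdk14 one.

```xml
<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
  <exclusions>
    <!-- keep commons-logging off the classpath -->
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- route JCL calls through slf4j instead -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>jcl-over-slf4j</artifactId>
  <version>1.6.1</version>
  <scope>compile</scope>
</dependency>
```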
System model for Apache solr
Hi guys, First of all, thanks for this good project. I would like to know whether there are any papers or documents describing a theoretical model of the response time of Apache Solr or Apache Lucene. I am writing an article and would like to compare my experimental data against such a model. Best regards.
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134357#comment-13134357 ] Jason Rutherglen commented on SOLR-2849: {quote}As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588{quote} +1, move it out!
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134359#comment-13134359 ] Steven Rowe commented on SOLR-2849: --- bq. Steve, if you'd like to me to create the patch, I will. Sure, please do.
[jira] [Commented] (SOLR-2849) Solr maven dependencies: logging
[ https://issues.apache.org/jira/browse/SOLR-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134361#comment-13134361 ] Erik Hatcher commented on SOLR-2849: bq. As an aside, it's unfortunate to see all those velocity dependencies. It even depends on struts -- seriously?! I hope solritas gets put back into a contrib sometime: SOLR-2588 I hear ya loud and clear. I'll aim to move it out over the next week or so. There's some test hiccup in moving it out, IIRC; let me dust it off and get it relocated. As for the Struts dependency, that's just a transitive POM listing, not a run- (or compile-) time dependency. We certainly don't ship any Struts JARs with Solr, and it all works fine.
Request for Feedback for Patch to Allow DIH to Archive Files
Hi - We are using Solr to process XML input files via the Data Import Handler. I didn't see a way to move the XML files out of the way after processing, so I wrote a small extension to allow this. The How to Contribute page (http://wiki.apache.org/solr/HowToContribute) says to pitch the request to the developer list in order to decide whether or not to submit a patch. As such, here goes: the new code extends FileDataSource and wraps the underlying reader such that when the close method on the input stream is called, the file is moved to a configurable archive directory. It is unclear to me whether this is the correct place to put it (I pondered changing the FileListEntityProcessor, but this somehow felt safer). I realize that a more robust implementation would consider the success status of the file being processed and would also allow for configurable policies rather than a concrete implementation. Nonetheless, I didn't want the perfect to be the enemy of the good. Please peruse the attached source code file and provide feedback as to the merit of the idea, whether I ought to submit a JIRA ticket/patch, and whether my approach is correct. Thanks! Josh Harness Attachment: ArchivingFileDataSource.java
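The idea described above (wrap the reader so that close() relocates the source file) can be sketched in plain Java. The class and names below are hypothetical simplifications for illustration; the real extension would subclass Solr's FileDataSource:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: a FilterReader that moves the underlying file into an archive
// directory once the consumer closes the stream.
class ArchivingReader extends FilterReader {
    private final Path source;
    private final Path archiveDir;

    ArchivingReader(Path source, Path archiveDir) throws IOException {
        super(Files.newBufferedReader(source));
        this.source = source;
        this.archiveDir = archiveDir;
    }

    @Override
    public void close() throws IOException {
        super.close();  // release the file handle before moving the file
        Files.createDirectories(archiveDir);
        // Move the processed file out of the way, overwriting any stale copy.
        Files.move(source, archiveDir.resolve(source.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

As the email notes, a production version would also want to handle the failure case (e.g. move to an error directory instead of the archive).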
Patch submission for DataImportHandler's FileListEntityProcessor to sort files
Hello, I noticed what appears to be a bug in DataImportHandler's FileListEntityProcessor. Specifically, it relies on Java's File.list() method to retrieve a list of files from the configured dataimport directory, but list() does not guarantee a sort order. This means that if you have two files that update the same record, the results are non-deterministic. Typically, list() does in fact return them lexicographically sorted, but this is not guaranteed. An example of how you can get into trouble is to imagine the following: xyz.xml -- created one hour ago, contains updates to records Foo and Bar. abc.xml -- created one minute ago, contains updates to records Bar and Baz. In this case, the newest file, abc.xml, would (likely, but not guaranteed) be run first, updating the Bar and Baz records. Next, the older file, xyz.xml, would update Foo and overwrite Bar with outdated changes. The HowToContribute wiki page suggested I send my request here before opening an actual bug ticket, so please let me know if there's anything else I can or should do to get this patch submitted and approved. I've attached a patch of FileListEntityProcessor, along with an updated test; please let me know if it's kosher. Thank you, Gabriel.

Index: src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java
===================================================================
--- src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java (revision 1188246)
+++ src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java (working copy)
@@ -36,12 +36,19 @@
   @Test
   @SuppressWarnings("unchecked")
   public void testSimple() throws IOException {
+    final String CREATED_FIRST = "b.xml";
+    final String CREATED_SECOND = "a.xml";
     File tmpdir = File.createTempFile("test", "tmp", TEMP_DIR);
     tmpdir.delete();
     tmpdir.mkdir();
     tmpdir.deleteOnExit();
+    createFile(tmpdir, "b.xml", "b.xml".getBytes(), false);
+    try {
+      Thread.sleep(1000);
+    } catch (Exception e) {
+      // Don't care if interrupted. Pass.
+    }
     createFile(tmpdir, "a.xml", "a.xml".getBytes(), false);
-    createFile(tmpdir, "b.xml", "b.xml".getBytes(), false);
     createFile(tmpdir, "c.props", "c.props".getBytes(), false);
     Map attrs = createMap(
             FileListEntityProcessor.FILE_NAME, "xml$",
@@ -58,6 +65,9 @@
       fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
     }
     assertEquals(2, fList.size());
+
+    assertTrue("File created first should have appeared first", fList.get(0).endsWith(CREATED_FIRST));
+    assertTrue("File created second should have appeared second", fList.get(1).endsWith(CREATED_SECOND));
   }

   @Test
Index: src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java
===================================================================
--- src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java (revision 1188246)
+++ src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java (working copy)
@@ -219,25 +219,24 @@
   }

   private void getFolderFiles(File dir, final List<Map<String, Object>> fileDetails) {
-    // Fetch an array of file objects that pass the filter, however the
-    // returned array is never populated; accept() always returns false.
-    // Rather we make use of the fileDetails array which is populated as
-    // a side affect of the accept method.
-    dir.list(new FilenameFilter() {
-      public boolean accept(File dir, String name) {
-        File fileObj = new File(dir, name);
-        if (fileObj.isDirectory()) {
-          if (recursive) getFolderFiles(fileObj, fileDetails);
-        } else if (fileNamePattern == null) {
-          addDetails(fileDetails, dir, name);
-        } else if (fileNamePattern.matcher(name).find()) {
-          if (excludesPattern != null && excludesPattern.matcher(name).find())
-            return false;
-          addDetails(fileDetails, dir, name);
-        }
-        return false;
-      }
-    });
+    File[] files = dir.listFiles();
+    Arrays.sort(files, new Comparator<File>() {
+      public int compare(File f1, File f2) {
+        return ((Long) f1.lastModified()).compareTo(f2.lastModified());
+      }
+    });
+
+    for (File fileObj : files) {
+      String name = fileObj.getName();
+      if (fileObj.isDirectory()) {
+        if (recursive) getFolderFiles(fileObj, fileDetails);
+      } else if (fileNamePattern == null) {
+        addDetails(fileDetails, dir, name);
+      } else if (fileNamePattern.matcher(name).find()) {
+        if (excludesPattern == null || !excludesPattern.matcher(name).find())
+          addDetails(fileDetails, dir, name);
+      }
+    }
+  }

   private void addDetails(List<Map<String, Object>> files, File dir, String name) {
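The core of the change above can be shown as a self-contained sketch. FileSorter is a hypothetical class for illustration (the real patch inlines this in getFolderFiles); it adds a null guard, since File.listFiles() returns null on I/O error, and avoids the Long boxing used in the patch:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Deterministic directory listing: oldest file first by lastModified().
final class FileSorter {
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles();
        if (files == null) return new File[0];  // unreadable dir or not a directory
        Arrays.sort(files, new Comparator<File>() {
            public int compare(File f1, File f2) {
                long d1 = f1.lastModified(), d2 = f2.lastModified();
                // Manual three-way compare: era-appropriate (pre-Long.compare)
                // and free of autoboxing.
                return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
            }
        });
        return files;
    }
}
```

With this ordering, the older xyz.xml from the example always runs before the newer abc.xml, so the newest update to Bar wins.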
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch Attached you will find a new patch for trunk. I made some improvements to the copy operations and the CompoundToken class: - copy operations no longer create useless String objects or clones of String's internal char[] (this slows down indexing a lot) - the algorithmic hyphenator uses the CTA's char[] directly, as it did for Token before (see above), and uses the optimized append() - the broken non-Unicode-conform lowercasing was removed; instead, the CharArraySet is created case-insensitive. If you pass in your own CharArraySet, it has to be case-insensitive; if not, the filter will fail (what to do? Robert, how do we handle that otherwise?) - As all tokens are again CTAs, the CAS lookup is fast again. - Some whitespace cleanup in the test and leftovers in the base source file (Lucene requires 2 spaces, no tabs) Robert, if you could look into it, it would be great. I did not test it with Solr, but to me it looks correct. Uwe Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then will reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 731 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/731/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability Error Message: No live SolrServers available to handle this request Stack Trace: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:222) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:177) at org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:435) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246) at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:206) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read(BufferedInputStream.java:254) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424) Build Log (for compile errors): [...truncated 14448 lines...]
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch More cleanup: - As the original token is always preserved, it is not put into the list at all and is returned without modification (no extra copy operations) - removed the deprecated makeDictionary method and corrected matchVersion usage.
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch One more time the filter was revisited and partly rewritten: - it no longer clones the orginal token, as decompounding is done when TokenStream is on this token, which does not change. The decompounder simply takes termAtt/offsetAtt and produces new CompoundToken instances out of it, added to the LinkedList. The original is returned unmodified by a simple return true. This filter actually only creates new opjects when compounds are found, all other tokens are passed as is. - CompoundToken is now a simple wrapper around the characters and the offsets, copied out of the unmodified original termAtt. I think thats the most effective implementation of this filters. I think it's ready to commit. Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then will reset only the default Token attributes (term, position, flags, etc) resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.. -- This message is automatically generated by JIRA. 
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10989 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10989/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.update.AutoCommitTest Error Message: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) Stack Trace: java.lang.RuntimeException: java.lang.AssertionError: directory of test was not closed, opened from: org.apache.solr.core.MockDirectoryFactory.create(MockDirectoryFactory.java:33) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:469) at org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:527) at org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:437) Build Log (for compile errors): [...truncated 7871 lines...]
[jira] [Updated] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3508: -- Attachment: LUCENE-3508.patch Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[jira] [Updated] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2205: --- Attachment: LUCENE-2205.patch New patch, iterated from Aaron's last patch. I moved the DataInput/Output impls into PagedBytes, so they can directly operate on the byte[] blocks. I also don't write skipOffset unless df >= skipInterval. I think this is ready! Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure. --- Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: LUCENE-2205.patch, RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, patch-final.txt, rawoutput.txt Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering: on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has improved by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well: -Dorg.apache.lucene.index.TermInfosReader=default or small. I have also written a blog post about this patch; here is the link:
http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
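The packing technique described in the issue can be sketched with plain JDK I/O: several parallel per-term arrays are serialized into a single byte[], with an int[] recording where each entry starts, so entry i occupies bytes[index[i] .. index[i+1]). This is an illustrative sketch of the general idea, not the actual LUCENE-2205 code (which operates on PagedBytes blocks):

```java
import java.io.*;

// Sketch: pack parallel arrays (term text + a long pointer per term)
// into one byte[] with an int[] offset index, replacing three object arrays
// with two flat arrays.
public class PackedTermIndex {
    private final byte[] bytes;
    private final int[] index;

    public PackedTermIndex(String[] terms, long[] pointers) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        index = new int[terms.length + 1];
        for (int i = 0; i < terms.length; i++) {
            index[i] = out.size();       // record where entry i starts
            out.writeUTF(terms[i]);      // term text
            out.writeLong(pointers[i]);  // e.g. the index pointer for the term
        }
        index[terms.length] = out.size();
        bytes = buf.toByteArray();
    }

    public String term(int i) throws IOException {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes, index[i], index[i + 1] - index[i]));
        return in.readUTF();
    }

    public long pointer(int i) throws IOException {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes, index[i], index[i + 1] - index[i]));
        in.readUTF();  // skip past the term text
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        PackedTermIndex idx = new PackedTermIndex(
            new String[] {"apache", "lucene", "solr"}, new long[] {0L, 1234L, 99999L});
        System.out.println(idx.term(1) + " @ " + idx.pointer(1));  // lucene @ 1234
    }
}
```

The memory win comes from avoiding per-entry object headers and pointers: one contiguous byte[] plus one int[] instead of hundreds of thousands of String/TermInfo objects.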
[jira] [Updated] (LUCENE-3515) Possible slowdown of indexing/merging on 3.x vs trunk
[ https://issues.apache.org/jira/browse/LUCENE-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3515: --- Fix Version/s: (was: 3.5) This bug was only present in 4.0. Possible slowdown of indexing/merging on 3.x vs trunk - Key: LUCENE-3515 URL: https://issues.apache.org/jira/browse/LUCENE-3515 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3515.patch, LUCENE-3515.patch, LUCENE-index-34.patch, LUCENE-index-40.patch, TestGenerationTime.java.3x, TestGenerationTime.java.40, stdout-snow-leopard.tar.gz Opening an issue to pursue the possible slowdown Marc Sturlese uncovered.
[jira] [Created] (SOLR-2850) Do not refine facets when minCount == 1
Do not refine facets when minCount == 1 --- Key: SOLR-2850 URL: https://issues.apache.org/jira/browse/SOLR-2850 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 3.4 Environment: Ubuntu, distributed Reporter: Matt Smith Currently there is a special case in the code to not refine facets if minCount==0. It seems this could be extended to minCount == 1, as there would be no need to take the extra step of refining facets if minCount is 1.
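The reasoning behind the proposal can be sketched as follows: in distributed faceting, refinement is a second round-trip that asks shards for exact counts of candidate terms. Any term a shard returns already has a count of at least 1, so with minCount <= 1 every merged candidate trivially meets the threshold and the second pass adds nothing. The sketch below is illustrative only, not Solr's actual FacetComponent:

```java
import java.util.*;

// Illustrative sketch (not Solr's FacetComponent) of merging per-shard
// facet counts. With minCount <= 1, every term any shard returned already
// satisfies the threshold, so the refinement round-trip can be skipped.
public class FacetMergeDemo {
    static Map<String, Integer> merge(List<Map<String, Integer>> shardCounts, int minCount) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shardCounts)
            shard.forEach((term, c) -> merged.merge(term, c, Integer::sum));
        merged.values().removeIf(c -> c < minCount);  // apply the threshold
        return merged;
    }

    static boolean needsRefinement(int minCount) {
        // the existing special case (minCount == 0), extended to 1 as proposed
        return minCount > 1;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> shards = List.of(
            Map.of("java", 3, "lucene", 1),
            Map.of("java", 2, "solr", 1));
        System.out.println(merge(shards, 1));     // every returned term qualifies
        System.out.println(needsRefinement(1));   // false: skip the refine step
    }
}
```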
[jira] [Resolved] (LUCENE-3529) creating empty field + empty term leads to invalid index
[ https://issues.apache.org/jira/browse/LUCENE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3529. - Resolution: Fixed Fix Version/s: 3.5 Thanks Mike, your fix from 3183 was correct all along... we should have just gone with it originally... creating empty field + empty term leads to invalid index Key: LUCENE-3529 URL: https://issues.apache.org/jira/browse/LUCENE-3529 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.4 Reporter: Robert Muir Fix For: 3.5 Attachments: LUCENE-3529.patch, LUCENE-3529_test.patch Spinoff from LUCENE-3526. * if you create new Field("", ""), you get IllegalArgumentException from Field's ctor: name and value cannot both be empty * But there are tons of other ways to index an empty term for the empty field (for example initially make it garbage then .setValue(), or via tokenstream). * If you do this, and you have assertions enabled, you will trip an assert (the assert is fixed in trunk, in LUCENE-3526) * But if you don't have assertions enabled, you will create a corrupt index: test: terms, freq, prox...ERROR [term : docFreq=1 != num docs seen 0 + num docs deleted 0]
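The ctor guard mentioned in the first bullet can be illustrated in isolation. The standalone class below only mirrors the described check (reject the case where BOTH name and value are empty); it is not the real Lucene Field class, and the method name is hypothetical:

```java
// Sketch of the validation described in the issue: Field's ctor rejects an
// empty name combined with an empty value, but other code paths (setValue,
// a TokenStream producing an empty term) could still slip an empty term in.
public class FieldGuardDemo {
    static void checkNameAndValue(String name, String value) {
        if (name.isEmpty() && value.isEmpty()) {
            throw new IllegalArgumentException("name and value cannot both be empty");
        }
    }

    public static void main(String[] args) {
        checkNameAndValue("body", "");   // fine: only the value is empty
        try {
            checkNameAndValue("", "");   // the rejected case from the issue
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This also shows why the ctor check alone was insufficient: the guard runs only at construction time, so the extra CheckIndex-level protection added by the fix matters.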
[jira] [Resolved] (LUCENE-3473) CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3473. - Resolution: Fixed Fix Version/s: 4.0 3.5 Assignee: Robert Muir CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms --- Key: LUCENE-3473 URL: https://issues.apache.org/jira/browse/LUCENE-3473 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.4, 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch, LUCENE-3473.patch Just glancing at the code it seems to sorta do this check, but only in the hasOrd==true case maybe (which seems to be testing something else)? It would be nice to verify this also for terms dicts that don't support ord. We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and preflex.
[jira] [Commented] (LUCENE-3508) Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes
[ https://issues.apache.org/jira/browse/LUCENE-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134676#comment-13134676 ] Robert Muir commented on LUCENE-3508: - Just one idea: if the base has makeDictionary(String[]), then maybe deprecate-3x-remove-trunk the stupid String[] ctors and just take the CharArraySet? I think this would remove about half the ctors in both base and subclasses, and I think these ctors are stupid myself. Otherwise, looks great! Decompounders based on CompoundWordTokenFilterBase cannot be used with custom attributes Key: LUCENE-3508 URL: https://issues.apache.org/jira/browse/LUCENE-3508 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 3.4, 4.0 Reporter: Spyros Kapnissis Assignee: Uwe Schindler Fix For: 3.5, 4.0 Attachments: LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch, LUCENE-3508.patch The CompoundWordTokenFilterBase.setToken method will call clearAttributes() and then reset only the default Token attributes (term, position, flags, etc.), resulting in any custom attributes losing their value. Commenting out clearAttributes() seems to do the trick, but will fail the TestCompoundWordTokenFilter tests.
[jira] [Updated] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3440: --- Attachment: LUCENE-3440.patch New patch, still has failures in test, though. FastVectorHighlighter: IDF-weighted terms for ordered fragments Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5, 4.0 Reporter: sebastian L. Priority: Minor Labels: FastVectorHighlighter Fix For: 3.5, 4.0 Attachments: LUCENE-3.5-SNAPSHOT-3440-8.patch, LUCENE-3440.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html The FastVectorHighlighter assigns an equal weight to every term found in a fragment, which causes a higher ranking for fragments with a high number of words or, in the worst case, a high number of very common words, than for fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query. The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas: - A better approach would be moving the whole fragment scoring into a separate class. - Switch scoring via parameter. - Exact phrases should be given an even better score, regardless of whether a phrase query was executed or not. - edismax/dismax parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher. -- This message is automatically generated by JIRA.
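The patch's scoring formula (total weight += IDF per unique term in the fragment, times the query boost) can be sketched as below. The idf formula used here (Lucene's classic 1 + ln(N/(df+1))) and all numbers are assumptions for the demo, not taken from the patch itself:

```java
import java.util.*;

// Sketch of IDF-weighted fragment scoring: instead of counting every matched
// term equally, each UNIQUE term in a fragment contributes its IDF once,
// scaled by the query boost. Rare query terms thus outrank piles of common ones.
public class FragmentScoreDemo {
    static double idf(int numDocs, int docFreq) {
        // classic Lucene-style idf; an assumption for this demo
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static double fragmentWeight(Set<String> uniqueTermsInFragment,
                                 Map<String, Integer> docFreq,
                                 int numDocs, float queryBoost) {
        double total = 0.0;
        for (String term : uniqueTermsInFragment) {
            // total weight = total weight + IDF(unique term) * boost of query
            total += idf(numDocs, docFreq.getOrDefault(term, 0)) * queryBoost;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("the", 900, "lucene", 3);
        int numDocs = 1000;
        // a fragment full of common words scores lower than one hitting a rare term
        System.out.println(fragmentWeight(Set.of("the"), df, numDocs, 1.0f));
        System.out.println(fragmentWeight(Set.of("lucene"), df, numDocs, 1.0f));
    }
}
```

Counting each unique term once per fragment is what prevents a fragment stuffed with repetitions of one common word from beating a fragment that covers all query terms.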
[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.
[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134717#comment-13134717 ] Aaron McCurry commented on LUCENE-2205: --- Awesome! Good job! Thank you for working on this with me! Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure. --- Key: LUCENE-2205 URL: https://issues.apache.org/jira/browse/LUCENE-2205 Project: Lucene - Java Issue Type: Improvement Components: core/index Environment: Java5 Reporter: Aaron McCurry Assignee: Michael McCandless Fix For: 3.5 Attachments: LUCENE-2205.patch, RandomAccessTest.java, TermInfosReader.java, TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch, lowmemory_w_utf8_encoding.v4.patch, patch-final.txt, rawoutput.txt Basically packing those three arrays into a byte array with an int array as an index offset. The performance benefits are staggering: on my test index (of size 6.2 GB, with ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the terminfos was reduced to 17% of its original size, from 291.5 MB to 49.7 MB. The random access speed has improved by 1-2%, load time of the segments is ~40% faster as well, and full GCs on my JVM were made 7 times faster. I have already performed the work and am offering this code as a patch. Currently all tests in the trunk pass with this new code enabled. I did write a system property switch to allow for the original implementation to be used as well: -Dorg.apache.lucene.index.TermInfosReader=default or small. I have also written a blog post about this patch; here is the link: http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html -- This message is automatically generated by JIRA.