[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b106) - Build # 7485 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7485/ Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:491) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:724) Build Log: [...truncated 776 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5E1E0079E35D52E
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768098#comment-13768098 ] Littlestar commented on LUCENE-5218: PagedBytes.java#fillSlice maybe wrong start??
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768098#comment-13768098 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 6:24 AM: - PagedBytes.java#fillSlice maybe wrong start?? {noformat}
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
{noformat} was (Author: cnstar9988): PagedBytes.java#fillSlice maybe wrong start??
public void fillSlice(BytesRef b, long start, int length) {
  assert length >= 0: "length=" + length;
  assert length <= blockSize+1;
  final int index = (int) (start >> blockBits);
  final int offset = (int) (start & blockMask);
  b.length = length;
  if (blockSize - offset >= length) {
    // Within block
    b.bytes = blocks[index];
    b.offset = offset;
  } else {
    // Split
    b.bytes = new byte[length];
    b.offset = 0;
    System.arraycopy(blocks[index], offset, b.bytes, 0, blockSize-offset);
    System.arraycopy(blocks[1+index], 0, b.bytes, blockSize-offset, length-(blockSize-offset));
  }
}
background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ...
4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
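As a side note for readers following the fillSlice discussion above, here is a minimal, self-contained sketch of the index/offset arithmetic the quoted method performs, which shows where an ArrayIndexOutOfBoundsException can come from. The class name, the blockBits value, and the start position are illustrative assumptions, not the actual PagedBytes internals:

// Minimal sketch of the block index/offset arithmetic used by PagedBytes.Reader.fillSlice.
// blockBits and the start position are illustrative values, not the real configuration.
public class FillSliceSketch {
  public static void main(String[] args) {
    final int blockBits = 15;                 // assume 32 KB blocks for illustration
    final int blockSize = 1 << blockBits;
    final long blockMask = blockSize - 1;

    long start = 70_000L;                     // absolute byte position in the paged store
    int index  = (int) (start >> blockBits);  // which block the slice starts in
    int offset = (int) (start & blockMask);   // position inside that block

    // fillSlice assumes a slice spans at most two adjacent blocks (length <= blockSize + 1),
    // so an ArrayIndexOutOfBoundsException like the one reported means index (or index + 1)
    // points past the last allocated block, i.e. the requested start lies beyond the bytes
    // that were actually written.
    System.out.println("block=" + index + " offset=" + offset);
  }
}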
[jira] [Commented] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768107#comment-13768107 ] Shai Erera commented on LUCENE-5215: Ok I'll add just segmentSuffix Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
Indeed, I did look at Parquet and had the same feeling as Otis: there is a striking similarity with the terminology used around stored fields. If I got it right, a Parquet file stores sets of documents in chunks, just like Lucene does, but each chunk is column-stride. Maybe it is possible to apply this idea to compressing stored fields (chunks in column-stride fashion)? On Sun, Sep 15, 2013 at 11:17 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I was reading the Parquet announcement from July: https://blog.twitter.com/2013/announcing-parquet-10-columnar-storage-for-hadoop And a few things caught my attention - Dictionary encoding and (dynamic) bit packing. This smells like something Adrien likes to eat for breakfast. Over in the Hadoop ecosystem Parquet interest has picked up: http://search-hadoop.com/?q=parquet I thought I'd point it out as I haven't seen anyone bring this up. I imagine there are ideas to be borrowed there. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768115#comment-13768115 ] Steve Rowe commented on LUCENE-5217: bq. This is really hard to configure and maintain I agree. bq. maven supports wildcard exclusions: MNG-3832 I did not know that. bq. I think it just means we have to require a minimum of maven 3 instead of also supporting 2. Since this has been out for 3 years (in fact older than the ant 1.8.2 that we require), I don't see this as a significant imposition on anyone? +1, though this will be a viral change, unlike the Ant upgrade: for Ant, we only forced Lucene/Solr source users to upgrade, but for Maven, everybody who depends on binary Lucene/Solr artifacts will have to upgrade their own projects to Maven 3 - I think. I'll do some testing to confirm. disable transitive dependencies in maven config --- Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Our ivy configuration does this: each dependency is specified and so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead the maven setup is configured to download the internet: and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldnt find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions: MNG-3832, we can disable this transitive shit completely. We should do this, so its configuration is the exact parallel of ivy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
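To make the wildcard exclusions referenced above (MNG-3832) concrete, here is a hedged POM sketch of the feature being discussed; the dependency coordinates are placeholders rather than anything from the actual Lucene/Solr POMs, and this wildcard syntax requires Maven 3:

<!-- Illustrative only: disable all transitive dependencies of one direct dependency.
     Coordinates are placeholders; wildcard exclusions need Maven 3 (MNG-3832). -->
<dependency>
  <groupId>org.example</groupId>
  <artifactId>some-direct-dependency</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>*</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>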
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0-ea-b106) - Build # 7485 - Failure!
I failed to reproduce with the reported seed, master seed, random seeds ... all with iters. I'll dig. Shai On Mon, Sep 16, 2013 at 9:07 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7485/ Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:491) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:724) Build Log:
[jira] [Commented] (LUCENE-5217) disable transitive dependencies in maven config
[ https://issues.apache.org/jira/browse/LUCENE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768152#comment-13768152 ] Robert Muir commented on LUCENE-5217: - I wont comment on viral change :) But I think this is a totally fair thing to do for 5.0, since its a new major release. disable transitive dependencies in maven config --- Key: LUCENE-5217 URL: https://issues.apache.org/jira/browse/LUCENE-5217 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Our ivy configuration does this: each dependency is specified and so we know what will happen. Unfortunately the maven setup is not configured the same way. Instead the maven setup is configured to download the internet: and it excludes certain things specifically. This is really hard to configure and maintain: we added a 'validate-maven-dependencies' that tries to fail on any extra jars, but all it really does is run a license check after maven runs. It wouldnt find unnecessary dependencies being dragged in if something else in lucene was using them and thus they had a license file. Since maven supports wildcard exclusions: MNG-3832, we can disable this transitive shit completely. We should do this, so its configuration is the exact parallel of ivy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
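To make the GCD_COMPRESSED remark above concrete, here is a small hedged sketch of the idea (not the actual Lucene45 codec code): when every value can be written as base + gcd * x, only the small quotients x need to be bit-packed, which is why coarse-precision timestamps stored in milliseconds compress well.

import java.math.BigInteger;

// Illustrative sketch of GCD-style compression for a block of longs; not the Lucene45 codec.
public class GcdSketch {
  public static void main(String[] args) {
    // Coarse-precision dates stored as milliseconds (all multiples of 3,600,000 apart).
    long[] values = {1379289600000L, 1379293200000L, 1379296800000L};

    long min = Long.MAX_VALUE;
    for (long v : values) min = Math.min(min, v);

    // Greatest common divisor of all deltas from the minimum value.
    long gcd = 0;
    for (long v : values) {
      gcd = BigInteger.valueOf(gcd).gcd(BigInteger.valueOf(v - min)).longValue();
    }

    // Each value is now min + gcd * quotient; only the small quotients need to be packed.
    for (long v : values) {
      long quotient = gcd == 0 ? 0 : (v - min) / gcd;
      System.out.println(v + " = " + min + " + " + gcd + " * " + quotient);
    }
  }
}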
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768204#comment-13768204 ] Simon Willnauer commented on LUCENE-5189: - I only briefly looked at the changed in DW, DWPT, IW BDS and I have 2 questions: - SegmentWriteState flushState; in DWPT is unused - can we remove it? (I generally want this class to have only final members as well if possible) - In DW the `updateNumericDocValue` method is synchronized - I don't think it needs to. The other two deletes methods don't need to be synced either - maybe we can open another issue to remove the synchronization? It won't be possible to just drop it but it won't be much work. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering :) nice one! Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 807 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/807/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9834 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2C69D4309AF28FAE -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.6 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.6-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768253#comment-13768253 ] Michael McCandless commented on LUCENE-5218: Which JVM are you using? background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768274#comment-13768274 ] Shai Erera commented on LUCENE-5189: Jenkins reported this failure, which I'm unable to reproduce with and without the seed (master and child), with iters. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields Error Message: invalid value for doc=351, field=f1 expected:15 but was:14 Stack Trace: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) ... Build Log: [...truncated 776 lines...] [junit4] Suite: org.apache.lucene.index.TestNumericDocValuesUpdates [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestNumericDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5E1E0079E35D52E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=US-ASCII [junit4] FAILURE 1.40s J0 | TestNumericDocValuesUpdates.testManyReopensAndFields [junit4] Throwable #1: java.lang.AssertionError: invalid value for doc=351, field=f1 expected:15 but was:14 [junit4]at __randomizedtesting.SeedInfo.seed([5E1E0079E35D52E:331D82281FC0B632]:0) [junit4]at org.apache.lucene.index.TestNumericDocValuesUpdates.testManyReopensAndFields(TestNumericDocValuesUpdates.java:757) [junit4]at java.lang.Thread.run(Thread.java:724) [junit4] 2 NOTE: test params are: codec=Asserting, sim=RandomSimilarityProvider(queryNorm=false,coord=no): {}, locale=tr, timezone=Etc/GMT-6 [junit4] 2 NOTE: Linux 3.2.0-53-generic amd64/Oracle Corporation 1.8.0-ea (64-bit)/cpus=8,threads=1,free=66621176,total=210272256 [junit4] 2 NOTE: All tests run in this JVM: [TestSegmentReader, TestStressNRT, TestSort, TestShardSearching, TestEliasFanoSequence, TestBytesRefHash, TestPhrasePrefixQuery, TestLucene45DocValuesFormat, TestFastCompressionMode, TestEliasFanoDocIdSet, TestSearchForDuplicates, TestFixedBitSet, TestIsCurrent, TestFilteredSearch, TestFieldCacheSanityChecker, TestSegmentTermEnum, TestDeletionPolicy, TestSimpleExplanations, TestRegexpRandom, TestIndexCommit, TestCloseableThreadLocal, TestNumericRangeQuery32, TestTwoPhaseCommitTool, TestIndexWriterOnDiskFull, TestPhraseQuery, TestSearchAfter, TestParallelReaderEmptyIndex, TestMaxTermFrequency, TestFlushByRamOrCountsPolicy, TestSimilarity, TestNumericRangeQuery64, TestByteSlices, TestSameScoresWithThreads, TestDocValuesWithThreads, TestMockAnalyzer, TestArrayUtil, TestPostingsOffsets, TestCompressingTermVectorsFormat, TestSentinelIntSet, TestCustomNorms, TestExternalCodecs, TestNumericDocValuesUpdates] [junit4] Completed on J0 in 83.46s, 24 tests, 1 failure FAILURES! 
{noformat} Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768271#comment-13768271 ] Shai Erera commented on LUCENE-5189: bq. SegmentWriteState flushState; in DWPT is unused +1 to remove it. Indeed it's unused, but because it's package-private, eclipse doesn't complain about it. bq. In DW the `updateNumericDocValue` method is synchronized I followed the other two delete methods. I'm fine with opening a separate issue to remove the synchronization, especially if it's not trivial. bq. I really like the way how this is implemented piggybacking on the delete queue to get a total ordering Thanks, it was very helpful to have deletes already covered like that. I only had to follow their breadcrumbs :). Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch In LUCENE-4258 we started to work on incremental field updates, however the amount of changes are immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons to that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and on itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have some working patch already which I'll upload next, explaining the changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768322#comment-13768322 ] Littlestar commented on LUCENE-5218: java version 1.7.0_25 I also build openjdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768322#comment-13768322 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 2:09 PM: - java version 1.7.0_25 I also build jdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. was (Author: cnstar9988): java version 1.7.0_25 I also build openjdk 7u40 with openjdk-7u40-fcs-src-b43-26_aug_2013.zip two jdks has same problem. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768344#comment-13768344 ] Littlestar edited comment on LUCENE-5218 at 9/16/13 2:22 PM: - my app continue insert records, may be 10-1 records per seconds. lucene index with a lots of small segments, so I call forceMerge(80) before each call. was (Author: cnstar9988): my app continue insert records, may be 10-1 records per seconds. lucene index with very small segments, so I call forceMerge(80) before each call. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768344#comment-13768344 ] Littlestar commented on LUCENE-5218: my app continue insert records, may be 10-1 records per seconds. lucene index with very small segments, so I call forceMerge(80) before each call. background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768362#comment-13768362 ] David Smiley commented on SOLR-2548: I just committed to trunk; I'll wait a day just in case and for any more feedback before applying to 4x. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5218) background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768380#comment-13768380 ] Michael McCandless commented on LUCENE-5218: Don't use 7u40: there is apparently a JVM bug that can cause index corruption like this (LUCENE-5212). But 7u25 should be safe. If you use only 7u25, and start from a new index, you can reproduce this exception? Can you run CheckIndex on the resulting index and post the output? background merge hit exception Caused by: java.lang.ArrayIndexOutOfBoundsException - Key: LUCENE-5218 URL: https://issues.apache.org/jira/browse/LUCENE-5218 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.4 Environment: Linux MMapDirectory. Reporter: Littlestar forceMerge(80) == Caused by: java.io.IOException: background merge hit exception: _3h(4.4):c79921/2994 _3vs(4.4):c38658 _eq(4.4):c38586 _h1(4.4):c37370 _16k(4.4):c36591 _j4(4.4):c34316 _dx(4.4):c30550 _3m6(4.4):c30058 _dl(4.4):c28440 _d8(4.4):c19599 _dy(4.4):c1500/75 _h2(4.4):c1500 into _3vt [maxNumSegments=80] at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1714) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650) at com.xxx.yyy.engine.lucene.LuceneEngine.flushAndReopen(LuceneEngine.java:1295) ... 4 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 2 at org.apache.lucene.util.PagedBytes$Reader.fillSlice(PagedBytes.java:92) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer$6.get(Lucene42DocValuesProducer.java:267) at org.apache.lucene.codecs.DocValuesConsumer$2$1.setNext(DocValuesConsumer.java:239) at org.apache.lucene.codecs.DocValuesConsumer$2$1.hasNext(DocValuesConsumer.java:201) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addBinaryField(Lucene42DocValuesConsumer.java:218) at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addBinaryField(PerFieldDocValuesFormat.java:110) at org.apache.lucene.codecs.DocValuesConsumer.mergeBinaryField(DocValuesConsumer.java:186) at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:171) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482) === -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
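For anyone wanting to follow the CheckIndex suggestion above, here is a minimal hedged sketch of running it from code against a 4.x index; the index path is a placeholder, and the same check can also be run from the CheckIndex command-line entry point.

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hedged sketch: open the suspect index and print CheckIndex diagnostics.
// "/path/to/index" is a placeholder for the directory from the failing forceMerge(80) run.
public class CheckIndexSketch {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(new File("/path/to/index"))) {
      CheckIndex checker = new CheckIndex(dir);
      checker.setInfoStream(System.out, true);      // verbose per-segment output
      CheckIndex.Status status = checker.checkIndex();
      System.out.println(status.clean ? "Index is clean" : "Index has problems");
    }
  }
}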
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768381#comment-13768381 ] Mark Miller commented on SOLR-5150: --- I'm just going to commit the current fix and worry about any performance improvements in another issue. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
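A small hedged sketch of the difference described in this issue, using the stock Hadoop FSDataInputStream API; the method names on the stream are Hadoop's, but the surrounding framing is illustrative and not the actual HdfsDirectory patch:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative only: contrasts the two read styles discussed in SOLR-5150.
public class PositionalReadSketch {

  // Problematic pattern: seek + read, which needs external synchronization across clones
  // and may return fewer bytes than requested (the return value must not be ignored).
  static void seekThenRead(FSDataInputStream in, long pos, byte[] buf) throws IOException {
    synchronized (in) {
      in.seek(pos);
      int n = in.read(buf, 0, buf.length);   // may be a short read
      if (n < buf.length) {
        // the caller would have to keep reading until buf is full
      }
    }
  }

  // Preferred pattern: positional readFully, which takes the offset explicitly,
  // always fills the buffer, and needs no shared seek state.
  static void positionalReadFully(FSDataInputStream in, long pos, byte[] buf) throws IOException {
    in.readFully(pos, buf, 0, buf.length);
  }
}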
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768387#comment-13768387 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523693 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1523693 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5082) Implement ie=charset parameter
[ https://issues.apache.org/jira/browse/SOLR-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768383#comment-13768383 ] David Smiley commented on SOLR-5082: Uwe, why did you give me credit with you on this in CHANGES.txt? By the way, I was looking through the code for this. Why in decodeBuffer() do you call remove() from the buffer iterator on every item; couldn't you not do that and simply call clear() when the loop is done? If you made that change, I think ArrayList would perform better for this buffer than LinkedList. Implement ie=charset parameter -- Key: SOLR-5082 URL: https://issues.apache.org/jira/browse/SOLR-5082 Project: Solr Issue Type: Improvement Affects Versions: 4.4 Reporter: Shawn Heisey Assignee: Uwe Schindler Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-5082.patch, SOLR-5082.patch Allow a user to send a query or update to Solr in a character set other than UTF-8 and inform Solr what charset to use with an ie parameter, for input encoding. This was discussed in SOLR-4265 and SOLR-4283. Changing the default charset is a bad idea because distributed search (SolrCloud) relies on UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
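A hedged illustration of the suggestion above; the buffer field and decode() helper are hypothetical stand-ins, not the SOLR-5082 code:

    // current pattern: remove each element through the iterator while decoding
    for (Iterator<ByteBuffer> it = buffer.iterator(); it.hasNext(); ) {
      decode(it.next());
      it.remove();        // cheap on a LinkedList, O(n) per call on an ArrayList
    }

    // suggested pattern: decode everything, then clear once at the end;
    // with a single bulk clear(), an ArrayList iterates faster than a LinkedList
    for (ByteBuffer b : buffer) {
      decode(b);
    }
    buffer.clear();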
Re: [JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 807 - Failure!
jvm crash: [junit4] JVM J0: stdout was not empty, see: /Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20130916_101823_672.sysout [junit4] JVM J0: stdout (verbatim) [junit4] # [junit4] # A fatal error has been detected by the Java Runtime Environment: [junit4] # [junit4] # SIGSEGV (0xb) at pc=0x000103acfa2b, pid=388, tid=104711 [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_40-b43) (build 1.7.0_40-b43) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b56 mixed mode bsd-amd64 ) [junit4] # Problematic frame: [junit4] # C [libjava.dylib+0x9a2b] JNU_NewStringPlatform+0x1d3 On Mon, Sep 16, 2013 at 6:22 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/807/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 9834 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_40.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=2C69D4309AF28FAE -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.6 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.6-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 -classpath
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768393#comment-13768393 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523694 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523694 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768400#comment-13768400 ] ASF subversion and git services commented on SOLR-5150: --- Commit 1523698 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1523698 ] SOLR-5150: HdfsIndexInput may not fully read requested bytes. HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.
[ https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-5150. --- Resolution: Fixed HdfsIndexInput may not fully read requested bytes. -- Key: SOLR-5150 URL: https://issues.apache.org/jira/browse/SOLR-5150 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: SOLR-5150.patch Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - the read call we are using may not read all of the requested bytes - it returns the number of bytes actually written - which we ignore. Blur moved to using a seek and then readFully call - synchronizing across the two calls to deal with clones. We have seen that really kills performance, and using the readFully call that lets you pass the position rather than first doing a seek, performs much better and does not require the synchronization. I also noticed that the seekInternal impl should not seek but be a no op since we are seeking on the read. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768442#comment-13768442 ] Elran Dvir commented on SOLR-5084: -- Hi all, Did anyone have a chance to examine the latest patch? Thanks. new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Erick Erickson Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity, Risk, etc.) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement, I have prototyped a new type of field: EnumField where the inputs are a closed predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
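For readers following along, the configuration is shaped roughly like currency.xml: a named, ordered list of values plus a field type that points at it. The snippet below is a hypothetical sketch of that idea, not necessarily the exact format of the attached enumsConfig.xml or schema_example.xml:

    <!-- enumsConfig.xml: document order defines the sort order, not lexicographic order -->
    <enumsConfig>
      <enum name="severity">
        <value>Low</value>
        <value>Medium</value>
        <value>High</value>
        <value>Critical</value>
      </enum>
    </enumsConfig>

    <!-- schema.xml: a field type backed by the enum definition above -->
    <fieldType name="severityType" class="solr.EnumField"
               enumsConfigFile="enumsConfig.xml" enumName="severity"/>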
RE: SimplePostToolTest very slow
: the main problem with any security manager is: To check if a connection : is allowed, it has to resolve DNS and look the IP up in the policy. Can we update the security policy to fail fast anytime a DNS lookup happens? even if it happens implicitly in situations like this (URL.hashCode) so we can more easily find problems like this via test Exceptions instead of via slow tests? (I'm not saying it's a good idea to do this -- i don't know -- it might make more trouble than it's worth ... i'm just trying to understand if it's possible) -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
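A rough sketch of what "fail fast on DNS" could look like (purely illustrative; this is not an existing Lucene test rule): SecurityManager.checkConnect is invoked with port == -1 when the caller is only trying to resolve a host name, so throwing there would surface implicit lookups such as URL.hashCode() as test exceptions.

    public class NoDnsSecurityManager extends SecurityManager {
      @Override
      public void checkConnect(String host, int port) {
        if (port == -1) {   // -1 means "resolve this host", not "connect to it"
          throw new SecurityException("implicit DNS lookup for host: " + host);
        }
        super.checkConnect(host, port);
      }
      @Override
      public void checkConnect(String host, int port, Object context) {
        if (port == -1) {
          throw new SecurityException("implicit DNS lookup for host: " + host);
        }
        super.checkConnect(host, port, context);
      }
    }

As the reply later in this thread points out, such a manager would also trip on the tests' legitimate lookup of their own hostname, which is a big part of why it may be more trouble than it's worth.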
[jira] [Commented] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768474#comment-13768474 ] ASF subversion and git services commented on SOLR-5241: --- Commit 1523725 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1523725 ] SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5241. Resolution: Fixed Fix Version/s: 4.6 5.0 bq. ... Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. ... we could, this was my point earlier when i asked rmuir why [ff01::114] was better - since we're never opening a socket i didn't understand the diff. now that i do understand the diff however, i definitely think [ff01::114] is better -- not because of anything in the test now, but because it helps protect us from the risk of someone working on the test in the future and accidentally changing something so that it *does* start trying to open sockets. so i've committed the most recent patch as is. Thanks everybody for your help. SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.6 Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
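To make the "it never opens a socket, it only looks up hostnames" point concrete: java.net.URL's equals()/hashCode() resolve the host, a bracketed address literal leaves nothing to resolve, and java.net.URI compares purely textually. The URLs below are illustrative:

    import java.net.URI;
    import java.net.URL;

    public class UrlHashCodeDemo {
      public static void main(String[] args) throws Exception {
        URL slow = new URL("http://example.com/update");     // hashCode()/equals() resolve the host, so a possible DNS stall
        URL safe = new URL("http://[ff01::114]/update");     // address literal: nothing to resolve
        URI text = URI.create("http://example.com/update");  // URI.hashCode() never resolves the host
        System.out.println(slow.hashCode() + " " + safe.hashCode() + " " + text.hashCode());
      }
    }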
Re: SimplePostToolTest very slow
Solr tests will all completely fail in that case then: just like they do when i run them on my laptop with internet disconnected. that's because it looks up its own hostname: which involves reverse/forward dns lookups. On Mon, Sep 16, 2013 at 1:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : the main problem with any security manager is: To check if a connection : is allowed, it has to resolve DNS and look the IP up in the policy. Can we update the security policy to fail fast anytime a DNS lookup happens? even if it happens implicitly in situations like this (URL.hashCode) so we can more easily find problems like this via test Exceptions instead of via slow tests? (I'm not saying it's a good idea to do this -- i don't know -- it might make more trouble than it's worth ... i'm just trying to understand if it's possible) -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
You guys got it, of course. :) I liked the sound of being able to detect how to pack things at run time and switch between multiple approaches over time or at least that's how I interpreted the announcement. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:29 AM, Adrien Grand jpou...@gmail.com wrote: Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
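A self-contained illustration of the GCD_COMPRESSED idea mentioned above (illustrative only, not the Lucene45 source): when every value can be written as base + gcd * x, it is enough to store base and gcd once and bit-pack the small x values.

    static long gcd(long a, long b) {
      return b == 0 ? a : gcd(b, a % b);
    }

    static void analyze(long[] values) {
      long base = Long.MAX_VALUE;
      for (long v : values) base = Math.min(base, v);
      long g = 0;
      for (long v : values) g = gcd(g, v - base);   // gcd of all deltas from the minimum
      // write base and g to the header once, then bit-pack (v - base) / g per value;
      // e.g. dates stored in millis with only second precision collapse to g = 1000
    }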
[jira] [Commented] (SOLR-5241) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com
[ https://issues.apache.org/jira/browse/SOLR-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768478#comment-13768478 ] ASF subversion and git services commented on SOLR-5241: --- Commit 1523726 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523726 ] SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups (merge r1523725) SimplePostToolTest is slow on some systems - likely due to hostname resolution of example.com --- Key: SOLR-5241 URL: https://issues.apache.org/jira/browse/SOLR-5241 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-5241.patch, SOLR-5241.patch As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran it from ant, but only takes 1 second in his IDE. The problem seems to be related to the URL class attempting to resolve example.com -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Parquet dictionary encoding bit packing
To some extent that already happens in a rough way in things like BlockPackedWriter (and also postings lists). For example these things encode blocks (e.g. 128 in the postings, maybe 1024 in docvalues, i forget), and if they encounter blocks of all the same value, they just write a bit marking that and encode the value once. On Mon, Sep 16, 2013 at 1:18 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: You guys got it, of course. :) I liked the sound of being able to detect how to pack things at run time and switch between multiple approaches over time or at least that's how I interpreted the announcement. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Sep 16, 2013 at 4:29 AM, Adrien Grand jpou...@gmail.com wrote: Thanks for pointing this out, Otis! I think the columnar nature of Parquet makes it more similar to doc values than to stored fields, and indeed, if you look at the parquet file-format specification [1], it is very similar to what we have for doc values [2]. In both cases, we have - dictionary encoding (PLAIN_DICTIONARY in parquet, TABLE_COMPRESSED in Lucene45DVF), - bit-packing (BIT_PACKED(/RLE) in parquet, DELTA_COMPRESSED in Lucene45DVF). Parquet also uses run-length encoding (RLE) which is unfortunately not doable for doc values since they need to support random access. Parquet's RLE compression is actually closer to what we have for postings lists (a postings list of X values is encoded as X/128 blocs of 128 packed values and X%128 RLE-encoded (VInt) values). On the other hand, doc values have GCD_COMPRESSED (which efficiently compresses any sequence of longs where all values can be expressed as a * x + b) which is typically useful for storing dates that don't have millisecond precision. About stored fields, it would indeed be possible to store all values of a given field in a column-stride fashion per chunk. However, I think parquet doesn't optimize for the same thing as stored fields: parquet needs to run computations on the values of a few fields of many documents (like doc values) while with stored fields, we usually need to get all values of a single document. This makes columnar storage a bit unconvenient for stored fields, although I think we could try it on our chunks of stored documents given that it may improve the compression ratio. I only have a very superficial understanding of parquet so if you know I said something which is wrong about parquet, please tell me! [1] https://github.com/parquet/parquet-format [2] https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesConsumer.java -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
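A small sketch of the block trick described above (illustrative, not the BlockPackedWriter source): if every value in a block is identical, write a marker and the value once instead of bit-packing the whole block.

    static void writeBlock(long[] block, java.io.DataOutput out) throws java.io.IOException {
      boolean allEqual = true;
      for (long v : block) {
        if (v != block[0]) { allEqual = false; break; }
      }
      if (allEqual) {
        out.writeByte(0);          // marker: the whole block is one repeated value
        out.writeLong(block[0]);   // the value is written exactly once
      } else {
        out.writeByte(1);          // marker: a bit-packed block follows
        // ... bit-pack 'block' using the minimal number of bits per value ...
      }
    }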
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768518#comment-13768518 ] Hoss Man commented on SOLR-2548: David: i'm not suggesting we rush this -- but if your changes aren't going to make it into 4.5, we should track them in a new issue that can have it's own record in CHANGES.txt so it's clear what versions of Solr have what version of the code. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #449: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/449/ 83 tests failed. FAILED: org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:13F466DD345B91A3]:0) at org.apache.lucene.expressions.TestDemoExpressions.doTestLotsOfBindings(TestDemoExpressions.java:174) at org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings(TestDemoExpressions.java:156) FAILED: org.apache.lucene.expressions.TestDemoExpressions.test Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:5DC4FDA4CB21DF11]:0) at org.apache.lucene.expressions.TestDemoExpressions.test(TestDemoExpressions.java:85) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testSortValues Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:F364EFF69B4C4982]:0) at org.apache.lucene.expressions.TestDemoExpressions.testSortValues(TestDemoExpressions.java:100) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:2DBE21DF8775F2C1]:0) at org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding(TestDemoExpressions.java:118) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D590C27E65DDB2E9:E49206CA6799B14F]:0) at org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression(TestDemoExpressions.java:136) FAILED: org.apache.lucene.expressions.TestExpressionSorts.testQueries Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([BD0F8AC4302D298F:E181461F2A449C21]:0) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:146) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:130) at org.apache.lucene.expressions.TestExpressionSorts.testQueries(TestExpressionSorts.java:101) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testValidExternals Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at 
__randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7A90B2D1B9FBFC41]:0) at org.apache.lucene.expressions.TestExpressionValidation.testValidExternals(TestExpressionValidation.java:33) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion3 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7DA332E3CD41F10]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion3(TestExpressionValidation.java:103) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([DDCBCADBC5FF7283:7AB01C912ED84010]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2(TestExpressionValidation.java:90) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion Error Message: Could not
[jira] [Commented] (SOLR-2548) Multithreaded faceting
[ https://issues.apache.org/jira/browse/SOLR-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768528#comment-13768528 ] David Smiley commented on SOLR-2548: I thought about that. I figure that if I'm cautious about this such as by committing to trunk first, as I did, then there shouldn't be consternation about porting this to branch_45. Besides, I have more confidence in understanding the code that I committed vs. what it replaced. But I take your point that *if* for some reason it doesn't go to v4.5 then, sure, use another issue. Multithreaded faceting -- Key: SOLR-2548 URL: https://issues.apache.org/jira/browse/SOLR-2548 Project: Solr Issue Type: Improvement Components: search Affects Versions: 3.1 Reporter: Janne Majaranta Assignee: Erick Erickson Priority: Minor Labels: facet Fix For: 4.5, 5.0 Attachments: SOLR-2548_4.2.1.patch, SOLR-2548_for_31x.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548_multithreaded_faceting,_dsmiley.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch, SOLR-2548.patch Add multithreading support for faceting. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768629#comment-13768629 ] wolfgang hoschek commented on SOLR-1301: cdk-morphlines-solr-core and cdk-morphlines-solr-cell should remain separate and be available through separate maven modules so that clients such as Flume Solr Sink and Hbase Indexer can continue to choose to depend (or not depend) on them. For example, not everyone wants tika and it's dependency chain. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
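A heavily simplified sketch of the design described above, for orientation only: a Hadoop RecordWriter that converts each (key, value) pair into a SolrInputDocument, batches adds to an EmbeddedSolrServer, and commits when the reduce task closes. The field names, batch size, and Text/Text key types are illustrative assumptions, not the patch's actual code:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    class SolrRecordWriterSketch extends RecordWriter<Text, Text> {
      private final EmbeddedSolrServer solr;                // one embedded core per reduce task
      private final List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      private static final int BATCH_SIZE = 1000;           // illustrative batching threshold

      SolrRecordWriterSketch(EmbeddedSolrServer solr) {
        this.solr = solr;
      }

      @Override
      public void write(Text key, Text value) throws IOException {
        SolrInputDocument doc = new SolrInputDocument();    // "converter" step: (key, value) becomes a document
        doc.addField("id", key.toString());
        doc.addField("text", value.toString());
        batch.add(doc);
        if (batch.size() >= BATCH_SIZE) {
          flush();
        }
      }

      private void flush() throws IOException {
        try {
          solr.add(batch);                                  // periodic batched add, no network hop
          batch.clear();
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
      }

      @Override
      public void close(TaskAttemptContext context) throws IOException {
        try {
          flush();
          solr.commit();                                    // commit once, when the reduce task finishes
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
      }
    }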
[jira] [Commented] (SOLR-5234) Allow SolrResourceLoader to load resources from URLs
[ https://issues.apache.org/jira/browse/SOLR-5234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768616#comment-13768616 ] Markus Jelsma commented on SOLR-5234: - So I assume this will provide similar functionality to SOLR-5234 when in cloud mode? Allow SolrResourceLoader to load resources from URLs Key: SOLR-5234 URL: https://issues.apache.org/jira/browse/SOLR-5234 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-5234.patch, SOLR-5234.patch This would allow multiple Solr instances to share large configuration files. It would also help resolve problems caused by attempting to store 1Mb files in zookeeper. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768661#comment-13768661 ] Erick Erickson commented on SOLR-5084: -- I have a flight coming up, I'll see if I can give it a look-see. new field type - EnumField -- Key: SOLR-5084 URL: https://issues.apache.org/jira/browse/SOLR-5084 Project: Solr Issue Type: New Feature Reporter: Elran Dvir Assignee: Erick Erickson Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch, Solr-5084.trunk.patch, Solr-5084.trunk.patch We have encountered a use case in our system where we have a few fields (Severity. Risk etc) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically this is very close to how enums work. To implement, I have prototyped a new type of field: EnumField where the inputs are a closed predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768662#comment-13768662 ] wolfgang hoschek commented on SOLR-1301: Seems like the patch still misses tika-xmp. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5214) Add new FreeTextSuggester, to handle long tail suggestions
[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768737#comment-13768737 ] Dawid Weiss commented on LUCENE-5214: - I looked through the patch but I didn't get it, too late ;) I'll give it another shot later. Anyway, the idea is very interesting though -- I wonder how much left-context (regardless of this implementation) one needs for the right prediction (reminds me of Markov chains and generative poetry :) Add new FreeTextSuggester, to handle long tail suggestions Key: LUCENE-5214 URL: https://issues.apache.org/jira/browse/LUCENE-5214 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.6 Attachments: LUCENE-5214.patch The current suggesters are all based on a finite space of possible suggestions, i.e. the ones they were built on, so they can only suggest a full suggestion from that space. This means if the current query goes outside of that space then no suggestions will be found. The goal of FreeTextSuggester is to address this, by giving predictions based on an ngram language model, i.e. using the last few tokens from the user's query to predict likely following token. I got the idea from this blog post about Google's suggest: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html This is very much still a work in progress, but it seems to be working. I've tested it on the AOL query logs, using an interactive tool from luceneutil to show the suggestions, and it seems to work well. It's fun to use that tool to explore the word associations... I don't think this suggester would be used standalone; rather, I think it'd be a fallback for times when the primary suggester fails to find anything. You can see this behavior on google.com, if you type the fast and the , you see entire queries being suggested, but then if the next word you type is burning then suddenly you see the suggestions are only based on the last word, not the entire query. It uses ShingleFilter under-the-hood to generate the token ngrams; once LUCENE-5180 is in it will be able to properly handle a user query that ends with stop-words (e.g. wizard of ), and then stores the ngrams in an FST. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
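To make the "fall back to fewer context words" behaviour concrete, here is a toy, self-contained sketch of ngram backoff. It is not the FreeTextSuggester API; the counts map and token handling are illustrative:

    import java.util.*;
    import java.util.stream.Collectors;

    class NgramBackoffSketch {
      // context ("the fast and") -> next token -> observed count
      final Map<String, Map<String, Long>> counts = new HashMap<String, Map<String, Long>>();

      List<String> suggest(List<String> prev, int maxContext) {
        // try the longest available context first, then back off one token at a time
        for (int n = Math.min(maxContext, prev.size()); n >= 1; n--) {
          String context = String.join(" ", prev.subList(prev.size() - n, prev.size()));
          Map<String, Long> followers = counts.get(context);
          if (followers != null && !followers.isEmpty()) {
            return followers.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
          }
        }
        return Collections.emptyList();   // nothing matched even a single-token context
      }
    }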
[jira] [Updated] (LUCENE-5215) Add support for FieldInfos generation
[ https://issues.apache.org/jira/browse/LUCENE-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5215: --- Attachment: LUCENE-5215.patch Patch adds FieldInfos generation: * SegmentInfoPerCommit manages fieldInfosGen; SegmentInfos read/write it, like delGen. ** Updated SegmentInfos format jdocs * ReaderAndLiveDocs writes a new FIS generation if there are DV updates, also updates existing FIs dvGen. ** We now support updating documents in segments where the field wasn't indexed (sparse DV). * New Lucene46Codec and Lucene46FieldInfosFormat for writing the dvGen per field in the fnm file. ** Updated package.html ** Updated FieldInfosFormat jdocs ** Deprecated Lucene45Codec, moved Lucene42FieldInfosWriter to test-framework, added Lucene45RWCodec * Added a static utility method to SegmentReader to readFieldInfos from SIPC, since a couple of places in the code needed to execute same logic. * Added segmentSuffix to FieldsReader/Writer. Most of the changes in the patch are due to the new Lucene46Codec. I couldn't test FIS.gen without making all the other changes (Lucene45Codec deprecation etc.) because I didn't feel running tests with e.g. -Dtests.codec=Lucene46 is enough. So the patch is big, but if you want to review the FIS.gen changes, you should look at: Lucene46Codec, Lucene46FieldInfosFormat, ReaderAndLiveDocs, SIPC, SIS. Core tests pass, so I think it's ready for a review. Also, do I understand correctly that a 4.5 index for TestBackcompat will be created when we release 4.6 (if that issue makes it to 4.6)? Add support for FieldInfos generation - Key: LUCENE-5215 URL: https://issues.apache.org/jira/browse/LUCENE-5215 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5215.patch In LUCENE-5189 we've identified few reasons to do that: # If you want to update docs' values of field 'foo', where 'foo' exists in the index, but not in a specific segment (sparse DV), we cannot allow that and have to throw a late UOE. If we could rewrite FieldInfos (with generation), this would be possible since we'd also write a new generation of FIS. # When we apply NDV updates, we call DVF.fieldsConsumer. Currently the consumer isn't allowed to change FI.attributes because we cannot modify the existing FIS. This is implicit however, and we silently ignore any modified attributes. FieldInfos.gen will allow that too. The idea is to add to SIPC fieldInfosGen, add to each FieldInfo a dvGen and add support for FIS generation in FieldInfosFormat, SegReader etc., like we now do for DocValues. I'll work on a patch. Also on LUCENE-5189, Rob raised a concern about SegmentInfo.attributes that have same limitation -- if a Codec modifies them, they are silently being ignored, since we don't gen the .si files. I think we can easily solve that by recording SI.attributes in SegmentInfos, so they are recorded per-commit. But I think it should be handled in a separate issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #972: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/972/ 83 tests failed. FAILED: org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:E0EB452872B80D84]:0) at org.apache.lucene.expressions.TestDemoExpressions.testExpressionRefersToExpression(TestDemoExpressions.java:134) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testSortValues Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:F71DAC148E6DF549]:0) at org.apache.lucene.expressions.TestDemoExpressions.testSortValues(TestDemoExpressions.java:98) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:29C7623D92544E0A]:0) at org.apache.lucene.expressions.TestDemoExpressions.testTwoOfSameBinding(TestDemoExpressions.java:116) FAILED: org.apache.lucene.expressions.TestDemoExpressions.test Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:59BDBE46DE0063DA]:0) at org.apache.lucene.expressions.TestDemoExpressions.test(TestDemoExpressions.java:83) FAILED: org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([D1E9819C70FC0E22:178D253F217A2D68]:0) at org.apache.lucene.expressions.TestDemoExpressions.doTestLotsOfBindings(TestDemoExpressions.java:172) at org.apache.lucene.expressions.TestDemoExpressions.testLotsOfBindings(TestDemoExpressions.java:154) FAILED: org.apache.lucene.expressions.TestExpressionSorts.testQueries Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([A405898F38B743F3:F88B455422DEF65D]:0) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:144) at org.apache.lucene.expressions.TestExpressionSorts.assertQuery(TestExpressionSorts.java:128) at org.apache.lucene.expressions.TestExpressionSorts.testQueries(TestExpressionSorts.java:99) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at 
__randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:B94A06B61EEAFE2E]:0) at org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal2(TestExpressionValidation.java:56) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2 Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:49B93C7AEB2A3FC2]:0) at org.apache.lucene.expressions.TestExpressionValidation.testCoRecursion2(TestExpressionValidation.java:90) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal Error Message: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler Stack Trace: java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.expressions.js.JavascriptCompiler at __randomizedtesting.SeedInfo.seed([EEC2EA3D0D51:B9449032CFC5C3D]:0) at org.apache.lucene.expressions.TestExpressionValidation.testInvalidExternal(TestExpressionValidation.java:44) FAILED: org.apache.lucene.expressions.TestExpressionValidation.testSelfRecursion Error
[jira] [Updated] (SOLR-4221) Custom sharding
[ https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4221: - Fix Version/s: 5.0 4.5 Custom sharding --- Key: SOLR-4221 URL: https://issues.apache.org/jira/browse/SOLR-4221 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Assignee: Noble Paul Fix For: 4.5, 5.0 Attachments: SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch, SOLR-4221.patch Features to let users control everything about sharding/routing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1301: -- Attachment: SOLR-1301.patch This is likely the last patch I'll put up for a bit - I'm on vacation from Wed-Mon. Patch Notes: ant precommit passes again. I've fixed the forbidden api calls and a couple minor javadoc issues in the new morphlines code. Also fixed a more problematic javadocs issue due to broken links from the morphlines code to extraction code due to extending extraction classes. I've added tika-xmp to the extraction dependencies. I don't like that tests can pass when some necessary run-time jars are missing - we will likely need to look into adding simple tests that cause each necessary jar to be used - or even just have hack tests that try and create a class in the offending jars or something. I'll save that for a follow up issue though - the solr cell morphlines tests actually upped the number of dependencies tests hit quite a bit at least. There is also a test speed issue that is not on the critical path - on my fast machine that does 8 tests in parallel, this adds about 4-5 minutes to the tests. It would be good to try and minimize some of the longer tests for std runs, and keep them as is for @nightly runs. That can wait post commit though. That leaves the following 2 critical path items to deal with: * Get the tests to run without a hacked test.policy file. * Dist packaging. This includes things like creation of the final MapReduceIndexerTool jar file and dealing with it's dependencies, as well as the location of the morphlines code and how it is distributed. Other than that we are looking pretty good - all tests passing and precommit passing. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 4.5, 5.0 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. 
This data is then added to a batch, which is periodically submitted to the EmbeddedSolrServer. When a reduce task completes and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue; you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache
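To make the design above concrete, here is a minimal sketch of the kind of converter SolrRecordWriter delegates to. The class name, method signature, and field names are illustrative assumptions for a CSV-style input, not the SolrDocumentConverter interface actually defined in the patch; it only shows the step that turns a Hadoop (key, value) pair into a SolrInputDocument.
{noformat}
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.solr.common.SolrInputDocument;

/** Hypothetical converter: one CSV line (keyed by its file offset) becomes one document. */
public class CsvLineConverter {
  public SolrInputDocument convert(LongWritable key, Text value) {
    String[] cols = value.toString().split(",");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", key.get());                          // unique key per input record
    doc.addField("title", cols.length > 0 ? cols[0] : "");  // assumed schema fields
    doc.addField("body", cols.length > 1 ? cols[1] : "");
    return doc;
  }
}
{noformat}
Per the description above, SolrRecordWriter would invoke such a converter for every reduce output pair, buffer the resulting documents into a batch, and periodically hand the batch to the EmbeddedSolrServer.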
Re: Can we use TREC data set in open source?
Inline below On Sep 9, 2013, at 10:53 PM, Han Jiang jiangha...@gmail.com wrote: Back in 2007 Grant contacted NIST about making the TREC collection available to our community: http://mail-archives.apache.org/mod_mbox/lucene-dev/200708.mbox/browser I think another attempt at this is really important to our project and to people who use Lucene. All these years, speed has mainly been tuned on Wikipedia, but Wikipedia is not a very 'standard' benchmark: * it doesn't represent how real-world search works; * it cannot be used to evaluate the relevance of our scoring models; * researchers tend to do experiments on other data sets, and it is usually hard to know whether Lucene is performing at its best. And personally I agree with this line: I think it would encourage Lucene users/developers to think about relevance as much as we think about speed. There's been much work to make Lucene's scoring models pluggable in 4.0, and it would be great to explore this further. It is very appealing to see a high-performance library work alongside state-of-the-art ranking methods. As for the TREC data sets, the problems we have met are: 1. NIST/TREC does not own the original collections, so it might be necessary to contact the organizations that do, such as: http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html http://lemurproject.org/clueweb12/ 2. Currently there is no open-source license for any of the data sets, so they won't be as 'open' as Wikipedia is. As Grant proposed, one possibility is to make a data set accessible only to committers instead of all users. That is not very open source, but the TREC data sets are public and usually available to researchers, so people can still reproduce performance tests. I'm quite curious: has anyone explored getting an open-source license for one of those data sets? And is our community still interested in this issue after all these years? It continues to be of interest to me. I've had various conversations throughout the years on it. Most people like the idea, but are not sure how to distribute it in an open way (ClueWeb currently ships as four 1TB disks) and I am also not sure how they would handle any copyright/redaction claims against it. There is, of course, little incentive for those involved to solve these problems, either, as most people who are interested sign the form and pay the $600 for the disks. I've had a number of conversations about how I view this as a significant barrier to open research, especially in under-served countries, and to open source. People sympathize with me, but then move on. To this day, I think the only way it will happen is for the community to build a completely open system, perhaps based on Common Crawl or our own crawl, host it ourselves, and develop judgments, etc. We tried to get this off the ground with the Open Relevance Project, but there was never a sustainable effort, so I have little hope for it at this point (but I would love to be proven wrong). For it to succeed, I think we would need the backing of a university with students interested in curating such a collection, the judgments, etc. I think we could figure out how to distribute the data either as an AWS public data set or possibly via the ASF or similar (although I am pretty sure the ASF would balk at multi-TB downloads). Happy to hear other ideas. Grant Ingersoll | @gsingers http://www.lucidworks.com
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768967#comment-13768967 ] Robert Muir commented on LUCENE-5212: - https://bugs.openjdk.java.net/browse/JDK-8024830 java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: hs_err_pid32714.log, jenkins.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie commented on SOLR-4470: - Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all Solr resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? Here are the changes I made in Jetty to enable basic auth: etc/webdefault.xml (perhaps protecting everything is overly general?): <security-constraint> <web-resource-collection> <web-resource-name>Solr authenticated application</web-resource-name> <url-pattern>/</url-pattern> </web-resource-collection> <auth-constraint> <role-name>access-role</role-name> </auth-constraint> </security-constraint> <login-config> <auth-method>BASIC</auth-method> <realm-name>Access Realm</realm-name> </login-config> etc/jetty.xml: <Call name="addBean"> <Arg> <New class="org.eclipse.jetty.security.HashLoginService"> <Set name="name">Access Realm</Set> <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set> <Set name="refreshInterval">0</Set> </New> </Arg> </Call> etc/realm.properties (redacted for obvious reasons :)) user: password, access-role And the changes to Solr-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml (on host1, same as above w/o the -DzkHost param) And then the error I'm getting (on host2, the second node, only; host1, the leader, is fine): INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover.
core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.5, 5.0 Attachments: SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, SOLR-4470.patch, SOLR-4470.patch We want to protect any HTTP resource (URL). We want to require credentials no matter what kind of HTTP request you make to a Solr node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes also make internal requests to other Solr nodes, and for those to work credentials need to be provided there as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing) We would like to aim at a solution where the original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to a configured internal
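For anyone reproducing the Jetty setup above, the following is a minimal SolrJ sketch (assuming Solr 4.x) of how an external client can present basic-auth credentials to a protected node through HttpClientUtil. It is not the internal credential-forwarding mechanism this patch adds - the quoted ctl.sh relies on the patch's internalAuthCredentials* system properties for that - and the host, core name, and credentials are placeholders.
{noformat}
import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class BasicAuthClientExample {
  public static void main(String[] args) throws Exception {
    // Build an HttpClient that sends basic-auth credentials with every request.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set(HttpClientUtil.PROP_BASIC_AUTH_USER, "user");
    params.set(HttpClientUtil.PROP_BASIC_AUTH_PASS, "password");
    HttpClient httpClient = HttpClientUtil.createClient(params);

    // Without the credentials, the Jetty config above answers with 401 Unauthorized.
    HttpSolrServer server = new HttpSolrServer("http://host1:8983/solr/collection1", httpClient);
    System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
    server.shutdown();
  }
}
{noformat}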
[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie edited comment on SOLR-4470 at 9/17/13 12:02 AM: --- Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config h6. etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call h6. etc/realm.properties (redacted for obvious reasons :)) user: password, access-role h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) was (Author: sapphiremirage): Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. 
Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? Here are the changes I made in Jetty to enable basic auth: etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call etc/realm.properties (redacted for obvious reasons :)) user: password, access-role And the changes to SOLR-related things: scripts/ctl.sh (on host2): SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf
[jira] [Comment Edited] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768980#comment-13768980 ] Sea Marie edited comment on SOLR-4470 at 9/17/13 12:04 AM: --- Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): {noformat} security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config {noformat} h6. etc/jetty.xml: {noformat} Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call {noformat} h6. etc/realm.properties (redacted for obvious reasons :)) {noformat} user: password, access-role {noformat} h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): {noformat} SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml {noformat} (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): {noformat} INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) {noformat} was (Author: sapphiremirage): Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. 
After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config h6. etc/jetty.xml: Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call h6. etc/realm.properties (redacted for obvious reasons :)) user: password, access-role h5. Changes to
[jira] [Updated] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-5240: --- Attachment: SOLR-5240.patch Here's the simplest patch that fixes it - removing any executor thread limit when in ZK mode. Note that this deadlock-until-timeout situation can also easily happen even when replicas of a particular shard aren't on the same node. All that is required is to have more than 3 cores per node. SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
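To see why a bounded core-loading executor can wedge a node, here is a self-contained java.util.concurrent sketch - not the actual CoreContainer code, and the pool size and task roles are made up. Three "cores" block waiting for a fourth that never gets a thread, so nothing finishes until a timeout; an unlimited pool, as the patch uses for ZK mode, lets everything load.
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoreLoadDeadlockSketch {
  public static void main(String[] args) throws Exception {
    // Bounded pool of 3 loader threads; try Executors.newCachedThreadPool() (no limit) instead.
    final ExecutorService loaders = Executors.newFixedThreadPool(3);
    final CountDownLatch fourthCoreUp = new CountDownLatch(1);

    Runnable waitingCore = new Runnable() {
      public void run() {
        try {
          fourthCoreUp.await();   // like a core waiting to see other replicas come up
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    };
    for (int i = 0; i < 3; i++) {
      loaders.submit(waitingCore);      // these occupy every available thread
    }
    loaders.submit(new Runnable() {     // the core the others are waiting for:
      public void run() {               // it sits in the queue and never runs
        fourthCoreUp.countDown();
      }
    });

    loaders.shutdown();
    // Prints false with the bounded pool (deadlock until timeout), true with an unlimited pool.
    System.out.println("all cores loaded: " + loaders.awaitTermination(2, TimeUnit.SECONDS));
    loaders.shutdownNow();              // interrupt the stuck waiters so the JVM can exit
  }
}
{noformat}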
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769015#comment-13769015 ] Mark Miller commented on SOLR-5240: --- +1 - any other fix seems somewhat complicated. SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sea Marie updated SOLR-4470: Comment: was deleted (was: Hi, I am running a simple two-node SolrCloud cluster with this patch (pulled from Jan's GitHub and built from there) using the built-in ZooKeeper and Jetty. I made a few small changes to the Jetty configs to restrict access via basic auth on all SOLR resources. After rebooting with these changes, the SolrCore on my second node is not coming up - it seems like the credentials are not being used in the core recovery code, or not being passed to ZooKeeper, or something. Have I missed some configuration step? Or am I confused and this scenario is not supported by this patch? h5. Changes I made in Jetty to enable basic auth: h6. etc/webdefault.xml (perhaps protecting everything is overly general?): {noformat} security-constraint web-resource-collection web-resource-nameSolr authenticated application/web-resource-name url-pattern//url-pattern /web-resource-collection auth-constraint role-nameaccess-role/role-name /auth-constraint /security-constraint login-config auth-methodBASIC/auth-method realm-nameAccess Realm/realm-name /login-config {noformat} h6. etc/jetty.xml: {noformat} Call name=addBean Arg New class=org.eclipse.jetty.security.HashLoginService Set name=nameAccess Realm/Set Set name=configSystemProperty name=jetty.home default=.//etc/realm.properties/Set Set name=refreshInterval0/Set /New /Arg /Call {noformat} h6. etc/realm.properties (redacted for obvious reasons :)) {noformat} user: password, access-role {noformat} h5. Changes to SOLR-related things: scripts/ctl.sh (on host2): {noformat} SOLR=$JAVABIN -Dbootstrap_confdir=$SOLR_HOME/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=host1:9983 -Dsolr.solr.home=$SOLR_HOME -Djetty.logs=$INSTALL_PATH/logs/ -Djetty.home=$INSTALL_PATH/ -jar -DinternalAuthCredentialsBasicAuthUsername=user -DinternalAuthCredentialsBasicAuthPassword=password $INSTALL_PATH/start.jar $INSTALL_PATH/etc/jetty.xml {noformat} (on host1, same as above w/o the -DzkHost param) h5. The error I'm seeing (on host2, the second node, only. host1, the leader is fine): {noformat} INFO - 2013-09-16 23:36:58.409; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false ERROR - 2013-09-16 23:36:58.433; org.apache.solr.common.SolrException; Error while trying to recover. 
core=collection1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://host1:8983/solr returned non ok status:401, message:Unauthorized at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) {noformat} ) Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.5, 5.0 Attachments: SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch, SOLR-4470.patch, SOLR-4470.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can faily easy be acheived as described on http://wiki.apache.org/solr/SolrSecurity. This problem is that Solr-nodes also make internal request to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers. E.g. for search and update request. But there are also internal requests * that only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have relation to an outside super-request (e.g. replica synching stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769099#comment-13769099 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523871 from [~yo...@apache.org] in branch 'dev/trunk' [ https://svn.apache.org/r1523871 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769096#comment-13769096 ] Robert Muir commented on LUCENE-5212: - I crashed again but with a core file (set 'ulimit -c unlimited'). zip file with core dump and hs_err is here: http://people.apache.org/~rmuir/crash.zip (its too large for JIRA, sorry) For some more context, it always happens fairly early in the test run: so when it doesnt crash at this exact point, you can ^C and run again until it does. Here was my commands with output below: (i tried to simplify the procedure to make it easy to reproduce, but its not easy, it took me quite a few tries) {noformat} # note: we are pulling the exact revision that jenkins failed on, because things have changed in lucene codebase over the weekend svn co -r 1523179 https://svn.apache.org/repos/asf/lucene/dev/trunk # just go to core tests cd trunk/lucene/core # # now the following two commands: just run again and again until it crashes. # rm -rf ../../.caches/ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC {noformat} Here was the output: {noformat} rmuir@beast:~/workspace/trunk/lucene/core$ rm -rf ../../.caches/ rmuir@beast:~/workspace/trunk/lucene/core$ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC Buildfile: /home/rmuir/workspace/trunk/lucene/core/build.xml -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ :: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-core: compile-test-framework: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-lucene-core: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: compile-codecs: [echo] Building codecs... ivy-availability-check: [echo] Building codecs... ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: common.init: compile-lucene-core: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: common.compile-core: compile-core: compile-test: install-junit4-taskdef: validate: test: [junit4:pickseed] Seed property 'tests.seed' already defined: 43A1116E7F98BED4 [mkdir] Created dir: /home/rmuir/workspace/trunk/.caches/test-stats/core [junit4] JUnit4 says ciao! Master seed: 43A1116E7F98BED4 [junit4] Executing 367 suites with 1 JVM. [junit4] [junit4] Started J0 PID(26780@beast). 
[junit4] Suite: org.apache.lucene.store.TestHugeRamFile [junit4] Completed in 1.26s, 1 test [junit4] [junit4] Suite: org.apache.lucene.search.TestTimeLimitingCollector [junit4] Completed in 3.26s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping [junit4] Completed in 0.68s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestPostingsOffsets [junit4] Completed in 0.78s, 11 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestRegexpQuery [junit4] Completed in 0.11s, 7 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestTryDelete [junit4] Completed in 0.03s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.util.TestDoubleBarrelLRUCache [junit4] Completed in 1.02s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.analysis.TestGraphTokenizers [junit4] Completed in 3.01s, 21 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestIndexWriterMerging [junit4] Completed in 10.21s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestSearchAfter [junit4] Completed in 2.05s, 1 test [junit4] [junit4] Suite: org.apache.lucene.index.TestNoMergeScheduler [junit4] Completed in 0.02s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.TestSearchForDuplicates [junit4] Completed in 0.08s, 1 test [junit4] [junit4] Suite: org.apache.lucene.util.TestBytesRef [junit4] Completed in 0.02s, 5 tests [junit4] [junit4] Suite: org.apache.lucene.index.Test4GBStoredFields [junit4] IGNOR/A 0.02s | Test4GBStoredFields.test [junit4] Assumption #1: 'nightly' test group is disabled (@Nightly) [junit4] Completed in 0.03s, 1 test, 1 skipped [junit4] [junit4] Suite:
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769105#comment-13769105 ] Robert Muir commented on LUCENE-5212: - I ran again and here is the output where it does not crash, but instead corrupts: {noformat} rmuir@beast:~/workspace/trunk/lucene/core$ rm -rf ../../.caches/ rmuir@beast:~/workspace/trunk/lucene/core$ ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs=-XX:-UseCompressedOops -XX:+UseParallelGC Buildfile: /home/rmuir/workspace/trunk/lucene/core/build.xml -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ :: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-core: compile-test-framework: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: compile-lucene-core: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: compile-codecs: [echo] Building codecs... ivy-availability-check: [echo] Building codecs... ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = /home/rmuir/workspace/trunk/lucene/ivy-settings.xml resolve: common.init: compile-lucene-core: init: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: compile-core: -clover.disable: -clover.load: -clover.classpath: -clover.setup: clover: common.compile-core: compile-core: compile-test: install-junit4-taskdef: validate: test: [junit4:pickseed] Seed property 'tests.seed' already defined: 43A1116E7F98BED4 [mkdir] Created dir: /home/rmuir/workspace/trunk/.caches/test-stats/core [junit4] JUnit4 says ciao! Master seed: 43A1116E7F98BED4 [junit4] Executing 367 suites with 1 JVM. [junit4] [junit4] Started J0 PID(27313@beast). 
[junit4] Suite: org.apache.lucene.store.TestHugeRamFile [junit4] Completed in 1.55s, 1 test [junit4] [junit4] Suite: org.apache.lucene.search.TestTimeLimitingCollector [junit4] Completed in 3.23s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping [junit4] Completed in 0.57s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestPostingsOffsets [junit4] Completed in 0.78s, 11 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestRegexpQuery [junit4] Completed in 0.11s, 7 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestTryDelete [junit4] Completed in 0.04s, 3 tests [junit4] [junit4] Suite: org.apache.lucene.util.TestDoubleBarrelLRUCache [junit4] Completed in 1.02s, 2 tests [junit4] [junit4] Suite: org.apache.lucene.analysis.TestGraphTokenizers [junit4] Completed in 3.05s, 21 tests [junit4] [junit4] Suite: org.apache.lucene.index.TestIndexWriterMerging [junit4] Completed in 10.27s, 6 tests [junit4] [junit4] Suite: org.apache.lucene.search.TestSearchAfter [junit4] 1 CheckIndex failed [junit4] 1 Segments file=segments_2 numSegments=2 version=5.0 format= [junit4] 1 1 of 2: name=_0 docCount=156 [junit4] 1 codec=Lucene45 [junit4] 1 compound=false [junit4] 1 numFiles=30 [junit4] 1 size (MB)=0.157 [junit4] 1 diagnostics = {timestamp=1379384646861, os=Linux, os.version=3.5.0-27-generic, source=flush, lucene.version=5.0-SNAPSHOT, os.arch=amd64, java.version=1.7.0_40, java.vendor=Oracle Corporation} [junit4] 1 no deletions [junit4] 1 test: open reader.OK [junit4] 1 test: fields..OK [15 fields] [junit4] 1 test: field norms.OK [2 fields] [junit4] 1 test: terms, freq, prox...OK [8628 terms; 10659 terms/docs pairs; 423 tokens] [junit4] 1 test: stored fields...OK [1 total field count; avg 0.006 fields per doc] [junit4] 1 test: term vectorsOK [3 total vector count; avg 1 term/freq vector fields per doc] [junit4] 1 test: docvalues...OK [5 docvalues fields; 1 BINARY; 2 NUMERIC; 2 SORTED; 0 SORTED_SET] [junit4] 1 [junit4] 1 2 of 2: name=_1 docCount=57 [junit4] 1 codec=Lucene45 [junit4] 1 compound=false [junit4] 1 numFiles=30 [junit4] 1 size (MB)=0.06 [junit4] 1 diagnostics = {timestamp=1379384646900, os=Linux, os.version=3.5.0-27-generic, source=flush, lucene.version=5.0-SNAPSHOT, os.arch=amd64, java.version=1.7.0_40, java.vendor=Oracle Corporation} [junit4] 1 no
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769109#comment-13769109 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523872 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523872 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769110#comment-13769110 ] ASF subversion and git services commented on SOLR-5240: --- Commit 1523873 from [~yo...@apache.org] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1523873 ] SOLR-5240: unlimited core loading threads to fix waiting-for-other-replicas deadlock SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769112#comment-13769112 ] Robert Muir commented on LUCENE-5212: - FYI: I ran this procedure about 10 times with the suggested workaround from https://bugs.openjdk.java.net/browse/JDK-8024830 and tests always pass: -XX:-UseLoopPredicate java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: hs_err_pid32714.log, jenkins.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
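Presumably the workaround was applied by appending the flag to the -Dargs string used in the reproduction runs above; the exact invocation below is an assumption, with quoting added so the shell passes all three JVM flags through:
{noformat}
ant test -Dtests.seed=43A1116E7F98BED4 -Dtests.jvms=1 -Dargs="-XX:-UseCompressedOops -XX:+UseParallelGC -XX:-UseLoopPredicate"
{noformat}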
[jira] [Resolved] (SOLR-5240) SolrCloud node doesn't (quickly) come all the way back
[ https://issues.apache.org/jira/browse/SOLR-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5240. Resolution: Fixed Fix Version/s: 5.0 SolrCloud node doesn't (quickly) come all the way back -- Key: SOLR-5240 URL: https://issues.apache.org/jira/browse/SOLR-5240 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.5 Reporter: Yonik Seeley Fix For: 4.5, 5.0 Attachments: SOLR-5240.patch Killing a single node and bringing it back up can result in waiting until we see more replicas up... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5219) Make SynonymFilterFactory format attribute pluggable
Ryan Ernst created LUCENE-5219: -- Summary: Make SynonymFilterFactory format attribute pluggable Key: LUCENE-5219 URL: https://issues.apache.org/jira/browse/LUCENE-5219 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan Ernst It would be great to allow custom synonym formats to work with SynonymFilterFactory. There is already a comment in the code to make it pluggable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
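As a rough illustration of what a pluggable format would plug in, here is a sketch that builds a SynonymMap from a made-up tab-separated format using the existing SynonymMap.Builder API (Lucene 4.x-era signatures). The format, class name, and the exact hook SynonymFilterFactory would expose are assumptions; the issue itself only proposes making the format attribute pluggable.
{noformat}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

/** Hypothetical format: each line is "input<TAB>output"; whitespace separates multi-word terms. */
public class TabSynonymLoader {
  public static SynonymMap load(Reader in) throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);   // dedup duplicate rules
    BufferedReader reader = new BufferedReader(in);
    String line;
    while ((line = reader.readLine()) != null) {
      String[] pair = line.split("\t");
      if (pair.length != 2) continue;                            // skip malformed lines
      CharsRef input = SynonymMap.Builder.join(pair[0].split("\\s+"), new CharsRef());
      CharsRef output = SynonymMap.Builder.join(pair[1].split("\\s+"), new CharsRef());
      builder.add(input, output, true);                          // keep the original term too
    }
    return builder.build();
  }
}
{noformat}
With the format attribute pluggable, SynonymFilterFactory could presumably point at a custom parser like this instead of being limited to the built-in solr and wordnet formats.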