[JENKINS] Lucene-Solr-4.7-Linux (64bit/jdk1.8.0_20-ea-b05) - Build # 103 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.7-Linux/103/ Java: 64bit/jdk1.8.0_20-ea-b05 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:57828 within 3 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:57828 within 3 ms at __randomizedtesting.SeedInfo.seed([3EEE89146E3E2924:BF08070C19614918]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:148) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:99) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:94) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:85) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:200) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at
[jira] [Commented] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967752#comment-13967752 ] Christian Ziech commented on LUCENE-5584: - {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of an FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a Lucene Automaton with an FST<BytesRef> much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large, tens or even hundreds of thousands of temporary byte array objects are created. One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method, passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
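[Editor's note] A minimal sketch of what a reuse-aware decode along the lines of the proposed read(in, reuse) could look like for a byte-sequence output. This is an illustration only: the length-prefixed layout, the growth strategy, and the method placement are assumptions, not the actual ByteSequenceOutputs code.

{code}
// Illustrative only: fill the caller's BytesRef instead of allocating a new
// byte[] per arc. Uses org.apache.lucene.store.DataInput and
// org.apache.lucene.util.{BytesRef, ArrayUtil}.
public static BytesRef read(DataInput in, BytesRef reuse) throws IOException {
  final int len = in.readVInt();
  if (reuse == null) {
    reuse = new BytesRef(len);
  } else if (reuse.bytes.length < len) {
    reuse.bytes = ArrayUtil.grow(reuse.bytes, len);
  }
  in.readBytes(reuse.bytes, 0, len);
  reuse.offset = 0;
  reuse.length = len;
  return reuse; // no per-arc allocation once the buffer is large enough
}
{code}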
[jira] [Comment Edited] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967752#comment-13967752 ] Christian Ziech edited comment on LUCENE-5584 at 4/13/14 7:04 AM: -- {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Edit: Also with numeric outputs we still suffer from quite a few unwanted Long references that are created temporarily by the VM, just as the byte arrays were before. This problem is far less severe and actually manageable though. was (Author: christianz): {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of an FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a Lucene Automaton with an FST<BytesRef> much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large, tens or even hundreds of thousands of temporary byte array objects are created. 
One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 18572 - Failure!
I committed a fix to 4.8, 4.x, trunk. Mike McCandless http://blog.mikemccandless.com On Sat, Apr 12, 2014 at 6:41 PM, Michael McCandless luc...@mikemccandless.com wrote: I'll fix; this is already fixed in trunk (with LUCENE-4246) but that issue was 5.0 only... Mike McCandless http://blog.mikemccandless.com On Sat, Apr 12, 2014 at 6:24 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/18572/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) Caused by: java.lang.AssertionError at org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:135) at org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:196) at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4706) at org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:713) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4747) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4739) at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2151) at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2086) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:578) Build Log: [...truncated 693 lines...] [junit4] Suite: org.apache.lucene.index.TestIndexWriterWithThreads [junit4] 2 ??? 12, 2014 7:21:47 ? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2 WARNING: Uncaught exception in thread: Thread[Thread-309,5,TGRP-TestIndexWriterWithThreads] [junit4] 2 java.lang.RuntimeException: java.lang.AssertionError [junit4] 2at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) [junit4] 2at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) [junit4] 2 Caused by: java.lang.AssertionError [junit4] 2at org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:135) [junit4] 2at org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:196) [junit4] 2at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4706) [junit4] 2at org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:713) [junit4] 2at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4747) [junit4] 2at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4739) [junit4] 2at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2151) [junit4] 2at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2086) [junit4] 2at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:578) [junit4] 2 [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtests.method=testRollbackAndCommitWithThreads -Dtests.seed=180365659FF27163 -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=AGT -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 0.63s J0 | TestIndexWriterWithThreads.testRollbackAndCommitWithThreads [junit4] Throwable #1: java.lang.AssertionError [junit4]at org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads(TestIndexWriterWithThreads.java:634) [junit4]at java.lang.Thread.run(Thread.java:724)Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] [junit4] Caused by: java.lang.RuntimeException: java.lang.AssertionError [junit4]at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) [junit4]at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) [junit4] Caused by: java.lang.AssertionError [junit4]at
[jira] [Created] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
Michael McCandless created LUCENE-5604: -- Summary: Should we switch BytesRefHash to MurmurHash3? Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
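[Editor's note] For readers comparing the two schemes, the hash style being replaced is roughly the following; this mirrors the String.hashCode-like loop over bytes described above and is an illustration, not a verbatim copy of BytesRef.hashCode().

{code}
// Simple multiplicative hash with a 31 multiplier, applied per byte.
static int multiplicativeHash(byte[] bytes, int offset, int length) {
  int hash = 0;
  final int end = offset + length;
  for (int i = offset; i < end; i++) {
    hash = 31 * hash + bytes[i];
  }
  return hash;
}
{code}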
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: LUCENE-5604.patch Initial patch, Lucene tests pass, but solrj doesn't yet compile. I factored out Hash.murmurhash3_x86_32 from Solr into Lucene's StringHelper, and cut over BytesRef.hash, TermToBytesRefAttribute.fillBytesRef, and BytesRefHash. I left some nocommits: I think we should change TermToBytesRefAttribute to not return this hashCode? And also remove the BytesRefHash.add method that takes a hashCode? Seems awkward to make the hash code impl of BytesRefHash so public ... it should be under the hood. I also randomized/salted the hash seed per JVM instance (poached this from Guava) by setting a common static seed on JVM init (just System.currentTimeMillis()). This should frustrate denial of service attacks, and also can catch any places where we rely on this hash function not changing across JVM instances (e.g. persisting to disk somewhere). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
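[Editor's note] A minimal sketch of the per-JVM salting idea described above. It assumes the murmurhash3_x86_32(byte[], offset, len, seed) signature that Solr's Hash class exposes; the patch itself moves that function into Lucene's StringHelper, so the exact class and field names here are illustrative.

{code}
// One seed per JVM, fixed at class-init time and fed into every hash call.
// Different JVM instances therefore produce different hash values on purpose.
private static final int GOOD_FAST_HASH_SEED = (int) System.currentTimeMillis();

static int hash(BytesRef bytes) {
  return Hash.murmurhash3_x86_32(bytes.bytes, bytes.offset, bytes.length, GOOD_FAST_HASH_SEED);
}
{code}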
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967781#comment-13967781 ] Michael McCandless commented on LUCENE-5604: I ran performance tests on first 5M Wikipedia medium (1 KB sized) docs and Geonames (sources for the benchmark are all in luceneutil): {noformat} Wiki first 5M docs, no merge policy, 64 MB RAM buffer, 4 indexing threads, default codec: trunk:136.985 sec, 189729244 conflicts murmur: 134.156 sec, 164990724 conflicts Geonames, no merge policy, 64 MB RAM buffer, 4 indexing threads, default codec: trunk:167.354 sec, 236051203 conflicts murmur: 168.101 sec, 179747265 conflicts {noformat} Net/net the indexing time is the same (within noise of run-to-run). The conflict count is how many times we had to probe in the open addressed hash table inside BytesRefHash, and Murmur3 gives a nice reduction (~ 13-24%). I think we should switch. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: BytesRefHash.perturb.patch Separately, I also tried a different probing function inside BytesRefHash, poaching the perturbing approach from Python's dictionary object: {noformat} Wiki murmur + perturb: 134.228 sec, 176358406 conflicts Geonames murmur + perturb: 167.735 sec, 200311281 conflicts {noformat} Curiously, it increased the number of collisions compared to Murmur3 alone. It's possible I messed up the implementation (though all Lucene tests did pass). Or, it could be that because we only use 32 bits for our hash code (Python uses 64 bit hash codes on 64 bit arch), we just don't have enough bits to mix in when probing for new addresses. In fact, if we move all hashing to be private (under the hood) of BytesRefHash, maybe we could switch to the 128 bit variant of MurmurHash3 and then the perturbing might help. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
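[Editor's note] For reference, the probing scheme being described is roughly the following; this is a sketch of the CPython-dict-style perturbation, not the attached BytesRefHash patch, and the empty-slot convention is illustrative.

{code}
// On a collision, fold the remaining high bits of the hash into the next slot
// via 'perturb'; once perturb reaches 0 this degenerates to slot = 5*slot + 1.
int findSlot(int[] table, int hash, int mask) {
  int perturb = hash;
  int slot = hash & mask;
  while (table[slot] != -1) {        // -1 marks an empty slot in this sketch
    slot = (5 * slot + 1 + perturb) & mask;
    perturb >>>= 5;
  }
  return slot;
}
{code}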
[jira] [Commented] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967796#comment-13967796 ] Robert Muir commented on LUCENE-5584: - But this is the *right* thing to do. you can compress it however you want, you can move it to disk (since its like stored fields for your top-N), you can do all kinds of things with it. As for numeric outputs being a problem _at all_, I do not believe you. a benchmark is required. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of a FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a lucene Automaton with a FSTBytesRef much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large tens or even hundreds of thousands of temporary byte array objects are created. One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967811#comment-13967811 ] Robert Muir commented on LUCENE-5604: - Can we use methods like Integer.reverseBytes/rotateLeft instead of doing byte swapping or bit rotations manually? This may improve the speed, e.g. the former is a JVM intrinsic. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
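[Editor's note] To illustrate the suggestion, here is the MurmurHash3 x86_32 per-block mixing step written with the JDK method instead of manual shift pairs. This is a sketch of the standard published algorithm, not a copy of the Solr/Lucene source.

{code}
// Mixes one 4-byte block (k1) into the running hash (h1).
static int mixBlock(int h1, int k1) {
  k1 *= 0xcc9e2d51;
  k1 = Integer.rotateLeft(k1, 15);   // instead of (k1 << 15) | (k1 >>> 17)
  k1 *= 0x1b873593;
  h1 ^= k1;
  h1 = Integer.rotateLeft(h1, 13);   // instead of (h1 << 13) | (h1 >>> 19)
  return h1 * 5 + 0xe6546b64;
}
{code}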
[jira] [Commented] (SOLR-5981) Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected)
[ https://issues.apache.org/jira/browse/SOLR-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967816#comment-13967816 ] Aaron LaBella commented on SOLR-5981: - Erick, Thanks -- will do. I probably would've done that but my SVN skills aren't that great. I accidentally built from trunk first, and then realized I should've built against a branch. Then, I tried to run git svn clone ... but that seemed to take forever as well. Just curious -- are there any plans to migrate lucene/solr to a git repository? +1 for git from me ;-) Thanks. Aaron Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected) --- Key: SOLR-5981 URL: https://issues.apache.org/jira/browse/SOLR-5981 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: Linux 3.13.9-200.fc20.x86_64 Solr 4.6.0 Reporter: Aaron LaBella Assignee: Shawn Heisey Priority: Minor Fix For: 4.8, 5.0 Attachments: SOLR-5981.patch Original Estimate: 1h Remaining Estimate: 1h I've been using the org.apache.solr.handler.dataimport.DataImportHandler for a bit and it's an excellent model and architecture. I'd like to extend the usage of it to plugin my own DIHWriter, but, the code doesn't allow for it. Please change ~line 227 in the DataImportHander class to be: public SolrWriter getSolrWriter instead of: private SolrWriter getSolrWriter or, at a minimum, protected, so that I can extend DataImportHandler and override this method. Thank you *sincerely* in advance for the quick turn-around on this. If the change can be made in 4.6.0 and upstream, that'd be ideal. Thanks! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967819#comment-13967819 ] Georg Sorst edited comment on SOLR-2834 at 4/13/14 1:35 PM: I can verify that this is still open for Solr 4.4. I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. was (Author: gs): I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967819#comment-13967819 ] Georg Sorst commented on SOLR-2834: --- I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
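[Editor's note] For what it's worth, the "simplistic fix" from the report (option 2 above) amounts to a type check before the cast. The sketch below uses illustrative method and variable names, not the actual AnalysisResponseBase.buildPhases() code.

{code}
// Tokenizers/TokenFilters report a List of token NamedLists; CharFilters such
// as HTMLStripCharFilter report the filtered text as a plain String.
@SuppressWarnings("unchecked")
static List<NamedList<Object>> tokenListOrNull(Object phaseValue) {
  if (phaseValue instanceof List) {
    return (List<NamedList<Object>>) phaseValue;
  }
  return null;  // caller handles the CharFilter String separately instead of throwing
}
{code}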
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967851#comment-13967851 ] Yonik Seeley commented on LUCENE-5604: -- The JVM recognizes pairs of shifts that amount to a rotate and replaces them with an intrinsic. bq. Initial patch, Lucene tests pass, but solrj doesn't yet compile Right - SolrJ does not have lucene dependencies. Solr also depends on the *exact* hash, so it can't be tweaked (for example if a variant turns out to be better for lucene indexing). Perhaps Lucene should just make a copy of the one it needs (the byte[] version). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967866#comment-13967866 ] Uwe Schindler commented on LUCENE-5604: --- bq. The JVM recognizes pairs of shifts that amount to a rotate and replaces them with an intrinsic. I still think we should replace them with the methods. This is the same as replacing the ternary {{? :}} with {{Number.compare(x,y)}} for comparators. Brings no improvements, just better readability in Java 7 and is less error-prone (cf. the possible overflows if implementing the compare with a dumb ternary op). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
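[Editor's note] The comparator analogy above refers to this kind of thing; the overflow-prone shortcut shown here is the usual hand-rolled subtraction trick, picked as an illustration.

{code}
// Hand-rolled comparison can overflow; the JDK helper cannot and reads better.
static int compareBroken(int x, int y) {
  return x - y;                  // wrong for e.g. x = Integer.MIN_VALUE, y = 1
}
static int compareSafe(int x, int y) {
  return Integer.compare(x, y);  // Java 7+
}
{code}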
Re: Exception while unmarshalling response in SolrJ
On 4/12/2014 11:46 PM, Prathik Puthran wrote: Hi, I am using SolrJ client to send request to Solr. But instead of calling Solr directly SolrJ communicates with my proxy server which in turn calls Solr and gets the response in javabin format and returns back the response to the client in the same format. The proxy server is written using play framework and just sends request to Solr and returns the HTTP response to client. Below is the exception I get in SolrJ client library when it tries to unmarshall the javabin response. I'm using Solrj 4.7.0. How can I fix this? Exception Stack trace: *Exception in thread main org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at com.br.solr.Main.main(Main.java:20) Caused by: java.lang.NullPointerException at org.apache.solr.common.util.JavaBinCodec.readExternString(JavaBinCodec.java:769) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:192) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475) ... 4 more This started as a thread on the user list where someone else put up the same information, but there they said that the Solr and SolrJ versions were 4.3.0. The line numbers in the exception on the user list match up to 4.3.0, and the line numbers here match up to 4.7.0, which is good. In the user list discussion the poster indicated that the production application cannot be changed, but can you set up a testing version and send the request directly to Solr, bypassing the play framework? If you do that and it works, then you'll need to look for help with your play framework code on one of their support venues. They'll need to tell you how to relay the response without changing it. If the request direct to Solr doesn't work, then we can troubleshoot that part of it. The user list is a more appropriate venue. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967880#comment-13967880 ] Shawn Heisey commented on SOLR-2834: [~gs], are you able to test your code with the 4.7.1 release, both on the server and SolrJ? It would actually be better if you could use the current 4.7.2 release candidate. I believe the release vote has passed, so this is what will actually become 4.7.2 in the next couple of days: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_7_2_r1586229/ It is highly unlikely that there will ever be a new 4.4 release. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967884#comment-13967884 ] Dawid Weiss commented on LUCENE-5604: - by setting a common static seed on JVM init (just System.currentTimeMillis()). This will render any tests that rely on hash ordering, etc. not-repeatable. I suggest initializing this to current time millis OR to the current random seed value (system property 'tests.seed'). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Exception while unmarshalling response in SolrJ
Hi; I've answered a similar question on the mailing list. Please check it and give your feedback. If I have time I will check it with a Play App. Thanks; Furkan KAMACI On 13 Apr 2014 19:00, Shawn Heisey s...@elyograg.org wrote: On 4/12/2014 11:46 PM, Prathik Puthran wrote: Hi, I am using SolrJ client to send request to Solr. But instead of calling Solr directly SolrJ communicates with my proxy server which in turn calls Solr and gets the response in javabin format and returns back the response to the client in the same format. The proxy server is written using play framework and just sends request to Solr and returns the HTTP response to client. Below is the exception I get in SolrJ client library when it tries to unmarshall the javabin response. I'm using Solrj 4.7.0. How can I fix this? Exception Stack trace: *Exception in thread main org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at com.br.solr.Main.main(Main.java:20) Caused by: java.lang.NullPointerException at org.apache.solr.common.util.JavaBinCodec.readExternString(JavaBinCodec.java:769) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:192) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475) ... 4 more This started as a thread on the user list where someone else put up the same information, but there they said that the Solr and SolrJ versions were 4.3.0. The line numbers in the exception on the user list match up to 4.3.0, and the line numbers here match up to 4.7.0, which is good. In the user list discussion the poster indicated that the production application cannot be changed, but can you set up a testing version and send the request directly to Solr, bypassing the play framework? If you do that and it works, then you'll need to look for help with your play framework code on one of their support venues. They'll need to tell you how to relay the response without changing it. If the request direct to Solr doesn't work, then we can troubleshoot that part of it. The user list is a more appropriate venue. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967896#comment-13967896 ] Georg Sorst commented on SOLR-2834: --- [~elyograg] The issue still exists in 4.7.1. Unfortunately I could not get 4.7.2 to run ({{svn checkout}} would insist on a redirect to the same URL) but from looking at the code it exists there as well. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967906#comment-13967906 ] Erick Erickson commented on SOLR-4478: -- [~romseygeek] Can this be closed then? I'm also thinking that SOLR-4779 should just be closed as won't fix since I don't see a good reason to deprecate shareSchema. The hope was that we could share everything in a config set, but as I remember sharing solrconfig was fraught. It seems to me that if we want to go farther down the sharing route, we need to use some other sharing model than piecemeal. Thoughts? Allow cores to specify a named config set in non-SolrCloud mode --- Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Alan Woodward Fix For: 4.8, 5.0 Attachments: SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478.patch, SOLR-4478.patch, solr.log Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core. Straw-man: There will be a directory solr_home/configsets which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like solr_home/configsets/myconf/schema.xml solrconfig.xml stopwords.txt velocity velocity/query.vm etc. If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning? Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. Configset can be either a relative or absolute path, if relative it's assumed to be relative to solr_home. Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
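[Editor's note] For readers skimming the straw-man quoted above, the proposed layout and core property would look roughly like this. This is a sketch based only on the issue description; the core name and exact file list are illustrative.

{noformat}
<solr_home>/configsets/myconf/
    solrconfig.xml
    schema.xml
    stopwords.txt
    velocity/query.vm

# core.properties for a core sharing the "myconf" config set
name=collection1
configSet=myconf
{noformat}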
[jira] [Assigned] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-5871: Assignee: Erick Erickson Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967907#comment-13967907 ] Erick Erickson commented on SOLR-5871: -- Hmmm, what's to review? JIRAs are generally used to propose code changes and/or discuss how to improve/change the code and/or attach patches. If this is a more general how-to question, it's better to raise it on the user's list instead; you'll get lots more help there. I'll close this in a couple of days unless there's something I'm missing. This is certainly something we see regularly as a request; code patches are welcome! Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967913#comment-13967913 ] Jack Krupansky commented on SOLR-5871: -- I've lost count of how many times users have requested this feature. The basic request is for an easy way to determine which fields matched which values for each document, as opposed to having to sift through the debug explanation. One technical difficulty is analysis - the results could report the analyzed field values which matched, which won't necessarily literally agree with the source terms due to case, stemming, synonyms, etc. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967927#comment-13967927 ] Alexander S. commented on SOLR-5871: I already asked at solr-u...@lucene.apache.org, but it seems the only way currently is to read the debug explanation. Unfortunately I am not a Java developer and thus unable to create a patch, but Solr JIRA has a Wish type, so I posted my wish here. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields. I need to know which content matched the query, and it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review, but it has many nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.7-Linux (32bit/jdk1.7.0_60-ea-b13) - Build # 106 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.7-Linux/106/ Java: 32bit/jdk1.7.0_60-ea-b13 -client -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:33105 within 3 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:33105 within 3 ms at __randomizedtesting.SeedInfo.seed([C393C5EA46EF7A32:42754BF231B01A0E]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:148) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:99) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:94) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:85) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:200) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at
[jira] [Created] (SOLR-5982) SSLMigrationTest can fail with leaked threads due to problems stopping / starting jetty.
Mark Miller created SOLR-5982: - Summary: SSLMigrationTest can fail with leaked threads due to problems stopping / starting jetty. Key: SOLR-5982 URL: https://issues.apache.org/jira/browse/SOLR-5982 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967945#comment-13967945 ] Mark Miller commented on SOLR-5776: --- On a tip from Robert, I started looking at SecureRandom as the source of this problem. It seems that at least on Linux, the default SecureRandom algorithm will get data from /dev/random, which can block once it exhausts entropy. Some testing with a custom java.security.egd file seems to bear this out as the problem. I'm still trying to work out the best solution. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
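A minimal sketch of the behaviour being described, assuming a Linux JVM with the default security providers; the java.security.egd flag noted in the comment is the commonly cited workaround and is not taken from this issue's patch:
import java.security.SecureRandom;

public class SecureRandomCheck {
    public static void main(String[] args) throws Exception {
        SecureRandom sr = new SecureRandom();
        // On Linux the default algorithm is typically NativePRNG, whose seeding path
        // reads the blocking /dev/random device.
        System.out.println(sr.getAlgorithm() + " from provider " + sr.getProvider().getName());
        // generateSeed() is the call most likely to block once the entropy pool is drained;
        // starting the JVM with -Djava.security.egd=file:/dev/./urandom is the usual way
        // to point seeding at the non-blocking device instead.
        byte[] seed = sr.generateSeed(16);
        System.out.println("obtained " + seed.length + " seed bytes");
    }
}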
[jira] [Assigned] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-5776: - Assignee: Mark Miller Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5776: -- Fix Version/s: 5.0 4.9 Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: LUCENE-5604.patch New patch, folding in all feedback (thanks!). I think it's ready:
* I reverted the Solr changes
* I dup'd the murmurhash3_x86_32 taking byte[] into StringHelper, but changed to the intrinsics for Integer.rotateLeft
* I added a small test case, confirming our MurmurHash3 impl matches a separate Python/C impl I found
* I made the hashing private to BytesRefHash, and changed TermToBytesAtt.fillBytesRef to return void
* For the seed/salt, I now pull from the tests.seed property if it's non-null
Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash, which is a simple multiplicative function with a 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
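For context, a minimal sketch of the existing hash the description refers to: the 31-multiplier scheme of String.hashCode applied to bytes. This is not the MurmurHash3 code from the patch, and the starting value of 0 is an assumption for illustration:
public class SimpleBytesHash {
    // Simple multiplicative hash over bytes - the scheme the issue says BytesRefHash uses today.
    static int simpleHash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }

    public static void main(String[] args) throws Exception {
        byte[] term = "lucene".getBytes("UTF-8");
        System.out.println(simpleHash(term, 0, term.length));
    }
}
MurmurHash3 mixes bits far more aggressively (multiplications plus rotations, hence the Integer.rotateLeft intrinsic mentioned above), which is what gives it the better distribution.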
Re: [VOTE] Lucene / Solr 4.7.2 (take two)
The vote passes. Thanks everyone for voting. On Apr 10, 2014 10:51 AM, Robert Muir rcm...@gmail.com wrote: artifacts are here: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_7_2_r1586229/ here is my +1 SUCCESS! [0:46:25.014499]
[jira] [Closed] (LUCENE-5598) About Scoring
[ https://issues.apache.org/jira/browse/LUCENE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Park JungHo closed LUCENE-5598. --- About Scoring - Key: LUCENE-5598 URL: https://issues.apache.org/jira/browse/LUCENE-5598 Project: Lucene - Core Issue Type: Wish Components: core/query/scoring Affects Versions: 4.7 Reporter: Park JungHo Labels: mentor, patch Fix For: 4.7 I had been indexing long values with LongField (the field name is 'boost' and the value comes from an AtomicLong) in order to use CustomScoreQuery. I then applied the following code:
//code start
FunctionQuery fquery = new FunctionQuery(new LongFieldSource("boost"));
CustomScoreQuery customQuery = new ScoreQuery(query, fquery);
//code end
If the indexed data count is 100, I expect 100, 99, 98, ... 91. But the result did not match my expectation once the number of indexed documents increased. (For instance 99985, 99986, 99987, 99988, ... 4 with an index count of one billion.) I thought that was caused by the scoring algorithm returning a float value (floating point limit). Is that correct? How can I get the result I expect? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
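A small demonstration of the floating-point limit suspected above: a float's 24-bit significand cannot distinguish nearby long values once they are around one billion, so distinct boost values collapse to the same score. The specific numbers below are chosen only to illustrate the effect:
public class FloatScorePrecision {
    public static void main(String[] args) {
        long a = 1_000_000_000L;
        long b = a - 3;
        // Near 1e9, adjacent representable floats are 64 apart, so these two longs
        // round to the same float and would yield identical scores.
        System.out.println((float) a == (float) b); // prints true
    }
}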
[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5973: - Fix Version/s: 4.9 Pluggable Ranking Collectors Key: SOLR-5973 URL: https://issues.apache.org/jira/browse/SOLR-5973 Project: Solr Issue Type: New Feature Components: search Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5973.patch, SOLR-5973.patch This ticket adds the ability to plug in a custom ranking collector to Solr. The proposed design is much simpler than SOLR-4465, which includes configuration support and support for pluggable analytics collectors. In this design, a CollectorFactory can be set onto the ResponseBuilder by a custom SearchComponent. The CollectorFactory is then used to inject a custom TopDocsCollector into the SolrIndexSearcher. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
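A hypothetical sketch of the factory half of that design; the CollectorFactory interface (and the ResponseBuilder hook that would carry it) are names assumed from the description rather than existing Solr 4.x API, while TopScoreDocCollector is stock Lucene:
import java.io.IOException;
import org.apache.lucene.search.TopDocsCollector;
import org.apache.lucene.search.TopScoreDocCollector;

// Assumed interface: SolrIndexSearcher would ask this factory for its collector
// instead of constructing the default one itself.
interface CollectorFactory {
    TopDocsCollector<?> newCollector(int numHits) throws IOException;
}

// A trivial implementation that just reproduces default relevance ranking;
// a real plugin would return its custom ranking collector here.
class DefaultRankingCollectorFactory implements CollectorFactory {
    @Override
    public TopDocsCollector<?> newCollector(int numHits) throws IOException {
        return TopScoreDocCollector.create(numHits, true);
    }
}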
[jira] [Updated] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5831: - Fix Version/s: 4.9 Scale score PostFilter -- Key: SOLR-5831 URL: https://issues.apache.org/jira/browse/SOLR-5831 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.7 Reporter: Peter Keegan Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch The ScaleScoreQParserPlugin is a PostFilter that performs score scaling. This is an alternative to using a function query wrapping a scale() wrapping a query(). For example:
select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}
The problem with this query is that it has to scale every hit. Usually, only the returned hits need to be scaled, but there may be use cases where the number of hits to be scaled is greater than the returned hit count, but less than or equal to the total hit count. Sample syntax:
fq={!scalescore l=0.0 u=1.0 maxscalehits=1 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
l=0.0 u=1.0 //Scale scores to values between 0-1, inclusive
maxscalehits=1 //The maximum number of result scores to scale (-1 = all hits, 0 = results 'page' size)
func=... //Apply the composite function to each hit. The scaled score value is accessed by the 'score()' value source
All parameters are optional. The defaults are: l=0.0 u=1.0 maxscalehits=0 (result window size) func=(null)
Note: this patch is not complete, as it contains no test cases and may not conform to all the guidelines in http://wiki.apache.org/solr/HowToContribute. I would appreciate any feedback on the usability and implementation. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
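A quick arithmetic check of the composite function in the sample syntax, using hypothetical per-document values (a scaled score of 0.8 and myfield = 0.4 give 0.75 * 0.8 + 0.25 * 0.4 = 0.7):
public class ScaleScoreArithmetic {
    public static void main(String[] args) {
        double scaledScore = 0.8; // hypothetical score after scaling into [0, 1]
        double myfield = 0.4;     // hypothetical stored field value
        // sum(product(sscore(),0.75), product(field(myfield),0.25))
        double finalScore = 0.75 * scaledScore + 0.25 * myfield;
        System.out.println(finalScore); // approximately 0.7 for these inputs
    }
}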
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_60-ea-b13) - Build # 9964 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9964/ Java: 64bit/jdk1.7.0_60-ea-b13 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChains Error Message: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=10,endOffset=5 Stack Trace: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=10,endOffset=5 at __randomizedtesting.SeedInfo.seed([909448D307EA17A6:AD7561B240F80A66]:0) at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) at org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.de.GermanLightStemFilter.incrementToken(GermanLightStemFilter.java:48) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:701) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:612) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:511) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:922) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Updated] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5776: -- Attachment: SOLR-5776.patch The attached patch appears to be a working workaround. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5980) AbstractFullDistribZkTestBase#compareResults always returns false for shouldFail.
[ https://issues.apache.org/jira/browse/SOLR-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968074#comment-13968074 ] ASF subversion and git services commented on SOLR-5980: --- Commit 1587149 from markrmil...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1587149 ] SOLR-5980: Add a test. AbstractFullDistribZkTestBase#compareResults always returns false for shouldFail. - Key: SOLR-5980 URL: https://issues.apache.org/jira/browse/SOLR-5980 Project: Solr Issue Type: Test Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Critical Fix For: 4.9, 5.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968077#comment-13968077 ] Mark Miller commented on SOLR-5776: --- Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is Oracle and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968077#comment-13968077 ] Mark Miller edited comment on SOLR-5776 at 4/14/14 4:59 AM: Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is -Oracle- {color:red}GPL{color} and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. was (Author: markrmil...@gmail.com): Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is Oracle and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
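Since the comment mentions plugging in a custom SecureRandomSpi for tests, here is a minimal, hypothetical sketch of how such a provider could be registered. The provider, class, and algorithm names are assumptions for illustration; this is not the attached SOLR-5776 patch, and a java.util.Random-backed implementation is only acceptable for tests, never for production:
import java.security.Provider;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import java.security.Security;
import java.util.Random;

public class TestSecureRandomProvider extends Provider {
    public TestSecureRandomProvider() {
        super("TestSecureRandom", 1.0, "Non-blocking SecureRandom for tests only");
        // Registers the SPI under the algorithm name "NonBlocking".
        put("SecureRandom.NonBlocking", NonBlockingSpi.class.getName());
    }

    // Backed by plain java.util.Random: never blocks, but is NOT cryptographically secure.
    public static class NonBlockingSpi extends SecureRandomSpi {
        private final Random random = new Random();
        @Override protected void engineSetSeed(byte[] seed) { /* ignored: tests don't need real entropy */ }
        @Override protected void engineNextBytes(byte[] bytes) { random.nextBytes(bytes); }
        @Override protected byte[] engineGenerateSeed(int numBytes) {
            byte[] seed = new byte[numBytes];
            random.nextBytes(seed);
            return seed;
        }
    }

    public static void main(String[] args) throws Exception {
        Security.insertProviderAt(new TestSecureRandomProvider(), 1);
        SecureRandom sr = SecureRandom.getInstance("NonBlocking");
        byte[] buf = new byte[8];
        sr.nextBytes(buf); // never touches /dev/random, so it cannot block on low entropy
        System.out.println("got " + buf.length + " bytes without blocking");
    }
}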
[jira] [Updated] (LUCENE-5596) Support for index/search large numeric field
[ https://issues.apache.org/jira/browse/LUCENE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wang updated LUCENE-5596: --- Attachment: LUCENE-5596.patch Initial patch to support BigInteger. I've copied and modified some tests for the long field (e.g. TestNumericUtils, TestNumericTokenStream, TestNumericRangeQuery, TestSortDocValues) to support BigInteger, and all of them pass. Support for index/search large numeric field Key: LUCENE-5596 URL: https://issues.apache.org/jira/browse/LUCENE-5596 Project: Lucene - Core Issue Type: New Feature Reporter: Kevin Wang Attachments: LUCENE-5596.patch Currently, if a number is larger than Long.MAX_VALUE, we can't index/search it in Lucene as a number. For example, an IPv6 address is a 128-bit number, so we can't index it as a numeric field and do numeric range queries, etc. It would be good to support BigInteger / BigDecimal. I've tried using BigInteger for IPv6 in Elasticsearch and that works fine, but there are still lots of things to do: https://github.com/elasticsearch/elasticsearch/pull/5758 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
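As a small illustration of the IPv6 use case in the description (plain JDK classes only; nothing here is taken from the patch), an IPv6 address can be viewed as a 128-bit BigInteger that does not fit in a long:
import java.math.BigInteger;
import java.net.InetAddress;

public class Ipv6AsBigInteger {
    public static void main(String[] args) throws Exception {
        // getAddress() returns 16 raw bytes for an IPv6 literal.
        byte[] raw = InetAddress.getByName("2001:db8::1").getAddress();
        // Interpret the bytes as a positive 128-bit integer.
        BigInteger value = new BigInteger(1, raw);
        // This value exceeds Long.MAX_VALUE, which is why a long-based numeric field can't hold it.
        System.out.println(value);
    }
}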