[JENKINS] Lucene-Solr-4.7-Linux (64bit/jdk1.8.0_20-ea-b05) - Build # 103 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.7-Linux/103/ Java: 64bit/jdk1.8.0_20-ea-b05 -XX:-UseCompressedOops -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:57828 within 3 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:57828 within 3 ms at __randomizedtesting.SeedInfo.seed([3EEE89146E3E2924:BF08070C19614918]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:148) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:99) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:94) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:85) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:200) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at
[jira] [Commented] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967752#comment-13967752 ] Christian Ziech commented on LUCENE-5584: - {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of an FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a Lucene Automaton with an FST<BytesRef> much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large, tens or even hundreds of thousands of temporary byte array objects are created. One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method, passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
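[Editor's note] A minimal sketch of what a reuse-aware decode along the lines of the proposed read(in, reuse) could look like for a byte-sequence output. This is an illustration only: the length-prefixed layout, the growth strategy, and the method placement are assumptions, not the actual ByteSequenceOutputs code.

{code}
// Illustrative only: fill the caller's BytesRef instead of allocating a new
// byte[] per arc. Uses org.apache.lucene.store.DataInput and
// org.apache.lucene.util.{BytesRef, ArrayUtil}.
public static BytesRef read(DataInput in, BytesRef reuse) throws IOException {
  final int len = in.readVInt();
  if (reuse == null) {
    reuse = new BytesRef(len);
  } else if (reuse.bytes.length < len) {
    reuse.bytes = ArrayUtil.grow(reuse.bytes, len);
  }
  in.readBytes(reuse.bytes, 0, len);
  reuse.offset = 0;
  reuse.length = len;
  return reuse; // no per-arc allocation once the buffer is large enough
}
{code}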
[jira] [Comment Edited] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967752#comment-13967752 ] Christian Ziech edited comment on LUCENE-5584 at 4/13/14 7:04 AM: -- {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Edit: Also with numeric outputs we still suffer from quite a few unwanted Long references that are created temporarily by the VM, just as the byte arrays were before. This problem is far less severe and actually manageable though. was (Author: christianz): {quote} If this is the case, then I am not sure you are using the correct datastructure: it seems to me that a byte sequence output is not appropriate. Since you do not care about the intermediate outputs, but have a complicated intersection with the FST, why not use a numeric output, pointing to the result data somewhere else? {quote} That is what we do right now. This however has the downside that we lose the prefix compression capability of the FST for the FST values, which is significant in our case. The single FST with the values attached was roughly 1.2G large and now with the referenced byte arrays (we load them into a DirectByteBuffer) we spend about 2.5G for the values alone. Of course we could try to implement the same prefix compression as the FST does on our own and fill a byte array while traversing the FST, but that feels like copying something that is already almost there. If we could just get the extension points I mentioned into Lucene without changing the actual behavior of (most or any) of Lucene's code, that would be a huge help. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of an FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a Lucene Automaton with an FST<BytesRef> much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large, tens or even hundreds of thousands of temporary byte array objects are created. 
One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [JENKINS] Lucene-4x-Linux-Java7-64-test-only - Build # 18572 - Failure!
I committed a fix to 4.8, 4.x, trunk. Mike McCandless http://blog.mikemccandless.com On Sat, Apr 12, 2014 at 6:41 PM, Michael McCandless luc...@mikemccandless.com wrote: I'll fix; this is already fixed in trunk (with LUCENE-4246) but that issue was 5.0 only... Mike McCandless http://blog.mikemccandless.com On Sat, Apr 12, 2014 at 6:24 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-4x-Linux-Java7-64-test-only/18572/ 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) Caused by: java.lang.AssertionError at org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:135) at org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:196) at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4706) at org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:713) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4747) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4739) at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2151) at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2086) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:578) Build Log: [...truncated 693 lines...] [junit4] Suite: org.apache.lucene.index.TestIndexWriterWithThreads [junit4] 2 ??? 12, 2014 7:21:47 ? 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2 WARNING: Uncaught exception in thread: Thread[Thread-309,5,TGRP-TestIndexWriterWithThreads] [junit4] 2 java.lang.RuntimeException: java.lang.AssertionError [junit4] 2at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) [junit4] 2at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) [junit4] 2 Caused by: java.lang.AssertionError [junit4] 2at org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:135) [junit4] 2at org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:196) [junit4] 2at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4706) [junit4] 2at org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:713) [junit4] 2at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4747) [junit4] 2at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4739) [junit4] 2at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2151) [junit4] 2at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2086) [junit4] 2at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:578) [junit4] 2 [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterWithThreads -Dtests.method=testRollbackAndCommitWithThreads -Dtests.seed=180365659FF27163 -Dtests.slow=true -Dtests.locale=ar_BH -Dtests.timezone=AGT -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 0.63s J0 | TestIndexWriterWithThreads.testRollbackAndCommitWithThreads [junit4] Throwable #1: java.lang.AssertionError [junit4]at org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads(TestIndexWriterWithThreads.java:634) [junit4]at java.lang.Thread.run(Thread.java:724)Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=385, name=Thread-309, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] [junit4] Caused by: java.lang.RuntimeException: java.lang.AssertionError [junit4]at __randomizedtesting.SeedInfo.seed([180365659FF27163]:0) [junit4]at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:622) [junit4] Caused by: java.lang.AssertionError [junit4]at
[jira] [Created] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
Michael McCandless created LUCENE-5604: -- Summary: Should we switch BytesRefHash to MurmurHash3? Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
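[Editor's note] For readers comparing the two schemes, the hash style being replaced is roughly the following; this mirrors the String.hashCode-like loop over bytes described above and is an illustration, not a verbatim copy of BytesRef.hashCode().

{code}
// Simple multiplicative hash with a 31 multiplier, applied per byte.
static int multiplicativeHash(byte[] bytes, int offset, int length) {
  int hash = 0;
  final int end = offset + length;
  for (int i = offset; i < end; i++) {
    hash = 31 * hash + bytes[i];
  }
  return hash;
}
{code}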
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: LUCENE-5604.patch Initial patch, Lucene tests pass, but solrj doesn't yet compile. I factored out Hash.murmurhash3_x86_32 from Solr into Lucene's StringHelper, and cut over BytesRef.hash, TermToBytesRefAttribute.fillBytesRef, and BytesRefHash. I left some nocommits: I think we should change TermToBytesRefAttribute to not return this hashCode? And also remove the BytesRefHash.add method that takes a hashCode? Seems awkward to make the hash code impl of BytesRefHash so public ... it should be under the hood. I also randomized/salted the hash seed per JVM instance (poached this from Guava) by setting a common static seed on JVM init (just System.currentTimeMillis()). This should frustrate denial of service attacks, and also can catch any places where we rely on this hash function not changing across JVM instances (e.g. persisting to disk somewhere). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
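[Editor's note] A minimal sketch of the per-JVM salting idea described above. It assumes the murmurhash3_x86_32(byte[], offset, len, seed) signature that Solr's Hash class exposes; the patch itself moves that function into Lucene's StringHelper, so the exact class and field names here are illustrative.

{code}
// One seed per JVM, fixed at class-init time and fed into every hash call.
// Different JVM instances therefore produce different hash values on purpose.
private static final int GOOD_FAST_HASH_SEED = (int) System.currentTimeMillis();

static int hash(BytesRef bytes) {
  return Hash.murmurhash3_x86_32(bytes.bytes, bytes.offset, bytes.length, GOOD_FAST_HASH_SEED);
}
{code}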
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967781#comment-13967781 ] Michael McCandless commented on LUCENE-5604: I ran performance tests on first 5M Wikipedia medium (1 KB sized) docs and Geonames (sources for the benchmark are all in luceneutil): {noformat} Wiki first 5M docs, no merge policy, 64 MB RAM buffer, 4 indexing threads, default codec: trunk:136.985 sec, 189729244 conflicts murmur: 134.156 sec, 164990724 conflicts Geonames, no merge policy, 64 MB RAM buffer, 4 indexing threads, default codec: trunk:167.354 sec, 236051203 conflicts murmur: 168.101 sec, 179747265 conflicts {noformat} Net/net the indexing time is the same (within noise of run-to-run). The conflict count is how many times we had to probe in the open addressed hash table inside BytesRefHash, and Murmur3 gives a nice reduction (~ 13-24%). I think we should switch. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: BytesRefHash.perturb.patch Separately, I also tried a different probing function inside BytesRefHash, poaching the perturbing approach from Python's dictionary object: {noformat} Wiki murmur + perturb: 134.228 sec, 176358406 conflicts Geonames murmur + perturb: 167.735 sec, 200311281 conflicts {noformat} Curiously, it increased the number of collisions compared to Murmur3 alone. It's possible I messed up the implementation (though all Lucene tests did pass). Or, it could be that because we only use 32 bits for our hash code (Python uses 64 bit hash codes on 64 bit arch), we just don't have enough bits to mix in when probing for new addresses. In fact, if we move all hashing to be private (under the hood) of BytesRefHash, maybe we could switch to the 128 bit variant of MurmurHash3 and then the perturbing might help. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
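[Editor's note] For reference, the probing scheme being described is roughly the following; this is a sketch of the CPython-dict-style perturbation, not the attached BytesRefHash patch, and the empty-slot convention is illustrative.

{code}
// On a collision, fold the remaining high bits of the hash into the next slot
// via 'perturb'; once perturb reaches 0 this degenerates to slot = 5*slot + 1.
int findSlot(int[] table, int hash, int mask) {
  int perturb = hash;
  int slot = hash & mask;
  while (table[slot] != -1) {        // -1 marks an empty slot in this sketch
    slot = (5 * slot + 1 + perturb) & mask;
    perturb >>>= 5;
  }
  return slot;
}
{code}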
[jira] [Commented] (LUCENE-5584) Allow FST read method to also recycle the output value when traversing FST
[ https://issues.apache.org/jira/browse/LUCENE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967796#comment-13967796 ] Robert Muir commented on LUCENE-5584: - But this is the *right* thing to do. you can compress it however you want, you can move it to disk (since its like stored fields for your top-N), you can do all kinds of things with it. As for numeric outputs being a problem _at all_, I do not believe you. a benchmark is required. Allow FST read method to also recycle the output value when traversing FST -- Key: LUCENE-5584 URL: https://issues.apache.org/jira/browse/LUCENE-5584 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.7.1 Reporter: Christian Ziech The FST class heavily reuses Arc instances when traversing the FST. The output of an Arc however is not reused. This can especially be important when traversing large portions of a FST and using the ByteSequenceOutputs and CharSequenceOutputs. Those classes create a new byte[] or char[] for every node read (which has an output). In our use case we intersect a lucene Automaton with a FSTBytesRef much like it is done in org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths() and since the Automaton and the FST are both rather large tens or even hundreds of thousands of temporary byte array objects are created. One possible solution to the problem would be to change the org.apache.lucene.util.fst.Outputs class to have two additional methods (if you don't want to change the existing methods for compatibility): {code} /** Decode an output value previously written with {@link * #write(Object, DataOutput)} reusing the object passed in if possible */ public abstract T read(DataInput in, T reuse) throws IOException; /** Decode an output value previously written with {@link * #writeFinalOutput(Object, DataOutput)}. By default this * just calls {@link #read(DataInput)}. This tries to reuse the object * passed in if possible */ public T readFinalOutput(DataInput in, T reuse) throws IOException { return read(in, reuse); } {code} The new methods could then be used in the FST in the readNextRealArc() method passing in the output of the reused Arc. For most inputs they could even just invoke the original read(in) method. If you should decide to make that change I'd be happy to supply a patch and/or tests for the feature. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967811#comment-13967811 ] Robert Muir commented on LUCENE-5604: - Can we use methods like Integer.reverseBytes/rotateLeft instead of doing byte swapping or bit rotations manually? This may improve the speed, e.g. the former is a JVM intrinsic. Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
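[Editor's note] To illustrate the suggestion, here is the MurmurHash3 x86_32 per-block mixing step written with the JDK method instead of manual shift pairs. This is a sketch of the standard published algorithm, not a copy of the Solr/Lucene source.

{code}
// Mixes one 4-byte block (k1) into the running hash (h1).
static int mixBlock(int h1, int k1) {
  k1 *= 0xcc9e2d51;
  k1 = Integer.rotateLeft(k1, 15);   // instead of (k1 << 15) | (k1 >>> 17)
  k1 *= 0x1b873593;
  h1 ^= k1;
  h1 = Integer.rotateLeft(h1, 13);   // instead of (h1 << 13) | (h1 >>> 19)
  return h1 * 5 + 0xe6546b64;
}
{code}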
[jira] [Commented] (SOLR-5981) Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected)
[ https://issues.apache.org/jira/browse/SOLR-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967816#comment-13967816 ] Aaron LaBella commented on SOLR-5981: - Erick, Thanks -- will do. I probably would've done that but my SVN skills aren't that great. I accidentally built from trunk first, and then realized I should've built against a branch. Then, I tried to run git svn clone ... but that seemed to take forever as well. Just curious -- are there any plans to migrate lucene/solr to a git repository? +1 for git from me ;-) Thanks. Aaron Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected) --- Key: SOLR-5981 URL: https://issues.apache.org/jira/browse/SOLR-5981 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Affects Versions: 4.0 Environment: Linux 3.13.9-200.fc20.x86_64 Solr 4.6.0 Reporter: Aaron LaBella Assignee: Shawn Heisey Priority: Minor Fix For: 4.8, 5.0 Attachments: SOLR-5981.patch Original Estimate: 1h Remaining Estimate: 1h I've been using the org.apache.solr.handler.dataimport.DataImportHandler for a bit and it's an excellent model and architecture. I'd like to extend the usage of it to plugin my own DIHWriter, but, the code doesn't allow for it. Please change ~line 227 in the DataImportHander class to be: public SolrWriter getSolrWriter instead of: private SolrWriter getSolrWriter or, at a minimum, protected, so that I can extend DataImportHandler and override this method. Thank you *sincerely* in advance for the quick turn-around on this. If the change can be made in 4.6.0 and upstream, that'd be ideal. Thanks! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967819#comment-13967819 ] Georg Sorst edited comment on SOLR-2834 at 4/13/14 1:35 PM: I can verify that this is still open for Solr 4.4. I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. was (Author: gs): I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967819#comment-13967819 ] Georg Sorst commented on SOLR-2834: --- I would really like to fix this issue, but need some advice on what / where to fix exactly. I see two options: # Fix the output of the field-analysis request so that it uses {{arr ...}} for CharFilters just like it does for Tokenizers and TokenFilters ** This will probably confuse Solr Admin and who knows what else # Fix the {{FieldAnalysisResponse}} / {{AnalysisResponseBase}} so that it can deal with the current response format ({{str ..}} for CharFilters) ** The {{AnalysisResponseBase}} assumes in many places that the output is {{arr-lst-str}} due to the Generics of the NamedLists; it would be hard to make this change decently type-safe I'm a bit lost here. If someone could give me a few pointers on which option is better and which tests to adapt I'll gladly try to take care of it. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
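[Editor's note] For what it's worth, the "simplistic fix" from the report (option 2 above) amounts to a type check before the cast. The sketch below uses illustrative method and variable names, not the actual AnalysisResponseBase.buildPhases() code.

{code}
// Tokenizers/TokenFilters report a List of token NamedLists; CharFilters such
// as HTMLStripCharFilter report the filtered text as a plain String.
@SuppressWarnings("unchecked")
static List<NamedList<Object>> tokenListOrNull(Object phaseValue) {
  if (phaseValue instanceof List) {
    return (List<NamedList<Object>>) phaseValue;
  }
  return null;  // caller handles the CharFilter String separately instead of throwing
}
{code}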
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967851#comment-13967851 ] Yonik Seeley commented on LUCENE-5604: -- The JVM recognizes pairs of shifts that amount to a rotate and replaces them with an intrinsic. bq. Initial patch, Lucene tests pass, but solrj doesn't yet compile Right - SolrJ does not have lucene dependencies. Solr also depends on the *exact* hash, so it can't be tweaked (for example if a variant turns out to be better for lucene indexing). Perhaps Lucene should just make a copy of the one it needs (the byte[] version). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967866#comment-13967866 ] Uwe Schindler commented on LUCENE-5604: --- bq. The JVM recognizes pairs of shifts that amount to a rotate and replaces them with an intrinsic. I still think we should replace them with the methods. This is the same as replacing the ternary {{? :}} with {{Number.compare(x,y)}} for comparators. Brings no improvements, just better readability in Java 7 and is less error-prone (cf. the possible overflows if implementing the compare with a dumb ternary op). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
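[Editor's note] The comparator analogy above refers to this kind of thing; the overflow-prone shortcut shown here is the usual hand-rolled subtraction trick, picked as an illustration.

{code}
// Hand-rolled comparison can overflow; the JDK helper cannot and reads better.
static int compareBroken(int x, int y) {
  return x - y;                  // wrong for e.g. x = Integer.MIN_VALUE, y = 1
}
static int compareSafe(int x, int y) {
  return Integer.compare(x, y);  // Java 7+
}
{code}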
Re: Exception while unmarshalling response in SolrJ
On 4/12/2014 11:46 PM, Prathik Puthran wrote: Hi, I am using SolrJ client to send request to Solr. But instead of calling Solr directly SolrJ communicates with my proxy server which in turn calls Solr and gets the response in javabin format and returns back the response to the client in the same format. The proxy server is written using play framework and just sends request to Solr and returns the HTTP response to client. Below is the exception I get in SolrJ client library when it tries to unmarshall the javabin response. I'm using Solrj 4.7.0. How can I fix this? Exception Stack trace: *Exception in thread main org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at com.br.solr.Main.main(Main.java:20) Caused by: java.lang.NullPointerException at org.apache.solr.common.util.JavaBinCodec.readExternString(JavaBinCodec.java:769) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:192) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475) ... 4 more This started as a thread on the user list where someone else put up the same information, but there they said that the Solr and SolrJ versions were 4.3.0. The line numbers in the exception on the user list match up to 4.3.0, and the line numbers here match up to 4.7.0, which is good. In the user list discussion the poster indicated that the production application cannot be changed, but can you set up a testing version and send the request directly to Solr, bypassing the play framework? If you do that and it works, then you'll need to look for help with your play framework code on one of their support venues. They'll need to tell you how to relay the response without changing it. If the request direct to Solr doesn't work, then we can troubleshoot that part of it. The user list is a more appropriate venue. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967880#comment-13967880 ] Shawn Heisey commented on SOLR-2834: [~gs], are you able to test your code with the 4.7.1 release, both on the server and SolrJ? It would actually be better if you could use the current 4.7.2 release candidate. I believe the release vote has passed, so this is what will actually become 4.7.2 in the next couple of days: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_7_2_r1586229/ It is highly unlikely that there will ever be a new 4.4 release. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967884#comment-13967884 ] Dawid Weiss commented on LUCENE-5604: - by setting a common static seed on JVM init (just System.currentTimeMillis()). This will render any tests that rely on hash ordering, etc. not-repeatable. I suggest initializing this to current time millis OR to the current random seed value (system property 'tests.seed'). Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash which is a simple multiplicative function with 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Exception while unmarshalling response in SolrJ
Hi; I've answered a similar question on the mailing list. Please check it and give your feedback. If I have time I will check it with a Play App. Thanks; Furkan KAMACI On 13 Apr 2014 19:00, Shawn Heisey s...@elyograg.org wrote: On 4/12/2014 11:46 PM, Prathik Puthran wrote: Hi, I am using SolrJ client to send request to Solr. But instead of calling Solr directly SolrJ communicates with my proxy server which in turn calls Solr and gets the response in javabin format and returns back the response to the client in the same format. The proxy server is written using play framework and just sends request to Solr and returns the HTTP response to client. Below is the exception I get in SolrJ client library when it tries to unmarshall the javabin response. I'm using Solrj 4.7.0. How can I fix this? Exception Stack trace: *Exception in thread main org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:477) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at com.br.solr.Main.main(Main.java:20) Caused by: java.lang.NullPointerException at org.apache.solr.common.util.JavaBinCodec.readExternString(JavaBinCodec.java:769) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:192) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:475) ... 4 more This started as a thread on the user list where someone else put up the same information, but there they said that the Solr and SolrJ versions were 4.3.0. The line numbers in the exception on the user list match up to 4.3.0, and the line numbers here match up to 4.7.0, which is good. In the user list discussion the poster indicated that the production application cannot be changed, but can you set up a testing version and send the request directly to Solr, bypassing the play framework? If you do that and it works, then you'll need to look for help with your play framework code on one of their support venues. They'll need to tell you how to relay the response without changing it. If the request direct to Solr doesn't work, then we can troubleshoot that part of it. The user list is a more appropriate venue. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2834) AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter
[ https://issues.apache.org/jira/browse/SOLR-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967896#comment-13967896 ] Georg Sorst commented on SOLR-2834: --- [~elyograg] The issue still exists in 4.7.1. Unfortunately I could not get 4.7.2 to run ({{svn checkout}} would insist on a redirect to the same URL) but from looking at the code it exists there as well. AnalysisResponseBase.java doesn't handle org.apache.solr.analysis.HTMLStripCharFilter - Key: SOLR-2834 URL: https://issues.apache.org/jira/browse/SOLR-2834 Project: Solr Issue Type: Bug Components: clients - java, Schema and Analysis Affects Versions: 3.4, 3.6, 4.2 Reporter: Shane Assignee: Shalin Shekhar Mangar Priority: Blocker Labels: patch Attachments: AnalysisResponseBase.patch Original Estimate: 5m Remaining Estimate: 5m When using FieldAnalysisRequest.java to analysis a field, a ClassCastExcpetion is thrown if the schema defines the filter org.apache.solr.analysis.HTMLStripCharFilter. The exception is: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69) at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66) at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107) My schema definition is: fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.HTMLStripCharFilterFactory / tokenizer class=solr.StandardTokenizerFactory / filter class=solr.StandardFilterFactory / filter class=solr.TrimFilterFactory / filter class=solr.LowerCaseFilterFactory / /analyzer /fieldType The response is part is: lst name=query str name=org.apache.solr.analysis.HTMLStripCharFiltertesting analysis/str arr name=org.apache.lucene.analysis.standard.StandardTokenizer lst... A simplistic fix would be to test if the Entry value is an instance of List. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4478) Allow cores to specify a named config set in non-SolrCloud mode
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967906#comment-13967906 ] Erick Erickson commented on SOLR-4478: -- [~romseygeek] Can this be closed then? I'm also thinking that SOLR-4779 should just be closed as won't fix since I don't see a good reason to deprecate shareSchema. The hope was that we could share everything in a config set, but as I remember sharing solrconfig was fraught. It seems to me that if we want to go farther down the sharing route, we need to use some other sharing model than piecemeal. Thoughts? Allow cores to specify a named config set in non-SolrCloud mode --- Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Alan Woodward Fix For: 4.8, 5.0 Attachments: SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478-take2.patch, SOLR-4478.patch, SOLR-4478.patch, solr.log Part of moving forward to the new way, after SOLR-4196 etc... I propose an additional parameter specified on the core node in solr.xml or as a parameter in the discovery mode core.properties file, call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core. Straw-man: There will be a directory solr_home/configsets which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like solr_home/configsets/myconf/schema.xml solrconfig.xml stopwords.txt velocity velocity/query.vm etc. If multiple cores used the same configSet, schema, solrconfig etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored but maybe log a warning? Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. Configset can be either a relative or absolute path, if relative it's assumed to be relative to solr_home. Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
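[Editor's note] For readers skimming the straw-man quoted above, the proposed layout and core property would look roughly like this. This is a sketch based only on the issue description; the core name and exact file list are illustrative.

{noformat}
<solr_home>/configsets/myconf/
    solrconfig.xml
    schema.xml
    stopwords.txt
    velocity/query.vm

# core.properties for a core sharing the "myconf" config set
name=collection1
configSet=myconf
{noformat}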
[jira] [Assigned] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-5871: Assignee: Erick Erickson Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967907#comment-13967907 ] Erick Erickson commented on SOLR-5871: -- Hmmm, what's to review? JIRAs are generally used to propose code changes and/or discuss how to improve/change the code and/or attach patches. If this is a more general how-to question, it's better to raise it on the user's list instead; you'll get lots more help there. I'll close this in a couple of days unless there's something I'm missing. This is certainly something we see regularly as a request; code patches are welcome! Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967913#comment-13967913 ] Jack Krupansky commented on SOLR-5871: -- I've lost count of how many times users have requested this feature. The basic request is for an easy way to determine which fields matched which values for each document, as opposed to having to sift through the debug explanation. One technical difficulty is analysis - the results could report the analyzed field values which matched, which won't necessarily literally agree with the source terms due to case, stemming, synonyms, etc. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967927#comment-13967927 ] Alexander S. commented on SOLR-5871: I already asked at solr-u...@lucene.apache.org, but it seems the only way currently is to read the debug explanation. Unfortunately I am not a Java developer and thus unable to create a patch, but Solr JIRA has a Wish type, so I posted my wish here. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields. I need to know which content matched the query, and it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review, but it has many nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.7-Linux (32bit/jdk1.7.0_60-ea-b13) - Build # 106 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.7-Linux/106/ Java: 32bit/jdk1.7.0_60-ea-b13 -client -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:33105 within 3 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:33105 within 3 ms at __randomizedtesting.SeedInfo.seed([C393C5EA46EF7A32:42754BF231B01A0E]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:148) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:99) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:94) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:85) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:200) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at
[jira] [Created] (SOLR-5982) SSLMigrationTest can fail with leaked threads due to problems stopping / starting jetty.
Mark Miller created SOLR-5982: - Summary: SSLMigrationTest can fail with leaked threads due to problems stopping / starting jetty. Key: SOLR-5982 URL: https://issues.apache.org/jira/browse/SOLR-5982 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967945#comment-13967945 ] Mark Miller commented on SOLR-5776: --- On a tip from Robert, I started looking at SecureRandom as the source of this problem. It seems that at least on Linux, the default SecureRandom algorithm will get data from /dev/random, which can block once it exhausts entropy. Some testing with a custom java.security.egd file seems to bear this out as the problem. I'm still trying to work out the best solution. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
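A minimal sketch of the behaviour being described, assuming a Linux JVM with the default security providers; the java.security.egd flag noted in the comment is the commonly cited workaround and is not taken from this issue's patch:
import java.security.SecureRandom;

public class SecureRandomCheck {
    public static void main(String[] args) throws Exception {
        SecureRandom sr = new SecureRandom();
        // On Linux the default algorithm is typically NativePRNG, whose seeding path
        // reads the blocking /dev/random device.
        System.out.println(sr.getAlgorithm() + " from provider " + sr.getProvider().getName());
        // generateSeed() is the call most likely to block once the entropy pool is drained;
        // starting the JVM with -Djava.security.egd=file:/dev/./urandom is the usual way
        // to point seeding at the non-blocking device instead.
        byte[] seed = sr.generateSeed(16);
        System.out.println("obtained " + seed.length + " seed bytes");
    }
}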
[jira] [Assigned] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-5776: - Assignee: Mark Miller Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5776: -- Fix Version/s: 5.0 4.9 Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5604) Should we switch BytesRefHash to MurmurHash3?
[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5604: --- Attachment: LUCENE-5604.patch New patch, folding in all feedback (thanks!). I think it's ready:
* I reverted the Solr changes
* I dup'd the murmurhash3_x86_32 taking byte[] into StringHelper, but changed to the intrinsics for Integer.rotateLeft
* I added a small test case, confirming our MurmurHash3 impl matches a separate Python/C impl I found
* I made the hashing private to BytesRefHash, and changed TermToBytesAtt.fillBytesRef to return void
* For the seed/salt, I now pull from the tests.seed property if it's non-null
Should we switch BytesRefHash to MurmurHash3? - Key: LUCENE-5604 URL: https://issues.apache.org/jira/browse/LUCENE-5604 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch, LUCENE-5604.patch MurmurHash3 has better hashing distribution than the current hash function we use for BytesRefHash, which is a simple multiplicative function with a 31 multiplier (same as Java's String.hashCode, but applied to bytes not chars). Maybe we should switch ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
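For context, a minimal sketch of the existing hash the description refers to: the 31-multiplier scheme of String.hashCode applied to bytes. This is not the MurmurHash3 code from the patch, and the starting value of 0 is an assumption for illustration:
public class SimpleBytesHash {
    // Simple multiplicative hash over bytes - the scheme the issue says BytesRefHash uses today.
    static int simpleHash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }

    public static void main(String[] args) throws Exception {
        byte[] term = "lucene".getBytes("UTF-8");
        System.out.println(simpleHash(term, 0, term.length));
    }
}
MurmurHash3 mixes bits far more aggressively (multiplications plus rotations, hence the Integer.rotateLeft intrinsic mentioned above), which is what gives it the better distribution.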
Re: [VOTE] Lucene / Solr 4.7.2 (take two)
The vote passes. Thanks everyone for voting. On Apr 10, 2014 10:51 AM, Robert Muir rcm...@gmail.com wrote: artifacts are here: http://people.apache.org/~rmuir/staging_area/lucene_solr_4_7_2_r1586229/ here is my +1 SUCCESS! [0:46:25.014499]
[jira] [Closed] (LUCENE-5598) About Scoring
[ https://issues.apache.org/jira/browse/LUCENE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Park JungHo closed LUCENE-5598. --- About Scoring - Key: LUCENE-5598 URL: https://issues.apache.org/jira/browse/LUCENE-5598 Project: Lucene - Core Issue Type: Wish Components: core/query/scoring Affects Versions: 4.7 Reporter: Park JungHo Labels: mentor, patch Fix For: 4.7 I had been indexing long values with LongField (the field name is 'boost' and the value comes from an AtomicLong) in order to use CustomScoreQuery. I then applied the following code:
//code start
FunctionQuery fquery = new FunctionQuery(new LongFieldSource("boost"));
CustomScoreQuery customQuery = new ScoreQuery(query, fquery);
//code end
If the indexed data count is 100, I expect 100, 99, 98, ... 91. But the result did not match my expectation once the number of indexed documents increased. (For instance 99985, 99986, 99987, 99988, ... 4 with an index count of one billion.) I thought that was caused by the scoring algorithm returning a float value (floating point limit). Is that correct? How can I get the result I expect? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
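A small demonstration of the floating-point limit suspected above: a float's 24-bit significand cannot distinguish nearby long values once they are around one billion, so distinct boost values collapse to the same score. The specific numbers below are chosen only to illustrate the effect:
public class FloatScorePrecision {
    public static void main(String[] args) {
        long a = 1_000_000_000L;
        long b = a - 3;
        // Near 1e9, adjacent representable floats are 64 apart, so these two longs
        // round to the same float and would yield identical scores.
        System.out.println((float) a == (float) b); // prints true
    }
}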
[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5973: - Fix Version/s: 4.9 Pluggable Ranking Collectors Key: SOLR-5973 URL: https://issues.apache.org/jira/browse/SOLR-5973 Project: Solr Issue Type: New Feature Components: search Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5973.patch, SOLR-5973.patch This ticket adds the ability to plug in a custom ranking collector to Solr. The proposed design is much simpler than SOLR-4465, which includes configuration support and support for pluggable analytics collectors. In this design, a CollectorFactory can be set onto the ResponseBuilder by a custom SearchComponent. The CollectorFactory is then used to inject a custom TopDocsCollector into the SolrIndexSearcher. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
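A hypothetical sketch of the factory half of that design; the CollectorFactory interface (and the ResponseBuilder hook that would carry it) are names assumed from the description rather than existing Solr 4.x API, while TopScoreDocCollector is stock Lucene:
import java.io.IOException;
import org.apache.lucene.search.TopDocsCollector;
import org.apache.lucene.search.TopScoreDocCollector;

// Assumed interface: SolrIndexSearcher would ask this factory for its collector
// instead of constructing the default one itself.
interface CollectorFactory {
    TopDocsCollector<?> newCollector(int numHits) throws IOException;
}

// A trivial implementation that just reproduces default relevance ranking;
// a real plugin would return its custom ranking collector here.
class DefaultRankingCollectorFactory implements CollectorFactory {
    @Override
    public TopDocsCollector<?> newCollector(int numHits) throws IOException {
        return TopScoreDocCollector.create(numHits, true);
    }
}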
[jira] [Updated] (SOLR-5831) Scale score PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5831: - Fix Version/s: 4.9 Scale score PostFilter -- Key: SOLR-5831 URL: https://issues.apache.org/jira/browse/SOLR-5831 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.7 Reporter: Peter Keegan Assignee: Joel Bernstein Priority: Minor Fix For: 4.9 Attachments: SOLR-5831.patch, SOLR-5831.patch, SOLR-5831.patch, TestScaleScoreQParserPlugin.patch The ScaleScoreQParserPlugin is a PostFilter that performs score scaling. This is an alternative to using a function query wrapping a scale() wrapping a query(). For example:
select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}
The problem with this query is that it has to scale every hit. Usually, only the returned hits need to be scaled, but there may be use cases where the number of hits to be scaled is greater than the returned hit count, but less than or equal to the total hit count. Sample syntax:
fq={!scalescore l=0.0 u=1.0 maxscalehits=1 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
l=0.0 u=1.0 //Scale scores to values between 0-1, inclusive
maxscalehits=1 //The maximum number of result scores to scale (-1 = all hits, 0 = results 'page' size)
func=... //Apply the composite function to each hit. The scaled score value is accessed by the 'score()' value source
All parameters are optional. The defaults are: l=0.0 u=1.0 maxscalehits=0 (result window size) func=(null)
Note: this patch is not complete, as it contains no test cases and may not conform to all the guidelines in http://wiki.apache.org/solr/HowToContribute. I would appreciate any feedback on the usability and implementation. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
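A quick arithmetic check of the composite function in the sample syntax, using hypothetical per-document values (a scaled score of 0.8 and myfield = 0.4 give 0.75 * 0.8 + 0.25 * 0.4 = 0.7):
public class ScaleScoreArithmetic {
    public static void main(String[] args) {
        double scaledScore = 0.8; // hypothetical score after scaling into [0, 1]
        double myfield = 0.4;     // hypothetical stored field value
        // sum(product(sscore(),0.75), product(field(myfield),0.25))
        double finalScore = 0.75 * scaledScore + 0.25 * myfield;
        System.out.println(finalScore); // approximately 0.7 for these inputs
    }
}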
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_60-ea-b13) - Build # 9964 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9964/ Java: 64bit/jdk1.7.0_60-ea-b13 -XX:+UseCompressedOops -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChains Error Message: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=10,endOffset=5 Stack Trace: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=10,endOffset=5 at __randomizedtesting.SeedInfo.seed([909448D307EA17A6:AD7561B240F80A66]:0) at org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) at org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.de.GermanLightStemFilter.incrementToken(GermanLightStemFilter.java:48) at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:701) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:612) at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:511) at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:922) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Updated] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-5776: -- Attachment: SOLR-5776.patch The attached patch appears to be a working workaround. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5980) AbstractFullDistribZkTestBase#compareResults always returns false for shouldFail.
[ https://issues.apache.org/jira/browse/SOLR-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968074#comment-13968074 ] ASF subversion and git services commented on SOLR-5980: --- Commit 1587149 from markrmil...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1587149 ] SOLR-5980: Add a test. AbstractFullDistribZkTestBase#compareResults always returns false for shouldFail. - Key: SOLR-5980 URL: https://issues.apache.org/jira/browse/SOLR-5980 Project: Solr Issue Type: Test Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Critical Fix For: 4.9, 5.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968077#comment-13968077 ] Mark Miller commented on SOLR-5776: --- Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is Oracle and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5776) Look at speeding up using SSL with tests.
[ https://issues.apache.org/jira/browse/SOLR-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968077#comment-13968077 ] Mark Miller edited comment on SOLR-5776 at 4/14/14 4:59 AM: Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is -Oracle- {color:red}GPL{color} and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. was (Author: markrmil...@gmail.com): Bah, it seems to be much less frequent, but it can still happen. I think the issue is that if you don't specify the seed, it will still read from /dev/random for that. I had looked into a custom SecureRandom via SPI, but it's my first foray into SPI, and while it seems relatively straightforward, I have not yet figured out how to plug in a custom SecureRandomSPI class in tests. Even when that's done, the impl is not so straightforward - from what I can tell, you cannot extend the standard SecureRandom to fix this and the open jdk code is Oracle and the Harmony code is fairly different and would require some hacking to get in. Otherwise we would need to come up with a clean room impl that was still about as decent as Random. Look at speeding up using SSL with tests. - Key: SOLR-5776 URL: https://issues.apache.org/jira/browse/SOLR-5776 Project: Solr Issue Type: Test Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-5776.patch We have to disable SSL on a bunch of tests now because it appears to sometimes be ridiculously slow - especially in slow envs (I never see timeouts on my machine). I was talking to Robert about this, and he mentioned that there might be some settings we could change to speed it up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
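Since the comment mentions plugging in a custom SecureRandomSpi for tests, here is a minimal, hypothetical sketch of how such a provider could be registered. The provider, class, and algorithm names are assumptions for illustration; this is not the attached SOLR-5776 patch, and a java.util.Random-backed implementation is only acceptable for tests, never for production:
import java.security.Provider;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;
import java.security.Security;
import java.util.Random;

public class TestSecureRandomProvider extends Provider {
    public TestSecureRandomProvider() {
        super("TestSecureRandom", 1.0, "Non-blocking SecureRandom for tests only");
        // Registers the SPI under the algorithm name "NonBlocking".
        put("SecureRandom.NonBlocking", NonBlockingSpi.class.getName());
    }

    // Backed by plain java.util.Random: never blocks, but is NOT cryptographically secure.
    public static class NonBlockingSpi extends SecureRandomSpi {
        private final Random random = new Random();
        @Override protected void engineSetSeed(byte[] seed) { /* ignored: tests don't need real entropy */ }
        @Override protected void engineNextBytes(byte[] bytes) { random.nextBytes(bytes); }
        @Override protected byte[] engineGenerateSeed(int numBytes) {
            byte[] seed = new byte[numBytes];
            random.nextBytes(seed);
            return seed;
        }
    }

    public static void main(String[] args) throws Exception {
        Security.insertProviderAt(new TestSecureRandomProvider(), 1);
        SecureRandom sr = SecureRandom.getInstance("NonBlocking");
        byte[] buf = new byte[8];
        sr.nextBytes(buf); // never touches /dev/random, so it cannot block on low entropy
        System.out.println("got " + buf.length + " bytes without blocking");
    }
}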
[jira] [Updated] (LUCENE-5596) Support for index/search large numeric field
[ https://issues.apache.org/jira/browse/LUCENE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Wang updated LUCENE-5596: --- Attachment: LUCENE-5596.patch Initial patch to support BigInteger. I've copied and modified some tests for the long field (e.g. TestNumericUtils, TestNumericTokenStream, TestNumericRangeQuery, TestSortDocValues) to support BigInteger, and all of them pass. Support for index/search large numeric field Key: LUCENE-5596 URL: https://issues.apache.org/jira/browse/LUCENE-5596 Project: Lucene - Core Issue Type: New Feature Reporter: Kevin Wang Attachments: LUCENE-5596.patch Currently, if a number is larger than Long.MAX_VALUE, we can't index/search it in Lucene as a number. For example, an IPv6 address is a 128-bit number, so we can't index it as a numeric field and do numeric range queries, etc. It would be good to support BigInteger / BigDecimal. I've tried using BigInteger for IPv6 in Elasticsearch and that works fine, but there are still lots of things to do: https://github.com/elasticsearch/elasticsearch/pull/5758 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
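As a small illustration of the IPv6 use case in the description (plain JDK classes only; nothing here is taken from the patch), an IPv6 address can be viewed as a 128-bit BigInteger that does not fit in a long:
import java.math.BigInteger;
import java.net.InetAddress;

public class Ipv6AsBigInteger {
    public static void main(String[] args) throws Exception {
        // getAddress() returns 16 raw bytes for an IPv6 literal.
        byte[] raw = InetAddress.getByName("2001:db8::1").getAddress();
        // Interpret the bytes as a positive 128-bit integer.
        BigInteger value = new BigInteger(1, raw);
        // This value exceeds Long.MAX_VALUE, which is why a long-based numeric field can't hold it.
        System.out.println(value);
    }
}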