[JENKINS] Lucene-Solr-Tests-trunk-Java8 - Build # 226 - Still Failing

2015-07-20 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java8/226/

No tests ran.

Build Log:
[...truncated 10457 lines...]
   [junit4] Suite: org.apache.solr.core.SolrCoreCheckLockOnStartupTest
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel

hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy59.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
at hudson.Launcher$ProcStarter.join(Launcher.java:367)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.build(Build.java:198)
at hudson.model.Build$BuildExecution.doRun(Build.java:159)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1706)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:232)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: 
Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:805)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.&lt;init&gt;(ObjectInputStream.java:299)
at 
hudson.remoting.ObjectInputStreamEx.&lt;init&gt;(ObjectInputStreamEx.java:40)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents

2015-07-20 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633103#comment-14633103
 ] 

Anshum Gupta commented on SOLR-445:
---

I'm seeing a few errors with the current patch and I think I know what's going 
on. I'll take a look at it and update the patch tomorrow.

 Update Handlers abort with bad documents
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Anshum Gupta
 Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, 
 SOLR-445-alternative.patch, SOLR-445-alternative.patch, 
 SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures mid 
 batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now Solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch, or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory, while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this, but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.
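 In the meantime, a client can approximate Option 2 on its own by sending the 
 documents one at a time and skipping failures. A minimal SolrJ sketch of that 
 workaround (the core URL, ids, and helper below are illustrative, not from 
 this issue):
 {code:lang=java}
 import java.util.Arrays;
 import java.util.List;
 import org.apache.solr.client.solrj.SolrClient;
 import org.apache.solr.client.solrj.impl.HttpSolrClient;
 import org.apache.solr.common.SolrInputDocument;

 // Sends one add request per document, so a bad document fails alone instead
 // of aborting the batch. This trades throughput for "continue on to add doc 3".
 public class TolerantAdd {
   public static void main(String[] args) throws Exception {
     try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1")) {
       for (SolrInputDocument doc : buildDocs()) {
         try {
           client.add(doc);
         } catch (Exception e) {
           System.err.println("Skipping doc " + doc.getFieldValue("id") + ": " + e.getMessage());
         }
       }
       client.commit();
     }
   }

   static List<SolrInputDocument> buildDocs() {
     SolrInputDocument good1 = new SolrInputDocument();
     good1.addField("id", "1");
     SolrInputDocument bad = new SolrInputDocument();
     bad.addField("id", "2");
     bad.addField("myDateField", "I_AM_A_BAD_DATE"); // will be rejected by Solr
     SolrInputDocument good2 = new SolrInputDocument();
     good2.addField("id", "3");
     return Arrays.asList(good1, bad, good2);
   }
 }
 {code}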



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 745 - Still Failing

2015-07-20 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/745/

No tests ran.

Build Log:
[...truncated 1019 lines...]
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected 
termination of the channel
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy59.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
at hudson.Launcher$ProcStarter.join(Launcher.java:367)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.build(Build.java:198)
at hudson.model.Build$BuildExecution.doRun(Build.java:159)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1706)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:232)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: 
Unexpected termination of the channel
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:805)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at 
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
at 
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
at 
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.&lt;init&gt;(ObjectInputStream.java:299)
at 
hudson.remoting.ObjectInputStreamEx.&lt;init&gt;(ObjectInputStreamEx.java:40)
at 
hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6686) Improve InfoStream API

2015-07-20 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633099#comment-14633099
 ] 

Dawid Weiss commented on LUCENE-6686:
-

This has really been reinvented over and over in logging APIs. The 
{{isEnabled(level)}} idiom is necessary when argument construction is complex 
and costly (so that you want to avoid it before the method call).

 Improve InfoStream API
 ---

 Key: LUCENE-6686
 URL: https://issues.apache.org/jira/browse/LUCENE-6686
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Cao Manh Dat

 Currently, we use InfoStream in a duplicated way. For example:
 {code}
 if (infoStream.isEnabled(IW)) {
   infoStream.message(IW, "init: loaded commit \"" + commit.getSegmentsFileName() + "\"");
 }
 {code}
 Can we change the API of InfoStream to 
 {code}
 infoStream.messageIfEnabled(component, message);
 {code}
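 One way to get a single-call API without paying for message construction when 
 the component is disabled could be a Supplier-based variant. A minimal sketch, 
 using a hypothetical stand-in class rather than Lucene's actual InfoStream:
 {code:lang=java}
 import java.util.function.Supplier;

 // Hypothetical stand-in mirroring InfoStream's isEnabled/message signatures.
 public abstract class SketchInfoStream {
   public abstract boolean isEnabled(String component);

   public abstract void message(String component, String message);

   // The Supplier defers building the message string until the component is
   // known to be enabled, addressing the costly-argument concern noted above.
   public final void messageIfEnabled(String component, Supplier<String> message) {
     if (isEnabled(component)) {
       message(component, message.get()); // message.get() only runs when enabled
     }
   }
 }

 // A call site then shrinks to one line, e.g.:
 // infoStream.messageIfEnabled("IW",
 //     () -> "init: loaded commit \"" + commit.getSegmentsFileName() + "\"");
 {code}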



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7691) SolrEntityProcessor as SubEntity doesn't work with delta-import

2015-07-20 Thread Sebastian Krebs (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Krebs updated SOLR-7691:
--
Flags: Important

 SolrEntityProcessor as SubEntity doesn't work with delta-import
 ---

 Key: SOLR-7691
 URL: https://issues.apache.org/jira/browse/SOLR-7691
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 5.0, 5.1, 5.2, 5.2.1
Reporter: Sebastian Krebs

 I've used the {{SolrEntityProcessor}} as sub-entity in the dataimporter like 
 this
 {code:lang=xml}
 <dataConfig>
   <document name="products">
     <entity
         name="outer"
         dataSource="my_datasource"
         pk="id"
         query="..."
         deltaQuery="..."
         deltaImportQuery="...">
       <entity
           name="solr"
           processor="SolrEntityProcessor"
           url="http://127.0.0.1:8983/solr/${solr.core.name}"
           query="Xid:${outer.Xid}"
           rows="1"
           fl="Id,FieldA,FieldB"
           wt="javabin"
       />
     </entity>
   </document>
 </dataConfig>
 {code}
 Recently I decided to upgrade to 5.x, but the delta-import stopped working. 
 It looks like the HTTP connection used by the {{SolrEntityProcessor}} is 
 closed right _after_ the first request/response: the first document is 
 indexed properly, and for the second document the dataimport fetches the 
 record from the database, but then fails with the exception below. 
 This is the stacktrace taken from the log:
 {code:lang=none}
 java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.IllegalStateException: Connection pool shut down
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:444)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:482)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
 Caused by: java.lang.RuntimeException: 
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.IllegalStateException: Connection pool shut down
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:363)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:224)
 ... 3 more
 Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.IllegalStateException: Connection pool shut down
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
 ... 5 more
 Caused by: java.lang.IllegalStateException: Connection pool shut down
 at org.apache.http.util.Asserts.check(Asserts.java:34)
 at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184)
 at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:217)
 at 
 org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
 at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:466)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
 at 
 org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
 at 
 org.apache.solr.handler.dataimport.SolrEntityProcessor.doQuery(SolrEntityProcessor.java:198)
 at 
 

[jira] [Updated] (SOLR-7803) Classloading deadlock in TrieField

2015-07-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-7803:

Attachment: SOLR-7803.patch

I did a bit more refactoring:
- Renamed DateUtils -> DateFormatUtil (the old name was a confusing 
duplicate, so Eclipse autocomplete showed too much unrelated stuff).
- Moved more formatting methods out of TrieDateField. TrieDateField is now 
like any other Trie(Long|Int|Double|Float)Field: short and compact.

I will commit this later and add a backwards-compatibility layer in 5.x. All tests pass.

 Classloading deadlock in TrieField
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump and all threads are trying to load classes via Class.forName() and are 
 stuck in that method. One of these threads got one step further, into the 
 <clinit> of TrieField, where it creates an internal static instance of 
 TrieDateField (circular dependency). I don't know exactly why this locks up, 
 but this code smells anyway. So I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in <clinit>. In the future, someone may extract a util class to 
 remove the circular dependency.
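 A stripped-down illustration of the circular <clinit> pattern being described 
 (hypothetical classes, not the actual TrieField/TrieDateField code):
 {code:lang=java}
 // If one thread triggers A's initialization while another triggers B's, each
 // thread holds one class's initialization lock and blocks waiting for the
 // other's: class loading deadlocks.
 class A {
   static final B B_INSTANCE = new B(); // A's <clinit> forces B to initialize

   static int helper() {
     return 42;
   }
 }

 class B {
   static final int FROM_A = A.helper(); // B's <clinit> refers back into A
 }
 {code}
 Removing the static instance, as the patch does, takes the cross-class call 
 out of <clinit>, so class initialization no longer has to wait on another 
 class's lock.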



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6225) Clarify documentation of clone() in IndexInput

2015-07-20 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-6225:

Fix Version/s: 5.3

 Clarify documentation of clone() in IndexInput
 --

 Key: LUCENE-6225
 URL: https://issues.apache.org/jira/browse/LUCENE-6225
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6225.patch


 Here is a snippet from IndexInput's documentation:
 {code}
 The original instance must take care that cloned instances throw 
 AlreadyClosedException when the original one is closed.
 {code}
 But concrete implementations don't throw this AlreadyClosedException (this 
 would break the contract on Closeable). For example, see NIOFSDirectory:
 {code}
 public void close() throws IOException {
   if (!isClone) {
     channel.close();
   }
 }
 {code}
 What trapped me was that the abstract class IndexInput overrides the default 
 implementation of clone(), but doesn't do anything useful... I guess you 
 could make it final and provide the tracking for cloned instances in this 
 class rather than reimplementing it everywhere else (isCloned() would be a 
 superclass method then too). Thoughts?
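 A rough sketch of that suggestion (a hypothetical base class, not Lucene's 
 IndexInput): clone() becomes final and the superclass tracks the clone flag, 
 so subclasses only implement the actual resource release:
 {code:lang=java}
 import java.io.Closeable;
 import java.io.IOException;

 public abstract class SketchInput implements Cloneable, Closeable {
   private boolean isClone = false;

   protected final boolean isClone() {
     return isClone;
   }

   @Override
   public final SketchInput clone() {
     try {
       SketchInput copy = (SketchInput) super.clone();
       copy.isClone = true; // only the original owns the underlying resource
       return copy;
     } catch (CloneNotSupportedException e) {
       throw new AssertionError(e); // cannot happen: we implement Cloneable
     }
   }

   @Override
   public void close() throws IOException {
     if (!isClone()) {
       doClose(); // clones skip releasing the shared resource
     }
   }

   protected abstract void doClose() throws IOException;
 }
 {code}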



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)
Marko Bonaci created LUCENE-6687:


 Summary: MLT term frequency calculation bug
 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci


In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{{q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009}}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?
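
To make the doubling concrete, here is a stripped-down model of the pattern 
described above (hypothetical code, not the actual MoreLikeThis source):
{code:lang=java}
import java.util.HashMap;
import java.util.Map;

// The outer loop walks the fields, and the per-field step accumulates
// frequencies for *all* fields again, so every count lands twice.
public class DoubleCountDemo {
  static void accumulate(Map<String, Integer> termFreqMap, String[] fields) {
    for (String ignored : fields) {   // outer loop over the fields
      for (String field : fields) {   // inner pass re-visits every field
        termFreqMap.merge(field + ":term", 1, Integer::sum);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> tf = new HashMap<>();
    accumulate(tf, new String[] {"title_mlt", "pagetext_mlt"});
    System.out.println(tf); // each entry is 2 instead of 1: frequencies doubled
  }
}
{code}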




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7803) Classloading deadlock in TrieField => refactor date formatting/parsing to static utility class

2015-07-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-7803:

Summary: Classloading deadlock in TrieField => refactor date 
formatting/parsing to static utility class  (was: Classloading deadlock in 
TrieField)

 Classloading deadlock in TrieField => refactor date formatting/parsing to 
 static utility class
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump and all threads are trying to load classes via Class.forName() and are 
 stuck in that method. One of these threads got one step further, into the 
 <clinit> of TrieField, where it creates an internal static instance of 
 TrieDateField (circular dependency). I don't know exactly why this locks up, 
 but this code smells anyway. So I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in <clinit>. In the future, someone may extract a util class to 
 remove the circular dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_45) - Build # 13531 - Still Failing!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13531/
Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests

Error Message:
Timeout waiting for CDCR replication to complete @source_collection:shard2

Stack Trace:
java.lang.RuntimeException: Timeout waiting for CDCR replication to complete 
@source_collection:shard2
at 
__randomizedtesting.SeedInfo.seed([1EA7E284A8B756C3:16C797A8A7B97EC8]:0)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForReplicationToComplete(BaseCdcrDistributedZkTest.java:732)
at 
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTestUpdateLogSynchronisation(CdcrReplicationDistributedZkTest.java:362)
at 
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests(CdcrReplicationDistributedZkTest.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
 

[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_45) - Build # 13530 - Failure!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13530/
Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest

Error Message:
Captured an uncaught exception in thread: Thread[id=7905, 
name=RecoveryThread-source_collection_shard1_replica1, state=RUNNABLE, 
group=TGRP-CdcrReplicationHandlerTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=7905, 
name=RecoveryThread-source_collection_shard1_replica1, state=RUNNABLE, 
group=TGRP-CdcrReplicationHandlerTest]
Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
at __randomizedtesting.SeedInfo.seed([528FCA919F1E8CAB]:0)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:234)
Caused by: org.apache.solr.common.SolrException: java.io.FileNotFoundException: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/jetty-001/cores/source_collection_shard1_replica1/data/tlog/tlog.007.1507203359819431936
 (No such file or directory)
at 
org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:244)
at 
org.apache.solr.update.CdcrTransactionLog.incref(CdcrTransactionLog.java:173)
at 
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1078)
at 
org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1578)
at 
org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:866)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:526)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
Caused by: java.io.FileNotFoundException: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/jetty-001/cores/source_collection_shard1_replica1/data/tlog/tlog.007.1507203359819431936
 (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:243)
at 
org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:236)
... 7 more




Build Log:
[...truncated 10973 lines...]
   [junit4] Suite: org.apache.solr.cloud.CdcrReplicationHandlerTest
   [junit4]   2 Creating dataDir: 
/home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/init-core-data-001
   [junit4]   2 1046749 INFO  
(SUITE-CdcrReplicationHandlerTest-seed#[528FCA919F1E8CAB]-worker) [] 
o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (true)
   [junit4]   2 1046749 INFO  
(SUITE-CdcrReplicationHandlerTest-seed#[528FCA919F1E8CAB]-worker) [] 
o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: /
   [junit4]   2 1046751 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.ZkTestServer STARTING ZK TEST SERVER
   [junit4]   2 1046751 INFO  (Thread-3004) [] o.a.s.c.ZkTestServer client 
port:0.0.0.0/0.0.0.0:0
   [junit4]   2 1046751 INFO  (Thread-3004) [] o.a.s.c.ZkTestServer 
Starting server
   [junit4]   2 1046851 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.ZkTestServer start zk server on port:45204
   [junit4]   2 1046851 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2 1046852 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
   [junit4]   2 1046854 INFO  (zkCallback-796-thread-1) [] 
o.a.s.c.c.ConnectionManager Watcher 
org.apache.solr.common.cloud.ConnectionManager@4f0677f3 
name:ZooKeeperConnection Watcher:127.0.0.1:45204 got event WatchedEvent 
state:SyncConnected type:None path:null path:null type:None
   [junit4]   2 1046854 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
   [junit4]   2 1046854 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.SolrZkClient Using default ZkACLProvider
   [junit4]   2 1046854 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.SolrZkClient makePath: /solr
   [junit4]   2 1046856 INFO  
(TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] 
o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider
   [junit4]   2 1046856 INFO  

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?



 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 

[jira] [Commented] (SOLR-7715) Remove IgnoreAcceptDocsQuery

2015-07-20 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633402#comment-14633402
 ] 

Adrien Grand commented on SOLR-7715:


I'll remove it shortly if there are no objections.

 Remove IgnoreAcceptDocsQuery
 

 Key: SOLR-7715
 URL: https://issues.apache.org/jira/browse/SOLR-7715
 Project: Solr
  Issue Type: Task
Reporter: Adrien Grand
Priority: Minor

 While reviewing how queries apply acceptDocs, I noticed that Solr has 
 org.apache.solr.search.join.IgnoreAcceptDocsQuery, but it looks unused. 
 Should we remove it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Extracting article keywords using tf-idf algorithm

2015-07-20 Thread Ali Nazemian
Hi again,
It seems that the strange Solr behavior I saw was caused by the fact that I
tried to update documents and add the keyword field directly inside the
Lucene index (not through the SolrJ API), for the sake of better performance.
It seems some processing steps are skipped when the index is modified this
way (which is obvious in hindsight), and those steps, which I am not aware
of, caused the inconsistency.
One solution would be to update the index by re-adding each document through
SolrJ. As I mentioned, this solution is not the best one performance-wise
(the indexing time would roughly double). Therefore it would be nice if there
is a reliable solution to my problem that also addresses the performance
concerns.
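
A minimal sketch of the SolrJ route under discussion (the core name, document
id, and field names are placeholders): an atomic update sets only the keyword
field on an existing document. Note that Solr still re-indexes the whole
document internally, so this mainly guarantees consistency with the update
chain rather than removing the indexing cost; it requires the updateLog to be
enabled and the other fields to be stored.
{code:lang=java}
import java.util.Arrays;
import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddKeywordsField {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/articles")) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "article-1");
      // "set" atomically replaces the multi-valued keywords field on the doc
      doc.addField("keywords", Collections.singletonMap("set", Arrays.asList("solr", "tf-idf")));
      client.add(doc);
      client.commit();
    }
  }
}
{code}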

Best regards.


On Sat, Jul 18, 2015 at 9:40 PM, Ali Nazemian alinazem...@gmail.com wrote:

 Dear Diego,
 Hi,
 Yeah, exactly what I want.
 As Shawn said, it is an acronym for More Like This. Actually, since Lucene
 already does the hard work of calculating the interesting terms, I just want
 to use that for adding a multi-valued field to all indexed documents.

 Best regards.

 On Sat, Jul 18, 2015 at 8:08 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 7/18/2015 9:16 AM, Diego Ceccarelli wrote:
  Could you please post your code somewhere? I don't understand what
  mlt is :)

 This is an acronym that means More Like This.

 https://wiki.apache.org/solr/MoreLikeThis

 Thanks,
 Shawn


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 --
 A.Nazemian




-- 
A.Nazemian


[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!

I should probably mention that multiple fields work because I applied the 
patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].

Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
Uh, I don't understand what I wrote myself, but that basically means that, by 
the time {{retrieveTerms}} method gets called, its parameter fields and private 
member {{fieldNames}} always contain the same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears 

[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633229#comment-14633229
 ] 

ASF subversion and git services commented on SOLR-7803:
---

Commit 1691900 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1691900 ]

SOLR-7803: Use Java 8 ThreadLocal

 Classloading deadlock in TrieField
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump and all threads are trying to load classes via Class.forName() and are 
 stuck in that method. One of these threads got one step further, into the 
 <clinit> of TrieField, where it creates an internal static instance of 
 TrieDateField (circular dependency). I don't know exactly why this locks up, 
 but this code smells anyway. So I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in <clinit>. In the future, someone may extract a util class to 
 remove the circular dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7719) Suggester Component results parsing

2015-07-20 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633241#comment-14633241
 ] 

Alessandro Benedetti commented on SOLR-7719:


Perfect, Tommaso, thanks for the corrections!
Since I provided another similar patch, I missed applying the correction
myself on this one.
Can we close the issue?

Cheers

 Suggester Component results parsing
 ---

 Key: SOLR-7719
 URL: https://issues.apache.org/jira/browse/SOLR-7719
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: 5.2.1
Reporter: Alessandro Benedetti
Assignee: Tommaso Teofili
Priority: Minor
  Labels: queryResponse, suggester, suggestions
 Fix For: Trunk

 Attachments: SOLR-7719.patch, SOLR-7719.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Currently SolrJ's org.apache.solr.client.solrj.response.QueryResponse is not 
 managing suggestions coming from the Suggest Component.
 It would be nice to have the suggestions properly managed and returned with 
 simple getter methods.
 Current response:
 <lst name="suggest">
   <lst name="dictionary1">
     <lst name="queryTerm">
       <int name="numFound">2</int>
       <arr name="suggestions">
         <lst>
           <str name="term">suggestion1</str>
           <str name="term">suggestion2</str>
         </lst>
       </arr>
     </lst>
   </lst>
 </lst>
 This will be parsed accordingly, producing an easy-to-use Java Map of 
 dictionary -> suggestions.
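 A rough sketch of the kind of getter being requested (a hypothetical helper, 
 not the final SolrJ API), flattening the response section above into a 
 dictionary -> terms map:
 {code:lang=java}
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import org.apache.solr.common.util.NamedList;

 public class SuggesterResponseSketch {
   @SuppressWarnings("unchecked")
   static Map<String, List<String>> parseSuggestions(NamedList<Object> suggestSection) {
     Map<String, List<String>> result = new HashMap<>();
     for (Map.Entry<String, Object> dictionary : suggestSection) {
       NamedList<Object> byQueryTerm = (NamedList<Object>) dictionary.getValue();
       List<String> terms = new ArrayList<>();
       for (Map.Entry<String, Object> queryTerm : byQueryTerm) {
         NamedList<Object> payload = (NamedList<Object>) queryTerm.getValue();
         List<NamedList<Object>> suggestions =
             (List<NamedList<Object>>) payload.get("suggestions");
         for (NamedList<Object> suggestion : suggestions) {
           for (Map.Entry<String, Object> term : suggestion) {
             if ("term".equals(term.getKey())) {
               terms.add((String) term.getValue()); // collect each suggested term
             }
           }
         }
       }
       result.put(dictionary.getKey(), terms);
     }
     return result;
   }
 }
 {code}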



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: (was: 
solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png)

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Fix For: 5.2.2

 Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 Uh, I don't understand what I wrote myself, but that basically means that, by 
 the time {{retrieveTerms}} method gets called, its parameter fields and 
 private member {{fieldNames}} always contain the same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Fix For: 5.2.2

 Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 Uh, I don't understand what I wrote myself, but that basically means that, by 
 the time {{retrieveTerms}} method gets called, its parameter fields and 
 private member {{fieldNames}} always contain the same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6225) Clarify documentation of clone() in IndexInput

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633181#comment-14633181
 ] 

ASF subversion and git services commented on LUCENE-6225:
-

Commit 1691888 from [~dawidweiss] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1691888 ]

LUCENE-6225: Clarify documentation of clone/close in IndexInput.

 Clarify documentation of clone() in IndexInput
 --

 Key: LUCENE-6225
 URL: https://issues.apache.org/jira/browse/LUCENE-6225
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: Trunk

 Attachments: LUCENE-6225.patch


 Here is a snippet from IndexInput's documentation:
 {code}
 The original instance must take care that cloned instances throw 
 AlreadyClosedException when the original one is closed.
 {code}
 But concrete implementations don't throw this AlreadyClosedException (this 
 would break the contract on Closeable). For example, see NIOFSDirectory:
 {code}
 public void close() throws IOException {
   if (!isClone) {
     channel.close();
   }
 }
 {code}
 What trapped me was that the abstract class IndexInput overrides the default 
 implementation of clone(), but doesn't do anything useful... I guess you 
 could make it final and provide the tracking for cloned instances in this 
 class rather than reimplementing it everywhere else (isCloned() would be a 
 superclass method then too). Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[CI] Lucene 5x Linux 64 Test Only - Build # 56636 - Failure!

2015-07-20 Thread build



  
BUILD FAILURE

Build URL: http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56636/
Project: lucene_linux_java8_64_test_only
Date of build: Mon, 20 Jul 2015 07:16:22 +0200
Build duration: 1 hr 0 min

CHANGES
No Changes

CONSOLE OUTPUT
	[...truncated 204 lines...]
		at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
		at java.util.concurrent.FutureTask.run(FutureTask.java:262)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
		at java.lang.Thread.run(Thread.java:745)
		at ..remote call to ubuntu-14-64-8-metal(Native Method)
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356)
		at hudson.remoting.UserResponse.retrieve(UserRequest.java:221)
		at hudson.remoting.Channel.call(Channel.java:752)
		at hudson.FilePath.act(FilePath.java:978)
		at hudson.FilePath.act(FilePath.java:967)
		at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:89)
		at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:121)
		at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:138)
		at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:74)
		at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
		at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
		at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721)
		at hudson.model.Build$BuildExecution.post2(Build.java:183)
		at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:670)
		at hudson.model.Run.execute(Run.java:1776)
		at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
		at hudson.model.ResourceController.execute(ResourceController.java:89)
		at hudson.model.Executor.run(Executor.java:240)
	[description-setter] Description set: $BUILD_DESC
	Email was triggered for: Failure - 1st
	Trigger Failure - Any was overridden by another trigger and will not send an email.
	Trigger Failure - Still was overridden by another trigger and will not send an email.
	Sending email for trigger: Failure - 1st
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.
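
To make the double counting concrete, here is a minimal, self-contained sketch 
of the pattern (the class and variable names are made up for the example; this 
is not the actual {{MoreLikeThis}} code):

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// fieldNames and fields describe the same field list, so every term
// occurrence is accumulated once per entry in fieldNames, i.e. twice here.
public class DoubleCountSketch {
  public static void main(String[] args) {
    List<String> fieldNames = Arrays.asList("title_mlt", "pagetext_mlt");
    Map<String, String> fields = new HashMap<>();
    fields.put("title_mlt", "accumulator");
    fields.put("pagetext_mlt", "accumulator accumulator");

    Map<String, Integer> termFreqMap = new HashMap<>();
    for (String ignored : fieldNames) {        // outer loop over the fields
      for (String value : fields.values()) {   // inner loop over the SAME fields
        for (String term : value.split("\\s+")) {
          termFreqMap.merge(term, 1, Integer::sum);
        }
      }
    }
    // True TF of "accumulator" is 3; this prints 6. The fix is to accumulate
    // each field's terms exactly once instead of once per field name.
    System.out.println(termFreqMap.get("accumulator"));
  }
}
{code}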

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{{q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009}}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?



 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci

 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 There 

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: terms-how.png
terms-glass.png
terms-angry.png
terms-accumulator.png
solr-mlt-tf-doubling-bug.png
solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png
solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png
solr-mlt-tf-doubling-bug-results.png
buggy-method-usage.png

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 In other words: by the time the {{retrieveTerms}} method gets called, its 
 parameter {{fields}} and the private member {{fieldNames}} always contain the 
 same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 Let's see what happens when we use {{mintf=15}}:
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!

Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):



Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?



 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: 

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!

Let's see what happens when we use {{mintf=15}}:

!solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!

I should probably mention that multiple fields ({{qf}}) work because I applied 
the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].

Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

!terms-glass.png|height=100!

!terms-angry.png|height=100!

!terms-how.png|height=100!

!terms-accumulator.png|height=100!

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 

[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
External issue URL:   (was: 
https://docs.google.com/a/sematext.com/document/d/1oPjxj9dpw-sT2NhVN-HuFmCE_ouyrPNdDQLnCgfiyq8/edit?usp=sharing)

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 In other words: by the time the {{retrieveTerms}} method gets called, its 
 parameter {{fields}} and the private member {{fieldNames}} always contain the 
 same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Fix Version/s: 5.2.2

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Fix For: 5.2.2

 Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 In other words: by the time the {{retrieveTerms}} method gets called, its 
 parameter {{fields}} and the private member {{fieldNames}} always contain the 
 same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-7803) Classloading deadlock in TrieField

2015-07-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved SOLR-7803.
-
Resolution: Fixed

I committed and backported this, and added a backwards-compatibility layer. In 
trunk I also removed the custom ThreadLocal; 
{{ThreadLocal#withInitial(FORMAT_PROTOTYPE::clone)}} is much more elegant.

If you see other class-loading deadlocks during Solr startup, they can be 
caused by concurrent core initialization, which may be broken in certain 
cases. Please open separate issues for those.
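
For readers who haven't seen the Java 8 idiom: a minimal sketch comparing the 
two styles ({{FORMAT_PROTOTYPE}} stands in for a pre-configured, cloneable 
formatter prototype, and the surrounding class is made up for the example; a 
lambda with a cast is used below because {{clone()}} returns {{Object}}):

{code}
import java.text.SimpleDateFormat;

class DateFormats {
  // SimpleDateFormat is not thread-safe, hence one clone per thread.
  static final SimpleDateFormat FORMAT_PROTOTYPE =
      new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");

  // Pre-Java-8 style: a custom ThreadLocal subclass.
  static final ThreadLocal<SimpleDateFormat> OLD_STYLE =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          return (SimpleDateFormat) FORMAT_PROTOTYPE.clone();
        }
      };

  // Java 8 style: a one-line supplier.
  static final ThreadLocal<SimpleDateFormat> NEW_STYLE =
      ThreadLocal.withInitial(() -> (SimpleDateFormat) FORMAT_PROTOTYPE.clone());
}
{code}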

 Classloading deadlock in TrieField
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump: all threads were trying to load classes via Class.forName() and were 
 stuck in that method. One of these threads got one step further, into the 
 clinit of TrieField, where it creates an internal static instance of 
 TrieDateField (a circular dependency). I don't know exactly why this locks 
 up, but the code smells anyway, so I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in clinit. In the future, someone could extract a util class to 
 remove the circular dependency.
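 To see why a static-initializer cycle can deadlock, here is a minimal sketch 
 of the shape of the problem (classes {{A}} and {{B}} are stand-ins for 
 TrieField and TrieDateField):
 {code}
// Initializing B requires its superclass A, and initializing A constructs a
// B. If thread T1 triggers A first and thread T2 triggers B first, each ends
// up waiting on the class-initialization lock the other thread holds.
class A {                          // stand-in for TrieField
  static final B INSTANCE = new B();
}

class B extends A {                // stand-in for TrieDateField
}
 {code}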



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6225) Clarify documentation of clone() in IndexInput

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633194#comment-14633194
 ] 

ASF subversion and git services commented on LUCENE-6225:
-

Commit 1691892 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1691892 ]

LUCENE-6225: Clarify documentation of clone/close in IndexInput.

 Clarify documentation of clone() in IndexInput
 --

 Key: LUCENE-6225
 URL: https://issues.apache.org/jira/browse/LUCENE-6225
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6225.patch


 Here is a snippet from IndexInput's documentation:
 {code}
 The original instance must take care that cloned instances throw 
 AlreadyClosedException when the original one is closed.
 {code}
 But concrete implementations don't throw this AlreadyClosedException (this 
 would break the contract on Closeable). For example, see NIOFSDirectory:
 {code}
 public void close() throws IOException {
   if (!isClone) {
 channel.close();
   }
 }
 {code}
 What trapped me was that the abstract class IndexInput overrides the default 
 implementation of clone(), but doesn't do anything useful... I guess you 
 could make it final and provide the tracking for cloned instances in this 
 class rather than reimplementing it everywhere else (isCloned() would be a 
 superclass method then too). Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-6225) Clarify documentation of clone() in IndexInput

2015-07-20 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-6225.
-
Resolution: Fixed

 Clarify documentation of clone() in IndexInput
 --

 Key: LUCENE-6225
 URL: https://issues.apache.org/jira/browse/LUCENE-6225
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6225.patch


 Here is a snippet from IndexInput's documentation:
 {code}
 The original instance must take care that cloned instances throw 
 AlreadyClosedException when the original one is closed.
 {code}
 But concrete implementations don't throw this AlreadyClosedException (this 
 would break the contract on Closeable). For example, see NIOFSDirectory:
 {code}
 public void close() throws IOException {
   if (!isClone) {
 channel.close();
   }
 }
 {code}
 What trapped me was that the abstract class IndexInput overrides the default 
 implementation of clone(), but doesn't do anything useful... I guess you 
 could make it final and provide the tracking for cloned instances in this 
 class rather than reimplementing it everywhere else (isCloned() would be a 
 superclass method then too). Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633205#comment-14633205
 ] 

ASF subversion and git services commented on SOLR-7803:
---

Commit 1691893 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1691893 ]

SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date 
formatting and parsing out of TrieDateField and move to static utility class 
DateFormatUtil

 Classloading deadlock in TrieField
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump: all threads were trying to load classes via Class.forName() and were 
 stuck in that method. One of these threads got one step further, into the 
 clinit of TrieField, where it creates an internal static instance of 
 TrieDateField (a circular dependency). I don't know exactly why this locks 
 up, but the code smells anyway, so I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in clinit. In the future, someone could extract a util class to 
 remove the circular dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Description: 
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!solr-mlt-tf-doubling-bug.png|height=500!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:

!solr-mlt-tf-doubling-bug-results.png|height=700!

And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):



Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?


  was:
In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
{{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, 
but it doesn't have to be an existing doc.

!!

There are 2 for loops, one inside the other, which both loop through the same 
set of fields.
That effectively doubles the term frequency for all the terms from fields that 
we provide in MLT QP {{qf}} parameter. 
It basically goes two times over the list of fields and accumulates the term 
frequencies from all fields into {{termFreqMap}}.

The private method {{retrieveTerms}} is only called from one public method, the 
version of overloaded method {{like}} that receives a Map: so that private 
class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument 
{{fields}}.
 
In other words: by the time the {{retrieveTerms}} method gets called, its 
parameter {{fields}} and the private member {{fieldNames}} always contain the 
same list of fields.

Here's the proof:
These are the final results of the calculation:


And this is the actual {{thread_id:TID0009}} document, where those values were 
derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):

Now, let's further test this hypothesis by seeing MLT QP in action from the 
AdminUI.
Let's try to find docs that are More Like doc {{TID0009}}. 
Here's the interesting part, the query:

{code}
q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
{code}

We just saw, in the last image above, that the term accumulator appears {{7}} 
times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}.
By using {{mintf=14}}, we say that, when calculating similarity, we don't want 
to consider terms that appear less than 14 times (when terms from fields 
{{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
I added the term accumulator in only one other document ({{TID0004}}), where it 
appears only once, in the field {{title_mlt}}. 


Let's see what happens when we use {{mintf=15}}:


Bug, no?



 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 

[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633219#comment-14633219
 ] 

ASF subversion and git services commented on SOLR-7803:
---

Commit 1691898 from [~thetaphi] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1691898 ]

Merged revision(s) 1691893 from lucene/dev/trunk:
SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date 
formatting and parsing out of TrieDateField and move to static utility class 
DateFormatUtil (includes bw layer)

 Classloading deadlock in TrieField
 --

 Key: SOLR-7803
 URL: https://issues.apache.org/jira/browse/SOLR-7803
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: OSX, JDK8u45
Reporter: Markus Heiden
Assignee: Uwe Schindler
  Labels: patch
 Fix For: 5.3, Trunk

 Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch


 When starting a test Solr instance, it sometimes locks up. We took a thread 
 dump: all threads were trying to load classes via Class.forName() and were 
 stuck in that method. One of these threads got one step further, into the 
 clinit of TrieField, where it creates an internal static instance of 
 TrieDateField (a circular dependency). I don't know exactly why this locks 
 up, but the code smells anyway, so I removed that instance and made the used 
 methods static in TrieDateField.
 This does not completely remove the circular dependency, but at least it is 
 no longer in clinit. In the future, someone could extract a util class to 
 remove the circular dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Flags: Patch,Important
Lucene Fields: New,Patch Available  (was: New)

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 In other words: by the time the {{retrieveTerms}} method gets called, its 
 parameter {{fields}} and the private member {{fieldNames}} always contain the 
 same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug

2015-07-20 Thread Marko Bonaci (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marko Bonaci updated LUCENE-6687:
-
Attachment: LUCENE-6687.patch

 MLT term frequency calculation bug
 --

 Key: LUCENE-6687
 URL: https://issues.apache.org/jira/browse/LUCENE-6687
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/query/scoring, core/queryparser
Affects Versions: 5.2.1, Trunk
 Environment: OS X v10.10.4; Solr 5.2.1
Reporter: Marko Bonaci
 Attachments: LUCENE-6687.patch, buggy-method-usage.png, 
 solr-mlt-tf-doubling-bug-results.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, 
 solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, 
 solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, 
 terms-glass.png, terms-how.png


 In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method 
 {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document 
 basically, but it doesn't have to be an existing doc.
 !solr-mlt-tf-doubling-bug.png|height=500!
 There are 2 for loops, one inside the other, which both loop through the same 
 set of fields.
 That effectively doubles the term frequency for all the terms from fields 
 that we provide in MLT QP {{qf}} parameter. 
 It basically goes two times over the list of fields and accumulates the term 
 frequencies from all fields into {{termFreqMap}}.
 The private method {{retrieveTerms}} is only called from one public method, 
 the version of overloaded method {{like}} that receives a Map: so that 
 private class member {{fieldNames}} is always derived from 
 {{retrieveTerms}}'s argument {{fields}}.
  
 In other words: by the time the {{retrieveTerms}} method gets called, its 
 parameter {{fields}} and the private member {{fieldNames}} always contain the 
 same list of fields.
 Here's the proof:
 These are the final results of the calculation:
 !solr-mlt-tf-doubling-bug-results.png|height=700!
 And this is the actual {{thread_id:TID0009}} document, where those values 
 were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}):
 !terms-glass.png|height=100!
 !terms-angry.png|height=100!
 !terms-how.png|height=100!
 !terms-accumulator.png|height=100!
 Now, let's further test this hypothesis by seeing MLT QP in action from the 
 AdminUI.
 Let's try to find docs that are More Like doc {{TID0009}}. 
 Here's the interesting part, the query:
 {code}
 q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009
 {code}
 We just saw, in the last image above, that the term accumulator appears {{7}} 
 times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as 
 {{14}}.
 By using {{mintf=14}}, we say that, when calculating similarity, we don't 
 want to consider terms that appear less than 14 times (when terms from fields 
 {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}.
 I added the term accumulator in only one other document ({{TID0004}}), where 
 it appears only once, in the field {{title_mlt}}. 
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500!
 Let's see what happens when we use {{mintf=15}}:
 !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500!
 I should probably mention that multiple fields ({{qf}}) work because I 
 applied the patch: 
 [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143].
 Bug, no?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-6688) Apply deletes by query using the Query API instead of the Filter API

2015-07-20 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-6688:


 Summary: Apply deletes by query using the Query API instead of the 
Filter API
 Key: LUCENE-6688
 URL: https://issues.apache.org/jira/browse/LUCENE-6688
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


BufferedUpdatesStream still uses QueryWrapperFilter to delete documents by 
query instead of the Weight/Scorer APIs.
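
For context, a rough sketch of what resolving a delete-by-query per segment 
through the Weight/Scorer APIs could look like (trunk-era method signatures 
assumed; {{markDeleted}} is a made-up hook, and liveDocs handling, ordering, 
and synchronization are all elided):

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

// Sketch only: iterate the query's matches segment by segment instead of
// wrapping the query in a Filter.
class DeleteByQuerySketch {
  static void applyDeletes(IndexReader reader, Query query) throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    Weight weight = searcher.createNormalizedWeight(query, false); // no scores needed
    for (LeafReaderContext ctx : reader.leaves()) {
      Scorer scorer = weight.scorer(ctx);
      if (scorer == null) {
        continue; // the query matches nothing in this segment
      }
      int doc;
      while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        markDeleted(ctx, doc);
      }
    }
  }

  static void markDeleted(LeafReaderContext ctx, int doc) {
    // illustrative stub; the real code would update pending-delete state
  }
}
{code}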



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6674) J9 assertion / crash in tests

2015-07-20 Thread Brijesh Nekkare (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633451#comment-14633451
 ] 

Brijesh Nekkare commented on LUCENE-6674:
-

We would require the following diagnostics, created during the assertion 
failure, to root-cause this issue:
core.20150710.084504.3376.0001.dmp, 
javacore.20150710.084504.3376.0002.txt and 
Snap.20150710.084504.3376.0003.trc

Thanks and Regards
Brijesh Nekkare
IBM JRE team



 J9 assertion / crash in tests
 -

 Key: LUCENE-6674
 URL: https://issues.apache.org/jira/browse/LUCENE-6674
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 {quote}
06:45:04.031 0x2518500 j9mm.107 *   ** ASSERTION FAILED ** at 
 ParallelScavenger.cpp:3053: ((false && 
 (_extensions->objectModel.isRemembered(objectPtr
 {quote}
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/55153/consoleFull



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_60-ea-b21) - Build # 13532 - Still Failing!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13532/
Java: 32bit/jdk1.8.0_60-ea-b21 -server -XX:+UseParallelGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestSolrConfigHandlerCloud

Error Message:
ERROR: SolrIndexSearcher opens=281 closes=280

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=281 closes=280
at __randomizedtesting.SeedInfo.seed([58EABF133078A799]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:465)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232)
at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at java.lang.Thread.run(Thread.java:745)


FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestSolrConfigHandlerCloud

Error Message:
2 threads leaked from SUITE scope at 
org.apache.solr.handler.TestSolrConfigHandlerCloud: 1) Thread[id=2350, 
name=qtp25410446-2350, state=RUNNABLE, group=TGRP-TestSolrConfigHandlerCloud]   
  at java.util.WeakHashMap.get(WeakHashMap.java:403) at 
org.apache.solr.servlet.cache.HttpCacheHeaderUtil.calcEtag(HttpCacheHeaderUtil.java:101)
 at 
org.apache.solr.servlet.cache.HttpCacheHeaderUtil.doCacheHeaderValidation(HttpCacheHeaderUtil.java:219)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:106)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83)
 at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) 
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)  
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)   
  at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at 

Re: [CI] Lucene 5x Linux 64 Test Only - Build # 56476 - Failure!

2015-07-20 Thread Adrien Grand
I committed a fix for it. The test expected a bounded number of segments
but did nothing to ensure it.

I had to run the test several times to reproduce it, because the failure 
depended on the number of segments in the index, which in turn depended on 
whether a concurrent merge had finished by the time the index reader was 
opened.
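
For reference, the general idiom for pinning down the segment count in a test 
is sketched below (assuming an {{IndexWriter}} named {{writer}}; this shows 
the idiom, not necessarily the exact fix that was committed):

{code}
// Make the segment count deterministic before opening the reader, so the
// test cannot race with a concurrent background merge.
writer.forceMerge(1);      // collapse the index to at most one segment
writer.commit();
DirectoryReader reader = DirectoryReader.open(writer.getDirectory());
// reader.leaves().size() is now 0 or 1, regardless of merge scheduling.
{code}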

On Sun, Jul 19, 2015 at 1:51 AM, bu...@elastic.co wrote:

   *BUILD FAILURE*
 Build URL:
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/
 Project: lucene_linux_java8_64_test_only
 Randomization: JDK8,local,heap[740m],-server +UseG1GC +UseCompressedOops,sec manager on
 Date of build: Sun, 19 Jul 2015 01:44:10 +0200
 Build duration: 7 min 30 sec
  *CHANGES* No Changes
  *BUILD ARTIFACTS*
 checkout/lucene/build/facet/test/temp/junit4-J0-20150719_015122_803.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J0-20150719_015122_803.events
 checkout/lucene/build/facet/test/temp/junit4-J1-20150719_015122_804.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J1-20150719_015122_804.events
 checkout/lucene/build/facet/test/temp/junit4-J2-20150719_015122_808.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J2-20150719_015122_808.events
 checkout/lucene/build/facet/test/temp/junit4-J3-20150719_015122_809.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J3-20150719_015122_809.events
 checkout/lucene/build/facet/test/temp/junit4-J4-20150719_015122_809.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J4-20150719_015122_809.events
 checkout/lucene/build/facet/test/temp/junit4-J5-20150719_015122_809.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J5-20150719_015122_809.events
 checkout/lucene/build/facet/test/temp/junit4-J6-20150719_015122_809.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J6-20150719_015122_809.events
 checkout/lucene/build/facet/test/temp/junit4-J7-20150719_015122_809.events
 http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J7-20150719_015122_809.events
  *FAILED JUNIT TESTS* Name: org.apache.lucene.facet Failed: 1 test(s),
 Passed: 22 test(s), Skipped: 0 test(s), Total: 23 test(s)
 *Failed:
 org.apache.lucene.facet.TestRandomSamplingFacetsCollector.testRandomSampling
 *
  *CONSOLE OUTPUT* [...truncated 8655 lines...] [junit4]  [junit4]
 [junit4] JVM J0: 0.88 .. 14.26 = 13.38s [junit4] JVM J1: 0.90 .. 13.71 =
 12.81s [junit4] JVM J2: 0.87 .. 15.17 = 14.30s [junit4] JVM J3: 0.87 ..
 9.98 = 9.11s [junit4] JVM J4: 1.11 .. 12.64 = 11.53s [junit4] JVM J5:
 0.87 .. 9.98 = 9.11s [junit4] JVM J6: 1.11 .. 11.94 = 10.83s [junit4] JVM
 J7: 0.87 .. 11.66 = 10.80s [junit4] Execution time total: 15 seconds [junit4]
 Tests summary: 23 suites, 155 tests, 1 error BUILD FAILED 
 /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/build.xml:469:
 The following error occurred while executing this line: 
 /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:2240:
 The following error occurred while executing this line: 
 /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/module-build.xml:58:
 The following error occurred while executing this line: 
 /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:1444:
 The following error occurred while executing this line: 
 /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:999:
 There were test failures: 23 suites, 155 tests, 1 error Total time: 7
 minutes 10 seconds Build step 'Invoke Ant' marked build as failure Archiving
 artifacts Recording test results [description-setter] Description set:
 JDK8,local,heap[740m],-server +UseG1GC +UseCompressedOops,sec manager on Email
 was triggered for: Failure - 1st Trigger Failure - Any was overridden by
 another trigger and will not send an email. Trigger Failure - Still was
 overridden by another trigger and will not send an email. Sending email
 for trigger: Failure - 1st


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Adrien


[jira] [Updated] (SOLR-7815) Remove LuceneQueryOptimizer

2015-07-20 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated SOLR-7815:
---
Attachment: SOLR-7815.patch

Here is a patch.

 Remove LuceneQueryOptimizer
 ---

 Key: SOLR-7815
 URL: https://issues.apache.org/jira/browse/SOLR-7815
 Project: Solr
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: SOLR-7815.patch


 I noticed that I introduced a bug in this class when refactoring BooleanQuery 
 to be immutable (using the builder as a cache key instead of the query 
 itself). But then I noticed that this class is actually never used, so let's 
 remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-7815) Remove LuceneQueryOptimizer

2015-07-20 Thread Adrien Grand (JIRA)
Adrien Grand created SOLR-7815:
--

 Summary: Remove LuceneQueryOptimizer
 Key: SOLR-7815
 URL: https://issues.apache.org/jira/browse/SOLR-7815
 Project: Solr
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


I noticed that I introduced a bug in this class when refactoring BooleanQuery 
to be immutable (using the builder as a cache key instead of the query itself). 
But then I noticed that this class is actually never used, so let's remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6688) Apply deletes by query using the Query API instead of the Filter API

2015-07-20 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-6688:
-
Attachment: LUCENE-6688.patch

Here is a patch.

 Apply deletes by query using the Query API instead of the Filter API
 

 Key: LUCENE-6688
 URL: https://issues.apache.org/jira/browse/LUCENE-6688
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6688.patch


 BufferedUpdatesStream still uses QueryWrapperFilter to delete documents by 
 query instead of the Weight/Scorer APIs.
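
 For reference, iterating matches through the Weight/Scorer APIs looks roughly like the sketch below. This is an illustration of the 5.x-era API (exact signatures varied across 5.x releases), not the attached patch, and the variable names are made up.
 {code:java}
 // Sketch: visit every document matching `query`, segment by segment, no Filter involved.
 IndexSearcher searcher = new IndexSearcher(reader);
 Weight weight = searcher.createNormalizedWeight(query, false); // scores not needed for deletes
 for (LeafReaderContext context : reader.leaves()) {
   Scorer scorer = weight.scorer(context);
   if (scorer == null) {
     continue; // the query matches nothing in this segment
   }
   int doc;
   while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
     // a real delete-by-query would mark (context.docBase + doc) as deleted here
   }
 }
 {code}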



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7692) Implement BasicAuth based impl for the new Authentication/Authorization APIs

2015-07-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-7692:
-
Attachment: SOLR-7692.patch

I plan to commit this pretty soon. All inputs/comments are welcome.

 Implement BasicAuth based impl for the new Authentication/Authorization APIs
 

 Key: SOLR-7692
 URL: https://issues.apache.org/jira/browse/SOLR-7692
 Project: Solr
  Issue Type: New Feature
Reporter: Noble Paul
Assignee: Noble Paul
 Attachments: SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, 
 SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch


 This involves various components
 h2. Authentication
 A basic auth based authentication filter. This should retrieve the user 
 credentials from ZK.  The user name and sha1 hash of password should be 
 stored in ZK
 sample authentication json 
 {code:javascript}
 {
   "authentication": {
     "class": "solr.BasicAuthPlugin",
     "users": {
       "john": "09fljnklnoiuy98 buygujkjnlk",
       "david": "f678njfgfjnklno iuy9865ty",
       "pete": "87ykjnklndfhjh8 98uyiy98"
     }
   }
 }
 {code}
 h2. authorization plugin
 This would store the roles of various users and their privileges in ZK
 sample authorization.json
 {code:javascript}
 {
   "authorization": {
     "class": "solr.ZKAuthorization",
     "roles": {
       "admin": ["john"],
       "guest": ["john", "david", "pete"]
     },
     "permissions": {
       "collection-edit": {
         "role": "admin"
       },
       "coreadmin": {
         "role": "admin"
       },
       "config-edit": {
         // all collections
         "role": "admin",
         "method": "POST"
       },
       "schema-edit": {
         "roles": "admin",
         "method": "POST"
       },
       "update": {
         // all collections
         "role": "dev"
       },
       "mycoll_update": {
         "collection": "mycoll",
         "path": ["/update/*"],
         "role": ["somebody"]
       }
     }
   }
 }
 {code}
 We will also need to provide APIs to create users and assign them roles
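
 Since the credentials and roles live in ZooKeeper, seeding such a config could look like the sketch below, using the plain ZooKeeper client. The {{/security.json}} node name and the surrounding class are assumptions made for illustration, not taken from the patch.
 {code:java}
 import java.nio.file.Files;
 import java.nio.file.Paths;
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;

 // Sketch only: push an authentication/authorization JSON document into ZK,
 // assuming the cluster reads its security config from a node like /security.json.
 public class SeedSecurityConfig {
   public static void main(String[] args) throws Exception {
     byte[] json = Files.readAllBytes(Paths.get("security.json"));
     ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});
     zk.create("/security.json", json, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
     zk.close();
   }
 }
 {code}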



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request: Fix incorrect link to Levenshtein distan...

2015-07-20 Thread Xaerxess
GitHub user Xaerxess opened a pull request:

https://github.com/apache/lucene-solr/pull/190

Fix incorrect link to Levenshtein distance

This is a small fix in documentation, please let me know if Github's pull 
request is sufficient for merging into trunk.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Xaerxess/lucene-solr patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/190.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #190


commit 09a46c967f3751b20f565b4b9ca54a6c2da6cbb5
Author: Grzegorz Rożniecki xaerx...@gmail.com
Date:   2015-07-20T11:47:44Z

Fix incorrect link to Levenshtein distance




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_60-ea-b21) - Build # 13533 - Still Failing!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13533/
Java: 32bit/jdk1.8.0_60-ea-b21 -client -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest

Error Message:
There are still nodes recoverying - waited for 330 seconds

Stack Trace:
java.lang.AssertionError: There are still nodes recoverying - waited for 330 
seconds
at 
__randomizedtesting.SeedInfo.seed([13BE54822BF4B54F:B4FAEC26464FA6F6]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:172)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:133)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:128)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForRecoveriesToFinish(BaseCdcrDistributedZkTest.java:465)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.clearSourceCollection(BaseCdcrDistributedZkTest.java:319)
at 
org.apache.solr.cloud.CdcrReplicationHandlerTest.doTestPartialReplicationAfterPeerSync(CdcrReplicationHandlerTest.java:158)
at 
org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest(CdcrReplicationHandlerTest.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

[jira] [Commented] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633626#comment-14633626
 ] 

ASF subversion and git services commented on SOLR-7810:
---

Commit 1691947 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1691947 ]

SOLR-7810: map-reduce contrib script to set classpath for convenience refers to 
example rather than server.

 mapreduce contrib script to set classpath for convenience refers to  example 
 rather than server.
 

 Key: SOLR-7810
 URL: https://issues.apache.org/jira/browse/SOLR-7810
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.

2015-07-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633622#comment-14633622
 ] 

ASF subversion and git services commented on SOLR-7810:
---

Commit 1691946 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1691946 ]

SOLR-7810: map-reduce contrib script to set classpath for convenience refers to 
example rather than server.

 mapreduce contrib script to set classpath for convenience refers to  example 
 rather than server.
 

 Key: SOLR-7810
 URL: https://issues.apache.org/jira/browse/SOLR-7810
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.

2015-07-20 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-7810.
---
   Resolution: Fixed
Fix Version/s: Trunk
   5.3

 mapreduce contrib script to set classpath for convenience refers to  example 
 rather than server.
 

 Key: SOLR-7810
 URL: https://issues.apache.org/jira/browse/SOLR-7810
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 5.3, Trunk






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b60) - Build # 13534 - Still Failing!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13534/
Java: 32bit/jdk1.9.0-ea-b60 -client -XX:+UseG1GC -Djava.locale.providers=JRE,SPI

1 tests failed.
FAILED:  org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test

Error Message:
this writer hit an unrecoverable error; cannot commit

Stack Trace:
java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot 
commit
at 
__randomizedtesting.SeedInfo.seed([58B144653E398529:D0E57BBF90C5E8D1]:0)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2777)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2963)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1066)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1109)
at 
org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test(TestIndexWriterOutOfFileDescriptors.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:502)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: a random IOException (_1c.nvd)
at 
org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOExceptionOnOpen(MockDirectoryWrapper.java:458)
at 
org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:635)

[jira] [Commented] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633715#comment-14633715
 ] 

Shawn Heisey commented on LUCENE-6689:
--

The reason that phrase searches don't match after LUCENE-5111 is that the query 
analysis on my real fieldType is slightly different -- catenateWords, 
catenateNumbers, and preserveOriginal are all disabled on the query analysis.  
With those settings and the previously given input of "aaa-bbb: ccc", "aaa" ends 
up at position 1 and "bbb" at position 2, which is not the same as the index 
analysis with the settings above.
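
To see exactly where each term lands, here is a small sketch that dumps
positions straight from an analyzer (building the analyzer itself is omitted;
"aaa-bbb: ccc" is the input from the report):

{code:java}
// Sketch: print every token and its absolute position for a given input.
try (TokenStream ts = analyzer.tokenStream("field", "aaa-bbb: ccc")) {
  CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
  PositionIncrementAttribute posIncr = ts.addAttribute(PositionIncrementAttribute.class);
  ts.reset();
  int position = 0;
  while (ts.incrementToken()) {
    position += posIncr.getPositionIncrement();
    System.out.println(term.toString() + " @ position " + position);
  }
  ts.end();
}
{code}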

 Odd analysis problem with WDF, appears to be triggered by preceding analysis 
 components
 ---

 Key: LUCENE-6689
 URL: https://issues.apache.org/jira/browse/LUCENE-6689
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Shawn Heisey

 This problem shows up for me in Solr, but I believe the issue is down at the 
 Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
 if necessary.
 I've boiled the problem down to this minimum Solr fieldType:
 {noformat}
 <fieldType name="testType" class="solr.TextField"
     sortMissingLast="true" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
         rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
         replacement="$2"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnCaseChange="1"
         splitOnNumerics="1"
         stemEnglishPossessive="1"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="1"
         catenateNumbers="1"
         catenateAll="0"
         preserveOriginal="1"/>
   </analyzer>
 </fieldType>
 {noformat}
 On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up 
 at term position 1 and "bbb" at term position 2.  This seems perfectly 
 reasonable to me.  In Solr 4.9, both terms end up at position 2.  This causes 
 phrase queries which used to work to return zero hits.  The exact text of the 
 phrase query is in the original documents that match on 4.7.
 If the custom rbbi (which is included unmodified from the lucene icu analysis 
 source code) is not used, then the problem doesn't happen, because the 
 punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory 
 is not present, then the problem doesn't happen.
 I can work around the problem by setting luceneMatchVersion to 4.7, but I 
 think the behavior is a bug, and I would rather not continue to use 4.7 
 analysis when I upgrade to 5.x, which I hope to do soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6690) Speed up MultiTermsEnum.next()

2015-07-20 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-6690:
-
Attachment: OrdinalMapBuildBench.java

Here is the benchmark I've been using. It's certainly not great but I don't 
think it's too bad either. :)

 Speed up MultiTermsEnum.next()
 --

 Key: LUCENE-6690
 URL: https://issues.apache.org/jira/browse/LUCENE-6690
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: OrdinalMapBuildBench.java


 OrdinalMap is very useful when computing top terms on a multi-index segment. 
 However I've seen it being occasionally slow to build, which was either 
 making facets (when the ordinals map is computed lazily) or reopen (when 
 computed eagerly) slow. So out of curiosity, I tried to profile ordinal map 
 building on a simple index: 10M random strings of length between 0 and 20 
 stored as a SORTED doc values field. The index has 19 segments. The 
 bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef 
 comparisons (UTF8SortedAsUnicodeComparator).
 MultiTermsEnum stores sub enums in two different places:
  - top: a simple array containing all enums on the current term
  - queue: a queue for enums that are not exhausted yet but beyond the current 
 term.
 A non-exhausted enum is in exactly one of these data-structures. When moving 
 to the next term, MultiTermsEnum advances all enums in {{top}}, then adds 
 them to {{queue}} and finally, pops all enum that are on the same term back 
 into {{top}}.
 We could save reorderings of the priority queue by not removing entries from 
 the priority queue and then calling updateTop to advance enums which are on 
 the current term. This is already what we do for disjunctions of doc IDs in 
 DISIPriorityQueue.
 On the index described above and current trunk, building an OrdinalMap has to 
 call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. 
 With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 
 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster).
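
 As a minimal sketch of the proposed pattern (illustrative only, not the attached patch; {{SubEnum}} is a made-up stand-in for MultiTermsEnum's internal sub-enum wrapper):
 {code:java}
 // Instead of popping each enum positioned on the current term and re-adding it
 // after advancing (two O(log n) sifts per enum), advance the head in place and
 // let updateTop() do a single sift.
 while (queue.size() > 0 && queue.top().term().equals(currentTerm)) {
   SubEnum head = queue.top();
   if (head.next() == null) {
     queue.pop();          // exhausted: really remove it
   } else {
     queue.updateTop();    // advanced in place: one sift instead of two
   }
 }
 {code}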



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 908 - Still Failing

2015-07-20 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/908/

No tests ran.

Build Log:
[...truncated 10864 lines...]
FATAL: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: 
Ping started on 1437416070507 hasn't completed at 1437416310507
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: 
Ping started on 1437416070507 hasn't completed at 1437416310507
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy59.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
at hudson.Launcher$ProcStarter.join(Launcher.java:367)
at hudson.tasks.Ant.perform(Ant.java:217)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
at hudson.model.Build$BuildExecution.build(Build.java:198)
at hudson.model.Build$BuildExecution.doRun(Build.java:159)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
at hudson.model.Run.execute(Run.java:1706)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:232)
Caused by: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: 
Ping started on 1437416070507 hasn't completed at 1437416310507
at hudson.remoting.Request.abort(Request.java:299)
at hudson.remoting.Channel.terminate(Channel.java:805)
at hudson.remoting.Channel$CloseCommand.execute(Channel.java:954)
at hudson.remoting.Channel$2.handle(Channel.java:474)
at 
hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60)
Caused by: hudson.remoting.Channel$OrderlyShutdown: 
java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't 
completed at 1437416310507
... 3 more
Caused by: Command close created at
at hudson.remoting.Command.init(Command.java:56)
at hudson.remoting.Channel$CloseCommand.init(Channel.java:948)
at hudson.remoting.Channel$CloseCommand.init(Channel.java:946)
at hudson.remoting.Channel.close(Channel.java:1029)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1437416070507 
hasn't completed at 1437416310507
... 2 more



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6690) Speed up MultiTermsEnum.next()

2015-07-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633764#comment-14633764
 ] 

Uwe Schindler commented on LUCENE-6690:
---

Good idea! :-)

 Speed up MultiTermsEnum.next()
 --

 Key: LUCENE-6690
 URL: https://issues.apache.org/jira/browse/LUCENE-6690
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6690.patch, OrdinalMapBuildBench.java


 OrdinalMap is very useful when computing top terms on a multi-index segment. 
 However I've seen it being occasionally slow to build, which was either 
 making facets (when the ordinals map is computed lazily) or reopen (when 
 computed eagerly) slow. So out of curiosity, I tried to profile ordinal map 
 building on a simple index: 10M random strings of length between 0 and 20 
 stored as a SORTED doc values field. The index has 19 segments. The 
 bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef 
 comparisons (UTF8SortedAsUnicodeComparator).
 MultiTermsEnum stores sub enums in two different places:
  - top: a simple array containing all enums on the current term
  - queue: a queue for enums that are not exhausted yet but beyond the current 
 term.
 A non-exhausted enum is in exactly one of these data-structures. When moving 
 to the next term, MultiTermsEnum advances all enums in {{top}}, then adds 
 them to {{queue}} and finally, pops all enum that are on the same term back 
 into {{top}}.
 We could save reorderings of the priority queue by not removing entries from 
 the priority queue and then calling updateTop to advance enums which are on 
 the current term. This is already what we do for disjunctions of doc IDs in 
 DISIPriorityQueue.
 On the index described above and current trunk, building an OrdinalMap has to 
 call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. 
 With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 
 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated LUCENE-6689:
-
Description: 
This problem shows up for me in Solr, but I believe the issue is down at the 
Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
if necessary.

I've boiled the problem down to this minimum Solr fieldType:

{noformat}
<fieldType name="testType" class="solr.TextField"
    sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"/>
  </analyzer>
</fieldType>
{noformat}

On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then index analysis 
puts "aaa" at term position 1 and "bbb" at term position 2.  This seems perfectly 
reasonable to me.  In Solr 4.9, both terms end up at position 2.  This causes 
phrase queries which used to work to return zero hits.  The exact text of the 
phrase query is in the original documents that match on 4.7.

If the custom rbbi (which is included unmodified from the lucene icu analysis 
source code) is not used, then the problem doesn't happen, because the 
punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory is 
not present, then the problem doesn't happen.

I can work around the problem by setting luceneMatchVersion to 4.7, but I think 
the behavior is a bug, and I would rather not continue to use 4.7 analysis when 
I upgrade to 5.x, which I hope to do soon.

Whether luceneMatchVersion is LUCENE_47 or LUCENE_4_9, query analysis puts "aaa" 
at term position 1 and "bbb" at term position 2.

  was:
This problem shows up for me in Solr, but I believe the issue is down at the 
Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
if necessary.

I've boiled the problem down to this minimum Solr fieldType:

{noformat}
<fieldType name="testType" class="solr.TextField"
    sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"/>
  </analyzer>
</fieldType>
{noformat}

On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up at 
term position 1 and "bbb" at term position 2.  This seems perfectly reasonable to 
me.  In Solr 4.9, both terms end up at position 2.  This causes phrase queries 
which used to work to return zero hits.  The exact text of the phrase query is 
in the original documents that match on 4.7.

If the custom rbbi (which is included unmodified from the lucene icu analysis 
source code) is not used, then the problem doesn't happen, because the 
punctuation doesn't make it to the PRF.

[jira] [Commented] (SOLR-6234) Scoring modes for query time join

2015-07-20 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633843#comment-14633843
 ] 

Timothy Potter commented on SOLR-6234:
--

looks good [~mkhludnev] +1 to commit. Please be sure to add documentation for 
this new feature to the refguide. I'll add a separate unit test that uses this 
feature to verify SOLR-6357 once this is committed.

 Scoring modes for query time join 
 --

 Key: SOLR-6234
 URL: https://issues.apache.org/jira/browse/SOLR-6234
 Project: Solr
  Issue Type: New Feature
  Components: query parsers
Affects Versions: 5.3
Reporter: Mikhail Khludnev
Assignee: Timothy Potter
  Labels: features, patch, test
 Fix For: 5.3

 Attachments: SOLR-6234.patch, SOLR-6234.patch, SOLR-6234.patch, 
 SOLR-6234.patch, otherHandler.patch


 it adds ability to call Lucene's JoinUtil by specifying local param, ie  
 \{!join score=...} It supports:
 - {{score=none|avg|max|total}} local param (passed as ScoreMode to JoinUtil)
 - -supports {{b=100}} param to pass {{Query.setBoost()}}- postponed till 
 SOLR-7814.
 - -{{multiVals=true|false}} is introduced- YAGNI, let me know otherwise. 
 - there is a test coverage for cross core join case. 
 - so far it joins string and multivalue string fields (Sorted, SortedSet, 
 Binary), but not Numerics DVs. follow-up LUCENE-5868  
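
 As a hedged usage sketch (the field names and query term here are invented, not from the patch): a request like {{q={!join from=manu_id_s to=id score=max}ipod}} would run the Lucene-level join and keep, for each joined-to document, the maximum score among its matching children.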



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-6690) Speed up MultiTermsEnum.next()

2015-07-20 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-6690:


 Summary: Speed up MultiTermsEnum.next()
 Key: LUCENE-6690
 URL: https://issues.apache.org/jira/browse/LUCENE-6690
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


OrdinalMap is very useful when computing top terms on a multi-index segment. 
However I've seen it being occasionally slow to build, which was either making 
facets (when the ordinals map is computed lazily) or reopen (when computed 
eagerly) slow. So out of curiosity, I tried to profile ordinal map building on 
a simple index: 10M random strings of length between 0 and 20 stored as a 
SORTED doc values field. The index has 19 segments. The bottleneck was 
MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons 
(UTF8SortedAsUnicodeComparator).

MultiTermsEnum stores sub enums in two different places:
 - top: a simple array containing all enums on the current term
 - queue: a queue for enums that are not exhausted yet but beyond the current 
term.

A non-exhausted enum is in exactly one of these data-structures. When moving to 
the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to 
{{queue}} and finally, pops all enum that are on the same term back into 
{{top}}.

We could save reorderings of the priority queue by not removing entries from 
the priority queue and then calling updateTop to advance enums which are on the 
current term. This is already what we do for disjunctions of doc IDs in 
DISIPriorityQueue.

On the index described above and current trunk, building an OrdinalMap has to 
call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. 
With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, 
BytesRef.equals 16297638 times and runs in 1.4s (~26% faster).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated LUCENE-6689:
-
Description: 
This problem shows up for me in Solr, but I believe the issue is down at the 
Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
if necessary.

I've boiled the problem down to this minimum Solr fieldType:

{noformat}
<fieldType name="testType" class="solr.TextField"
    sortMissingLast="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        preserveOriginal="0"/>
  </analyzer>
</fieldType>
{noformat}

On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up at 
term position 1 and "bbb" at term position 2.  This seems perfectly reasonable to 
me.  In Solr 4.9, both terms end up at position 2.  This causes phrase queries 
which used to work to return zero hits.  The exact text of the phrase query is 
in the original documents that match on 4.7.

If the custom rbbi (which is included unmodified from the lucene icu analysis 
source code) is not used, then the problem doesn't happen, because the 
punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory is 
not present, then the problem doesn't happen.

I can work around the problem by setting luceneMatchVersion to 4.7, but I think 
the behavior is a bug, and I would rather not continue to use 4.7 analysis when 
I upgrade to 5.x, which I hope to do soon.


  was:
This problem shows up for me in Solr, but I believe the issue is down at the 
Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
if necessary.

I've boiled the problem down to this minimum Solr fieldType:

{noformat}
<fieldType name="testType" class="solr.TextField"
    sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
  </analyzer>
</fieldType>
{noformat}

On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up at 
term position 1 and "bbb" at term position 2.  This seems perfectly reasonable to 
me.  In Solr 4.9, both terms end up at position 2.  This causes phrase queries 
which used to work to return zero hits.  The exact text of the phrase query is 
in the original documents that match on 4.7.

If the custom rbbi (which is included unmodified from the lucene icu analysis 
source code) is not used, then the problem doesn't happen, because the 
punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory is 
not present, then the problem doesn't happen.

I can work around the problem by setting luceneMatchVersion to 4.7, but I think 
the behavior is a bug, and I would rather not continue to use 4.7 analysis when 
I upgrade to 5.x, which I hope to do soon.



 Odd analysis problem with WDF, appears to be triggered by preceding analysis 
 components
 ---

 Key: LUCENE-6689
 URL: https://issues.apache.org/jira/browse/LUCENE-6689
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Shawn Heisey

 

[jira] [Issue Comment Deleted] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated LUCENE-6689:
-
Comment: was deleted

(was: The reason that phrase searches don't match after LUCENE-5111 is that the 
query analysis on my real fieldType is slightly different -- catenateWords, 
catenateNumbers, and preserveOriginal are all disabled on the query analysis.  
With those settings and the previously given input of "aaa-bbb: ccc", "aaa" ends 
up at position 1 and "bbb" at position 2, which is not the same as the index 
analysis with the settings above.)

 Odd analysis problem with WDF, appears to be triggered by preceding analysis 
 components
 ---

 Key: LUCENE-6689
 URL: https://issues.apache.org/jira/browse/LUCENE-6689
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Shawn Heisey

 This problem shows up for me in Solr, but I believe the issue is down at the 
 Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
 if necessary.
 I've boiled the problem down to this minimum Solr fieldType:
 {noformat}
 <fieldType name="testType" class="solr.TextField"
     sortMissingLast="true" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
         rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
         replacement="$2"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnCaseChange="1"
         splitOnNumerics="1"
         stemEnglishPossessive="1"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="1"
         catenateNumbers="1"
         catenateAll="0"
         preserveOriginal="1"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
         rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
         replacement="$2"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnCaseChange="1"
         splitOnNumerics="1"
         stemEnglishPossessive="1"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="0"
         catenateNumbers="0"
         catenateAll="0"
         preserveOriginal="0"/>
   </analyzer>
 </fieldType>
 {noformat}
 On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up 
 at term position 1 and "bbb" at term position 2.  This seems perfectly 
 reasonable to me.  In Solr 4.9, both terms end up at position 2.  This causes 
 phrase queries which used to work to return zero hits.  The exact text of the 
 phrase query is in the original documents that match on 4.7.
 If the custom rbbi (which is included unmodified from the lucene icu analysis 
 source code) is not used, then the problem doesn't happen, because the 
 punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory 
 is not present, then the problem doesn't happen.
 I can work around the problem by setting luceneMatchVersion to 4.7, but I 
 think the behavior is a bug, and I would rather not continue to use 4.7 
 analysis when I upgrade to 5.x, which I hope to do soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6690) Speed up MultiTermsEnum.next()

2015-07-20 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-6690:
-
Attachment: LUCENE-6690.patch

And here is the patch.

 Speed up MultiTermsEnum.next()
 --

 Key: LUCENE-6690
 URL: https://issues.apache.org/jira/browse/LUCENE-6690
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6690.patch, OrdinalMapBuildBench.java


 OrdinalMap is very useful when computing top terms on a multi-index segment. 
 However I've seen it being occasionally slow to build, which was either 
 making facets (when the ordinals map is computed lazily) or reopen (when 
 computed eagerly) slow. So out of curiosity, I tried to profile ordinal map 
 building on a simple index: 10M random strings of length between 0 and 20 
 stored as a SORTED doc values field. The index has 19 segments. The 
 bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef 
 comparisons (UTF8SortedAsUnicodeComparator).
 MultiTermsEnum stores sub enums in two different places:
  - top: a simple array containing all enums on the current term
  - queue: a queue for enums that are not exhausted yet but beyond the current 
 term.
 A non-exhausted enum is in exactly one of these data-structures. When moving 
 to the next term, MultiTermsEnum advances all enums in {{top}}, then adds 
 them to {{queue}} and finally, pops all enum that are on the same term back 
 into {{top}}.
 We could save reorderings of the priority queue by not removing entries from 
 the priority queue and then calling updateTop to advance enums which are on 
 the current term. This is already what we do for disjunctions of doc IDs in 
 DISIPriorityQueue.
 On the index described above and current trunk, building an OrdinalMap has to 
 call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. 
 With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 
 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7812) Need a playground to quickly test analyzer stacks

2015-07-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634092#comment-14634092
 ] 

Hoss Man commented on SOLR-7812:


this is already mostly possible with the ManagedSchema and Schema API -- 
there's just no slick UI around it.

* create a collection for doing experiments in
* iterate over...
** use the Schema API to (re)define a field type with the index/query analyzers 
you want to experiment with
** iterate over...
*** use the Analysis handlers to sanity check that various inputs behave the 
way you think they should (a SolrJ sketch of this step follows the list)
** index some test documents
** iterate over...
*** execute various queries to see what results you get and if you are happy
** delete all docs
* delete the experiment collection
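
The analysis sanity check in the loop above can be scripted with SolrJ's existing analysis request; here is a hedged sketch (the collection name "experiments" and the field type "testType" are invented for illustration):

{code:java}
// Sketch: run index- and query-time analysis for a field type and print the result.
SolrClient client = new HttpSolrClient("http://localhost:8983/solr/experiments");
FieldAnalysisRequest req = new FieldAnalysisRequest();
req.addFieldType("testType");       // the type previously (re)defined via the Schema API
req.setFieldValue("aaa-bbb: ccc");  // index-time analysis input
req.setQuery("aaa bbb");            // query-time analysis input
FieldAnalysisResponse rsp = req.process(client);
System.out.println(rsp.getFieldTypeAnalysis("testType"));
client.close();
{code}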


 Need a playground to quickly test analyzer stacks
 -

 Key: SOLR-7812
 URL: https://issues.apache.org/jira/browse/SOLR-7812
 Project: Solr
  Issue Type: Wish
  Components: Schema and Analysis
Reporter: Alexandre Rafalovitch
Priority: Minor
  Labels: analyzers, beginners, usability

 (from email by Robert Oschler)
 (Would be useful to have)... a convenient playground for testing index and 
 query filters?
 I'm  imagining a utility where you can select a set of index and query
 filters, and then enter  a string as a test document and a query string
 and see what kind of scores come back during a matching attempt.  This
 would be a big aid in crafting an indexing/query scheme to get the desired
 matching profile working.  Otherwise the only technique I can think of is
 to iteratively modify the schema file and retest with the admin panel with
 each combination of filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6668) Optimize SortedSet/SortedNumeric storage for the few unique sets use-case

2015-07-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633964#comment-14633964
 ] 

Robert Muir commented on LUCENE-6668:
-

+1, nice to have TABLE applied to the other types here too!

 Optimize SortedSet/SortedNumeric storage for the few unique sets use-case
 -

 Key: LUCENE-6668
 URL: https://issues.apache.org/jira/browse/LUCENE-6668
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-6668.patch, LUCENE-6668.patch


 Robert suggested this idea: if there are few unique sets of values, we could 
 build a lookup table and then map each doc to an ord in this table, just like 
 we already do for table compression for numerics.
 I think this is especially compelling given that SortedSet/SortedNumeric are 
 our two only doc values types that use O(maxDoc) memory because of the 
 offsets map. When this new strategy is used, memory usage could be bounded to 
 a constant.
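
 A toy sketch of the lookup-table idea (nothing here is from the patch; {{ordsForDoc}} is a hypothetical accessor for a document's sorted-set ordinals):
 {code:java}
 // Deduplicate each document's set of ordinals and store one small table index per doc.
 Map<List<Long>, Integer> table = new LinkedHashMap<>();
 int[] docToTableOrd = new int[maxDoc];
 for (int doc = 0; doc < maxDoc; doc++) {
   List<Long> ords = ordsForDoc(doc);  // hypothetical: this doc's ordinals, in order
   Integer slot = table.get(ords);
   if (slot == null) {
     slot = table.size();
     table.put(ords, slot);
   }
   docToTableOrd[doc] = slot;
 }
 // With few unique sets, the per-document cost is ceil(log2(table.size())) bits
 // instead of an O(maxDoc) offsets array.
 {code}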



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[CI] Lucene 5x Linux 64 Test Only - Build # 56712 - Failure!

2015-07-20 Thread build



  
BUILD FAILURE
Build URL: http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56712/
Project: lucene_linux_java8_64_test_only
Date of build: Mon, 20 Jul 2015 20:14:05 +0200
Build duration: 1 hr 0 min

CHANGES
No Changes

CONSOLE OUTPUT
[...truncated 199 lines...]
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
	at ..remote call to ubuntu-14-64-8-metal(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356)
	at hudson.remoting.UserResponse.retrieve(UserRequest.java:221)
	at hudson.remoting.Channel.call(Channel.java:752)
	at hudson.FilePath.act(FilePath.java:978)
	at hudson.FilePath.act(FilePath.java:967)
	at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:89)
	at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:121)
	at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:138)
	at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:74)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721)
	at hudson.model.Build$BuildExecution.post2(Build.java:183)
	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:670)
	at hudson.model.Run.execute(Run.java:1776)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:89)
	at hudson.model.Executor.run(Executor.java:240)
[description-setter] Description set: $BUILD_DESC
Email was triggered for: Failure - 1st
Trigger Failure - Any was overridden by another trigger and will not send an email.
Trigger Failure - Still was overridden by another trigger and will not send an email.
Sending email for trigger: Failure - 1st








-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b60) - Build # 13537 - Failure!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13537/
Java: 64bit/jdk1.9.0-ea-b60 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC 
-Djava.locale.providers=JRE,SPI

1 tests failed.
FAILED:  org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests

Error Message:
Timeout waiting for CDCR replication to complete @source_collection:shard1

Stack Trace:
java.lang.RuntimeException: Timeout waiting for CDCR replication to complete 
@source_collection:shard1
at 
__randomizedtesting.SeedInfo.seed([91D81DAC63188477:99B868806C16AC7C]:0)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForReplicationToComplete(BaseCdcrDistributedZkTest.java:732)
at 
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTestUpdateLogSynchronisation(CdcrReplicationDistributedZkTest.java:361)
at 
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests(CdcrReplicationDistributedZkTest.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:502)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 

[jira] [Created] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)
Shawn Heisey created LUCENE-6689:


 Summary: Odd analysis problem with WDF, appears to be triggered by 
preceding analysis components
 Key: LUCENE-6689
 URL: https://issues.apache.org/jira/browse/LUCENE-6689
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Shawn Heisey


This problem shows up for me in Solr, but I believe the issue is down at the 
Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
if necessary.

I've boiled the problem down to this minimum Solr fieldType:

{noformat}
<fieldType name="testType" class="solr.TextField"
    sortMissingLast="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
        rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
        replacement="$2"/>
    <filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="1"
        catenateNumbers="1"
        catenateAll="0"
        preserveOriginal="1"/>
  </analyzer>
</fieldType>
{noformat}

On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up at 
term position 1 and "bbb" at term position 2.  This seems perfectly reasonable to 
me.  In Solr 4.9, both terms end up at position 2.  This causes phrase queries 
which used to work to return zero hits.  The exact text of the phrase query is 
in the original documents that match on 4.7.

If the custom rbbi (which is included unmodified from the lucene icu analysis 
source code) is not used, then the problem doesn't happen, because the 
punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory is 
not present, then the problem doesn't happen.

I can work around the problem by setting luceneMatchVersion to 4.7, but I think 
the behavior is a bug, and I would rather not continue to use 4.7 analysis when 
I upgrade to 5.x, which I hope to do soon.
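
For reference, a minimal way to watch the positions directly (an editorial 
sketch against the Lucene 5.x analysis APIs; WhitespaceTokenizer stands in for 
the ICUTokenizer+rbbi setup above, since both keep the punctuation attached to 
the token):

{noformat}
import java.io.StringReader;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.pattern.PatternReplaceFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class DumpPositions {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer tok = new WhitespaceTokenizer();
    tok.setReader(new StringReader("aaa-bbb: ccc"));
    TokenStream ts = new PatternReplaceFilter(tok,
        Pattern.compile("^(\\p{Punct}*)(.*?)(\\p{Punct}*)$"), "$2", false);
    int flags = WordDelimiterFilter.GENERATE_WORD_PARTS
        | WordDelimiterFilter.GENERATE_NUMBER_PARTS
        | WordDelimiterFilter.CATENATE_WORDS
        | WordDelimiterFilter.CATENATE_NUMBERS
        | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
        | WordDelimiterFilter.SPLIT_ON_NUMERICS
        | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE
        | WordDelimiterFilter.PRESERVE_ORIGINAL;
    ts = new WordDelimiterFilter(ts, flags, null);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute inc = ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    int pos = 0;
    while (ts.incrementToken()) {
      pos += inc.getPositionIncrement();
      // on 4.7 "aaa" and "bbb" land at positions 1 and 2; the bug puts both at 2
      System.out.println(pos + "\t" + term);
    }
    ts.end();
    ts.close();
  }
}
{noformat}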




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components

2015-07-20 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633706#comment-14633706
 ] 

Shawn Heisey commented on LUCENE-6689:
--

LUCENE-5111 seems to contain the commit that causes this behavior.

 Odd analysis problem with WDF, appears to be triggered by preceding analysis 
 components
 ---

 Key: LUCENE-6689
 URL: https://issues.apache.org/jira/browse/LUCENE-6689
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.8
Reporter: Shawn Heisey

 This problem shows up for me in Solr, but I believe the issue is down at the 
 Lucene level, so I've opened the issue in the LUCENE project.  We can move it 
 if necessary.
 I've boiled the problem down to this minimum Solr fieldType:
 {noformat}
 <fieldType name="testType" class="solr.TextField"
     sortMissingLast="true" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"
         rulefiles="Latn:Latin-break-only-on-whitespace.rbbi"/>
     <filter class="solr.PatternReplaceFilterFactory"
         pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
         replacement="$2"/>
     <filter class="solr.WordDelimiterFilterFactory"
         splitOnCaseChange="1"
         splitOnNumerics="1"
         stemEnglishPossessive="1"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="1"
         catenateNumbers="1"
         catenateAll="0"
         preserveOriginal="1"/>
   </analyzer>
 </fieldType>
 {noformat}
 On Solr 4.7, if this type is given the input "aaa-bbb: ccc" then "aaa" ends up 
 at term position 1 and "bbb" at term position 2.  This seems perfectly 
 reasonable to me.  In Solr 4.9, both terms end up at position 2.  This causes 
 phrase queries which used to work to return zero hits.  The exact text of the 
 phrase query is in the original documents that match on 4.7.
 If the custom rbbi (which is included unmodified from the lucene icu analysis 
 source code) is not used, then the problem doesn't happen, because the 
 punctuation doesn't make it to the PRF.  If the PatternReplaceFilterFactory 
 is not present, then the problem doesn't happen.
 I can work around the problem by setting luceneMatchVersion to 4.7, but I 
 think the behavior is a bug, and I would rather not continue to use 4.7 
 analysis when I upgrade to 5.x, which I hope to do soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2479 - Failure!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2479/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseG1GC

3 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores

Error Message:
ERROR: SolrIndexSearcher opens=51 closes=50

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=51 closes=50
at __randomizedtesting.SeedInfo.seed([71276CCB450E50CD]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:465)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232)
at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at java.lang.Thread.run(Thread.java:745)


FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: 1) 
Thread[id=9163, name=searcherExecutor-4396-thread-1, state=WAITING, 
group=TGRP-TestLazyCores] at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)   
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.core.TestLazyCores: 
   1) Thread[id=9163, name=searcherExecutor-4396-thread-1, state=WAITING, 
group=TGRP-TestLazyCores]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
at __randomizedtesting.SeedInfo.seed([71276CCB450E50CD]:0)


FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestLazyCores

Error Message:
There are still zombie threads that couldn't be terminated:1) 
Thread[id=9163, 

[jira] [Commented] (SOLR-7760) Fix method and field visibility for UIMAUpdateRequestProcessor and SolrUIMAConfiguration

2015-07-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634408#comment-14634408
 ] 

Hoss Man commented on SOLR-7760:


I understand very little about UIMA, but can you please elaborate on what you 
mean by "...they need to be for other code to be able to make use of the 
configuration data, ie: mapped fields..."?

(Ideally: include a testcase & mock/sample custom plugin demonstrating how you 
would take advantage of these new methods)

 Fix method and field visibility for UIMAUpdateRequestProcessor and 
 SolrUIMAConfiguration
 

 Key: SOLR-7760
 URL: https://issues.apache.org/jira/browse/SOLR-7760
 Project: Solr
  Issue Type: Improvement
  Components: contrib - UIMA
Affects Versions: 5x
Reporter: Aaron LaBella
Priority: Critical
 Fix For: 5.3

 Attachments: SOLR-7760.patch


 The methods in 
 {{solr/contrib/uima/src/java/org/apache/solr/uima/processor/SolrUIMAConfiguration.java}}
  are not public and they need to be for other code to be able to make use of 
 the configuration data, ie: mapped fields.   Likewise, 
 {{solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java}}
  does not have an accessor for the SolrUIMAConfiguration object



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7812) Need a playground to quickly test analyzer stacks

2015-07-20 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634224#comment-14634224
 ] 

Alexandre Rafalovitch commented on SOLR-7812:
-

Well, that was possible with a static schema too, really. Just rewrite the 
file, reload the core.

The issue is making a user-friendly UI. Which means:
*) Having a list of all possible analyzers
*) Having all their various options described/self-described
*) Running the same query through several stacks at once

Otherwise, it is not a playground but a slog. Hence the question of whether it 
is worth the effort to do that.
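
FWIW, much of the "run text through a stack" piece already exists via the 
analysis request handler. A hedged SolrJ sketch (the core URL and field type 
name are placeholders, and the exact SolrJ 5.x method names are from memory):

{noformat}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

public class AnalysisPlaygroundSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/techproducts")) {
      FieldAnalysisRequest req = new FieldAnalysisRequest()
          .addFieldType("text_general")    // the stack under test
          .setFieldValue("aaa bbb ccc")    // sample "document" text
          .setQuery("bbb");                // sample query text
      FieldAnalysisResponse rsp = req.process(client);
      // per-phase token dumps for both the index and query analyzers
      System.out.println(rsp.getFieldTypeAnalysis("text_general"));
    }
  }
}
{noformat}

The missing pieces really are the discovery/UI parts above, not the execution.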

 Need a playground to quickly test analyzer stacks
 -

 Key: SOLR-7812
 URL: https://issues.apache.org/jira/browse/SOLR-7812
 Project: Solr
  Issue Type: Wish
  Components: Schema and Analysis
Reporter: Alexandre Rafalovitch
Priority: Minor
  Labels: analyzers, beginners, usability

 (from email by Robert Oschler)
 (Would be useful to have)... a convenient playground for testing index and 
 query filters?
 I'm  imagining a utility where you can select a set of index and query
 filters, and then enter  a string as a test document and a query string
 and see what kind of scores come back during a matching attempt.  This
 would be a big aid in crafting an indexing/query scheme to get the desired
 matching profile working.  Otherwise the only technique I can think of is
 to iteratively modify the schema file and retest with the admin panel with
 each combination of filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y

2015-07-20 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-7804:
-
Summary: TestCloudPivotFacet failures: num pivots expected:X but was:Y  
(was: TestCloudPivotFacet failures: num pivots expected:X but was:X+/-1)

 TestCloudPivotFacet failures: num pivots expected:X but was:Y
 -

 Key: SOLR-7804
 URL: https://issues.apache.org/jira/browse/SOLR-7804
 Project: Solr
  Issue Type: Bug
  Components: faceting
Affects Versions: 5.3, Trunk
Reporter: Steve Rowe

 A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk 
 - here's one on trunk: 
 [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on 
 branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/].
 I reproduced another branch_5x failure from a few days ago (Jenkins job has 
 been removed already) on OS X, using both java7 and java8:
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestCloudPivotFacet -Dtests.method=test 
 -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA 
 -Dtests.timezone=America/El_Salvador -Dtests.asserts=true 
 -Dtests.file.encoding=UTF-8
[junit4] FAILURE 46.6s | TestCloudPivotFacet.test 
[junit4] Throwable #1: java.lang.AssertionError: 
 {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)}
  num pivots expected:2 but was:1
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y

2015-07-20 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634228#comment-14634228
 ] 

Steve Rowe commented on SOLR-7804:
--

Another trunk failure on Linux: 
[http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/901/] - reproduces for 
me on OS X, both on trunk and on branch_5x, the latter with both Java7 and 
Java8: 

{noformat}
   [junit4]   2 NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet 
-Dtests.method=test -Dtests.seed=957BC6861F510BE -Dtests.slow=true 
-Dtests.locale=sr_BA -Dtests.timezone=America/Guadeloupe -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] FAILURE 36.2s J3  | TestCloudPivotFacet.test 
   [junit4] Throwable #1: java.lang.AssertionError: 
{main(facet=truefacet.pivot=pivot_b%2Cpivot_f%2Cpivot_dt1facet.pivot=%7B%21stats%3Dst3%7Dpivot_td%2Cpivot_z_s1facet.limit=5facet.pivot.mincount=16facet.missing=truefacet.sort=indexfacet.overrequest.ratio=1.1832508),extra(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index)}
 == pivot_b,pivot_f,pivot_dt1: 
{params(rows=0),defaults({main(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index),extra(fq=-pivot_b%3A%5B*+TO+*%5D)})}
 expected:17 but was:22
   [junit4]at 
__randomizedtesting.SeedInfo.seed([957BC6861F510BE:810383B2CF097D46]:0)
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:281)
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228)
   [junit4]at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963)
   [junit4]at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938)
   [junit4]at java.lang.Thread.run(Thread.java:745)
   [junit4] Caused by: java.lang.AssertionError: 
pivot_b,pivot_f,pivot_dt1: 
{params(rows=0),defaults({main(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index),extra(fq=-pivot_b%3A%5B*+TO+*%5D)})}
 expected:17 but was:22
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:680)
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotData(TestCloudPivotFacet.java:335)
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:302)
   [junit4]at 
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:271)
   [junit4]... 42 more
{noformat}

 TestCloudPivotFacet failures: num pivots expected:X but was:Y
 -

 Key: SOLR-7804
 URL: https://issues.apache.org/jira/browse/SOLR-7804
 Project: Solr
  Issue Type: Bug
  Components: faceting
Affects Versions: 5.3, Trunk
Reporter: Steve Rowe

 A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk 
 - here's one on trunk: 
 [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on 
 branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/].
 I reproduced another branch_5x failure from a few days ago (Jenkins job has 
 been removed already) on OS X, using both java7 and java8:
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestCloudPivotFacet -Dtests.method=test 
 -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA 
 -Dtests.timezone=America/El_Salvador -Dtests.asserts=true 
 -Dtests.file.encoding=UTF-8
[junit4] FAILURE 46.6s | TestCloudPivotFacet.test 
[junit4] Throwable #1: java.lang.AssertionError: 
 {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)}
  num pivots expected:2 but was:1
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251)

[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y, also

2015-07-20 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-7804:
-
Summary: TestCloudPivotFacet failures: num pivots expected:X but was:Y, 
also   (was: TestCloudPivotFacet failures: num pivots expected:X but was:Y)

 TestCloudPivotFacet failures: num pivots expected:X but was:Y, also 
 

 Key: SOLR-7804
 URL: https://issues.apache.org/jira/browse/SOLR-7804
 Project: Solr
  Issue Type: Bug
  Components: faceting
Affects Versions: 5.3, Trunk
Reporter: Steve Rowe

 A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk 
 - here's one on trunk: 
 [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on 
 branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/].
 I reproduced another branch_5x failure from a few days ago (Jenkins job has 
 been removed already) on OS X, using both java7 and java8:
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestCloudPivotFacet -Dtests.method=test 
 -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA 
 -Dtests.timezone=America/El_Salvador -Dtests.asserts=true 
 -Dtests.file.encoding=UTF-8
[junit4] FAILURE 46.6s | TestCloudPivotFacet.test 
[junit4] Throwable #1: java.lang.AssertionError: 
 {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)}
  num pivots expected:2 but was:1
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7765) TokenizerChain without char filters cause NPE in luke request handler

2015-07-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-7765:
---
Attachment: SOLR-7765.patch

bq. I'm going to do a quick audit of all TokenizerChain clients to see where 
else null checks are currently being done that can be optimized away with this 
fix, and post an updated patch.

attached.

 TokenizerChain without char filters cause NPE in luke request handler
 -

 Key: SOLR-7765
 URL: https://issues.apache.org/jira/browse/SOLR-7765
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
Reporter: Konstantin Gribov
Assignee: Hoss Man
Priority: Minor
 Attachments: SOLR-7765.patch, SOLR-7765.patch, SOLR-7765.patch


 {{TokenizerChain}} created using 2-arg constructor has {{null}} in 
 {{charFilters}}, so {{LukeRequestHandler}} throws NPE on iterating it.
 Will create PR in a couple of minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: timeAllowed parameter ignored edge-case bug?

2015-07-20 Thread Chris Hostetter

: In the scenario outlined below, the second run's timeAllowed parameter 
: is unexpectedly ignored. Could this be intentionally so somehow (q vs. 
: fq processing?, Collector vs. LeafCollector?, DocList vs. DocSet?), or 
: is it an edge-case bug?

Based on your description (didn't re-review the code directly) it sounds 
like an oversight with timeAllowed -- probably overlooked because of the 
oddity of having a queryResultCache but not filterCache (correct me if i'm 
wrong, but it sounds like this bug won't surface if both queryResultsCache 
& filterCache are enabled -- or both disabled -- correct?) ... probably 
doesn't affect (m)any real users because of this.

Sounds like we should split out the build part of 
buildAndRunCollectorChain into its own method and re-use it in 
getDocSet (although it seems like that will almost certainly require some 
API changes to propagate the QueryCommand context down).
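
Something like this (hypothetical extracted helper; the name is illustrative, 
not what an eventual patch would use):

import org.apache.lucene.search.Collector;
import org.apache.lucene.search.TimeLimitingCollector;

class CollectorChainSketch {
  // the wrapping buildAndRunCollectorChain applies today, made reusable so
  // getDocSet can honor timeAllowed on the cache-hit path too
  static Collector wrapTimeAllowed(Collector primary, long timeAllowed) {
    if (timeAllowed > 0) {
      return new TimeLimitingCollector(primary,
          TimeLimitingCollector.getGlobalCounter(), timeAllowed);
    }
    return primary;
  }
}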

Christine: can you file this as a Jira so we don't lose track of it?

: 
: Regards,
: 
: Christine
: 
: ---
: 
: solrconfig characteristics:
:  * a queryResultsCache is configured
:  * no filterCache is configured
: 
: query characteristics:
:  * q parameter present
:  * at least one fq parameter present
:  * sort parameter present (and does not require the score field)
:  * GET_DOCSET flag is set e.g. via the StatsComponent i.e. stats=true 
parameter
: 
: runtime characteristics:
:  * first run of the query gets a queryResultsCache-miss and respects 
timeAllowed
:  * second run gets a queryResultsCache-hit and ignores timeAllowed (but still
:makes use of the lucene IndexSearcher)
: 
: code path execution details (first run):
: * SolrIndexSearcher.search calls getDocListC
: * getDocListC called queryResultCache.get which found nothing
: * getDocListC calls getDocListAndSetNC
: * getDocListAndSetNC calls buildAndRunCollectorChain
: * buildAndRunCollectorChain constructs TimeLimitingCollector
: 
: code path execution details (second run):
: * SolrIndexSearcher.search calls getDocListC
: * getDocListC called queryResultCache.get which found something
: * getDocListC calls getDocSet(ListQuery queries)
: * getDocSet(ListQuery queries) iterates over IndexSearcher.leafContexts
: -
: To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
: For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 

-Hoss
http://www.lucidworks.com/

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7815) Remove LuceneQueryOptimizer

2015-07-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634183#comment-14634183
 ] 

Hoss Man commented on SOLR-7815:


Linking to SOLR-1052 and SOLR-3093 for context.

In particular note that r922957 (March 2010) is where the code that used 
the optimizer was last removed, and after that SOLR-1052 dealt with the cleanup 
to remove the config parsing that enabled the optimizer.

bq. Here is a patch.

+1

 Remove LuceneQueryOptimizer
 ---

 Key: SOLR-7815
 URL: https://issues.apache.org/jira/browse/SOLR-7815
 Project: Solr
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: SOLR-7815.patch


 I noticed that I introduced a bug in this class when refactoring BooleanQuery 
 to be immutable (using the builder as a cache key instead of the query 
 itself). But then I noticed that this class is actually never used, so let's 
 remove it.
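 For illustration only (an editorial sketch, not the patch): a mutable builder 
 makes a broken cache key once the query itself is immutable.
 {noformat}
 import java.util.HashMap;
 import java.util.Map;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.BooleanClause;
 import org.apache.lucene.search.BooleanQuery;
 import org.apache.lucene.search.Query;
 import org.apache.lucene.search.TermQuery;

 public class CacheKeySketch {
   public static void main(String[] args) {
     Map<Object, String> cache = new HashMap<>();
     BooleanQuery.Builder b = new BooleanQuery.Builder();
     b.add(new TermQuery(new Term("f", "a")), BooleanClause.Occur.MUST);
     cache.put(b, "optimized form");    // keyed on the mutable builder...
     Query q = b.build();
     System.out.println(cache.get(q));  // ...so a lookup by the query misses: null
   }
 }
 {noformat}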



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7765) TokenizerChain without char filters cause NPE in luke request handler

2015-07-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-7765:
---
Attachment: SOLR-7765.patch

bq. I'll add a test to my PR.

Thanks!

I misunderstood what you meant before, but with the testcase you provided it 
all makes sense.

In my opinion, the root bug here is that TokenizerChain should be more explicit 
about what is allowed in its constructor, and more resilient to null args when 
things are optional -- that way callers like LukeRequestHandler don't have to 
constantly do null checks.

The attached patch fixes what I consider the root of the bug and gets your test 
to pass w/o modifying LukeRequestHandler.  It also adds more randomization to 
your test to cover more permutations of options, and updates MultiTermTest to 
account for the improved behavior of getCharFilterFactories() (which, as you 
can see from looking at that test, was annoyingly inconsistent before, 
depending on what analyzer was used and where it came from).

I'm going to do a quick audit of all TokenizerChain clients to see where else 
null checks are currently being done that can be optimized away with this fix, 
and post an updated patch.
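
Roughly what I have in mind (a sketch only, not the exact patch):

{noformat}
import org.apache.lucene.analysis.util.CharFilterFactory;
import org.apache.lucene.analysis.util.TokenFilterFactory;
import org.apache.lucene.analysis.util.TokenizerFactory;

public class TokenizerChainSketch {
  private final CharFilterFactory[] charFilters;
  private final TokenizerFactory tokenizer;
  private final TokenFilterFactory[] filters;

  public TokenizerChainSketch(CharFilterFactory[] charFilters,
                              TokenizerFactory tokenizer,
                              TokenFilterFactory[] filters) {
    if (tokenizer == null) {
      throw new NullPointerException("tokenizer factory is required");
    }
    // normalize the optional args so the getters never return null
    this.charFilters = charFilters == null ? new CharFilterFactory[0] : charFilters;
    this.filters = filters == null ? new TokenFilterFactory[0] : filters;
    this.tokenizer = tokenizer;
  }

  public CharFilterFactory[] getCharFilterFactories() {
    return charFilters; // callers like LukeRequestHandler can iterate safely
  }
}
{noformat}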



 TokenizerChain without char filters cause NPE in luke request handler
 -

 Key: SOLR-7765
 URL: https://issues.apache.org/jira/browse/SOLR-7765
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
Reporter: Konstantin Gribov
Assignee: Hoss Man
Priority: Minor
 Attachments: SOLR-7765.patch, SOLR-7765.patch


 {{TokenizerChain}} created using 2-arg constructor has {{null}} in 
 {{charFilters}}, so {{LukeRequestHandler}} throws NPE on iterating it.
 Will create PR in a couple of minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: expected:X but was:Y

2015-07-20 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-7804:
-
Summary: TestCloudPivotFacet failures: expected:X but was:Y  (was: 
TestCloudPivotFacet failures: num pivots expected:X but was:Y, also )

 TestCloudPivotFacet failures: expected:X but was:Y
 --

 Key: SOLR-7804
 URL: https://issues.apache.org/jira/browse/SOLR-7804
 Project: Solr
  Issue Type: Bug
  Components: faceting
Affects Versions: 5.3, Trunk
Reporter: Steve Rowe

 A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk 
 - here's one on trunk: 
 [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on 
 branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/].
 I reproduced another branch_5x failure from a few days ago (Jenkins job has 
 been removed already) on OS X, using both java7 and java8:
 {noformat}
[junit4]   2 NOTE: reproduce with: ant test  
 -Dtestcase=TestCloudPivotFacet -Dtests.method=test 
 -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA 
 -Dtests.timezone=America/El_Salvador -Dtests.asserts=true 
 -Dtests.file.encoding=UTF-8
[junit4] FAILURE 46.6s | TestCloudPivotFacet.test 
[junit4] Throwable #1: java.lang.AssertionError: 
 {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)}
  num pivots expected:2 but was:1
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251)
[junit4]  at 
 org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
[junit4]  at 
 org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
[junit4]  at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.7.0_80) - Build # 4924 - Failure!

2015-07-20 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4924/
Java: 64bit/jdk1.7.0_80 -XX:-UseCompressedOops -XX:+UseG1GC

2 tests failed.
FAILED:  org.apache.solr.search.TestSolr4Spatial2.testBBox

Error Message:
PermGen space

Stack Trace:
java.lang.OutOfMemoryError: PermGen space
at 
__randomizedtesting.SeedInfo.seed([6E0F96AE68F74794:170EDCADCF916BE9]:0)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at 
org.apache.solr.schema.BBoxField.getValueSourceFromSpatialArgs(BBoxField.java:183)
at 
org.apache.solr.schema.BBoxField.getValueSourceFromSpatialArgs(BBoxField.java:36)
at 
org.apache.solr.schema.AbstractSpatialFieldType.getQueryFromSpatialArgs(AbstractSpatialFieldType.java:338)
at 
org.apache.solr.schema.AbstractSpatialFieldType.getFieldQuery(AbstractSpatialFieldType.java:312)
at 
org.apache.solr.search.FieldQParserPlugin$1.parse(FieldQParserPlugin.java:50)
at org.apache.solr.search.QParser.getQuery(QParser.java:141)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:258)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
at org.apache.solr.util.TestHarness.query(TestHarness.java:320)
at org.apache.solr.util.TestHarness.query(TestHarness.java:302)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:829)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:798)
at 
org.apache.solr.search.TestSolr4Spatial2.testBBox(TestSolr4Spatial2.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627)


FAILED:  junit.framework.TestSuite.org.apache.solr.update.AddBlockUpdateTest

Error Message:
PermGen space

Stack Trace:
java.lang.OutOfMemoryError: PermGen space
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.instantiate(SlaveMain.java:228)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:188)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:310)
at 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12)




Build Log:
[...truncated 11469 lines...]
   [junit4] Suite: org.apache.solr.search.TestSolr4Spatial2
   [junit4]   2 Creating dataDir: 
C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.search.TestSolr4Spatial2_6E0F96AE68F74794-001\init-core-data-001
   [junit4]   2 3097174 INFO  
(SUITE-TestSolr4Spatial2-seed#[6E0F96AE68F74794]-worker) [] 
o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (false)
   [junit4]   2 3097175 INFO  
(SUITE-TestSolr4Spatial2-seed#[6E0F96AE68F74794]-worker) [] 
o.a.s.SolrTestCaseJ4 

[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents

2015-07-20 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634464#comment-14634464
 ] 

Noble Paul commented on SOLR-445:
-

I guess it would be better if we returned the whole command to the user 
instead of just the id.

 Update Handlers abort with bad documents
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Anshum Gupta
 Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, 
 SOLR-445-alternative.patch, SOLR-445-alternative.patch, 
 SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml


 Has anyone run into the problem of handling bad documents / failures 
 mid-batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
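 An illustrative sketch of option 2 (editorial and hypothetical, not one of 
 the attached patches): an update processor that records failing ids and lets 
 the rest of the batch through.
 {noformat}
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.solr.update.AddUpdateCommand;
 import org.apache.solr.update.processor.UpdateRequestProcessor;

 public class TolerantAddProcessorSketch extends UpdateRequestProcessor {
   private final List<String> failedIds = new ArrayList<>();

   public TolerantAddProcessorSketch(UpdateRequestProcessor next) {
     super(next);
   }

   @Override
   public void processAdd(AddUpdateCommand cmd) throws IOException {
     try {
       super.processAdd(cmd);               // hand off to the rest of the chain
     } catch (Exception e) {
       failedIds.add(cmd.getPrintableId()); // remember the bad doc, keep going
     }
   }

   public List<String> getFailedIds() {
     return failedIds; // report these to the client instead of aborting
   }
 }
 {noformat}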
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Failing CDCR tests

2015-07-20 Thread Erick Erickson
We're looking into these, and if we don't have something relatively soon
I'll disable them until we do. I suspect these are an artifact of the
test framework but don't know for sure just yet.

Please bear with us re: the noise for another day or two. If we don't
have something by then I'll disable the tests until we do.

Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

2015-07-20 Thread Nicholas Knize (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Knize updated LUCENE-6685:
---
Attachment: LUCENE-6685.patch

I put together a visualization of the ranges that were being created (will add 
the link to the video when I post it). This revealed some interesting issues. 
At precisionStep 6 and detailLevel 16, the number of ranges for the worst-case 
boundary condition was nearly 2 million, and 100-iteration beast tests would 
take just over an hour.  Reducing the precisionStep to 3 and the detailLevel to 
12 cut the number of ranges to just over 10K, and the 100-iteration beast test 
went from over an hour to just over 8 minutes. There was also a bug in the 
pointDistance query that added unnecessary high-resolution ranges that fell 
within the bounding box but outside the actual pointRadius.

 GeoPointInBBox/Distance queries should have safeguards
 --

 Key: LUCENE-6685
 URL: https://issues.apache.org/jira/browse/LUCENE-6685
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6685.patch


 These queries build a big list of term ranges, where the size of the list is 
 in proportion to how many cells of the space filling curve are crossed by 
 the perimeter of the query (I think?).
 This can easily be 100s of MBs for a big enough query ... not to mention slow 
 to enumerate (we still do this again for each segment).
 I think the queries should have safeguards, much like we have 
 maxDeterminizedStates for Automaton based queries, to prevent accidental 
 OOMEs.
 But I think longer term we should either change the ranges to be enumerated 
 on-demand and never stored in entirety (like NumericRangeTermsEnum), or 
 change the query so it has a fixed budget of how many cells it's allowed to 
 visit and then within a crossing cell it uses doc values to post-filter.
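 A minimal sketch of such a safeguard (the names, like maxRanges, are 
 hypothetical, by analogy with maxDeterminizedStates):
 {noformat}
 import java.util.ArrayList;
 import java.util.List;

 public class RangeBudgetSketch {
   static final int DEFAULT_MAX_RANGES = 100_000; // assumed default budget

   static void addRange(List<long[]> ranges, long min, long max) {
     addRange(ranges, min, max, DEFAULT_MAX_RANGES);
   }

   // called wherever the query collects another term range
   static void addRange(List<long[]> ranges, long min, long max, int maxRanges) {
     if (ranges.size() >= maxRanges) {
       throw new IllegalArgumentException(
           "geo query requires more than " + maxRanges + " term ranges");
     }
     ranges.add(new long[] { min, max });
   }
 }
 {noformat}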



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

2015-07-20 Thread Nicholas Knize (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634504#comment-14634504
 ] 

Nicholas Knize edited comment on LUCENE-6685 at 7/21/15 4:22 AM:
-

I put together a visualization of the ranges that were being created (will add 
the link to the video when I post it). This revealed some interesting issues. 
At precisionStep 6 and detailLevel 16, the number of ranges for the worst-case 
boundary condition was nearly 2 million, and 100-iteration beast tests would 
take just over an hour.  Reducing the precisionStep to 3 and the detailLevel to 
12 cut the number of ranges to just over 10K, and the 100-iteration beast test 
went from over an hour to just over 8 minutes. There was also a bug in the 
pointDistance query that added unnecessary high-resolution ranges that fell 
within the bounding box but outside the actual pointRadius.  Patch included.


was (Author: nknize):
I put together a visualization of the ranges that were being created (will add 
the link to the video when I post it). This revealed some interesting issues. 
At precisionStep 6 and detailLevel 16, the number of ranges for the worst-case 
boundary condition was nearly 2 million, and 100-iteration beast tests would 
take just over an hour.  Reducing the precisionStep to 3 and the detailLevel to 
12 cut the number of ranges to just over 10K, and the 100-iteration beast test 
went from over an hour to just over 8 minutes. There was also a bug in the 
pointDistance query that added unnecessary high-resolution ranges that fell 
within the bounding box but outside the actual pointRadius.

 GeoPointInBBox/Distance queries should have safeguards
 --

 Key: LUCENE-6685
 URL: https://issues.apache.org/jira/browse/LUCENE-6685
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6685.patch


 These queries build a big list of term ranges, where the size of the list is 
 in proportion to how many cells of the space filling curve are crossed by 
 the perimeter of the query (I think?).
 This can easily be 100s of MBs for a big enough query ... not to mention slow 
 to enumerate (we still do this again for each segment).
 I think the queries should have safeguards, much like we have 
 maxDeterminizedStates for Automaton based queries, to prevent accidental 
 OOMEs.
 But I think longer term we should either change the ranges to be enumerated 
 on-demand and never stored in entirety (like NumericRangeTermsEnum), or 
 change the query so it has a fixed budget of how many cells it's allowed to 
 visit and then within a crossing cell it uses doc values to post-filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards

2015-07-20 Thread Nicholas Knize (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Knize updated LUCENE-6685:
---
Attachment: LUCENE-6685.patch

 GeoPointInBBox/Distance queries should have safeguards
 --

 Key: LUCENE-6685
 URL: https://issues.apache.org/jira/browse/LUCENE-6685
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 5.3, Trunk

 Attachments: LUCENE-6685.patch, LUCENE-6685.patch


 These queries build a big list of term ranges, where the size of the list is 
 in proportion to how many cells of the space filling curve are crossed by 
 the perimeter of the query (I think?).
 This can easily be 100s of MBs for a big enough query ... not to mention slow 
 to enumerate (we still do this again for each segment).
 I think the queries should have safeguards, much like we have 
 maxDeterminizedStates for Automaton based queries, to prevent accidental 
 OOMEs.
 But I think longer term we should either change the ranges to be enumerated 
 on-demand and never stored in entirety (like NumericRangeTermsEnum), or 
 change the query so it has a fixed budget of how many cells it's allowed to 
 visit and then within a crossing cell it uses doc values to post-filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org