[JENKINS] Lucene-Solr-Tests-trunk-Java8 - Build # 226 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java8/226/ No tests ran. Build Log: [...truncated 10457 lines...] [junit4] Suite: org.apache.solr.core.SolrCoreCheckLockOnStartupTestFATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:742) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168) at com.sun.proxy.$Proxy59.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956) at hudson.Launcher$ProcStarter.join(Launcher.java:367) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756) at hudson.model.Build$BuildExecution.build(Build.java:198) at hudson.model.Build$BuildExecution.doRun(Build.java:159) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529) at hudson.model.Run.execute(Run.java:1706) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:232) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:805) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) at java.io.ObjectInputStream.init(ObjectInputStream.java:299) at hudson.remoting.ObjectInputStreamEx.init(ObjectInputStreamEx.java:40) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633103#comment-14633103 ] Anshum Gupta commented on SOLR-445: --- I'm seeing a few errors with the current patch and I think I know what's going on. I'll take a look at it and update the patch tomorrow. Update Handlers abort with bad documents Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Anshum Gupta Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid-batch? I.e.:
{code:lang=xml}
<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>
{code}
Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
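A common client-side mitigation, while the server either aborts mid-batch or fails the whole batch, is to retry a failed batch one document at a time so a single bad document cannot sink the rest. The SolrJ sketch below only illustrates that pattern; the core URL and the id field are hypothetical, and this is not the fix being discussed for SOLR-445.
{code:lang=java}
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TolerantBatchAdd {

  /** Try the whole batch first; if it aborts, fall back to per-document adds. */
  static void addTolerantly(SolrClient client, List<SolrInputDocument> batch)
      throws IOException, SolrServerException {
    try {
      client.add(batch);            // fast path: one request for the whole batch
    } catch (Exception batchFailure) {
      for (SolrInputDocument doc : batch) {
        try {
          client.add(doc);          // slow path: isolate the bad document(s)
        } catch (Exception docFailure) {
          // log and skip the offending document instead of aborting the batch
          System.err.println("Skipping bad doc " + doc.getFieldValue("id")
              + ": " + docFailure.getMessage());
        }
      }
    }
    client.commit();
  }

  public static void main(String[] args) throws Exception {
    // URL and core name are hypothetical
    try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
      // build a List<SolrInputDocument>, then: addTolerantly(client, batch);
    }
  }
}
{code}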
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 745 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/745/ No tests ran. Build Log: [...truncated 1019 lines...] FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:742) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168) at com.sun.proxy.$Proxy59.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956) at hudson.Launcher$ProcStarter.join(Launcher.java:367) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756) at hudson.model.Build$BuildExecution.build(Build.java:198) at hudson.model.Build$BuildExecution.doRun(Build.java:159) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529) at hudson.model.Run.execute(Run.java:1706) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:232) Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:805) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69) Caused by: java.io.IOException: Unexpected termination of the channel at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50) Caused by: java.io.EOFException at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) at java.io.ObjectInputStream.init(ObjectInputStream.java:299) at hudson.remoting.ObjectInputStreamEx.init(ObjectInputStreamEx.java:40) at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6686) Improve InfoStream API
[ https://issues.apache.org/jira/browse/LUCENE-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633099#comment-14633099 ] Dawid Weiss commented on LUCENE-6686: - This has really been reinvented over and over in logging APIs. The {{isEnabled(level)}} idiom is necessary when argument construction is complex and costly (so that you want to avoid it before the method call). Improve InfoStream API --- Key: LUCENE-6686 URL: https://issues.apache.org/jira/browse/LUCENE-6686 Project: Lucene - Core Issue Type: Improvement Reporter: Cao Manh Dat Currently, we use InfoStream in duplicated ways. For example:
{code}
if (infoStream.isEnabled("IW")) {
  infoStream.message("IW", "init: loaded commit \"" + commit.getSegmentsFileName() + "\"");
}
{code}
Can we change the API of InfoStream to
{code}
infoStream.messageIfEnabled(component, message);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
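For reference, the usual way logging APIs avoid the isEnabled/message pair without paying for message construction when logging is off is to pass the message lazily. The sketch below illustrates that idiom with a Java 8 Supplier; LazyInfoStream and the messageIfEnabled signature shown are hypothetical, not the API proposed in this issue.
{code:lang=java}
import java.util.function.Supplier;

// Hypothetical wrapper mirroring Lucene's InfoStream, illustrating lazy message construction.
abstract class LazyInfoStream {

  /** Same contract as InfoStream.isEnabled(String component). */
  public abstract boolean isEnabled(String component);

  /** Same contract as InfoStream.message(String component, String message). */
  public abstract void message(String component, String message);

  /** The string is only built if the component is actually enabled. */
  public final void messageIfEnabled(String component, Supplier<String> message) {
    if (isEnabled(component)) {
      message(component, message.get());
    }
  }
}

// Usage: the (potentially costly) concatenation runs only when "IW" logging is on.
//   infoStream.messageIfEnabled("IW",
//       () -> "init: loaded commit \"" + commit.getSegmentsFileName() + "\"");
{code}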
[jira] [Updated] (SOLR-7691) SolrEntityProcessor as SubEntity doesn't work with delta-import
[ https://issues.apache.org/jira/browse/SOLR-7691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Krebs updated SOLR-7691: -- Flags: Important SolrEntityProcessor as SubEntity doesn't work with delta-import --- Key: SOLR-7691 URL: https://issues.apache.org/jira/browse/SOLR-7691 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 5.0, 5.1, 5.2, 5.2.1 Reporter: Sebastian Krebs I've used the {{SolrEntityProcessor}} as sub-entity in the dataimporter like this:
{code:lang=xml}
<dataConfig>
  <document name="products">
    <entity name="outer" dataSource="my_datasource" pk="id"
            query="..." deltaQuery="..." deltaImportQuery="...">
      <entity name="solr" processor="SolrEntityProcessor"
              url="http://127.0.0.1:8983/solr/${solr.core.name}"
              query="Xid:${outer.Xid}" rows="1" fl="Id,FieldA,FieldB" wt="javabin"/>
    </entity>
  </document>
</dataConfig>
{code}
Recently I decided to upgrade to 5.x, but the delta-import stopped working. Overall, it looks like the http-connection used by the {{SolrEntityProcessor}} is closed right _after_ the request/response, because the first document is indexed properly and for the second connection the dataimport fetches the record from the database, but after that it exits. This is the stacktrace taken from the log: {code:lang=none} java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:444) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:482) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461) Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:363) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:224) ... 3 more Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414) ... 
5 more Caused by: java.lang.IllegalStateException: Connection pool shut down at org.apache.http.util.Asserts.check(Asserts.java:34) at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184) at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:217) at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:466) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.handler.dataimport.SolrEntityProcessor.doQuery(SolrEntityProcessor.java:198) at
[jira] [Updated] (SOLR-7803) Classloading deadlock in TrieField
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-7803: Attachment: SOLR-7803.patch I did a bit more refactoring: - rename DateUtils -> DateFormatUtil (the other name was somehow a confusing duplicate, so Eclipse autocomplete showed too much unspecific stuff). - I removed more formatting methods out of TrieDateField. TrieDateField is now like any other Trie(Long|Int|Double|Float)Field - short and compact. I will commit this later and add a backwards-compatibility layer in 5.x. All tests pass. Classloading deadlock in TrieField -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Solr instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the <clinit> of TrieField, where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no longer in <clinit>. For the future, someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
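The reported hang matches the classic cross-class static-initializer deadlock: each thread holds one class's initialization lock while waiting for the other's. The classes below are hypothetical stand-ins for TrieField/TrieDateField that only illustrate the mechanism; depending on thread timing the program may park forever inside class initialization.
{code:lang=java}
// Hypothetical reproduction of a cross-class static-initializer deadlock.
class FieldA {
  // <clinit> of FieldA triggers initialization of FieldB.
  static final FieldB DEFAULT = new FieldB();
}

class FieldB extends FieldA {
  // Initializing FieldB requires its superclass FieldA to be initialized first,
  // closing the cycle: A waits on B, B waits on A.
}

public class ClinitDeadlockDemo {
  public static void main(String[] args) {
    // Thread 1 starts initializing FieldA, thread 2 starts initializing FieldB.
    // If each grabs its class's init lock first, both block forever.
    new Thread(() -> FieldA.DEFAULT.hashCode()).start();
    new Thread(() -> new FieldB()).start();
  }
}
{code}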
[jira] [Updated] (LUCENE-6225) Clarify documentation of clone() in IndexInput
[ https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-6225: Fix Version/s: 5.3 Clarify documentation of clone() in IndexInput -- Key: LUCENE-6225 URL: https://issues.apache.org/jira/browse/LUCENE-6225 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 5.3, Trunk Attachments: LUCENE-6225.patch Here is a snippet from IndexInput's documentation: {code} The original instance must take care that cloned instances throw AlreadyClosedException when the original one is closed. {code} But concrete implementations don't throw this AlreadyClosedException (this would break the contract on Closeable). For example, see NIOFSDirectory: {code} public void close() throws IOException { if (!isClone) { channel.close(); } } {code} What trapped me was that the abstract class IndexInput overrides the default implementation of clone(), but doesn't do anything useful... I guess you could make it final and provide the tracking for cloned instances in this class rather than reimplementing it everywhere else (isCloned() would be a superclass method then too). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
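One possible shape of the idea floated in the description - tracking clone state once in the abstract class instead of in every implementation - is sketched below. This is a hypothetical illustration, not Lucene's actual IndexInput API; the class and method names are made up.
{code:lang=java}
import java.io.Closeable;
import java.io.IOException;

// Hypothetical base class: clone tracking lives here, not in each Directory implementation.
abstract class TrackedIndexInput implements Cloneable, Closeable {
  private boolean isClone = false;

  /** Subclasses check this instead of keeping their own isClone flag. */
  protected final boolean isClone() {
    return isClone;
  }

  @Override
  public TrackedIndexInput clone() {
    try {
      TrackedIndexInput copy = (TrackedIndexInput) super.clone();
      copy.isClone = true;   // clones never own the underlying resource
      return copy;
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // cannot happen: we implement Cloneable
    }
  }

  @Override
  public final void close() throws IOException {
    if (!isClone()) {
      doClose();             // only the original releases the file/channel
    }
  }

  /** Actually release the underlying resource (e.g. a FileChannel). */
  protected abstract void doClose() throws IOException;
}
{code}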
[jira] [Created] (LUCENE-6687) MLT term frequency calculation bug
Marko Bonaci created LUCENE-6687: Summary: MLT term frequency calculation bug Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {{q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009}} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
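The doubling described above can be paraphrased in a few lines: when the outer and inner loops run over the same set of fields, every term occurrence is accumulated once per configured field, so with two fields in {{qf}} each frequency comes out doubled. The sketch below is a simplified illustration of that shape and of a single-pass alternative, not the actual MoreLikeThis source.
{code:lang=java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DoubleCountSketch {

  // Shape of the problem: fieldNames and fields describe the same fields,
  // so the nested loops count every occurrence |fieldNames| times (twice for two fields).
  static Map<String, Integer> buggyRetrieveTerms(Map<String, List<String>> fields,
                                                 String[] fieldNames) {
    Map<String, Integer> termFreqMap = new HashMap<>();
    for (String fieldName : fieldNames) {            // outer loop: configured fields
      for (String field : fields.keySet()) {         // inner loop: the same fields again
        for (String term : fields.get(field)) {
          termFreqMap.merge(term, 1, Integer::sum);
        }
      }
    }
    return termFreqMap;
  }

  // Single pass over the provided fields: each occurrence is counted exactly once.
  static Map<String, Integer> fixedRetrieveTerms(Map<String, List<String>> fields) {
    Map<String, Integer> termFreqMap = new HashMap<>();
    for (List<String> terms : fields.values()) {
      for (String term : terms) {
        termFreqMap.merge(term, 1, Integer::sum);
      }
    }
    return termFreqMap;
  }
}
{code}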
[jira] [Updated] (SOLR-7803) Classloading deadlock in TrieField => refactor date formatting/parsing to static utility class
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-7803: Summary: Classloading deadlock in TrieField => refactor date formatting/parsing to static utility class (was: Classloading deadlock in TrieField) Classloading deadlock in TrieField => refactor date formatting/parsing to static utility class -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Solr instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the <clinit> of TrieField, where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no longer in <clinit>. For the future, someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_45) - Build # 13531 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13531/ Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests Error Message: Timeout waiting for CDCR replication to complete @source_collection:shard2 Stack Trace: java.lang.RuntimeException: Timeout waiting for CDCR replication to complete @source_collection:shard2 at __randomizedtesting.SeedInfo.seed([1EA7E284A8B756C3:16C797A8A7B97EC8]:0) at org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForReplicationToComplete(BaseCdcrDistributedZkTest.java:732) at org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTestUpdateLogSynchronisation(CdcrReplicationDistributedZkTest.java:362) at org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests(CdcrReplicationDistributedZkTest.java:50) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.8.0_45) - Build # 13530 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13530/ Java: 64bit/jdk1.8.0_45 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest Error Message: Captured an uncaught exception in thread: Thread[id=7905, name=RecoveryThread-source_collection_shard1_replica1, state=RUNNABLE, group=TGRP-CdcrReplicationHandlerTest] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=7905, name=RecoveryThread-source_collection_shard1_replica1, state=RUNNABLE, group=TGRP-CdcrReplicationHandlerTest] Caused by: org.apache.solr.common.cloud.ZooKeeperException: at __randomizedtesting.SeedInfo.seed([528FCA919F1E8CAB]:0) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:234) Caused by: org.apache.solr.common.SolrException: java.io.FileNotFoundException: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/jetty-001/cores/source_collection_shard1_replica1/data/tlog/tlog.007.1507203359819431936 (No such file or directory) at org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:244) at org.apache.solr.update.CdcrTransactionLog.incref(CdcrTransactionLog.java:173) at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1078) at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1578) at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610) at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:866) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:526) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227) Caused by: java.io.FileNotFoundException: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/jetty-001/cores/source_collection_shard1_replica1/data/tlog/tlog.007.1507203359819431936 (No such file or directory) at java.io.RandomAccessFile.open0(Native Method) at java.io.RandomAccessFile.open(RandomAccessFile.java:316) at java.io.RandomAccessFile.init(RandomAccessFile.java:243) at org.apache.solr.update.CdcrTransactionLog.reopenOutputStream(CdcrTransactionLog.java:236) ... 7 more Build Log: [...truncated 10973 lines...] 
[junit4] Suite: org.apache.solr.cloud.CdcrReplicationHandlerTest [junit4] 2 Creating dataDir: /home/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build/solr-core/test/J2/temp/solr.cloud.CdcrReplicationHandlerTest_528FCA919F1E8CAB-001/init-core-data-001 [junit4] 2 1046749 INFO (SUITE-CdcrReplicationHandlerTest-seed#[528FCA919F1E8CAB]-worker) [] o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (true) [junit4] 2 1046749 INFO (SUITE-CdcrReplicationHandlerTest-seed#[528FCA919F1E8CAB]-worker) [] o.a.s.BaseDistributedSearchTestCase Setting hostContext system property: / [junit4] 2 1046751 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.ZkTestServer STARTING ZK TEST SERVER [junit4] 2 1046751 INFO (Thread-3004) [] o.a.s.c.ZkTestServer client port:0.0.0.0/0.0.0.0:0 [junit4] 2 1046751 INFO (Thread-3004) [] o.a.s.c.ZkTestServer Starting server [junit4] 2 1046851 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.ZkTestServer start zk server on port:45204 [junit4] 2 1046851 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider [junit4] 2 1046852 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper [junit4] 2 1046854 INFO (zkCallback-796-thread-1) [] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f0677f3 name:ZooKeeperConnection Watcher:127.0.0.1:45204 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None [junit4] 2 1046854 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper [junit4] 2 1046854 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.SolrZkClient Using default ZkACLProvider [junit4] 2 1046854 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.SolrZkClient makePath: /solr [junit4] 2 1046856 INFO (TEST-CdcrReplicationHandlerTest.doTest-seed#[528FCA919F1E8CAB]) [] o.a.s.c.c.SolrZkClient Using default ZkCredentialsProvider [junit4] 2 1046856 INFO
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. 
Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png,
[jira] [Commented] (SOLR-7715) Remove IgnoreAcceptDocsQuery
[ https://issues.apache.org/jira/browse/SOLR-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633402#comment-14633402 ] Adrien Grand commented on SOLR-7715: I'll remove it shortly if there are no objections. Remove IgnoreAcceptDocsQuery Key: SOLR-7715 URL: https://issues.apache.org/jira/browse/SOLR-7715 Project: Solr Issue Type: Task Reporter: Adrien Grand Priority: Minor While reviewing how queries apply acceptDocs, I noticed that Solr has org.apache.solr.search.join.IgnoreAcceptDocsQuery, but it looks unused. Should we remove it? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Extracting article keywords using tf-idf algorithm
Hi again, It seems that my problem with the strange behavior of Solr is caused by the fact that I tried to update documents and add the keyword field inside the Lucene index directly (not through the SolrJ API) for the sake of better performance. But it seems that some processes are skipped by this way of modifying the index (which is obvious), and these processes that I am not aware of caused the inconsistency. One solution would be updating the index by adding a new document using SolrJ. As I mentioned, this solution is not the best one where performance is a concern (the indexing time would be doubled). Therefore it would be nice if there were a reliable solution available for my problem that also takes the performance concerns into account. Best regards. On Sat, Jul 18, 2015 at 9:40 PM, Ali Nazemian alinazem...@gmail.com wrote: Dear Diego, Hi, Yeah, exactly what I want. As Shawn said, it is an acronym for More Like This. Actually, since Lucene already did the hard work of calculating the interesting terms, I just want to use that for adding a multi-value field to all indexed documents. Best regards. On Sat, Jul 18, 2015 at 8:08 PM, Shawn Heisey apa...@elyograg.org wrote: On 7/18/2015 9:16 AM, Diego Ceccarelli wrote: Could you please post your code somewhere? I don't understand what is mlt :) This is an acronym that means More Like This. https://wiki.apache.org/solr/MoreLikeThis Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- A.Nazemian -- A.Nazemian
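A SolrJ-only way to do what the thread describes - fetch the MLT interesting terms for a document and write them back as a multi-valued keyword field via an atomic update - could look like the sketch below. The core URL, the /mlt handler path, and the field names are assumptions, and the shape of the interestingTerms entry may vary; whether this is fast enough for the indexing-time concern raised above is untested.
{code:lang=java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class KeywordEnricher {
  public static void main(String[] args) throws Exception {
    // Core URL, handler path, document id and field names are assumptions, not from the thread.
    try (HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/articles")) {
      // Ask a MoreLikeThis request handler (assumed registered at /mlt) for interesting terms.
      SolrQuery q = new SolrQuery("id:DOC-1");
      q.setRequestHandler("/mlt");
      q.set("mlt.fl", "title,body");
      q.set("mlt.interestingTerms", "list");
      QueryResponse rsp = solr.query(q);

      // The terms come back in the raw response under "interestingTerms".
      Object interesting = rsp.getResponse().get("interestingTerms");
      List<String> keywords = new ArrayList<>();
      if (interesting instanceof List) {
        for (Object t : (List<?>) interesting) {
          // entries may be prefixed with the field name ("body:term"); strip if needed
          keywords.add(String.valueOf(t));
        }
      }

      // Write them back as a multi-valued "keywords" field via an atomic update,
      // so only that field changes instead of re-sending the whole document.
      SolrInputDocument update = new SolrInputDocument();
      update.addField("id", "DOC-1");
      Map<String, Object> setOp = new HashMap<>();
      setOp.put("set", keywords);
      update.addField("keywords", setOp);
      solr.add(update);
      solr.commit();
    }
  }
}
{code}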
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. 
Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears
[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633229#comment-14633229 ] ASF subversion and git services commented on SOLR-7803: --- Commit 1691900 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1691900 ] SOLR-7803: Use Java 8 ThreadLocal Classloading deadlock in TrieField -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Sol instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the clinit of TrieField where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no more in clinit. For the future someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7719) Suggester Component results parsing
[ https://issues.apache.org/jira/browse/SOLR-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633241#comment-14633241 ] Alessandro Benedetti commented on SOLR-7719: Perfect Tommaso, thanks for the corrections! As I provided another similar patch, I actually missed applying the correction on my own on this one. Can we close the issue? Cheers Suggester Component results parsing --- Key: SOLR-7719 URL: https://issues.apache.org/jira/browse/SOLR-7719 Project: Solr Issue Type: Improvement Components: SolrJ Affects Versions: 5.2.1 Reporter: Alessandro Benedetti Assignee: Tommaso Teofili Priority: Minor Labels: queryResponse, suggester, suggestions Fix For: Trunk Attachments: SOLR-7719.patch, SOLR-7719.patch Original Estimate: 24h Remaining Estimate: 24h Currently SolrJ org.apache.solr.client.solrj.response.QueryResponse is not managing suggestions coming from the Suggest Component. It would be nice to have the suggestions properly managed and returned with simple getter methods. Current response:
{code:lang=xml}
<lst name="suggest">
  <lst name="dictionary1">
    <lst name="queryTerm">
      <int name="numFound">2</int>
      <arr name="suggestions">
        <lst><str name="term">suggestion1</str>...</lst>
        <lst><str name="term">suggestion2</str>...</lst>
      </arr>
    </lst>
  </lst>
</lst>
{code}
This will be parsed accordingly, producing an easy-to-use Java Map: Dictionary2suggestions -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
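A rough sketch of the kind of getter the issue asks for - walking the raw suggest section of a QueryResponse into a Map keyed by dictionary and query term - is shown below. It assumes the response shape quoted above and is only an illustration; SolrJ's eventual implementation may differ.
{code:lang=java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class SuggestResponseParser {

  /** dictionary -> (query term -> suggested terms), following the structure shown above. */
  @SuppressWarnings("unchecked")
  public static Map<String, Map<String, List<String>>> parse(QueryResponse rsp) {
    Map<String, Map<String, List<String>>> result = new HashMap<>();
    NamedList<Object> suggest = (NamedList<Object>) rsp.getResponse().get("suggest");
    if (suggest == null) {
      return result;                                  // no suggest component in the response
    }
    for (Map.Entry<String, Object> dictionary : suggest) {
      Map<String, List<String>> perTerm = new HashMap<>();
      NamedList<Object> queryTerms = (NamedList<Object>) dictionary.getValue();
      for (Map.Entry<String, Object> queryTerm : queryTerms) {
        List<String> terms = new ArrayList<>();
        NamedList<Object> body = (NamedList<Object>) queryTerm.getValue();
        List<Object> suggestions = (List<Object>) body.get("suggestions");
        if (suggestions != null) {
          for (Object s : suggestions) {
            // each entry is a small NamedList holding at least the "term" value
            terms.add(String.valueOf(((NamedList<Object>) s).get("term")));
          }
        }
        perTerm.put(queryTerm.getKey(), terms);
      }
      result.put(dictionary.getKey(), perTerm);
    }
    return result;
  }
}
{code}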
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Attachment: (was: solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png) MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Fix For: 5.2.2 Attachments: LUCENE-6687.patch, buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Attachment: solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Fix For: 5.2.2 Attachments: LUCENE-6687.patch, buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6225) Clarify documentation of clone() in IndexInput
[ https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633181#comment-14633181 ] ASF subversion and git services commented on LUCENE-6225: - Commit 1691888 from [~dawidweiss] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1691888 ] LUCENE-6225: Clarify documentation of clone/close in IndexInput. Clarify documentation of clone() in IndexInput -- Key: LUCENE-6225 URL: https://issues.apache.org/jira/browse/LUCENE-6225 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: Trunk Attachments: LUCENE-6225.patch Here is a snippet from IndexInput's documentation: {code} The original instance must take care that cloned instances throw AlreadyClosedException when the original one is closed. {code} But concrete implementations don't throw this AlreadyClosedException (this would break the contract on Closeable). For example, see NIOFSDirectory: {code} public void close() throws IOException { if (!isClone) { channel.close(); } } {code} What trapped me was that the abstract class IndexInput overrides the default implementation of clone(), but doesn't do anything useful... I guess you could make it final and provide the tracking for cloned instances in this class rather than reimplementing it everywhere else (isCloned() would be a superclass method then too). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[CI] Lucene 5x Linux 64 Test Only - Build # 56636 - Failure!
BUILD FAILURE Build URLhttp://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56636/ Project:lucene_linux_java8_64_test_only Date of build:Mon, 20 Jul 2015 07:16:22 +0200 Build duration:1 hr 0 min CHANGES No Changes CONSOLE OUTPUT [...truncated 204 lines...] at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) at ..remote call to ubuntu-14-64-8-metal(Native Method) at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356) at hudson.remoting.UserResponse.retrieve(UserRequest.java:221) at hudson.remoting.Channel.call(Channel.java:752) at hudson.FilePath.act(FilePath.java:978) at hudson.FilePath.act(FilePath.java:967) at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:89) at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:121) at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:138) at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:74) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761) at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721) at hudson.model.Build$BuildExecution.post2(Build.java:183) at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:670) at hudson.model.Run.execute(Run.java:1776) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:89) at hudson.model.Executor.run(Executor.java:240) [description-setter] Description set: $BUILD_DESC Email was triggered for: Failure - 1st Trigger Failure - Any was overridden by another trigger and will not send an email. Trigger Failure - Still was overridden by another trigger and will not send an email. Sending email for trigger: Failure - 1st - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. 
Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {{q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009}} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Attachment: terms-how.png terms-glass.png terms-angry.png terms-accumulator.png solr-mlt-tf-doubling-bug.png solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png solr-mlt-tf-doubling-bug-results.png buggy-method-usage.png MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. 
Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? MLT term frequency calculation bug -- Key: LUCENE-6687 URL:
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. 
Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - External issue URL: (was: https://docs.google.com/a/sematext.com/document/d/1oPjxj9dpw-sT2NhVN-HuFmCE_ouyrPNdDQLnCgfiyq8/edit?usp=sharing) MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Fix Version/s: 5.2.2 MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Fix For: 5.2.2 Attachments: LUCENE-6687.patch, buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
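Since the attached screenshots are not reproduced here, a minimal, self-contained sketch of the doubling pattern described in the report may help (hypothetical names, not the actual MoreLikeThis source; the "fixed" variant reflects the single-pass accumulation the reporter argues for):
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of the accumulation pattern the report describes
// (hypothetical names, NOT the actual MoreLikeThis source). Looping over the
// same field list in both the outer and the inner loop adds each field's term
// frequencies once per field, so with two fields a term that occurs 7 times
// ends up counted as 14.
public class TfDoublingSketch {

  static Map<String, Integer> accumulateBuggy(List<String> fieldNames,
                                              Map<String, Map<String, Integer>> docFields) {
    Map<String, Integer> termFreqMap = new HashMap<>();
    for (String ignored : fieldNames) {          // outer loop over the fields
      for (String field : fieldNames) {          // BUG: inner loop over the same fields
        Map<String, Integer> terms =
            docFields.getOrDefault(field, Collections.<String, Integer>emptyMap());
        for (Map.Entry<String, Integer> e : terms.entrySet()) {
          termFreqMap.merge(e.getKey(), e.getValue(), Integer::sum);
        }
      }
    }
    return termFreqMap;                          // every TF multiplied by fieldNames.size()
  }

  // Single-pass accumulation: each field's term frequencies are added exactly once.
  static Map<String, Integer> accumulateFixed(List<String> fieldNames,
                                              Map<String, Map<String, Integer>> docFields) {
    Map<String, Integer> termFreqMap = new HashMap<>();
    for (String field : fieldNames) {
      Map<String, Integer> terms =
          docFields.getOrDefault(field, Collections.<String, Integer>emptyMap());
      for (Map.Entry<String, Integer> e : terms.entrySet()) {
        termFreqMap.merge(e.getKey(), e.getValue(), Integer::sum);
      }
    }
    return termFreqMap;
  }
}
{code}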
[jira] [Resolved] (SOLR-7803) Classloading deadlock in TrieField
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved SOLR-7803. - Resolution: Fixed I committed and backported this, and added a backwards-compatibility layer. In trunk I also removed the custom ThreadLocal; {{ThreadLocal#withInitial(FORMAT_PROTOTYPE::clone)}} is much more elegant. If you see other class loading deadlocks in Solr startup, those can be caused by concurrent core initialization, which may be broken in certain cases. Please open separate issues for those. Classloading deadlock in TrieField -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Solr instance, it sometimes locks up. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further, into the clinit of TrieField, where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no longer in clinit. In the future, someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
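Uwe's remark about the custom ThreadLocal refers to the standard Java 8 idiom; a hedged sketch of the before/after shape (illustrative prototype and pattern only, not the actual Solr/TrieDateField or DateFormatUtil code):
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch of the idiom referenced in the resolution comment: replacing a
// hand-rolled ThreadLocal subclass with ThreadLocal.withInitial(...). The
// prototype object and its pattern here are illustrative, not the Solr code.
public class ThreadLocalIdiomSketch {

  private static final SimpleDateFormat FORMAT_PROTOTYPE;
  static {
    FORMAT_PROTOTYPE = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
    FORMAT_PROTOTYPE.setTimeZone(TimeZone.getTimeZone("UTC"));
  }

  // Pre-Java-8 style: an anonymous ThreadLocal subclass overriding initialValue().
  private static final ThreadLocal<SimpleDateFormat> OLD_STYLE =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          return (SimpleDateFormat) FORMAT_PROTOTYPE.clone();
        }
      };

  // Java 8 style: withInitial with a supplier that clones the shared prototype.
  private static final ThreadLocal<SimpleDateFormat> NEW_STYLE =
      ThreadLocal.withInitial(() -> (SimpleDateFormat) FORMAT_PROTOTYPE.clone());

  public static String formatNow() {
    // Each thread gets its own clone, since SimpleDateFormat is not thread-safe.
    return NEW_STYLE.get().format(new Date());
  }
}
{code}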
[jira] [Commented] (LUCENE-6225) Clarify documentation of clone() in IndexInput
[ https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633194#comment-14633194 ] ASF subversion and git services commented on LUCENE-6225: - Commit 1691892 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1691892 ] LUCENE-6225: Clarify documentation of clone/close in IndexInput. Clarify documentation of clone() in IndexInput -- Key: LUCENE-6225 URL: https://issues.apache.org/jira/browse/LUCENE-6225 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 5.3, Trunk Attachments: LUCENE-6225.patch Here is a snippet from IndexInput's documentation: {code} The original instance must take care that cloned instances throw AlreadyClosedException when the original one is closed. {code} But concrete implementations don't throw this AlreadyClosedException (this would break the contract on Closeable). For example, see NIOFSDirectory: {code} public void close() throws IOException { if (!isClone) { channel.close(); } } {code} What trapped me was that the abstract class IndexInput overrides the default implementation of clone(), but doesn't do anything useful... I guess you could make it final and provide the tracking for cloned instances in this class rather than reimplementing it everywhere else (isCloned() would be a superclass method then too). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6225) Clarify documentation of clone() in IndexInput
[ https://issues.apache.org/jira/browse/LUCENE-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-6225. - Resolution: Fixed Clarify documentation of clone() in IndexInput -- Key: LUCENE-6225 URL: https://issues.apache.org/jira/browse/LUCENE-6225 Project: Lucene - Core Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 5.3, Trunk Attachments: LUCENE-6225.patch Here is a snippet from IndexInput's documentation: {code} The original instance must take care that cloned instances throw AlreadyClosedException when the original one is closed. {code} But concrete implementations don't throw this AlreadyClosedException (this would break the contract on Closeable). For example, see NIOFSDirectory: {code} public void close() throws IOException { if (!isClone) { channel.close(); } } {code} What trapped me was that the abstract class IndexInput overrides the default implementation of clone(), but doesn't do anything useful... I guess you could make it final and provide the tracking for cloned instances in this class rather than reimplementing it everywhere else (isCloned() would be a superclass method then too). Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
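Dawid's suggestion, making clone() final and tracking cloned instances in the base class, can be sketched as follows (an illustrative shape only, not the actual IndexInput implementation). As the issue notes, clones reading after the original is closed typically fail with whatever the shared resource throws rather than a guaranteed AlreadyClosedException:
{code:java}
import java.io.Closeable;
import java.io.IOException;

// Hedged sketch of the clone-tracking idea floated in the issue (not the real
// Lucene code): the base class records whether an instance is a clone, and
// close() releases the underlying resource only for the original instance.
abstract class TrackedInput implements Cloneable, Closeable {

  private boolean isClone; // false for the original, true for clones

  public final boolean isClone() {
    return isClone;
  }

  @Override
  public TrackedInput clone() {
    try {
      TrackedInput copy = (TrackedInput) super.clone();
      copy.isClone = true; // clones share the resource but never own it
      return copy;
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // cannot happen: this class implements Cloneable
    }
  }

  @Override
  public final void close() throws IOException {
    if (!isClone) {
      doClose(); // only the original releases the shared resource
    }
  }

  protected abstract void doClose() throws IOException;
}
{code}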
[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633205#comment-14633205 ] ASF subversion and git services commented on SOLR-7803: --- Commit 1691893 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1691893 ] SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil Classloading deadlock in TrieField -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Sol instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the clinit of TrieField where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no more in clinit. For the future someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Description: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? was: In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. 
Here's the proof: These are the final results of the calculation: And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. Let's see what happens when we use {{mintf=15}}: Bug, no? MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png,
[jira] [Commented] (SOLR-7803) Classloading deadlock in TrieField
[ https://issues.apache.org/jira/browse/SOLR-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633219#comment-14633219 ] ASF subversion and git services commented on SOLR-7803: --- Commit 1691898 from [~thetaphi] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1691898 ] Merged revision(s) 1691893 from lucene/dev/trunk: SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil (includes bw layer) Classloading deadlock in TrieField -- Key: SOLR-7803 URL: https://issues.apache.org/jira/browse/SOLR-7803 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Environment: OSX, JDK8u45 Reporter: Markus Heiden Assignee: Uwe Schindler Labels: patch Fix For: 5.3, Trunk Attachments: SOLR-7803.patch, SOLR-7803.patch, TrieField.patch When starting a test Sol instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the clinit of TrieField where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField. This does not completely remove the circular dependency, but at least it is no more in clinit. For the future someone may extract a util class to remove the circular dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Flags: Patch,Important Lucene Fields: New,Patch Available (was: New) MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6687) MLT term frequency calculation bug
[ https://issues.apache.org/jira/browse/LUCENE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marko Bonaci updated LUCENE-6687: - Attachment: LUCENE-6687.patch MLT term frequency calculation bug -- Key: LUCENE-6687 URL: https://issues.apache.org/jira/browse/LUCENE-6687 Project: Lucene - Core Issue Type: Bug Components: core/query/scoring, core/queryparser Affects Versions: 5.2.1, Trunk Environment: OS X v10.10.4; Solr 5.2.1 Reporter: Marko Bonaci Attachments: LUCENE-6687.patch, buggy-method-usage.png, solr-mlt-tf-doubling-bug-results.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png, solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png, solr-mlt-tf-doubling-bug.png, terms-accumulator.png, terms-angry.png, terms-glass.png, terms-how.png In {{org.apache.lucene.queries.mlt.MoreLikeThis}}, there's a method {{retrieveTerms}} that receives a {{Map}} of fields, i.e. a document basically, but it doesn't have to be an existing doc. !solr-mlt-tf-doubling-bug.png|height=500! There are 2 for loops, one inside the other, which both loop through the same set of fields. That effectively doubles the term frequency for all the terms from fields that we provide in MLT QP {{qf}} parameter. It basically goes two times over the list of fields and accumulates the term frequencies from all fields into {{termFreqMap}}. The private method {{retrieveTerms}} is only called from one public method, the version of overloaded method {{like}} that receives a Map: so that private class member {{fieldNames}} is always derived from {{retrieveTerms}}'s argument {{fields}}. Uh, I don't understand what I wrote myself, but that basically means that, by the time {{retrieveTerms}} method gets called, its parameter fields and private member {{fieldNames}} always contain the same list of fields. Here's the proof: These are the final results of the calculation: !solr-mlt-tf-doubling-bug-results.png|height=700! And this is the actual {{thread_id:TID0009}} document, where those values were derived from (from fields {{title_mlt}} and {{pagetext_mlt}}): !terms-glass.png|height=100! !terms-angry.png|height=100! !terms-how.png|height=100! !terms-accumulator.png|height=100! Now, let's further test this hypothesis by seeing MLT QP in action from the AdminUI. Let's try to find docs that are More Like doc {{TID0009}}. Here's the interesting part, the query: {code} q={!mlt qf=pagetext_mlt,title_mlt mintf=14 mindf=2 minwl=3 maxwl=15}TID0009 {code} We just saw, in the last image above, that the term accumulator appears {{7}} times in {{TID0009}} doc, but the {{accumulator}}'s TF was calculated as {{14}}. By using {{mintf=14}}, we say that, when calculating similarity, we don't want to consider terms that appear less than 14 times (when terms from fields {{title_mlt}} and {{pagetext_mlt}} are merged together) in {{TID0009}}. I added the term accumulator in only one other document ({{TID0004}}), where it appears only once, in the field {{title_mlt}}. !solr-mlt-tf-doubling-bug-verify-accumulator-mintf14.png|height=500! Let's see what happens when we use {{mintf=15}}: !solr-mlt-tf-doubling-bug-verify-accumulator-mintf15.png|height=500! I should probably mention that multiple fields ({{qf}}) work because I applied the patch: [SOLR-7143|https://issues.apache.org/jira/browse/SOLR-7143]. Bug, no? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6688) Apply deletes by query using the Query API instead of the Filter API
Adrien Grand created LUCENE-6688: Summary: Apply deletes by query using the Query API instead of the Filter API Key: LUCENE-6688 URL: https://issues.apache.org/jira/browse/LUCENE-6688 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor BufferedUpdatesStream still uses QueryWrapperFilter to delete documents by query instead of the Weight/Scorer APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
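For context on what moving away from QueryWrapperFilter looks like, here is a self-contained sketch of the Weight/Scorer-style pattern: a per-segment iterator over matching doc ids drives the deletions directly, instead of materializing a Filter/DocIdSet first. The mini-interfaces below are hypothetical stand-ins, not the real Lucene API:
{code:java}
import java.util.List;

// Hypothetical stand-ins for the Weight/Scorer pattern (NOT the Lucene API):
// the query is compiled once into a per-segment "weight", and each segment's
// matching doc ids are streamed and marked deleted as they are produced.
public class DeleteByQuerySketch {

  interface DocIdIterator {                 // stand-in for a per-segment Scorer
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();                          // advances to the next matching doc id
  }

  interface SegmentWeight {                 // stand-in for Weight.scorer(segment)
    DocIdIterator matches(int segmentOrd);  // null if the segment has no matches
  }

  static int applyDeletes(SegmentWeight weight, int segmentCount,
                          List<List<Integer>> deletedPerSegment) {
    int deleted = 0;
    for (int seg = 0; seg < segmentCount; seg++) {
      DocIdIterator it = weight.matches(seg);
      if (it == null) continue;             // query matches nothing in this segment
      List<Integer> sink = deletedPerSegment.get(seg);
      for (int doc = it.nextDoc(); doc != DocIdIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        sink.add(doc);                      // real code would clear a live-docs bit here
        deleted++;
      }
    }
    return deleted;
  }
}
{code}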
[jira] [Commented] (LUCENE-6674) J9 assertion / crash in tests
[ https://issues.apache.org/jira/browse/LUCENE-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633451#comment-14633451 ] Brijesh Nekkare commented on LUCENE-6674: - To root-cause this issue, we need the following diagnostics created during the assertion failure: core.20150710.084504.3376.0001.dmp, javacore.20150710.084504.3376.0002.txt, and Snap.20150710.084504.3376.0003.trc. Thanks and regards, Brijesh Nekkare, IBM JRE team J9 assertion / crash in tests - Key: LUCENE-6674 URL: https://issues.apache.org/jira/browse/LUCENE-6674 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir {quote} 06:45:04.031 0x2518500j9mm.107* ** ASSERTION FAILED ** at ParallelScavenger.cpp:3053: ((false (_extensions-objectModel.isRemembered(objectPtr {quote} http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/55153/consoleFull -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_60-ea-b21) - Build # 13532 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13532/ Java: 32bit/jdk1.8.0_60-ea-b21 -server -XX:+UseParallelGC 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestSolrConfigHandlerCloud Error Message: ERROR: SolrIndexSearcher opens=281 closes=280 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=281 closes=280 at __randomizedtesting.SeedInfo.seed([58EABF133078A799]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:465) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestSolrConfigHandlerCloud Error Message: 2 threads leaked from SUITE scope at org.apache.solr.handler.TestSolrConfigHandlerCloud: 1) Thread[id=2350, name=qtp25410446-2350, state=RUNNABLE, group=TGRP-TestSolrConfigHandlerCloud] at java.util.WeakHashMap.get(WeakHashMap.java:403) at org.apache.solr.servlet.cache.HttpCacheHeaderUtil.calcEtag(HttpCacheHeaderUtil.java:101) at org.apache.solr.servlet.cache.HttpCacheHeaderUtil.doCacheHeaderValidation(HttpCacheHeaderUtil.java:219) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:445) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:106) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at
Re: [CI] Lucene 5x Linux 64 Test Only - Build # 56476 - Failure!
I committed a fix for it. The test expected a bounded number of segments but did nothing to ensure it. I had to run the test several times to manage to reproduce it because it depended on the number of segments in the index, which depended on whether a concurrent merge was finished or not at the time when the index reader was open. On Sun, Jul 19, 2015 at 1:51 AM, bu...@elastic.co wrote: *BUILD FAILURE* Build URL http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/ Project:lucene_linux_java8_64_test_only Randomization: JDK8,local,heap[740m],-server +UseG1GC +UseCompressedOops,sec manager on Date of build:Sun, 19 Jul 2015 01:44:10 +0200 Build duration:7 min 30 sec *CHANGES* No Changes *BUILD ARTIFACTS* checkout/lucene/build/facet/test/temp/junit4-J0-20150719_015122_803.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J0-20150719_015122_803.events checkout/lucene/build/facet/test/temp/junit4-J1-20150719_015122_804.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J1-20150719_015122_804.events checkout/lucene/build/facet/test/temp/junit4-J2-20150719_015122_808.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J2-20150719_015122_808.events checkout/lucene/build/facet/test/temp/junit4-J3-20150719_015122_809.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J3-20150719_015122_809.events checkout/lucene/build/facet/test/temp/junit4-J4-20150719_015122_809.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J4-20150719_015122_809.events checkout/lucene/build/facet/test/temp/junit4-J5-20150719_015122_809.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J5-20150719_015122_809.events checkout/lucene/build/facet/test/temp/junit4-J6-20150719_015122_809.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J6-20150719_015122_809.events checkout/lucene/build/facet/test/temp/junit4-J7-20150719_015122_809.events http://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56476/artifact/checkout/lucene/build/facet/test/temp/junit4-J7-20150719_015122_809.events *FAILED JUNIT TESTS* Name: org.apache.lucene.facet Failed: 1 test(s), Passed: 22 test(s), Skipped: 0 test(s), Total: 23 test(s) *Failed: org.apache.lucene.facet.TestRandomSamplingFacetsCollector.testRandomSampling * *CONSOLE OUTPUT* [...truncated 8655 lines...] [junit4] [junit4] [junit4] JVM J0: 0.88 .. 14.26 = 13.38s [junit4] JVM J1: 0.90 .. 13.71 = 12.81s [junit4] JVM J2: 0.87 .. 15.17 = 14.30s [junit4] JVM J3: 0.87 .. 9.98 = 9.11s [junit4] JVM J4: 1.11 .. 12.64 = 11.53s [junit4] JVM J5: 0.87 .. 9.98 = 9.11s [junit4] JVM J6: 1.11 .. 11.94 = 10.83s [junit4] JVM J7: 0.87 .. 
11.66 = 10.80s [junit4] Execution time total: 15 seconds [junit4] Tests summary: 23 suites, 155 tests, 1 error BUILD FAILED /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/build.xml:469: The following error occurred while executing this line: /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:2240: The following error occurred while executing this line: /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/module-build.xml:58: The following error occurred while executing this line: /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:1444: The following error occurred while executing this line: /home/jenkins/workspace/lucene_linux_java8_64_test_only/checkout/lucene/common-build.xml:999: There were test failures: 23 suites, 155 tests, 1 error Total time: 7 minutes 10 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results [description-setter] Description set: JDK8,local,heap[740m],-server +UseG1GC +UseCompressedOops,sec manager on Email was triggered for: Failure - 1st Trigger Failure - Any was overridden by another trigger and will not send an email. Trigger Failure - Still was overridden by another trigger and will not send an email. Sending email for trigger: Failure - 1st - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Adrien
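Adrien's explanation above points at a common source of flakiness: asserting on segment counts while concurrent merges may still be in flight. A hedged sketch of the usual way to pin the segment count before opening the reader (assuming Lucene 5.x-era APIs; illustrative, not the committed test change):
{code:java}
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hedged sketch (not the committed fix itself): if a test's assertions depend on
// the number of segments, force that number before opening the reader, so a
// background merge finishing earlier or later cannot change the outcome.
public class BoundedSegmentsSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
      for (int i = 0; i < 1000; i++) {
        Document doc = new Document();
        doc.add(new TextField("body", "doc " + i, Field.Store.NO));
        writer.addDocument(doc);
      }
      writer.forceMerge(1);   // guarantees a single segment regardless of merge timing
      writer.commit();
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      // reader.leaves().size() is now deterministically 1, so per-segment
      // assertions cannot flake on merge scheduling.
      System.out.println("segments: " + reader.leaves().size());
    }
  }
}
{code}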
[jira] [Updated] (SOLR-7815) Remove LuceneQueryOptimizer
[ https://issues.apache.org/jira/browse/SOLR-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated SOLR-7815: --- Attachment: SOLR-7815.patch Here is a patch. Remove LuceneQueryOptimizer --- Key: SOLR-7815 URL: https://issues.apache.org/jira/browse/SOLR-7815 Project: Solr Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: SOLR-7815.patch I noticed that I introduced a bug in this class when refactoring BooleanQuery to be immutable (using the builder as a cache key instead of the query itself). But then I noticed that this class is actually never used, so let's remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7815) Remove LuceneQueryOptimizer
Adrien Grand created SOLR-7815: -- Summary: Remove LuceneQueryOptimizer Key: SOLR-7815 URL: https://issues.apache.org/jira/browse/SOLR-7815 Project: Solr Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor I noticed that I introduced a bug in this class when refactoring BooleanQuery to be immutable (using the builder as a cache key instead of the query itself). But then I noticed that this class is actually never used, so let's remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
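The parenthetical above is the crux: a mutable builder makes a poor cache key. A generic illustration (plain JDK collections, not the Solr code) of why such a cache silently stops working once the key mutates:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic illustration of caching on a mutable object: once the key mutates,
// its hashCode no longer matches the bucket it was stored under, and a fresh
// equal-looking probe fails the equals() check against the mutated stored key.
public class MutableCacheKeySketch {
  public static void main(String[] args) {
    Map<List<String>, String> cache = new HashMap<>();

    List<String> builder = new ArrayList<>();   // stand-in for a mutable query builder
    builder.add("field:foo");
    cache.put(builder, "rewritten(field:foo)"); // stored under the builder's current hash

    builder.add("field:bar");                   // the builder keeps being mutated

    // The builder's hashCode has changed, so this probe misses the bucket
    // where the entry was stored:
    System.out.println(cache.get(builder));                       // null
    // A fresh probe equal to the original key finds the old bucket but fails
    // equals() against the mutated stored key:
    System.out.println(cache.get(Arrays.asList("field:foo")));    // null
    // Keying the cache on the immutable, finished query avoids both failure modes.
  }
}
{code}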
[jira] [Updated] (LUCENE-6688) Apply deletes by query using the Query API instead of the Filter API
[ https://issues.apache.org/jira/browse/LUCENE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6688: - Attachment: LUCENE-6688.patch Here is a patch. Apply deletes by query using the Query API instead of the Filter API Key: LUCENE-6688 URL: https://issues.apache.org/jira/browse/LUCENE-6688 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6688.patch BufferedUpdatesStream still uses QueryWrapperFilter to delete documents by query instead of the Weight/Scorer APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7692) Implement BasicAuth based impl for the new Authentication/Authorization APIs
[ https://issues.apache.org/jira/browse/SOLR-7692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-7692: - Attachment: SOLR-7692.patch I plan to commit this pretty soon. All inputs/comments are welcome Implement BasicAuth based impl for the new Authentication/Authorization APIs Key: SOLR-7692 URL: https://issues.apache.org/jira/browse/SOLR-7692 Project: Solr Issue Type: New Feature Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch, SOLR-7692.patch This involves various components h2. Authentication A basic auth based authentication filter. This should retrieve the user credentials from ZK. The user name and sha1 hash of password should be stored in ZK sample authentication json {code:javascript} { authentication:{ class: solr.BasicAuthPlugin, users :{ john :09fljnklnoiuy98 buygujkjnlk, david:f678njfgfjnklno iuy9865ty, pete: 87ykjnklndfhjh8 98uyiy98, } } } {code} h2. authorization plugin This would store the roles of various users and their privileges in ZK sample authorization.json {code:javascript} { authorization: { class: solr.ZKAuthorization, roles :{ admin : [john] guest : [john, david,pete] } permissions: { collection-edit: { role: admin }, coreadmin:{ role:admin }, config-edit: { //all collections role: admin, method:POST }, schema-edit: { roles: admin, method:POST }, update: { //all collections role: dev }, mycoll_update: { collection: mycoll, path:[/update/*], role: [somebody] } } } } {code} We will also need to provide APIs to create users and assign them roles -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
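The sample configs above store a user name alongside a SHA-1 hash of the password, and the hash strings shown are placeholders. As a hedged sketch only (the plugin's final hashing, salting, and encoding scheme may well differ), this is one plain way to produce such a hash with the JDK:
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hedged sketch only: demonstrates producing a SHA-1 password hash as the issue
// description implies. The Base64 encoding choice is an assumption, and the
// actual BasicAuthPlugin format may differ from this.
public class PasswordHashSketch {
  static String sha1Base64(String password) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new AssertionError("SHA-1 is a mandatory JDK algorithm", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("john -> " + sha1Base64("johns-password"));
  }
}
{code}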
[GitHub] lucene-solr pull request: Fix incorrect link to Levenshtein distan...
GitHub user Xaerxess opened a pull request: https://github.com/apache/lucene-solr/pull/190 Fix incorrect link to Levenshtein distance This is a small fix in documentation, please let me know if Github's pull request is sufficient for merging into trunk. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Xaerxess/lucene-solr patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/190.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #190 commit 09a46c967f3751b20f565b4b9ca54a6c2da6cbb5 Author: Grzegorz Rożniecki xaerx...@gmail.com Date: 2015-07-20T11:47:44Z Fix incorrect link to Levenshtein distance --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0_60-ea-b21) - Build # 13533 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13533/ Java: 32bit/jdk1.8.0_60-ea-b21 -client -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest Error Message: There are still nodes recoverying - waited for 330 seconds Stack Trace: java.lang.AssertionError: There are still nodes recoverying - waited for 330 seconds at __randomizedtesting.SeedInfo.seed([13BE54822BF4B54F:B4FAEC26464FA6F6]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:172) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:133) at org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:128) at org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForRecoveriesToFinish(BaseCdcrDistributedZkTest.java:465) at org.apache.solr.cloud.BaseCdcrDistributedZkTest.clearSourceCollection(BaseCdcrDistributedZkTest.java:319) at org.apache.solr.cloud.CdcrReplicationHandlerTest.doTestPartialReplicationAfterPeerSync(CdcrReplicationHandlerTest.java:158) at org.apache.solr.cloud.CdcrReplicationHandlerTest.doTest(CdcrReplicationHandlerTest.java:53) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.
[ https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633626#comment-14633626 ] ASF subversion and git services commented on SOLR-7810: --- Commit 1691947 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1691947 ] SOLR-7810: map-reduce contrib script to set classpath for convenience refers to example rather than server. mapreduce contrib script to set classpath for convenience refers to example rather than server. Key: SOLR-7810 URL: https://issues.apache.org/jira/browse/SOLR-7810 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.
[ https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633622#comment-14633622 ] ASF subversion and git services commented on SOLR-7810: --- Commit 1691946 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1691946 ] SOLR-7810: map-reduce contrib script to set classpath for convenience refers to example rather than server. mapreduce contrib script to set classpath for convenience refers to example rather than server. Key: SOLR-7810 URL: https://issues.apache.org/jira/browse/SOLR-7810 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-7810) mapreduce contrib script to set classpath for convenience refers to example rather than server.
[ https://issues.apache.org/jira/browse/SOLR-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-7810. --- Resolution: Fixed Fix Version/s: Trunk 5.3 mapreduce contrib script to set classpath for convenience refers to example rather than server. Key: SOLR-7810 URL: https://issues.apache.org/jira/browse/SOLR-7810 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.3, Trunk -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b60) - Build # 13534 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13534/ Java: 32bit/jdk1.9.0-ea-b60 -client -XX:+UseG1GC -Djava.locale.providers=JRE,SPI 1 tests failed. FAILED: org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test Error Message: this writer hit an unrecoverable error; cannot commit Stack Trace: java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit at __randomizedtesting.SeedInfo.seed([58B144653E398529:D0E57BBF90C5E8D1]:0) at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2777) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2963) at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1066) at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1109) at org.apache.lucene.index.TestIndexWriterOutOfFileDescriptors.test(TestIndexWriterOutOfFileDescriptors.java:87) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.file.NoSuchFileException: a random IOException (_1c.nvd) at org.apache.lucene.store.MockDirectoryWrapper.maybeThrowIOExceptionOnOpen(MockDirectoryWrapper.java:458) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:635) at
[jira] [Commented] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
[ https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633715#comment-14633715 ] Shawn Heisey commented on LUCENE-6689: -- The reason that phrase searches don't match after LUCENE-5111 is that the query analysis on my real fieldType is slightly different -- catenateWords, catenateNumbers, and preserveOriginal are all disabled on the query analysis. With those settings and the previously given input of aaa-bbb: ccc, aaa ends up at position 1 and bbb at position 2, which is not the same as the index analysis with the settings above. Odd analysis problem with WDF, appears to be triggered by preceding analysis components --- Key: LUCENE-6689 URL: https://issues.apache.org/jira/browse/LUCENE-6689 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.8 Reporter: Shawn Heisey This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6690) Speed up MultiTermsEnum.next()
[ https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6690: - Attachment: OrdinalMapBuildBench.java Here is the benchmark I've been using. It's certainly not great but I don't think it's too bad either. :) Speed up MultiTermsEnum.next() -- Key: LUCENE-6690 URL: https://issues.apache.org/jira/browse/LUCENE-6690 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: OrdinalMapBuildBench.java OrdinalMap is very useful when computing top terms on a multi-index segment. However I've seen it being occasionally slow to build, which was either making facets (when the ordinals map is computed lazily) or reopen (when computed eagerly) slow. So out of curiosity, I tried to profile ordinal map building on a simple index: 10M random strings of length between 0 and 20 stored as a SORTED doc values field. The index has 19 segments. The bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons (UTF8SortedAsUnicodeComparator). MultiTermsEnum stores sub enums in two different places: - top: a simple array containing all enums on the current term - queue: a queue for enums that are not exhausted yet but beyond the current term. A non-exhausted enum is in exactly one of these data-structures. When moving to the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to {{queue}} and finally, pops all enum that are on the same term back into {{top}}. We could save reorderings of the priority queue by not removing entries from the priority queue and then calling updateTop to advance enums which are on the current term. This is already what we do for disjunctions of doc IDs in DISIPriorityQueue. On the index described above and current trunk, building an OrdinalMap has to call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
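To make the proposed queue handling concrete, here is a small self-contained sketch (assuming lucene-core on the classpath; the Sub class and term lists are illustrative stand-ins, not MultiTermsEnum's real internals) that merges sorted term streams with org.apache.lucene.util.PriorityQueue and advances enums sitting on the current term via updateTop() instead of pop() followed by add():

{code:java}
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.util.PriorityQueue;

// Illustrative stand-in for a sub terms enum: iterates a sorted list of terms.
final class Sub {
  final Iterator<String> it;
  String current;
  Sub(List<String> terms) { it = terms.iterator(); current = it.hasNext() ? it.next() : null; }
  String next() { return current = (it.hasNext() ? it.next() : null); }
}

public class UpdateTopSketch {
  public static void main(String[] args) {
    PriorityQueue<Sub> queue = new PriorityQueue<Sub>(3) {
      @Override protected boolean lessThan(Sub a, Sub b) { return a.current.compareTo(b.current) < 0; }
    };
    queue.add(new Sub(Arrays.asList("apple", "cherry")));
    queue.add(new Sub(Arrays.asList("apple", "banana")));
    queue.add(new Sub(Arrays.asList("banana", "cherry")));
    while (queue.size() > 0) {
      String term = queue.top().current;
      System.out.println("merged term: " + term);
      // advance every sub positioned on the current term without removing it from the queue
      while (queue.size() > 0 && queue.top().current.equals(term)) {
        if (queue.top().next() == null) {
          queue.pop();          // exhausted: remove for good
        } else {
          queue.updateTop();    // single sift-down instead of pop() + add()
        }
      }
    }
  }
}
{code}

Each advance then costs one re-heapify of the entry already at the top rather than a removal plus re-insertion, which is where the saved comparisons in the numbers above come from.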
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 908 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/908/ No tests ran. Build Log: [...truncated 10864 lines...] FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't completed at 1437416310507 hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't completed at 1437416310507 at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:742) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168) at com.sun.proxy.$Proxy59.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956) at hudson.Launcher$ProcStarter.join(Launcher.java:367) at hudson.tasks.Ant.perform(Ant.java:217) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756) at hudson.model.Build$BuildExecution.build(Build.java:198) at hudson.model.Build$BuildExecution.doRun(Build.java:159) at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529) at hudson.model.Run.execute(Run.java:1706) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:232) Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't completed at 1437416310507 at hudson.remoting.Request.abort(Request.java:299) at hudson.remoting.Channel.terminate(Channel.java:805) at hudson.remoting.Channel$CloseCommand.execute(Channel.java:954) at hudson.remoting.Channel$2.handle(Channel.java:474) at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:60) Caused by: hudson.remoting.Channel$OrderlyShutdown: java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't completed at 1437416310507 ... 3 more Caused by: Command close created at at hudson.remoting.Command.init(Command.java:56) at hudson.remoting.Channel$CloseCommand.init(Channel.java:948) at hudson.remoting.Channel$CloseCommand.init(Channel.java:946) at hudson.remoting.Channel.close(Channel.java:1029) at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110) at hudson.remoting.PingThread.ping(PingThread.java:120) at hudson.remoting.PingThread.run(PingThread.java:81) Caused by: java.util.concurrent.TimeoutException: Ping started on 1437416070507 hasn't completed at 1437416310507 ... 2 more - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6690) Speed up MultiTermsEnum.next()
[ https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633764#comment-14633764 ] Uwe Schindler commented on LUCENE-6690: --- Good idea! :-) Speed up MultiTermsEnum.next() -- Key: LUCENE-6690 URL: https://issues.apache.org/jira/browse/LUCENE-6690 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6690.patch, OrdinalMapBuildBench.java OrdinalMap is very useful when computing top terms on a multi-index segment. However I've seen it being occasionally slow to build, which was either making facets (when the ordinals map is computed lazily) or reopen (when computed eagerly) slow. So out of curiosity, I tried to profile ordinal map building on a simple index: 10M random strings of length between 0 and 20 stored as a SORTED doc values field. The index has 19 segments. The bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons (UTF8SortedAsUnicodeComparator). MultiTermsEnum stores sub enums in two different places: - top: a simple array containing all enums on the current term - queue: a queue for enums that are not exhausted yet but beyond the current term. A non-exhausted enum is in exactly one of these data-structures. When moving to the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to {{queue}} and finally, pops all enum that are on the same term back into {{top}}. We could save reorderings of the priority queue by not removing entries from the priority queue and then calling updateTop to advance enums which are on the current term. This is already what we do for disjunctions of doc IDs in DISIPriorityQueue. On the index described above and current trunk, building an OrdinalMap has to call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
[ https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated LUCENE-6689: - Description: This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer type=index tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer analyzer type=query tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then index analysis puts aaa at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. Whether luceneMatchversion is LUCENE_47 or LUCENE_4_9, query analysis puts aaa at term position 1 and bbb at term position 2. was: This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. 
I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer type=index tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer analyzer type=query tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used,
[jira] [Commented] (SOLR-6234) Scoring modes for query time join
[ https://issues.apache.org/jira/browse/SOLR-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633843#comment-14633843 ] Timothy Potter commented on SOLR-6234: -- looks good [~mkhludnev] +1 to commit. Please be sure to add documentation for this new feature to the refguide. I'll add a separate unit test that uses this feature to verify SOLR-6357 once this is committed. Scoring modes for query time join -- Key: SOLR-6234 URL: https://issues.apache.org/jira/browse/SOLR-6234 Project: Solr Issue Type: New Feature Components: query parsers Affects Versions: 5.3 Reporter: Mikhail Khludnev Assignee: Timothy Potter Labels: features, patch, test Fix For: 5.3 Attachments: SOLR-6234.patch, SOLR-6234.patch, SOLR-6234.patch, SOLR-6234.patch, otherHandler.patch it adds ability to call Lucene's JoinUtil by specifying local param, ie \{!join score=...} It supports: - {{score=none|avg|max|total}} local param (passed as ScoreMode to JoinUtil) - -supports {{b=100}} param to pass {{Query.setBoost()}}- postponed till SOLR-7814. - -{{multiVals=true|false}} is introduced- YAGNI, let me know otherwise. - there is a test coverage for cross core join case. - so far it joins string and multivalue string fields (Sorted, SortedSet, Binary), but not Numerics DVs. follow-up LUCENE-5868 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
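For reference, a hedged SolrJ sketch of what a query using the score local param could look like once this is committed; the collection and field names below are placeholders rather than anything from the patch:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ScoreJoinExample {
  public static void main(String[] args) throws Exception {
    // "to_collection", "from_id" and "id" are made-up names for illustration
    try (HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/to_collection")) {
      SolrQuery q = new SolrQuery("{!join from=from_id to=id score=max}text:ipod");
      QueryResponse rsp = client.query(q);
      System.out.println("hits: " + rsp.getResults().getNumFound());
    }
  }
}
{code}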
[jira] [Created] (LUCENE-6690) Speed up MultiTermsEnum.next()
Adrien Grand created LUCENE-6690: Summary: Speed up MultiTermsEnum.next() Key: LUCENE-6690 URL: https://issues.apache.org/jira/browse/LUCENE-6690 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor OrdinalMap is very useful when computing top terms on a multi-index segment. However I've seen it being occasionally slow to build, which was either making facets (when the ordinals map is computed lazily) or reopen (when computed eagerly) slow. So out of curiosity, I tried to profile ordinal map building on a simple index: 10M random strings of length between 0 and 20 stored as a SORTED doc values field. The index has 19 segments. The bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons (UTF8SortedAsUnicodeComparator). MultiTermsEnum stores sub enums in two different places: - top: a simple array containing all enums on the current term - queue: a queue for enums that are not exhausted yet but beyond the current term. A non-exhausted enum is in exactly one of these data-structures. When moving to the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to {{queue}} and finally, pops all enum that are on the same term back into {{top}}. We could save reorderings of the priority queue by not removing entries from the priority queue and then calling updateTop to advance enums which are on the current term. This is already what we do for disjunctions of doc IDs in DISIPriorityQueue. On the index described above and current trunk, building an OrdinalMap has to call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
[ https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated LUCENE-6689: - Description: This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer type=index tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer analyzer type=query tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. was: This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. 
If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. Odd analysis problem with WDF, appears to be triggered by preceding analysis components --- Key: LUCENE-6689 URL: https://issues.apache.org/jira/browse/LUCENE-6689 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.8 Reporter: Shawn Heisey
[jira] [Issue Comment Deleted] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
[ https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated LUCENE-6689: - Comment: was deleted (was: The reason that phrase searches don't match after LUCENE-5111 is that the query analysis on my real fieldType is slightly different -- catenateWords, catenateNumbers, and preserveOriginal are all disabled on the query analysis. With those settings and the previously given input of aaa-bbb: ccc, aaa ends up at position 1 and bbb at position 2, which is not the same as the index analysis with the settings above.) Odd analysis problem with WDF, appears to be triggered by preceding analysis components --- Key: LUCENE-6689 URL: https://issues.apache.org/jira/browse/LUCENE-6689 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.8 Reporter: Shawn Heisey This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer type=index tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer analyzer type=query tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 preserveOriginal=0 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6690) Speed up MultiTermsEnum.next()
[ https://issues.apache.org/jira/browse/LUCENE-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6690: - Attachment: LUCENE-6690.patch And here is the patch. Speed up MultiTermsEnum.next() -- Key: LUCENE-6690 URL: https://issues.apache.org/jira/browse/LUCENE-6690 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6690.patch, OrdinalMapBuildBench.java OrdinalMap is very useful when computing top terms on a multi-index segment. However I've seen it being occasionally slow to build, which was either making facets (when the ordinals map is computed lazily) or reopen (when computed eagerly) slow. So out of curiosity, I tried to profile ordinal map building on a simple index: 10M random strings of length between 0 and 20 stored as a SORTED doc values field. The index has 19 segments. The bottleneck was MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons (UTF8SortedAsUnicodeComparator). MultiTermsEnum stores sub enums in two different places: - top: a simple array containing all enums on the current term - queue: a queue for enums that are not exhausted yet but beyond the current term. A non-exhausted enum is in exactly one of these data-structures. When moving to the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to {{queue}} and finally, pops all enum that are on the same term back into {{top}}. We could save reorderings of the priority queue by not removing entries from the priority queue and then calling updateTop to advance enums which are on the current term. This is already what we do for disjunctions of doc IDs in DISIPriorityQueue. On the index described above and current trunk, building an OrdinalMap has to call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, BytesRef.equals 16297638 times and runs in 1.4s (~26% faster). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7812) Need a playground to quickly test analyzer stacks
[ https://issues.apache.org/jira/browse/SOLR-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634092#comment-14634092 ] Hoss Man commented on SOLR-7812: this is already mostly possible with the ManagedSchema and Schema API -- there's just no slick UI around it. * create a collection for doing experiments in * iterate over... ** use the Schema API to (re)define a field type with the index/query analyzers you want to experiment with ** iterate over... *** use the Analysis handlers to sanity check that various inputs behave the way you think they should ** index some test documents ** iterate over... *** execute various queries to see what results you get and if you are happy ** delete all docs * delete the experiment collection Need a playground to quickly test analyzer stacks - Key: SOLR-7812 URL: https://issues.apache.org/jira/browse/SOLR-7812 Project: Solr Issue Type: Wish Components: Schema and Analysis Reporter: Alexandre Rafalovitch Priority: Minor Labels: analyzers, beginners, usability (from email by Robert Oschler) (Would be useful to have)... a convenient playground for testing index and query filters? I'm imagining a utility where you can select a set of index and query filters, and then enter a string as a test document and a query string and see what kind of scores come back during a matching attempt. This would be a big aid in crafting an indexing/query scheme to get the desired matching profile working. Otherwise the only technique I can think of is to iteratively modify the schema file and retest with the admin panel with each combination of filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
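As a rough illustration of the Schema API step in the workflow above, the following sketch POSTs an add-field-type command to a collection with a managed schema; the collection name, host, and the field type definition are assumptions for the example only:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddFieldTypeExample {
  public static void main(String[] args) throws Exception {
    // hypothetical "analysis-playground" collection created just for experiments
    String body = "{ \"add-field-type\": {"
        + " \"name\": \"testType\","
        + " \"class\": \"solr.TextField\","
        + " \"analyzer\": { \"tokenizer\": { \"class\": \"solr.WhitespaceTokenizerFactory\" } } } }";
    URL url = new URL("http://localhost:8983/solr/analysis-playground/schema");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream os = conn.getOutputStream()) {
      os.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("Schema API response code: " + conn.getResponseCode());
  }
}
{code}

The same request can then be re-issued with a different analyzer chain between rounds of analysis checks and test queries.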
[jira] [Commented] (LUCENE-6668) Optimize SortedSet/SortedNumeric storage for the few unique sets use-case
[ https://issues.apache.org/jira/browse/LUCENE-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633964#comment-14633964 ] Robert Muir commented on LUCENE-6668: - +1, nice to have TABLE applied to the other types here too! Optimize SortedSet/SortedNumeric storage for the few unique sets use-case - Key: LUCENE-6668 URL: https://issues.apache.org/jira/browse/LUCENE-6668 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6668.patch, LUCENE-6668.patch Robert suggested this idea: if there are few unique sets of values, we could build a lookup table and then map each doc to an ord in this table, just like we already do for table compression for numerics. I think this is especially compelling given that SortedSet/SortedNumeric are our two only doc values types that use O(maxDoc) memory because of the offsets map. When this new strategy is used, memory usage could be bounded to a constant. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
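A toy model of the idea (not the codec's actual on-disk encoding): dedupe each document's set of values into a lookup table and store one table ordinal per document, so memory stops growing with maxDoc when only a few distinct sets exist:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniqueSetTableDemo {
  public static void main(String[] args) {
    // many documents, but only two distinct value sets
    long[][] perDocValues = { {1, 5}, {2}, {1, 5}, {2}, {1, 5} };
    Map<List<Long>, Integer> table = new LinkedHashMap<>();
    int[] docToOrd = new int[perDocValues.length];
    for (int doc = 0; doc < perDocValues.length; doc++) {
      List<Long> key = new ArrayList<>();
      for (long v : perDocValues[doc]) key.add(v);
      Integer ord = table.get(key);
      if (ord == null) {          // first time this exact set is seen: add it to the table
        ord = table.size();
        table.put(key, ord);
      }
      docToOrd[doc] = ord;        // per-doc storage is a single small ordinal
    }
    System.out.println("unique sets: " + table.keySet());            // [[1, 5], [2]]
    System.out.println("doc -> ord : " + Arrays.toString(docToOrd)); // [0, 1, 0, 1, 0]
  }
}
{code}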
[CI] Lucene 5x Linux 64 Test Only - Build # 56712 - Failure!
BUILD FAILURE Build URLhttp://build-eu-00.elastic.co/job/lucene_linux_java8_64_test_only/56712/ Project:lucene_linux_java8_64_test_only Date of build:Mon, 20 Jul 2015 20:14:05 +0200 Build duration:1 hr 0 min CHANGES No Changes CONSOLE OUTPUT [...truncated 199 lines...] at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) at ..remote call to ubuntu-14-64-8-metal(Native Method) at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356) at hudson.remoting.UserResponse.retrieve(UserRequest.java:221) at hudson.remoting.Channel.call(Channel.java:752) at hudson.FilePath.act(FilePath.java:978) at hudson.FilePath.act(FilePath.java:967) at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:89) at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:121) at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:138) at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:74) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:761) at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:721) at hudson.model.Build$BuildExecution.post2(Build.java:183) at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:670) at hudson.model.Run.execute(Run.java:1776) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) at hudson.model.ResourceController.execute(ResourceController.java:89) at hudson.model.Executor.run(Executor.java:240) [description-setter] Description set: $BUILD_DESC Email was triggered for: Failure - 1st Trigger Failure - Any was overridden by another trigger and will not send an email. Trigger Failure - Still was overridden by another trigger and will not send an email. Sending email for trigger: Failure - 1st - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b60) - Build # 13537 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/13537/ Java: 64bit/jdk1.9.0-ea-b60 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -Djava.locale.providers=JRE,SPI 1 tests failed. FAILED: org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests Error Message: Timeout waiting for CDCR replication to complete @source_collection:shard1 Stack Trace: java.lang.RuntimeException: Timeout waiting for CDCR replication to complete @source_collection:shard1 at __randomizedtesting.SeedInfo.seed([91D81DAC63188477:99B868806C16AC7C]:0) at org.apache.solr.cloud.BaseCdcrDistributedZkTest.waitForReplicationToComplete(BaseCdcrDistributedZkTest.java:732) at org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTestUpdateLogSynchronisation(CdcrReplicationDistributedZkTest.java:361) at org.apache.solr.cloud.CdcrReplicationDistributedZkTest.doTests(CdcrReplicationDistributedZkTest.java:50) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:872) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:886) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:845) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:792) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at
[jira] [Created] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
Shawn Heisey created LUCENE-6689: Summary: Odd analysis problem with WDF, appears to be triggered by preceding analysis components Key: LUCENE-6689 URL: https://issues.apache.org/jira/browse/LUCENE-6689 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.8 Reporter: Shawn Heisey This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
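For anyone wanting to poke at this outside Solr, a rough harness along these lines prints term positions for a similar chain. It approximates the break-only-on-whitespace rbbi with WhitespaceTokenizer and wires the filters by hand, so treat it as a sketch under those assumptions rather than an exact reproduction of the fieldType above:

{code:java}
import java.io.StringReader;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.pattern.PatternReplaceFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class WdfPositionCheck {
  public static void main(String[] args) throws Exception {
    // WhitespaceTokenizer stands in for the custom break-only-on-whitespace rbbi
    Tokenizer tok = new WhitespaceTokenizer();
    tok.setReader(new StringReader("aaa-bbb: ccc"));
    TokenStream ts = new PatternReplaceFilter(tok,
        Pattern.compile("^(\\p{Punct}*)(.*?)(\\p{Punct}*)$"), "$2", false);
    int flags = WordDelimiterFilter.SPLIT_ON_CASE_CHANGE
        | WordDelimiterFilter.SPLIT_ON_NUMERICS
        | WordDelimiterFilter.STEM_ENGLISH_POSSESSIVE
        | WordDelimiterFilter.GENERATE_WORD_PARTS
        | WordDelimiterFilter.GENERATE_NUMBER_PARTS
        | WordDelimiterFilter.CATENATE_WORDS
        | WordDelimiterFilter.CATENATE_NUMBERS
        | WordDelimiterFilter.PRESERVE_ORIGINAL;   // mirrors the index-time settings above
    ts = new WordDelimiterFilter(ts, flags, null);
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
    ts.reset();
    int pos = 0;
    while (ts.incrementToken()) {
      pos += posInc.getPositionIncrement();
      System.out.println("position " + pos + ": " + term);
    }
    ts.end();
    ts.close();
  }
}
{code}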
[jira] [Commented] (LUCENE-6689) Odd analysis problem with WDF, appears to be triggered by preceding analysis components
[ https://issues.apache.org/jira/browse/LUCENE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633706#comment-14633706 ] Shawn Heisey commented on LUCENE-6689: -- LUCENE-5111 seems to contain the commit that causes this behavior. Odd analysis problem with WDF, appears to be triggered by preceding analysis components --- Key: LUCENE-6689 URL: https://issues.apache.org/jira/browse/LUCENE-6689 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.8 Reporter: Shawn Heisey This problem shows up for me in Solr, but I believe the issue is down at the Lucene level, so I've opened the issue in the LUCENE project. We can move it if necessary. I've boiled the problem down to this minimum Solr fieldType: {noformat} fieldType name=testType class=solr.TextField sortMissingLast=true positionIncrementGap=100 analyzer tokenizer class=org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory rulefiles=Latn:Latin-break-only-on-whitespace.rbbi/ filter class=solr.PatternReplaceFilterFactory pattern=^(\p{Punct}*)(.*?)(\p{Punct}*)$ replacement=$2 / filter class=solr.WordDelimiterFilterFactory splitOnCaseChange=1 splitOnNumerics=1 stemEnglishPossessive=1 generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 preserveOriginal=1 / /analyzer /fieldType {noformat} On Solr 4.7, if this type is given the input aaa-bbb: ccc then aaa ends up at term position 1 and bbb at term position 2. This seems perfectly reasonable to me. In Solr 4.9, both terms end up at position 2. This causes phrase queries which used to work to return zero hits. The exact text of the phrase query is in the original documents that match on 4.7. If the custom rbbi (which is included unmodified from the lucene icu analysis source code) is not used, then the problem doesn't happen, because the punctuation doesn't make it to the PRF. If the PatternReplaceFilterFactory is not present, then the problem doesn't happen. I can work around the problem by setting luceneMatchVersion to 4.7, but I think the behavior is a bug, and I would rather not continue to use 4.7 analysis when I upgrade to 5.x, which I hope to do soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-MacOSX (64bit/jdk1.7.0) - Build # 2479 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-MacOSX/2479/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseG1GC 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores Error Message: ERROR: SolrIndexSearcher opens=51 closes=50 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=51 closes=50 at __randomizedtesting.SeedInfo.seed([71276CCB450E50CD]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:465) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:232) at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:799) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores Error Message: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: 1) Thread[id=9163, name=searcherExecutor-4396-thread-1, state=WAITING, group=TGRP-TestLazyCores] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Stack Trace: 
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.core.TestLazyCores: 1) Thread[id=9163, name=searcherExecutor-4396-thread-1, state=WAITING, group=TGRP-TestLazyCores] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) at __randomizedtesting.SeedInfo.seed([71276CCB450E50CD]:0) FAILED: junit.framework.TestSuite.org.apache.solr.core.TestLazyCores Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=9163,
[jira] [Commented] (SOLR-7760) Fix method and field visibility for UIMAUpdateRequestProcessor and SolrUIMAConfiguration
[ https://issues.apache.org/jira/browse/SOLR-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634408#comment-14634408 ] Hoss Man commented on SOLR-7760: I understand very little about UIMA, but can you please elaborate on what you mean by ...they need to be for other code to be able to make use of the configuration data, ie: mapped fields... (Ideally: include a testcase mock/sample custom plugin demonstrating how you would take advantage of these new methods) Fix method and field visibility for UIMAUpdateRequestProcessor and SolrUIMAConfiguration Key: SOLR-7760 URL: https://issues.apache.org/jira/browse/SOLR-7760 Project: Solr Issue Type: Improvement Components: contrib - UIMA Affects Versions: 5x Reporter: Aaron LaBella Priority: Critical Fix For: 5.3 Attachments: SOLR-7760.patch The methods in {{solr/contrib/uima/src/java/org/apache/solr/uima/processor/SolrUIMAConfiguration.java}} are not public and they need to be for other code to be able to make use of the configuration data, ie: mapped fields. Likewise, {{solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java}} does not have an accessor for the SolrUIMAConfiguration object -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
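For illustration, here is a rough, self-contained sketch of what the request above amounts to. It does not use the real Solr/UIMA classes, and the accessor names (getConfiguration, getFieldMappings) are assumptions made for the example, not the current SolrUIMAConfiguration API:
{noformat}
import java.util.Map;

// Hypothetical stand-ins for the real Solr/UIMA classes; names are illustrative only.
class SolrUIMAConfigurationSketch {
  private final Map<String, String> fieldMappings;
  SolrUIMAConfigurationSketch(Map<String, String> fieldMappings) { this.fieldMappings = fieldMappings; }
  // Making this getter public is the kind of visibility change the issue asks for.
  public Map<String, String> getFieldMappings() { return fieldMappings; }
}

class UIMAUpdateRequestProcessorSketch {
  private final SolrUIMAConfigurationSketch config;
  UIMAUpdateRequestProcessorSketch(SolrUIMAConfigurationSketch config) { this.config = config; }
  // Hypothetical public accessor of the kind requested in the issue.
  public SolrUIMAConfigurationSketch getConfiguration() { return config; }
}

// A custom plugin (in another package in practice) could then reuse the mapped fields:
class CustomUIMAProcessorSketch extends UIMAUpdateRequestProcessorSketch {
  CustomUIMAProcessorSketch(SolrUIMAConfigurationSketch config) { super(config); }
  void logMappedFields() {
    getConfiguration().getFieldMappings()
        .forEach((uimaType, solrField) -> System.out.println(uimaType + " -> " + solrField));
  }
}
{noformat}
The point is simply that once the getters are public, a custom processor in another package can reuse the mapped-field configuration instead of re-parsing it.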
[jira] [Commented] (SOLR-7812) Need a playground to quickly test analyzer stacks
[ https://issues.apache.org/jira/browse/SOLR-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634224#comment-14634224 ] Alexandre Rafalovitch commented on SOLR-7812: - Well, that was possible with static schema too, really. Just rewrite the file, reload the core. The issue is making a user-friendly UI. Which means: *) Having a list of all possible analyzers *) Having all their various options described/self-described *) Running the same query through several stacks at once Otherwise, it is not a playground but a slog. Hence a question of whether it is worth the effort to do that. Need a playground to quickly test analyzer stacks - Key: SOLR-7812 URL: https://issues.apache.org/jira/browse/SOLR-7812 Project: Solr Issue Type: Wish Components: Schema and Analysis Reporter: Alexandre Rafalovitch Priority: Minor Labels: analyzers, beginners, usability (from email by Robert Oschler) (Would be useful to have)... a convenient playground for testing index and query filters? I'm imagining a utility where you can select a set of index and query filters, and then enter a string as a test document and a query string and see what kind of scores come back during a matching attempt. This would be a big aid in crafting an indexing/query scheme to get the desired matching profile working. Otherwise the only technique I can think of is to iteratively modify the schema file and retest with the admin panel with each combination of filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y
[ https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-7804: - Summary: TestCloudPivotFacet failures: num pivots expected:X but was:Y (was: TestCloudPivotFacet failures: num pivots expected:X but was:X+/-1) TestCloudPivotFacet failures: num pivots expected:X but was:Y - Key: SOLR-7804 URL: https://issues.apache.org/jira/browse/SOLR-7804 Project: Solr Issue Type: Bug Components: faceting Affects Versions: 5.3, Trunk Reporter: Steve Rowe A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk - here's one on trunk: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/]. I reproduced another branch_5x failure from a few days ago (Jenkins job has been removed already) on OS X, using both java7 and java8: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA -Dtests.timezone=America/El_Salvador -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 46.6s | TestCloudPivotFacet.test [junit4] Throwable #1: java.lang.AssertionError: {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)} num pivots expected:2 but was:1 [junit4] at __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y
[ https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634228#comment-14634228 ] Steve Rowe commented on SOLR-7804: -- Another trunk failure on Linux: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/901/] - reproduces for me on OS X, both on trunk and on branch_5x, the latter with both Java7 and Java8: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=957BC6861F510BE -Dtests.slow=true -Dtests.locale=sr_BA -Dtests.timezone=America/Guadeloupe -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 36.2s J3 | TestCloudPivotFacet.test [junit4] Throwable #1: java.lang.AssertionError: {main(facet=truefacet.pivot=pivot_b%2Cpivot_f%2Cpivot_dt1facet.pivot=%7B%21stats%3Dst3%7Dpivot_td%2Cpivot_z_s1facet.limit=5facet.pivot.mincount=16facet.missing=truefacet.sort=indexfacet.overrequest.ratio=1.1832508),extra(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index)} == pivot_b,pivot_f,pivot_dt1: {params(rows=0),defaults({main(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index),extra(fq=-pivot_b%3A%5B*+TO+*%5D)})} expected:17 but was:22 [junit4]at __randomizedtesting.SeedInfo.seed([957BC6861F510BE:810383B2CF097D46]:0) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:281) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228) [junit4]at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963) [junit4]at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) [junit4]at java.lang.Thread.run(Thread.java:745) [junit4] Caused by: java.lang.AssertionError: pivot_b,pivot_f,pivot_dt1: {params(rows=0),defaults({main(rows=0q=*%3A*stats=truestats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tlstats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_tdt1stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_y_s_test_min=16_test_miss=true_test_sort=index),extra(fq=-pivot_b%3A%5B*+TO+*%5D)})} expected:17 but was:22 [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:680) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotData(TestCloudPivotFacet.java:335) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:302) [junit4]at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:271) [junit4]... 42 more {noformat} TestCloudPivotFacet failures: num pivots expected:X but was:Y - Key: SOLR-7804 URL: https://issues.apache.org/jira/browse/SOLR-7804 Project: Solr Issue Type: Bug Components: faceting Affects Versions: 5.3, Trunk Reporter: Steve Rowe A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk - here's one on trunk: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/]. 
I reproduced another branch_5x failure from a few days ago (Jenkins job has been removed already) on OS X, using both java7 and java8: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA -Dtests.timezone=America/El_Salvador -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 46.6s | TestCloudPivotFacet.test [junit4] Throwable #1: java.lang.AssertionError: {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)} num pivots expected:2 but was:1 [junit4] at __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251)
[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: num pivots expected:X but was:Y, also
[ https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-7804: - Summary: TestCloudPivotFacet failures: num pivots expected:X but was:Y, also (was: TestCloudPivotFacet failures: num pivots expected:X but was:Y) TestCloudPivotFacet failures: num pivots expected:X but was:Y, also Key: SOLR-7804 URL: https://issues.apache.org/jira/browse/SOLR-7804 Project: Solr Issue Type: Bug Components: faceting Affects Versions: 5.3, Trunk Reporter: Steve Rowe A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk - here's one on trunk: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/]. I reproduced another branch_5x failure from a few days ago (Jenkins job has been removed already) on OS X, using both java7 and java8: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA -Dtests.timezone=America/El_Salvador -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 46.6s | TestCloudPivotFacet.test [junit4] Throwable #1: java.lang.AssertionError: {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)} num pivots expected:2 but was:1 [junit4] at __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7765) TokenizerChain without char filters cause NPE in luke request handler
[ https://issues.apache.org/jira/browse/SOLR-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-7765: --- Attachment: SOLR-7765.patch bq. I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently being done that can be optimized away with this fix and post an updated patch. attached. TokenizerChain without char filters cause NPE in luke request handler - Key: SOLR-7765 URL: https://issues.apache.org/jira/browse/SOLR-7765 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Reporter: Konstantin Gribov Assignee: Hoss Man Priority: Minor Attachments: SOLR-7765.patch, SOLR-7765.patch, SOLR-7765.patch {{TokenizerChain}} created using 2-arg constructor has {{null}} in {{charFilters}}, so {{LukeRequestHandler}} throws NPE on iterating it. Will create PR in a couple of minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: timeAllowed parameter ignored edge-case bug?
: In the scenario outlined below, the second run's timeAllowed parameter : is unexpectedly ignored. Could this be intentionally so somehow (q vs. : fq processing?, Collector vs. LeafCollector?, DocList vs. DocSet?), or : is it an edge-case bug? Based on your description (didn't re-review the code directly) it sounds like an oversight with timeAllowed -- probably overlooked because of the oddity of having a queryResultCache but not filterCache (correct me if i'm wrong, but it sounds like this bug won't surface if both queryResultsCache and filterCache are enabled -- or both disabled -- correct?) ... probably doesn't affect (m)any real users because of this. Sounds like we should split out the build part of buildAndRunCollectorChain into its own method and re-use it in getDocSet (although it seems like that will almost certainly require some API changes to propagate the QueryCommand context down) Christine: can you file this as a Jira so we don't lose track of it? : : Regards, : : Christine : : --- : : solrconfig characteristics: : * a queryResultsCache is configured : * no filterCache is configured : : query characteristics: : * q parameter present : * at least one fq parameter present : * sort parameter present (and does not require the score field) : * GET_DOCSET flag is set e.g. via the StatsComponent i.e. stats=true parameter : : runtime characteristics: : * first run of the query gets a queryResultsCache-miss and respects timeAllowed : * second run gets a queryResultsCache-hit and ignores timeAllowed (but still : makes use of the lucene IndexSearcher) : : code path execution details (first run): : * SolrIndexSearcher.search calls getDocListC : * getDocListC called queryResultCache.get which found nothing : * getDocListC calls getDocListAndSetNC : * getDocListAndSetNC calls buildAndRunCollectorChain : * buildAndRunCollectorChain constructs TimeLimitingCollector : : code path execution details (second run): : * SolrIndexSearcher.search calls getDocListC : * getDocListC called queryResultCache.get which found something : * getDocListC calls getDocSet(List<Query> queries) : * getDocSet(List<Query> queries) iterates over IndexSearcher.leafContexts : - : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: dev-h...@lucene.apache.org : : -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
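For illustration, a minimal sketch of the kind of wrapping the getDocSet path would need, assuming Lucene's TimeLimitingCollector; the helper name and where it would be called from are assumptions, not the actual SolrIndexSearcher refactoring discussed above:
{noformat}
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.TimeLimitingCollector;

// Sketch only: the helper and its placement are hypothetical, not the proposed patch.
final class TimeAllowedCollectors {
  private TimeAllowedCollectors() {}

  /** Wraps the delegate in a TimeLimitingCollector when a positive timeAllowed is given. */
  static Collector maybeTimeLimit(Collector delegate, long timeAllowedMs) {
    if (timeAllowedMs <= 0) {
      return delegate; // no limit requested, use the delegate directly
    }
    return new TimeLimitingCollector(delegate, TimeLimitingCollector.getGlobalCounter(), timeAllowedMs);
  }
}
{noformat}
If the DocSet-building path funneled its leaf collection through something like this (instead of iterating leafContexts directly), a queryResultsCache hit would respect timeAllowed the same way the cache-miss path already does.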
[jira] [Commented] (SOLR-7815) Remove LuceneQueryOptimizer
[ https://issues.apache.org/jira/browse/SOLR-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634183#comment-14634183 ] Hoss Man commented on SOLR-7815: Linking to SOLR-1052 and SOLR-3093 for context. In particular note that r922957 (March 2010) is where the code that used the optimizer was last removed, and after that SOLR-1052 dealt with the cleanup to remove the config parsing to enable the optimizer. bq. Here is a patch. +1 Remove LuceneQueryOptimizer --- Key: SOLR-7815 URL: https://issues.apache.org/jira/browse/SOLR-7815 Project: Solr Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: SOLR-7815.patch I noticed that I introduced a bug in this class when refactoring BooleanQuery to be immutable (using the builder as a cache key instead of the query itself). But then I noticed that this class is actually never used, so let's remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7765) TokenizerChain without char filters cause NPE in luke request handler
[ https://issues.apache.org/jira/browse/SOLR-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-7765: --- Attachment: SOLR-7765.patch bq. I'll add a test to my PR. Thanks! I misunderstood what you meant before, but with the testcase you provided it all makes sense. In my opinion, the root bug here is that TokenizerChain should be more explicit about what is allowed in its constructor, and more resilient to null args when things are optional -- that way callers like LukeAdminHandler don't have to constantly do null checks. The attached patch fixes what i consider the root of the bug and gets your test to pass w/o modifying LukeAdminHandler. It also adds more randomization to your test to cover more permutations of options, and updates MultiTermTest to account for the improved behavior of getCharFilterFactories() (which you can see from looking at that test was annoyingly inconsistent before depending on what analyzer was used and where it came from) I'm going to do a quick audit of all TokenizerChain clients to see where else null checks are currently being done that can be optimized away with this fix and post an updated patch. TokenizerChain without char filters cause NPE in luke request handler - Key: SOLR-7765 URL: https://issues.apache.org/jira/browse/SOLR-7765 Project: Solr Issue Type: Bug Affects Versions: 5.2.1 Reporter: Konstantin Gribov Assignee: Hoss Man Priority: Minor Attachments: SOLR-7765.patch, SOLR-7765.patch {{TokenizerChain}} created using 2-arg constructor has {{null}} in {{charFilters}}, so {{LukeRequestHandler}} throws NPE on iterating it. Will create PR in a couple of minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
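As a rough illustration of the constructor-normalization idiom described above (this is not the actual TokenizerChain code from the patch, just the shape of the fix; the class name is made up):
{noformat}
import org.apache.lucene.analysis.util.CharFilterFactory;

// Sketch: treat "no char filters" as an empty array at construction time so the
// getter never returns null and callers (e.g. LukeRequestHandler) need no null checks.
class CharFilterChainSketch {
  private static final CharFilterFactory[] EMPTY = new CharFilterFactory[0];
  private final CharFilterFactory[] charFilters;

  CharFilterChainSketch(CharFilterFactory[] charFilters) {
    this.charFilters = (charFilters == null) ? EMPTY : charFilters;
  }

  public CharFilterFactory[] getCharFilterFactories() {
    return charFilters; // always safe to iterate
  }
}
{noformat}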
[jira] [Updated] (SOLR-7804) TestCloudPivotFacet failures: expected:X but was:Y
[ https://issues.apache.org/jira/browse/SOLR-7804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-7804: - Summary: TestCloudPivotFacet failures: expected:X but was:Y (was: TestCloudPivotFacet failures: num pivots expected:X but was:Y, also ) TestCloudPivotFacet failures: expected:X but was:Y -- Key: SOLR-7804 URL: https://issues.apache.org/jira/browse/SOLR-7804 Project: Solr Issue Type: Bug Components: faceting Affects Versions: 5.3, Trunk Reporter: Steve Rowe A couple failures recently on my Jenkins (Linux), both on branch_5x and trunk - here's one on trunk: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/766/], and another on branch_5x: [http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/546/]. I reproduced another branch_5x failure from a few days ago (Jenkins job has been removed already) on OS X, using both java7 and java8: {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=D8E5204E25B3DB16 -Dtests.slow=true -Dtests.locale=es_PA -Dtests.timezone=America/El_Salvador -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 46.6s | TestCloudPivotFacet.test [junit4] Throwable #1: java.lang.AssertionError: {main(facet=truefacet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.pivot=%7B%21stats%3Dst0%7Dpivot_ti1facet.limit=4facet.offset=6facet.missing=truefacet.overrequest.ratio=-0.9744149),extra(rows=0q=id%3A%5B*+TO+448%5Dfq=id%3A%5B*+TO+516%5D_test_miss=true)} num pivots expected:2 but was:1 [junit4] at __randomizedtesting.SeedInfo.seed([D8E5204E25B3DB16:50B11F948B4FB6EE]:0) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:251) [junit4] at org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960) [junit4] at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935) [junit4] at java.lang.Thread.run(Thread.java:745) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-5.x-Windows (64bit/jdk1.7.0_80) - Build # 4924 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Windows/4924/ Java: 64bit/jdk1.7.0_80 -XX:-UseCompressedOops -XX:+UseG1GC 2 tests failed. FAILED: org.apache.solr.search.TestSolr4Spatial2.testBBox Error Message: PermGen space Stack Trace: java.lang.OutOfMemoryError: PermGen space at __randomizedtesting.SeedInfo.seed([6E0F96AE68F74794:170EDCADCF916BE9]:0) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.solr.schema.BBoxField.getValueSourceFromSpatialArgs(BBoxField.java:183) at org.apache.solr.schema.BBoxField.getValueSourceFromSpatialArgs(BBoxField.java:36) at org.apache.solr.schema.AbstractSpatialFieldType.getQueryFromSpatialArgs(AbstractSpatialFieldType.java:338) at org.apache.solr.schema.AbstractSpatialFieldType.getFieldQuery(AbstractSpatialFieldType.java:312) at org.apache.solr.search.FieldQParserPlugin$1.parse(FieldQParserPlugin.java:50) at org.apache.solr.search.QParser.getQuery(QParser.java:141) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:157) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:258) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068) at org.apache.solr.util.TestHarness.query(TestHarness.java:320) at org.apache.solr.util.TestHarness.query(TestHarness.java:302) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:829) at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:798) at org.apache.solr.search.TestSolr4Spatial2.testBBox(TestSolr4Spatial2.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1627) FAILED: junit.framework.TestSuite.org.apache.solr.update.AddBlockUpdateTest Error Message: PermGen space Stack Trace: java.lang.OutOfMemoryError: PermGen space at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:278) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.instantiate(SlaveMain.java:228) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:188) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:310) at com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:12) Build Log: [...truncated 11469 lines...] [junit4] Suite: org.apache.solr.search.TestSolr4Spatial2 [junit4] 2 Creating dataDir: C:\Users\JenkinsSlave\workspace\Lucene-Solr-5.x-Windows\solr\build\solr-core\test\J0\temp\solr.search.TestSolr4Spatial2_6E0F96AE68F74794-001\init-core-data-001 [junit4] 2 3097174 INFO (SUITE-TestSolr4Spatial2-seed#[6E0F96AE68F74794]-worker) [] o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (false) [junit4] 2 3097175 INFO (SUITE-TestSolr4Spatial2-seed#[6E0F96AE68F74794]-worker) [] o.a.s.SolrTestCaseJ4
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634464#comment-14634464 ] Noble Paul commented on SOLR-445: - I guess it would be better if we return the whole command instead of just the id to the user Update Handlers abort with bad documents Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Anshum Gupta Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid batch. Ie: <add> <doc> <field name=id>1</field> </doc> <doc> <field name=id>2</field> <field name=myDateField>I_AM_A_BAD_DATE</field> </doc> <doc> <field name=id>3</field> </doc> </add> Right now solr adds the first doc and then aborts. It would seem like it should either fail the entire batch or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
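For illustration only, a minimal, self-contained sketch of what "Option 2" from the description amounts to -- keep adding documents after a failure and report which ones were rejected. The names here are made up for the example and are not the API proposed in the attached patches:
{noformat}
import java.util.ArrayList;
import java.util.List;

class BatchAddSketch {
  static class DocError {
    final String id;
    final String reason;
    DocError(String id, String reason) { this.id = id; this.reason = reason; }
  }

  interface DocAdder {
    void add(String id, String dateField) throws Exception; // throws on a bad document
  }

  /** Adds every doc in the batch, collecting per-document errors instead of aborting. */
  static List<DocError> addAll(DocAdder adder, List<String[]> docs) {
    List<DocError> errors = new ArrayList<>();
    for (String[] doc : docs) {
      try {
        adder.add(doc[0], doc[1]);
      } catch (Exception e) {
        // Record the failure and continue with the rest of the batch.
        errors.add(new DocError(doc[0], e.getMessage()));
      }
    }
    return errors;
  }
}
{noformat}
Returning the whole failed command (as suggested in the comment above) would mean carrying more than the id in each error entry, at the cost of a larger response.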
Failing CDCR tests
We're looking into these; if we don't have something relatively soon I'll disable them until we do. I suspect these are an artifact of the test framework but don't know for sure just yet. Please bear with us re: the noise for another day or two. If we don't have something by then I'll disable the tests until we do. Erick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards
[ https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6685: --- Attachment: LUCENE-6685.patch I put together a visualization of the ranges that were being created (will add the link to the video when I post it). This revealed some interesting issues. At precision_step 6 and detailLevel 16 the number of ranges for the worst case boundary condition was nearly 2 million. 100 iteration beast tests would take just over an hour. Reducing that precisionStep to 3 and the detailLevel to 12 reduced the number of ranges to just over 10K. The 100 iteration beast test was reduced from over an hour to just over 8 minutes. There was also a bug in the pointDistance query that added unnecessary high resolution ranges that fell within the bounding box but outside the actual pointRadius. GeoPointInBBox/Distance queries should have safeguards -- Key: LUCENE-6685 URL: https://issues.apache.org/jira/browse/LUCENE-6685 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.3, Trunk Attachments: LUCENE-6685.patch These queries build a big list of term ranges, where the size of the list is in proportion to how many cells of the space filling curve are crossed by the perimeter of the query (I think?). This can easily be 100s of MBs for a big enough query ... not to mention slow to enumerate (we still do this again for each segment). I think the queries should have safeguards, much like we have maxDeterminizedStates for Automaton based queries, to prevent accidental OOMEs. But I think longer term we should either change the ranges to be enumerated on-demand and never stored in entirety (like NumericRangeTermsEnum), or change the query so it has a fixed budget of how many cells it's allowed to visit and then within a crossing cell it uses doc values to post-filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
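As a rough sketch of the kind of safeguard the issue description asks for (analogous to maxDeterminizedStates), here is an illustrative budget check on the number of accumulated term ranges; the class, field names, and default value are made up and this is not the actual GeoPointInBBoxQuery/GeoPointDistanceQuery code:
{noformat}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: refuse to build an unbounded list of term ranges.
class BoundedRangeCollector {
  static final int DEFAULT_MAX_RANGES = 100_000; // illustrative budget, not a tuned value

  private final int maxRanges;
  private final List<long[]> ranges = new ArrayList<>(); // [minTerm, maxTerm] pairs

  BoundedRangeCollector(int maxRanges) {
    this.maxRanges = maxRanges;
  }

  void addRange(long minTerm, long maxTerm) {
    if (ranges.size() >= maxRanges) {
      throw new IllegalStateException(
          "Too many term ranges (> " + maxRanges + "); refusing to build query to avoid an OOME");
    }
    ranges.add(new long[] {minTerm, maxTerm});
  }

  int size() {
    return ranges.size();
  }
}
{noformat}
The longer-term alternatives mentioned in the description (on-demand enumeration, or a fixed cell budget plus doc-values post-filtering) would avoid materializing the list at all, but a hard cap like this is the cheapest immediate guard.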
[jira] [Comment Edited] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards
[ https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634504#comment-14634504 ] Nicholas Knize edited comment on LUCENE-6685 at 7/21/15 4:22 AM: - I put together a visualization of the ranges that were being created (will add the link to the video when I post it). This revealed some interesting issues. At precision_step 6 and detailLevel 16 the number of ranges for the worst case boundary condition were nearly 2 million. 100 iteration beast tests would take just over an hour. Reducing that precisionStep to 3 and the detailLevel to 12 reduced the number of ranges to just over 10K. The 100 iteration beast test was reduced from over an hour to just over 8 minutes. There was also a bug in the pointDistance query that added unnecessary high resolution ranges that fell within the bounding box but outside the actual pointRadius. Patch included was (Author: nknize): I put together a visualization of the ranges that were being created (will add the link to the video when I post it). This revealed some interesting issues. At precision_step 6 and detailLevel 16 the number of ranges for the worst case boundary condition were nearly 2 million. 100 iteration beast tests would take just over an hour. Reducing that precisionStep to 3 and the detailLevel to 12 reduced the number of ranges to just over 10K. The 100 iteration beast test was reduced from over an hour to just over 8 minutes. There was also a bug in the pointDistance query that added unnecessary high resolution ranges that fell within the bounding box but outside the actual pointRadius. GeoPointInBBox/Distance queries should have safeguards -- Key: LUCENE-6685 URL: https://issues.apache.org/jira/browse/LUCENE-6685 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.3, Trunk Attachments: LUCENE-6685.patch These queries build a big list of term ranges, where the size of the list is in proportion to how many cells of the space filling curve are crossed by the perimeter of the query (I think?). This can easily be 100s of MBs for a big enough query ... not to mention slow to enumerate (we still do this again for each segment). I think the queries should have safeguards, much like we have maxDeterminizedStates for Automaton based queries, to prevent accidental OOMEs. But I think longer term we should either change the ranges to be enumerated on-demand and never stored in entirety (like NumericRangeTermsEnum), or change the query so it has a fixed budget of how many cells it's allowed to visit and then within a crossing cell it uses doc values to post-filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards
[ https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Knize updated LUCENE-6685: --- Attachment: LUCENE-6685.patch GeoPointInBBox/Distance queries should have safeguards -- Key: LUCENE-6685 URL: https://issues.apache.org/jira/browse/LUCENE-6685 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 5.3, Trunk Attachments: LUCENE-6685.patch, LUCENE-6685.patch These queries build a big list of term ranges, where the size of the list is in proportion to how many cells of the space filling curve are crossed by the perimeter of the query (I think?). This can easily be 100s of MBs for a big enough query ... not to mention slow to enumerate (we still do this again for each segment). I think the queries should have safeguards, much like we have maxDeterminizedStates for Automaton based queries, to prevent accidental OOMEs. But I think longer term we should either change the ranges to be enumerated on-demand and never stored in entirety (like NumericRangeTermsEnum), or change the query so it has a fixed budget of how many cells it's allowed to visit and then within a crossing cell it uses doc values to post-filter. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org