[jira] [Updated] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5532: Attachment: LUCENE-5532.patch same patch, just with some reordering of things in RunAutomaton.equals for faster speed. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
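For illustration, here is a minimal sketch of the safe-publication idea behind the fix; the class, fields, and helper below are hypothetical stand-ins, not the actual Automaton/RunAutomaton code. The point is simply that the numbered-states array is built and trimmed completely before it is assigned to the field that other threads read.
{code}
// Hypothetical sketch of the publication ordering, not the real Lucene source.
class NumberedStates {
  private volatile int[] numberedStates;            // read concurrently, e.g. by hashCode()

  int[] getNumberedStates() {
    int[] states = numberedStates;
    return states != null ? states : buildNumberedStates();
  }

  private int[] buildNumberedStates() {
    int[] work = new int[4];                          // may be oversized while building
    int upto = 0;
    for (int s = 0; s < liveStateCount(); s++) {
      if (upto == work.length) {
        work = java.util.Arrays.copyOf(work, work.length * 2);
      }
      work[upto++] = s;
    }
    int[] done = java.util.Arrays.copyOf(work, upto); // size down once building is finished
    numberedStates = done;                            // publish only the finished array
    return done;
  }

  private int liveStateCount() { return 7; }          // placeholder
}
{code}
With the assignment moved after the final trim, a concurrent reader either sees null (and builds its own view) or a fully consistent array; it can never observe the oversized intermediate state.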
[jira] [Updated] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5513: --- Attachment: LUCENE-5513.patch Patch makes the following refactoring changes (all internal API): * DocValuesUpdate abstract class w/ common implementation for NumericDocValuesUpdate and BinaryDocValuesUpdate. * DocValuesFieldUpdates holds the doc+updates for a single field. It mostly defines the API for the Numeric* and Binary* implementations. * DocValuesFieldUpdates.Container holds numeric+binary updates for a set of fields. It is, as its name says, a container of updates used by ReaderAndUpdates. ** It helps avoid bloating the API with more maps being passed around, and it simplifies BufferedUpdatesStream and IndexWriter.commitMergedDeletes. ** It also serves as a factory based on the update's Type. * Finished TestBinaryDVUpdates. * Added TestMixedDVUpdates which ports some of the 'big' tests from both TestNDV/BDVUpdates and mixes some NDV and BDV updates. ** I'll beast it some to make sure all edge cases are covered. I may take a crack at simplifying IW.commitMergedDeletes even more by pulling a lot of duplicate code into a method. This is impossible now because those sections modify more than one state variable, but I'll try to stuff these variables into a container to make this method saner to read. Otherwise, I think it's ready. Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60-ea-b07) - Build # 9716 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9716/ Java: 32bit/jdk1.7.0_60-ea-b07 -server -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:34990 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:34990 within 45000 ms at __randomizedtesting.SeedInfo.seed([8CEE065EE8AE1FEE:D0888469FF17FD2]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937526#comment-13937526 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578262 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578262 ] LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc array with length of at most equal to the specified size instead of length equal to at most from + size as was before. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
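To illustrate the change in array sizing, here is a simplified, hypothetical merge loop (not the actual TopDocs.merge implementation): hits before the requested page are popped and discarded instead of being stored, so the returned array holds at most size entries rather than from + size.
{code}
import java.util.PriorityQueue;
import org.apache.lucene.search.ScoreDoc;

// Hypothetical helper: byScore yields the globally best remaining hit first.
static ScoreDoc[] mergePage(PriorityQueue<ScoreDoc> byScore, int from, int size) {
  int available = Math.max(0, byScore.size() - from);
  ScoreDoc[] page = new ScoreDoc[Math.min(size, available)]; // at most size, not from + size
  for (int i = 0; i < from + size && !byScore.isEmpty(); i++) {
    ScoreDoc hit = byScore.poll();
    if (i >= from) {
      page[i - from] = hit;  // keep only hits on the requested page
    }
  }
  return page;
}
{code}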
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5476: --- Attachment: LUCENE-5476.patch New patch. I'm still not really sure about the scorings, but please take a look at it. Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
Rob Audenaerde created LUCENE-5533: -- Summary: TaxonomyFacetSumIntAssociations overflows, unpredicted results Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses an {{int[]}} to store values. If you sum a lot of integers in the IntAssociations, the {{int}} will overflow. The easiest fix seems to be to change the {{value[]}} to a {{long[]}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
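As a concrete illustration of the overflow (the per-document value of 5,000 and the document count are arbitrary example numbers): summing the same associations into an int wraps around, while a long accumulator keeps the correct total.
{code}
// Summing 1,000,000 association values of 5,000 each: the true total is
// 5,000,000,000, which does not fit in 32 bits.
int intSum = 0;
long longSum = 0L;
for (int i = 0; i < 1000000; i++) {
  int associationValue = 5000;
  intSum += associationValue;   // wraps around: ends up as 705032704
  longSum += associationValue;  // correct: 5000000000
}
System.out.println(intSum + " vs " + longSum);
{code}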
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937530#comment-13937530 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578267 from [~martijn.v.groningen] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578267 ] Merged revision 1578262 from trunk: LUCENE-5515: Improved TopDocs#merge to create a merged ScoreDoc array with length of at most equal to the specified size instead of length equal to at most from + size as was before. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-5515. --- Resolution: Fixed Committed to trunk and 4x branch. Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5871) Ability to see the list of fields that matched the query with scores
Alexander S. created SOLR-5871: -- Summary: Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to show users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes | No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query; it would also be nice to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has many nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5871: --- Description: Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. was: Hello, I need the ability to show users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5849) write.lock is not removed by LogReplayer
[ https://issues.apache.org/jira/browse/SOLR-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pavan patel closed SOLR-5849. - Resolution: Invalid Marking this bug invalid as this is not a Solr issue. In my application we had two cores and we were initializing the SolrCoreContainer twice in our factory class. Because of that, the write.lock issue occurred. write.lock is not removed by LogReplayer Key: SOLR-5849 URL: https://issues.apache.org/jira/browse/SOLR-5849 Project: Solr Issue Type: Bug Environment: Windows 7, Tomcat 7.0.52, Solr 4.3.0, jdk1.7.0_51 Reporter: pavan patel In my application I am using EmbeddedSolrServer inside Tomcat. I have the below configuration for my core: <lockType>simple</lockType> <unlockOnStartup>true</unlockOnStartup> <updateLog> <str name="dir">${solr.ulog.dir:}</str> </updateLog> <autoCommit> <maxTime>15000</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>1000</maxTime> </autoSoftCommit> The issue I am facing is that when I restart Tomcat and there is any uncommitted data in the tlog, I get the below exception: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@F:\mydir\Install\solr\conf\alerts\data\index\write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:644) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:77) at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64) at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:197) at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:148) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:504) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:640) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) After the restart I am not able to index anything into Solr. I debugged the code and found that the LogReplayer during startup creates the SolrIndexWriter on the core, and that creates the write.lock file. Once all the leftover tlogs are indexed, the write.lock remains there; it is not deleted.
So when my application tries to add a document, the SolrIndexWriter is not able to obtain the lock because write.lock already exists. This seems to be a bug in Solr 4.3.0, because I believe the SolrIndexWriter created during log replay is not closed, and that is what leaves the write.lock behind in the data directory. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
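Since the root cause was a second CoreContainer (and therefore a second IndexWriter competing for the same lock), the usual remedy is to make the embedded server a process-wide singleton. The sketch below is illustrative only; the exact CoreContainer construction call differs across Solr 4.x versions.
{code}
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

// Hedged sketch: create the container once and reuse it for every request.
public final class SolrHolder {
  private static volatile EmbeddedSolrServer server;

  private SolrHolder() {}

  public static EmbeddedSolrServer get(String solrHome, String coreName) {
    if (server == null) {
      synchronized (SolrHolder.class) {
        if (server == null) {
          CoreContainer container = new CoreContainer(solrHome); // constructor varies by Solr version
          container.load();
          server = new EmbeddedSolrServer(container, coreName);
        }
      }
    }
    return server;
  }
}
{code}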
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937580#comment-13937580 ] Gilad Barkai commented on LUCENE-5476: -- About the scores (the only part I got to review thus far), the scores should be a non-sparse float array. E.g., if there are 1M documents and the original set contains 1000 documents, the score[] array would be of length 1000. If the sampled set only has 10 documents, the score[] array should be of length 10. The relevant part: {code} if (getKeepScores()) { scores[doc] = docs.scores[doc]; } {code} should be changed so that the scores[] size and index are relative to the sampled set and not the original results. Also, could the size of the score[] array be the number of bins? Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
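A rough sketch of what that would look like (the names here are hypothetical, not the patch's actual fields): size and index the kept scores by position within the sampled set rather than by global doc id, so the array stays dense and only as long as the sample.
{code}
// Hypothetical helper, not the patch itself: keep one dense score slot per sampled doc.
static float[] keepSampledScores(int[] sampledDocIds, float[] scoresByDocId) {
  float[] sampledScores = new float[sampledDocIds.length];
  for (int i = 0; i < sampledDocIds.length; i++) {
    sampledScores[i] = scoresByDocId[sampledDocIds[i]];
  }
  return sampledScores;
}
{code}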
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937585#comment-13937585 ] Simon Willnauer commented on LUCENE-5532: - +1 to the patch - I agree with the change to move away from acceptsSameLanguage! The speedups are also good. Let's make sure we put this in the changes-in-runtime-behavior section of CHANGES.txt. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937594#comment-13937594 ] Uwe Schindler commented on LUCENE-5532: --- +1 to the patch. I am not sure about this, since I have not followed the latest test-framework updates: it looks to me that the thread does not use a custom name, or is the thread group inherited by the test framework? I would also change the thread to simply {{Assert.fail()}} on an Exception in the thread. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937611#comment-13937611 ] Robert Muir commented on LUCENE-5532: - Uwe, it just inherits. It's similar to many tests in the .index package that work this way. If we want to do something else, we should ban some methods. But I can name the threads if you want :) As for Assert.fail, wouldn't this lose the stack trace of the original exception? In the case of this test failing due to a thread-safety issue, I think that's useful for debugging. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937616#comment-13937616 ] Tim Allison commented on LUCENE-5205: - [~rcmuir] and community, given recent interest in LUCENE-2878, should we stop work on this? [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. 
Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937617#comment-13937617 ] Robert Muir commented on LUCENE-5205: - Tim I don't think so. I think actually it makes sense to have real current use cases for spans to ensure everything is really done correctly. This is just my opinion. I've fallen behind on the issue only because I've been busy lately. [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser --- Key: LUCENE-5205 URL: https://issues.apache.org/jira/browse/LUCENE-5205 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: LUCENE-5205-cleanup-tests.patch, LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_improve_stop_word_handling.patch, LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt This parser extends QueryParserBase and includes functionality from: * Classic QueryParser: most of its syntax * SurroundQueryParser: recursive parsing for near and not clauses. * ComplexPhraseQueryParser: can handle near queries that include multiterms (wildcard, fuzzy, regex, prefix), * AnalyzingQueryParser: has an option to analyze multiterms. At a high level, there's a first pass BooleanQuery/field parser and then a span query parser handles all terminal nodes and phrases. Same as classic syntax: * term: test * fuzzy: roam~0.8, roam~2 * wildcard: te?t, test*, t*st * regex: /\[mb\]oat/ * phrase: jakarta apache * phrase with slop: jakarta apache~3 * default or clause: jakarta apache * grouping or clause: (jakarta apache) * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta * multiple fields: title:lucene author:hatcher Main additions in SpanQueryParser syntax vs. classic syntax: * Can require in order for phrases with slop with the \~ operator: jakarta apache\~3 * Can specify not near: fever bieber!\~3,10 :: find fever but not if bieber appears within 3 words before or 10 words after it. * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta apache\]~3 lucene\]\~4 :: find jakarta within 3 words of apache, and that hit has to be within four words before lucene * Can also use \[\] for single level phrasal queries instead of as in: \[jakarta apache\] * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 :: find apache and then either lucene or solr within three words. * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two words of ap*che and that hit has to be within ten words of something like solr or that lucene regex. * Can require at least x number of hits at boolean level: apache AND (lucene solr tika)~2 * Can use negative only query: -jakarta :: Find all docs that don't contain jakarta * Can use an edit distance 2 for fuzzy query via SlowFuzzyQuery (beware of potential performance issues!). Trivial additions: * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, prefix =2) * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein) This parser can be very useful for concordance tasks (see also LUCENE-5317 and LUCENE-5318) and for analytical search. 
Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery. Most of the documentation is in the javadoc for SpanQueryParser. Any and all feedback is welcome. Thank you. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937619#comment-13937619 ] Uwe Schindler commented on LUCENE-5532: --- bq. Uwe, it just inherits. It's similar to many tests in the .index package that work this way. If we want to do something else, we should ban some methods. But I can name the threads if you want All fine, just wanted to be sure. And we don't have Thread#init() on the forbidden list :-) bq. As for Assert.fail, wouldn't this lose the stack trace of the original exception? In the case of this test failing due to a thread-safety issue, I think that's useful for debugging Yes, it will lose it. I don't like RuntimeExceptions wrapping others. You can ideally do Rethrow.rethrow(ex); in tests this is, I think, the preferred way. You will get the original Exception in the thread stack dump, unwrapped. This is why we have the Rethrow class in the test framework. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937623#comment-13937623 ] Robert Muir commented on LUCENE-5532: - Rethrow is good: I'll use that! AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4878) Change default directory for infostream from CWD to dataDir
[ https://issues.apache.org/jira/browse/SOLR-4878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-4878: --- Fix Version/s: (was: 4.7) 4.8 Change default directory for infostream from CWD to dataDir --- Key: SOLR-4878 URL: https://issues.apache.org/jira/browse/SOLR-4878 Project: Solr Issue Type: Bug Affects Versions: 4.3 Reporter: Shawn Heisey Assignee: Shawn Heisey Fix For: 4.8 Attachments: SOLR-4878.patch, SOLR-4878.patch The default directory for the infoStream file is CWD. In a multicore system where all the cores share similar configs, the output from all cores is likely to end up in the same file. Although this is sometimes the desired outcome, it seems less than ideal. If you've got cores that literally share the same config file, or you're using SolrCloud where more than one core on the system uses the same config set, you won't have the option of putting different files in different configs. If the default directory were dataDir rather than CWD, each core would get its own infostream file. You could still get the original behavior by specifying an absolute path. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937672#comment-13937672 ] Michael McCandless commented on LUCENE-5532: +1 I love the use of startingGun in the test :) AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937673#comment-13937673 ] Michael McCandless commented on LUCENE-5515: This seems worth mentioning in CHANGES? Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937677#comment-13937677 ] Michael McCandless commented on LUCENE-5533: +1 But maybe we should break out a LongTaxonomyFacets instead? I.e., the more common case of simple facet counting would never overflow an int since a Lucene shard can have at most 2.1B docs. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937680#comment-13937680 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578300 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578300 ] LUCENE-5515: Added missing CHANGES.TXT entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937681#comment-13937681 ] Shai Erera commented on LUCENE-5476: Rob, I reviewed the patch and I agree with Gilad - the way you handle the scores array is wrong. It's not random access by doc. I believe that if you added a test, it would show up quickly. But perhaps we can keep scores out of this collector ... we can always add it later. So I don't mind if you want to wrap up w/o scores for now. Can you then fix the patch to always set keepScores=false? Also, I noticed a few sops (System.out.println calls) left in the test. Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses
[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937683#comment-13937683 ] Michael McCandless commented on LUCENE-4396: Da, that's a great discovery. So, in the case where at least one MUST clause is present, BS will in fact collect in-order, and then BS could be embedded in other queries that want a sub-scorer. This may force us to more strongly separate the notion of forcing doc-at-a-time scoring (LUCENE-2684), since today the sneaky way to do this is to return false from your Collector.acceptsDocsOutOfOrder. I think you should be careful in your proposal to keep this issue well-scoped. I.e., the overall goal is to let BS handle MUST clauses in certain cases (a heuristic needs to decide this), and then a nice-to-have is to enable BS to also be a sub-scorer in some cases. BooleanScorer should sometimes be used for MUST clauses --- Key: LUCENE-4396 URL: https://issues.apache.org/jira/browse/LUCENE-4396 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT. If there is one or more MUST clauses we always use BooleanScorer2. But I suspect that unless the MUST clauses have very low hit count compared to the other clauses, that BooleanScorer would perform better than BooleanScorer2. BooleanScorer still has some vestiges from when it used to handle MUST so it shouldn't be hard to bring back this capability ... I think the challenging part might be the heuristics on when to use which (likely we would have to use firstDocID as proxy for total hit count). Likely we should also have BooleanScorer sometimes use .advance() on the subs in this case, eg if suddenly the MUST clause skips 100 docs then you want to .advance() all the SHOULD clauses. I won't have near term time to work on this so feel free to take it if you are inspired! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
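For readers less familiar with the "sneaky way" mentioned above, here is a minimal sketch of how a Lucene 4.x Collector advertises its ordering requirement; the collector class itself is hypothetical, but returning false from acceptsDocsOutOfOrder is what forces an in-order scorer to be chosen.
{code}
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Hypothetical collector: it only counts hits, but demands in-order collection.
public class InOrderCountingCollector extends Collector {
  private int count;

  @Override public void setScorer(Scorer scorer) {}
  @Override public void collect(int doc) { count++; }                 // docs arrive in increasing order
  @Override public void setNextReader(AtomicReaderContext context) {}
  @Override public boolean acceptsDocsOutOfOrder() { return false; }  // refuse out-of-order scoring
  public int getCount() { return count; }
}
{code}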
[jira] [Resolved] (SOLR-5265) Add backward compatibility tests to JavaBinCodec's format.
[ https://issues.apache.org/jira/browse/SOLR-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5265. -- Resolution: Fixed Fix Version/s: 5.0 Add backward compatibility tests to JavaBinCodec's format. -- Key: SOLR-5265 URL: https://issues.apache.org/jira/browse/SOLR-5265 Project: Solr Issue Type: Test Reporter: Adrien Grand Assignee: Noble Paul Priority: Blocker Fix For: 4.8, 5.0 Attachments: SOLR-5265.patch, SOLR-5265.patch, SOLR-5265.patch, SOLR-5265.patch, javabin_backcompat.bin Since Solr guarantees backward compatibility of JavaBinCodec's format between releases, we should have tests for it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
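For context, a minimal sketch of the kind of back-compat check this adds; the file path and the expected contents are illustrative, not necessarily what the committed test uses. Bytes written by an older JavaBinCodec are stored once, and every run asserts that the current codec still decodes them.
{code}
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.common.util.JavaBinCodec;

// Hedged sketch of a back-compat check against a previously written binary file.
public class JavaBinBackCompatCheck {
  public static void main(String[] args) throws Exception {
    InputStream in = new FileInputStream("src/test-files/solrj/javabin_backcompat.bin"); // illustrative path
    try {
      Object decoded = new JavaBinCodec().unmarshal(in); // must still read bytes written by an older release
      System.out.println("decoded: " + decoded);
    } finally {
      in.close();
    }
  }
}
{code}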
[GitHub] lucene-solr pull request: Update GreekStemmer.java
GitHub user pitsios-s opened a pull request: https://github.com/apache/lucene-solr/pull/43 Update GreekStemmer.java Added javadoc to the functions stem , endsWith , endsWithVowel and endsWithVowelNoY for the purposes of the software engineering class. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Team-DP/lucene-solr trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/43.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #43 commit 4bb599eabfe9355b823ad9809eca06c61db53e1b Author: Stamatis Pitsios stamatis@gmail.com Date: 2014-03-17T11:15:10Z Update GreekStemmer.java Added javadoc to the functions stem , endsWith , endsWithVowel and endsWithVowelNoY for the purposes of the software engineering class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937685#comment-13937685 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578305 from [~martijn.v.groningen] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578305 ] LUCENE-5515: Added missing CHANGES.TXT entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-5837: Assignee: Noble Paul (was: Mark Miller) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not Override the equals implementation. The issue will Override equals and hashCode methods to the 3 classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul resolved SOLR-5837. -- Resolution: Won't Fix Fix Version/s: (was: 4.8) (was: 5.0) The equals methods were added to the test cases, so this is not required. Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not Override the equals implementation. The issue will Override equals and hashCode methods to the 3 classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937690#comment-13937690 ] Shai Erera commented on LUCENE-5533: bq. But maybe we should break out a LongTaxonomyFacets instead? +1, and especially as it's only relevant to association-based faceting. Rob, is this something you really hit or just a random code review? I agree when you sum integers there's a risk of overflowing, but I'm afraid if we introduce LongTaxoFacets users might want to use it just in case. The risk is that a single ord will overflow, right? I wonder if we should use a packed long buffer instead of a plain long[] ... that's optimization though. First let's agree that this is something that needs fixing. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5515) Improve TopDocs#merge for pagination
[ https://issues.apache.org/jira/browse/LUCENE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937694#comment-13937694 ] ASF subversion and git services commented on LUCENE-5515: - Commit 1578308 from [~martijn.v.groningen] in branch 'dev/trunk' [ https://svn.apache.org/r1578308 ] LUCENE-5515: Added author to CHANGES.txt entry Improve TopDocs#merge for pagination Key: LUCENE-5515 URL: https://issues.apache.org/jira/browse/LUCENE-5515 Project: Lucene - Core Issue Type: Improvement Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.8 Attachments: LUCENE-5515.patch, LUCENE-5515.patch If TopDocs#merge takes from and size into account it can be optimized to create a hits ScoreDoc array equal to size instead of from+size what is now the case. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937696#comment-13937696 ] Shai Erera edited comment on LUCENE-5513 at 3/17/14 11:26 AM: -- Fixed silly bug in BinaryDocValuesFieldUpdates.merge(). was (Author: shaie): Fixed stupid bug in BinaryDocValuesFieldUpdates.merge(). Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5513: --- Attachment: LUCENE-5513.patch Fixed stupid bug in BinaryDocValuesFieldUpdates.merge(). Binary DocValues Updates Key: LUCENE-5513 URL: https://issues.apache.org/jira/browse/LUCENE-5513 Project: Lucene - Core Issue Type: Wish Components: core/index Reporter: Mikhail Khludnev Priority: Minor Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5872) Eliminate overseer queue
Noble Paul created SOLR-5872: Summary: Eliminate overseer queue Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is * Provide batching of operations for the main clusterstate,json so that state updates are minimized * Avoid race conditions and ensure order Now , as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is , whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) # read the fresh state and version of zk node # construct the new state # perform a compare and set # if compare and set fails go to step 1 This should be limited to all operations performed on external collections because batching would be required for others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
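A minimal sketch of the proposed compare-and-set loop, written directly against the plain ZooKeeper client API. The path and the update function are placeholders; this is not the actual Overseer or ZkStateReader code.

{code:java}
// Minimal sketch of the compare-and-set loop proposed above, using the plain
// ZooKeeper client API. Path and update function are placeholders.
import java.util.function.UnaryOperator;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class StateCasSketch {
  static void updateState(ZooKeeper zk, String path, UnaryOperator<byte[]> update)
      throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      byte[] current = zk.getData(path, false, stat);   // 1. read fresh state + version
      byte[] proposed = update.apply(current);          // 2. construct the new state
      try {
        zk.setData(path, proposed, stat.getVersion());  // 3. compare-and-set on the version
        return;                                         //    success -> done
      } catch (KeeperException.BadVersionException e) {
        // 4. another writer got there first -> go back to step 1 and retry
      }
    }
  }
}
{code}

The version argument to setData is what removes the need for a single serializing queue: a concurrent writer makes the call fail with BadVersionException and the loser simply re-reads and retries.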
[jira] [Created] (LUCENE-5534) GreekStemmer javadocs
Robert Muir created LUCENE-5534: --- Summary: GreekStemmer javadocs Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5873) Improve JavaBinCodec's backward compatibility tests
Varun Thacker created SOLR-5873: --- Summary: Improve JavaBinCodec's backward compatibility tests Key: SOLR-5873 URL: https://issues.apache.org/jira/browse/SOLR-5873 Project: Solr Issue Type: Improvement Reporter: Varun Thacker SOLR-5265 added backward compatibility tests, but it tries to read a pre-written binary file to check if there is a break or not. If we add more types to JavaBinCodec the test will need to be updated too, which will be error-prone again. This is what [~hakeber] proposed on IRC - - A test that I was thinking of: we could have a jenkins job that ran a script that checked out the previous version of lucene and the latest - Then use the solr/cloud-dev scripts to start a cloud cluster - Index some docs - Stop a node at a time, replace webapp with the latest in a rolling upgrade fashion - Then we have a full rolling upgrade test This would be a better approach for back compat tests. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937704#comment-13937704 ] ASF subversion and git services commented on LUCENE-5534: - Commit 1578315 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1578315 ] LUCENE-5534: add javadocs to GreekStemmer (closes #43) GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: Update GreekStemmer.java
Github user rmuir commented on the pull request: https://github.com/apache/lucene-solr/pull/43#issuecomment-37806850 Thank you very much! I just committed this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: Update GreekStemmer.java
Github user asfgit closed the pull request at: https://github.com/apache/lucene-solr/pull/43 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937705#comment-13937705 ] ASF subversion and git services commented on LUCENE-5534: - Commit 1578317 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578317 ] LUCENE-5534: add javadocs to GreekStemmer (closes #43) GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5534) GreekStemmer javadocs
[ https://issues.apache.org/jira/browse/LUCENE-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5534. - Resolution: Fixed Fix Version/s: 5.0 4.8 GreekStemmer javadocs - Key: LUCENE-5534 URL: https://issues.apache.org/jira/browse/LUCENE-5534 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.8, 5.0 Just an issue for tracking https://github.com/apache/lucene-solr/pull/43.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5532) AutomatonQuery.hashCode is not thread safe
[ https://issues.apache.org/jira/browse/LUCENE-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937723#comment-13937723 ] Dawid Weiss commented on LUCENE-5532: - or is the thread group inherited by the test framework? The test suite runs in its own test group so any thread (unless explicitly assigned to another group) will inherit that group from its parent. AutomatonQuery.hashCode is not thread safe -- Key: LUCENE-5532 URL: https://issues.apache.org/jira/browse/LUCENE-5532 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5532.patch, LUCENE-5532.patch This hashCode is implemented based on #states and #transitions. These methods use getNumberedStates() though, which may oversize itself during construction and then size down when its done. But numberedStates is prematurely set (before its ready), which can cause a hashCode call from another thread to see a corrupt state... causing things like NPEs from null states and other strangeness. I don't think we should set this variable until its finished. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937732#comment-13937732 ] Alexander S. commented on SOLR-4787: Thank you, Kranti Parisa, I am far from java development, how can I apply this patch and build solr for linux? I tried to patch, it creates a new folder joins in solr/contrib, installed ivy and launched ant compile but got this error: {quote} common.compile-core: [mkdir] Created dir: /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java [javac] Compiling 3 source files to /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883: error: reached end of file while parsing [javac] return this.delegate.acceptsDocsOutOfOrder(); [javac]^ [javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884: error: reached end of file while parsing [javac] 2 errors [javac] 1 warning BUILD FAILED /home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following error occurred while executing this line: /home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed; see the compiler error output for details. Total time: 8 minutes 55 seconds {quote} Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. 
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which
Re: Reducing the number of warnings in the codebase
Hi; As the one who suggested using Sonar I want to add a few more points. First of all, tools like this report metrics beyond code warnings, and those metrics are worth surfacing even if we don't act on them. For example, code complexity is sometimes a good measure for reviewing your code. I should mention that code warnings are not listed as one flat list in Sonar; they are separated into categories. Major is an important category to pay attention to, and you can ignore the minor warnings if you want. When I use Sonar to check my team's code I sometimes find false-positive warnings; those rules can easily be dropped from Sonar. My suggestion is this: whether or not we act on Sonar's output, we should integrate our project with the Sonar instance available at Apache. I've opened a Jira issue for it: https://issues.apache.org/jira/browse/SOLR-5869 and *I volunteer to work on it.* All in all I think these tools (like PMD etc.) are sometimes really helpful. If you chase down the reason behind every reported bug, by the end you will have effectively worked through Effective Java, because the reports reference its items. I know there are false positives, but such things can be discarded easily. The other point is that these tools produce nice graphs showing the direction of your project even if you don't use their bug warnings, code-coverage metrics and so on. I've created issues for bugs (I've checked all the major ones) and *applied patches for them* previously. Some of them are: SOLR-5836, SOLR-5838, SOLR-5839, SOLR-5840, SOLR-5841, LUCENE-5506, LUCENE-5508, LUCENE-5509 Thanks; Furkan KAMACI 2014-03-16 23:34 GMT+02:00 Benson Margulies bimargul...@gmail.com: I think we avoid bikeshedding by making incremental changes. If you offer a commit to turn off serial version UID whining, I'll +1 it. And then we iterate, in small doses, agreeing to either spike the warning or change the code. In passing, I will warn you that the IDEs can be very stubborn; in some cases, there is no way to avoid some amount of whining. Eclipse used to insist on warning on every @SuppressWarnings that it didn't understand. It might still. On Sun, Mar 16, 2014 at 5:29 PM, Shawn Heisey s...@elyograg.org wrote: A starting comment: We could bikeshed for *years*. General thought: The more I think about it, the more I like the notion of confining most of the cleanup to trunk. Actual bug fixes and changes that are relatively non-invasive should be backported. On 3/16/2014 2:48 PM, Uwe Schindler wrote: Just because some tool expresses distaste, doesn't imply that everyone here agrees that it's a problem we should fix. Yes that is my biggest problem. Lots of warnings by Eclipse are just bullshit because of the code style in Lucene and for example the way we do things - e.g., it complains about missing close() all the time, just because we use IOUtils.closeWhileHandlingExceptions() for that. My original thought on this was that we should use a combination of SuppressWarnings and actual code changes to eliminate most of the warnings that show up in the well-supported IDEs when they are configured with *default* settings. Uwe brings up a really good point that there are a number of completely useless warnings, but I think there's still value in looking through EVERY default IDE warning and evaluating each one on a case-by-case basis to decide whether that specific warning should be fixed or ignored. It could be a sort of background task with an open Jira for tracking commits. It could also be something that we decide isn't worth the effort.
In my experience, the default Sonar rulesets contain many things that people here are prone to disagree with. Start with serialVersionUID: do we care? Why would we care? In what cases to we really believe that a sane person would be using Java serialization with a Lucene/Solr class? We officially don't support serialization, so all warnings are useless. It's just Eclipse that complains for no reason. Project-specific IDE settings for errors/warnings (set by the ant build target) will go a long way towards making the whole situation better. For the current stable branch, we should include settings for anything that we want to ignore on trunk, but only a subset of those problems that get elevated to error status. Sonar can also be a bit cranky; it arranges for various tools to run via mechanisms that sometimes conflict with the ways you might run them yourself. So I'd suggest a process like: 1. Someone proposes a set of (e.g.) checkstyle rules to live by. 2. That ruleset is refined by experiment. 3. We make violations fail the build. Then lather, rinse, repeat for other tools. Yes I agree. I am strongly against PMD or CheckStyle without our own rules. Forbiddeen-apis was invented because of the brokenness of PMD and CheckStyle to detect default Locale/Charset/Timezone violations (and
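For context on the missing close() complaint mentioned in this thread: Lucene often hands resources to a utility closer rather than closing them in a visible try-with-resources block, which default IDE resource-leak checks do not recognize. A rough sketch of that pattern is below, assuming Lucene's org.apache.lucene.util.IOUtils (the thread spells the method with a trailing 's'; recent versions name it closeWhileHandlingException).

{code:java}
// Rough sketch of the pattern discussed above: the stream is closed, but by a
// utility method rather than a plain try-with-resources block, so an IDE's
// default resource-leak check flags it. Assumes org.apache.lucene.util.IOUtils.
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.lucene.util.IOUtils;

class CloseExample {
  static void printFirstByte(String src) throws IOException {
    InputStream in = Files.newInputStream(Paths.get(src));   // IDE: "potential resource leak"
    boolean success = false;
    try {
      System.out.println(in.read());
      success = true;
    } finally {
      if (success) {
        IOUtils.close(in);                        // rethrows any close() failure
      } else {
        IOUtils.closeWhileHandlingException(in);  // suppresses close() failure, keeps original exception
      }
    }
  }
}
{code}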
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937747#comment-13937747 ] Alexander S. commented on SOLR-4787: Nvm, there were 3 missing } at the end of HashSetJoinQParserPlugin.java, the build was successful, testing now. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist
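The filter query shown in the description can also be issued from client code; the sketch below does so with the Solr 4.x SolrJ API. The core URL, collection and field names are taken from the example above, and it assumes the hjoin parser and contrib jars are registered in solrconfig.xml as described. Treat it as an illustration rather than a tested recipe.

{code:java}
// Sketch of issuing the hjoin filter query from the description via SolrJ
// (Solr 4.x API). Host/core URL, collection and field names follow the
// example above; the hjoin parser must already be registered in solrconfig.xml.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

class HjoinExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    // Main query filtered by an hjoin against collection2, which is itself filtered by $qq.
    q.addFilterQuery("{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
    q.set("qq", "group:5");
    QueryResponse rsp = server.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
    server.shutdown();
  }
}
{code}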
[jira] [Updated] (LUCENE-5476) Facet sampling
[ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5476: --- Attachment: LUCENE-5476.patch Removed scores. Added javadoc explaining what happens to scores. Removed System.out.println Facet sampling -- Key: LUCENE-5476 URL: https://issues.apache.org/jira/browse/LUCENE-5476 Project: Lucene - Core Issue Type: Improvement Reporter: Rob Audenaerde Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java With LUCENE-5339 facet sampling disappeared. When trying to display facet counts on large datasets (10M documents) counting facets is rather expensive, as all the hits are collected and processed. Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
Rob Audenaerde created LUCENE-5535: -- Summary: DrillDownQuery not working with AssociateFacetFields? Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Audenaerde updated LUCENE-5535: --- Attachment: AssociationsFacetsWithDrilldownExample.java DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_60-ea-b07) - Build # 9719 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9719/ Java: 32bit/jdk1.7.0_60-ea-b07 -client -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:59868 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:59868 within 45000 ms at __randomizedtesting.SeedInfo.seed([7B2763E3A20565BB:FAC1EDFBD55A0587]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937767#comment-13937767 ] Shai Erera commented on LUCENE-5535: Have you looked at AssociationsFacetsExample under lucene/demo? It has a drilldown() example too. Also, I ran the example code you attached and it produced: {noformat} Sum associations example: - tags: dim=tags path=[] value=-1 childCount=2 lucene (4) solr (2) genre: dim=genre path=[] value=-1.0 childCount=2 computing (1.62) software (0.34) Count withouth associations: - tags: dim=tags path=[] value=-1 childCount=2 lucene (2) solr (1) {noformat} Where is the problem? DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2733) DIH - Ignoring Error when closing connection when send command abort in jdbc 5.1.17
[ https://issues.apache.org/jira/browse/SOLR-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937766#comment-13937766 ] Manjunath commented on SOLR-2733: - I did get the same error but different occasion. Here is the stack trace {code:xml} ERROR org.apache.solr.handler.dataimport.JdbcDataSource – Ignoring Error when closing connection java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@479abcd4 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries. at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924) at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3314) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2477) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809) at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5165) at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5048) at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4654) at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1630) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:436) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:421) at org.apache.solr.handler.dataimport.DebugLogger$2.close(DebugLogger.java:180) at org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:294) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:283) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) {code} DIH -
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937770#comment-13937770 ] Rob Audenaerde commented on LUCENE-5533: I hit this pretty easily. I tried to build an aggregator that sums the associated values for a given search. In my testcase, I used 1M documents. The n-th document had {{n}} as its associated int value, so the average int value is 500,000 and the total sum is roughly 500,000 x 1M = 5x10^11, far beyond the int limit of 2,147,483,647. I currently switched to using floats, which for me gives results that are accurate enough and also allows for numbers greater than {{Integer.MAX_VALUE}}, so I'm not really sure it is a problem. Maybe there should be a {{RuntimeException}} if the accumulated value overflows? TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses an {{int[]}} to store values. If you sum a lot of integers in the IntAssociations, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937783#comment-13937783 ] Rob Audenaerde commented on LUCENE-5535: Maybe then I made a mistake somewhere else, because this is my result: {noformat} Sum associations example: - tags: null genre: null Count withouth associations: - tags: dim=tags path=[] value=-1 childCount=2 lucene (2) solr (1) {noformat} I'll try to double check asap. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5536) TaxonomyFacetSumInt/FloatAssociations should not rollup()
Shai Erera created LUCENE-5536: -- Summary: TaxonomyFacetSumInt/FloatAssociations should not rollup() Key: LUCENE-5536 URL: https://issues.apache.org/jira/browse/LUCENE-5536 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Shai Erera Stumbled upon this by accident when I reviewed the code. The previous associations impl never rolled up. The assumption is that association values are given to exact categories and have no hierarchical meaning. For instance, if a document is associated with two categories {{Category/CS/Algo}} and {{Category/CS/DataStructure}} with weights {{0.95}} and {{0.43}} respectively, it is not associated with {{Category/CS}} with weight {{1.38}}! :) If the app wants the association values to apply to parents in the hierarchy as well, it needs to specify that explicitly (as in passing the hierarchy categories with their own association values). I will fix the bug and also make sure the app cannot trip it by accidentally marking these categories hierarchical, or that if it does (because e.g. it indexes the categories for both counting and assoc values) then we don't apply the association to all the categories in the hierarchy. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5533) TaxonomyFacetSumIntAssociations overflows, unpredicted results
[ https://issues.apache.org/jira/browse/LUCENE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937785#comment-13937785 ] Shai Erera commented on LUCENE-5533: I see, it is a pretty extreme case indeed! We never hit overflow problems in the past :). The problem w/ raising RuntimeException is it means adding an {{if}} to every aggregation, which is costly and for a really extreme case. I think it's better if you write your own TaxoFacetSumLongAssoc to use a long[], packed-ints, float[] or whatever and raise these exceptions yourself? Also, perhaps it's ok to e.g. stop at weight=1B/2.1B to denote that this category is already very important and all categories beyond this weight are equally important? Not sure of your usecase and if it makes sense, but juts a thought. That too can easily be done in your own Facets impl. TaxonomyFacetSumIntAssociations overflows, unpredicted results -- Key: LUCENE-5533 URL: https://issues.apache.org/jira/browse/LUCENE-5533 Project: Lucene - Core Issue Type: Bug Components: modules/facet Affects Versions: 4.7 Reporter: Rob Audenaerde {{TaxonomyFacetSumIntAssociations}} extends {{IntTaxonomyFacets}} which uses a {{int[]}} to store values. If you sum a lot of integers in the IntAssociatoins, the {{int}} will overflow. The easiest fix seems to change the {{value[]}} to {{long}}? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937793#comment-13937793 ] Shai Erera commented on LUCENE-5535: Phew :). I'll resolve the issue then. Feel free to reopen if it still doesn't work. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937798#comment-13937798 ] Shai Erera commented on LUCENE-5535: I think you may be tripping LUCENE-5522, which I fixed a few days ago. DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5535. Resolution: Duplicate Assignee: Shai Erera DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Assignee: Shai Erera Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5535) DrillDownQuery not working with AssociateFacetFields?
[ https://issues.apache.org/jira/browse/LUCENE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937789#comment-13937789 ] Rob Audenaerde commented on LUCENE-5535: I think I used an older revision :/ DrillDownQuery not working with AssociateFacetFields? - Key: LUCENE-5535 URL: https://issues.apache.org/jira/browse/LUCENE-5535 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Rob Audenaerde Attachments: AssociationsFacetsWithDrilldownExample.java I'm trying to use the FloatAssociationFacetField to store a float with each facet. Retrieving, summing etc. works fine for MatchAllDocumentQuery(). When I try to drilldown on one of the facets, the result is always empty. See attached example. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5874) Unsafe cast in RouteException
David Arthur created SOLR-5874: -- Summary: Unsafe cast in RouteException Key: SOLR-5874 URL: https://issues.apache.org/jira/browse/SOLR-5874 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.6.1 Reporter: David Arthur When a non-Exception is thrown somewhere in the CloudSolrServer, I get a XXX cannot be cast to java.lang.Exception {code} java.lang.ClassCastException: java.lang.NoClassDefFoundError cannot be cast to java.lang.Exception at org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException.init(CloudSolrServer.java:484) at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:351) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:510) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) {code} Should probably cast to Throwable, or do a check and wrap non-Exceptions in an Exception first -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
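A minimal sketch of the wrap-or-cast approach suggested in the report (illustration only, not the actual CloudSolrServer.RouteException code):

{code:java}
// Minimal sketch of the fix suggested above: accept a Throwable and only cast
// it to Exception when it actually is one; otherwise wrap it. Illustration
// only -- not the actual CloudSolrServer.RouteException code.
class SafeWrapSketch {
  static Exception asException(Throwable t) {
    if (t instanceof Exception) {
      return (Exception) t;        // safe cast
    }
    // Errors such as NoClassDefFoundError are wrapped instead of cast.
    return new Exception(t);
  }
}
{code}

With a guard like this, an Error such as the NoClassDefFoundError in the stack trace above is carried as the cause of a new Exception instead of triggering a ClassCastException.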
[jira] [Updated] (SOLR-5664) /browse: Show all highlighting fragments
[ https://issues.apache.org/jira/browse/SOLR-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-5664: -- Description: Currently if there are more highlighting fragments for the features field in the example, only the first one is rendered in the /browse GUI (was: Currently if there are more highlighting fragments, only the first one is rendered in the /browse GUI) /browse: Show all highlighting fragments Key: SOLR-5664 URL: https://issues.apache.org/jira/browse/SOLR-5664 Project: Solr Issue Type: Bug Components: contrib - Velocity Reporter: Jan Høydahl Fix For: 4.8 Currently if there are more highlighting fragments for the features field in the example, only the first one is rendered in the /browse GUI -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-3613) Namespace Solr's JAVA OPTIONS
[ https://issues.apache.org/jira/browse/SOLR-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed SOLR-3613. - Resolution: Won't Fix As Solr is moving away from being a deployable war, this issue becomes less relevant. Closing. Namespace Solr's JAVA OPTIONS - Key: SOLR-3613 URL: https://issues.apache.org/jira/browse/SOLR-3613 Project: Solr Issue Type: Improvement Affects Versions: 4.0-ALPHA Reporter: Jan Høydahl Fix For: 4.8 Attachments: SOLR-3613.patch Solr being a web-app, should play nicely in a setting where users deploy it on a shared appServer. To this regard Solr's JAVA_OPTS should be properly name spaced, both to avoid name clashes and for clarity when reading your appserver startup script. We currently do that with most: {{solr.solr.home, solr.data.dir, solr.abortOnConfigurationError, solr.directoryFactory, solr.clustering.enabled, solr.velocity.enabled etc}}, but for some opts we fail to do so. Before release of 4.0 we should make sure to clean this up. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4495) solr.xml sharedLib attribtue should take a list of paths
[ https://issues.apache.org/jira/browse/SOLR-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937840#comment-13937840 ] Jan Høydahl commented on SOLR-4495: --- What do people feel is best here? * (A) {{;}} separated string in {{str name=sharedLib}} * (B) Multiple occurrences of the tag {{str name=sharedLib}} solr.xml sharedLib attribtue should take a list of paths Key: SOLR-4495 URL: https://issues.apache.org/jira/browse/SOLR-4495 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jan Høydahl Labels: classpath, solr.xml Fix For: 4.8 Attachments: SOLR-4495.patch solr.xml's sharedLib is a great way to add plugins that should be shared across all cores/collections. For increased flexibility I would like for it to take a list of paths. Then I'd put Solr's own contrib libs in one shared folder solrJars and custom plugins with deps in another customerJars. That eases Solr upgrades, then we can simply wipe and replace all jars in solrJars during upgrade. I realize that solr.xml is going away, and so the same request will be valid for whatever replaces solr.xml, whether it be system prop or properties file. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4495) solr.xml sharedLib attribute should take a list of paths
[ https://issues.apache.org/jira/browse/SOLR-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4495: -- Summary: solr.xml sharedLib attribute should take a list of paths (was: solr.xml sharedLib attribtue should take a list of paths) solr.xml sharedLib attribute should take a list of paths Key: SOLR-4495 URL: https://issues.apache.org/jira/browse/SOLR-4495 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Jan Høydahl Labels: classpath, solr.xml Fix For: 4.8 Attachments: SOLR-4495.patch solr.xml's sharedLib is a great way to add plugins that should be shared across all cores/collections. For increased flexibility I would like for it to take a list of paths. Then I'd put Solr's own contrib libs in one shared folder solrJars and custom plugins with deps in another customerJars. That eases Solr upgrades, then we can simply wipe and replace all jars in solrJars during upgrade. I realize that solr.xml is going away, and so the same request will be valid for whatever replaces solr.xml, whether it be system prop or properties file. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5872) Eliminate overseer queue
[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937842#comment-13937842 ] Yonik Seeley commented on SOLR-5872: bq. as we move the individual collection states out of the main clusterstate.json [...] This will make a difference on clusters with many smaller collections, but not on the single big collection. It seems like we still want scalability in both directions (wrt number of collections, and the size a single collection can be). Eliminate overseer queue - Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is * Provide batching of operations for the main clusterstate,json so that state updates are minimized * Avoid race conditions and ensure order Now , as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is , whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) # read the fresh state and version of zk node # construct the new state # perform a compare and set # if compare and set fails go to step 1 This should be limited to all operations performed on external collections because batching would be required for others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3135) New binary request/response format using Avro
[ https://issues.apache.org/jira/browse/SOLR-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-3135: -- Fix Version/s: (was: 4.8) New binary request/response format using Avro - Key: SOLR-3135 URL: https://issues.apache.org/jira/browse/SOLR-3135 Project: Solr Issue Type: New Feature Components: Response Writers, search Reporter: Jan Høydahl Labels: Avro, RequestHandler, ResponseWriter, serialization Solr does not have a binary request/response format which can be supported by any client/programming language. The JavaBin format is Java only and is also not standards based. The proposal (spinoff from SOLR-1535 and SOLR-2204) is to investigate creation of an [Apache Avro|http://avro.apache.org/] based serialization format. First goal is to produce Avro [Schemas|http://avro.apache.org/docs/current/#schemas] for Request and Response and then provide {{AvroRequestHandler}} and {{AvroResponseWriter}}. Secondary goal is to use it for replication. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937845#comment-13937845 ] Alexander S. commented on SOLR-4787: Kranti, Do I need to update anything in my solr config/schema? I've just tried the patched version and it still ignores the fq parameter. I was using solr 4.7.0. Thanks, Alex Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. 
The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr
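A minimal SolrJ sketch of issuing the example hjoin filter query described above. It assumes the plugin has been registered in solrconfig.xml as shown, and that collection2 and the integer id_i join field exist; the core name and base URL are placeholders.
{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HjoinExample {
  public static void main(String[] args) throws SolrServerException {
    // Talk to the "main" core; collection2 is the fromIndex, as in the example above.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    // The hjoin filter query from the description: search collection2 for user:customer1,
    // apply the nested fq ($qq), and join back on the id_i field using 6 threads.
    q.addFilterQuery("{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1");
    q.set("qq", "group:5");

    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());
  }
}
{noformat}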
[jira] [Commented] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937853#comment-13937853 ] Furkan KAMACI commented on SOLR-5852: - [~elyograg] ZooKeeper's ConnectStringParser already checks the chroot and other invalid inputs. We can delegate that validation to ZooKeeper: if ZooKeeper's checks ever change, our CloudSolrServer will not be affected, because we simply pass the string through and ZooKeeper handles it. I think we can also handle chroot with the current approach. ZooKeeper expects a connect string like 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a, so I can improve the javadoc to say: if there is a chroot, append it once after the last host string (this is how the original ZooKeeper code works). If anybody sends multiple chroot definitions, or anything else invalid, ZooKeeper will return an error. Another approach is to accept 127.0.0.1:3000/app/a,127.0.0.1:3001/app/a,127.0.0.1:3002/app/a and parse out the chroot ourselves, checking that it is present and identical for all hosts. Add CloudSolrServer helper method to connect to a ZK ensemble - Key: SOLR-5852 URL: https://issues.apache.org/jira/browse/SOLR-5852 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-5852.patch, SOLR-5852_FK.patch We should have a CloudSolrServer constructor which takes a list of ZK servers to connect to. Something like {noformat} public CloudSolrServer(String... zkHost); {noformat} - Document the current constructor better to mention that to connect to a ZK ensemble you can pass a comma-delimited list of ZK servers like zk1:2181,zk2:2181,zk3:2181 - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
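A small sketch of the convention discussed above: hosts are comma-delimited and a single optional chroot is appended once, after the last host, which is the form ZooKeeper's ConnectStringParser accepts. The helper class and method names are hypothetical, purely to illustrate the proposed javadoc wording.
{noformat}
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: build a ZooKeeper connect string from separate hosts
 *  plus one optional chroot appended after the last host. */
public class ZkConnectStrings {

  static String buildConnectString(List<String> hosts, String chroot) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hosts.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append(hosts.get(i));
    }
    if (chroot != null && !chroot.isEmpty()) {
      if (!chroot.startsWith("/")) {
        throw new IllegalArgumentException("chroot must start with '/': " + chroot);
      }
      sb.append(chroot); // ZooKeeper itself rejects malformed or repeated chroots
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002/app/a
    System.out.println(buildConnectString(
        Arrays.asList("127.0.0.1:3000", "127.0.0.1:3001", "127.0.0.1:3002"), "/app/a"));
  }
}
{noformat}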
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937868#comment-13937868 ] Kranti Parisa commented on SOLR-4787: - Alex, Are you using HashSetJoin? Did you configure in solrconfig.xml? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the the
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937878#comment-13937878 ] Erick Erickson commented on SOLR-1604: -- OK, to finish this off, we need a Wiki/Confluence page, calling for volunteers: Some points that should be mentioned I think: how to set up/use (simple really, defType) A number of examples inOrder=true|false as a local param mentioned Anyone's experience with how it performs, especially with things like single-letter wildcards (e.g. j* smith) Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: query parsers, search Affects Versions: 1.4 Reporter: Ahmet Arslan Assignee: Erick Erickson Priority: Minor Fix For: 4.8, 5.0 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
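As a starting point for the usage examples requested above, here is a hedged SolrJ sketch. It assumes the parser ends up registered under the name complexphrase; the exact name and setup depend on how the plugin from this issue is wired into solrconfig.xml.
{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ComplexPhraseExample {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Local-param form: a wildcard inside a phrase, with inOrder as a local param.
    SolrQuery q = new SolrQuery("{!complexphrase inOrder=true}name:\"j* smith\"");
    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());

    // defType form, as mentioned in the comment above.
    SolrQuery q2 = new SolrQuery("name:\"j* smith\"");
    q2.set("defType", "complexphrase");
    solr.query(q2);
  }
}
{noformat}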
[jira] [Updated] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble
[ https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated SOLR-5852: Attachment: SOLR-5852_FK.patch I've improved the javadoc. We can use whether SOLR-4620 or this. On the other hand I can implement another patch according to second approach at my previous comment. Add CloudSolrServer helper method to connect to a ZK ensemble - Key: SOLR-5852 URL: https://issues.apache.org/jira/browse/SOLR-5852 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-5852.patch, SOLR-5852_FK.patch, SOLR-5852_FK.patch We should have a CloudSolrServer constructor which takes a list of ZK servers to connect to. Something Like {noformat} public CloudSolrServer(String... zkHost); {noformat} - Document the current constructor better to mention that to connect to a ZK ensemble you can pass a comma-delimited list of ZK servers like zk1:2181,zk2:2181,zk3:2181 - Thirdly should getLbServer() and getZKStatereader() be public? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
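A short sketch contrasting the existing single-string constructor with the varargs constructor proposed in this issue. The varargs form is shown commented out because it is only a proposal in the attached patches, not a released API.
{noformat}
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class CloudSolrServerEnsemble {
  public static void main(String[] args) throws Exception {
    // Existing API: a single comma-delimited zkHost string, optionally with one chroot at the end.
    CloudSolrServer current = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
    current.setDefaultCollection("collection1");

    // Proposed helper from this issue (not in released SolrJ): a varargs constructor
    // that would accept the ensemble as separate host strings.
    // CloudSolrServer proposed = new CloudSolrServer("zk1:2181", "zk2:2181", "zk3:2181");
  }
}
{noformat}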
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937883#comment-13937883 ] Steven Bower commented on SOLR-5488: For the field facet test this is definitely an ordering thing. I started looking at this but haven't finished; I think I removed the @Ignore, which is why it started failing every time. That said, I found some rather interesting issues internally that may have been causing some of the intermittent failures. Are these runs with the most recent patch I applied? Still working on it; I will update when I get further. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937887#comment-13937887 ] Alexander S. commented on SOLR-4787: Hi, I am using simple join, this way: {!join from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory.
[jira] [Created] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
Steve Rowe created SOLR-5875: Summary: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
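To make the shape of the bug concrete, here is a hedged sketch of the access pattern being fixed: per-shard unmarshalling work hoisted out of the per-document loop. The types, method names, and the "price" sort field are hypothetical stand-ins, not the actual QueryComponent code.
{noformat}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the SOLR-5875 access pattern; not Solr's real code. */
class PerShardUnmarshalSketch {

  /** Stand-in for the expensive step: decoding a shard's marshalled sort field values. */
  static Map<String, List<Object>> unmarshalSortValues(Map<String, List<Object>> marshalled) {
    return new HashMap<String, List<Object>>(marshalled); // imagine real decoding work here
  }

  /** Buggy shape: the whole shard payload is unmarshalled again for every document,
   *  and the full structure ends up referenced from every document. */
  static void perDoc(int numDocs, Map<String, List<Object>> marshalled) {
    for (int doc = 0; doc < numDocs; doc++) {
      Map<String, List<Object>> all = unmarshalSortValues(marshalled); // repeated numDocs times
      Object sortValue = all.get("price").get(doc);                    // only this is needed
    }
  }

  /** Fixed shape: unmarshal once per shard, then index into it per document. */
  static void perShard(int numDocs, Map<String, List<Object>> marshalled) {
    Map<String, List<Object>> all = unmarshalSortValues(marshalled);   // done once
    for (int doc = 0; doc < numDocs; doc++) {
      Object sortValue = all.get("price").get(doc);
    }
  }
}
{noformat}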
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937896#comment-13937896 ] Erick Erickson commented on SOLR-5488: -- Hey Steve! Sorry, should have updated things yesterday. Yes, these are with all the latest patches applied. That said, I re-wound based on Houston's comments and undid the changes to fieldFacets.txt (which were local anyway, of course I didn't check them in). So essentially, just trunk with your latest patch and removing @Ignore and/or @BadApple. The FieldFacetTest is the more interesting since it fails all the time. Why that would be related to the assertU around the commits I have no clue. That seems out in left field somewhere. I'll be able to look at any changes intermittently starting this evening CA time, got a busy day ahead. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937895#comment-13937895 ] Kranti Parisa commented on SOLR-4787: - NestedJoins (fqs) are implemented in HashSetJoin. so after applying the patch you will need to configure it in solrconfig.xml queryParser name=hjoin class=org.apache.solr.search.joins.HashSetJoinQParserPlugin/ and use {!hjoin from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}, so you are trying to do a self join on the same core? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. 
This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the
[jira] [Updated] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5875: - Attachment: SOLR-5875.patch Simple patch with fix. [~alexey] has confirmed that this solved the excessive memory use issue he saw. Committing shortly. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5872) Eliminate overseer queue
[ https://issues.apache.org/jira/browse/SOLR-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937901#comment-13937901 ] Mark Miller commented on SOLR-5872: --- I'm not fully sold on this yet. Compare and set is how this was first implemented and it has its own issues - hence the work Sami did to move to the queue. Potter has noticed the overseer is fairly slow at working through state updates. I think that should be investigated first. Eliminate overseer queue - Key: SOLR-5872 URL: https://issues.apache.org/jira/browse/SOLR-5872 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul The overseer queue is one of the busiest points in the entire system. The raison d'être of the queue is to: * Provide batching of operations for the main clusterstate.json so that state updates are minimized * Avoid race conditions and ensure order Now, as we move the individual collection states out of the main clusterstate.json, the batching is not useful anymore. Race conditions can easily be solved by using a compare and set in Zookeeper. The proposed solution is: whenever an operation is required to be performed on the clusterstate, the same thread (and of course the same JVM) should # read the fresh state and version of the zk node # construct the new state # perform a compare and set # if the compare and set fails, go to step 1 This should be limited to operations performed on external collections, because batching would still be required for the others -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
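A minimal sketch of the compare-and-set loop proposed in the issue description, using plain ZooKeeper versioned writes; updateState() is a placeholder for "construct the new collection state from the freshly read bytes".
{noformat}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Sketch of a ZooKeeper compare-and-set loop for a cluster-state znode. */
public class ZkCompareAndSet {

  static void updateClusterState(ZooKeeper zk, String path) throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      byte[] current = zk.getData(path, false, stat);   // 1. read fresh state and version
      byte[] updated = updateState(current);            // 2. construct the new state
      try {
        zk.setData(path, updated, stat.getVersion());   // 3. compare-and-set on the znode version
        return;
      } catch (KeeperException.BadVersionException e) {
        // 4. someone else wrote in between -> go back to step 1
      }
    }
  }

  static byte[] updateState(byte[] current) {
    return current; // placeholder: real code would modify the collection's state here
  }
}
{noformat}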
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937915#comment-13937915 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578434 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1578434 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937917#comment-13937917 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578435 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578435 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. (merged trunk r1578434) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937921#comment-13937921 ] Alexander S. commented on SOLR-4787: Ok, thx, I'll try with hjoin. And yes, I am trying to do it on the same core. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.8 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. 
Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. queryParser name=hjoin class=org.apache.solr.joins.HashSetJoinQParserPlugin/ And the join contrib lib jars must be registed in the solrconfig.xml. lib dir=../../../contrib/joins/lib regex=.*\.jar / After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the the
[jira] [Resolved] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe resolved SOLR-5875. -- Resolution: Fixed Committed to trunk, branch_4x and the lucene_solr_4_7 branch. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937920#comment-13937920 ] ASF subversion and git services commented on SOLR-5875: --- Commit 1578438 from [~steve_rowe] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1578438 ] SOLR-5875: QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard. (merged trunk r1578434) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5875: - Fix Version/s: 5.0 4.8 QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5750) Backup/Restore API for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937924#comment-13937924 ] Robert Parker commented on SOLR-5750: - You should have the option of backing up/replicating a live searchable collection on SolrCloud A to a live searchable collection across a WAN on SolrCloud B, each with their own separate ZooKeeper ensemble. You should also be able to rename the collection on the fly so that the live searchable collection on SolrCloud A is called collectionA and its live updated searchable replication copy is known as collectionB so as to allow a single remote instance of SolrCloud to be multi-homed to act as a replication target for multiple other Solr instances' collections, even if those collections happen to have the same name on each of their source instances. Also, WAN compression/optimization would be helpful as well. Backup/Restore API for SolrCloud Key: SOLR-5750 URL: https://issues.apache.org/jira/browse/SOLR-5750 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Shalin Shekhar Mangar Fix For: 4.8, 5.0 We should have an easy way to do backups and restores in SolrCloud. The ReplicationHandler supports a backup command which can create snapshots of the index but that is too little. The command should be able to backup: # Snapshots of all indexes or indexes from the leader or the shards # Config set # Cluster state # Cluster properties # Aliases # Overseer work queue? A restore should be able to completely restore the cloud i.e. no manual steps required other than bringing nodes back up or setting up a new cloud cluster. SOLR-5340 will be a part of this issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
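For reference, a small SolrJ sketch of invoking the existing per-core ReplicationHandler backup command that the description calls "too little" for a full SolrCloud backup. The core URL, backup location, and numberToKeep value are placeholders.
{noformat}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CoreBackupExample {
  public static void main(String[] args) throws Exception {
    // Per-core snapshot via the existing ReplicationHandler.
    SolrServer core = new HttpSolrServer("http://localhost:8983/solr/collection1_shard1_replica1");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "backup");
    params.set("location", "/backups/collection1"); // assumed backup directory
    params.set("numberToKeep", "2");

    QueryRequest req = new QueryRequest(params);
    req.setPath("/replication");
    core.request(req);
  }
}
{noformat}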
Re: Sorry for JIRA spam
I don’t think it’s bad to have JIRA bump the fix version at all - you just want to suppress an individual email for each change if it’s going to be that many. -- Mark Miller about.me/markrmiller On March 16, 2014 at 9:45:42 AM, David Smiley (@MITRE.org) (dsmi...@mitre.org) wrote: Sorry for all the email spam last night, folks. I released Lucene/Solr 4.7 in JIRA last night. I updated the instructions here https://wiki.apache.org/lucene-java/ReleaseTodo#Update_JIRA to explicitly indicate *not* to have JIRA bump the Fix-version values. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Sorry for JIRA spam
What I mean is that when you click the “Release” menu choice next to the version, JIRA optionally asks you if it should bump the fix versions (I forget the precise language). It didn’t say that in doing so it would send a ton of email, and it didn’t give an option to suppress email. Separately from this, the release instructions we have in our wiki describe how to advance the fix-versions in a way that suppresses email. But if beforehand you let JIRA do it for you with just one-click as part of releasing the version, then it’ll send out the mass email. ~ David From: Mark Miller-3 [via Lucene] ml-node+s472066n4124847...@n3.nabble.com Date: Monday, March 17, 2014 at 11:41 AM To: Smiley, David W. dsmi...@mitre.org Subject: Re: Sorry for JIRA spam I don’t think it’s bad to have JIRA bump the fix version at all - you just want to supress an individual email for each change if its going to be that many. -- Mark Miller about.me/markrmiller On March 16, 2014 at 9:45:42 AM, David Smiley (@MITRE.org) ([hidden email]) wrote: Sorry for all the email spam last night, folks. I Released Lucene Solr 4.7 in JIRA last night. I updated the instructions here https://wiki.apache.org/lucene-java/ReleaseTodo#Update_JIRA to explicitly indicate *not* to have JIRA bump the Fix-version values. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: [hidden email] For additional commands, e-mail: [hidden email] - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Sorry-for-JIRA-spam-tp4124545p4124848.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Lucene/Solr 4.7.1
I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937946#comment-13937946 ] Erick Erickson commented on SOLR-5875: -- Hmmm, is it possible to have the original person who posted the problem give it a test run? For something like this it'd be good to have some proof that if fixes the problem described. Just a thought. Erick QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5052) bitset codec for off heap filters
[ https://issues.apache.org/jira/browse/LUCENE-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937947#comment-13937947 ] Mikhail Khludnev commented on LUCENE-5052: -- bq. it'd be better if the postings format wrapped another postings format, and then only used the bitset when the docFreq was high enough There are two orthogonal concerns: * the particular format - let's generalize the bitset format to a no-tf format, and use WAH8 or Elias-Fano with off-heap access (TODO). That way it also works for sparse postings; * the API - how can a consumer express the intention to use the no-tf format? e.g. a TermFilter, or TermsEnum.docs() with a special flag. I'd like to clarify the use-case for this issue (the issue summary might need to be improved). It aims at Solr's fq, or even Heliosearch's GC-lightness. I suppose the user can decide which fields to index with the no-tf format; these are string fields. Then, when the user requests filtering on those fields, no scoring is needed, for sure. [~mikemccand] Hence, I don't think conditional triggering is a good choice; in any case I don't know how to do it. I might not understand well how the pulsing codec is used (the implementation idea is clear, though) - can you point me to its usage? Thanks! bitset codec for off heap filters - Key: LUCENE-5052 URL: https://issues.apache.org/jira/browse/LUCENE-5052 Project: Lucene - Core Issue Type: New Feature Components: core/codecs Reporter: Mikhail Khludnev Labels: features Fix For: 5.0 Attachments: LUCENE-5052.patch, bitsetcodec.zip, bitsetcodec.zip Colleagues, when we filter we don't care about any of the scoring factors, i.e. norms, positions, tf, but it should be fast. The obvious way to handle this is to decode the postings list and cache it in heap (CachingWrapperFilter, Solr's DocSet). Both consuming heap and decoding are expensive. Let's write a postings list as a bitset if df is greater than the segment's maxDoc/8 (what about skip lists? and overall performance?). Besides the codec implementation, the trickiest part to me is to design an API for this. How can we let the app know that a term query doesn't need to be cached in heap, but can be held as an mmapped bitset? WDYT? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
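A hedged sketch (Lucene 4.x APIs) of the density heuristic from the description: materialize a term's postings as a bitset only when its docFreq exceeds maxDoc/8, otherwise fall back to the regular postings. This only illustrates the consumer-side idea, not the proposed codec or its off-heap storage.
{noformat}
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.FixedBitSet;

public class DenseTermBitSet {

  /** Returns a FixedBitSet of matching docs if the term is dense enough, else null. */
  static FixedBitSet bitSetIfDense(AtomicReader reader, String field, BytesRef term) throws IOException {
    Terms terms = reader.terms(field);
    if (terms == null) return null;
    TermsEnum te = terms.iterator(null);
    if (!te.seekExact(term)) return null;

    int maxDoc = reader.maxDoc();
    if (te.docFreq() <= maxDoc / 8) {
      return null; // sparse term: not worth a full bitset
    }
    // Dense term: decode the postings once into a bitset (no freqs/positions needed).
    FixedBitSet bits = new FixedBitSet(maxDoc);
    DocsEnum docs = te.docs(reader.getLiveDocs(), null, DocsEnum.FLAG_NONE);
    for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = docs.nextDoc()) {
      bits.set(doc);
    }
    return bits;
  }
}
{noformat}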
[jira] [Commented] (SOLR-5875) QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard
[ https://issues.apache.org/jira/browse/SOLR-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937953#comment-13937953 ] Steve Rowe commented on SOLR-5875: -- Erick, as I mentioned above, [~alexey] gave it a test run and it fixed the memory issue he saw. QueryComponent.mergeIds() unmarshals all docs' sort field values once per doc instead of once per shard --- Key: SOLR-5875 URL: https://issues.apache.org/jira/browse/SOLR-5875 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.7 Reporter: Steve Rowe Assignee: Steve Rowe Priority: Critical Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5875.patch SOLR-5354 added unmarshalling of distributed sort field values in {{QueryComponent.mergeIds()}}, but incorrectly performs this (unmarshalling all docs' sort field values) for every doc, and stores the result with each doc. This is unnecessary, inefficient, and extremely wasteful of memory. In an offline conversation, [~alexey] described the issue to me and located the likely problem, and [~hossman_luc...@fucit.org] located the problem code via inspection. This bug is very likely the problem described on the solr-user mailing list here: [SolrCloud constantly crashes after upgrading to Solr 4.7|http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c83f549bdf8deecbc7567c324ee0cb...@cluster38.e-active.nl%3e] -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5796) With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the lead
[ https://issues.apache.org/jira/browse/SOLR-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937957#comment-13937957 ] Mark Miller commented on SOLR-5796: --- Do we have a JIRA issue for the instability you mention in the failover? I can guess what it is... but we should track it and harden it. With many collections, leader re-election takes too long when a node dies or is rebooted, leading to some shards getting into a conflicting state about who is the leader. Key: SOLR-5796 URL: https://issues.apache.org/jira/browse/SOLR-5796 Project: Solr Issue Type: Bug Components: SolrCloud Environment: Found on branch_4x Reporter: Timothy Potter Assignee: Mark Miller Fix For: 4.8, 5.0 Attachments: SOLR-5796.patch I'm doing some testing with a 4-node SolrCloud cluster against the latest rev in branch_4x having many collections, 150 to be exact, each having 4 shards with rf=3, so 450 cores per node. Nodes are decent in terms of resources: -Xmx6g with 4 CPU - m3.xlarge's in EC2. The problem occurs when rebooting one of the nodes, say as part of a rolling restart of the cluster. If I kill one node and then wait for an extended period of time, such as 3 minutes, then all of the leaders on the downed node (roughly 150) have time to failover to another node in the cluster. When I restart the downed node, since leaders have all failed over successfully, the new node starts up and all cores assume the replica role in their respective shards. This is goodness and expected. However, if I don't wait long enough for the leader failover process to complete on the other nodes before restarting the downed node, then some bad things happen. Specifically, when the dust settles, many of the previous leaders on the node I restarted get stuck in the conflicting state seen in the ZkController, starting around line 852 in branch_4x: {quote} 852 while (!leaderUrl.equals(clusterStateLeaderUrl)) { 853 if (tries == 60) { 854 throw new SolrException(ErrorCode.SERVER_ERROR, 855 There is conflicting information about the leader of shard: 856 + cloudDesc.getShardId() + our state says: 857 + clusterStateLeaderUrl + but zookeeper says: + leaderUrl); 858 } 859 Thread.sleep(1000); 860 tries++; 861 clusterStateLeaderUrl = zkStateReader.getLeaderUrl(collection, shardId, 862 timeoutms); 863 leaderUrl = getLeaderProps(collection, cloudDesc.getShardId(), timeoutms) 864 .getCoreUrl(); 865 } {quote} As you can see, the code is trying to give a little time for this problem to work itself out, 1 minute to be exact. Unfortunately, that doesn't seem to be long enough for a busy cluster that has many collections. Now, one might argue that 450 cores per node is asking too much of Solr, however I think this points to a bigger issue of the fact that a node coming up isn't aware that it went down and leader election is running on other nodes and is just being slow. Moreover, once this problem occurs, it's not clear how to fix it besides shutting the node down again and waiting for leader failover to complete. It's also interesting to me that /clusterstate.json was updated by the healthy node taking over the leader role but the /collections/collleaders/shard# was not updated? I added some debugging and it seems like the overseer queue is extremely backed up with work. Maybe the solution here is to just wait longer but I also want to get some feedback from the community on other options? I know there are some plans to help scale the Overseer (i.e. 
SOLR-5476), so maybe that helps; I'm trying to add more debugging to see if this is really due to Overseer backlog (which I suspect it is). In general, I'm a little confused by the keeping of leader state in multiple places in ZK. Is there any background information on why we have leader state in /clusterstate.json and in the leader path znode? Also, here are some interesting side observations: a. If I use rf=2, then this problem doesn't occur, as leader failover happens more quickly and there's less Overseer work? May be a red herring here, but I can consistently reproduce it with rf=3 and not with rf=2 ... I suppose that is because there are only 300 cores per node versus 450, and that's just enough less work for this issue to work itself out. b. To support that many cores, I had to set -Xss256k to reduce the stack size, as Solr uses a lot of threads during startup (the high point was around 800).
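A minimal, self-contained sketch of the idea discussed above, assuming nothing about the real ZkController beyond what the quoted snippet shows: the same "wait until clusterstate.json and the leader znode agree" loop, but with a configurable deadline instead of the hard-coded 60 x 1s bound, so a cluster with a backed-up Overseer can be given more time. The class and method names are hypothetical; the Supplier arguments stand in for zkStateReader.getLeaderUrl(...) and getLeaderProps(...).getCoreUrl().
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class LeaderConvergenceWait {

  /**
   * Polls both views of the shard leader until they agree or the deadline passes.
   * This is a sketch of the wait loop, not the actual ZkController code.
   */
  public static void awaitAgreement(Supplier<String> clusterStateLeaderUrl,
                                    Supplier<String> zkLeaderUrl,
                                    long maxWait, TimeUnit unit) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(maxWait);
    String fromClusterState = clusterStateLeaderUrl.get();
    String fromZk = zkLeaderUrl.get();
    while (!fromZk.equals(fromClusterState)) {
      if (System.nanoTime() > deadline) {
        throw new IllegalStateException("Conflicting leader info: clusterstate.json says "
            + fromClusterState + " but the leader znode says " + fromZk);
      }
      Thread.sleep(1000); // poll once a second, like the original loop
      fromClusterState = clusterStateLeaderUrl.get();
      fromZk = zkLeaderUrl.get();
    }
  }
}
{code}
If the deadline were driven by a cluster property rather than a compile-time constant, operators running 450 cores per node could tune it without patching, which is essentially the "just wait longer" option raised above.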
[jira] [Commented] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937968#comment-13937968 ] ASF subversion and git services commented on SOLR-5800: --- Commit 1578444 from [~steffkes] in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1578444 ] SOLR-5800: Admin UI - Analysis form doesn't render results correctly when a CharFilter is used (merge r1576652) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used. Key: SOLR-5800 URL: https://issues.apache.org/jira/browse/SOLR-5800 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.7 Reporter: Timothy Potter Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5800-sample.json, SOLR-5800.patch I have an example in Solr In Action that uses the PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0. Specifically, the fieldType is:
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" preserveOriginal="0" catenateWords="1" generateNumberParts="1" catenateNumbers="0" catenateAll="0" types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
The PatternReplaceCharFilterFactory (PRCF) is used to collapse repeated letters in a term down to a max of 2, so that something like #yummmmm becomes #yumm. When I run some text through this analyzer using the Analysis form, the output is as if the resulting text is unavailable to the tokenizer. In other words, the only results displayed in the output on the form are those for the PRCF. This example stopped working in 4.7.0 and I've verified it worked correctly in 4.6.1. Initially, I thought this might be an issue with the actual analysis, but the analyzer actually works when indexing / querying. Then, looking at the JSON response in the Developer console in Chrome, I see the JSON that comes back includes output for all the components in my chain (see below) ... so it looks like a UI rendering issue to me?
{"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c ...
the JSON returned to the browser has evidence that the full analysis chain was applied, so this seems to just be a rendering issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
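For anyone wanting to confirm that the analysis itself is fine and only the UI rendering is at fault, here is a minimal sketch (a hypothetical demo class, not part of the issue) that runs the same char filter pattern from the field type above directly against a string using Lucene's PatternReplaceCharFilter:
{code:java}
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;

public class PatternReplaceCharFilterDemo {
  public static void main(String[] args) throws Exception {
    // ([a-zA-Z])\1+ -> $1$1 collapses any run of a repeated letter down to two
    Pattern collapseRepeats = Pattern.compile("([a-zA-Z])\\1+");
    Reader in = new PatternReplaceCharFilter(collapseRepeats, "$1$1",
        new StringReader("#Yummmmmm :) Drinking a latte"));

    // Drain the filtered reader to see the pre-tokenization text
    StringBuilder out = new StringBuilder();
    int ch;
    while ((ch = in.read()) != -1) {
      out.append((char) ch);
    }
    System.out.println(out); // expected: "#Yumm :) Drinking a latte"
  }
}
{code}
The printed text is what the WhitespaceTokenizer should receive, which matches the tokens visible in the raw JSON response above, supporting the conclusion that the bug is purely in how the Admin UI renders the result.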
[jira] [Updated] (SOLR-5800) Admin UI - Analysis form doesn't render results correctly when a CharFilter is used.
[ https://issues.apache.org/jira/browse/SOLR-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-5800: Fix Version/s: 4.7.1 Admin UI - Analysis form doesn't render results correctly when a CharFilter is used. Key: SOLR-5800 URL: https://issues.apache.org/jira/browse/SOLR-5800 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.7 Reporter: Timothy Potter Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.8, 5.0, 4.7.1 Attachments: SOLR-5800-sample.json, SOLR-5800.patch I have an example in Solr In Action that uses the PatternReplaceCharFilterFactory and now it doesn't work in 4.7.0. Specifically, the fieldType is:
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([a-zA-Z])\1+" replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" preserveOriginal="0" catenateWords="1" generateNumberParts="1" catenateNumbers="0" catenateAll="0" types="wdfftypes.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
The PatternReplaceCharFilterFactory (PRCF) is used to collapse repeated letters in a term down to a max of 2, so that something like #yummmmm becomes #yumm. When I run some text through this analyzer using the Analysis form, the output is as if the resulting text is unavailable to the tokenizer. In other words, the only results displayed in the output on the form are those for the PRCF. This example stopped working in 4.7.0 and I've verified it worked correctly in 4.6.1. Initially, I thought this might be an issue with the actual analysis, but the analyzer actually works when indexing / querying. Then, looking at the JSON response in the Developer console in Chrome, I see the JSON that comes back includes output for all the components in my chain (see below) ... so it looks like a UI rendering issue to me?
{"responseHeader":{"status":0,"QTime":24},"analysis":{"field_types":{"text_microblog":{"index":["org.apache.lucene.analysis.pattern.PatternReplaceCharFilter","#Yumm :) Drinking a latte at Caffe Grecco in SF's historic North Beach... Learning text analysis with #SolrInAction by @ManningBooks on my i-Pad foo5","org.apache.lucene.analysis.core.WhitespaceTokenizer",[{"text":"#Yumm","raw_bytes":"[23 59 75 6d 6d]","start":0,"end":6,"position":1,"positionHistory":[1],"type":"word"},{"text":":)","raw_bytes":"[3a 29]","start":7,"end":9,"position":2,"positionHistory":[2],"type":"word"},{"text":"Drinking","raw_bytes":"[44 72 69 6e 6b 69 6e 67]","start":10,"end":18,"position":3,"positionHistory":[3],"type":"word"},{"text":"a","raw_bytes":"[61]","start":19,"end":20,"position":4,"positionHistory":[4],"type":"word"},{"text":"latte","raw_bytes":"[6c ...
the JSON returned to the browser has evidence that the full analysis chain was applied, so this seems to just be a rendering issue. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 4.7.1
Sounds good to me. -- Mark Miller about.me/markrmiller On March 17, 2014 at 11:53:14 AM, Steve Rowe (sar...@gmail.com) wrote: I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Lucene/Solr 4.7.1
Thanks for doing this Steve! I've merged SOLR-5800 to the branch -Stefan On Monday, March 17, 2014 at 4:52 PM, Steve Rowe wrote: I’d like to make a 4.7.1 release. I’ve committed SOLR-5875 to the lucene_solr_4_7 branch; I think it definitely warrants a bugfix release. I propose making an RC in one week: Monday March 24. Steve - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5873) Improve JavaBinCodec's backward compatibility tests
[ https://issues.apache.org/jira/browse/SOLR-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937985#comment-13937985 ] Mark Miller commented on SOLR-5873: --- Wrong Mark Miller pinged ;) I'm one of the last ones that comes up - markrmil...@gmail.com username rather than hakeber. Improve JavaBinCodec's backward compatibility tests --- Key: SOLR-5873 URL: https://issues.apache.org/jira/browse/SOLR-5873 Project: Solr Issue Type: Improvement Reporter: Varun Thacker SOLR-5265 added backward compatibility tests, but it tries to read a pre-written binary file to check if there is a break or not. If we add more types to JavaBinCodec the test will need to be updated too, which will be error-prone again. This is what [~hakeber] proposed on IRC:
- A test that I was thinking of: we could have a jenkins job that ran a script that checked out the previous version of Lucene and the latest
- Then use the solr/cloud-dev scripts to start a cloud cluster
- Index some docs
- Stop a node at a time, replacing the webapp with the latest in a rolling-upgrade fashion
- Then we have a full rolling upgrade test
This would be a better approach for back-compat tests. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
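As a complement to the rolling-upgrade idea above, the file-based check that SOLR-5265 added boils down to a javabin round trip. A minimal sketch of that shape follows; the class name and sample fields are made up, and in the real back-compat test the bytes would come from a binary file written by an older release rather than from the current codec:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.util.JavaBinCodec;

public class JavaBinRoundTripSketch {
  public static void main(String[] args) throws Exception {
    SolrDocument doc = new SolrDocument();
    doc.addField("id", "1");
    doc.addField("title", "back-compat check");

    // Serialize to the javabin wire format
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new JavaBinCodec().marshal(doc, bytes);

    // Deserialize again; a back-compat test would compare this against the expected object
    Object readBack = new JavaBinCodec()
        .unmarshal(new ByteArrayInputStream(bytes.toByteArray()));
    System.out.println(readBack);
  }
}
{code}
The weakness called out in the issue is exactly that this kind of fixed-file comparison must be regenerated whenever new types are added, whereas a rolling-upgrade cluster test exercises the old-writer/new-reader path end to end.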
[jira] [Commented] (SOLR-5837) Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField.
[ https://issues.apache.org/jira/browse/SOLR-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13937991#comment-13937991 ] Mark Miller commented on SOLR-5837: --- If you look at the previous commits, you will see that this also had a CHANGES entry introduced under Other that needs to be removed. Add missing equals implementation for SolrDocument, SolrInputDocument and SolrInputField. - Key: SOLR-5837 URL: https://issues.apache.org/jira/browse/SOLR-5837 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Assignee: Noble Paul Attachments: SOLR-5837.patch, SOLR-5837.patch While working on SOLR-5265 I tried comparing objects of SolrDocument, SolrInputDocument and SolrInputField. These classes did not override equals. This issue will add equals and hashCode overrides to the three classes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
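For illustration, a minimal sketch of the kind of value-based equals/hashCode the patch introduces, shown on a made-up SolrInputField-like holder rather than the real classes (the actual patch decides details such as boost handling and collection-valued fields):
{code:java}
import java.util.Objects;

public class FieldValue {
  private final String name;
  private final Object value;
  private final float boost;

  public FieldValue(String name, Object value, float boost) {
    this.name = name;
    this.value = value;
    this.boost = boost;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof FieldValue)) return false;
    FieldValue other = (FieldValue) o;
    // Two fields are equal when name, value and boost all match
    return Float.compare(boost, other.boost) == 0
        && Objects.equals(name, other.name)
        && Objects.equals(value, other.value);
  }

  @Override
  public int hashCode() {
    // Must be consistent with equals: hash the same components compared above
    return Objects.hash(name, value, boost);
  }
}
{code}
With overrides of this shape in place, tests like the ones in SOLR-5265 can compare whole documents directly instead of walking fields by hand.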