[jira] [Commented] (LUCENE-5525) Implement MultiFacets.getAllDims
[ https://issues.apache.org/jira/browse/LUCENE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936049#comment-13936049 ]

Shai Erera commented on LUCENE-5525: Looks good, +1!

Implement MultiFacets.getAllDims
Key: LUCENE-5525
URL: https://issues.apache.org/jira/browse/LUCENE-5525
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 4.7
Reporter: Jose Peleteiro
Assignee: Michael McCandless
Attachments: LUCENE-5525.patch

DrillSideways.DrillSidewaysResult uses Facets when the query does not filter by a facet, but it uses MultiFacets when it does, and the MultiFacets implementation is not complete. See https://github.com/apache/lucene-solr/blob/0b0bc89932622f5bc2c4d74f978178b9ae15c700/lucene/facet/src/java/org/apache/lucene/facet/MultiFacets.java#L67 and http://pastebin.com/5eDbTM2v

This code works when DrillDownQuery.add is not called (when no facets are selected), but it fails with an UnsupportedOperationException. Perhaps I'm not using Facets correctly, but I'm trying to figure it out myself while upgrading from 4.6.1, as I could not find any documentation for facets other than the javadocs.

--
This message was sent by Atlassian JIRA (v6.2#6252)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
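For readers hitting the same UnsupportedOperationException: a minimal, self-contained sketch of how a MultiFacets-style getAllDims could merge per-dimension results instead of throwing. The Facets interface below is a simplified stand-in for org.apache.lucene.facet.Facets (whose getAllDims actually returns List&lt;FacetResult&gt;); names and shapes are illustrative only, not the committed patch.

```java
import java.util.*;

// Simplified stand-in for org.apache.lucene.facet.Facets; the real
// getAllDims(int topN) returns List<FacetResult>, not List<String>.
interface Facets {
    List<String> getAllDims(int topN);
}

// Sketch of MultiFacets: dedicated Facets per dimension, plus a default
// Facets for all remaining dimensions.
class MultiFacets implements Facets {
    private final Map<String, Facets> dimToFacets; // dims with dedicated Facets
    private final Facets defaultFacets;            // everything else

    MultiFacets(Map<String, Facets> dimToFacets, Facets defaultFacets) {
        this.dimToFacets = dimToFacets;
        this.defaultFacets = defaultFacets;
    }

    @Override
    public List<String> getAllDims(int topN) {
        // Instead of throwing UnsupportedOperationException, merge the
        // per-dimension results with the default Facets' results.
        List<String> all = new ArrayList<>();
        for (Facets f : dimToFacets.values()) {
            all.addAll(f.getAllDims(topN));
        }
        all.addAll(defaultFacets.getAllDims(topN));
        return all;
    }
}
```

The merge order here (mapped dims first, then the default) is an arbitrary choice for the sketch; the real implementation also has to avoid double-counting a dim that appears in both places.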
[jira] [Commented] (LUCENE-5525) Implement MultiFacets.getAllDims
[ https://issues.apache.org/jira/browse/LUCENE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936050#comment-13936050 ]

Shai Erera commented on LUCENE-5525: I reviewed MultiCategoryListsFacetsExample.java under lucene/demo -- do you think it should use MultiFacets? And also exercise getAllDims()?
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936061#comment-13936061 ]

Anshum Gupta commented on SOLR-5477: Thanks for pointing that out, Steve. This must have gotten in when I started working on this one, i.e. before SOLR-3854 went in, and it stayed as a result of a bad merge. I'll fix this up.

Async execution of OverseerCollectionProcessor tasks
Key: SOLR-5477
URL: https://issues.apache.org/jira/browse/SOLR-5477
Project: Solr
Issue Type: Sub-task
Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch

Typical collection admin commands are long-running, and it is very common for the requests to time out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously:
* add an extra param async=true for all collection commands
* the task is written to ZK and the caller is returned a task id
* a separate collection admin command will be added to poll the status of the task: command=status&id=7657668909; if no id is passed, all running async tasks should be listed
* a separate queue is created to store in-process tasks; after a task completes, its queue entry is removed
* OverseerCollectionProcessor will perform these tasks in multiple threads
[jira] [Updated] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-5477:
Attachment: SOLR-5477.urlschemefix.patch

Fix for not modifying the URL scheme.
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936085#comment-13936085 ]

ASF subversion and git services commented on SOLR-5477: Commit 1577801 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1577801 ] SOLR-5477: Fix URL scheme modification from an earlier commit for SOLR-5477.
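The submit-then-poll flow described in SOLR-5477 (async=true returns a task id; command=status&id=... polls it) can be sketched generically with plain executors. TaskTracker, submit, and status below are illustrative names for the pattern, not Solr's actual Collections API.

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal sketch of the async-task pattern: submit returns immediately
// with an id (like async=true on a collection admin command), and status
// polls it (like command=status&id=...). Not Solr code.
class TaskTracker {
    private final Map<String, Future<?>> running = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Submit a long-running task; return a task id without waiting.
    String submit(Runnable task) {
        String id = UUID.randomUUID().toString();
        running.put(id, pool.submit(task));
        return id;
    }

    // Poll the status of a previously submitted task.
    String status(String id) {
        Future<?> f = running.get(id);
        if (f == null) return "notfound";
        return f.isDone() ? "completed" : "running";
    }

    // Test convenience: block until the task finishes (real callers poll).
    String awaitCompletion(String id) {
        Future<?> f = running.get(id);
        if (f == null) return "notfound";
        try { f.get(); } catch (Exception e) { return "failed"; }
        return "completed";
    }

    void shutdown() { pool.shutdown(); }
}
```

In the actual design the task state lives in a ZooKeeper queue rather than in-process memory, so the id survives Overseer restarts; the in-memory map here only illustrates the request/poll contract.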
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/
Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC

1 tests failed.
REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms
        at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:150)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:101)
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:91)
        at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
        at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
        at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
        at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
        at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at
[jira] [Assigned] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson reassigned LUCENE-1486: Assignee: Erick Erickson

Wildcards, ORs etc inside Phrase queries
Key: LUCENE-1486
URL: https://issues.apache.org/jira/browse/LUCENE-1486
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Erick Erickson
Priority: Minor
Fix For: 4.7
Attachments: ComplexPhraseQueryParser.java, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, Lucene-1486 non default field.patch, TestComplexPhraseQuery.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch

An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:

checkMatches("\"j* smyth~\"", 1,2);         // wildcards and fuzzies are OK in phrases
checkMatches("\"(jo* -john) smith\"", 2);   // boolean logic works
checkMatches("\"jo* smith\"~2", 1,2,3);     // position logic works
checkBadQuery("\"jo* id:1 smith\"");        // mixing fields in a phrase is bad
checkBadQuery("\"jo* \"smith\" \"");        // phrases inside phrases is bad
checkBadQuery("\"jo* [sma TO smZ]\" \"");   // range queries inside phrases not supported

Code plus JUnit test to follow...
[jira] [Assigned] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.
[ https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson reassigned LUCENE-3758: Assignee: Erick Erickson

Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.
Key: LUCENE-3758
URL: https://issues.apache.org/jira/browse/LUCENE-3758
Project: Lucene - Core
Issue Type: Improvement
Components: core/queryparser
Affects Versions: 4.0-ALPHA
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
Priority: Minor
Fix For: 4.7
Attachments: LUCENE-3758.patch

The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder value hardcoded to true. This could be configurable.
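To illustrate what SpanNearQuery's inOrder flag controls (the parameter LUCENE-3758 proposes to expose), here is a self-contained position check for ordered vs. unordered matching within a slop. This is a simplified model of span-near semantics, not Lucene's implementation.

```java
// Toy model of SpanNearQuery matching: positions[i] holds the positions of
// the i-th query term in a document. near() reports whether some choice of
// one position per term fits within the slop; inOrder additionally requires
// the chosen positions to be strictly increasing in query order.
class NearMatcher {
    static boolean near(int[][] positions, int slop, boolean inOrder) {
        return tryFrom(positions, 0, new int[positions.length], slop, inOrder);
    }

    // Brute-force over one position per term (fine for a sketch).
    private static boolean tryFrom(int[][] pos, int term, int[] chosen, int slop, boolean inOrder) {
        if (term == pos.length) {
            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int p : chosen) { min = Math.min(min, p); max = Math.max(max, p); }
            // "Gaps" between matched terms must not exceed the slop.
            if (max - min - (chosen.length - 1) > slop) return false;
            if (inOrder) {
                for (int i = 1; i < chosen.length; i++) {
                    if (chosen[i] <= chosen[i - 1]) return false;
                }
            }
            return true;
        }
        for (int p : pos[term]) {
            chosen[term] = p;
            if (tryFrom(pos, term + 1, chosen, slop, inOrder)) return true;
        }
        return false;
    }
}
```

For a document "john smith", the query phrase "smith john"~0 matches only when inOrder is false, which is exactly the behavior difference the issue wants to make configurable.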
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936189#comment-13936189 ]

Erick Erickson commented on SOLR-1604: OK, I was looking around the patch and think I understand at least some of what's going on. To drive this forward, I need a couple of things:

1. Vitaliy and Ahmet to resolve the two patches and let me know which is the right one to use. BTW, Vitaliy, please use svn diff or the equivalent Git command to create patches. Zipped-up sources are much harder to work with.
2. Some idea of a roadmap from here. Straw-man proposal:
2a. Close LUCENE-1486 and open a new JIRA for a fix if necessary. It looks to me like this patch can be committed without LUCENE-1486, and we'll generate a separate fix.
2b. Commit LUCENE-3758, remove inOrder from this patch, then commit this patch.
2c. I've assigned these to myself so I don't lose track of them. I'll look desperately for cycles to work on them :). But I have a couple of long plane flights in my future...
3. Of course we need to document the syntax and behavior here; [~ctargett] can probably point us in the right direction for doing this right by putting it in the new documentation!
4. I'm also curious what we know now in terms of performance, resource requirements, that kind of stuff.
5. I notice there's a patch labeled as having to do with license stuff. What's up there? Is this just putting the headers in the source files?
6. Anything else? Does anyone out there object to moving forward with this?
Wildcards, ORs etc inside Phrase Queries
Key: SOLR-1604
URL: https://issues.apache.org/jira/browse/SOLR-1604
Project: Solr
Issue Type: Improvement
Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Assignee: Erick Erickson
Priority: Minor
Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch

Solr plugin for ComplexPhraseQueryParser (LUCENE-1486), which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
Analytics test errors
I was all excited by the lack of errors coming from these tests until I noticed they were BadApples. So I took the ExpressionTest BadApple designation out and ran the test 20K times without error (it used to fail on my Mac). Now that I'm stealing some cycles to work with this, I'm going to pull the other BadApple designations out, run all the tests a bunch of times on my laptop and, if I can't repro the problem, un-bad-apple them and commit to trunk unless there are lots of objections. Otherwise I don't see how to make forward progress on these. Apologies for the long period when they generated test noise; I've been unable to devote any time to it for far too long.

Erick
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936212#comment-13936212 ]

Ahmet Arslan commented on SOLR-1604: Here are some clarifications regarding the zipped attachments: they are not meant for source-code inclusion but to be consumed as a Solr plugin. They will never be committed, mainly because the zipped version(s) include duplicate code from the Lucene code base. The duplicated class is org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser. The duplication is done for two reasons:
* To enable fielded queries. The duplicate code changes the package name to org.apache.lucene.queryparser.classic.ComplexPhraseQueryParser. This feature was somehow accidentally forgotten in LUCENE-1486 while committing lucene.ComplexPhraseQueryParser; after that commit, the package name changed from classic to complexPhrase. The fix needs to access a field from the superclass. After realizing this, changing that field's visibility to protected was accepted by lazy consensus. This is the [patch|https://issues.apache.org/jira/secure/attachment/12513804/LUCENE-1486.patch] for it.
* To enable the ability to change the inOrder parameter. In the original Lucene code, the inOrder parameter is hardcoded to true in the SpanNearQuery classes. The separate JIRA for this is LUCENE-3758.

By the way, why LUCENE-1486 was re-opened is a mystery. It was not re-opened because of the forgotten non-default-field patch.
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936221#comment-13936221 ]

Ahmet Arslan commented on SOLR-1604:

bq. Vitaliy and Ahmet to resolve the two patches and let me know what the right one to use is.
None of them, actually. They include source code (ComplexPhraseQueryParser.java) duplicated from Lucene. I will attach a patch, created against trunk, that consumes Lucene's ComplexPhraseQueryParser.

bq. Close 1486 and open a new JIRA if there's a fix for that if necessary. It looks to me like this patch can be committed without 1486 and we'll generate a separate fix.
+1. Yes, this patch can be committed without LUCENE-1486. +1 for closing LUCENE-1486, given that it was re-opened mysteriously. +1 for creating a separate JIRA for [this|https://issues.apache.org/jira/secure/attachment/12513804/LUCENE-1486.patch] functionality, just because it is less confusing.

bq. commit 3758, and remove inOrder from this patch, then commit this patch.
The request for the ability to change the inOrder parameter came from a user. Robert had [this|https://issues.apache.org/jira/browse/LUCENE-3758?focusedCommentId=13206996&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13206996] comment about it.

bq. I notice there's a patch labeled as having to do with license stuff.
That attachment is old. I accidentally forgot to select the 'ASF inclusion' radio box back then, so JIRA wasn't displaying the feather icon for it. After that incident, JIRA removed that radio-button option; attachments are ASF-granted by default now. That file was renamed automatically by infra.
[jira] [Assigned] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-2878: Assignee: Robert Muir (was: Simon Willnauer)

Allow Scorer to expose positions and payloads aka. nuke spans
Key: LUCENE-2878
URL: https://issues.apache.org/jira/browse/LUCENE-2878
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
Labels: gsoc2014
Fix For: Positions Branch
Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch

Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature, since they cannot score based on term proximity: scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on this using the bulkpostings API.
I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet currently only TermQuery / TermScorer implements this API, and the others simply return null instead. To show that the API really works, and that our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect was that the Position BulkReading implementation got some exercise, and it now :) works with positions, while payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet, since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't look into the MemoryIndex BulkPostings API yet)
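The "positions on demand" idea described above (a needsPositions hint gating creation of the positions enum) can be sketched with a toy scorer. All names here are illustrative stand-ins, not the classes from the patch or the positions branch.

```java
import java.util.*;

// Toy sketch of a scorer that only materializes a positions iterator when
// the consumer declared it needs positions up front (analogous to
// ScorerContext#needsPositions in the proposal). Not Lucene code.
class TermScorerSketch {
    private final int[] docs;               // matching doc ids, in order
    private final int[][] positionsPerDoc;  // term positions per matching doc
    private final boolean needsPositions;   // consumer's up-front declaration
    private int cursor = -1;

    TermScorerSketch(int[] docs, int[][] positionsPerDoc, boolean needsPositions) {
        this.docs = docs;
        this.positionsPerDoc = positionsPerDoc;
        this.needsPositions = needsPositions;
    }

    // Advance to the next matching doc; Integer.MAX_VALUE plays NO_MORE_DOCS.
    int nextDoc() {
        cursor++;
        return cursor < docs.length ? docs[cursor] : Integer.MAX_VALUE;
    }

    // Positions for the current doc, or null when the consumer never asked,
    // so no positions enum is created at all.
    Iterator<Integer> positions() {
        if (!needsPositions) return null;
        List<Integer> ps = new ArrayList<>();
        for (int p : positionsPerDoc[cursor]) ps.add(p);
        return ps.iterator();
    }
}
```

The point of the gating is cost: a scorer that knows nobody wants positions can skip decoding them entirely, which is what makes proximity scoring opt-in rather than a tax on every query.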
[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing
[ https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936239#comment-13936239 ]

Mark Miller commented on SOLR-5865: Looks great!

+ // We could upload the minimum set of files rather than the directory, but that requires keeping the list up to date
+ ZkController.uploadToZK(zkClient, new File(configDir), ZkController.CONFIGS_ZKNODE + "/" + configName);

The main reason most of the cloud tests have gone with specifying which config files to put in ZK was that uploading the entire directory of test configs was damn slow, and that was then repeated for all cloud tests. A better solution at some point would be a new test config folder just for SolrCloud. We already have a lot of configs, but we could probably merge some things into this - like the common solrconfig and schema that almost all cloud tests use anyway. If we kept it to one set, I think it would be an improvement for cloud tests.

Provide a MiniSolrCloudCluster to enable easier testing
Key: SOLR-5865
URL: https://issues.apache.org/jira/browse/SOLR-5865
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Affects Versions: 4.7, 5.0
Reporter: Gregory Chanan
Attachments: SOLR-5865.patch

Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, which has a couple of issues around support for downstream projects:
- It's difficult to test SolrCloud support in a downstream project that may have its own test framework. For example, some projects have support for different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests against each of the different backends. This is difficult to do cleanly, because the Solr tests require derivation from LuceneTestCase, while the others don't.
- The LuceneTestCase class hierarchy is really designed for internal Solr tests (e.g. it randomizes a lot of parameters to get test coverage, but a downstream project probably doesn't care about that).
It's also quite complicated and dense, much more so than a downstream project would want. Given these reasons, it would be nice to provide a simple MiniSolrCloudCluster, similar to how HDFS provides a MiniHdfsCluster or HBase provides a MiniHBaseCluster.
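Mark's trade-off above (upload the whole config directory, which is slow but always complete, vs. a maintained minimal file list, which is fast but must be kept up to date) can be sketched as a simple filter. ConfigUploader and its file list are hypothetical; real code would push each selected file to ZooKeeper with something like ZkController.uploadToZK.

```java
import java.util.*;

// Sketch of the config-upload trade-off: either take everything in the
// config directory, or only a known-minimal subset. "Uploading" here just
// means selecting paths; the ZK push itself is out of scope for the sketch.
class ConfigUploader {
    // Hypothetical minimal set; a real list would need to track what the
    // test configs actually reference (the maintenance cost Mark mentions).
    static final List<String> MINIMAL =
            Arrays.asList("solrconfig.xml", "schema.xml", "stopwords.txt");

    static List<String> filesToUpload(List<String> dirListing, boolean wholeDir) {
        if (wholeDir) {
            return new ArrayList<>(dirListing); // slow but always complete
        }
        List<String> out = new ArrayList<>();
        for (String f : dirListing) {
            if (MINIMAL.contains(f)) out.add(f); // fast, but list must stay current
        }
        return out;
    }
}
```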
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936240#comment-13936240 ] Simon Willnauer commented on LUCENE-2878: - Now we are talking Sent from my iPhone Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries, the one which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do and at the end of the day they are duplicating lot of code all over lucene. Span*Queries are also limited to other Span*Query instances such that you can not use a TermQuery or a BooleanQuery with SpanNear or anthing like that. Beside of the Span*Query limitation other queries lacking a quiet interesting feature since they can not score based on term proximity since scores doesn't expose any positional information. All those problems bugged me for a while now so I stared working on that using the bulkpostings API. 
I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions, and eventually Scorer#needsPayloads, to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API, and the others simply return null instead. To show that the API really works and our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now all works with positions :), while Payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
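The on-demand positions idea described above — a scorer only materializes a positions enum when the consumer asked for it up front, and scorers that don't implement the API simply return null — can be sketched in plain Java. This is a hypothetical, stdlib-only illustration; the names PositionsEnum, TermScorer, and needsPositions here merely mirror the description and are not the actual branch API:

```java
import java.util.Iterator;
import java.util.List;

public class PositionsSketch {
    // Minimal positions enum: returns the next position, or -1 when exhausted.
    interface PositionsEnum { int nextPosition(); }

    static class TermScorer {
        private final List<Integer> positions;
        private final boolean needsPositions;

        TermScorer(List<Integer> positions, boolean needsPositions) {
            this.positions = positions;
            this.needsPositions = needsPositions;
        }

        // Only materialize the enum if the consumer requested positions;
        // otherwise return null, mirroring the non-implementing scorers.
        PositionsEnum positions() {
            if (!needsPositions) return null;
            Iterator<Integer> it = positions.iterator();
            return () -> it.hasNext() ? it.next() : -1;
        }
    }

    public static void main(String[] args) {
        TermScorer scorer = new TermScorer(List.of(3, 17), true);
        PositionsEnum pos = scorer.positions();
        System.out.println(pos.nextPosition()); // 3
        System.out.println(pos.nextPosition()); // 17
    }
}
```

The point of the flag is cost: a consumer that never asks for positions never pays for building the enum.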
[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-1604: --- Attachment: SOLR-1604.patch This is a Solr-only patch (solr/core/src/) and does not touch the Lucene code base. It adds two new Java classes (ComplexPhraseQParserPlugin and TestComplexPhraseQParserPlugin) and consumes o.a.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: query parsers, search Affects Versions: 1.4 Reporter: Ahmet Arslan Assignee: Erick Erickson Priority: Minor Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936265#comment-13936265 ] Mikhail Khludnev commented on LUCENE-5189: -- Just want to leave one caveat for the record. When you call {code}IW.updateNumericDocValue(Term, String, Long){code} make sure that the term is deeply cloned beforehand. Otherwise, if you later modify the term or its bytes, the modified version will be applied. That might be a problem. Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch In LUCENE-4258 we started to work on incremental field updates; however, the amount of changes is immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons for that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. 
* It has value in and of itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have a working patch already which I'll upload next, explaining the changes. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
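The caveat above is an aliasing bug waiting to happen: the writer buffers a reference to the Term (and its underlying bytes), so mutating them after the call silently changes which documents the buffered update targets. A stdlib-only toy sketch of the pitfall — AliasingDemo, enqueueNoClone, and enqueueWithClone are hypothetical names for illustration, not IndexWriter API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AliasingDemo {
    // A toy "update queue" that, like a writer buffering DV updates,
    // holds on to whatever term bytes it was handed.
    static List<byte[]> queued = new ArrayList<>();

    static void enqueueNoClone(byte[] termBytes) {
        queued.add(termBytes);          // keeps a reference to the caller's array
    }

    static void enqueueWithClone(byte[] termBytes) {
        queued.add(termBytes.clone());  // deep copy: the caller may reuse its buffer
    }

    public static void main(String[] args) {
        byte[] reusable = "doc-1".getBytes();
        enqueueNoClone(reusable);
        // The caller recycles its buffer for the next term...
        Arrays.fill(reusable, (byte) 'x');
        // ...and the queued update now targets the wrong term.
        System.out.println(new String(queued.get(0))); // prints "xxxxx"
    }
}
```

With enqueueWithClone the queued bytes stay "doc-1" regardless of later mutation, which is exactly why the Term should be deeply cloned before handing it to the writer.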
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936279#comment-13936279 ] Shai Erera commented on LUCENE-5189: I checked the code and it's the same with e.g. deleteDocuments(Term) - the Term isn't cloned internally. So your comment pertains to other IW methods too. Numeric DocValues Updates - Key: LUCENE-5189 URL: https://issues.apache.org/jira/browse/LUCENE-5189 Project: Lucene - Core Issue Type: New Feature Components: core/index Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.6, 5.0 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch In LUCENE-4258 we started to work on incremental field updates; however, the amount of changes is immense and hard to follow/consume. The reason is that we targeted postings, stored fields, DV etc., all from the get go. I'd like to start afresh here, with numeric-dv-field updates only. There are a couple of reasons for that: * NumericDV fields should be easier to update, if e.g. we write all the values of all the documents in a segment for the updated field (similar to how livedocs work, and previously norms). * It's a fairly contained issue, attempting to handle just one data type to update, yet requires many changes to core code which will also be useful for updating other data types. * It has value in and of itself, and we don't need to allow updating all the data types in Lucene at once ... we can do that gradually. I have a working patch already which I'll upload next, explaining the changes. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5770) All attempts to match a SolrCore with its state in clusterstate.json should be done with the NodeName rather than the baseUrl.
[ https://issues.apache.org/jira/browse/SOLR-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936290#comment-13936290 ] Mark Miller commented on SOLR-5770: --- Awesome, thanks Steve - hadn't had a chance to look further at this yet. I'll try your patch this weekend. All attempts to match a SolrCore with its state in clusterstate.json should be done with the NodeName rather than the baseUrl. --- Key: SOLR-5770 URL: https://issues.apache.org/jira/browse/SOLR-5770 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.8, 5.0 Attachments: SOLR-5770.patch, SOLR-5770.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5488: - Attachment: SOLR-5488.patch Takes off @Ignore and @BadApple. See comments here: https://issues.apache.org/jira/browse/SOLR-5685 This fix suddenly caused FieldFacetTest to start failing. It fails first time, every time. Interestingly, when it does fail it's because MinMaxStatsCollection.getStat is looking for the stat min, but this.min is null. Seems like it _may_ be related to the mysterious failures we were seeing, but I'm grasping at straws. I'll be trying ExpressionTest repeatedly to see if we're back now... Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!
Hmm…only interesting logging I see is this: 57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in SyncThread:0 took 50531ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide I wonder if that means that if i boost the connect timeout from 45 to 60 seconds, it will pass. Perhaps this machine has some IO issues? -- Mark Miller about.me/markrmiller On March 15, 2014 at 9:23:25 AM, Policeman Jenkins Server (jenk...@thetaphi.de) wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/ Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936304#comment-13936304 ] Alan Woodward commented on LUCENE-2878: --- Ooh, hello. So the LUCENE-2878 branch is a bit of a mess, in that it has two semi-working versions of this code: Simon's initial IntervalIterator API, in the o.a.l.search.intervals package, and my DocsEnum.nextPosition() API in o.a.l.search.positions. Simon's code is much more complete, and I've been using a separately maintained version of that in production code for various clients, which you can see at https://github.com/flaxsearch/lucene-solr-intervals. I think the nextPosition() API is nicer, but the IntervalIterator API has the advantage of actually working. The github repository has some other stuff on it too, around making the intervals code work across different fields. The API that I've come up with there is not very nice, though. It would be ace to get this moving again! Allow Scorer to expose positions and payloads aka. 
nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature: they cannot score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. 
I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions, and eventually Scorer#needsPayloads, to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API, and the others simply return null instead. To show that the API really works and our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now all works with positions :), while Payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-fcs-b132) - Build # 9804 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9804/ Java: 32bit/jdk1.8.0-fcs-b132 -client -XX:+UseSerialGC 1 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:58601 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:58601 within 45000 ms at __randomizedtesting.SeedInfo.seed([2C01501183016211:ADE7DE09F45E022D]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936313#comment-13936313 ] Erick Erickson commented on SOLR-5488: -- OK, maybe we're on to something, ExpressionTest (run with a bunch of iterations) failed with a very similar message to FieldFacetTest. FWIW Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 4.7, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!
No IO issues and it runs on SSD. The machine is also stable and has no SATA timeouts or similar stuff. It is just a 3-year-old server CPU, and it's running a VirtualBox VM in parallel. Uwe On 15 March 2014 21:31:10 CET, Mark Miller markrmil...@gmail.com wrote: Hmm…only interesting logging I see is this: 57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in SyncThread:0 took 50531ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide I wonder if that means that if I boost the connect timeout from 45 to 60 seconds, it will pass. Perhaps this machine has some IO issues? -- Mark Miller about.me/markrmiller
[jira] [Updated] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search ordered or un-ordered proximity queries.
[ https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-3758: - Attachment: LUCENE-3758.patch patch for trunk (revision 1577942) Allow the ComplexPhraseQueryParser to search ordered or un-ordered proximity queries. - Key: LUCENE-3758 URL: https://issues.apache.org/jira/browse/LUCENE-3758 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser Affects Versions: 4.0-ALPHA Reporter: Tomás Fernández Löbbe Assignee: Erick Erickson Priority: Minor Fix For: 4.7 Attachments: LUCENE-3758.patch, LUCENE-3758.patch The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder value hardcoded to true. This could be configurable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
Steve Davids created SOLR-5866: -- Summary: UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties Key: SOLR-5866 URL: https://issues.apache.org/jira/browse/SOLR-5866 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Davids Fix For: 4.8 The UpdateShardHandler configures its own PoolingClientConnectionManager which *doesn't* use the system default scheme registry factory, which interrogates the javax.net.ssl.* system properties to wire up the https scheme into HttpClient. To ease configuration, the system default registry should be used. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[ https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Davids updated SOLR-5866: --- Attachment: SOLR-5866.patch Attached the trivial patch.
[jira] [Created] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases
Steve Davids created SOLR-5867: -- Summary: OverseerCollectionProcessor isn't properly generating https urls in some cases Key: SOLR-5867 URL: https://issues.apache.org/jira/browse/SOLR-5867 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Davids Fix For: 4.8 All URLs should be generated using a call out to the zk state reader: {code} zkStateReader.getBaseUrlForNodeName(nodeName); {code} This is because the url scheme is stored in the clusterprops.json file and is necessary to build the correct URL to propagate the request. Please note that if the base_url is available, it should be used, since it already carries the properly schemed url without the need to check zk.
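The rule described above (prefer a replica's stored base_url; otherwise derive a scheme-aware URL from the node name and the urlScheme in clusterprops.json) can be sketched as follows. This is a self-contained illustration with the cluster properties modeled as a plain map; the method name mirrors Solr's ZkStateReader but the code is not Solr's implementation:

```java
import java.util.Map;

public class BaseUrlDemo {
    // Toy stand-in for ZkStateReader.getBaseUrlForNodeName: node names look
    // like "host:8983_solr", and the url scheme comes from clusterprops.json
    // (modeled here as a map, defaulting to http when unset).
    static String getBaseUrlForNodeName(String nodeName, Map<String, String> clusterProps) {
        int sep = nodeName.indexOf('_');
        String hostAndPort = nodeName.substring(0, sep);
        String context = nodeName.substring(sep + 1).replace('_', '/');
        String scheme = clusterProps.getOrDefault("urlScheme", "http");
        return scheme + "://" + hostAndPort + "/" + context;
    }

    // Prefer the stored base_url (already correctly schemed); only fall back
    // to deriving the URL from zk state when it is absent.
    static String resolveBaseUrl(String baseUrl, String nodeName, Map<String, String> clusterProps) {
        return baseUrl != null ? baseUrl : getBaseUrlForNodeName(nodeName, clusterProps);
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("urlScheme", "https");
        System.out.println(getBaseUrlForNodeName("127.0.0.1:8983_solr", props));
        System.out.println(resolveBaseUrl(null, "127.0.0.1:8983_solr", props));
    }
}
```

The point of the bug is the fallback path: building URLs with a hardcoded "http://" prefix ignores the urlScheme cluster property entirely.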
[jira] [Updated] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases
[ https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Davids updated SOLR-5867: --- Attachment: SOLR-5867.patch Attached patch.
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936377#comment-13936377 ] Steve Davids commented on SOLR-5477: You should drop the unnecessary assignment: {code} String replica = zkStateReader.getBaseUrlForNodeName(nodeName); {code} on line 1829, which makes an unnecessary call out to zk for a value that isn't being used. Async execution of OverseerCollectionProcessor tasks Key: SOLR-5477 URL: https://issues.apache.org/jira/browse/SOLR-5477 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Anshum Gupta Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.urlschemefix.patch Typical collection admin commands are long running and it is very common to have the requests get timed out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously: add an extra param async=true for all collection commands; the task is written to ZK and the caller is returned a task id. A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If id is not passed, all running async tasks should be listed. A separate queue is created to store in-process tasks. After the tasks are completed the queue entry is removed. OverseerCollectionProcessor will perform these tasks in multiple threads.
[jira] [Created] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup
Steve Davids created SOLR-5868: -- Summary: HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup Key: SOLR-5868 URL: https://issues.apache.org/jira/browse/SOLR-5868 Project: Solr Issue Type: Improvement Affects Versions: 4.7 Reporter: Steve Davids Fix For: 4.8 The default HttpClient hostname verifier is the BROWSER_COMPATIBLE_HOSTNAME_VERIFIER, which verifies that the hostname being connected to matches the hostname presented within the certificate. This is meant to protect clients making external requests across the internet, but requests within the SOLR cluster should be trusted and can be relaxed to simplify the SSL/certificate setup process.
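The difference between the two verifier policies can be illustrated with the JDK's own javax.net.ssl.HostnameVerifier interface. This is a self-contained sketch, not HttpClient's AllowAllHostnameVerifier or BrowserCompatHostnameVerifier; in particular the "strict" variant here takes the certificate's name as a plain string, whereas the real verifier extracts CN/subjectAltName from the SSLSession:

```java
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

public class HostnameVerifierDemo {
    // Spirit of ALLOW_ALL_HOSTNAME_VERIFIER: trust any peer, as proposed
    // here for intra-cluster Solr requests.
    static final HostnameVerifier ALLOW_ALL = (hostname, session) -> true;

    // Toy stand-in for the browser-compatible policy: the hostname must match
    // the name the certificate was issued for (passed in directly for the demo).
    static HostnameVerifier strictFor(String certName) {
        return (hostname, session) -> hostname.equalsIgnoreCase(certName);
    }

    public static void main(String[] args) {
        SSLSession none = null; // no real TLS session needed for this sketch
        System.out.println(ALLOW_ALL.verify("node1.internal", none));                     // accepted
        System.out.println(strictFor("solr.example.com").verify("node1.internal", none)); // rejected
    }
}
```

With the strict policy, every node's certificate must name every hostname it is reached by; relaxing to allow-all lets a cluster share one certificate regardless of hostnames.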
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936380#comment-13936380 ] ASF subversion and git services commented on SOLR-5477: --- Commit 1577965 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1577965 ] SOLR-5477: Removing an unwanted call to zk
[jira] [Commented] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup
[ https://issues.apache.org/jira/browse/SOLR-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936381#comment-13936381 ] Steve Davids commented on SOLR-5868: In the current HttpClientUtil paradigm this can be achieved by retrieving the url scheme and setting the hostname verifier on the SSLSocketFactory: https://gist.github.com/sdavids13/9577027 If the HttpClientBuilder approach is introduced (SOLR-5604) then it can simply be done via: {code} HttpClientBuilder.create().useSystemProperties().setHostnameVerifier(new AllowAllHostnameVerifier())...; {code}
[jira] [Commented] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases
[ https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936392#comment-13936392 ] ASF subversion and git services commented on SOLR-5867: --- Commit 1577968 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1577968 ] SOLR-5867: OverseerCollectionProcessor isn't properly generating https urls in some cases
[jira] [Commented] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases
[ https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936394#comment-13936394 ] ASF subversion and git services commented on SOLR-5867: --- Commit 1577969 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1577969 ] SOLR-5867: OverseerCollectionProcessor isn't properly generating https urls in some cases
[jira] [Resolved] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases
[ https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5867. - Resolution: Fixed Fix Version/s: 5.0 Assignee: Shalin Shekhar Mangar Thanks Steve!
[jira] [Commented] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[ https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936398#comment-13936398 ] ASF subversion and git services commented on SOLR-5866: --- Commit 1577971 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1577971 ] SOLR-5866: UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[jira] [Commented] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[ https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936399#comment-13936399 ] ASF subversion and git services commented on SOLR-5866: --- Commit 1577972 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1577972 ] SOLR-5866: UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[jira] [Resolved] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties
[ https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5866. - Resolution: Fixed Fix Version/s: 5.0 Assignee: Shalin Shekhar Mangar Thanks Steve!
[jira] [Updated] (LUCENE-4978) Spatial search with point query won't find identical indexed point
[ https://issues.apache.org/jira/browse/LUCENE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4978: - Attachment: LUCENE-4978_fix_small_grid_false_negatives.patch This patch addresses the issue simply by removing the optimization. I did some performance tests with rects and circles, and the impact was very minor, although I didn't test polygons, which should show a greater effect. While I was at it, I beefed up the tests further in ways that would previously have failed due to the false negative. I removed an older test, RecursivePrefixTreeTest.geohashRecursiveRandom(), which is hard to maintain and is now obsoleted by SpatialOpRecursivePrefixTreeTest, which now uses geohashes. I'll commit this Monday. Spatial search with point query won't find identical indexed point -- Key: LUCENE-4978 URL: https://issues.apache.org/jira/browse/LUCENE-4978 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 4.1 Reporter: David Smiley Assignee: David Smiley Priority: Minor Fix For: 4.7 Attachments: LUCENE-4978_fix_small_grid_false_negatives.patch Given a document with indexed POINT (10 20), when a search for INTERSECTS(POINT (10 20)) is issued, no results are returned. The work-around is to not search with a point shape but with a very small-radius circle or rectangle. (I'm marking this issue as minor because it's easy to do this.) An unstated objective of the PrefixTree/grid approximation is that no matter what precision you use, an intersects query will find all true positives. Due to approximations, it may also find some close false positives. But in the case above, that unstated promise is violated. It can also happen for query shapes other than points that barely enclose the point given at index time: the indexed point is in effect shifted to the center point of a cell, which could lie outside the query shape, ultimately leading to a false negative.
[jira] [Updated] (LUCENE-4978) Spatial search with point query won't find identical indexed point
[ https://issues.apache.org/jira/browse/LUCENE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4978: - Fix Version/s: (was: 4.7) 4.8
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936415#comment-13936415 ] ASF subversion and git services commented on SOLR-3177: --- Commit 1577976 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1577976 ] SOLR-3177: Enable tagging and excluding filters in StatsComponent via the localParams syntax Excluding tagged filter in StatsComponent - Key: SOLR-3177 URL: https://issues.apache.org/jira/browse/SOLR-3177 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1 Reporter: Mathias H. Assignee: Shalin Shekhar Mangar Priority: Minor Labels: localparams, stats, statscomponent Attachments: SOLR-3177.patch, SOLR-3177.patch, SOLR-3177.patch It would be useful to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet counts: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters So that it's possible to do something like this: http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price If you want to create a price slider this is very useful, because then you can filter on the price ([1 TO 20]) and nevertheless get the lower and upper bound of the unfiltered price (min=0, max=100): {noformat} |-[---]--| $0 $1 $20 $100 {noformat}
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936416#comment-13936416 ] ASF subversion and git services commented on SOLR-3177: --- Commit 1577977 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1577977 ] SOLR-3177: Enable tagging and excluding filters in StatsComponent via the localParams syntax
[jira] [Resolved] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-3177. - Resolution: Fixed Fix Version/s: 5.0 4.8 This will be released with Solr 4.8. Thank you all for the comments and upvotes, and sorry that this took so much time. Thanks Nikolai and Vitaliy for the patches!
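The tag/exclude mechanics this feature adds can be sketched in a self-contained way: stats are computed over documents matching every fq except those whose tag appears in the {!ex=...} local param. This is a toy model of the semantics, not StatsComponent's actual code:

```java
import java.util.*;
import java.util.function.IntPredicate;

public class StatsExcludeDemo {
    // A tagged filter query: {!tag=...} attaches the tag at fq time,
    // and {!ex=...} later excludes filters carrying that tag.
    static class TaggedFilter {
        final String tag; final IntPredicate pred;
        TaggedFilter(String tag, IntPredicate pred) { this.tag = tag; this.pred = pred; }
    }

    // Min/max over the field, applying every fq except the excluded tags --
    // the semantics of stats.field={!ex=priceFilter}price.
    static int[] minMax(int[] prices, List<TaggedFilter> filters, Set<String> excludedTags) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int price : prices) {
            boolean match = true;
            for (TaggedFilter f : filters) {
                if (excludedTags.contains(f.tag)) continue; // {!ex=...}: skip this fq
                if (!f.pred.test(price)) { match = false; break; }
            }
            if (match) { min = Math.min(min, price); max = Math.max(max, price); }
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] prices = {0, 5, 20, 100};
        List<TaggedFilter> fq = List.of(new TaggedFilter("priceFilter", p -> p >= 1 && p <= 20));
        // With the tag excluded, the stats see the unfiltered price range;
        // without the exclusion, only the filtered documents contribute.
        System.out.println(Arrays.toString(minMax(prices, fq, Set.of("priceFilter"))));
        System.out.println(Arrays.toString(minMax(prices, fq, Set.of())));
    }
}
```

This is exactly the price-slider use case from the issue: the fq narrows the result set, while the excluded stats still report the full min/max for the slider's endpoints.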
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!
Hmm…I’ll check with Patrick Hunt and see if he has any thoughts on that logging warning. -- Mark Miller about.me/markrmiller On March 15, 2014 at 5:38:51 PM, Uwe Schindler (u...@thetaphi.de) wrote: No IO issues and it runs on SSD. Machine is also stable and has no SATA timeouts or similar stuff. It is just a 3 year old server CPU and its running a Vbox VM in parallel. Uwe On 15. März 2014 21:31:10 MEZ, Mark Miller markrmil...@gmail.com wrote: Hmm…only interesting logging I see is this: 57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in SyncThread:0 took 50531ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide I wonder if that means that if i boost the connect timeout from 45 to 60 seconds, it will pass. Perhaps this machine has some IO issues? -- Mark Miller about.me/markrmiller On March 15, 2014 at 9:23:25 AM, Policeman Jenkins Server (jenk...@thetaphi.de) wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/ Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC 1 tests failed. 
REGRESSION: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms Stack Trace: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936426#comment-13936426 ] David Smiley commented on LUCENE-5527: -- +1 I like it! Make the Collector API work per-segment --- Key: LUCENE-5527 URL: https://issues.apache.org/jira/browse/LUCENE-5527 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor Spin-off of LUCENE-5299. LUCENE-5299 proposes different changes, some of them controversial, but there is one of them that I really like: refactoring the {{Collector}} API in order to have a different Collector per segment. The idea is, instead of having a single Collector object that needs to be able to take care of all segments, to have a top-level Collector:
{code}
public interface Collector {
  AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
}
{code}
and a per-AtomicReaderContext collector:
{code}
public interface AtomicCollector {
  void setScorer(Scorer scorer) throws IOException;
  void collect(int doc) throws IOException;
  boolean acceptsDocsOutOfOrder();
}
{code}
I think it makes the API clearer, since it is now obvious that {{setScorer}} and {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, which is otherwise unclear. It also makes things more flexible. For example, a collector could much more easily decide to use different strategies on different segments. In particular, it makes the early-termination collector much cleaner, since it can return different atomic collector implementations depending on whether the current segment is sorted or not. Even if we have lots of collectors all over the place, we could make it easier to migrate by having a Collector that would implement both Collector and AtomicCollector, return {{this}} in setNextReader, and make current concrete Collector implementations extend this class instead of directly extending Collector. 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
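The proposal above, including the migration trick of one class implementing both interfaces and returning {{this}}, can be sketched in a self-contained way. Note this is an illustration only: the interface names mirror the proposal, but SegmentContext is a hypothetical stand-in for AtomicReaderContext so the example compiles without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed split: a top-level Collector hands out one
// AtomicCollector per segment. Names follow the proposal; SegmentContext is a
// made-up stand-in for AtomicReaderContext.
public class PerSegmentCollectorSketch {

    interface AtomicCollector {
        void collect(int doc);
        boolean acceptsDocsOutOfOrder();
    }

    interface Collector {
        AtomicCollector setNextReader(SegmentContext context);
    }

    static class SegmentContext {
        final int docBase; // first global doc id of this segment
        SegmentContext(int docBase) { this.docBase = docBase; }
    }

    // Migration helper from the last paragraph: implements both interfaces and
    // returns itself, so legacy single-object collectors keep working.
    static class SimpleAllDocsCollector implements Collector, AtomicCollector {
        final List<Integer> globalDocs = new ArrayList<>();
        private int docBase;

        public AtomicCollector setNextReader(SegmentContext context) {
            this.docBase = context.docBase;
            return this; // legacy-style collectors can just return themselves
        }
        public void collect(int doc) { globalDocs.add(docBase + doc); }
        public boolean acceptsDocsOutOfOrder() { return false; }
    }

    static List<Integer> collectTwoSegments() {
        SimpleAllDocsCollector c = new SimpleAllDocsCollector();
        AtomicCollector seg1 = c.setNextReader(new SegmentContext(0));
        seg1.collect(0);
        seg1.collect(1);
        AtomicCollector seg2 = c.setNextReader(new SegmentContext(2));
        seg2.collect(0);
        return c.globalDocs;
    }

    public static void main(String[] args) {
        System.out.println(collectTwoSegments()); // [0, 1, 2]
    }
}
```

The point of the design is that per-segment state (here, docBase) lives behind the AtomicCollector returned by setNextReader, so callers cannot invoke collect before a segment has been set.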
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936430#comment-13936430 ] Shai Erera commented on LUCENE-5527: Maybe we can get rid of setScorer, passing Scorer to {{setNextReader(AtomicReaderContext,Scorer)}}? Make the Collector API work per-segment --- Key: LUCENE-5527 URL: https://issues.apache.org/jira/browse/LUCENE-5527 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Minor Spin-off of LUCENE-5299. LUCENE-5299 proposes different changes, some of them controversial, but there is one of them that I really like: refactoring the {{Collector}} API in order to have a different Collector per segment. The idea is, instead of having a single Collector object that needs to be able to take care of all segments, to have a top-level Collector:
{code}
public interface Collector {
  AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
}
{code}
and a per-AtomicReaderContext collector:
{code}
public interface AtomicCollector {
  void setScorer(Scorer scorer) throws IOException;
  void collect(int doc) throws IOException;
  boolean acceptsDocsOutOfOrder();
}
{code}
I think it makes the API clearer, since it is now obvious that {{setScorer}} and {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, which is otherwise unclear. It also makes things more flexible. For example, a collector could much more easily decide to use different strategies on different segments. In particular, it makes the early-termination collector much cleaner, since it can return different atomic collector implementations depending on whether the current segment is sorted or not.
Even if we have lots of collectors all over the place, we could make it easier to migrate by having a Collector that would implement both Collector and AtomicCollector, return {{this}} in setNextReader, and make current concrete Collector implementations extend this class instead of directly extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5430) Suggesters should verify their index before loading it from disk
[ https://issues.apache.org/jira/browse/LUCENE-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5430: - Fix Version/s: (was: 4.7) 4.8 Suggesters should verify their index before loading it from disk -- Key: LUCENE-5430 URL: https://issues.apache.org/jira/browse/LUCENE-5430 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.7, 5.0 Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 4.8, 5.0 The issue was pointed out by Michael in the discussion on LUCENE-5404. The idea is to make all the suggesters use CodecUtil.writeHeader when they are about to store their index to a file and subsequently perform a check when it is loaded. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
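The check described here amounts to writing a magic/version header when the suggester persists its index and validating it on load. Below is a minimal self-contained sketch of that pattern using plain java.io; Lucene's real CodecUtil works in this spirit but uses its own codec-name strings and checksums, and the magic value and version here are invented for the illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Minimal sketch of header write/verify, in the spirit of
// CodecUtil.writeHeader / checkHeader. MAGIC and VERSION are illustrative.
public class HeaderCheckSketch {
    static final int MAGIC = 0x53554747; // "SUGG", made up for this sketch
    static final int VERSION = 1;

    static byte[] save(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);    // identifies the file format
        out.writeInt(VERSION);  // allows future format changes
        out.write(payload);
        out.flush();
        return bytes.toByteArray();
    }

    static byte[] load(byte[] stored) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored));
        if (in.readInt() != MAGIC) {
            throw new IOException("not a suggester index (bad magic)");
        }
        int version = in.readInt();
        if (version > VERSION) {
            throw new IOException("index version " + version + " too new");
        }
        return in.readAllBytes();
    }

    static String roundTrip(String s) {
        try {
            return new String(load(save(s.getBytes(StandardCharsets.UTF_8))),
                              StandardCharsets.UTF_8);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    static boolean rejectsGarbage() {
        try {
            load(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}); // wrong magic
            return false;
        } catch (IOException expected) {
            return true; // bad magic detected, as LUCENE-5430 wants
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("fst-data")); // fst-data
        System.out.println(rejectsGarbage());      // true
    }
}
```

Without such a header, loading a file that is not a suggester index (or is an older incompatible format) fails in arbitrary, confusing ways instead of with a clear error.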
[jira] [Updated] (LUCENE-5438) add near-real-time replication
[ https://issues.apache.org/jira/browse/LUCENE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5438: - Fix Version/s: (was: 4.7) 4.8 add near-real-time replication -- Key: LUCENE-5438 URL: https://issues.apache.org/jira/browse/LUCENE-5438 Project: Lucene - Core Issue Type: Improvement Components: modules/replicator Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.8, 5.0 Attachments: LUCENE-5438.patch, LUCENE-5438.patch Lucene's replication module makes it easy to incrementally sync index changes from a master index to any number of replicas, and it handles/abstracts all the underlying complexity of holding a time-expiring snapshot, finding which files need copying, syncing more than one index (e.g., taxo + index), etc. But today you must first commit on the master, and then again the replica's copied files are fsync'd, because the code operates on commit points. But this isn't technically necessary, and it mixes up durability and fast turnaround time. Long ago we added near-real-time readers to Lucene, for the same reason: you shouldn't have to commit just to see the new index changes. I think we should do the same for replication: allow the new segments to be copied out to replica(s), and new NRT readers to be opened, to fully decouple committing from visibility. This way apps can then separately choose when to replicate (for freshness), and when to commit (for durability). I think for some apps this could be a compelling alternative to the re-index all documents on each shard approach that Solr Cloud / ElasticSearch implement today, and it may also mean that the transaction log can remain external to / above the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5411) Upgrade to released JFlex 1.5.0
[ https://issues.apache.org/jira/browse/LUCENE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5411: - Fix Version/s: (was: 4.7) 4.8 Upgrade to released JFlex 1.5.0 --- Key: LUCENE-5411 URL: https://issues.apache.org/jira/browse/LUCENE-5411 Project: Lucene - Core Issue Type: Improvement Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 4.8, 5.0 Attachments: LUCENE-5411.patch The JFlex 1.5.0 release will be officially announced shortly. The jar is already on Maven Central. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5406) ShingleAnalyzerWrapper should expose the delegated analyzer as a public final
[ https://issues.apache.org/jira/browse/LUCENE-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5406: - Fix Version/s: (was: 4.7) 4.8 ShingleAnalyzerWrapper should expose the delegated analyzer as a public final - Key: LUCENE-5406 URL: https://issues.apache.org/jira/browse/LUCENE-5406 Project: Lucene - Core Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Fix For: 4.8, 5.0 I'm sometimes given a ShingleAnalyzerWrapper that I would like to change the shingle size on, so I need to create a new instance. However, I don't always know what the underlying analyzer is and I can't access it b/c it is a protected method on a final class. The solution here is to make the getAnalyzer method public final for the ShingleAnalyzerWrapper. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5402) Add support for index-time pruning in Document*Dictionary (Suggester)
[ https://issues.apache.org/jira/browse/LUCENE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5402: - Fix Version/s: (was: 4.7) 4.8 Add support for index-time pruning in Document*Dictionary (Suggester) - Key: LUCENE-5402 URL: https://issues.apache.org/jira/browse/LUCENE-5402 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Fix For: 4.8, 5.0 Attachments: LUCENE-5402.patch, LUCENE-5402.patch It would be nice to be able to prune out entries that the suggester consumes by some query. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5417) Solr function query supports reading multiple values from a field.
[ https://issues.apache.org/jira/browse/LUCENE-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5417: - Fix Version/s: (was: 4.7) 4.8 Solr function query supports reading multiple values from a field. -- Key: LUCENE-5417 URL: https://issues.apache.org/jira/browse/LUCENE-5417 Project: Lucene - Core Issue Type: New Feature Components: core/query/scoring Affects Versions: 4.6 Environment: N/A Reporter: Peng Cheng Priority: Minor Fix For: 4.8 Attachments: MultiFieldCacheValueSources.patch Original Estimate: 168h Remaining Estimate: 168h Solr function query is a powerful tool to customize search criterion and ranking function (http://wiki.apache.org/solr/FunctionQuery). However, it cannot effectively benefit from field values from multi-valued field, namely, the field(...) function can only read one value and discard the others. This limitation has been associated with FieldCacheSource, and the fact that FieldCache cannot fetch multiple values from a field, but such constraint has been largely lifted by LUCENE-3354, which allows multiple values to be extracted from one field. Those values in turn can be used as parameters of other functions to yield a single score. I personally find this limitation very unhandy when building a learning-to-rank system that uses many cues and string features. Therefore I would like to post this feature request and (hopefully) work on it myself. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5356) more generic lucene-morfologik integration
[ https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5356: - Fix Version/s: (was: 4.7) 4.8 more generic lucene-morfologik integration -- Key: LUCENE-5356 URL: https://issues.apache.org/jira/browse/LUCENE-5356 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.6 Reporter: Michal Hlavac Assignee: Dawid Weiss Priority: Minor Labels: newbie, patch Fix For: 4.8, 5.0 Attachments: LUCENE-5356.patch, LUCENE-5356.patch, LUCENE-5356.patch I have a little proposal for the morfologik Lucene module. The current module is tightly coupled with the Polish DICTIONARY enumeration. But other people (like me) can build their own FSA dictionaries and use them with Lucene. You can find the proposal in the attachment, along with example usage in an analyzer (SlovakLemmaAnalyzer). It uses the dictionary property as a String resource from the classpath, not an enumeration. One change is that the dictionary variable must be set in MorfologikFilterFactory (no default value). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5351) DirectoryReader#close can throw AlreadyClosedException if it's an NRT reader
[ https://issues.apache.org/jira/browse/LUCENE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5351: - Fix Version/s: (was: 4.7) 4.8 DirectoryReader#close can throw AlreadyClosedException if it's an NRT reader - Key: LUCENE-5351 URL: https://issues.apache.org/jira/browse/LUCENE-5351 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.6 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.8, 5.0 Attachments: LUCENE-5351.patch In StandardDirectoryReader#doClose we do this:
{noformat}
if (writer != null) {
  // Since we just closed, writer may now be able to
  // delete unused files:
  writer.deletePendingFiles();
}
{noformat}
which can throw AlreadyClosedException from the directory if the Directory has already been closed. To me this looks like a bug, and we should catch this exception from the directory. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
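The fix suggested at the end of the report, catching the exception so close() stays quiet when the Directory was closed first, can be illustrated with a standalone sketch. The exception class and writer below are stand-ins, not Lucene's real types.

```java
// Sketch of the proposed fix: treat AlreadyClosedException from the
// best-effort cleanup call as benign during close(). FakeWriter and the
// nested exception class are stand-ins invented for this illustration.
public class GracefulCloseSketch {

    static class AlreadyClosedException extends RuntimeException {}

    static class FakeWriter {
        boolean directoryClosed = true; // simulate the race from the report
        void deletePendingFiles() {
            if (directoryClosed) throw new AlreadyClosedException();
        }
    }

    // Returns true when close completes even though the underlying
    // directory was already closed.
    static boolean closeReader(FakeWriter writer) {
        try {
            if (writer != null) {
                // Since we just closed, writer may now be able to
                // delete unused files:
                writer.deletePendingFiles();
            }
        } catch (AlreadyClosedException ace) {
            // Expected if the Directory was closed first; nothing to clean up.
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(closeReader(new FakeWriter())); // true
    }
}
```

The design choice is that cleanup during close is opportunistic: failing to delete pending files is harmless (they will be deleted later), so it should never turn a successful close into an exception.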
[jira] [Updated] (LUCENE-5381) Lucene highlighter doesn't honor hl.fragsize; it appends all text for last fragment
[ https://issues.apache.org/jira/browse/LUCENE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5381: - Fix Version/s: (was: 4.7) 4.8 Lucene highlighter doesn't honor hl.fragsize; it appends all text for last fragment --- Key: LUCENE-5381 URL: https://issues.apache.org/jira/browse/LUCENE-5381 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0, 4.6 Reporter: yuanyun.cn Priority: Minor Labels: highlighter, lucene Fix For: 4.8, 5.0 Attachments: LUCENE-5381.patch Original Estimate: 4h Remaining Estimate: 4h Recently, we hit a problem with the highlighter: I set hl.fragsize = 300, but the highlight section for one document outputs more than 2000 characters. Looking into the code, in org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(TokenStream, String, boolean, int), after the for loop it appends the whole remaining text to the last fragment:
{code}
if (
    // if there is text beyond the last token considered..
    (lastEndOffset < text.length())
    &&
    // and that text is not too large...
    (text.length() <= maxDocCharsToAnalyze)
  )
{
  //append it to the last fragment
  newText.append(encoder.encodeText(text.substring(lastEndOffset)));
}
currentFrag.textEndPos = newText.length();
{code}
This code is problematic, as in some cases the last fragment is the most relevant section and will be selected to return to the client. I made some changes to the code like below, and now it works:
{code}
//Test what remains of the original text beyond the point where we stopped analyzing
if (lastEndOffset < text.length()) {
  if (textFragmenter instanceof SimpleFragmenter) {
    SimpleFragmenter simpleFragmenter = (SimpleFragmenter) textFragmenter;
    int remain = simpleFragmenter.getFragmentSize() - (newText.length() - currentFrag.textStartPos);
    if (remain > 0) {
      int endIndex = lastEndOffset + remain;
      if (endIndex > text.length()) {
        endIndex = text.length();
      }
      newText.append(encoder.encodeText(text.substring(lastEndOffset, endIndex)));
    }
  } else {
    newText.append(encoder.encodeText(text.substring(lastEndOffset)));
  }
}
currentFrag.textEndPos = newText.length();
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5350) Add Context Aware Suggester
[ https://issues.apache.org/jira/browse/LUCENE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5350: - Fix Version/s: (was: 4.7) 4.8 Add Context Aware Suggester --- Key: LUCENE-5350 URL: https://issues.apache.org/jira/browse/LUCENE-5350 Project: Lucene - Core Issue Type: New Feature Components: core/search Reporter: Areek Zillur Fix For: 4.8, 5.0 Attachments: LUCENE-5350-benchmark.patch, LUCENE-5350-benchmark.patch, LUCENE-5350.patch, LUCENE-5350.patch It would be nice to have a Context Aware Suggester (i.e. a suggester that could return suggestions depending on some specified context(s)). Use-cases: - location-based suggestions: -- returns suggestions which 'match' the context of a particular area --- suggest restaurants names which are in Palo Alto (context - Palo Alto) - category-based suggestions: -- returns suggestions for items that are only in certain categories/genres (contexts) --- suggest movies that are of the genre sci-fi and adventure (context - [sci-fi, adventure]) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5056) Indexing non-point shapes close to the poles doesn't scale
[ https://issues.apache.org/jira/browse/LUCENE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5056: - Fix Version/s: (was: 4.7) 4.8 Indexing non-point shapes close to the poles doesn't scale -- Key: LUCENE-5056 URL: https://issues.apache.org/jira/browse/LUCENE-5056 Project: Lucene - Core Issue Type: Bug Components: modules/spatial Affects Versions: 4.3 Reporter: Hal Deadman Assignee: David Smiley Fix For: 4.8 Attachments: indexed circle close to the pole.png From: [~hdeadman] We are seeing an issue where certain shapes are causing Solr to use up all available heap space when a record with one of those shapes is indexed. We were indexing polygons where we had the points going clockwise instead of counter-clockwise and the shape would be so large that we would run out of memory. We fixed those shapes but we are seeing this circle eat up about 700MB of memory before we get an OutOfMemory error (heap space) with a 1GB JVM heap. Circle(3.0 90 d=0.0499542757922153) Google Earth can't plot that circle either, maybe it is invalid or too close to the north pole due to the latitude of 90, but it would be nice if there was a way for shapes to be validated before they cause an OOM error. The objects (4.5 million) are all GeohashPrefixTree$GhCell objects in an ArrayList owned by PrefixTreeStrategy$CellTokenStream. Is there any way to have a max number of cells in a shape before it is considered too large and is not indexed? Is there a geo library that could validate the shape as being reasonably sized and bounded before it is processed? We are currently using Solr 4.1.
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees" />
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4872) BooleanWeight should decide how to execute minNrShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4872: - Fix Version/s: (was: 4.7) 4.8 BooleanWeight should decide how to execute minNrShouldMatch --- Key: LUCENE-4872 URL: https://issues.apache.org/jira/browse/LUCENE-4872 Project: Lucene - Core Issue Type: Sub-task Components: core/search Reporter: Robert Muir Fix For: 4.8 Attachments: crazyMinShouldMatch.tasks LUCENE-4571 adds a dedicated document-at-a-time scorer for minNrShouldMatch which can use advance() behind the scenes. In cases where you have some really common terms and some rare ones, this can be a huge performance improvement. On the other hand, BooleanScorer might still be faster in some cases. We should think about what the logic should be here: one simple thing to do is to always use the new scorer when minShouldMatch is set: that's where I'm leaning. But maybe we could have a smarter heuristic too, perhaps based on cost() -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
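The cost()-based heuristic floated at the end could look something like the sketch below: prefer the advance()-based minShouldMatch scorer when the clauses that cannot all be skipped are rare relative to the index size. The threshold, method names, and Strategy enum are invented for illustration; this is not Lucene's actual decision logic.

```java
// Sketch of a cost-based choice between the advance()-driven minShouldMatch
// scorer and plain BooleanScorer. All names and the 10x threshold are
// hypothetical illustrations, not Lucene code.
public class ScorerChoiceSketch {

    enum Strategy { DOC_AT_A_TIME_MIN_SHOULD_MATCH, BOOLEAN_SCORER }

    // sortedCosts: estimated doc frequencies of the SHOULD clauses, ascending.
    static Strategy choose(long[] sortedCosts, int minShouldMatch, long maxDoc) {
        if (minShouldMatch <= 1) {
            return Strategy.BOOLEAN_SCORER; // plain disjunction, skipping can't help
        }
        // "Lead" cost: the cheapest clauses that cannot all be absent from a
        // match. If they are rare relative to the index, advance()-driven
        // skipping over the common clauses pays off.
        long leadCost = 0;
        for (int i = 0; i <= sortedCosts.length - minShouldMatch; i++) {
            leadCost += sortedCosts[i];
        }
        return leadCost * 10 < maxDoc
            ? Strategy.DOC_AT_A_TIME_MIN_SHOULD_MATCH
            : Strategy.BOOLEAN_SCORER;
    }

    public static void main(String[] args) {
        // two rare terms, one common term, minShouldMatch=2 over 1M docs
        System.out.println(choose(new long[] {100, 500, 900000}, 2, 1000000));
    }
}
```

With two rare terms and one very common one, the lead cost is tiny, so the sketch picks the document-at-a-time scorer; when every clause is common, skipping buys little and it falls back to BooleanScorer.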
[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field
[ https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5199: - Fix Version/s: (was: 4.7) 4.8 Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field --- Key: LUCENE-5199 URL: https://issues.apache.org/jira/browse/LUCENE-5199 Project: Lucene - Core Issue Type: Improvement Components: general/test Reporter: Shai Erera Assignee: Shai Erera Fix For: 4.8 Attachments: LUCENE-5199.patch, LUENE-5199.patch On LUCENE-5178 Han reported the following test failure: {noformat} [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues [junit4] Throwable #1: org.junit.ComparisonFailure: expected:...(0) [junit4] less than 10 ([8) [junit4] less than or equal to 10 (]8) [junit4] over 90 (8) [junit4] 9... but was:...(0) [junit4] less than 10 ([28) [junit4] less than or equal to 10 (2]8) [junit4] over 90 (8) [junit4] 9... [junit4] at __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0) [junit4] at org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670) [junit4] at java.lang.Thread.run(Thread.java:722) {noformat} which can be reproduced with {noformat} tcase=TestRangeAccumulator -Dtests.method=testMissingValues -Dtests.seed=815B6AA86D05329C -Dtests.slow=true -Dtests.postingsformat=Lucene41 -Dtests.locale=ca -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8 {noformat} It seems that the Codec that is picked is a Lucene45Codec with Lucene42DVFormat, which does not support docsWithFields for numericDV. We should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields and check that the actual DVF used for each field supports it. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4960) Require minimum ivy version
[ https://issues.apache.org/jira/browse/LUCENE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4960: - Fix Version/s: (was: 4.7) 4.8 Require minimum ivy version --- Key: LUCENE-4960 URL: https://issues.apache.org/jira/browse/LUCENE-4960 Project: Lucene - Core Issue Type: Bug Components: general/build Affects Versions: 4.2.1 Reporter: Shawn Heisey Priority: Minor Fix For: 4.8 Someone on solr-user ran into a problem while trying to run 'ant idea' so they could work on Solr in their IDE. [~steve_rowe] indicated that this is probably due to IVY-1194, requiring an ivy jar upgrade. The build system should check for a minimum ivy version, just like it does with ant. The absolute minimum we require appears to be 2.2.0, but do we want to make it 2.3.0 due to IVY-1388? I'm not sure how to go about checking the ivy version. Checking the ant version is easy because it's ant itself that does the checking. There might be other component versions that should be checked too. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4950) AssertingIndexSearcher isn't wrapping the Collector to AssertingCollector
[ https://issues.apache.org/jira/browse/LUCENE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4950: - Fix Version/s: (was: 4.7) 4.8 AssertingIndexSearcher isn't wrapping the Collector to AssertingCollector - Key: LUCENE-4950 URL: https://issues.apache.org/jira/browse/LUCENE-4950 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.8 Attachments: LUCENE-4950.patch -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5317) [PATCH] Concordance capability
[ https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5317: - Fix Version/s: (was: 4.7) 4.8 [PATCH] Concordance capability -- Key: LUCENE-5317 URL: https://issues.apache.org/jira/browse/LUCENE-5317 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 4.5 Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: concordance_v1.patch.gz This patch enables a Lucene-powered concordance search capability. Concordances are extremely useful for linguists, lawyers and other analysts performing analytic search vs. traditional snippeting/document retrieval tasks. By analytic search, I mean that the user wants to browse every time a term appears (or at least the topn) in a subset of documents and see the words before and after. Concordance technology is far simpler and less interesting than IR relevance models/methods, but it can be extremely useful for some use cases. Traditional concordance sort orders are available (sort on words before the target, words after, target then words before and target then words after). Under the hood, this is running SpanQuery's getSpans() and reanalyzing to obtain character offsets. There is plenty of room for optimizations and refactoring. Many thanks to my colleague, Jason Robinson, for input on the design of this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
[ https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4524: - Fix Version/s: (was: 4.7) 4.8 Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum - Key: LUCENE-4524 URL: https://issues.apache.org/jira/browse/LUCENE-4524 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-4524.patch, LUCENE-4524.patch spinnoff from http://www.gossamer-threads.com/lists/lucene/java-dev/172261 {noformat} hey folks, I have spend a hell lot of time on the positions branch to make positions and offsets working on all queries if needed. The one thing that bugged me the most is the distinction between DocsEnum and DocsAndPositionsEnum. Really when you look at it closer DocsEnum is a DocsAndFreqsEnum and if we omit Freqs we should return a DocIdSetIter. Same is true for DocsAndPostionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I don't really see the benefits from this. We should rather make the interface simple and call it something like PostingsEnum where you have to specify flags on the TermsIterator and if we can't provide the sufficient enum we throw an exception? I just want to bring up the idea here since it might simplify a lot for users as well for us when improving our positions / offset etc. support. thoughts? Ideas? simon {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4943) remove 'Changes to Backwards Compatibility Policy' from lucene/CHANGES.txt
[ https://issues.apache.org/jira/browse/LUCENE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4943: - Fix Version/s: (was: 4.7) 4.8 remove 'Changes to Backwards Compatibility Policy' from lucene/CHANGES.txt -- Key: LUCENE-4943 URL: https://issues.apache.org/jira/browse/LUCENE-4943 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.8 CHANGES.txt is useful to summarize the changes in a release. However, it's expected that a lot of changes will impact the APIs; this currently hurts the quality of CHANGES.txt because it leads to a significant portion of changes (whether they be bugs, features, whatever) being grouped under this one title. It also leads to descriptions of CHANGES being unnecessarily verbose. I think it makes CHANGES confusing and overwhelming, and it would be better to have a simpler 'upgrading' section with practical information on what you actually need to do (like Solr's CHANGES.txt). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5288) Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur close together (in proximity) in each document
[ https://issues.apache.org/jira/browse/LUCENE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5288: - Fix Version/s: (was: 4.7) 4.8 Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur close together (in proximity) in each document - Key: LUCENE-5288 URL: https://issues.apache.org/jira/browse/LUCENE-5288 Project: Lucene - Core Issue Type: New Feature Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.8, 5.0 Attachments: LUCENE-5288.patch, LUCENE-5288.patch, LUCENE-5288.patch, LUCENE-5288.patch This is very much a work in progress, tons of nocommits... It adds two classes:
* ProxBooleanTermQuery: like BooleanQuery (currently, all clauses must be TermQuery, and only Occur.SHOULD is supported), which is essentially a BooleanQuery (same matching/scoring) except that for each matching doc the positions are merge-sorted and scored to boost the document's score
* QueryRescorer: simple API to re-score top hits using a different query. Because ProxBooleanTermQuery is so costly, apps would normally run an ordinary BooleanQuery across the full index, to get the top few hundred hits, and then rescore using the more costly ProxBooleanTermQuery (or other costly queries).
I'm not sure how to actually compute the appropriate prox boost (this is the hard part!!) and I've completely punted on that in the current patch (it's just a hack now), but the patch does all the mechanics to merge/visit all the positions in order per hit. Maybe we could do similar scoring to what SpanNearQuery or sloppy PhraseQuery would do, or maybe this paper: http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf which Rob also used in LUCENE-4909 to add proximity scoring to PostingsHighlighter. Maybe we need to make it (how the prox boost is computed/folded in) somehow pluggable ... 
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5024) Can we reliably detect an incomplete first commit vs index corruption?
[ https://issues.apache.org/jira/browse/LUCENE-5024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5024: - Fix Version/s: (was: 4.7) 4.8 Can we reliably detect an incomplete first commit vs index corruption? -- Key: LUCENE-5024 URL: https://issues.apache.org/jira/browse/LUCENE-5024 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Fix For: 4.8 Normally, if something bad happens (OS, JVM, hardware crashes) while IndexWriter is committing, we will just fall back to the prior commit and no intervention is necessary from the app. But if that commit is the first commit, then on restart IndexWriter will now throw CorruptIndexException, as of LUCENE-4738. Prior to LUCENE-4738, in LUCENE-2812, we used to try to detect the corrupt first commit, but that logic was dangerous and could result in falsely believing no index is present when one is, e.g. when transient IOExceptions are thrown due to file descriptor exhaustion. But now two users have hit this change ... see "CorruptIndexException when opening Index during first commit" and "Calling IndexWriter.commit() immediately after creating the writer", both on java-user. It would be nice to get back to not marking an incomplete first commit as corruption ... but we have to proceed carefully.
[jira] [Updated] (LUCENE-5093) nightly-smoke should run some fail fast checks before doing the full smoke tester
[ https://issues.apache.org/jira/browse/LUCENE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5093: - Fix Version/s: (was: 4.7) 4.8 nightly-smoke should run some fail fast checks before doing the full smoke tester - Key: LUCENE-5093 URL: https://issues.apache.org/jira/browse/LUCENE-5093 Project: Lucene - Core Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.8 Attachments: LUCENE-5093.patch If something like the NOTICES fails the smoke tester, it currently takes 22 minutes to find out on my pretty fast machine. That means testing a fix also takes 22 minutes. It would be nice if some of these types of checks happened right away on the src tree - we should also check the actual artifacts with the same check later - but also have this fail fast path.
[jira] [Updated] (LUCENE-4813) Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency
[ https://issues.apache.org/jira/browse/LUCENE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4813: - Fix Version/s: (was: 4.7) 4.8 Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency --- Key: LUCENE-4813 URL: https://issues.apache.org/jira/browse/LUCENE-4813 Project: Lucene - Core Issue Type: Bug Components: modules/spellchecker Affects Versions: 4.1 Reporter: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-4813.patch, LUCENE-4813.patch we have a bunch of new statistics on our term dictionaries that we should make use of where it makes sense. For DirectSpellChecker, totalTermFreq and sumTotalTermFreq might be better suited for spell correction on top of a fulltext index than docFreq and maxDoc.
[jira] [Updated] (LUCENE-4491) Make analyzing suggester more flexible
[ https://issues.apache.org/jira/browse/LUCENE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4491: - Fix Version/s: (was: 4.7) 4.8 Make analyzing suggester more flexible -- Key: LUCENE-4491 URL: https://issues.apache.org/jira/browse/LUCENE-4491 Project: Lucene - Core Issue Type: Improvement Components: modules/other Affects Versions: 4.1 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-4491.patch, LUCENE-4491.patch Today we have an analyzing suggester that is bound to a single key. Yet, if you want to have a totally different surface form compared to the key used to find the suggestion, you either have to copy the code or play some super ugly analyzer tricks. For example, I want to suggest "Barbra Streisand" if somebody types "strei"; in that case the surface form is totally different from the analyzed form. Even one step further, I want to embed some meta-data in the suggested key, like a user id or some type; my surface form could look like "Barbra Streisand|15". Ideally I want to encode this as binary and that might not be a valid UTF-8 byte sequence. I'm actually doing this in production and my only option was to copy the analyzing suggester and some of its related classes.
[jira] [Updated] (LUCENE-5318) Co-occurrence counts from Concordance
[ https://issues.apache.org/jira/browse/LUCENE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5318: - Fix Version/s: (was: 4.7) 4.8 Co-occurrence counts from Concordance - Key: LUCENE-5318 URL: https://issues.apache.org/jira/browse/LUCENE-5318 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 4.5 Reporter: Tim Allison Labels: patch Fix For: 4.8 Attachments: cooccur_v1.patch.gz This patch calculates co-occurrence statistics on search terms within a window of x tokens. This can help in synonym discovery and anywhere else co-occurrence stats have been used. The attached patch depends on LUCENE-5317. Again, many thanks to my colleague, Jason Robinson, for advice in developing this code and for his modifications to this code to make it more Solr-friendly.
[jira] [Updated] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
[ https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4734: - Fix Version/s: (was: 4.7) 4.8 FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight Key: LUCENE-4734 URL: https://issues.apache.org/jira/browse/LUCENE-4734 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0, 4.1, 5.0 Reporter: Ryan Lauck Labels: fastvectorhighlighter, highlighter Fix For: 4.8 Attachments: LUCENE-4734-2.patch, LUCENE-4734.patch, lucene-4734.patch If a proximity phrase query overlaps with any other query term it will not be highlighted. Example Text: A B C D E F G Example Queries: "B E"~10 D ("D" will be highlighted instead of "B C D E") "B E"~10 "C F"~10 (nothing will be highlighted) This can be traced to the FieldPhraseList constructor's inner while loop. From the first example query, the first TermInfo popped off the stack will be "B". The second TermInfo will be "D", which will not be found in the submap for "B E"~10 and will trigger a failed match.
[jira] [Updated] (LUCENE-4746) Create a move method in Directory.
[ https://issues.apache.org/jira/browse/LUCENE-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4746: - Fix Version/s: (was: 4.7) 4.8 Create a move method in Directory. -- Key: LUCENE-4746 URL: https://issues.apache.org/jira/browse/LUCENE-4746 Project: Lucene - Core Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.8 Attachments: LUCENE-4746.patch I'd like to make a move method for directory. We already have a move for Solr in DirectoryFactory, but it seems it belongs at the directory level really. The default impl can do a copy and delete, but most implementations will be able to optimize to a rename. Besides the move we do for Solr (to move a replicated index into place), it would also be useful for another feature I'd like to add - the ability to merge an index with moves rather than copies. In some cases, you don't need/want to copy all the files and could just rename/move them.
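The proposed default (copy then delete, with filesystem-backed implementations overriding it with a rename) can be sketched against a minimal hypothetical interface. `SimpleDirectory` and `FsDirectory` are illustrative names for this sketch, not Lucene's actual Directory API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the proposed design: the interface supplies a
// copy-and-delete fallback, and implementations that can do so override
// move() with an atomic rename.
interface SimpleDirectory {
    Path resolve(String name);

    default void move(String src, String dest) throws IOException {
        Files.copy(resolve(src), resolve(dest), StandardCopyOption.REPLACE_EXISTING);
        Files.delete(resolve(src));
    }
}

// A filesystem-backed implementation optimizes move() to a rename.
class FsDirectory implements SimpleDirectory {
    private final Path root;

    FsDirectory(Path root) { this.root = root; }

    @Override
    public Path resolve(String name) { return root.resolve(name); }

    @Override
    public void move(String src, String dest) throws IOException {
        Files.move(resolve(src), resolve(dest), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Either way the caller sees the same contract: after `move`, the file exists only under the destination name.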
[jira] [Updated] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory
[ https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4281: - Fix Version/s: (was: 4.7) 4.8 Delegate to default thread factory in NamedThreadFactory Key: LUCENE-4281 URL: https://issues.apache.org/jira/browse/LUCENE-4281 Project: Lucene - Core Issue Type: Improvement Affects Versions: 3.6.1, 4.0-BETA, 5.0 Reporter: Simon Willnauer Priority: Minor Fix For: 4.8 Attachments: LUCENE-4281.patch currently we state that we yield the same behavior as Executors#defaultThreadFactory(), but this behavior could change over time even if it is compatible. We should just delegate to the default thread factory instead of creating the threads ourselves.
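The suggested delegation could look something like the following sketch; `DelegatingNamedThreadFactory` is a hypothetical name, not Lucene's actual NamedThreadFactory:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: let the JDK's default factory create the thread (so priority,
// daemon status, and thread group follow whatever the default factory does),
// and only override the name afterwards.
class DelegatingNamedThreadFactory implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger counter = new AtomicInteger(1);
    private final String prefix;

    DelegatingNamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r); // default factory's behavior, not ours
        t.setName(prefix + "-" + counter.getAndIncrement());
        return t;
    }
}
```

This way the factory stays compatible with `Executors#defaultThreadFactory()` by construction rather than by imitation.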
[jira] [Updated] (LUCENE-4823) Add a separate registration singleton for Lucene's SPI, so there is only one central instance to request rescanning of classpath (e.g. from Solr's ResourceLoader)
[ https://issues.apache.org/jira/browse/LUCENE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4823: - Fix Version/s: (was: 4.7) 4.8 Add a separate registration singleton for Lucene's SPI, so there is only one central instance to request rescanning of classpath (e.g. from Solr's ResourceLoader) Key: LUCENE-4823 URL: https://issues.apache.org/jira/browse/LUCENE-4823 Project: Lucene - Core Issue Type: Bug Components: core/other Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.8 Currently there is no easy way to do a global rescan/reload of all of Lucene's SPIs in the right order. In solr there is a long list of reload instructions in the ResourceLoader. If somebody adds a new SPI type, you have to add it there. It would be good to have a central instance in oal.util that keeps track of all NamedSPILoaders and AnalysisSPILoaders (in the order they were instantiated), so you have one central entry point to trigger a reload. This issue will introduce: - A singleton that makes reloading possible. The singleton keeps weak refs to all loaders (of any kind) in the order they were created. - NamedSPILoader and AnalysisSPILoader cleanup (unfortunately we need both instances, as they differ in the internals (one keeps classes, the other one instances)). Both should implement a reloadable interface.
[jira] [Updated] (LUCENE-5310) Merge Threads unnecessarily block on SerialMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5310: - Fix Version/s: (was: 4.7) 4.8 Merge Threads unnecessarily block on SerialMergeScheduler - Key: LUCENE-5310 URL: https://issues.apache.org/jira/browse/LUCENE-5310 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.5, 5.0 Reporter: Simon Willnauer Priority: Minor Fix For: 4.8, 5.0 Attachments: LUCENE-5310.patch, LUCENE-5310.patch I have been working on a high level merge multiplexer that shares threads across different IW instances and I came across the fact that SerialMergeScheduler actually blocks incoming threads if a merge is going on. Yet this blocks threads unnecessarily since we pull the merges in a loop anyway. We should use a tryLock operation instead of syncing the entire method?
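The tryLock idea can be sketched with plain JDK primitives. This is an illustrative scheduler, not Lucene's actual SerialMergeScheduler: a thread enqueues its merge and returns immediately if another thread already holds the lock, since the lock holder drains the queue in a loop.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative scheduler: enqueue the merge, then tryLock. A thread that fails
// to acquire the lock returns immediately instead of blocking; the holder
// drains the queue, so every enqueued merge still runs serially.
class TryLockSerialScheduler {
    private final ReentrantLock lock = new ReentrantLock();
    private final Queue<Runnable> pending = new ArrayDeque<>();

    void schedule(Runnable merge) {
        synchronized (pending) { pending.add(merge); }
        // Outer loop: re-check after unlocking, so a merge enqueued in the
        // window between the last poll() and unlock() is not left behind.
        while (lock.tryLock()) {
            try {
                Runnable next;
                while ((next = poll()) != null) {
                    next.run(); // merges run one at a time
                }
            } finally {
                lock.unlock();
            }
            synchronized (pending) { if (pending.isEmpty()) return; }
        }
    }

    private Runnable poll() {
        synchronized (pending) { return pending.poll(); }
    }
}
```

The point of the pattern is that a second caller pays only an enqueue plus a failed tryLock, rather than waiting for the running merge to finish.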
[jira] [Updated] (LUCENE-4803) DrillDownQuery should rewrite to FilteredQuery?
[ https://issues.apache.org/jira/browse/LUCENE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4803: - Fix Version/s: (was: 4.7) 4.8 DrillDownQuery should rewrite to FilteredQuery? --- Key: LUCENE-4803 URL: https://issues.apache.org/jira/browse/LUCENE-4803 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.8 Today we rewrite to a query like +baseQuery +ConstantScoreQuery(boost=0.0 TermQuery(drillDownTerm)), but I'm not certain 0.0 boost is safe / doesn't actually change scores. We should also add a test to assert that scores are not changed by drill down.
[jira] [Updated] (LUCENE-4630) add a system property to allow testing of suspicious stuff
[ https://issues.apache.org/jira/browse/LUCENE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4630: - Fix Version/s: (was: 4.7) 4.8 add a system property to allow testing of suspicious stuff -- Key: LUCENE-4630 URL: https://issues.apache.org/jira/browse/LUCENE-4630 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Fix For: 4.8 there are times when people want to add assumptions in tests to prevent confusing/false failures in certain situations (eg: known bugs in JVM X, known incompatibilities between lucene feature Z and filesystem Y, etc...) By default we want these situations to be skipped in tests with clear messages so that it's clear to end users trying out releases that these tests can't be run in specific situations. But at the same time we need a way for developers to be able to try running these tests anyway so we know if/when the underlying problem is resolved. I propose we add a tests.suspicious.shit system property, which defaults to false in the Java code, but can be set at runtime to true. Assumptions about things like incompatibilities with OSs, JVM vendors, JVM versions, filesystems, etc. can all be dependent on this system property.
[jira] [Updated] (LUCENE-4526) Allow runtime settings on Codecs
[ https://issues.apache.org/jira/browse/LUCENE-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4526: - Fix Version/s: (was: 4.7) 4.8 Allow runtime settings on Codecs Key: LUCENE-4526 URL: https://issues.apache.org/jira/browse/LUCENE-4526 Project: Lucene - Core Issue Type: Bug Components: core/codecs Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-4526.patch Today we expose termIndexInterval and termIndexDivisor via several APIs and they are deprecated. Those settings are 1. codec / postingformat specific and 2. not extendable. We should provide a more flexible way to pass information down to our codecs.
[jira] [Updated] (LUCENE-5326) Add enum facet method to Lucene facet module
[ https://issues.apache.org/jira/browse/LUCENE-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5326: - Fix Version/s: (was: 4.7) 4.8 Add enum facet method to Lucene facet module Key: LUCENE-5326 URL: https://issues.apache.org/jira/browse/LUCENE-5326 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.8, 5.0 Attachments: LUCENE-5326.patch I've been testing Solr facet performance, and the enum method works very well for low cardinality (not many unique values) fields. So I think we should fold a similar option into Lucene's facet module.
[jira] [Updated] (LUCENE-4835) Raise maxClauseCount in BooleanQuery to Integer.MAX_VALUE
[ https://issues.apache.org/jira/browse/LUCENE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4835: - Fix Version/s: (was: 4.7) 4.8 Raise maxClauseCount in BooleanQuery to Integer.MAX_VALUE - Key: LUCENE-4835 URL: https://issues.apache.org/jira/browse/LUCENE-4835 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.2 Reporter: Shawn Heisey Fix For: 4.8 Discussion on SOLR-4586 raised the idea of raising the limit on boolean clauses from 1024 to Integer.MAX_VALUE. This should be a safe change. It will change the nature of help requests from "Why can't I do 2000 clauses?" to "Why is my 5000-clause query slow?"
[jira] [Updated] (LUCENE-3997) join module should not depend on grouping module
[ https://issues.apache.org/jira/browse/LUCENE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3997: - Fix Version/s: (was: 4.7) 4.8 join module should not depend on grouping module Key: LUCENE-3997 URL: https://issues.apache.org/jira/browse/LUCENE-3997 Project: Lucene - Core Issue Type: Task Affects Versions: 4.0-ALPHA Reporter: Robert Muir Fix For: 4.8 Attachments: LUCENE-3997.patch, LUCENE-3997.patch I think TopGroups/GroupDocs should simply be in core? Both grouping and join modules use these trivial classes, but join depends on grouping just for them. I think it's better that we try to minimize these inter-module dependencies. Of course, another option is to combine grouping and join into one module, but last time I brought that up nobody could agree on a name. Anyway I think the change is pretty clean: it's similar to having basic stuff like Analyzer.java in core, so other things can work with Analyzer without depending on any specific implementing modules.
[jira] [Updated] (LUCENE-4954) LuceneTestFramework fails to catch temporary FieldCache insanity
[ https://issues.apache.org/jira/browse/LUCENE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4954: - Fix Version/s: (was: 4.7) 4.8 LuceneTestFramework fails to catch temporary FieldCache insanity Key: LUCENE-4954 URL: https://issues.apache.org/jira/browse/LUCENE-4954 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Fix For: 4.8 Ever since we added readerClosedListeners to evict FieldCache entries, LTC will no longer detect insanity as long as the test closes all readers leading to insanity ... So this has weakened our testing of catching accidental insanity producing code. To fix this I think we could tap into FieldCacheImpl.setInfoStream ... and ensure the test didn't print anything to it. This was a spinoff from LUCENE-4953, where that test (AllGroupHeadsCollectorTest) is always producing insanity, but then because of a bug the FC eviction wasn't working right, and LTC then detected the insanity.
[jira] [Updated] (LUCENE-5130) fail the build on compilation warnings
[ https://issues.apache.org/jira/browse/LUCENE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5130: - Fix Version/s: (was: 4.7) 4.8 fail the build on compilation warnings -- Key: LUCENE-5130 URL: https://issues.apache.org/jira/browse/LUCENE-5130 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.8 Attachments: LUCENE-5130.patch, LUCENE-5130.patch Many modules compile w/o warnings ... we should lock this in and fail the build if warnings are ever added, and try to fix the warnings in existing modules.
[jira] [Updated] (LUCENE-4121) Standardize ramBytesUsed/sizeInBytes/memSize
[ https://issues.apache.org/jira/browse/LUCENE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4121: - Fix Version/s: (was: 4.7) 4.8 Standardize ramBytesUsed/sizeInBytes/memSize Key: LUCENE-4121 URL: https://issues.apache.org/jira/browse/LUCENE-4121 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.8 Attachments: LUCENE-4121.patch We should standardize the names of the methods we use to estimate the sizes of objects in memory and on disk. (cf. discussion on dev@lucene http://search-lucene.com/m/VbXSx1BP60G).
[jira] [Updated] (LUCENE-3843) implement PositionLengthAttribute for all tokenstreams where it's appropriate
[ https://issues.apache.org/jira/browse/LUCENE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3843: - Fix Version/s: (was: 4.7) 4.8 implement PositionLengthAttribute for all tokenstreams where it's appropriate Key: LUCENE-3843 URL: https://issues.apache.org/jira/browse/LUCENE-3843 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 4.8 LUCENE-3767 introduces PositionLengthAttribute, which extends the tokenstream API from a sausage to a real graph. Currently tokenstreams such as WordDelimiterFilter and SynonymsFilter theoretically work at a graph level, but then serialize themselves to a sausage, for example: wi-fi with WDF creates: wi(posinc=1), fi(posinc=1), wifi(posinc=0) So the lossiness is that the 'wifi' is simply stacked on top of 'fi'. PositionLengthAttribute fixes this by allowing a token to declare how far it spans, so we don't lose any information. While the indexer currently can only support sausages anyway (and for performance reasons, this is probably just fine!), other tokenstream consumers such as queryparsers and suggesters such as LUCENE-3842 can actually make use of this information for better behavior. So I think it's ideal if the TokenStream API doesn't reflect the lossiness of the index format, but instead keeps all information, and after LUCENE-3767 is committed we should fix tokenstreams to preserve this information for consumers that can use it.
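The wi-fi example can be modeled with a tiny illustrative Token class (not Lucene's attribute API): with a position length, 'wifi' still stacks at the same position as 'wi' (posinc=0), but it is known to span two positions and therefore end exactly where 'fi' ends.

```java
// Illustrative model of a token graph: each token records how many positions
// it advances (posInc) and how many it spans (posLen). In a pure "sausage"
// every posLen is 1, which is exactly the lossiness described above.
class Token {
    final String term;
    final int posInc; // positions advanced relative to the previous token
    final int posLen; // positions this token spans

    Token(String term, int posInc, int posLen) {
        this.term = term;
        this.posInc = posInc;
        this.posLen = posLen;
    }
}

class TokenGraphDemo {
    // Exclusive end position of the token at stream[index]: its start position
    // (the running sum of posInc values) plus its posLen.
    static int endPosition(Token[] stream, int index) {
        int pos = -1;
        for (int i = 0; i <= index; i++) {
            pos += stream[i].posInc;
        }
        return pos + stream[index].posLen;
    }
}
```

For "wi-fi" the graph-preserving stream would be wi(posInc=1, posLen=1), wifi(posInc=0, posLen=2), fi(posInc=1, posLen=1): 'wifi' and 'fi' share the same end position, so no structure is lost.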
[jira] [Updated] (LUCENE-3451) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3451: - Fix Version/s: (was: 4.7) 4.8 Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery - Key: LUCENE-3451 URL: https://issues.apache.org/jira/browse/LUCENE-3451 Project: Lucene - Core Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.8 Attachments: LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450). The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet. Also we should think about disallowing pure-negative Queries at all and throw UOE.
[jira] [Updated] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4545: - Fix Version/s: (was: 4.7) 4.8 Better error reporting StemmerOverrideFilterFactory --- Key: LUCENE-4545 URL: https://issues.apache.org/jira/browse/LUCENE-4545 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Markus Jelsma Priority: Trivial Fix For: 4.8 Attachments: LUCENE-4545-trunk-1.patch If the dictionary contains an error such as a space instead of a tab somewhere in the dictionary it is hard to find the error in a long dictionary. This patch includes the file and line number in the exception, helping to debug it quickly.
[jira] [Updated] (LUCENE-4246) Fix IndexWriter.close() to not commit or wait for pending merges
[ https://issues.apache.org/jira/browse/LUCENE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4246: - Fix Version/s: (was: 4.7) 4.8 Fix IndexWriter.close() to not commit or wait for pending merges Key: LUCENE-4246 URL: https://issues.apache.org/jira/browse/LUCENE-4246 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.8
[jira] [Updated] (LUCENE-4382) Unicode escape no longer works for non-suffix-only wildcard terms
[ https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4382: - Fix Version/s: (was: 4.7) 4.8 Unicode escape no longer works for non-suffix-only wildcard terms - Key: LUCENE-4382 URL: https://issues.apache.org/jira/browse/LUCENE-4382 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.0-BETA Reporter: Jack Krupansky Fix For: 4.8 LUCENE-588 added support for escaping of wildcard characters, but when the de-escaping logic was pushed down from the query parser (QueryParserBase) into WildcardQuery, support for Unicode escaping (backslash, u, and the four-digit hex Unicode code) was not included. Two solutions: 1. Do the Unicode de-escaping in the query parser before calling getWildcardQuery. 2. Support Unicode de-escaping in WildcardQuery. A suffix-only wildcard does not exhibit this problem because full de-escaping is performed in the query parser before calling getPrefixQuery. My test case, added at the beginning of TestExtendedDismaxParser.testFocusQueryParser: {code} assertQ("expected doc is missing (using escaped edismax w/field)", req("q", "t_special:literal\\:\\u0063olo*n", "defType", "edismax"), "//doc[1]/str[@name='id'][.='46']"); {code} Note: That test case was only used to debug into WildcardQuery to see that the Unicode escape was not processed correctly. It fails in all cases, but that's because of how the field type is analyzed. Here is a Lucene-level test case that can also be debugged to see that WildcardQuery is not processing the Unicode escape properly. I added it at the start of TestMultiAnalyzer.testMultiAnalyzer: {code} assertEquals("literal\\:\\u0063olo*n", qp.parse("literal\\:\\u0063olo*n").toString()); {code} Note: This case will always run correctly since it is only checking the input pattern string for WildcardQuery and not how the de-escaping was performed within WildcardQuery.
[jira] [Updated] (LUCENE-4159) Code review before 4.0 release
[ https://issues.apache.org/jira/browse/LUCENE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4159: - Fix Version/s: (was: 4.7) 4.8 Code review before 4.0 release -- Key: LUCENE-4159 URL: https://issues.apache.org/jira/browse/LUCENE-4159 Project: Lucene - Core Issue Type: Task Reporter: Tommaso Teofili Priority: Minor Fix For: 4.8 Before the 4.0 release I think it makes sense to plan for a (Lucene and Solr) comprehensive code review in order to improve APIs, performance and code style.
[jira] [Updated] (LUCENE-3978) redo how our download redirect pages work
[ https://issues.apache.org/jira/browse/LUCENE-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3978: - Fix Version/s: (was: 4.7) 4.8 redo how our download redirect pages work - Key: LUCENE-3978 URL: https://issues.apache.org/jira/browse/LUCENE-3978 Project: Lucene - Core Issue Type: Improvement Reporter: Hoss Man Fix For: 4.8 the download latest redirect pages are kind of a pain to change when we release a new version... http://lucene.apache.org/core/mirrors-core-latest-redir.html http://lucene.apache.org/solr/mirrors-solr-latest-redir.html
[jira] [Updated] (LUCENE-4688) Reuse TermsEnum in BlockTreeTermsReader
[ https://issues.apache.org/jira/browse/LUCENE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4688: - Fix Version/s: (was: 4.7) 4.8 Reuse TermsEnum in BlockTreeTermsReader --- Key: LUCENE-4688 URL: https://issues.apache.org/jira/browse/LUCENE-4688 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Affects Versions: 4.0, 4.1 Reporter: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-4688.patch Opening a TermsEnum comes with a significant cost at this point if done frequently (e.g. primary key lookups) or if many segments are present. Currently we don't reuse it at all and create a lot of objects even if the enum is just used for a single seekExact (ie. TermQuery). Stressing the Terms#iterator(reuse) call shows significant gains with reuse...
[jira] [Updated] (LUCENE-3610) Revamp spatial APIs that use primitives (or arrays of primitives) in their args/results so that they use strongly typed objects
[ https://issues.apache.org/jira/browse/LUCENE-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3610: - Fix Version/s: (was: 4.7) 4.8 Revamp spatial APIs that use primitives (or arrays of primitives) in their args/results so that they use strongly typed objects --- Key: LUCENE-3610 URL: https://issues.apache.org/jira/browse/LUCENE-3610 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: Hoss Man Fix For: 4.8 My spatial awareness is pretty meek, but LUCENE-3599 seems like a prime example of the types of mistakes that are probably really easy to make with all of the spatial-related APIs that deal with arrays (or sequences) of doubles where specific indexes of those arrays (or sequences) have significant meaning: mainly latitude vs. longitude. We should probably reconsider any method that takes in double[] or multiple doubles to express lat/lon pairs and rewrite them to use the existing LatLng class -- or, if people think that class is too heavyweight, add a new lightweight class to handle the strong typing of a basic lat/lon point instead of just passing around a double[2] or two doubles called x and y ...
{code}
public static final class SimpleLatLonPointInRadians {
  public double latitude;
  public double longitude;
}
{code}
...then all those various methods that expect lat+lon pairs in radians (like DistanceUtils.haversine, DistanceUtils.normLat, DistanceUtils.normLng, DistanceUtils.pointOnBearing, DistanceUtils.latLonCorner, etc...) can start having APIs that don't make your eyes bleed when you start trying to understand what order the args go in.
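To illustrate the readability win, here is a minimal, self-contained sketch along the lines of the proposed class (hypothetical names, not the actual DistanceUtils API): with a typed point, a haversine call cannot have its latitude and longitude arguments silently swapped.

```java
// Hypothetical sketch of a strongly typed lat/lon point, in the spirit of the
// SimpleLatLonPointInRadians proposal; not the actual Lucene spatial API.
public class LatLonExample {
    static final class LatLonPointInRadians {
        final double latitude;  // radians
        final double longitude; // radians
        LatLonPointInRadians(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Haversine great-circle distance on the unit sphere; multiply by the radius. */
    static double haversine(LatLonPointInRadians a, LatLonPointInRadians b) {
        double dLat = b.latitude - a.latitude;
        double dLon = b.longitude - a.longitude;
        double h = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(a.latitude) * Math.cos(b.latitude) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * Math.asin(Math.sqrt(h));
    }

    public static void main(String[] args) {
        LatLonPointInRadians paris  = new LatLonPointInRadians(Math.toRadians(48.8566), Math.toRadians(2.3522));
        LatLonPointInRadians berlin = new LatLonPointInRadians(Math.toRadians(52.52),   Math.toRadians(13.405));
        double km = haversine(paris, berlin) * 6371.0; // mean Earth radius in km
        System.out.printf("%.0f km%n", km);            // roughly 877 km
    }
}
```

Compare that call site with `haversine(48.8566, 2.3522, 52.52, 13.405)`, where nothing stops a caller from passing the four doubles in the wrong order.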
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3888: - Fix Version/s: (was: 4.7) 4.8 split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 4.8 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch Unfortunately, the "did you mean?" feature built on Lucene's spell checker does not work well for Japanese, and this is a longstanding problem: the logic needs comparatively long text to check spelling, but in some languages (e.g. Japanese) most words are too short for the spell checker. I think, for Japanese at least, things can be improved if we split off the spell check word and the surface form in the spell check dictionary. Then we could use ReadingAttribute for spell checking but CharTermAttribute for suggesting, for example.
[jira] [Updated] (LUCENE-3912) Improved the checked-in tiny line file docs
[ https://issues.apache.org/jira/browse/LUCENE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3912: - Fix Version/s: (was: 4.7) 4.8 Improved the checked-in tiny line file docs --- Key: LUCENE-3912 URL: https://issues.apache.org/jira/browse/LUCENE-3912 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.8 I think it may not have any surrogate pairs (it was derived from Europarl).
[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects
[ https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4556: - Fix Version/s: (was: 4.7) 4.8 FuzzyTermsEnum creates tons of objects -- Key: LUCENE-4556 URL: https://issues.apache.org/jira/browse/LUCENE-4556 Project: Lucene - Core Issue Type: Improvement Components: core/search, modules/spellchecker Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Critical Fix For: 4.8 Attachments: LUCENE-4556.patch, LUCENE-4556.patch I ran into this problem in production using the DirectSpellchecker. The number of objects created by the spellchecker shoots through the roof very quickly. We ran about 130 queries and ended up with 2M transitions / states. We spent 50% of the time in GC just because of transitions. Other parts of the system behave just fine here. I talked quickly to Robert and gave a POC a shot, providing a LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case and build an array-based structure converted into UTF-8 directly instead of going through the object-based APIs. This involved quite a bit of changes, but they are all package-private at this point. I have a patch that still has a fair set of nocommits, but it shows that it's possible and IMO worth the trouble to make this really usable in production. All tests pass with the patch - it's a start
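The array-based idea can be shown with a toy, self-contained sketch (this is not the patch's LevenshteinAutomaton#toRunAutomaton code): states become row indices into one flat int[] transition table, so matching walks primitive arrays and allocates no per-state or per-transition objects.

```java
import java.util.Arrays;

// Toy table-based run automaton; illustrative only, not the Lucene patch.
public class RunAutomatonSketch {
    final int alphabetSize;
    final int[] transitions;  // transitions[state * alphabetSize + symbol] = next state, -1 = none
    final boolean[] accept;

    RunAutomatonSketch(int states, int alphabetSize) {
        this.alphabetSize = alphabetSize;
        this.transitions = new int[states * alphabetSize];
        Arrays.fill(transitions, -1);
        this.accept = new boolean[states];
    }

    void addTransition(int from, int symbol, int to) {
        transitions[from * alphabetSize + symbol] = to;
    }

    // Matching indexes straight into the flat table: zero allocations per step.
    boolean run(int[] symbols) {
        int state = 0;
        for (int symbol : symbols) {
            state = transitions[state * alphabetSize + symbol];
            if (state == -1) return false;
        }
        return accept[state];
    }

    public static void main(String[] args) {
        // Automaton over symbols {0, 1} accepting exactly the sequence 0, 1.
        RunAutomatonSketch a = new RunAutomatonSketch(3, 2);
        a.addTransition(0, 0, 1);
        a.addTransition(1, 1, 2);
        a.accept[2] = true;
        System.out.println(a.run(new int[] {0, 1})); // true
        System.out.println(a.run(new int[] {1, 0})); // false
    }
}
```

The GC pressure described above comes from millions of short-lived state/transition objects; a flat table like this keeps the whole automaton in a couple of long-lived arrays.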
[jira] [Updated] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS
[ https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4731: - Fix Version/s: (was: 4.7) 4.8 New ReplicatingDirectory mirrors index files to HDFS Key: LUCENE-4731 URL: https://issues.apache.org/jira/browse/LUCENE-4731 Project: Lucene - Core Issue Type: New Feature Components: core/store Reporter: David Arthur Fix For: 4.8 Attachments: ReplicatingDirectory.java I've been working on a Directory implementation that mirrors the index files to HDFS (or another Hadoop-supported FileSystem). A ReplicatingDirectory delegates all calls to an underlying Directory (supplied in the constructor). The only hooks are the deleteFile and sync calls. We submit deletes and replications to a single scheduler thread to keep things serialized. During a sync call, if segments.gen is seen in the list of files, we know a commit is finishing. After calling the delegate's sync method, we initialize an asynchronous replication as follows.
* Read segments.gen (before leaving ReplicatingDirectory#sync), save the values for later
* Get a list of local files from ReplicatingDirectory#listAll before leaving ReplicatingDirectory#sync
* Submit a replication task (DirectoryReplicator) to the scheduler thread
* Compare local files to remote files; determine which remote files get deleted and which need to get copied
* Submit a thread to copy each file (one thread per file)
* Submit a thread to delete each file (one thread per file)
* Submit a finalizer thread. This thread waits on the previous two batches of threads to finish. Once finished, this thread generates a new segments.gen remotely (using the version and generation number previously read in).
I have no idea where this would belong in the Lucene project, so I'll just attach the standalone class instead of a patch. It introduces dependencies on Hadoop core (and all the deps that brings with it).
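The "single scheduler thread" idea above can be sketched with only the JDK (hypothetical class and method names; this is not the attached ReplicatingDirectory.java): a single-threaded executor guarantees that deletes and replications run strictly in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of serializing replication work on one scheduler thread.
public class SerialScheduler {
    private final ExecutorService scheduler = Executors.newSingleThreadExecutor();
    private final List<String> log = new ArrayList<>(); // touched only by the scheduler thread

    void submitDelete(String file) {
        scheduler.submit(() -> log.add("delete " + file));
    }

    void submitReplicate(String file) {
        scheduler.submit(() -> log.add("replicate " + file));
    }

    List<String> shutdownAndGetLog() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        SerialScheduler s = new SerialScheduler();
        s.submitReplicate("_0.cfs");
        s.submitDelete("_0_old.cfs");
        s.submitReplicate("segments.gen");
        // A single-threaded executor runs tasks in exactly the order submitted.
        System.out.println(s.shutdownAndGetLog());
    }
}
```

Funneling all mutations through one thread sidesteps locking on the remote file list, at the cost of making the scheduler a serialization point; the per-file copy/delete threads described above can still fan out from each serialized task.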
[jira] [Updated] (LUCENE-3797) 3xCodec should throw UOE if a DocValuesConsumer is pulled
[ https://issues.apache.org/jira/browse/LUCENE-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-3797: - Fix Version/s: (was: 4.7) 4.8 3xCodec should throw UOE if a DocValuesConsumer is pulled -- Key: LUCENE-3797 URL: https://issues.apache.org/jira/browse/LUCENE-3797 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.8 Attachments: LUCENE-3797.patch, LUCENE-3797.patch Currently we just return null if a DVConsumer is pulled from the 3.x codec, which is trappy since it causes an NPE in DocFieldProcessor. We should rather throw a UOE.
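The trap described here can be shown with a plain-Java sketch (illustrative names, not the codec code): a null return defers the failure to a distant, uninformative NPE, while a UOE fails at the offending call with a message.

```java
// Illustrative sketch, not Lucene code: returning null from an unsupported hook
// defers the failure to a confusing NPE far away; throwing UOE fails at the source.
public class FailFastExample {
    interface DocValuesConsumer { void add(int docID, long value); }

    // Trappy variant: the caller NPEs later, nowhere near the real problem.
    static DocValuesConsumer trappy() { return null; }

    // Preferred variant: the unsupported operation announces itself immediately.
    static DocValuesConsumer failFast() {
        throw new UnsupportedOperationException("this codec does not support doc values");
    }

    public static void main(String[] args) {
        try {
            failFast();
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```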