[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716142#comment-13716142 ] ASF subversion and git services commented on LUCENE-5128: - Commit 1505909 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1505909 ] LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc() Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException - Key: LUCENE-5128 URL: https://issues.apache.org/jira/browse/LUCENE-5128 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.2 Reporter: crocket Attachments: LUCENE-5128.patch, LUCENE-5128.patch ArrayIndexOutOfBoundsException makes it harder to reason about the cause. Is there a better way to notify programmers of the cause? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
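The check described in the commit message can be sketched as follows. This is illustrative only: the class, method, and message text below are assumed, not the actual IndexSearcher internals.

```java
// Hypothetical sketch of the LUCENE-5128 argument check: reject a paging
// anchor whose doc id cannot exist in the index, instead of letting a
// collector hit an ArrayIndexOutOfBoundsException later. Names are
// illustrative, not copied from the actual Lucene source.
public class SearchAfterGuard {
    public static void checkAfterDoc(int afterDoc, int maxDoc) {
        if (afterDoc >= maxDoc) {
            throw new IllegalArgumentException(
                "after.doc exceeds the number of documents in the reader: after.doc="
                    + afterDoc + " limit=" + maxDoc);
        }
    }
}
```

Failing fast with IllegalArgumentException gives the caller the actionable message crocket asked for, rather than an index error deep inside a collector.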
Re: [CONF] Apache Solr Reference Guide Schema API
Comment spam, hooray. I deleted the spammer's comment and disabled their account. Confluence now says "This user has been disabled. This user will not be able to log in to Confluence." Not sure if this is better than removing the user's account? I guess we'll have to see how prevalent this will be. Yuck. Steve On Jul 23, 2013, at 2:24 AM, gaobin (Confluence) conflue...@apache.org wrote: Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr) Page: Schema API (https://cwiki.apache.org/confluence/display/solr/Schema+API) Comment edited by gaobin Stop watching space: https://cwiki.apache.org/confluence/users/removespacenotification.action?spaceKey=solr Change email notification preferences: https://cwiki.apache.org/confluence/users/editmyemailsettings.action - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716147#comment-13716147 ] ASF subversion and git services commented on LUCENE-5128: - Commit 1505910 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1505910 ] LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc() Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException - Key: LUCENE-5128 URL: https://issues.apache.org/jira/browse/LUCENE-5128 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.2 Reporter: crocket Attachments: LUCENE-5128.patch, LUCENE-5128.patch ArrayIndexOutOfBoundsException makes it harder to reason about the cause. Is there a better way to notify programmers of the cause? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent
[ https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716148#comment-13716148 ] Robert Muir commented on SOLR-5065: --- The programmatic parser, Double.parseDouble, is different from the locale-sensitive stuff in NumberFormat... it will parse your number there, as well as hex formats and other things like Infinity, and I think it won't throw an exception if the value ends with d or f. Alternatively, there is also NumberFormat.getScientificInstance in ICU. ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent - Key: SOLR-5065 URL: https://issues.apache.org/jira/browse/SOLR-5065 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.4 Reporter: Jack Krupansky The ParseDoubleFieldUpdateProcessorFactory is unable to parse the full syntax of Java/JSON scientific notation. Parse fails for 4.5E+10, but does succeed for 4.5E10 and 4.5E-10. Using the schema and config from example-schemaless, I added this data:
{code}
curl "http://localhost:8983/solr/update?commit=true" \
  -H 'Content-type:application/json' -d '
[{"id": "doc-1", "a1": "Hello World", "a2": 123, "a3": 123.0, "a4": 1.23, "a5": 4.5E+10, "a6": "123", "a7": true, "a8": false, "a9": "true", "a10": "2013-07-22", "a11": 4.5E10, "a12": 4.5E-10, "a13": "4.5E+10", "a14": "4.5E10", "a15": "4.5E-10"}]'
{code}
A query returns:
{code}
<doc>
  <str name="id">doc-1</str>
  <arr name="a1"><str>Hello World</str></arr>
  <arr name="a2"><long>123</long></arr>
  <arr name="a3"><double>123.0</double></arr>
  <arr name="a4"><double>1.23</double></arr>
  <arr name="a5"><double>4.5E10</double></arr>
  <arr name="a6"><long>123</long></arr>
  <arr name="a7"><bool>true</bool></arr>
  <arr name="a8"><bool>false</bool></arr>
  <arr name="a9"><bool>true</bool></arr>
  <arr name="a10"><date>2013-07-22T00:00:00Z</date></arr>
  <arr name="a11"><double>4.5E10</double></arr>
  <arr name="a12"><double>4.5E-10</double></arr>
  <arr name="a13"><str>4.5E+10</str></arr>
  <arr name="a14"><double>4.5E10</double></arr>
  <arr name="a15"><double>4.5E-10</double></arr>
  <long name="_version_">1441308941516537856</long>
</doc>
{code}
The input value of a13 was the same as a5, but was treated as a string, rather than parsed as a double. So, JSON/Java was able to parse 4.5E+10, but this update processor was not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
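Robert's contrast between the two parsers can be demonstrated in isolation. Whether the update processor uses exactly this DecimalFormat pattern is an assumption; the point is that Double.parseDouble accepts the optional '+' exponent sign while a locale-sensitive scientific-notation parser stops short of it.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

// Standalone comparison of the two parsing routes discussed above.
// The "0.###E0" scientific pattern is an assumption for illustration,
// not necessarily what Solr's processor configures.
public class ExponentParsing {
    // java.lang.Double accepts 4.5E+10, 4.5E10, and 4.5E-10 alike.
    public static double viaParseDouble(String s) {
        return Double.parseDouble(s);
    }

    // Returns true only if the locale-sensitive parser consumed the whole string.
    public static boolean fullyParsedByDecimalFormat(String s) {
        DecimalFormat df =
            new DecimalFormat("0.###E0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        ParsePosition pos = new ParsePosition(0);
        Number n = df.parse(s, pos);
        return n != null && pos.getIndex() == s.length();
    }
}
```

DecimalFormat never emits a '+' when formatting exponents, and on the parse side it only recognizes a localized minus sign there, which is why 4.5E-10 succeeds while 4.5E+10 does not.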
[jira] [Resolved] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5128. Resolution: Fixed Fix Version/s: 4.5 5.0 Assignee: Shai Erera Lucene Fields: New,Patch Available (was: New) Committed to trunk and 4x. Closing it now, crocket - still it would be good if you can paste the full stacktrace, so we can check if there are collectors that are sensitive to that. Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException - Key: LUCENE-5128 URL: https://issues.apache.org/jira/browse/LUCENE-5128 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.2 Reporter: crocket Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5128.patch, LUCENE-5128.patch ArrayIndexOutOfBoundsException makes it harder to reason about the cause. Is there a better way to notify programmers of the cause? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5062) 4.4 refguide updates related to shardsplitting and deleteshard
[ https://issues.apache.org/jira/browse/SOLR-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716161#comment-13716161 ] Shalin Shekhar Mangar commented on SOLR-5062: - Thanks Cassandra! I'll take a look. 4.4 refguide updates related to shardsplitting and deleteshard -- Key: SOLR-5062 URL: https://issues.apache.org/jira/browse/SOLR-5062 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Shalin Shekhar Mangar Fix For: 4.4 breaking off from parent issue... * https://cwiki.apache.org/confluence/display/solr/Collections+API ** in general, we need to review this page in light of all the shardsplitting stuff and make sure everything is up to date. ** SOLR-4693: A deleteshard collections API that unloads all replicas of a given shard and then removes it from the cluster state. It will remove only those shards which are INACTIVE or have no range (created for custom sharding). (Anshum Gupta, shalin) *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Collections+API -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4987) Test framework may fail internally under J9 (some serious JVM exclusive-section issue).
[ https://issues.apache.org/jira/browse/LUCENE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716165#comment-13716165 ] Shai Erera commented on LUCENE-4987: Word is that the fix will be included in the next J9 SR. Test framework may fail internally under J9 (some serious JVM exclusive-section issue). --- Key: LUCENE-4987 URL: https://issues.apache.org/jira/browse/LUCENE-4987 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 5.0, 4.4 Attachments: j9.zip This was reported by Shai. The runner failed with an exception:
{code}
[junit4:junit4] Caused by: java.util.NoSuchElementException
[junit4:junit4]     at java.util.ArrayDeque.removeFirst(ArrayDeque.java:289)
[junit4:junit4]     at java.util.ArrayDeque.pop(ArrayDeque.java:518)
[junit4:junit4]     at com.carrotsearch.ant.tasks.junit4.JUnit4$1.onSlaveIdle(JUnit4.java:809)
[junit4:junit4]     ... 17 more
{code}
The problem is that this is impossible because the code around JUnit4.java:809 looks like this:
{code}
final Deque<String> stealingQueue = new ArrayDeque<String>(...);
aggregatedBus.register(new Object() {
  @Subscribe
  public void onSlaveIdle(SlaveIdle slave) {
    if (stealingQueue.isEmpty()) {
      ...
    } else {
      String suiteName = stealingQueue.pop();
      ...
    }
  }
});
{code}
and the contract on Guava's EventBus states that:
{code}
 * <p>The EventBus guarantees that it will not call a handler method from
 * multiple threads simultaneously, unless the method explicitly allows it by
 * bearing the {@link AllowConcurrentEvents} annotation. If this annotation is
 * not present, handler methods need not worry about being reentrant, unless
 * also called from outside the EventBus
{code}
I wrote a simple snippet of code that does it in a loop and indeed, two threads can appear in the critical section at once. This is not reproducible on Hotspot and only appears to be the problem on J9/1.7/Windows (J9 1.6 works fine). 
I'll provide a workaround in the runner (an explicit monitor seems to be working) but this is some serious J9 issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
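The workaround Dawid describes, serializing the handler body with an explicit monitor rather than trusting the EventBus guarantee, can be sketched like this. This is a simplified stand-in for the runner code, not the actual JUnit4 patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for the runner's suite-stealing queue: even if a
// broken JVM lets two threads into the @Subscribe handler at once, the
// explicit monitor keeps the isEmpty() check and the pop() atomic, so
// NoSuchElementException cannot occur.
public class GuardedStealingQueue {
    private final Deque<String> stealingQueue = new ArrayDeque<String>();
    private final Object lock = new Object();

    public void offer(String suiteName) {
        synchronized (lock) {
            stealingQueue.push(suiteName);
        }
    }

    /** Returns the next suite to steal, or null once the queue is drained. */
    public String nextOrNull() {
        synchronized (lock) {
            return stealingQueue.isEmpty() ? null : stealingQueue.pop();
        }
    }
}
```

Without the monitor, two threads can both observe a non-empty queue holding one element and both call pop(), which is exactly the ArrayDeque.removeFirst failure in the trace above.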
[jira] [Commented] (SOLR-5059) 4.4 refguide pages on schemaless schema rest api for adding fields
[ https://issues.apache.org/jira/browse/SOLR-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716168#comment-13716168 ] Steve Rowe commented on SOLR-5059: -- {quote} * https://cwiki.apache.org/confluence/display/solr/Schema+API ** SOLR-3251: Dynamically add fields to schema. (Steve Rowe, Robert Muir, yonik) *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API ** SOLR-5010: Add support for creating copy fields to the Fields REST API (gsingers) *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API {quote} These are done. 4.4 refguide pages on schemaless schema rest api for adding fields Key: SOLR-5059 URL: https://issues.apache.org/jira/browse/SOLR-5059 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Steve Rowe Fix For: 4.4 breaking off from parent... * https://cwiki.apache.org/confluence/display/solr/Documents%2C+Fields%2C+and+Schema+Design ** SOLR-4897: Add solr/example/example-schemaless/, an example config set for schemaless mode. (Steve Rowe) *** CT: Schemaless in general needs to be added. The most likely place today is a new page under https://cwiki.apache.org/confluence/display/solr/Documents%2C+Fields%2C+and+Schema+Design * https://cwiki.apache.org/confluence/display/solr/Schema+API ** SOLR-3251: Dynamically add fields to schema. (Steve Rowe, Robert Muir, yonik) *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API ** SOLR-5010: Add support for creating copy fields to the Fields REST API (gsingers) *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716169#comment-13716169 ] Elran Dvir commented on SOLR-2894: -- Andrew, Thank you very much for the fix! Does this version fix the issue of f.field.facet.limit not being respected? Thanks. Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.5 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 668 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/668/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean Error Message: IOException occured when talking to server at: https://127.0.0.1:51783/solr/collection1 Stack Trace: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:51783/solr/collection1 at __randomizedtesting.SeedInfo.seed([4DE818604E65B55:6735808464829C77]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146) at org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at
[jira] [Created] (SOLR-5066) Managed schema triggers a 404 error code in the Admin UI's Schema pane
Steve Rowe created SOLR-5066: Summary: Managed schema triggers a 404 error code in the Admin UI's Schema pane Key: SOLR-5066 URL: https://issues.apache.org/jira/browse/SOLR-5066 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.3 Reporter: Steve Rowe When using a managed schema (e.g. by setting {{-Dsolr.solr.home=example-schemaless/solr}} when running {{java -jar start.jar}} under {{solr/example/}}), the admin UI's Schema pane shows:
{noformat}
http://localhost:8983/solr/collection1/admin/file?file=null&contentType=text/xml;charset=utf-8
{noformat}
and
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">404</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="error">
    <str name="msg">Can not find: null [/path/to/solr.solr.home/collection1/conf/null]</str>
    <int name="code">404</int>
  </lst>
</response>
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent
[ https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716194#comment-13716194 ] Steve Rowe commented on SOLR-5065: -- Another alternative: apply a regex in front of the NumberFormat parser to strip out the (superfluous, obvi) plus sign: {noformat}s/E\+(\d+)$/E$1/{noformat} ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent - Key: SOLR-5065 URL: https://issues.apache.org/jira/browse/SOLR-5065 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.4 Reporter: Jack Krupansky -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
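Steve's regex workaround amounts to normalizing the value before handing it to the parser; in Java it might look like this. This is a pre-processing sketch applying the regex from the comment, not the actual processor patch, and the class name is invented for illustration.

```java
// Sketch of the suggested pre-parse normalization: strip a superfluous '+'
// from a trailing exponent so a NumberFormat-based parser can handle the
// value. Values without a '+' exponent pass through unchanged.
public class ExponentNormalizer {
    public static String stripExponentPlus(String value) {
        // Java translation of the s/E\+(\d+)$/E$1/ substitution above.
        return value.replaceFirst("E\\+(\\d+)$", "E$1");
    }
}
```

Applied before parsing, this maps "4.5E+10" to "4.5E10" while leaving "4.5E-10" and "4.5E10" untouched.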
[JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6692 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6692/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest Error Message: 2 threads leaked from SUITE scope at org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest: 1) Thread[id=33, name=LuceneTestCase-1-thread-2, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:780)2) Thread[id=32, name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:780) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest: 1) Thread[id=33, name=LuceneTestCase-1-thread-2, state=WAITING, 
group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:780) 2) Thread[id=32, name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:780) at __randomizedtesting.SeedInfo.seed([BE68034CE6D929B8]:0) FAILED: junit.framework.TestSuite.org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest Error Message: There are still zombie threads that couldn't be terminated:1) Thread[id=33, name=LuceneTestCase-1-thread-2, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:780)2) Thread[id=32, name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054) at
Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6692 - Failure!
I could reproduce it, with ant test -Dtestcase=UIMABaseAnalyzerTest -Dtests.seed=BE68034CE6D929B8 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ca -Dtests.timezone=Pacific/Rarotonga -Dtests.file.encoding=UTF-8 I'll look into it. Tommaso 2013/7/23 Policeman Jenkins Server jenk...@thetaphi.de Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6692/ Java: 64bit/ibm-j9-jdk7 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;} 2 tests failed. FAILED: junit.framework.TestSuite.org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest
[jira] [Commented] (SOLR-5043) hostname lookup in SystemInfoHandler should be refactored to not block core (re)load
[ https://issues.apache.org/jira/browse/SOLR-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716231#comment-13716231 ] Alan Woodward commented on SOLR-5043: - Can we use a CompletionService for this? Maybe have one running on the CoreContainer which can then be stopped when the container is shut down, which should stop any thread leaks. hostname lookup in SystemInfoHandler should be refactored to not block core (re)load Key: SOLR-5043 URL: https://issues.apache.org/jira/browse/SOLR-5043 Project: Solr Issue Type: Improvement Reporter: Hoss Man Attachments: SOLR-5043.patch SystemInfoHandler currently looks up the hostname of the machine at init, and caches it for its lifecycle -- there is a comment to the effect that the reason for this is that on some machines (notably ones with wacky DNS settings) looking up the hostname can take a very long time in some JVMs... {noformat} // on some platforms, resolving canonical hostname can cause the thread // to block for several seconds if nameservices aren't available // so resolve this once per handler instance //(ie: not static, so core reload will refresh) {noformat} But as we move forward with a lot more multi-core, solr-cloud, dynamically updated instances, even paying this cost per core reload is expensive. We should refactor this so that SystemInfoHandler instances init immediately, with some kind of lazy loading of the hostname info in a background thread (especially since the only real point of having that info here is for UI use, so you can keep track of what machine you are looking at). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
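The idea being discussed (init immediately, resolve the hostname lazily on a daemon thread so a slow DNS lookup never blocks core reload) can be sketched roughly like this. LazyValue is a hypothetical helper, not code from the actual patch:

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Sketch only: run a slow supplier (e.g. hostname resolution) on a daemon
// thread at construction time; callers never block on core (re)load.
class LazyValue<T> {
    private final FutureTask<T> task;

    LazyValue(Supplier<T> slowSupplier) {
        this.task = new FutureTask<>(slowSupplier::get);
        Thread t = new Thread(task, "lazy-hostname-lookup");
        t.setDaemon(true);   // never keeps the JVM alive
        t.start();
    }

    /** Non-blocking: returns the fallback until the lookup completes. */
    T getOrDefault(T fallback) {
        if (!task.isDone()) return fallback;
        try {
            return task.get();
        } catch (InterruptedException | ExecutionException e) {
            return fallback;
        }
    }

    /** Blocking with a timeout; returns the fallback on timeout or failure. */
    T get(long timeoutMs, T fallback) {
        try {
            return task.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            return fallback;
        }
    }
}
```

A handler would construct it as, e.g., `new LazyValue<>(() -> resolveCanonicalHostname())` where the supplier wraps `InetAddress.getLocalHost().getCanonicalHostName()`, and the UI reads `getOrDefault("(resolving...)")`. A CompletionService owned by the CoreContainer, as Alan suggests, is an alternative that centralizes shutdown.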
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Uploaded patch: implemented IntersectEnum.next() and seekCeil(); lots of nocommits, but it passes all tests. The main idea is to run a DFS on the FST, and backtrack as early as possible (i.e. as soon as we see that a label is rejected by the automaton). For this version, there is one explicit perf overhead: I use a real stack here, which can be replaced by a Frame[] to reuse objects. There are several aspects I didn't dig into deeply: * currently, CompiledAutomaton provides a commonSuffixRef, but how can we make use of it in FST? * the DFS is somewhat a 'goto' version, i.e., we could make the code cleaner with a single while-loop similar to a BFS search. However, since the FST doesn't always tell us how many arcs leave the current node, we have a problem dealing with this... * when the FST is large enough, the next() operation will take much time doing the linear arc read; maybe we should make use of CompiledAutomaton.sortedTransition[] when the leaving arcs are heavy. Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta. 
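As a rough illustration of the DFS-with-early-backtracking idea (this is not Lucene's FST code — the patch walks a real FST and uses an explicit Frame stack precisely so objects can be reused; here a plain map-based trie and recursion stand in for both):

```java
import java.util.*;

// Sketch: depth-first walk of a term trie intersected with an automaton,
// pruning a whole subtree as soon as the automaton rejects a label,
// instead of enumerating every term and testing it.
class TrieIntersect {
    static class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>(); // sorted labels
        boolean isTerm;
    }

    // Stand-in for Lucene's CompiledAutomaton.
    interface Automaton {
        int step(int state, char label);   // returns -1 to reject => backtrack
        boolean isAccept(int state);
    }

    static void add(Node root, String term) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.arcs.computeIfAbsent(c, k -> new Node());
        }
        n.isTerm = true;
    }

    static void intersect(Node node, Automaton a, int state,
                          StringBuilder prefix, List<String> out) {
        if (node.isTerm && a.isAccept(state)) out.add(prefix.toString());
        for (Map.Entry<Character, Node> arc : node.arcs.entrySet()) {
            int next = a.step(state, arc.getKey());
            if (next < 0) continue;        // backtrack early: skip this subtree
            prefix.append(arc.getKey().charValue());
            intersect(arc.getValue(), a, next, prefix, out);
            prefix.setLength(prefix.length() - 1);
        }
    }
}
```

The pruning on `next < 0` is the whole point: a rejected label cuts off every term below that arc, which is why the cost tracks the automaton's accepted prefixes rather than the dictionary size.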
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test
Vadim Kirilchuk created SOLR-5067: - Summary: TestReplicationHandler doTestReplicateAfterWrite2Slave bad test Key: SOLR-5067 URL: https://issues.apache.org/jira/browse/SOLR-5067 Project: Solr Issue Type: Bug Components: replication (java) Reporter: Vadim Kirilchuk Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out code which actually performs the necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions are commented out, the test checks nothing. Also, as index fetching starts in a new thread, it's worth performing fetchindex with the 'wait' parameter. (Previously Thread.sleep(n) was used here.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test
[ https://issues.apache.org/jira/browse/SOLR-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Kirilchuk updated SOLR-5067: -- Description: Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. (Previously Thread.sleep( n ) was used here) was: Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. (Previously Thread.sleep(n) was used here) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test --- Key: SOLR-5067 URL: https://issues.apache.org/jira/browse/SOLR-5067 Project: Solr Issue Type: Bug Components: replication (java) Reporter: Vadim Kirilchuk Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. (Previously Thread.sleep( n ) was used here) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test
[ https://issues.apache.org/jira/browse/SOLR-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim Kirilchuk updated SOLR-5067: -- Description: Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. was: Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. (Previously Thread.sleep( n ) was used here) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test --- Key: SOLR-5067 URL: https://issues.apache.org/jira/browse/SOLR-5067 Project: Solr Issue Type: Bug Components: replication (java) Reporter: Vadim Kirilchuk Hi, TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented code which actually performs necessary assertions. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java While these assertions commented out it checks nothing. Also as index fetching starts in a new thread it's worth to perform fetchindex with 'wait' parameter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5068) ExtractingRequestHandler (via SolrContentHandler) doesn't add fields in schema-less mode
Erik Hatcher created SOLR-5068: -- Summary: ExtractingRequestHandler (via SolrContentHandler) doesn't add fields in schema-less mode Key: SOLR-5068 URL: https://issues.apache.org/jira/browse/SOLR-5068 Project: Solr Issue Type: Improvement Affects Versions: 4.4 Reporter: Erik Hatcher Fix For: 5.0, 4.5 SolrContentHandler checks against the schema before adding fields to documents. This does not work well in schema-less mode with those fields not yet defined. Example, using an empty managed schema and the auto-field-adding update processor: {code}java -Dauto -Drecursive -jar post.jar ../../site/html/{code} results in http://localhost:8983/solr/collection1/query?q=*:* - {code} { "responseHeader":{ "status":0, "QTime":1, "params":{ "q":"*:*"}}, "response":{"numFound":1,"start":0,"docs":[ { "id":"/Users/erikhatcher/solr-4.4.0/solr/example/exampledocs/../../site/html/tutorial.html", "_version_":1441348012271992832}] }} {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Tokenizing on logical operators
Greetings, I am looking for a way to tokenize a String based on logical operators. The String below needs to be tokenized as *arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff* Token 1: arg1:aaa,bbb Token 2: arg2:ccc Token 3: arg3:ddd,eee,fff Later I want to fetch each token and tokenize it again on the ':' operator. Is there a library already available, or should I create a custom one? If you could point at any similar examples, that could also help. Regards DJ -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizing-on-logical-operators-tp4079667.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Tokenizing on logical operators
Hi, You should use the user list for this, not dev. Have a look at Lucene's query parser. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 23, 2013 at 6:54 AM, dheerajjoshim dheeraj.ma...@gmail.com wrote: Greetings, I am looking a way to tokenize the String based on Logical operators Below String needs to be tokenized as *arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff* Token 1: arg1:aaa,bbb Token 2: arg2:ccc Token 3: arg3:ddd,eee,fff Later i want to fetch each token and tokenize them again on : operator. Is there a library already available? or i should be creating a custom library for this? If you could point at any similar examples that could also help Regards DJ -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizing-on-logical-operators-tp4079667.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
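For what it's worth, if the input really is this simple, a plain-regex sketch (no Lucene involved; LogicalTokenizer is a made-up name for illustration) covers both passes the poster describes — first split on AND/OR, then split each clause on ':':

```java
import java.util.*;

// Sketch: two-pass tokenizing of strings like
//   "arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff"
class LogicalTokenizer {
    // Pass 1: break on the logical operators. The \b word boundaries keep
    // AND/OR from matching inside words such as "BRAND".
    static List<String> clauses(String input) {
        return Arrays.asList(input.trim().split("\\s+\\b(?:AND|OR)\\b\\s+"));
    }

    // Pass 2: break a clause like "arg1:aaa,bbb" into its key and values.
    static Map.Entry<String, List<String>> parseClause(String clause) {
        String[] kv = clause.split(":", 2);
        return Map.entry(kv[0], Arrays.asList(kv[1].split(",")));
    }
}
```

Note this deliberately ignores precedence, quoting, and nesting; as soon as those matter, a real parser (such as Lucene's query parser, as suggested above) is the right tool.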
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716342#comment-13716342 ] Joel Bernstein commented on SOLR-4787: -- Kranti, Let me know how the pjoin is performing for you. I'm going to be testing out some different data structures for the pjoin to see if I can get better performance. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.4 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from cores. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join.
The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the fromCore must have the join SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *ValueSourceJoinParserPlugin aka vjoin* The second implementation is the ValueSourceJoinParserPlugin aka vjoin. This implements a ValueSource function query that can return a value from a second core based on join keys and a limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer-specific relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey, query) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. The query is used to select a specific set of records to join with in fromCore.
Currently the fromKey and toKey must be longs but this will change in future versions. Like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" /> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:
[jira] [Updated] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction
[ https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-5056: Attachment: SOLR-5056.patch Updated patch, with CHANGES entry and a test bugfix (TestHarness default solr.xml didn't specify a logwatcher parameter properly - bug found by being type safe!). I'll commit shortly. Further clean up of ConfigSolr interface and CoreContainer construction --- Key: SOLR-5056 URL: https://issues.apache.org/jira/browse/SOLR-5056 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-5056.patch, SOLR-5056.patch Makes ConfigSolr a bit more typesafe, and pushes a bunch of cloud-specific config into ZkController. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction
[ https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716363#comment-13716363 ] ASF subversion and git services commented on SOLR-5056: --- Commit 1506020 from [~romseygeek] in branch 'dev/trunk' [ https://svn.apache.org/r1506020 ] SOLR-5056: Further cleanup of ConfigSolr API Further clean up of ConfigSolr interface and CoreContainer construction --- Key: SOLR-5056 URL: https://issues.apache.org/jira/browse/SOLR-5056 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-5056.patch, SOLR-5056.patch Makes ConfigSolr a bit more typesafe, and pushes a bunch of cloud-specific config into ZkController. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction
[ https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716365#comment-13716365 ] ASF subversion and git services commented on SOLR-5056: --- Commit 1506022 from [~romseygeek] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1506022 ] SOLR-5056: Further cleanup of ConfigSolr API Further clean up of ConfigSolr interface and CoreContainer construction --- Key: SOLR-5056 URL: https://issues.apache.org/jira/browse/SOLR-5056 Project: Solr Issue Type: Improvement Reporter: Alan Woodward Assignee: Alan Woodward Priority: Minor Attachments: SOLR-5056.patch, SOLR-5056.patch Makes ConfigSolr a bit more typesafe, and pushes a bunch of cloud-specific config into ZkController. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716380#comment-13716380 ] Kranti Parisa commented on SOLR-4787: - Joel, Initial performance results look like this (restarted Solr, hence no caches at the beginning): - with no cache: pjoin is 2-3 times faster than join - with cache: pjoin is 3-4 times slower than join I agree with your idea; we should try other data structures and maybe take a look at the caching strategy used in pjoin. Are the queries already running in parallel to find the intersection? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.4 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from cores. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys.
It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the fromCore must have the join SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *ValueSourceJoinParserPlugin aka vjoin* The second implementation is the ValueSourceJoinParserPlugin aka vjoin. This implements a ValueSource function query that can return a value from a second core based on join keys and a limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer-specific relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey, query) This example shows vjoin being called by the edismax boost function parameter.
This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. The query is used to select a specific set of records to join with in fromCore. Currently the fromKey and toKey must be longs but this will change in future versions. Like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" /> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see:
[jira] [Updated] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feihong Huang updated SOLR-5057: Attachment: SOLR-5057.patch queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Priority: Minor Attachments: SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries below with the same meaning, but case 2 can't use the queryResultCache entry created when case 1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think the queryResultCache should not be related to the order of the fq list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716404#comment-13716404 ] Feihong Huang commented on SOLR-5057: - Hi, Erick. Thank you for your comments. Patch attached, with a new test. If it is OK, I'll commit shortly. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Priority: Minor Attachments: SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries below with the same meaning, but case 2 can't use the queryResultCache entry created when case 1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think the queryResultCache should not be related to the order of the fq list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
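The fix boils down to making the cache key insensitive to the order of the filter queries. A minimal sketch of that idea (FilterCacheKey is a hypothetical class for illustration, not the attached patch):

```java
import java.util.*;

// Sketch: a cache key that sorts a copy of the filter list, so
// fq=a&fq=b and fq=b&fq=a map to the same queryResultCache entry.
class FilterCacheKey {
    private final String mainQuery;
    private final List<String> sortedFilters;

    FilterCacheKey(String q, List<String> filters) {
        this.mainQuery = q;
        this.sortedFilters = new ArrayList<>(filters);
        Collections.sort(this.sortedFilters);   // order no longer matters
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FilterCacheKey)) return false;
        FilterCacheKey k = (FilterCacheKey) o;
        return mainQuery.equals(k.mainQuery)
                && sortedFilters.equals(k.sortedFilters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(mainQuery, sortedFilters);
    }
}
```

In Solr the real key hashes Query objects rather than raw strings, so an actual patch would need an order-insensitive combination of the individual filter Query hashes instead of sorting strings; the principle is the same.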
[jira] [Created] (SOLR-5069) MapReduce for SolrCloud
Noble Paul created SOLR-5069: Summary: MapReduce for SolrCloud Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the MapReduce paradigm so that users have a smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode (no support for standalone mode) * Users can write MapReduce programs in JavaScript or Java. First cut would be JS ( ? ) h1. sample word count program h2. how to invoke? http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX h3. params * map : a JavaScript implementation of the map program * reduce : a JavaScript implementation of the reduce program * sink : the collection to which the output is written. If this is not passed, the request is redirected to the reduce node, waits till the process is complete, and the output of the reduce program is emitted as a standard Solr response. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command. * reduceNode : node name where the reduce is run. If not passed, an arbitrary node is chosen. The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster; only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (var i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
  }
}
{code} Essentially two threads are created in the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program. h4. reduce script This script is run on one node. This node accepts http connections from the map nodes, and the 'maps' that are sent are collected in a queue which is polled and fed into the reduce program. It also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the task complete. After the reduce program has been executed for all the input, it proceeds to write out the result to the 'sink' collection, or it is written straight out to the response. {code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code} TBD * The format in which the output is written to the target collection; I assume the reducedMap will have values mapping to the schema of the collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716411#comment-13716411 ] Erick Erickson commented on SOLR-5057: -- I don't think you have commit rights G. one of the committers will have to pick it up. And _everyone_ is swamped so it may take some gentle nudging. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Priority: Minor Attachments: SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries with the same meaning below, but case2 can't use the queryResultCache after case1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think the queryResultCache should not be related to the order of the fq list. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
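The fix SOLR-5057 asks for amounts to canonicalizing the fq clauses before they participate in the cache key. A minimal sketch of the idea, with a hypothetical `cacheKey` helper rather than Solr's actual cache code:

```javascript
// Build a cache key that is independent of fq order by sorting the fq
// clauses into a canonical order first. Hypothetical sketch, not the
// real Solr queryResultCache implementation.
function cacheKey(q, fqs) {
  var sorted = fqs.slice().sort(); // copy so the user-visible order is untouched
  return q + "|" + sorted.join("|");
}

var case1 = cacheKey("*:*", ["field1:value1", "field2:value2"]);
var case2 = cacheKey("*:*", ["field2:value2", "field1:value1"]);
console.log(case1 === case2); // true: both orderings hit the same entry
```

Sorting a copy (rather than the live fq list) keeps the original parameter order intact for logging and debugging while still letting both orderings map to one cache entry.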
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716428#comment-13716428 ] Andrew Muldowney commented on SOLR-2894: Yes Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 Project: Solr Issue Type: Improvement Reporter: Erik Hatcher Fix For: 4.5 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716428#comment-13716428 ] Andrew Muldowney edited comment on SOLR-2894 at 7/23/13 2:38 PM: - Yes, it should was (Author: andrew.muldowney): Yes Implement distributed pivot faceting Key: SOLR-2894 URL: https://issues.apache.org/jira/browse/SOLR-2894 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716433#comment-13716433 ] Feihong Huang commented on SOLR-5057: - Well, thank you for your reply. I am interested in contributing my work to Solr. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5057: - Attachment: SOLR-5057.patch Moved test to pre-existing file. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Attachments: SOLR-5057.patch, SOLR-5057.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716459#comment-13716459 ] Erick Erickson commented on SOLR-5057: -- Didn't mean to sound like it wouldn't be done, just making you aware that you only have read-only access to the repository and one of the committers has to pick it up and commit it. That said, I took a quick look at it and it looks reasonable; I've assigned it to myself. I rearranged things a bit (I think the test you wrote fits better in a pre-existing file), I'll attach the change momentarily. Do you think you could extend this for the filterCache too? That way we'd be able to re-use the filterCache when the fq clauses were ordered differently. [~yo...@apache.org] [~hossman_luc...@fucit.org] I've gotten myself in trouble by not understanding the nuances of query semantics, do you see a problem with this approach? Seems like an easy win, which makes me nervous that it hasn't been done before G... Erick queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-5057: Assignee: Erick Erickson queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716486#comment-13716486 ] Andrzej Bialecki commented on SOLR-5069: - Exciting idea! Almost as exciting as SolrCloud on MapReduce :) A few comments: # distributed map-reduce in reality is a sequence of: ## split input and assign splits to M nodes ## apply map() on M nodes in parallel ##* for large datasets the emitted data from mappers is spooled to disk ## shuffle - ie. partition and ship emitted data from M mappers into N reducers ##* (wait until all mappers are done, so that each partition's key-space is complete) ## sort by key in each of N reducers, collecting values for each key ##* again, for large datasets this is a disk-based sort ## apply N reducers in parallel and emit final output (in N parts) # if I understand it correctly the model that you presented has some limitations: ## as many input splits as there are shards (and consequently as many mappers) ## single reducer. Theoretically it should be possible to use N nodes to act as reducers if you implement the concept of partitioner - this would cut down the memory load on each reducer node. Of course, streaming back the results would be a challenge, but saving them into a collection should work just fine. ## no shuffling - all data from mappers will go to a single reducer ## no intermediate storage of data, all intermediate values need to fit in memory ## what about the sorting phase? I assume it's an implicit function in the reducedMap (treemap?) # since all fine-grained emitted values from map end up being sent to 1 reducer, which has to collect all this data in memory first before applying the reduce() op, the concept of a map-side combiner seems useful, to be able to quickly minimize the amount of data to be sent to reducer. # it would be very easy to OOM your Solr nodes at the reduce phase. There should be some built-in safety mechanism for this. 
# what parts of Solr are available in the script's context? Making all Solr API available could lead to unpredictable side-effects, so this set of APIs needs to be curated. E.g. I think it would make sense to make analyzer factories available. And finally, an observation: regular distributed search can be viewed as a special case of map-reduce computation ;) MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul
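Two of the ideas raised in the comment above, a partitioner that shards keys across N reducers and a map-side combiner that pre-aggregates before shipping, can be sketched in isolation. All names here are hypothetical illustrations, not part of the proposed Solr API:

```javascript
// Hypothetical sketch of a partitioner plus a map-side combiner.
// The combiner collapses repeated (word, {count:1}) emissions so each
// reducer receives one pair per key instead of one pair per occurrence.
var NUM_REDUCERS = 2;

// partitioner: stable string hash of the key, modulo the reducer count
function partition(key) {
  var h = 0;
  for (var i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h % NUM_REDUCERS;
}

// combiner: sum counts per key before anything leaves the map host
function combine(pairs) {
  var acc = {};
  pairs.forEach(function (p) {
    acc[p.key] = (acc[p.key] || 0) + p.value.count;
  });
  return Object.keys(acc).map(function (k) {
    return { key: k, value: { count: acc[k] } };
  });
}

var emitted = [
  { key: "to", value: { count: 1 } },
  { key: "be", value: { count: 1 } },
  { key: "to", value: { count: 1 } },
];
var combined = combine(emitted); // two pairs instead of three
var byReducer = [[], []];
combined.forEach(function (p) {
  byReducer[partition(p.key)].push(p); // ship each pair to its owning reducer
});
```

Because the partitioner is a pure function of the key, every occurrence of a given word lands on the same reducer, which is what makes the per-reducer key-space complete after shuffle.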
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 4.4 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716577#comment-13716577 ] Michael McCandless commented on LUCENE-3069: Patch looks great! Wonderful how you were able to share some code in BaseTermsEnum... It looks like you impl'd seekCeil in general for the IntersectEnum? Wild :) You should not need to .getPosition / .setPosition on the fstReader: the FST APIs do this under-the-hood. bq. currently, CompiledAutomaton provides a commonSuffixRef, but how can we make use of it in FST? I think we can't really make use of it, which is fine (it's an optional optimization). {quote} when FST is large enough, the next() operation will takes much time doing the linear arc read, maybe we should make use of CompiledAutomaton.sortedTransition[] when leaving arcs are heavy. {quote} Interesting ... you mean e.g. if the Automaton is very restrictive compared to the FST, then we can do a binary search. But this can only be done if that FST node's arcs are array'd right? Separately, supporting ord w/ FST terms dict should in theory be not so hard; you'd need to use getByOutput to seek by ord. Maybe (later, eventually) we can make this a write-time option. We should open a separate issue ... Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
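The binary search discussed above (possible only when a node's outgoing arcs are stored as an array) has the same contract as seekCeil on a sorted term list: find the smallest term greater than or equal to the target. A hedged illustration of that contract on a plain array, not Lucene's FST implementation:

```javascript
// Illustrative seekCeil over a sorted term array: returns the index of
// the smallest term >= target, or -1 if every term is smaller (END).
// This mirrors the semantics of TermsEnum.seekCeil, not its FST code.
function seekCeil(terms, target) {
  var lo = 0, hi = terms.length; // search window is [lo, hi)
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (terms[mid] < target) {
      lo = mid + 1; // everything at or below mid is too small
    } else {
      hi = mid;     // mid is a candidate; look for an earlier one
    }
  }
  return lo < terms.length ? lo : -1;
}

var terms = ["apple", "banana", "cherry", "date"];
console.log(terms[seekCeil(terms, "b")]);      // "banana" (ceiling)
console.log(terms[seekCeil(terms, "cherry")]); // "cherry" (exact hit)
console.log(seekCeil(terms, "zebra"));         // -1 (past the end)
```

This is O(log n) per node versus the linear arc scan the quoted comment worries about; the trade-off is that array'd arcs cost more space, which is why the FST only arrays arcs at dense nodes.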
[ANNOUNCE] Apache Solr 4.4 released
July 2013, Apache Solr™ 4.4 available The Lucene PMC is pleased to announce the release of Apache Solr 4.4 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.4 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Solr 4.4 Release Highlights: * Solr indexes and transaction logs may be stored in HDFS with full read/write capability. * Schemaless mode: Added support for a mode that requires no up-front schema modifications, in which previously unknown fields' types are guessed based on the values in added/updated documents, and are then added to the schema prior to processing the update. Note that the below-described features are also useful independently from schemaless mode operation. * New Parse{Date,Integer,Long,Float,Double,Boolean}UpdateProcessorFactory classes parse/guess the field value class for String-valued and unknown fields. * New AddSchemaFieldsUpdateProcessor: Automatically add new fields to the schema when adding/updating documents with unknown fields. Custom rules map field value class(es) to schema fieldTypes. * A new schemaless mode example configuration, using the above-described field-value-class-guessing and unknown-field-schema-addition features, is provided at solr/example/example-schemaless/. * Core Discovery mode: A new solr.xml format which does not store core information, but instead searches for files named 'core.properties' in the filesystem which tell Solr all the details about that core.
The main example and the schemaless example both use this new format. * Schema REST API: Add support for creating copy fields. * A merged segment warmer may now be plugged into solrconfig.xml. * New MaxScoreQParserPlugin: Return max() instead of sum() of terms. * Binary files are now supported in ZooKeeper. * SolrJ's SolrPing object has new methods for ping, enable, and disable. * The Admin UI now supports adding documents to Solr. * Added a PUT command to the Solr ZkCli tool. * New deleteshard collections API that unloads all replicas of a given shard and then removes it from the cluster state. It will remove only those shards which are INACTIVE or have no range. * The Overseer can now optionally assign generic node names so that new addresses can host shards without naming confusion. * The CSV Update Handler now supports optionally adding the line number/ row id to a document. * Added a new system wide info admin handler that exposes the system info that could previously only be retrieved using a SolrCore. Solr 4.4 also includes many other new features as well as numerous optimizations and bugfixes. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) In the coming days, we will also be announcing the first official Solr Reference Guide available for download. In the meantime, users are encouraged to browse the online version and post comments and suggestions on the documentation: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
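For the core discovery mode mentioned in the highlights, each core is described by a small properties file found on the filesystem. A hypothetical minimal example (illustrative only; consult the Solr 4.4 documentation for the full set of supported keys):

```properties
# Hypothetical core.properties; the directory containing this file
# is discovered by Solr and becomes the core's instance directory.
name=collection1
```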
[ANNOUNCE] Apache Lucene 4.4 released
July 2013, Apache Lucene™ 4.4 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.4 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Lucene 4.4 Release Highlights: * New Replicator module: replicate index revisions between server and client. See http://shaierera.blogspot.com/2013/05/the-replicator.html * New AnalyzingInfixSuggester: finds suggestions based on matches to any tokens in the suggestion, not just based on pure prefix matching. See http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html * New PatternCaptureGroupTokenFilter: emit multiple tokens, one for each capture group in one or more Java regexes. * New Lucene Facet module features: * Added dynamic (no taxonomy index used) numeric range faceting (see http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html ) * Arbitrary Querys are now allowed for per-dimension drill-down on DrillDownQuery and DrillSideways, to support future dynamic faceting. * New FacetResult.mergeHierarchies: merge multiple FacetResult of the same dimension into a single one with the reconstructed hierarchy. * FST's Builder can now handle more than 2.1 billion tail nodes while building a minimal FST. * FieldCache Ints and Longs now use bit-packing to save memory. String fields have more efficient compression if there are many unique terms. * Improved compression for NumericDocValues for dates and fields with very small numbers of unique values. 
* New IndexWriter.hasUncommittedChanges(): returns true if there are changes that have not been committed. * multiValuedSeparator in PostingsHighlighter is now configurable, for cases where you want a different logical separator between field values. * NorwegianLightStemFilter and NorwegianMinimalStemFilter have been extended to handle nynorsk. * New ScandinavianFoldingFilter and ScandinavianNormalizationFilter. * Easier compressed norms: Lucene42NormsFormat now takes an overhead parameter, allowing for values other than PackedInts.FASTEST. * Analyzer now has an additional tokenStream(String fieldName, String text) method, so wrapping by StringReader for common use is no longer needed. * New SimpleMergedSegmentWarmer: just ensures that data structures (terms, norms, docvalues, etc.) are initialized. * IndexWriter flushes segments to the compound file format by default. * Various bugfixes and optimizations since the 4.3.1 release. Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html) Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-1301: -- Affects Version/s: (was: 1.4) Fix Version/s: (was: 4.4) 4.5 5.0 Assignee: Mark Miller Issue Type: New Feature (was: Improvement) Summary: Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. (was: Solr + Hadoop) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.5 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. 
SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716621#comment-13716621 ] Mark Miller commented on SOLR-1301: --- As I mentioned above, Cloudera has done a lot with moving this issue forward. I've been working on converting the build system from maven to ivy+ant and will post my current progress before long. Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
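The batching behaviour the SOLR-1301 design describes (buffer documents, submit batches periodically, then commit and optimize when the writer is closed) can be sketched independently of Hadoop and Solr. `BatchWriter` and the `server` object below are hypothetical stand-ins for SolrRecordWriter and EmbeddedSolrServer:

```javascript
// Sketch of a SolrRecordWriter-like batching writer. `server` is a
// hypothetical stand-in for EmbeddedSolrServer that just records calls.
function BatchWriter(server, batchSize) {
  this.server = server;
  this.batchSize = batchSize;
  this.batch = [];
}

// buffer a document; submit when a full batch has accumulated
BatchWriter.prototype.write = function (doc) {
  this.batch.push(doc);
  if (this.batch.length >= this.batchSize) {
    this.flush();
  }
};

BatchWriter.prototype.flush = function () {
  if (this.batch.length > 0) {
    this.server.add(this.batch);
    this.batch = [];
  }
};

// mirrors the design: flush the tail, then commit() and optimize() on close
BatchWriter.prototype.close = function () {
  this.flush();
  this.server.commit();
  this.server.optimize();
};

var calls = [];
var server = {
  add: function (docs) { calls.push("add:" + docs.length); },
  commit: function () { calls.push("commit"); },
  optimize: function () { calls.push("optimize"); },
};
var w = new BatchWriter(server, 2);
[1, 2, 3].forEach(function (id) { w.write({ id: id }); });
w.close();
console.log(calls.join(",")); // add:2,add:1,commit,optimize
```

Batching amortizes the per-add overhead inside each reduce task, and deferring commit/optimize to close matches the one-shard-per-reducer output model described above.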
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716624#comment-13716624 ] Noble Paul commented on SOLR-5069: -- Thanks Andrzej. I started off with a simple model so that version 1 can be implemented easily. 'N' reducers add to implementation complexity. However, it should be done eventually. bq. no intermediate storage of data, all intermediate values need to fit in memory Yes, in my model the mappers will be throttled so that we can fix the amount of intermediate data kept in memory. The $.map() call would wait if the size threshold is reached. bq. what about the sorting phase? I assume it's an implicit function in the reducedMap (treemap?) We should have the choice of how to sort the map. Yes, some kind of sorted map should be offered, probably sorted on some key's value in the map. bq. it would be very easy to OOM your Solr nodes at the reduce phase. Sure, here the idea is to do some overflow to disk beyond a threshold. With memory getting abundant, we probably should use some off-heap solution, so that the reduce is not I/O bound. bq. what parts of Solr are available in the script's context Good that you asked. We should keep the APIs available limited. For instance, anything that can alter the state of the system should not be exposed to the script. Anything that can help with processing/manipulating data should be exposed. MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Solr currently does not have a way to run long-running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. 
(No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS (?) h1. sample word count program h2. how to invoke? http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX h3. params * map : a Javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : the collection to which the output is written. If this is not passed, the request is redirected to the reduce node, waits till the process is complete, and the output of the reduce program is emitted as a standard Solr response. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command. * reduceNode : node name where the reduce is run. If not passed, an arbitrary node is chosen. The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id, and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript} var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index while (res.hasMore()) { var doc = res.next(); var txt = doc.get("txt"); // the field on which word count is performed var words = txt.split(" "); for (i = 0; i < words.length; i++) { $.map(words[i], {'count': 1}); // this sends the map over to the reduce host } } {code} Essentially two threads are created in the 'map' hosts: one for running the program and the other for coordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program. h4. reduce script This script is run in one node. 
This node accepts http connections from map nodes, and the 'maps' that are sent are collected in a queue which will be polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the tasks complete. After the reduce program is executed for all the input, it proceeds to write out the result to the 'sink' collection, or the result is written straight out to the response. {code:JavaScript} var pair = $.nextMap(); var reduced = $.getCtx().getReducedMap(); // a hashmap var count = reduced.get(pair.key()); if (count === null) { count = {"count": 0}; reduced.put(pair.key(), count); } count.count += pair.val().count; {code} TBD * The format in which the output is written to the target collection, I assume
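The reduce script in this proposal is just counter accumulation over incoming (word, {count: 1}) pairs. A plain-Java rendering of that map and reduce logic, with no Solr involved (the class and method names here are mine, purely for illustration):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java version of the word-count map/reduce scripts sketched above. */
public class WordCountReduce {

    /** What a map host emits for one document: one (word, 1) pair per token. */
    public static List<Map.Entry<String, Integer>> mapDoc(String txt) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : txt.split(" ")) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    /** Accumulate (word, count) pairs the way the reduce script's reducedMap does. */
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> reduced = new HashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            // same null-check-then-increment shape as the JS reduce script
            reduced.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return reduced;
    }
}
```

The single reducedMap held in memory until all map hosts send their done message is exactly the OOM concern raised later in this thread.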
[jira] [Commented] (SOLR-4408) Server hanging on startup
[ https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716626#comment-13716626 ] Brendan Grainger commented on SOLR-4408: Having the same issue here. Solr 4.3.1 Server hanging on startup - Key: SOLR-4408 URL: https://issues.apache.org/jira/browse/SOLR-4408 Project: Solr Issue Type: Bug Affects Versions: 4.1 Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode) Tomcat 7.0 Eclipse Juno + WTP Reporter: Francois-Xavier Bonnet Assignee: Erick Erickson Fix For: 4.4 Attachments: patch-4408.txt While starting, the server hangs indefinitely. Everything works fine when I first start the server with no index created yet but if I fill the index then stop and start the server, it hangs. Could it be a lock that is never released? Here is what I get in a full thread dump: 2013-02-06 16:28:52 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode): searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in Object.wait() [0x7fbe0ab1] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213) at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112) at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180) at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64) at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in Object.wait() [0x7fbe0ac11000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc34c1c48 (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492) - locked 0xc34c1c48 (a java.lang.Object) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247) at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495) at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518) at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319) at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140) at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51) at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636) at org.apache.solr.core.SolrCore.init(SolrCore.java:809) at org.apache.solr.core.SolrCore.init(SolrCore.java:607) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
[jira] [Updated] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5069: - Description: Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1. sample word count program h2.how to invoke? http://host:port/solr/mapreduce?map=map-scriptreduce=reduce-scriptsink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed , the request will wait till completion and respond with the output of the reduce program and will be emitted as a standard solr response. . If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed ,the rsponse will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run . If not passed an arbitrary node is chosen The node which received the command would first identify one replica from each slice where the map program is executed . It will also identify one another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript} var res = $.streamQuery(*:*);//this is not run across the cluster. 
//Only on this index while(res.hasMore()){ var doc = res.next(); var txt = doc.get(“txt”);//the field on which word count is performed var words = txt.split( ); for(i = 0; i words.length; i++){ $.map(words[i],{‘count’:1});// this will send the map over to //the reduce host } } {code} Essentially two threads are created in the 'map' hosts . One for running the program and the other for co-ordinating with the 'reduce' host . The maps emitted are streamed live over an http connection to the reduce program h4. reduce script This script is run in one node . This node accepts http connections from map nodes and the 'maps' that are sent are collected in a queue which will be polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the tasks are complete. After reduce program is executed for all the input it proceeds to write out the result to the 'sink' collection or it is written straight out to the response. {code:JavaScript} var pair = $.nextMap(); var reduced = $.getCtx().getReducedMap();// a hashmap var count = reduced.get(pair.key()); if(count === null) { count = {“count”:0}; reduced.put(pair.key(), count); } count.count += pair.val().count ; {code} h4.example output {code:JavaScript} { “result”:[ “wordx”:{ “count”:15876765 }, “wordy” : { “count”:24657654 } ] } {code} TBD * The format in which the output is written to the target collection, I assume the reducedMap will have values mapping to the schema of the collection was: Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1. 
sample word count program h2.how to invoke? http://host:port/solr/mapreduce?map=map-scriptreduce=reduce-scriptsink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed , the request will wait till completion and respond with the output of the reduce program and will be emitted as a standard solr response. . If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed ,the rsponse will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run . If not passed an arbitrary node is chosen The node which received the command would first identify one replica from each slice where the map program is executed . It will also identify one another node from the same
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716655#comment-13716655 ] Andrzej Bialecki commented on SOLR-5069: - bq. Sure, here the idea is to do some overflow to disk beyond a threshold. Berkeley DB, db4o, and an Apache-licensed MapDB (mapdb.org), and probably others, all provide persistent Java Collections API. We could use one of these - you could add a provider mechanism to separate the actual implementation from the plain Collections api. bq. $.map() call would wait if the size threshold is reached Throttling the mappers wouldn't help with OOM on the reduce() side - reduce() can start only when all mappers are finished. I think a map-side combiner would be much more helpful, if possible (reductions that are not simple aggregations usually can't be performed in map-side combiners). MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1. sample word count program h2.how to invoke? http://host:port/solr/mapreduce?map=map-scriptreduce=reduce-scriptsink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed , the request will wait till completion and respond with the output of the reduce program and will be emitted as a standard solr response. . 
If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed ,the rsponse will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run . If not passed an arbitrary node is chosen The node which received the command would first identify one replica from each slice where the map program is executed . It will also identify one another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript} var res = $.streamQuery(*:*);//this is not run across the cluster. //Only on this index while(res.hasMore()){ var doc = res.next(); var txt = doc.get(“txt”);//the field on which word count is performed var words = txt.split( ); for(i = 0; i words.length; i++){ $.map(words[i],{‘count’:1});// this will send the map over to //the reduce host } } {code} Essentially two threads are created in the 'map' hosts . One for running the program and the other for co-ordinating with the 'reduce' host . The maps emitted are streamed live over an http connection to the reduce program h4. reduce script This script is run in one node . This node accepts http connections from map nodes and the 'maps' that are sent are collected in a queue which will be polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the tasks are complete. After reduce program is executed for all the input it proceeds to write out the result to the 'sink' collection or it is written straight out to the response. 
{code:JavaScript} var pair = $.nextMap(); var reduced = $.getCtx().getReducedMap();// a hashmap var count = reduced.get(pair.key()); if(count === null) { count = {“count”:0}; reduced.put(pair.key(), count); } count.count += pair.val().count ; {code} h4.example output {code:JavaScript} { “result”:[ “wordx”:{ “count”:15876765 }, “wordy” : { “count”:24657654 } ] } {code} TBD * The format in which the output is written to the target collection, I assume the reducedMap will have values mapping to the schema of the collection -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail:
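Andrzej's map-side combiner suggestion is the standard remedy for reduce-side memory pressure when the reduction is a simple aggregation: pre-aggregate locally on each map host and emit one (word, n) pair per distinct word instead of n (word, 1) pairs. A minimal sketch of that idea (names are mine, not from any patch):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a map-side combiner for the word-count example: instead of
 * streaming one ("word", 1) pair per token to the reduce host, each map
 * host pre-sums counts locally and sends one ("word", n) pair per
 * distinct word. This shrinks both the network traffic and what the
 * reducer must hold in memory.
 */
public class MapSideCombiner {

    public static Map<String, Integer> combine(String[] tokens) {
        Map<String, Integer> local = new HashMap<>();
        for (String t : tokens) {
            local.merge(t, 1, Integer::sum); // local pre-aggregation
        }
        return local; // one entry per distinct word, not per occurrence
    }
}
```

As the comment above notes, this only works for reductions that are simple aggregations; a global sort or a join cannot be combined map-side, so throttling or spill-to-disk is still needed there.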
Problem while modifying IndexSearcher
Hi, I have a problem which is explained completely here: http://stackoverflow.com/questions/17816509/unable-to-find-definition-of-a-abstract-function. Please help, or just give me a suggestion about where to get help. -- Abhishek Gupta, 897876422, 9416106204, 9624799165
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #395: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/395/ 2 tests failed. FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) 
Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest: 1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:546) at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805) at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) at __randomizedtesting.SeedInfo.seed([6FF9EF9DE071A43E]:0) FAILED: 
org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest Error Message: There are still zombie threads that couldn't be terminated: 1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest] at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180) at
[jira] [Commented] (SOLR-5063) 4.4 refguide improvements on new doc adding screen in ui
[ https://issues.apache.org/jira/browse/SOLR-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716674#comment-13716674 ] Cassandra Targett commented on SOLR-5063: - [~grant_ingers...@yahoo.com] I added a comment with draft content for the page (https://cwiki.apache.org/confluence/display/solr/Documents+Screen) - feel free to use it as is, as a starting point, or whatever. 4.4 refguide improvements on new doc adding screen in ui Key: SOLR-5063 URL: https://issues.apache.org/jira/browse/SOLR-5063 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Grant Ingersoll Fix For: 4.4 breaking off from parent issue... * https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools ** SOLR-4921: Admin UI now supports adding documents to Solr (gsingers, steffkes) ** stub page with screenshot exists, but it needs verbiage explaining how it works and what the diff options mean -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects
[ https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4906: --- Attachment: LUCENE-4906.patch Here's a simple patch, implementing Rob's #1 idea (PassageFormatter.format returns Object, and then adding an expert PostingsHighlighter.highlightFieldsAsObjects). The change seems minimal and seems to work (I added a basic test) ... PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects -- Key: LUCENE-4906 URL: https://issues.apache.org/jira/browse/LUCENE-4906 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Attachments: LUCENE-4906.patch For example, in a server, I may want to render the highlight result to JsonObject to send back to the front-end. Today since we render to string, I have to render to JSON string and then re-parse to JsonObject, which is inefficient... Or, if (Rob's idea:) we make a query that's like MoreLikeThis but it pulls terms from snippets instead, so you get proximity-influenced salient/expanded terms, then perhaps that renders to just an array of tokens or fragments or something from each snippet. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
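The shape of the change (a formatter that returns an arbitrary object so callers can build structured results instead of a String) can be illustrated with a small generic formatter. The Passage and Formatter types below are simplified stand-ins of my own, not Lucene's actual PassageFormatter API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustration of LUCENE-4906's idea: let a passage formatter render to an
 * arbitrary object instead of forcing a String. Types are simplified
 * stand-ins for Lucene's Passage/PassageFormatter.
 */
public class ObjectFormatterSketch {

    /** Stand-in passage: a snippet of text plus its start offset. */
    public static final class Passage {
        public final int startOffset;
        public final String text;
        public Passage(int startOffset, String text) {
            this.startOffset = startOffset;
            this.text = text;
        }
    }

    /** format(...) returns T, so a server can build e.g. a JSON-like map directly. */
    public interface Formatter<T> {
        T format(List<Passage> passages);
    }

    /** One concrete formatter: render passages to a map instead of a String. */
    public static final Formatter<Map<Integer, String>> AS_MAP = passages -> {
        Map<Integer, String> out = new LinkedHashMap<>();
        for (Passage p : passages) {
            out.put(p.startOffset, p.text); // no string-concat-then-reparse round trip
        }
        return out;
    };
}
```

This is the inefficiency the issue describes: with a String-only format method, a server must serialize to a JSON string and re-parse it; returning an object skips that round trip.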
[jira] [Updated] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-4998: --- Attachment: SOLR-4998.patch A very basic and non-invasive patch. Anything invasive would require a lot of changes to the Java public APIs and I guess would lead to a lot of stuff breaking outside of Solr. Retaining Slice/Shard and Replica. Have changed shard to replica wherever it should have been. Make the use of Slice and Shard consistent across the code and document base Key: SOLR-4998 URL: https://issues.apache.org/jira/browse/SOLR-4998 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Anshum Gupta Attachments: SOLR-4998.patch The interchangeable use of Slice and Shard is pretty confusing at times. We should define each separately and use the apt term whenever we do so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717474#comment-13717474 ] Shai Erera commented on LUCENE-4583: Patch looks good. I prefer the current way of the test (the 'protected' method). Also, you have a printout in Lucene40DocValuesWriter after the if (b.length > MAX_BINARY) - remove/comment? +1 to commit. StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 5.0, 4.5 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes-based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails: {code:java} public void testBigDocValue() throws IOException { Directory dir = newDirectory(); IndexWriter writer = new IndexWriter(dir, writerConfig(false)); Document doc = new Document(); BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works bytes.length = bytes.bytes.length; // byte data doesn't matter doc.add(new StraightBytesDocValuesField("dvField", bytes)); writer.addDocument(doc); writer.commit(); writer.close(); DirectoryReader reader = DirectoryReader.open(dir); DocValues docValues = MultiDocValues.getDocValues(reader, "dvField"); // FAILS IF BYTES IS BIG! docValues.getSource().getBytes(0, bytes); reader.close(); dir.close(); } {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
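For the record, the sizes in that test: (4+4)*4096 = 32,768 bytes (the size the reporter says still works) and (4+4)*4097 = 32,776 bytes (the failing size), so the failure sets in just past the 32 KB byte-block boundary. A quick check of that arithmetic:

```java
/** Verifies the byte sizes used in the LUCENE-4583 test snippet above. */
public class DocValueSizes {
    public static final int WORKING = (4 + 4) * 4096; // reported as still working
    public static final int FAILING = (4 + 4) * 4097; // triggers the failure

    public static void main(String[] args) {
        System.out.println(WORKING + " bytes works, " + FAILING + " bytes fails");
    }
}
```

The exact internal constant that enforces the cutoff is not stated in the thread; the arithmetic only brackets it between these two sizes.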
Solr Ref Guide caveat needs update
“This Guide Covers The Unreleased Apache Solr 4.4.” See: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide 4.4 is of course released now. Sounds like yet another step to add to the “ReleaseToDo” wiki. http://wiki.apache.org/lucene-java/ReleaseTodo This also begs the question of when/how the new ref guide will switch to “Covers the Unreleased Apache Solr 4.5”. -- Jack Krupansky
[jira] [Created] (SOLR-5070) add mbeans for everything in /solr/admin/cores?wt=json&indexInfo=true
Matthew Sporleder created SOLR-5070: --- Summary: add mbeans for everything in /solr/admin/cores?wt=json&indexInfo=true Key: SOLR-5070 URL: https://issues.apache.org/jira/browse/SOLR-5070 Project: Solr Issue Type: Improvement Reporter: Matthew Sporleder for solr4, JMX should have everything in /solr/admin/cores?wt=json&indexInfo=true One major omission is: lastModified -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
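A client-side way to see what a JVM currently exposes over JMX is to query the MBeanServer by domain pattern. The "solr" domain string mentioned in the main method's comment is an assumption (it depends on the <jmx/> setup in solrconfig.xml); the sketch itself uses only standard javax.management calls:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/**
 * Lists MBeans matching a JMX domain pattern. Run inside (or attached to)
 * a Solr JVM with a pattern like "solr:*" to see which core stats are
 * exported; the exact Solr domain name depends on configuration, so treat
 * "solr" as an assumption rather than a guarantee.
 */
public class JmxDomainLister {

    public static Set<ObjectName> list(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // null query expression means "no additional filtering"
            return server.queryNames(new ObjectName(pattern), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad ObjectName pattern: " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // Outside Solr this still works against the JVM's own beans;
        // inside Solr, try list("solr:*") instead.
        for (ObjectName name : list("java.lang:*")) {
            System.out.println(name);
        }
    }
}
```

Comparing that listing against the output of /solr/admin/cores?wt=json&indexInfo=true is how one would confirm omissions like lastModified.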
Re: Solr Ref Guide caveat needs update
: “This Guide Covers The Unreleased Apache Solr 4.4.” ... : 4.4 is of course released now. The text was meant to refer to the fact that the *guide* is unreleased - I've tweaked it to be more clear in all cases. : Sounds like yet another step to add to the “ReleaseToDo” wiki. ... : This also begs the question of when/how the new ref guide will switch to “Covers the Unreleased Apache Solr 4.5”. This is all already well documented as part of the *doc* release process (a process I've emailed out to dev@lucene many times asking for feedback). Changing the text cannot, and must not, be part of the *code* release process, since they are not voted on in lock step https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4335: --- Attachment: LUCENE-4335.patch First cut at top-level ant regenerate... Something is still wrong w/ my ant changes because a top-level ant regenerate hits this: {code} BUILD FAILED /l/trunk/lucene/build.xml:614: The following error occurred while executing this line: /l/trunk/lucene/common-build.xml:1902: The following error occurred while executing this line: /l/trunk/lucene/analysis/build.xml:139: The following error occurred while executing this line: /l/trunk/lucene/analysis/build.xml:38: The following error occurred while executing this line: Target regenerate does not exist in the project analyzers-morfologik. {code} But some of the generators make harmless mods to the sources, e.g. JavaCC does this: {code} Index: lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java === --- lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java (revision 1506176) +++ lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java (working copy) @@ -112,4 +112,4 @@ void Done(); } -/* JavaCC - OriginalChecksum=c95f1720d9b38046dc5d294b741c44cb (do not edit this line) */ +/* JavaCC - OriginalChecksum=53b2ec7502d50e2290e86187a6c01270 (do not edit this line) */ {code} JFlex does this: {code} Index: lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java === --- lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java (revision 1506176) +++ lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java (working copy) @@ -1,4 +1,4 @@ -/* The following code was generated by JFlex 1.5.0-SNAPSHOT on 9/19/12 6:23 PM */ +/* The following code was generated by JFlex 1.5.0-SNAPSHOT on 7/23/13 3:22 PM */ 
@@ -33,8 +33,8 @@ /** * This class is a scanner generated by * <a href="http://www.jflex.de/">JFlex</a> 1.5.0-SNAPSHOT - * on 9/19/12 6:23 PM from the specification file - * <tt>C:/svn/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex</tt> + * on 7/23/13 3:22 PM from the specification file - * <tt>/l/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex</tt> */ class ClassicTokenizerImpl implements StandardTokenizerInterface { {code} I was able to remove some timestamps from our own gen tools in analysis/icu/src/tools (thanks Rob for the pointers!)... Also, there seem to be some real cases where the generated code was changed but not the generator, e.g. packed ints sources show real diffs (and won't compile after regeneration... I haven't dug into this yet), and JFlex seemed to lose some @Overrides... Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
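The "post-process the sources to remove the timestamps" idea from the issue description could be sketched roughly as below. This is only an illustration: the class name is invented, and the regexes are inferred from the JavaCC and JFlex diffs quoted above, not taken from the actual build scripts.

```java
import java.util.regex.Pattern;

/**
 * Sketch of normalizing generated sources before diffing them against the
 * committed copies, so that volatile lines (JavaCC checksums, JFlex
 * timestamps) do not cause spurious build failures.
 */
public class GeneratedSourceNormalizer {

  // Matches: /* JavaCC - OriginalChecksum=... (do not edit this line) */
  private static final Pattern JAVACC_CHECKSUM = Pattern.compile(
      "(?m)^/\\* JavaCC - OriginalChecksum=[0-9a-f]+ \\(do not edit this line\\) \\*/$");

  // Matches: /* The following code was generated by JFlex ... on <date> */
  private static final Pattern JFLEX_TIMESTAMP = Pattern.compile(
      "(?m)^/\\* The following code was generated by JFlex .* \\*/$");

  /** Returns the source with volatile generator lines blanked out. */
  public static String normalize(String source) {
    String s = JAVACC_CHECKSUM.matcher(source).replaceAll("");
    return JFLEX_TIMESTAMP.matcher(s).replaceAll("");
  }

  /** True if the two sources differ only in volatile generator lines. */
  public static boolean sameIgnoringTimestamps(String committed, String regenerated) {
    return normalize(committed).equals(normalize(regenerated));
  }

  public static void main(String[] args) {
    String old = "/* JavaCC - OriginalChecksum=c95f1720d9b38046dc5d294b741c44cb (do not edit this line) */\nvoid Done();";
    String regen = "/* JavaCC - OriginalChecksum=53b2ec7502d50e2290e86187a6c01270 (do not edit this line) */\nvoid Done();";
    // Only the checksum line differs, so a timestamp-insensitive diff is clean.
    System.out.println(sameIgnoringTimestamps(old, regen));
  }
}
```

A build target could run such a normalizer over both trees and fail only on differences that survive it.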
[jira] [Assigned] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4335: -- Assignee: Michael McCandless Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717513#comment-13717513 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506240 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506240 ] LUCENE-4335: commit current patch Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717511#comment-13717511 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506234 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506234 ] LUCENE-4335: make branch Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717515#comment-13717515 ] Michael McCandless commented on LUCENE-4335: OK I made a branch https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4335 and committed the last (broken, but a starting point) patch ... Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717517#comment-13717517 ] Mark Miller commented on SOLR-4998: --- I think for things like: - public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode"; + public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode"; We have to be really careful. Solr does not error/warn on unknown params - existing users might keep using the existing param for a long time, and not even notice it no longer has an effect. I think if we make any type of change like that, we should be sure to support them as an alias or perhaps explicitly look for the old key and fail if it's found. Make the use of Slice and Shard consistent across the code and document base Key: SOLR-4998 URL: https://issues.apache.org/jira/browse/SOLR-4998 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Anshum Gupta Attachments: SOLR-4998.patch The interchangeable use of Slice and Shard is pretty confusing at times. We should define each separately and use the apt term whenever we do so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717517#comment-13717517 ] Mark Miller edited comment on SOLR-4998 at 7/23/13 7:36 PM: I think for things like: {noformat} - public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode"; + public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode"; {noformat} We have to be really careful. Solr does not error/warn on unknown params - existing users might keep using the existing param for a long time, and not even notice it no longer has an effect. I think if we make any type of change like that, we should be sure to support them as an alias or perhaps explicitly look for the old key and fail if it's found. was (Author: markrmil...@gmail.com): I think for things like: - public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode"; + public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode"; We have to be really careful. Solr does not error/warn on unknown params - existing users might keep using the existing param for a long time, and not even notice it no longer has an effect. I think if we make any type of change like that, we should be sure to support them as an alias or perhaps explicitly look for the old key and fail if it's found. Make the use of Slice and Shard consistent across the code and document base Key: SOLR-4998 URL: https://issues.apache.org/jira/browse/SOLR-4998 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Anshum Gupta Attachments: SOLR-4998.patch The interchangeable use of Slice and Shard is pretty confusing at times. We should define each separately and use the apt term whenever we do so. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717528#comment-13717528 ] Yonik Seeley commented on SOLR-3076: bq. Indeed! Yonik Seeley we don't need _root_ if we can submit two queries for deletion: ToChild(parentid:foo) and TQ(parentid:foo)! Since solr wouldn't know how to create those queries, it seems like the user would need to provide them (which doesn't seem very friendly). Also, IndexWriter currently only allows atomically specifying a term with the document block... deleteByQuery wouldn't be atomic. Solr(Cloud) should support block joins -- Key: SOLR-3076 URL: https://issues.apache.org/jira/browse/SOLR-3076 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Yonik Seeley Fix For: 5.0, 4.5 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch Lucene has the ability to do block joins, we should add it to Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717532#comment-13717532 ] Anshum Gupta commented on SOLR-4998: Sure, will add an alias for the same, perhaps with a WARN log saying it's to be deprecated? Make the use of Slice and Shard consistent across the code and document base Key: SOLR-4998 URL: https://issues.apache.org/jira/browse/SOLR-4998 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Anshum Gupta Attachments: SOLR-4998.patch The interchangeable use of Slice and Shard is pretty confusing at times. We should define each separately and use the apt term whenever we do so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
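The alias-plus-warning approach Mark and Anshum discuss could look roughly like the sketch below. The class, method, and map are hypothetical illustrations, not Solr's actual config-parsing code; only the old/new param names come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: when a param is renamed (maxShardsPerNode -> maxReplicasPerNode),
 * keep honoring the old key and log a deprecation warning, instead of
 * silently ignoring it the way unknown params are ignored today.
 */
public class ParamAliases {

  // Deprecated name -> current name.
  private static final Map<String, String> OLD_TO_NEW = new HashMap<>();
  static {
    OLD_TO_NEW.put("maxShardsPerNode", "maxReplicasPerNode");
  }

  /** Looks up newKey, falling back to its deprecated alias with a warning. */
  public static String get(Map<String, String> params, String newKey) {
    if (params.containsKey(newKey)) {
      return params.get(newKey);
    }
    for (Map.Entry<String, String> e : OLD_TO_NEW.entrySet()) {
      if (e.getValue().equals(newKey) && params.containsKey(e.getKey())) {
        System.err.println("WARN: param '" + e.getKey()
            + "' is deprecated, use '" + newKey + "' instead");
        return params.get(e.getKey());
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("maxShardsPerNode", "4"); // old name still honored
    System.out.println(get(params, "maxReplicasPerNode"));
  }
}
```

Mark's stricter alternative (explicitly failing when the old key is present) would replace the warning with a thrown exception.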
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717539#comment-13717539 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506248 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506248 ] LUCENE-4335: add empty target in common-build.xml Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5004) Allow a shard to be split into 'n' sub-shards
[ https://issues.apache.org/jira/browse/SOLR-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717535#comment-13717535 ] Anshum Gupta commented on SOLR-5004: Any preference on the variable use here? splits, splitcount, subshards, numsubshards ? Allow a shard to be split into 'n' sub-shards - Key: SOLR-5004 URL: https://issues.apache.org/jira/browse/SOLR-5004 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.3, 4.3.1 Reporter: Anshum Gupta As of now, a SPLITSHARD call is hardcoded to create 2 sub-shards from the parent one. Accept a parameter to split into n sub-shards. Default it to 2 and perhaps also have an upper bound to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
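Whatever name is chosen, validating the new parameter could be sketched as follows. The name numSubShards is just one of the candidates Anshum lists (not a decided API), and the upper bound of 8 is an invented placeholder, since the issue only says there should "perhaps also" be one.

```java
/**
 * Sketch of resolving the proposed SPLITSHARD parameter: default to 2 when
 * absent, and enforce a lower and an (assumed) upper bound.
 */
public class SplitShardParams {

  static final int DEFAULT_SUB_SHARDS = 2;
  static final int MAX_SUB_SHARDS = 8; // placeholder upper bound

  /** Parses and validates the requested number of sub-shards. */
  public static int resolveNumSubShards(String raw) {
    if (raw == null) {
      return DEFAULT_SUB_SHARDS; // keeps today's hardcoded behavior
    }
    int n = Integer.parseInt(raw);
    if (n < 2 || n > MAX_SUB_SHARDS) {
      throw new IllegalArgumentException(
          "numSubShards must be between 2 and " + MAX_SUB_SHARDS + ", got " + n);
    }
    return n;
  }

  public static void main(String[] args) {
    System.out.println(resolveNumSubShards(null)); // defaults to 2
    System.out.println(resolveNumSubShards("4"));
  }
}
```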
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717540#comment-13717540 ] Joel Bernstein commented on SOLR-4787: -- Kranti, Odd that the pjoin cache is making things slower. I'll do some testing and see if I can turn up the same results. The join query runs first and builds a data structure in memory that is used to post filter the main query. The main query then runs and the post filter is applied. I'm exploring another scenario that will perform 5x faster then the current pjoin. But the tradeoff is a longer warmup time when a new searcher is opened. Do you have real-time indexing requirements or can you live with some warm-up time. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0, 4.5 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. 
The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml. <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the fromCore must have the join SolrCache configured. <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *ValueSourceJoinParserPlugin aka vjoin* The second implementation is the ValueSourceJoinParserPlugin aka vjoin. This implements a ValueSource function query that can return a value from a second core based on join keys and limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer specific relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query.
For example: bf=vjoin(fromCore, fromKey, fromVal, toKey, query) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. The query is used to select a specific set of records to join with in fromCore. Currently the fromKey and toKey must be longs but this will change in future versions. Like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" /> -- This message
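The two-phase pjoin flow described above (join query first, building an in-memory key structure; post filter over the main query's matches second) can be modeled with plain collections. This is a toy sketch of the concept only: the class and method names are invented, and the real pjoin operates on Lucene DocSets and the FieldCache, not Lists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of a post-filter join: collect integer join keys from the
 * "from" core's matches, then keep only main-query docs whose "to" key
 * is in that set.
 */
public class PjoinSketch {

  /** Step 1: collect the integer "from" keys matched in the join core. */
  public static Set<Integer> buildJoinKeySet(List<int[]> fromCoreDocs /* [docId, joinKey] */) {
    Set<Integer> keys = new HashSet<>();
    for (int[] doc : fromCoreDocs) {
      keys.add(doc[1]);
    }
    return keys;
  }

  /** Step 2: post-filter -- keep main-query docIds whose "to" key joins. */
  public static List<Integer> postFilter(List<int[]> mainQueryDocs, Set<Integer> joinKeys) {
    List<Integer> kept = new ArrayList<>();
    for (int[] doc : mainQueryDocs) {
      if (joinKeys.contains(doc[1])) {
        kept.add(doc[0]);
      }
    }
    return kept;
  }
}
```

Because the filter runs only over documents that already matched the main query, the work scales with the result set rather than the whole index, which is the scaling advantage claimed for the pjoin.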
Solr.xml parameters
I'm trying to finalize some of the documentation for the release of the docs that'll happen Real Soon Now, so I need to nail these down. How close are these definitions for the following parameters?

distribUpdateConnTimeout - the time any update will wait for a node to respond to an indexing request.
distribUpdateSoTimeout - the socket read timeout before the thread assumes the read operation will never complete due to some kind of networking problem.
leaderVoteWait - when SolrCloud is starting up, how long we'll wait before assuming that no leader will identify itself.
genericCoreNodeNames - I have no idea.
managementPath - no clue.
roles - why do I care to set this parameter?
coreNodeName - how is this different than name? Is it something anyone should mess with, and why?
logging watcher threshold - no clue what this does.

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717553#comment-13717553 ] Kranti Parisa commented on SOLR-4787: - Joel, Thanks for the details. Yes, we do some real-time indexing. Say, every 30min we get deltas. How much warm-up time are we looking at for 5M docs? Also, if we have more than one pjoin in the fq, each pointing to its own core, can those pjoins be executed in parallel and find the intersection which will finally be applied as a filter for the main query? Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 5.0, 4.5 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys.
It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml. <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the fromCore must have the join SolrCache configured. <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *ValueSourceJoinParserPlugin aka vjoin* The second implementation is the ValueSourceJoinParserPlugin aka vjoin. This implements a ValueSource function query that can return a value from a second core based on join keys and limiting query. The limiting query can be used to select a specific subset of data from the join core. This allows customer specific relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey, query) This example shows vjoin being called by the edismax boost function parameter.
This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. The query is used to select a specific set of records to join with in fromCore. Currently the fromKey and toKey must be longs but this will change in future versions. Like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.ValueSourceJoinParserPlugin" /> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see:
[jira] [Commented] (SOLR-5061) 4.4 refguide pages new solr.xml format
[ https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717550#comment-13717550 ] Cassandra Targett commented on SOLR-5061: - [~erickerickson] I'm adding comments to each page as I read through to help you with some formatting issues and content suggestions. The pages under Solr Cores and solr.xml need to be ordered better - they're in alpha order now. The order I'd suggest is the order I discussed them in my initial proposal - up to you, but it's ideal to have them flow together on screen and in the PDF: a. Format of solr.xml b. Legacy solr.xml Configuration c. Moving to the New solr.xml Format d. CoreAdminHandler Parameters and Usage (To re-order pages, go to Browse then Pages (up by your name at top). Then choose Tree. You'll see a hierarchical list of pages and can move and re-order them there.) 4.4 refguide pages new solr.xml format -- Key: SOLR-5061 URL: https://issues.apache.org/jira/browse/SOLR-5061 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Erick Erickson Fix For: 5.0, 4.5 breaking off from parent issue... * https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml ** SOLR-4757: Change the example to use the new solr.xml format and core discovery by directory structure. (Mark Miller) *** CT: There is a page on solr.xml: https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml. This should be updated to show the new format and still include information on the old format for anyone with the old format who uses this guide for reference. ** SOLR-4655: Add option to have Overseer assign generic node names so that new addresses can host shards without naming confusion.
(Mark Miller, Anshum Gupta) *** CT: I think this only needs to be added to any new content for solr.xml at https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml It should also be noted that cassandra posted some additional deailed suggests in a comment on the existing page in the ref guide... https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml?focusedCommentId=33296160#comment-33296160 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717567#comment-13717567 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506258 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506258 ] LUCENE-4335: fix generators to match recent code changes to the gen'd files Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-4335.patch, LUCENE-4335.patch We have more and more sources that are generated programmatically (query parsers, fuzzy levN tables from Moman, packed ints specialized decoders, etc.), and it's dangerous because developers may directly edit the generated sources and forget to edit the meta-source. It's happened to me several times ... most recently just after landing the BlockPostingsFormat branch. I think we should re-gen all of these in our builds and fail the build if this creates a difference. I know some generators (eg JavaCC) embed timestamps and so always create mods ... we can leave them out of this for starters (or maybe post-process the sources to remove the timestamps) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717620#comment-13717620 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506281 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506281 ] LUCENE-4335: add -r 623 to instructions for checking out jflex Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335
[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717621#comment-13717621 ] crocket commented on LUCENE-5128: - Wait until this weekend. I'm going to check the stack trace this Saturday. Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException - Key: LUCENE-5128 URL: https://issues.apache.org/jira/browse/LUCENE-5128 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.2 Reporter: crocket Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5128.patch, LUCENE-5128.patch ArrayIndexOutOfBoundsException makes it harder to reason about the cause. Is there a better way to notify programmers of the cause?
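The commit note for this issue (see above) says IndexSearcher.searchAfter should throw IllegalArgumentException when after.doc >= reader.maxDoc(). A minimal standalone sketch of that kind of up-front guard — this is not the actual Lucene source; the class and method names here are stand-ins:

```java
// Illustrative sketch only: mimics the argument check described in the commit
// message ("throw IllegalArgumentException if after.doc >= reader.maxDoc()").
// SearchAfterGuard and checkAfterDoc are hypothetical names, not Lucene API.
class SearchAfterGuard {
    /** Rejects an 'after' document id that lies beyond the reader's maxDoc. */
    static void checkAfterDoc(int afterDoc, int maxDoc) {
        if (afterDoc >= maxDoc) {
            throw new IllegalArgumentException(
                "after.doc (" + afterDoc + ") must be < maxDoc (" + maxDoc + ")");
        }
    }
}
```

With a check like this, passing an after document beyond the index reports the misuse immediately, instead of surfacing later as an ArrayIndexOutOfBoundsException deep inside the search internals.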
[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support
[ https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717623#comment-13717623 ] Mark Miller commented on SOLR-5060: --- Any comments on a good location for sticking this? It seems perhaps another top-level topic. It simply lets you put the index, lock and transaction log in hdfs rather than on the local filesystem. 4.4 refguide pages on hdfs support -- Key: SOLR-5060 URL: https://issues.apache.org/jira/browse/SOLR-5060 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Mark Miller Fix For: 5.0, 4.5 breaking off from parent... * Completely new docs about the HDFS SolrCloud support ... somewhere ** SOLR-4916: Add support to write and read Solr index files and transaction log files to and from HDFS. (phunt, Mark Miller, Greg Chanan) *** CT: Without studying this more, it's hard to know where this should go. It's not really SolrCloud, and it's not really a client, but depending on why it's being done it could overlap with either...If someone writes up what you'd tell someone about using it, I could give a better idea of where it fits in the existing page organization (if it does).
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717625#comment-13717625 ] Robert Muir commented on LUCENE-4335: - Cool Mike: regenerate seems to be working! But now I think we need to edit [~thetaphi]'s groovy script to be a macro that also fails if any files were modified. We should use this for verifying that the regenerated sources have not changed. I think we should also use this in jenkins after running tests. The precommit test can keep it off as it does now, but jenkins can be more strict. Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335
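The "fail if any files were modified" check discussed above amounts to snapshotting file contents before regeneration and comparing after. A purely illustrative sketch — the real build uses a Groovy macro over the checkout, and all names below are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch of the "regenerate and verify nothing changed" idea:
// snapshot file contents before running the generators, snapshot again after,
// and fail the build if anything differs. Not the actual build macro.
class RegenerateCheck {
    /** Maps every regular file under root to its current byte contents. */
    static Map<Path, byte[]> snapshot(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            Map<Path, byte[]> out = new HashMap<>();
            for (Path p : files.filter(Files::isRegularFile).collect(Collectors.toList())) {
                out.put(p, Files.readAllBytes(p));
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True iff the two snapshots cover the same files with identical contents. */
    static boolean unchanged(Map<Path, byte[]> before, Map<Path, byte[]> after) {
        if (!before.keySet().equals(after.keySet())) {
            return false; // a file was added or removed
        }
        for (Map.Entry<Path, byte[]> e : before.entrySet()) {
            if (!Arrays.equals(e.getValue(), after.get(e.getKey()))) {
                return false; // contents differ
            }
        }
        return true;
    }
}
```

A build target would take a snapshot, run the generators, and fail when `unchanged` returns false — exactly the "developer edited the generated file, not the meta-source" case the issue describes.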
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717627#comment-13717627 ] ASF subversion and git services commented on LUCENE-4335: - Commit 1506284 from [~mikemccand] in branch 'dev/branches/lucene4335' [ https://svn.apache.org/r1506284 ] LUCENE-4335: don't regenerate for precommit Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335
[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support
[ https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717646#comment-13717646 ] Cassandra Targett commented on SOLR-5060: - Maybe under https://cwiki.apache.org/confluence/display/solr/Managing+Solr? It's a teeny bit of a stretch for what's already there, but not wildly so (since logging is under there). If not there, I don't think there's a big problem with a top-level topic for now. 4.4 refguide pages on hdfs support -- Key: SOLR-5060 URL: https://issues.apache.org/jira/browse/SOLR-5060
[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support
[ https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717650#comment-13717650 ] Mark Miller commented on SOLR-5060: --- Thanks, Managing+Solr looks good. 4.4 refguide pages on hdfs support -- Key: SOLR-5060 URL: https://issues.apache.org/jira/browse/SOLR-5060
[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support
[ https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717679#comment-13717679 ] Hoss Man commented on SOLR-5060: bq. It simply lets you put the index, lock and transaction log in hdfs rather than on the local filesystem. how is it enabled/configured? would it make sense just to mention the details on the appropriate sub-pages of https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml ? 4.4 refguide pages on hdfs support -- Key: SOLR-5060 URL: https://issues.apache.org/jira/browse/SOLR-5060
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #917: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/917/

2 tests failed.

FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.cloud.BasicDistributedZkTest:
1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
at __randomizedtesting.SeedInfo.seed([7F33FE98D353BCE3]:0)

FAILED: org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at
[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources
[ https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717740#comment-13717740 ] Robert Muir commented on LUCENE-4335: -
{code}
regenerateAndCheck:

BUILD SUCCESSFUL
Total time: 57 seconds
{code}
Builds should regenerate all generated sources -- Key: LUCENE-4335 URL: https://issues.apache.org/jira/browse/LUCENE-4335
[jira] [Resolved] (SOLR-5063) 4.4 refguide improvements on new doc adding screen in ui
[ https://issues.apache.org/jira/browse/SOLR-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5063. Resolution: Fixed Fix Version/s: (was: 4.5) (was: 5.0) Assignee: Hoss Man (was: Grant Ingersoll) Pretty sure Grant is traveling at the moment ... Cassandra's changes all looked good to me, so I updated the doc. 4.4 refguide improvements on new doc adding screen in ui Key: SOLR-5063 URL: https://issues.apache.org/jira/browse/SOLR-5063 Project: Solr Issue Type: Sub-task Components: documentation Reporter: Hoss Man Assignee: Hoss Man breaking off from parent issue... * https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools ** SOLR-4921: Admin UI now supports adding documents to Solr (gsingers, steffkes) ** stub page with screenshot exists, but it needs verbiage explaining how it works and what the different options mean
[jira] [Commented] (SOLR-5061) 4.4 refguide pages new solr.xml format
[ https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717814#comment-13717814 ] Hoss Man commented on SOLR-5061: FYI: I went ahead and re-ordered the pages as Cassandra suggested, and cleaned up a bunch of the formatting -- both based on the suggestions in Cassandra's various comments and some other minor formatting nits. The meat of the content is still pretty much the same. 4.4 refguide pages new solr.xml format -- Key: SOLR-5061 URL: https://issues.apache.org/jira/browse/SOLR-5061
[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717827#comment-13717827 ] Feihong Huang commented on SOLR-5057: - Hi, Erickson. I think that approach can re-use the filterCache when the fq clauses are ordered differently. I am a newcomer to Solr; if there are some points I am not considering sufficiently, I apologize. queryResultCache should not related with the order of fq's list --- Key: SOLR-5057 URL: https://issues.apache.org/jira/browse/SOLR-5057 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0, 4.1, 4.2, 4.3 Reporter: Feihong Huang Assignee: Erick Erickson Priority: Minor Attachments: SOLR-5057.patch, SOLR-5057.patch Original Estimate: 48h Remaining Estimate: 48h There are two queries below with the same meaning, but case 2 can't use the queryResultCache after case 1 is executed. case1: q=*:*&fq=field1:value1&fq=field2:value2 case2: q=*:*&fq=field2:value2&fq=field1:value1 I think the queryResultCache should not be related to the order of the fq list.
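The order-independence being discussed can be had by putting the fq clauses into a canonical order before they participate in the cache key. A minimal sketch of the idea — the class and method names are hypothetical, and Solr's real QueryResultKey works on parsed Query objects rather than raw strings:

```java
import java.util.Arrays;

// Hedged sketch: canonicalize the fq list by sorting it, so that
// fq=field1:value1&fq=field2:value2 and the reversed order produce the
// same cache key. Illustrative only; not Solr's actual QueryResultKey.
class FqCacheKey {
    static String buildCacheKey(String q, String... fqs) {
        String[] sorted = fqs.clone();
        Arrays.sort(sorted); // canonical order: fq order no longer matters
        return q + "|" + String.join("&", sorted);
    }
}
```

With this scheme, the two cases from the issue description hash to the same entry, so the second request becomes a cache hit.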
[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent
[ https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717835#comment-13717835 ] Hoss Man commented on SOLR-5065: would it be worthwhile to add a java-mode init param to all of the Parse[NumberClass]FieldUpdateProcessorFactories that was mutually exclusive with the locale param and generated UpdateProcessors that used the appropriate NumberClass.valueOf(String) instead of a NumberFormat? And assuming that would be worthwhile ... would it also make sense to change the default behavior of {{<processor class="solr.ParseDoubleFieldUpdateProcessorFactory" />}} from assuming {{locale=ROOT}} to {{java-mode=true}} ? ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent - Key: SOLR-5065 URL: https://issues.apache.org/jira/browse/SOLR-5065 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.4 Reporter: Jack Krupansky The ParseDoubleFieldUpdateProcessorFactory is unable to parse the full syntax of Java/JSON scientific notation. Parse fails for 4.5E+10, but succeeds for 4.5E10 and 4.5E-10.
Using the schema and config from example-schemaless, I added this data:
{code}
curl http://localhost:8983/solr/update?commit=true \
 -H 'Content-type:application/json' -d '
[{id: doc-1, a1: Hello World, a2: 123, a3: 123.0, a4: 1.23, a5: 4.5E+10, a6: 123, a7: true, a8: false, a9: true, a10: 2013-07-22, a11: 4.5E10, a12: 4.5E-10, a13: 4.5E+10, a14: 4.5E10, a15: 4.5E-10}]'
{code}
A query returns:
{code}
<doc>
  <str name="id">doc-1</str>
  <arr name="a1"><str>Hello World</str></arr>
  <arr name="a2"><long>123</long></arr>
  <arr name="a3"><double>123.0</double></arr>
  <arr name="a4"><double>1.23</double></arr>
  <arr name="a5"><double>4.5E10</double></arr>
  <arr name="a6"><long>123</long></arr>
  <arr name="a7"><bool>true</bool></arr>
  <arr name="a8"><bool>false</bool></arr>
  <arr name="a9"><bool>true</bool></arr>
  <arr name="a10"><date>2013-07-22T00:00:00Z</date></arr>
  <arr name="a11"><double>4.5E10</double></arr>
  <arr name="a12"><double>4.5E-10</double></arr>
  <arr name="a13"><str>4.5E+10</str></arr>
  <arr name="a14"><double>4.5E10</double></arr>
  <arr name="a15"><double>4.5E-10</double></arr>
  <long name="_version_">1441308941516537856</long>
</doc>
{code}
The input value of a13 was the same as a5, but it was treated as a string rather than parsed as a double. So JSON/Java was able to parse 4.5E+10, but this update processor was not.
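The java-mode behavior proposed in the comment above amounts to delegating to plain Java number parsing, which does accept a '+' in the exponent. A quick check of that claim — note that java-mode is only a proposal here, not an existing init param, and the class below is a stand-in:

```java
// Double.parseDouble accepts all three exponent spellings from the report,
// including the "4.5E+10" form that the NumberFormat-based parse path rejected.
// ExponentParse is a hypothetical illustration, not Solr code.
class ExponentParse {
    static double parse(String s) {
        return Double.parseDouble(s); // plain Java semantics, no locale involved
    }
}
```

This is why delegating to NumberClass.valueOf(String) would make the a13 example above parse as a double instead of falling through to a string.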
[jira] [Resolved] (SOLR-5061) 4.4 refguide pages new solr.xml format
[ https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-5061. -- Resolution: Fixed Thanks Hoss and Cassandra for helping! I'm declaring victory here, we can re-open these as necessary. 4.4 refguide pages new solr.xml format -- Key: SOLR-5061 URL: https://issues.apache.org/jira/browse/SOLR-5061
[jira] [Resolved] (SOLR-4542) Add entries to CHANGES.txt and Wiki for the obsoleting solr.xml and lots of cores
[ https://issues.apache.org/jira/browse/SOLR-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4542. -- Resolution: Fixed Fix Version/s: 4.3 We updated CHANGES.txt an embarrassingly long time ago; I'm finally getting this JIRA closed. Add entries to CHANGES.txt and Wiki for the obsoleting solr.xml and lots of cores - Key: SOLR-4542 URL: https://issues.apache.org/jira/browse/SOLR-4542 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 4.3 Marker to be sure I don't forget... SOLR-4196, SOLR-4401, etc. Several new capabilities need to be elucidated both on the Wiki and in CHANGES.txt: 1) rapidly opening/closing cores 2) discovery-based core enumeration
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717911#comment-13717911 ] Han Jiang commented on LUCENE-3069: --- bq. You should not need to .getPosition / .setPosition on the fstReader: Oh, yes! I'll fix. bq. I think we can't really make use of it, which is fine (it's an optional optimization). OK, actually I was quite curious why we don't make use of commonPrefixRef in CompiledAutomaton. Maybe we can determinize the input Automaton first, then get commonPrefixRef via SpecialOperation? Is it too slow, or the prefix isn't always long enough to take into consideration? bq. But this can only be done if that FST node's arcs are array'd right? Yes, array arcs only, and we might need methods like advance(label) to do the search, and here gossip search might work better than traditional binary search. {quote} Separately, supporting ord w/ FST terms dict should in theory be not so hard; you'd need to use getByOutput to seek by ord. Maybe (later, eventually) we can make this a write-time option. We should open a separate issue ... {quote} Ah, yes, but seems that getByOutput doesn't rewind/reuse previous state? We always have to start from first arc during every seek. However, I'm not sure in what kinds of usecase we need the ord information. I'll commit current version first, so we can iterate. Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 5.0, 4.5 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. 
Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
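The arc-search discussion in the comment above (an advance(label)-style lookup over an FST node's array'd arcs) can be sketched in isolation. This is not Lucene's actual FST API; it is an illustrative model that treats a node's arcs as a sorted array of integer labels and binary-searches for a target label, the baseline the comment compares gossip search against.

```javascript
// Illustrative sketch only -- not Lucene's FST API. The real
// implementation searches packed byte arrays, not plain JS arrays.
// 'labels' must be sorted ascending; returns the arc index or -1.
function seekArc(labels, target) {
  let lo = 0, hi = labels.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;          // midpoint without overflow
    if (labels[mid] < target) lo = mid + 1;
    else if (labels[mid] > target) hi = mid - 1;
    else return mid;                      // arc with this label found
  }
  return -1;                              // no arc carries this label
}

// 'b' (98) is at index 1 among arcs labelled a, b, e, n
console.log(seekArc([97, 98, 101, 110], 98)); // prints 1
```

This only applies when the node's arcs are stored as a fixed-width array, which is exactly the "array arcs only" caveat in the comment; linearly-encoded arcs have to be scanned.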
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717912#comment-13717912 ] ASF subversion and git services commented on LUCENE-3069: - Commit 1506385 from [~billy] in branch 'dev/branches/lucene3069' [ https://svn.apache.org/r1506385 ] LUCENE-3069: support intersect operations Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 5.0, 4.5 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717922#comment-13717922 ] Noble Paul commented on SOLR-5069: -- bq.reduce() can start only when all mappers are finished Why? Why can't reduce start as soon as the mappers start producing? Whatever is emitted by the mapper is up for the reducer to chew on. All said, a map-side combiner is definitely useful and would reduce memory/network IO MapReduce for SolrCloud --- Key: SOLR-5069 URL: https://issues.apache.org/jira/browse/SOLR-5069 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1. sample word count program h2. how to invoke? http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed, the request will wait till completion and the output of the reduce program will be emitted as a standard solr response. If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run .
If not passed, an arbitrary node is chosen. The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript} var res = $.streamQuery("*:*"); //this is not run across the cluster, only on this index while(res.hasMore()){ var doc = res.next(); var txt = doc.get("txt"); //the field on which word count is performed var words = txt.split(" "); for(var i = 0; i < words.length; i++){ $.map(words[i], {'count':1}); //this will send the map over to the reduce host } } {code} Essentially two threads are created in the 'map' hosts. One for running the program and the other for co-ordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program h4. reduce script This script is run in one node. This node accepts http connections from map nodes and the 'maps' that are sent are collected in a queue which will be polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the tasks are complete. After the reduce program is executed for all the input it proceeds to write out the result to the 'sink' collection or it is written straight out to the response.
{code:JavaScript} var pair = $.nextMap(); var reduced = $.getCtx().getReducedMap(); // a hashmap var count = reduced.get(pair.key()); if(count === null) { count = {'count':0}; reduced.put(pair.key(), count); } count.count += pair.val().count; {code} h4. example output {code:JavaScript} { "result": [ "wordx": { "count":15876765 }, "wordy" : { "count":24657654 } ] } {code} TBD * The format in which the output is written to the target collection, I assume the reducedMap will have values mapping to the schema of the collection -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
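The map-side combiner that the comment above calls "definitely useful" could look roughly like this for the word-count example. A hedged sketch only: combineCounts and its plain-object accumulator are made up for illustration and are not part of the proposed $.map API; the point is just that each mapper pre-aggregates locally, so one pair per distinct word crosses the network instead of one pair per occurrence.

```javascript
// Illustrative sketch only -- not the proposed Solr $.map API.
// A map-side combiner: aggregate counts locally per mapper before
// emitting, cutting the number of pairs streamed to the reduce node.
function combineCounts(words) {
  const local = Object.create(null);   // per-mapper partial counts
  for (let i = 0; i < words.length; i++) {
    local[words[i]] = (local[words[i]] || 0) + 1;
  }
  // in the proposal each entry would then be handed to $.map(word, {count})
  return Object.entries(local).map(([word, count]) => ({ word, count }));
}

console.log(combineCounts(['to', 'be', 'or', 'not', 'to', 'be']));
```

With this in place the reducer's merge step is unchanged; it simply receives fewer, larger-count pairs, which also shrinks the in-memory queue the reduce node has to hold.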
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717926#comment-13717926 ] ASF subversion and git services commented on LUCENE-3069: - Commit 1506389 from [~billy] in branch 'dev/branches/lucene3069' [ https://svn.apache.org/r1506389 ] LUCENE-3069: no need to reseek FSTReader, update nocommits Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 5.0, 4.5 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5071) Solrcloud change core to another shard issue
Illu Y Ying created SOLR-5071: - Summary: Solrcloud change core to another shard issue Key: SOLR-5071 URL: https://issues.apache.org/jira/browse/SOLR-5071 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Illu Y Ying I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1; I stop it and change its solr.xml like this: <core name="collection1" instanceDir="collection1" shard="shard2"/> So this core should be a shard2 replica. Then I restart it, open the cloud graph page, and you can see this core as a down replica still in shard1 and also as an active replica in shard2. So I would like to suggest removing the down replica information from clusterStatus.json. There is doubt about one core's status in two shards. In this one-core-has-two-statuses scenario I suggest that we remove the down replica information of the other shard in clusterStatus.json. I remember when a core is changing to active status, it will send the overseer an active status message, so add this logic to the overseer's change-core-to-active-status part. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue
[ https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illu Y Ying updated SOLR-5071: -- Attachment: 2013-7-24 11-55-45.png Solrcloud change core to another shard issue Key: SOLR-5071 URL: https://issues.apache.org/jira/browse/SOLR-5071 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Illu Y Ying Attachments: 2013-7-24 11-55-45.png I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page, you can see this core as a down replica still in shard1 and also as a active replica in shard2. So I would like to suggest you to remove the down replica information from clusterStatus.json. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue
[ https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illu Y Ying updated SOLR-5071: -- Description: I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page(see attachment), you can see this core as a down replica still in shard1 and also as a active replica in shard2. So I would like to suggest you to remove the down replica information from clusterStatus.json. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. was: I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page, you can see this core as a down replica still in shard1 and also as a active replica in shard2. So I would like to suggest you to remove the down replica information from clusterStatus.json. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. 
Solrcloud change core to another shard issue Key: SOLR-5071 URL: https://issues.apache.org/jira/browse/SOLR-5071 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Illu Y Ying Attachments: 2013-7-24 11-55-45.png I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page(see attachment), you can see this core as a down replica still in shard1 and also as a active replica in shard2. So I would like to suggest you to remove the down replica information from clusterStatus.json. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue
[ https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Illu Y Ying updated SOLR-5071: -- Description: I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page(see attachment), you can see this core as a down replica still in shard1 and also as a active replica in shard2. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. was: I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page(see attachment), you can see this core as a down replica still in shard1 and also as a active replica in shard2. So I would like to suggest you to remove the down replica information from clusterStatus.json. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. 
Solrcloud change core to another shard issue Key: SOLR-5071 URL: https://issues.apache.org/jira/browse/SOLR-5071 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Illu Y Ying Attachments: 2013-7-24 11-55-45.png I have a solrcloud cluster with one collection and two shards. One core is a replica for shard1, I stop it and change its solr.xml like this: core name=collection1 instanceDir=collection1 shard=shard2/ So this core should be a shard2 replica, Then I restart it, open cloud graph page(see attachment), you can see this core as a down replica still in shard1 and also as a active replica in shard2. There is doubt about one core status in two shards. In this one core has two status scenario I suggest that if we could remove the down replica information of other shard in clusterStatus.json. I remember when core is changing to active status, it will send overseer an active status message, so add this logic to overseer change core to active status part. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jrockit-jdk1.6.0_45-R28.2.7-4.1.0) - Build # 6627 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6627/ Java: 32bit/jrockit-jdk1.6.0_45-R28.2.7-4.1.0 -XnoOpt 2 tests failed. REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace: java.lang.AssertionError: No SolrDynamicMBeans found at __randomizedtesting.SeedInfo.seed([61ED01840FE48A5F:EF3C65BE62A5D23A]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:774) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:683) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:44) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:662) REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxUpdate Error Message: No mbean found for SolrIndexSearcher Stack Trace:
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b99) - Build # 6628 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6628/ Java: 64bit/jdk1.8.0-ea-b99 -XX:+UseCompressedOops -XX:+UseParallelGC 2 tests failed. FAILED: org.apache.solr.core.TestJmxIntegration.testJmxRegistration Error Message: No SolrDynamicMBeans found Stack Trace: java.lang.AssertionError: No SolrDynamicMBeans found at __randomizedtesting.SeedInfo.seed([CDCE5B4F951B0045:431F3F75F85A5820]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:491) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:724) FAILED: org.apache.solr.core.TestJmxIntegration.testJmxUpdate Error Message: No mbean found for SolrIndexSearcher Stack Trace:
[jira] [Updated] (SOLR-5069) MapReduce for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5069: - Description: Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have a smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1. sample word count program h2. how to invoke? http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed, the request will wait till completion and the output of the reduce program will be emitted as a standard solr response. If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed, the response will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run . If not passed, an arbitrary node is chosen. The node which received the command would first identify one replica from each slice where the map program is executed. It will also identify another node from the same collection where the reduce program is run. Each run is given an id and the details of the nodes participating in the run will be written to ZK (as an ephemeral node). h4. map script {code:JavaScript} var res = $.streamQuery("*:*"); //this is not run across the cluster.
//only on this index while(res.hasMore()){ var doc = res.next(); var txt = doc.get("txt"); //the field on which word count is performed var words = txt.split(" "); for(var i = 0; i < words.length; i++){ $.map(words[i], {'count':1}); //this will send the map over to the reduce host } } {code} Essentially two threads are created in the 'map' hosts. One for running the program and the other for co-ordinating with the 'reduce' host. The maps emitted are streamed live over an http connection to the reduce program h4. reduce script This script is run in one node. This node accepts http connections from map nodes and the 'maps' that are sent are collected in a queue which will be polled and fed into the reduce program. This also keeps the 'reduced' data in memory till the whole run is complete. It expects a done message from all 'map' nodes before it declares the tasks are complete. After the reduce program is executed for all the input it proceeds to write out the result to the 'sink' collection or it is written straight out to the response. {code:JavaScript} var pair = $.nextMap(); var reduced = $.getCtx().getReducedMap(); // a hashmap var count = reduced.get(pair.key()); if(count === null) { count = {'count':0}; reduced.put(pair.key(), count); } count.count += pair.val().count; {code} h4. example output {code:JavaScript} { "result": [ "wordx": { "count":15876765 }, "wordy" : { "count":24657654 } ] } {code} TBD * The format in which the output is written to the target collection, I assume the reducedMap will have values mapping to the schema of the collection was: Solr currently does not have a way to run long running computational tasks across the cluster. We can piggyback on the mapreduce paradigm so that users have smooth learning curve. * The mapreduce component will be written as a RequestHandler in Solr * Works only in SolrCloud mode. (No support for standalone mode) * Users can write MapReduce programs in Javascript or Java. First cut would be JS ( ? ) h1.
sample word count program h2.how to invoke? http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX h3. params * map : A javascript implementation of the map program * reduce : a Javascript implementation of the reduce program * sink : The collection to which the output is written. If this is not passed , the request will wait till completion and respond with the output of the reduce program and will be emitted as a standard solr response. . If no sink is passed the request will be redirected to the reduce node where it will wait till the process is complete. If the sink param is passed ,the rsponse will contain an id of the run which can be used to query the status in another command. * reduceNode : Node name where the reduce is run . If not passed an arbitrary node is chosen The node which received the command would first identify one replica from each slice where the map program is executed . It will also identify one another node from the