[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716142#comment-13716142
 ] 

ASF subversion and git services commented on LUCENE-5128:
-

Commit 1505909 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1505909 ]

LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if 
after.doc >= reader.maxDoc()
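
For context, the added check is roughly of this shape (a sketch, not 
necessarily the committed code):

{code}
// Sketch: a paging anchor at or past maxDoc() cannot refer to any document
// in this reader, so fail fast with a descriptive exception instead of
// letting a collector hit an ArrayIndexOutOfBoundsException later.
if (after != null && after.doc >= reader.maxDoc()) {
  throw new IllegalArgumentException(
      "after.doc exceeds the number of documents in the reader: after.doc="
      + after.doc + " limit=" + reader.maxDoc());
}
{code}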

 Calling IndexSearcher.searchAfter beyond the number of stored documents 
 causes ArrayIndexOutOfBoundsException
 -

 Key: LUCENE-5128
 URL: https://issues.apache.org/jira/browse/LUCENE-5128
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.2
Reporter: crocket
 Attachments: LUCENE-5128.patch, LUCENE-5128.patch


 ArrayIndexOutOfBoundsException makes it harder to reason about the cause.
 Is there a better way to notify programmers of the cause?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [CONF] Apache Solr Reference Guide Schema API

2013-07-23 Thread Steve Rowe
Comment spam, hooray.

I deleted the spammer's comment and disabled their account.  Confluence now 
says "This user has been disabled. This user will not be able to log in to 
Confluence."  Not sure if this is better than removing the user's account?

I guess we'll have to see how prevalent this will be.  Yuck.

Steve

On Jul 23, 2013, at 2:24 AM, gaobin (Confluence) <conflue...@apache.org> 
wrote:

 Space: Apache Solr Reference Guide 
 (https://cwiki.apache.org/confluence/display/solr)
 Page: Schema API (https://cwiki.apache.org/confluence/display/solr/Schema+API)
 
 Comment edited by gaobin :
 -
 Mr. Kelly cheapest christian louboutin shoes china 
 [link=http://www.zarafarms.com/]zarafarms.com[/link] has great christian 
 louboutin shoes for men [link=http://www.zarafarms.com/]Ralph Lauren Pas 
 cher[/link] price and distinctive features features to draw. Here is a man 
 with a serious look, projected in his stern eyes and firm jaw. M. Raja 
 Shanmugam, who plays for the Kauvery Recreation Club, says to the ball 
 instead of letting the ball come to [link=http://www.zarafarms.com/]Ralph 
 Lauren Chemises[/link] you while fielding is an ideal fitness technique. 
 Shanmugam insists that mental 
 [link=http://www.zarafarms.com/enfants-ralph-lauren/]Ralph Lauren 
 Enfant[/link] fitness should complement physical fitness and yoga while 
 breathing exercises are ideal.. 
 
 
 
 
 Stop watching space: 
 https://cwiki.apache.org/confluence/users/removespacenotification.action?spaceKey=solr
 Change email notification preferences: 
 https://cwiki.apache.org/confluence/users/editmyemailsettings.action
 
 





[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716147#comment-13716147
 ] 

ASF subversion and git services commented on LUCENE-5128:
-

Commit 1505910 from [~shaie] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1505910 ]

LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if 
after.doc >= reader.maxDoc()




[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent

2013-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716148#comment-13716148
 ] 

Robert Muir commented on SOLR-5065:
---

The "programming" parser of Double.parseDouble is different from the 
locale-sensitive stuff in NumberFormat... it will parse your number there, as 
well as hex formats and other things like "Infinity", and I think it won't 
throw an exception if the value ends with "d" or "f".

Alternatively, there is also NumberFormat.getScientificInstance in ICU.
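
To illustrate the difference with plain JDK classes (a standalone demo, not 
the processor's actual code path):

{code}
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class ExponentParsing {
  public static void main(String[] args) {
    // The "programming" parser accepts the full Java literal syntax.
    System.out.println(Double.parseDouble("4.5E+10"));  // 4.5E10
    System.out.println(Double.parseDouble("4.5E+10d")); // trailing 'd' is accepted too

    // A locale-sensitive NumberFormat stops at the '+' and only consumes "4.5".
    NumberFormat nf = NumberFormat.getInstance(Locale.ROOT);
    ParsePosition pos = new ParsePosition(0);
    System.out.println(nf.parse("4.5E+10", pos) + " (consumed " + pos.getIndex() + " chars)");
  }
}
{code}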


 ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent
 -

 Key: SOLR-5065
 URL: https://issues.apache.org/jira/browse/SOLR-5065
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.4
Reporter: Jack Krupansky

 The ParseDoubleFieldUpdateProcessorFactory is unable to parse the full syntax 
 of Java/JSON scientific notation. Parse fails for "4.5E+10", but does succeed 
 for "4.5E10" and "4.5E-10".
 Using the schema and config from example-schemaless, I added this data:
 {code}
   curl "http://localhost:8983/solr/update?commit=true" \
   -H 'Content-type:application/json' -d '
   [{"id": "doc-1",
 "a1": "Hello World",
 "a2": "123",
 "a3": "123.0",
 "a4": "1.23",
 "a5": "4.5E+10",
 "a6": "123",
 "a7": "true",
 "a8": "false",
 "a9": "true",
 "a10": "2013-07-22",
 "a11": "4.5E10",
 "a12": "4.5E-10",
 "a13": "4.5E+10",
 "a14": "4.5E10",
 "a15": "4.5E-10"}]'
 {code}
 A query returns:
 {code}
   <doc>
     <str name="id">doc-1</str>
     <arr name="a1">
       <str>Hello World</str>
     </arr>
     <arr name="a2">
       <long>123</long>
     </arr>
     <arr name="a3">
       <double>123.0</double>
     </arr>
     <arr name="a4">
       <double>1.23</double>
     </arr>
     <arr name="a5">
       <double>4.5E10</double>
     </arr>
     <arr name="a6">
       <long>123</long>
     </arr>
     <arr name="a7">
       <bool>true</bool>
     </arr>
     <arr name="a8">
       <bool>false</bool>
     </arr>
     <arr name="a9">
       <bool>true</bool>
     </arr>
     <arr name="a10">
       <date>2013-07-22T00:00:00Z</date>
     </arr>
     <arr name="a11">
       <double>4.5E10</double>
     </arr>
     <arr name="a12">
       <double>4.5E-10</double>
     </arr>
     <arr name="a13">
       <str>4.5E+10</str>
     </arr>
     <arr name="a14">
       <double>4.5E10</double>
     </arr>
     <arr name="a15">
       <double>4.5E-10</double>
     </arr>
   <long name="_version_">1441308941516537856</long></doc>
 {code}
 The input value of a13 was the same as a5, but was treated as a string, 
 rather than parsed as a double. So, JSON/Java was able to parse "4.5E+10", 
 but this update processor was not.




[jira] [Resolved] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException

2013-07-23 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-5128.


   Resolution: Fixed
Fix Version/s: 4.5
   5.0
 Assignee: Shai Erera
Lucene Fields: New,Patch Available  (was: New)

Committed to trunk and 4x. Closing it now. crocket, it would still be good if 
you could paste the full stack trace, so we can check whether any collectors 
are sensitive to that.




[jira] [Commented] (SOLR-5062) 4.4 refguide updates related to shardsplitting and deleteshard

2013-07-23 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716161#comment-13716161
 ] 

Shalin Shekhar Mangar commented on SOLR-5062:
-

Thanks Cassandra! I'll take a look.

 4.4 refguide updates related to shardsplitting and deleteshard
 --

 Key: SOLR-5062
 URL: https://issues.apache.org/jira/browse/SOLR-5062
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Shalin Shekhar Mangar
 Fix For: 4.4


 breaking off from parent issue...
 * https://cwiki.apache.org/confluence/display/solr/Collections+API
 ** in general, we need to review this page in light of all the 
 shard splitting stuff and make sure everything is up to date.
 ** SOLR-4693: A deleteshard collections API that unloads all replicas of a 
 given shard and then removes it from the cluster state. It will remove only 
 those shards which are INACTIVE or have no range (created for custom 
 sharding). (Anshum Gupta, shalin)
 *** CT: Add to 
 https://cwiki.apache.org/confluence/display/solr/Collections+API




[jira] [Commented] (LUCENE-4987) Test framework may fail internally under J9 (some serious JVM exclusive-section issue).

2013-07-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716165#comment-13716165
 ] 

Shai Erera commented on LUCENE-4987:


Word is that the fix will be included in the next J9 SR.

 Test framework may fail internally under J9 (some serious JVM 
 exclusive-section issue).
 ---

 Key: LUCENE-4987
 URL: https://issues.apache.org/jira/browse/LUCENE-4987
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 5.0, 4.4

 Attachments: j9.zip


 This was reported by Shai. The runner failed with an exception:
 {code}
 [junit4:junit4] Caused by: java.util.NoSuchElementException
 [junit4:junit4] at 
 java.util.ArrayDeque.removeFirst(ArrayDeque.java:289)
 [junit4:junit4] at java.util.ArrayDeque.pop(ArrayDeque.java:518)
 [junit4:junit4] at 
 com.carrotsearch.ant.tasks.junit4.JUnit4$1.onSlaveIdle(JUnit4.java:809)
 [junit4:junit4] ... 17 more
 {code}
 The problem is that this is impossible because the code around 
 JUnit4.java:809 looks like this:
 {code}
  final Deque<String> stealingQueue = new ArrayDeque<String>(...);
  aggregatedBus.register(new Object() {
 @Subscribe
 public void onSlaveIdle(SlaveIdle slave) {
   if (stealingQueue.isEmpty()) {
 ...
   } else {
 String suiteName = stealingQueue.pop();
 ...
   }
 }
   });
 {code}
 and the contract on Guava's EventBus states that:
 {code}
  * <p>The EventBus guarantees that it will not call a handler method from
  * multiple threads simultaneously, unless the method explicitly allows it by
  * bearing the {@link AllowConcurrentEvents} annotation.  If this annotation is
  * not present, handler methods need not worry about being reentrant, unless
  * also called from outside the EventBus
 I wrote a simple snippet of code that does it in a loop and indeed, two 
 threads can appear in the critical section at once. This is not reproducible 
 on Hotspot and only appears to be the problem on J9/1.7/Windows (J9 1.6 works 
 fine).
 I'll provide a workaround in the runner (an explicit monitor seems to be 
 working) but this is some serious J9 issue.
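
For reference, the explicit-monitor workaround could look roughly like this (a 
sketch, not the actual runner change):

{code}
final Deque<String> stealingQueue = new ArrayDeque<String>();
final Object monitor = new Object(); // serialize handler entry ourselves

aggregatedBus.register(new Object() {
  @Subscribe
  public void onSlaveIdle(SlaveIdle slave) {
    synchronized (monitor) { // don't rely on the EventBus guarantee under J9
      if (stealingQueue.isEmpty()) {
        // ...
      } else {
        String suiteName = stealingQueue.pop();
        // ...
      }
    }
  }
});
{code}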




[jira] [Commented] (SOLR-5059) 4.4 refguide pages on schemaless schema rest api for adding fields

2013-07-23 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716168#comment-13716168
 ] 

Steve Rowe commented on SOLR-5059:
--

{quote}
* https://cwiki.apache.org/confluence/display/solr/Schema+API
** SOLR-3251: Dynamically add fields to schema. (Steve Rowe, Robert Muir, yonik)
*** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API
** SOLR-5010: Add support for creating copy fields to the Fields REST API 
(gsingers)
*** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API
{quote}

These are done.

 4.4 refguide pages on schemaless & schema rest api for adding fields
 

 Key: SOLR-5059
 URL: https://issues.apache.org/jira/browse/SOLR-5059
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Steve Rowe
 Fix For: 4.4


 breaking off from parent...
 * 
 https://cwiki.apache.org/confluence/display/solr/Documents%2C+Fields%2C+and+Schema+Design
 ** SOLR-4897: Add solr/example/example-schemaless/, an example config set for 
 schemaless mode. (Steve Rowe)
 *** CT: Schemaless in general needs to be added. The most likely place today 
 is a new page under 
 https://cwiki.apache.org/confluence/display/solr/Documents%2C+Fields%2C+and+Schema+Design
 * https://cwiki.apache.org/confluence/display/solr/Schema+API
 ** SOLR-3251: Dynamically add fields to schema. (Steve Rowe, Robert Muir, 
 yonik)
 *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API
 ** SOLR-5010: Add support for creating copy fields to the Fields REST API 
 (gsingers)
 *** CT: Add to https://cwiki.apache.org/confluence/display/solr/Schema+API




[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-07-23 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716169#comment-13716169
 ] 

Elran Dvir commented on SOLR-2894:
--

Andrew, Thank you very much for the fix!

Does this version fix the issue of f.field.facet.limit not being respected?

Thanks.

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.5

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 668 - Still Failing!

2013-07-23 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/668/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean

Error Message:
IOException occured when talking to server at: 
https://127.0.0.1:51783/solr/collection1

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server at: https://127.0.0.1:51783/solr/collection1
at 
__randomizedtesting.SeedInfo.seed([4DE818604E65B55:6735808464829C77]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:435)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
at 
org.apache.solr.client.solrj.TestBatchUpdate.testWithBinaryBean(TestBatchUpdate.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 

[jira] [Created] (SOLR-5066) Managed schema triggers a 404 error code in the Admin UI's Schema pane

2013-07-23 Thread Steve Rowe (JIRA)
Steve Rowe created SOLR-5066:


 Summary: Managed schema triggers a 404 error code in the Admin 
UI's Schema pane
 Key: SOLR-5066
 URL: https://issues.apache.org/jira/browse/SOLR-5066
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.3
Reporter: Steve Rowe


When using a managed schema (e.g. by setting 
{{-Dsolr.solr.home=example-schemaless/solr}} when running {{java -jar 
start.jar}} under {{solr/example/}}), the admin UI's Schema pane shows:

{noformat}
http://localhost:8983/solr/collection1/admin/file?file=null&contentType=text/xml;charset=utf-8
{noformat}

and

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">404</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="error">
    <str name="msg">
      Can not find: null [/path/to/solr.solr.home/collection1/conf/null]
    </str>
    <int name="code">404</int>
  </lst>
</response>
{code}




[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent

2013-07-23 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716194#comment-13716194
 ] 

Steve Rowe commented on SOLR-5065:
--

Another alternative: apply a regex in front of the NumberFormat parser to strip 
out the (superfluous, obvi) plus sign: {noformat}s/E\+(\d+)$/E$1/{noformat}
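
i.e. something along these lines (illustrative names, not the processor's 
actual fields):

{code}
// Normalize "E+NN" to "ENN" before handing the value to NumberFormat.
String normalized = value.replaceFirst("E\\+(\\d+)$", "E$1");
Number parsed = numberFormat.parse(normalized); // throws ParseException on bad input
{code}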




[JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6692 - Failure!

2013-07-23 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6692/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest

Error Message:
2 threads leaked from SUITE scope at 
org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest: 1) Thread[id=33, 
name=LuceneTestCase-1-thread-2, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] 
at sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) 
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)   
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) 
at java.lang.Thread.run(Thread.java:780)2) Thread[id=32, 
name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] 
at sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) 
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)   
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) 
at java.lang.Thread.run(Thread.java:780)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE 
scope at org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest: 
   1) Thread[id=33, name=LuceneTestCase-1-thread-2, state=WAITING, 
group=TGRP-UIMABaseAnalyzerTest]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
at java.lang.Thread.run(Thread.java:780)
   2) Thread[id=32, name=LuceneTestCase-1-thread-1, state=WAITING, 
group=TGRP-UIMABaseAnalyzerTest]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:197)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
at java.lang.Thread.run(Thread.java:780)
at __randomizedtesting.SeedInfo.seed([BE68034CE6D929B8]:0)


FAILED:  
junit.framework.TestSuite.org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest

Error Message:
There are still zombie threads that couldn't be terminated:1) Thread[id=33, 
name=LuceneTestCase-1-thread-2, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] 
at sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:453) 
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)   
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) 
at java.lang.Thread.run(Thread.java:780)2) Thread[id=32, 
name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-UIMABaseAnalyzerTest] 
at sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:197) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2054)
 at 

Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/ibm-j9-jdk7) - Build # 6692 - Failure!

2013-07-23 Thread Tommaso Teofili
I could reproduce it, with

ant test  -Dtestcase=UIMABaseAnalyzerTest -Dtests.seed=BE68034CE6D929B8
-Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ca
-Dtests.timezone=Pacific/Rarotonga -Dtests.file.encoding=UTF-8

I'll look into it.

Tommaso


2013/7/23 Policeman Jenkins Server jenk...@thetaphi.de

 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6692/
 Java: 64bit/ibm-j9-jdk7
 -Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}


[jira] [Commented] (SOLR-5043) hostname lookup in SystemInfoHandler should be refactored to not block core (re)load

2013-07-23 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716231#comment-13716231
 ] 

Alan Woodward commented on SOLR-5043:
-

Can we use a CompletionService for this?  Maybe have one running on the 
CoreContainer, which can then be stopped when the container is shut down; that 
should stop any thread leaks.
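
Something like the following, perhaps (a plain Future for brevity; a 
CompletionService would wrap the same executor; illustrative only):

{code}
import java.net.InetAddress;
import java.util.concurrent.*;

public class HostnameLookupSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // Kick off the potentially slow DNS lookup without blocking init.
    Future<String> hostname = executor.submit(new Callable<String>() {
      public String call() throws Exception {
        return InetAddress.getLocalHost().getCanonicalHostName();
      }
    });
    try {
      System.out.println(hostname.get(1, TimeUnit.SECONDS));
    } catch (TimeoutException e) {
      System.out.println("(hostname lookup still pending)");
    }
    executor.shutdown(); // in Solr, tie this to CoreContainer shutdown
  }
}
{code}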

 hostname lookup in SystemInfoHandler should be refactored to not block core 
 (re)load
 

 Key: SOLR-5043
 URL: https://issues.apache.org/jira/browse/SOLR-5043
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man
 Attachments: SOLR-5043.patch


 SystemInfoHandler currently looks up the hostname of the machine on its init, 
 and caches it for its lifecycle -- there is a comment to the effect that the 
 reason for this is that on some machines (notably ones with wacky DNS 
 settings) looking up the hostname can take a long time in some JVMs...
 {noformat}
   // on some platforms, resolving canonical hostname can cause the thread
   // to block for several seconds if nameservices aren't available
   // so resolve this once per handler instance 
   //(ie: not static, so core reload will refresh)
 {noformat}
 But as we move forward with a lot more multi-core, solr-cloud, dynamically 
 updated instances, even paying this cost per core reload is expensive.
 We should refactor this so that SystemInfoHandler instances init 
 immediately, with some kind of lazy loading of the hostname info in a 
 background thread (especially since the only real point of having that info 
 here is for UI use, so you can keep track of which machine you are looking at).




[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

Upload patch: implemented IntersectEnum.next() & seekCeil().
Lots of nocommits, but it passes all tests.

The main idea is to run a DFS on the FST, and backtrack as early as
possible (i.e. when we see that a label is rejected by the automaton);
see the sketch at the end of this comment.

For this version, there is one explicit perf overhead: I use a 
real stack here, which can be replaced by a Frame[] to reuse objects.

There are several aspects I didn't dig into deeply: 

* currently, CompiledAutomaton provides a commonSuffixRef, but how
  can we make use of it in the FST?
* the DFS is somewhat a 'goto' version, i.e. we could make the code 
  cleaner with a single while-loop similar to a BFS search. 
  However, since the FST doesn't always tell us how many arcs are leaving 
  the current node, we have problems dealing with this...
* when the FST is large enough, the next() operation will take much time
  doing the linear arc read; maybe we should make use of 
  CompiledAutomaton.sortedTransition[] when leaving arcs are heavy.
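
In outline (a self-contained toy of the same DFS, with a trie standing in for 
the FST; recursion here for brevity, where the patch uses an explicit stack of 
Frames):

{code}
import java.util.*;

public class IntersectSketch {
  static class Node { Map<Character, Node> arcs = new TreeMap<Character, Node>(); boolean isFinal; }

  interface Automaton { int step(int state, char label); boolean isAccept(int state); }

  // Walk the trie depth-first, stepping the automaton in lockstep and
  // backtracking as soon as a label is rejected.
  static void intersect(Node node, int state, Automaton a, StringBuilder term) {
    for (Map.Entry<Character, Node> arc : node.arcs.entrySet()) {
      int dest = a.step(state, arc.getKey());
      if (dest == -1) continue;               // label rejected: prune early
      term.append(arc.getKey());              // descend
      if (arc.getValue().isFinal && a.isAccept(dest)) System.out.println(term);
      intersect(arc.getValue(), dest, a, term);
      term.deleteCharAt(term.length() - 1);   // backtrack
    }
  }

  public static void main(String[] args) {
    Node root = new Node(), a1 = new Node(), b = new Node(), x = new Node();
    b.isFinal = true; x.isFinal = true;       // terms: "ab", "ax"
    root.arcs.put('a', a1); a1.arcs.put('b', b); a1.arcs.put('x', x);
    Automaton acceptsAb = new Automaton() {   // accepts exactly "ab"
      public int step(int s, char c) { return (s == 0 && c == 'a') ? 1 : (s == 1 && c == 'b') ? 2 : -1; }
      public boolean isAccept(int s) { return s == 2; }
    };
    intersect(root, 0, acceptsAb, new StringBuilder()); // prints: ab
  }
}
{code}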


 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 4.4

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.




[jira] [Created] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test

2013-07-23 Thread Vadim Kirilchuk (JIRA)
Vadim Kirilchuk created SOLR-5067:
-

 Summary: TestReplicationHandler doTestReplicateAfterWrite2Slave 
bad test
 Key: SOLR-5067
 URL: https://issues.apache.org/jira/browse/SOLR-5067
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Reporter: Vadim Kirilchuk


Hi,

TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out 
code which actually performs the necessary assertions.

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java

While these assertions are commented out, the test checks nothing. Also, as 
index fetching starts in a new thread, it is worth performing fetchindex with 
the 'wait' parameter. (Previously Thread.sleep(n) was used here.)
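
For example (assuming the replication handler honors a 'wait' parameter, as 
suggested above; URL is illustrative):

{noformat}
http://slave-host:8983/solr/collection1/replication?command=fetchindex&wait=true
{noformat}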




[jira] [Updated] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test

2013-07-23 Thread Vadim Kirilchuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Kirilchuk updated SOLR-5067:
--

Description: 
Hi,

TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out 
code which actually performs the necessary assertions.

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java

While these assertions are commented out, the test checks nothing. Also, as 
index fetching starts in a new thread, it is worth performing fetchindex with 
the 'wait' parameter. (Previously Thread.sleep( n ) was used here.)

  was:
Hi,

TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out 
code which actually performs the necessary assertions.

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java

While these assertions are commented out, the test checks nothing. Also, as 
index fetching starts in a new thread, it is worth performing fetchindex with 
the 'wait' parameter. (Previously Thread.sleep(n) was used here.)





[jira] [Updated] (SOLR-5067) TestReplicationHandler doTestReplicateAfterWrite2Slave bad test

2013-07-23 Thread Vadim Kirilchuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Kirilchuk updated SOLR-5067:
--

Description: 
Hi,

TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out 
code which actually performs the necessary assertions.

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java

While these assertions are commented out, the test checks nothing. Also, as 
index fetching starts in a new thread, it is worth performing fetchindex with 
the 'wait' parameter.

  was:
Hi,

TestReplicationHandler#doTestReplicateAfterWrite2Slave has some commented-out 
code which actually performs the necessary assertions.

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/handler/TestReplicationHandler.java

While these assertions are commented out, the test checks nothing. Also, as 
index fetching starts in a new thread, it is worth performing fetchindex with 
the 'wait' parameter. (Previously Thread.sleep( n ) was used here.)





[jira] [Created] (SOLR-5068) ExtractingRequestHandler (via SolrContentHandler) doesn't add fields in schema-less mode

2013-07-23 Thread Erik Hatcher (JIRA)
Erik Hatcher created SOLR-5068:
--

 Summary: ExtractingRequestHandler (via SolrContentHandler) doesn't 
add fields in schema-less mode
 Key: SOLR-5068
 URL: https://issues.apache.org/jira/browse/SOLR-5068
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.4
Reporter: Erik Hatcher
 Fix For: 5.0, 4.5


SolrContentHandler checks against the schema before adding fields to documents. 
This does not work well in schema-less mode, where those fields are not yet defined.
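
The problematic pattern is roughly this (paraphrased, not the literal 
SolrContentHandler source):

{code}
// Paraphrase of the check described above: unknown fields are silently
// skipped, so in schema-less mode they never reach the add-fields processor.
if (schema.getFieldOrNull(name) == null) {
  return; // field not in schema: dropped
}
document.addField(name, value);
{code}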

Example, using empty managed schema and auto-field adding update processor:
{code}java -Dauto -Drecursive -jar post.jar ../../site/html/{code}

results in http://localhost:8983/solr/collection1/query?q=*:* -
{code}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"/Users/erikhatcher/solr-4.4.0/solr/example/exampledocs/../../site/html/tutorial.html",
        "_version_":1441348012271992832}]
  }}
{code}




Tokenizing on logical operators

2013-07-23 Thread dheerajjoshim
Greetings,

I am looking for a way to tokenize a String based on logical operators.

The String below needs to be tokenized:
*arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff*

Token 1: arg1:aaa,bbb
Token 2: arg2:ccc
Token 3: arg3:ddd,eee,fff

Later I want to fetch each token and tokenize it again on the ":" operator.

Is there a library already available, or should I create a custom
library for this?

If you could point at any similar examples, that would also help.

Regards
DJ 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenizing-on-logical-operators-tp4079667.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Tokenizing on logical operators

2013-07-23 Thread Otis Gospodnetic
Hi,

You should use the user list for this, not dev.

Have a look at Lucene's query parser.
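
If all you need is the split in your example, a quick sketch with plain regex 
splitting would do (note it drops the AND/OR operators themselves, which the 
query parser would preserve):

    String input = "arg1:aaa,bbb AND arg2:ccc OR arg3:ddd,eee,fff";
    for (String clause : input.split("\\s+(?:AND|OR)\\s+")) {
        String[] kv = clause.split(":", 2); // kv[0] = arg name, kv[1] = value list
        System.out.println(kv[0] + " -> " + kv[1]);
    }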

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm






[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716342#comment-13716342
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

Let me know how the pjoin is performing for you. I'm going to be testing out 
some different data structures for the pjoin to see if I can get better 
performance.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka "pjoin"*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin, but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the "to" and "from" core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 "join" to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string "pjoin" rather than "join".
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 "user:customer1". This query will generate a list of values from the "from" 
 field that will be used to filter the main query. Only records from the main 
 query, where the "to" field is present in the "from" list, will be included 
 in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 <queryParser name="pjoin" 
 class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml.
 <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 The solrconfig.xml in the "from" core must have the "join" SolrCache 
 configured.
 <cache name="join"
   class="solr.LRUCache"
   size="4096"
   initialSize="1024"/>
 *ValueSourceJoinParserPlugin aka "vjoin"*
 The second implementation is the ValueSourceJoinParserPlugin aka "vjoin". 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and a limiting query. The limiting query can 
 be used to select a specific subset of data from the join core. This allows 
 customer-specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the "vjoin" function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the "fromVal" from the "fromCore". The 
 "fromKey" and "toKey" are used to link the records from the main query to the 
 records in the "fromCore". The "query" is used to select a specific set of 
 records to join with in "fromCore".
 Currently the fromKey and toKey must be longs, but this will change in future 
 versions. Like the pjoin, the "join" SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 <valueSourceParser name="vjoin" 
 class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />


[jira] [Updated] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction

2013-07-23 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-5056:


Attachment: SOLR-5056.patch

Updated patch, with CHANGES entry and a test bugfix (TestHarness default 
solr.xml didn't specify a logwatcher parameter properly - bug found by being 
type safe!).  I'll commit shortly.

 Further clean up of ConfigSolr interface and CoreContainer construction
 ---

 Key: SOLR-5056
 URL: https://issues.apache.org/jira/browse/SOLR-5056
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-5056.patch, SOLR-5056.patch


 Makes ConfigSolr a bit more typesafe, and pushes a bunch of cloud-specific 
 config into ZkController.




[jira] [Commented] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716363#comment-13716363
 ] 

ASF subversion and git services commented on SOLR-5056:
---

Commit 1506020 from [~romseygeek] in branch 'dev/trunk'
[ https://svn.apache.org/r1506020 ]

SOLR-5056: Further cleanup of ConfigSolr API




[jira] [Commented] (SOLR-5056) Further clean up of ConfigSolr interface and CoreContainer construction

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716365#comment-13716365
 ] 

ASF subversion and git services commented on SOLR-5056:
---

Commit 1506022 from [~romseygeek] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1506022 ]

SOLR-5056: Further cleanup of ConfigSolr API

 Further clean up of ConfigSolr interface and CoreContainer construction
 ---

 Key: SOLR-5056
 URL: https://issues.apache.org/jira/browse/SOLR-5056
 Project: Solr
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward
Priority: Minor
 Attachments: SOLR-5056.patch, SOLR-5056.patch


 Makes ConfigSolr a bit more typesafe, and pushes a bunch of cloud-specific 
 config into ZkController.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716380#comment-13716380
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Initial performance results look like this (Solr restarted first, hence no 
caches at the beginning):

- with no cache: pjoin is 2-3 times faster than join
- with cache: pjoin is 3-4 times slower than join

Agreed with your idea: we should try other data structures and maybe take a 
look at the caching strategy used in pjoin.

Are the queries already running in parallel to find the intersection?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather than join.
 fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 <queryParser name="pjoin"
 class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml.
 <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 The solrconfig.xml in the from core must have the join SolrCache configured.
  <cache name="join"
   class="solr.LRUCache"
   size="4096"
   initialSize="1024"
   />
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs, but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 <valueSourceParser name="vjoin"
 class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />
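 To make the PostFilter mechanics above concrete, here is a minimal sketch 
 (not the attached patch; the key set, field name, and class name are 
 illustrative) of a DelegatingCollector that only passes through main-query 
 matches whose join key was found in the other core:
 {code:java}
 import java.io.IOException;
 import java.util.Set;
 
 import org.apache.lucene.index.AtomicReaderContext;
 import org.apache.lucene.search.FieldCache;
 import org.apache.lucene.search.IndexSearcher;
 import org.apache.solr.search.DelegatingCollector;
 import org.apache.solr.search.ExtendedQueryBase;
 import org.apache.solr.search.PostFilter;
 
 // Sketch of a post-filter join: fromKeys would be gathered by running the
 // query against the "from" core; toField is a long field in the main core.
 public class SketchJoinFilter extends ExtendedQueryBase implements PostFilter {
 
   private final Set<Long> fromKeys;
   private final String toField;
 
   public SketchJoinFilter(Set<Long> fromKeys, String toField) {
     this.fromKeys = fromKeys;
     this.toField = toField;
     setCost(100); // cost >= 100 makes Solr run this as a post filter
   }
 
   @Override
   public boolean getCache() {
     return false; // post filters are not stored in the filterCache
   }
 
   @Override
   public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
     return new DelegatingCollector() {
       private FieldCache.Longs toKeys;
 
       @Override
       public void setNextReader(AtomicReaderContext context) throws IOException {
         // per-segment join keys come from the FieldCache
         toKeys = FieldCache.DEFAULT.getLongs(context.reader(), toField, false);
         super.setNextReader(context);
       }
 
       @Override
       public void collect(int doc) throws IOException {
         // only documents that matched the main query reach this point
         if (fromKeys.contains(toKeys.get(doc))) {
           super.collect(doc);
         }
       }
     };
   }
 }
 {code}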

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Feihong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feihong Huang updated SOLR-5057:


Attachment: SOLR-5057.patch

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.
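 One way to get there (a sketch of the general idea only, not the attached 
 patch) is to sort the parsed fq queries into a canonical order before they 
 are folded into the queryResultCache key, so equal filter sets always 
 produce equal keys:
 {code:java}
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;
 
 import org.apache.lucene.search.Query;
 
 // Sketch: canonicalize the filter list so the cache key no longer depends
 // on the order in which the fq parameters arrived.
 public final class FilterOrderNormalizer {
 
   private FilterOrderNormalizer() {}
 
   public static List<Query> canonicalOrder(List<Query> filters) {
     List<Query> sorted = new ArrayList<Query>(filters);
     Collections.sort(sorted, new Comparator<Query>() {
       @Override
       public int compare(Query a, Query b) {
         // any total order that is stable across requests will do; hashCode
         // with a toString tie-break is one cheap choice
         int ha = a.hashCode(), hb = b.hashCode();
         if (ha != hb) {
           return ha < hb ? -1 : 1;
         }
         return a.toString().compareTo(b.toString());
       }
     });
     return sorted;
   }
 }
 {code}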

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Feihong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716404#comment-13716404
 ] 

Feihong Huang commented on SOLR-5057:
-

Hi, Erickson. Thank you for your comments. Patch attached, with a new test. If 
it is OK, I'll commit shortly.

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Noble Paul (JIRA)
Noble Paul created SOLR-5069:


 Summary: MapReduce for SolrCloud
 Key: SOLR-5069
 URL: https://issues.apache.org/jira/browse/SOLR-5069
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul


Solr currently does not have a way to run long-running computational tasks 
across the cluster. We can piggyback on the mapreduce paradigm so that users 
have a smooth learning curve.

 * The mapreduce component will be written as a RequestHandler in Solr
 * Works only in SolrCloud mode. (No support for standalone mode) 
 * Users can write MapReduce programs in Javascript or Java. First cut would be 
JS ( ? )

h1. sample word count program

h2.how to invoke?

http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params 
* map :  A javascript implementation of the map program
* reduce : a Javascript implementation of the reduce program
* sink : The collection to which the output is written. If no sink is passed, 
the request is redirected to the reduce node, waits till the process is 
complete, and the output of the reduce program is emitted as a standard solr 
response. If the sink param is passed, the response will contain an id of the 
run which can be used to query the status in another command.
* reduceNode : Node name where the reduce is run. If not passed, an arbitrary 
node is chosen


The node which received the command would first identify one replica from each 
slice where the map program is executed. It will also identify one other node 
from the same collection where the reduce program is run. Each run is given an 
id, and the details of the nodes participating in the run will be written to 
ZK (as an ephemeral node). 

h4. map script 

{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
while(res.hasMore()){
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for(i = 0; i < words.length; i++){
    $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
  }
}
{code}

Essentially two threads are created in the 'map' hosts: one for running the 
program and the other for coordinating with the 'reduce' host. The maps 
emitted are streamed live over an HTTP connection to the reduce program.

h4. reduce script

This script is run in one node. This node accepts HTTP connections from map 
nodes, and the 'maps' that are sent are collected in a queue which is polled 
and fed into the reduce program. It also keeps the 'reduced' data in memory 
till the whole run is complete. It expects a done message from all 'map' 
nodes before it declares the tasks complete. After the reduce program is 
executed for all the input, it proceeds to write out the result to the 'sink' 
collection, or it is written straight out to the response.

{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if(count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}

TBD
* The format in which the output is written to the target collection; I assume 
the reducedMap will have values mapping to the schema of the collection.


 



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716411#comment-13716411
 ] 

Erick Erickson commented on SOLR-5057:
--

I don't think you have commit rights <G>. One of the committers will have 
to pick it up. And _everyone_ is swamped, so it may take some gentle nudging.

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

2013-07-23 Thread Andrew Muldowney (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716428#comment-13716428
 ] 

Andrew Muldowney commented on SOLR-2894:


Yes

 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.5

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-2894) Implement distributed pivot faceting

2013-07-23 Thread Andrew Muldowney (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716428#comment-13716428
 ] 

Andrew Muldowney edited comment on SOLR-2894 at 7/23/13 2:38 PM:
-

Yes, it should

  was (Author: andrew.muldowney):
Yes
  
 Implement distributed pivot faceting
 

 Key: SOLR-2894
 URL: https://issues.apache.org/jira/browse/SOLR-2894
 Project: Solr
  Issue Type: Improvement
Reporter: Erik Hatcher
 Fix For: 4.5

 Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
 SOLR-2894.patch, SOLR-2894-reworked.patch


 Following up on SOLR-792, pivot faceting currently only supports 
 undistributed mode.  Distributed pivot faceting needs to be implemented.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Feihong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716433#comment-13716433
 ] 

Feihong Huang commented on SOLR-5057:
-

Well, thank you for your reply. I am interested in contributing my work to Solr.

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5057:
-

Attachment: SOLR-5057.patch

Moved test to pre-existing file.

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch, SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716459#comment-13716459
 ] 

Erick Erickson commented on SOLR-5057:
--

Didn't mean to sound like it wouldn't be done, just making you aware that you 
only have read-only access to the repository and one of the committers has to 
pick it up and commit it.

That said, I took a quick look at it and it looks reasonable, I've assigned it 
to myself. I rearranged things a bit (I think the test you wrote fits better in 
a pre-existing file), I'll attach the change momentarily.

Do you think you could extend this for the filterCache too? That way we'd be 
able to re-use the filterCache when the fq clauses were ordered differently.

[~yo...@apache.org] [~hossman_luc...@fucit.org] I've gotten myself in trouble 
by not understanding the nuances of query semantics; do you see a problem with 
this approach? It seems like an easy win, which makes me nervous that it hasn't 
been done before <G>...

Erick

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Priority: Minor
 Attachments: SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-5057:


Assignee: Erick Erickson

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-5057.patch, SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry after case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not be related to the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716486#comment-13716486
 ] 

Andrzej Bialecki  commented on SOLR-5069:
-

Exciting idea! Almost as exciting as SolrCloud on MapReduce :)

A few comments:
# distributed map-reduce in reality is a sequence of:
## split input and assign splits to M nodes
## apply map() on M nodes in parallel
##* for large datasets the emitted data from mappers is spooled to disk
## shuffle - i.e. partition and ship emitted data from M mappers into N 
reducers
##* (wait until all mappers are done, so that each partition's key-space is 
complete)
## sort by key in each of N reducers, collecting values for each key
##* again, for large datasets this is a disk-based sort
## apply N reducers in parallel and emit final output (in N parts)
# if I understand it correctly the model that you presented has some 
limitations:
## as many input splits as there are shards (and consequently as many mappers)
## single reducer. Theoretically it should be possible to use N nodes to act as 
reducers if you implement the concept of partitioner - this would cut down the 
memory load on each reducer node. Of course, streaming back the results would 
be a challenge, but saving them into a collection should work just fine.
## no shuffling - all data from mappers will go to a single reducer
## no intermediate storage of data, all intermediate values need to fit in 
memory
## what about the sorting phase? I assume it's an implicit function in the 
reducedMap (treemap?)
# since all fine-grained emitted values from map end up being sent to 1 
reducer, which has to collect all this data in memory first before applying the 
reduce() op, the concept of a map-side combiner seems useful, to be able to 
quickly minimize the amount of data to be sent to reducer.
# it would be very easy to OOM your Solr nodes at the reduce phase. There 
should be some built-in safety mechanism for this.
# what parts of Solr are available in the script's context? Making all Solr API 
available could lead to unpredictable side-effects, so this set of APIs needs 
to be curated. E.g. I think it would make sense to make analyzer factories 
available.

And finally, an observation: regular distributed search can be viewed as a 
special case of map-reduce computation ;)

 MapReduce for SolrCloud
 ---

 Key: SOLR-5069
 URL: https://issues.apache.org/jira/browse/SOLR-5069
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

 Solr currently does not have a way to run long-running computational tasks 
 across the cluster. We can piggyback on the mapreduce paradigm so that users 
 have a smooth learning curve.
  * The mapreduce component will be written as a RequestHandler in Solr
  * Works only in SolrCloud mode. (No support for standalone mode) 
  * Users can write MapReduce programs in Javascript or Java. First cut would 
 be JS ( ? )
 h1. sample word count program
 h2.how to invoke?
 http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
 h3. params 
 * map :  A javascript implementation of the map program
 * reduce : a Javascript implementation of the reduce program
 * sink : The collection to which the output is written. If no sink is passed, 
 the request is redirected to the reduce node, waits till the process is 
 complete, and the output of the reduce program is emitted as a standard solr 
 response. If the sink param is passed, the response will contain an id of the 
 run which can be used to query the status in another command.
 * reduceNode : Node name where the reduce is run. If not passed, an arbitrary 
 node is chosen
 The node which received the command would first identify one replica from 
 each slice where the map program is executed. It will also identify one other 
 node from the same collection where the reduce program is run. Each run is 
 given an id, and the details of the nodes participating in the run will be 
 written to ZK (as an ephemeral node). 
 h4. map script 
 {code:JavaScript}
 var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
 while(res.hasMore()){
   var doc = res.next();
   var txt = doc.get("txt"); // the field on which word count is performed
   var words = txt.split(" ");
   for(i = 0; i < words.length; i++){
     $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
   }
 }
 {code}
 Essentially two threads are created in the 'map' hosts: one for running the 
 program and the other for coordinating with the 'reduce' host. The maps 
 emitted are streamed live over an HTTP connection to the reduce 

[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
--

Attachment: LUCENE-3069.patch

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 4.4

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement, yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds an FST from the entire term, 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716577#comment-13716577
 ] 

Michael McCandless commented on LUCENE-3069:


Patch looks great!  Wonderful how you were able to share some code in
BaseTermsEnum...

It looks like you impl'd seekCeil in general for the IntersectEnum?  Wild :)

You should not need to .getPosition / .setPosition on the fstReader:
the FST APIs do this under-the-hood.

bq. currently, CompiledAutomaton provides a commonSuffixRef, but how can we 
make use of it in FST?

I think we can't really make use of it, which is fine (it's an
optional optimization).

{quote}
when the FST is large enough, the next() operation will take much time
doing the linear arc read; maybe we should make use of 
CompiledAutomaton.sortedTransition[] when leaving arcs are heavy.
{quote}

Interesting ... you mean e.g. if the Automaton is very restrictive
compared to the FST, then we can do a binary search.  But this can
only be done if that FST node's arcs are array'd, right?

Separately, supporting ord w/ FST terms dict should in theory be not
so hard; you'd need to use getByOutput to seek by ord.  Maybe (later,
eventually) we can make this a write-time option.  We should open a
separate issue ...
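
For reference, seek-by-ord maps onto an existing FST utility; a sketch, 
assuming an FST<Long> whose output for each term is its ord (so outputs grow 
monotonically in term order - an assumption, not the attached patch):

{code:java}
import java.io.IOException;

import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.Util;

final class OrdSeek {

  private OrdSeek() {}

  // Resolve the term (as an IntsRef over its input labels) whose output
  // equals targetOrd. Util.getByOutput requires outputs that grow
  // monotonically along the input order, and returns null when no input
  // maps to that output.
  static IntsRef termForOrd(FST<Long> fst, long targetOrd) throws IOException {
    return Util.getByOutput(fst, targetOrd);
  }
}
{code}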


 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 4.4

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement, yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds an FST from the entire term, 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[ANNOUNCE] Apache Solr 4.4 released

2013-07-23 Thread Steve Rowe
July 2013, Apache Solr™ 4.4 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.4

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.4 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.4 Release Highlights:

* Solr indexes and transaction logs may be stored in HDFS with full read/write
  capability.

* Schemaless mode: Added support for a mode that requires no up-front schema
  modifications, in which previously unknown fields' types are guessed based
  on the values in added/updated documents, and are then added to the schema
  prior to processing the update.  Note that the below-described features
  are also useful independently from schemaless mode operation.   
  * New Parse{Date,Integer,Long,Float,Double,Boolean}UpdateProcessorFactory
classes parse/guess the field value class for String-valued and unknown
fields.
  * New AddSchemaFieldsUpdateProcessor: Automatically add new fields to the
schema when adding/updating documents with unknown fields. Custom rules
map field value class(es) to schema fieldTypes.
  * A new schemaless mode example configuration, using the above-described 
field-value-class-guessing and unknown-field-schema-addition features,
is provided at solr/example/example-schemaless/.

* Core Discovery mode: A new solr.xml format which does not store core
  information, but instead searches for files named 'core.properties' in
  the filesystem which tell Solr all the details about that core.  The main
  example and the schemaless example both use this new format; a minimal
  core.properties is sketched after this list.

* Schema REST API: Add support for creating copy fields.

* A merged segment warmer may now be plugged into solrconfig.xml. 

* New MaxScoreQParserPlugin: Return max() instead of sum() of terms.

* Binary files are now supported in ZooKeeper.

* SolrJ's SolrPing object has new methods for ping, enable, and disable.

* The Admin UI now supports adding documents to Solr.

* Added a PUT command to the Solr ZkCli tool.

* New deleteshard collections API that unloads all replicas of a given
  shard and then removes it from the cluster state. It will remove only
  those shards which are INACTIVE or have no range.

* The Overseer can now optionally assign generic node names so that
  new addresses can host shards without naming confusion.

* The CSV Update Handler now supports optionally adding the line number/
  row id to a document.

* Added a new system wide info admin handler that exposes the system info
  that could previously only be retrieved using a SolrCore.
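
A minimal core.properties for Core Discovery, as promised above (the values
are only an illustration; the commented keys are optional):

  name=collection1
  # dataDir=/var/data/solr/collection1
  # loadOnStartup=true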

Solr 4.4 also includes many other new features as well as numerous
optimizations and bugfixes.

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html)

In the coming days, we will also be announcing the first official Solr 
Reference Guide available for download.  In the meantime, users are 
encouraged to browse the online version and post comments and suggestions on 
the documentation: 
  https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[ANNOUNCE] Apache Lucene 4.4 released

2013-07-23 Thread Steve Rowe
July 2013, Apache Lucene™ 4.4 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.4

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release
is available for immediate download at:
  http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 4.4 Release Highlights:

* New Replicator module: replicate index revisions between server and
  client. See http://shaierera.blogspot.com/2013/05/the-replicator.html

* New AnalyzingInfixSuggester: finds suggestions based on matches to any
  tokens in the suggestion, not just based on pure prefix matching.  See
  
http://blog.mikemccandless.com/2013/06/a-new-lucene-suggester-based-on-infix.html

* New PatternCaptureGroupTokenFilter: emit multiple tokens, one for each 
  capture group in one or more Java regexes.

* New Lucene Facet module features: 
  * Added dynamic (no taxonomy index used) numeric range faceting (see
http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html )
  * Arbitrary Querys are now allowed for per-dimension drill-down on
DrillDownQuery and DrillSideways, to support future dynamic faceting.
  * New FacetResult.mergeHierarchies: merge multiple FacetResult of the
same dimension into a single one with the reconstructed hierarchy.

* FST's Builder can now handle more than 2.1 billion tail nodes while
  building a minimal FST.

* FieldCache Ints and Longs now use bit-packing to save memory. String fields
  have more efficient compression if there are many unique terms.

* Improved compression for NumericDocValues for dates and fields with very
  small numbers of unique values.

* New IndexWriter.hasUncommittedChanges(): returns true if there are changes
  that have not been committed.

* multiValuedSeparator in PostingsHighlighter is now configurable, for cases
  where you want a different logical separator between field values.

* NorwegianLightStemFilter and NorwegianMinimalStemFilter have been extended 
  to handle nynorsk.

* New ScandinavianFoldingFilter and ScandinavianNormalizationFilter.

* Easier compressed norms: Lucene42NormsFormat now takes an overhead
  parameter, allowing for values other than PackedInts.FASTEST.
  
* Analyzer now has an additional tokenStream(String fieldName, String text)
  method, so wrapping by StringReader for common use is no longer needed;
  a short example follows this list.

* New SimpleMergedSegmentWarmer: just ensures that data structures
  (terms, norms, docvalues, etc.) are initialized.

* IndexWriter flushes segments to the compound file format by default.

* Various bugfixes and optimizations since the 4.3.1 release.
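
As an aside to the Analyzer item above, here is a short self-contained
example of the new method (the analyzer choice, field name, and sample text
are arbitrary):

  import java.io.IOException;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
  import org.apache.lucene.util.Version;

  public class TokenStreamDemo {
    public static void main(String[] args) throws IOException {
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
      // the new overload: no StringReader wrapping for plain String input
      TokenStream ts = analyzer.tokenStream("body", "Lucene 4.4 is out");
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString());
      }
      ts.end();
      ts.close();
    }
  }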

Please read CHANGES.txt for a full list of new features.

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2013-07-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1301:
--

Affects Version/s: (was: 1.4)
Fix Version/s: (was: 4.4)
   4.5
   5.0
 Assignee: Mark Miller
   Issue Type: New Feature  (was: Improvement)
  Summary: Add a Solr contrib that allows for building Solr indexes 
via Hadoop's Map-Reduce.  (was: Solr + Hadoop)

 Add a Solr contrib that allows for building Solr indexes via Hadoop's 
 Map-Reduce.
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: New Feature
Reporter: Andrzej Bialecki 
Assignee: Mark Miller
 Fix For: 5.0, 4.5

 Attachments: commons-logging-1.0.4.jar, 
 commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
 hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
 log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, 
 SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SolrRecordWriter.java


 This patch contains  a contrib module that provides distributed indexing 
 (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
 twofold:
 * provide an API that is familiar to Hadoop developers, i.e. that of 
 OutputFormat
 * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
 SolrOutputFormat consumes data produced by reduce tasks directly, without 
 storing it in intermediate files. Furthermore, by using an 
 EmbeddedSolrServer, the indexing task is split into as many parts as there 
 are reducers, and the data to be indexed is not sent over the network.
 Design
 --
 Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
 which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
 instantiates an EmbeddedSolrServer, and it also instantiates an 
 implementation of SolrDocumentConverter, which is responsible for turning 
 Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
 batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
 task completes, and the OutputFormat is closed, SolrRecordWriter calls 
 commit() and optimize() on the EmbeddedSolrServer.
 The API provides facilities to specify an arbitrary existing solr.home 
 directory, from which the conf/ and lib/ files will be taken.
 This process results in the creation of as many partial Solr home directories 
 as there were reduce tasks. The output shards are placed in the output 
 directory on the default filesystem (e.g. HDFS). Such part-N directories 
 can be used to run N shard servers. Additionally, users can specify the 
 number of reduce tasks, in particular 1 reduce task, in which case the output 
 will consist of a single shard.
 An example application is provided that processes large CSV files and uses 
 this API. It uses a custom CSV processing to avoid (de)serialization overhead.
 This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
 issue, you should put it in contrib/hadoop/lib.
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.
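 To make the design concrete, a converter along the lines described above 
 might look like this sketch (the SolrDocumentConverter signature is assumed 
 from the description, and the field names are hypothetical):
 {code:java}
 import java.util.Collection;
 import java.util.Collections;
 
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.solr.common.SolrInputDocument;
 
 // Sketch: turn one Hadoop (key, value) pair into SolrInputDocuments, as
 // SolrRecordWriter expects. Shape follows the description, not the patch.
 public class LineDocumentConverter extends SolrDocumentConverter<LongWritable, Text> {
   @Override
   public Collection<SolrInputDocument> convert(LongWritable key, Text value) {
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField("id", key.toString());     // hypothetical id field
     doc.addField("text", value.toString()); // hypothetical content field
     return Collections.singletonList(doc);
   }
 }
 {code}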

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716621#comment-13716621
 ] 

Mark Miller commented on SOLR-1301:
---

As I mentioned above, Cloudera has done a lot to move this issue forward. 
I've been working on converting the build system from maven to ivy+ant and 
will post my current progress before long.

 Add a Solr contrib that allows for building Solr indexes via Hadoop's 
 Map-Reduce.
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: New Feature
Reporter: Andrzej Bialecki 
Assignee: Mark Miller
 Fix For: 5.0, 4.5

 Attachments: commons-logging-1.0.4.jar, 
 commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, 
 hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, 
 log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, 
 SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
 SOLR-1301.patch, SolrRecordWriter.java


 This patch contains  a contrib module that provides distributed indexing 
 (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
 twofold:
 * provide an API that is familiar to Hadoop developers, i.e. that of 
 OutputFormat
 * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
 SolrOutputFormat consumes data produced by reduce tasks directly, without 
 storing it in intermediate files. Furthermore, by using an 
 EmbeddedSolrServer, the indexing task is split into as many parts as there 
 are reducers, and the data to be indexed is not sent over the network.
 Design
 --
 Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
 which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
 instantiates an EmbeddedSolrServer, and it also instantiates an 
 implementation of SolrDocumentConverter, which is responsible for turning 
 Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
 batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
 task completes, and the OutputFormat is closed, SolrRecordWriter calls 
 commit() and optimize() on the EmbeddedSolrServer.
 The API provides facilities to specify an arbitrary existing solr.home 
 directory, from which the conf/ and lib/ files will be taken.
 This process results in the creation of as many partial Solr home directories 
 as there were reduce tasks. The output shards are placed in the output 
 directory on the default filesystem (e.g. HDFS). Such part-N directories 
 can be used to run N shard servers. Additionally, users can specify the 
 number of reduce tasks, in particular 1 reduce task, in which case the output 
 will consist of a single shard.
 An example application is provided that processes large CSV files and uses 
 this API. It uses a custom CSV processing to avoid (de)serialization overhead.
 This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
 issue, you should put it in contrib/hadoop/lib.
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716624#comment-13716624
 ] 

Noble Paul commented on SOLR-5069:
--

Thanks Andrzej

I started off with a simple model so that version 1 can be implemented 
easily.

'N' reducers add to the implementation complexity. However, it should be done 
eventually.

bq.no intermediate storage of data, all intermediate values need to fit in 
memory

Yes, in my model the mappers will be throttled so that we can cap the amount 
of intermediate data kept in memory. The $.map() call would wait if the size 
threshold is reached.
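
That throttling can be pictured as a bounded queue sitting between the 
script's $.map() call and the thread streaming pairs to the reduce node (a 
sketch of the concept only; the class and method names are hypothetical):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: $.map() would put emitted pairs on a bounded queue. When the
// queue is full the mapper blocks, which caps the intermediate data held
// in memory while a second thread drains pairs to the reduce host.
public class EmitBuffer {

  private final BlockingQueue<Object[]> queue;

  public EmitBuffer(int capacity) {
    this.queue = new ArrayBlockingQueue<Object[]>(capacity);
  }

  // called by the map script; blocks when the size threshold is reached
  public void emit(Object key, Object value) throws InterruptedException {
    queue.put(new Object[] { key, value });
  }

  // called by the streaming thread feeding the reduce host
  public Object[] nextPair() throws InterruptedException {
    return queue.take();
  }
}
{code}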

bq. what about the sorting phase? I assume it's an implicit function in the 
reducedMap (treemap?)

We should have a choice of how to sort the map. Yes, some kind of sorted map 
should be offered, probably sorted on some key's value in the map.


bq.it would be very easy to OOM your Solr nodes at the reduce phase. 

Sure, here the idea is to do some overflow to disk beyond a threshold. With 
memory getting abundant, we should probably use some off-heap solution so 
that the reduce is not I/O bound.

bq.what parts of Solr are available in the script's context

Good that you asked. We should keep the APIs available limited. For instance, 
anything that can alter the state of the system should not be exposed to the 
script; anything that can help with processing/manipulating data should be 
exposed.


 MapReduce for SolrCloud
 ---

 Key: SOLR-5069
 URL: https://issues.apache.org/jira/browse/SOLR-5069
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

 Solr currently does not have a way to run long-running computational tasks 
 across the cluster. We can piggyback on the mapreduce paradigm so that users 
 have a smooth learning curve.
  * The mapreduce component will be written as a RequestHandler in Solr
  * Works only in SolrCloud mode. (No support for standalone mode) 
  * Users can write MapReduce programs in Javascript or Java. First cut would 
 be JS ( ? )
 h1. sample word count program
 h2.how to invoke?
 http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
 h3. params 
 * map :  A javascript implementation of the map program
 * reduce : a Javascript implementation of the reduce program
 * sink : The collection to which the output is written. If no sink is passed, 
 the request is redirected to the reduce node, waits till the process is 
 complete, and the output of the reduce program is emitted as a standard solr 
 response. If the sink param is passed, the response will contain an id of the 
 run which can be used to query the status in another command.
 * reduceNode : Node name where the reduce is run. If not passed, an arbitrary 
 node is chosen
 The node which received the command would first identify one replica from 
 each slice where the map program is executed. It will also identify one other 
 node from the same collection where the reduce program is run. Each run is 
 given an id, and the details of the nodes participating in the run will be 
 written to ZK (as an ephemeral node). 
 h4. map script 
 {code:JavaScript}
 var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
 while(res.hasMore()){
   var doc = res.next();
   var txt = doc.get("txt"); // the field on which word count is performed
   var words = txt.split(" ");
   for(i = 0; i < words.length; i++){
     $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
   }
 }
 {code}
 Essentially two threads are created in the 'map' hosts: one for running the 
 program and the other for coordinating with the 'reduce' host. The maps 
 emitted are streamed live over an HTTP connection to the reduce program.
 h4. reduce script
 This script is run in one node. This node accepts HTTP connections from map 
 nodes, and the 'maps' that are sent are collected in a queue which is polled 
 and fed into the reduce program. It also keeps the 'reduced' data in memory 
 till the whole run is complete. It expects a done message from all 'map' 
 nodes before it declares the tasks complete. After the reduce program is 
 executed for all the input, it proceeds to write out the result to the 
 'sink' collection, or it is written straight out to the response.
 {code:JavaScript}
 var pair = $.nextMap();
 var reduced = $.getCtx().getReducedMap(); // a hashmap
 var count = reduced.get(pair.key());
 if(count === null) {
   count = {"count": 0};
   reduced.put(pair.key(), count);
 }
 count.count += pair.val().count;
 {code}
 TBD
 * The format in which the output is written to the target collection, I 
 assume 

[jira] [Commented] (SOLR-4408) Server hanging on startup

2013-07-23 Thread Brendan Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716626#comment-13716626
 ] 

Brendan Grainger commented on SOLR-4408:


Having the same issue here. Solr 4.3.1

 Server hanging on startup
 -

 Key: SOLR-4408
 URL: https://issues.apache.org/jira/browse/SOLR-4408
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
 Environment: OpenJDK 64-Bit Server VM (23.2-b09 mixed mode)
 Tomcat 7.0
 Eclipse Juno + WTP
Reporter: Francois-Xavier Bonnet
Assignee: Erick Erickson
 Fix For: 4.4

 Attachments: patch-4408.txt


 While starting, the server hangs indefinitely. Everything works fine when I 
 first start the server with no index created yet, but if I fill the index and 
 then stop and start the server, it hangs. Could it be a lock that is never 
 released?
 Here is what I get in a full thread dump:
 2013-02-06 16:28:52
 Full thread dump OpenJDK 64-Bit Server VM (23.2-b09 mixed mode):
 searcherExecutor-4-thread-1 prio=10 tid=0x7fbdfc16a800 nid=0x42c6 in 
 Object.wait() [0x7fbe0ab1]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:94)
   at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:213)
   at 
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:112)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:203)
   at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:180)
   at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
   at 
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:64)
   at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1594)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:722)
 coreLoadExecutor-3-thread-1 prio=10 tid=0x7fbe04194000 nid=0x42c5 in 
 Object.wait() [0x7fbe0ac11000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 0xc34c1c48 (a java.lang.Object)
   at java.lang.Object.wait(Object.java:503)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1492)
   - locked 0xc34c1c48 (a java.lang.Object)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1312)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1247)
   at 
 org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:495)
   at 
 org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:518)
   at 
 org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:232)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:333)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:319)
   at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:512)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:140)
   at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:51)
   at 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:636)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:809)
   at org.apache.solr.core.SolrCore.init(SolrCore.java:607)
   at 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)

[jira] [Updated] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5069:
-

Description: 
Solr currently does not have a way to run long-running computational tasks 
across the cluster. We can piggyback on the MapReduce paradigm so that users 
have a smooth learning curve.

 * The mapreduce component will be written as a RequestHandler in Solr
 * Works only in SolrCloud mode. (No support for standalone mode) 
 * Users can write MapReduce programs in JavaScript or Java. First cut would be 
JS (?)

h1. sample word count program

h2. How to invoke?

http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params 
* map : a JavaScript implementation of the map program
* reduce : a JavaScript implementation of the reduce program
* sink : the collection to which the output is written. If no sink is passed, 
the request is redirected to the reduce node, where it waits till the process 
is complete, and the output of the reduce program is emitted as a standard 
Solr response. If the sink param is passed, the response will contain an id of 
the run which can be used to query the status in another command.
* reduceNode : node name where the reduce is run. If not passed, an arbitrary 
node is chosen.


The node which receives the command first identifies one replica from each 
slice where the map program is executed. It also identifies another node from 
the same collection where the reduce program is run. Each run is given an id, 
and the details of the nodes participating in the run are written to ZK (as an 
ephemeral node). 

h4. map script 

{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster; only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (i = 0; i < words.length; i++) {
    $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
  }
}
{code}

Essentially two threads are created on the 'map' hosts: one for running the 
program and the other for coordinating with the 'reduce' host. The maps 
emitted are streamed live over an HTTP connection to the reduce program.

h4. reduce script

This script is run on one node. That node accepts HTTP connections from the 
map nodes, and the 'maps' that are sent are collected in a queue which is 
polled and fed into the reduce program. It also keeps the 'reduced' data in 
memory till the whole run is complete. It expects a 'done' message from all 
'map' nodes before it declares the task complete. After the reduce program 
has been executed for all the input, it proceeds to write out the result to 
the 'sink' collection, or the result is written straight out to the response.

{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}

h4. example output
{code:JavaScript}
{
  "result": {
    "wordx": {
      "count": 15876765
    },
    "wordy": {
      "count": 24657654
    }
  }
}
{code}

TBD
* The format in which the output is written to the target collection; I assume 
the reducedMap will have values mapping to the schema of the collection.


 



  was:
Solr currently does not have a way to run long running computational tasks 
across the cluster. We can piggyback on the mapreduce paradigm so that users 
have smooth learning curve.

 * The mapreduce component will be written as a RequestHandler in Solr
 * Works only in SolrCloud mode. (No support for standalone mode) 
 * Users can write MapReduce programs in Javascript or Java. First cut would be 
JS ( ? )

h1. sample word count program

h2.how to invoke?

http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params 
* map :  A javascript implementation of the map program
* reduce : a Javascript implementation of the reduce program
* sink : The collection to which the output is written. If this is not passed , 
the request will wait till completion and respond with the output of the reduce 
program and will be emitted as a standard solr response. . If no sink is passed 
the request will be redirected to the reduce node where it will wait till the 
process is complete. If the sink param is passed ,the rsponse will contain an 
id of the run which can be used to query the status in another command.
* reduceNode : Node name where the reduce is run . If not passed an arbitrary 
node is chosen


The node which received the command would first identify one replica from each 
slice where the map program is executed . It will also identify one another 
node from the same 

[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716655#comment-13716655
 ] 

Andrzej Bialecki  commented on SOLR-5069:
-

bq. Sure, here the idea is to do some overflow to disk beyond a threshold.
Berkeley DB, db4o, the Apache-licensed MapDB (mapdb.org), and probably 
others all provide a persistent Java Collections API. We could use one of 
these - you could add a provider mechanism to separate the actual 
implementation from the plain Collections API.

bq.  $.map() call would wait if the size threshold is reached
Throttling the mappers wouldn't help with OOM on the reduce() side - reduce() 
can start only when all mappers are finished. I think a map-side combiner would 
be much more helpful, if possible (reductions that are not simple aggregations 
usually can't be performed in map-side combiners).
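
(For illustration only - not from any patch here - the combiner idea in plain 
Java terms: buffer counts locally and emit one aggregated pair per distinct 
key instead of one pair per occurrence. The Emitter interface below is 
hypothetical and just stands in for the proposed $.map() call.)

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-side combiner sketch: aggregate locally, flush once.
class WordCountCombiner {
  interface Emitter { void emit(String key, int count); } // stand-in for $.map()

  private final Map<String, Integer> buffer = new HashMap<String, Integer>();

  void collect(String word) {
    Integer c = buffer.get(word);
    buffer.put(word, c == null ? 1 : c + 1);
  }

  void flush(Emitter emitter) {
    for (Map.Entry<String, Integer> e : buffer.entrySet()) {
      emitter.emit(e.getKey(), e.getValue()); // far fewer pairs cross the wire
    }
    buffer.clear();
  }
}
{code}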

 MapReduce for SolrCloud
 ---

 Key: SOLR-5069
 URL: https://issues.apache.org/jira/browse/SOLR-5069
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

 Solr currently does not have a way to run long running computational tasks 
 across the cluster. We can piggyback on the mapreduce paradigm so that users 
 have smooth learning curve.
  * The mapreduce component will be written as a RequestHandler in Solr
  * Works only in SolrCloud mode. (No support for standalone mode) 
  * Users can write MapReduce programs in Javascript or Java. First cut would 
 be JS ( ? )
 h1. sample word count program
 h2.how to invoke?
 http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
 h3. params 
 * map :  A javascript implementation of the map program
 * reduce : a Javascript implementation of the reduce program
 * sink : The collection to which the output is written. If this is not passed 
 , the request will wait till completion and respond with the output of the 
 reduce program and will be emitted as a standard solr response. . If no sink 
 is passed the request will be redirected to the reduce node where it will 
 wait till the process is complete. If the sink param is passed ,the rsponse 
 will contain an id of the run which can be used to query the status in 
 another command.
 * reduceNode : Node name where the reduce is run . If not passed an arbitrary 
 node is chosen
 The node which received the command would first identify one replica from 
 each slice where the map program is executed . It will also identify one 
 another node from the same collection where the reduce program is run. Each 
 run is given an id and the details of the nodes participating in the run will 
 be written to ZK (as an ephemeral node). 
 h4. map script 
 {code:JavaScript}
 var res = $.streamQuery("*:*"); // this is not run across the cluster; only on this index
 while (res.hasMore()) {
   var doc = res.next();
   var txt = doc.get("txt"); // the field on which word count is performed
   var words = txt.split(" ");
   for (i = 0; i < words.length; i++) {
     $.map(words[i], {'count': 1}); // this will send the map over to the reduce host
   }
 }
 {code}
 Essentially two threads are created in the 'map' hosts . One for running the 
 program and the other for co-ordinating with the 'reduce' host . The maps 
 emitted are streamed live over an http connection to the reduce program
 h4. reduce script
 This script is run in one node . This node accepts http connections from map 
 nodes and the 'maps' that are sent are collected in a queue which will be 
 polled and fed into the reduce program. This also keeps the 'reduced' data in 
 memory till the whole run is complete. It expects a done message from all 
 'map' nodes before it declares the tasks are complete. After  reduce program 
 is executed for all the input it proceeds to write out the result to the 
 'sink' collection or it is written straight out to the response.
 {code:JavaScript}
 var pair = $.nextMap();
 var reduced = $.getCtx().getReducedMap(); // a hashmap
 var count = reduced.get(pair.key());
 if (count === null) {
   count = {"count": 0};
   reduced.put(pair.key(), count);
 }
 count.count += pair.val().count;
 {code}
 h4. example output
 {code:JavaScript}
 {
   "result": {
     "wordx": {
       "count": 15876765
     },
     "wordy": {
       "count": 24657654
     }
   }
 }
 {code}
 TBD
 * The format in which the output is written to the target collection, I 
 assume the reducedMap will have values mapping to the schema of the collection
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: 

Problem while modifying IndexSearcher

2013-07-23 Thread Abhishek Gupta
Hi,
I have a problem which is explained completely
herehttp://stackoverflow.com/questions/17816509/unable-to-find-definition-of-a-abstract-function.
Please help!! or just give me some suggestion about from where to get help.

-- 
Abhishek Gupta,
897876422, 9416106204, 9624799165


[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #395: POMs out of sync

2013-07-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/395/

2 tests failed.
FAILED:  
org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.cloud.BasicDistributedZkTest: 
   1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.cloud.BasicDistributedZkTest: 
   1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
at __randomizedtesting.SeedInfo.seed([6FF9EF9DE071A43E]:0)


FAILED:  
org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=620, name=recoveryCmdExecutor-201-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
at 

[jira] [Commented] (SOLR-5063) 4.4 refguide improvements on new doc adding screen in ui

2013-07-23 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13716674#comment-13716674
 ] 

Cassandra Targett commented on SOLR-5063:
-

[~grant_ingers...@yahoo.com] I added a comment with draft content for the page 
(https://cwiki.apache.org/confluence/display/solr/Documents+Screen) - feel free 
to use it as is, as a starting point, or whatever.

 4.4 refguide improvements on new doc adding screen in ui
 

 Key: SOLR-5063
 URL: https://issues.apache.org/jira/browse/SOLR-5063
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: 4.4


 breaking off from parent issue...
 * https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools
 ** SOLR-4921: Admin UI now supports adding documents to Solr (gsingers, 
 steffkes)
 ** stub page with screenshot exists, but it needs verbiage explaining how it 
 works and what the diff options mean

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects

2013-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4906:
---

Attachment: LUCENE-4906.patch

Here's a simple patch, implementing Rob's #1 idea (PassageFormatter.format 
returns Object, and then adding an expert 
PostingsHighlighter.highlightFieldsAsObjects).

The change seems minimal and seems to work (I added a basic test) ...
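
(For the curious, a minimal sketch of a custom formatter against the patched 
API; the Map-based snippet shape is just an example I made up, not part of 
the patch:)

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.search.postingshighlight.Passage;
import org.apache.lucene.search.postingshighlight.PassageFormatter;

// Renders each passage to a Map instead of a concatenated String, so a
// server can serialize snippets straight to JSON without re-parsing.
class MapPassageFormatter extends PassageFormatter {
  @Override
  public Object format(Passage[] passages, String content) {
    List<Map<String,Object>> snippets = new ArrayList<Map<String,Object>>();
    for (Passage p : passages) {
      Map<String,Object> snippet = new HashMap<String,Object>();
      snippet.put("text", content.substring(p.getStartOffset(), p.getEndOffset()));
      snippet.put("score", p.getScore());
      snippets.add(snippet);
    }
    return snippets; // retrieved via the new highlightFieldsAsObjects(...)
  }
}
{code}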

 PostingsHighlighter's PassageFormatter should allow for rendering to 
 arbitrary objects
 --

 Key: LUCENE-4906
 URL: https://issues.apache.org/jira/browse/LUCENE-4906
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-4906.patch


 For example, in a server, I may want to render the highlight result to 
 JsonObject to send back to the front-end. Today since we render to string, I 
 have to render to JSON string and then re-parse to JsonObject, which is 
 inefficient...
 Or, if (Rob's idea:) we make a query that's like MoreLikeThis but it pulls 
 terms from snippets instead, so you get proximity-influenced salient/expanded 
 terms, then perhaps that renders to just an array of tokens or fragments or 
 something from each snippet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-23 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-4998:
---

Attachment: SOLR-4998.patch

A very basic and non-invasive patch.
Anything invasive would require a lot of changes to the public Java APIs, and I 
suspect it would lead to a lot of breakage outside of Solr.

Retaining Slice/Shard and Replica.
I have changed 'shard' to 'replica' wherever it should have been.

 Make the use of Slice and Shard consistent across the code and document base
 

 Key: SOLR-4998
 URL: https://issues.apache.org/jira/browse/SOLR-4998
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Anshum Gupta
 Attachments: SOLR-4998.patch


 The interchangeable use of Slice and Shard is pretty confusing at times. We 
 should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-07-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717474#comment-13717474
 ] 

Shai Erera commented on LUCENE-4583:


Patch looks good. I prefer the current way of the test (the 'protected' method).

Also, you have a printout in Lucene40DocValuesWriter after the if (b.length > 
MAX_BINARY) - remove/comment?

+1 to commit.

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr Ref Guide caveat needs update

2013-07-23 Thread Jack Krupansky
“This Guide Covers The Unreleased Apache Solr 4.4.”

See:
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

4.4 is of course released now.

Sounds like yet another step to add to the “ReleaseToDo” wiki.

http://wiki.apache.org/lucene-java/ReleaseTodo

This also begs the question of when/how the new ref guide will switch to 
“Covers the Unreleased Apache Solr 4.5”.

-- Jack Krupansky

[jira] [Created] (SOLR-5070) add mbeans for everything in /solr/admin/cores?wt=json&indexInfo=true

2013-07-23 Thread Matthew Sporleder (JIRA)
Matthew Sporleder created SOLR-5070:
---

 Summary: add mbeans for everything in 
/solr/admin/cores?wt=json&indexInfo=true
 Key: SOLR-5070
 URL: https://issues.apache.org/jira/browse/SOLR-5070
 Project: Solr
  Issue Type: Improvement
Reporter: Matthew Sporleder


For Solr 4,
JMX should have everything in /solr/admin/cores?wt=json&indexInfo=true

One major omission is: lastModified

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Solr Ref Guide caveat needs update

2013-07-23 Thread Chris Hostetter

: “This Guide Covers The Unreleased Apache Solr 4.4.”
...
: 4.4 is of course released now.

The text was meant to refer to the fact that the *guide* is unreleased - 
I've tweaked it to be more clear in all cases.

: Sounds like yet another step to add to the “ReleaseToDo” wiki.
...
: This also begs the question of when/how the new ref guide will switch to 
“Covers the Unreleased Apache Solr 4.5”.

This is all already well documented as part of the *doc* release process 
(a process I've emailed out to dev@lucene many times asking for feedback).  
Changing the text cannot, and must not, be part of the *code* release 
process, since they are not voted on in lock step.

https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation








-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4335:
---

Attachment: LUCENE-4335.patch

First cut at top-level ant regenerate...

Something is still wrong w/ my ant changes because a top-level ant
regenerate hits this:

{code}
BUILD FAILED
/l/trunk/lucene/build.xml:614: The following error occurred while executing 
this line:
/l/trunk/lucene/common-build.xml:1902: The following error occurred while 
executing this line:
/l/trunk/lucene/analysis/build.xml:139: The following error occurred while 
executing this line:
/l/trunk/lucene/analysis/build.xml:38: The following error occurred while 
executing this line:
Target "regenerate" does not exist in the project "analyzers-morfologik". 
{code}

But some of the generators make harmless mods to the sources,
e.g. JavaCC does this:

{code}
Index: 
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java
===
--- 
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java
  (revision 1506176)
+++ 
lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/parser/CharStream.java
  (working copy)
@@ -112,4 +112,4 @@
   void Done();
 
 }
-/* JavaCC - OriginalChecksum=c95f1720d9b38046dc5d294b741c44cb (do not edit 
this line) */
+/* JavaCC - OriginalChecksum=53b2ec7502d50e2290e86187a6c01270 (do not edit 
this line) */
{code}

JFlex does this:

{code}
Index: 
lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java
===
--- 
lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java
   (revision 1506176)
+++ 
lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.java
   (working copy)
@@ -1,4 +1,4 @@
-/* The following code was generated by JFlex 1.5.0-SNAPSHOT on 9/19/12 6:23 PM 
*/
+/* The following code was generated by JFlex 1.5.0-SNAPSHOT on 7/23/13 3:22 PM 
*/
@@ -33,8 +33,8 @@
 /**
  * This class is a scanner generated by 
  * <a href="http://www.jflex.de/">JFlex</a> 1.5.0-SNAPSHOT
- * on 9/19/12 6:23 PM from the specification file
- * 
<tt>C:/svn/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex</tt>
+ * on 7/23/13 3:22 PM from the specification file
+ * 
<tt>/l/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex</tt>
  */
 class ClassicTokenizerImpl implements StandardTokenizerInterface {
{code}

I was able to remove some timestamps from our own gen tools in
analysis/icu/src/tools (thanks Rob for the pointers!)...

Also, there seem to be some real cases where the generated code was
changed but not the generator, e.g. packed ints sources show real
diffs (and won't compile after regeneration... I haven't dug into this
yet), and JFlex seemed to lose some @Overrides...


 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4335:
--

Assignee: Michael McCandless

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717513#comment-13717513
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506240 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506240 ]

LUCENE-4335: commit current patch

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717511#comment-13717511
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506234 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506234 ]

LUCENE-4335: make branch

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717515#comment-13717515
 ] 

Michael McCandless commented on LUCENE-4335:


OK I made a branch 
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4335 and committed 
the last (broken, but a starting point) patch ...

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717517#comment-13717517
 ] 

Mark Miller commented on SOLR-4998:
---

I think for things like:

-  public static final String MAX_SHARDS_PER_NODE = maxShardsPerNode;
+  public static final String MAX_REPLICAS_PER_NODE = maxReplicasPerNode;

We have to be really careful. Solr does not error/warn on unknown params - 
existing users might keep using the existing param for a long time and not 
even notice it no longer has an effect. I think if we make any type of change 
like that, we should be sure to support the old name as an alias, or perhaps 
explicitly look for the old key and fail if it's found.
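
(A rough sketch of the alias approach, with made-up key names - not from the 
patch:)

{code:java}
import java.util.Map;
import org.slf4j.Logger;

// Hypothetical helper: prefer the new key, fall back to the old one,
// and warn whenever the deprecated key is still in use.
final class ParamAlias {
  static String get(Map<String,String> params, String newKey,
                    String oldKey, Logger log) {
    String value = params.get(newKey);
    if (value == null) {
      value = params.get(oldKey);
      if (value != null) {
        log.warn("Param '{}' is deprecated; use '{}' instead", oldKey, newKey);
      }
    }
    return value;
  }
}
{code}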

 Make the use of Slice and Shard consistent across the code and document base
 

 Key: SOLR-4998
 URL: https://issues.apache.org/jira/browse/SOLR-4998
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Anshum Gupta
 Attachments: SOLR-4998.patch


 The interchangeable use of Slice and Shard is pretty confusing at times. We 
 should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717517#comment-13717517
 ] 

Mark Miller edited comment on SOLR-4998 at 7/23/13 7:36 PM:


I think for things like:

{noformat}
-  public static final String MAX_SHARDS_PER_NODE = maxShardsPerNode;
+  public static final String MAX_REPLICAS_PER_NODE = maxReplicasPerNode;
{noformat}

We have to be really careful. Solr does not error/warn on unknown params - 
existing users might keep using the existing param for a long time and not 
even notice it no longer has an effect. I think if we make any type of change 
like that, we should be sure to support the old name as an alias, or perhaps 
explicitly look for the old key and fail if it's found.

  was (Author: markrmil...@gmail.com):
I think for things like:

-  public static final String MAX_SHARDS_PER_NODE = maxShardsPerNode;
+  public static final String MAX_REPLICAS_PER_NODE = maxReplicasPerNode;

We have to be really careful. Solr does not error/warn on unknown params - 
existing users might keeping using the existing param for a long time, and not 
even notice it no longer has an affect. I think if we make any type of change 
like that, we should be sure to support them as an alias or perhaps explicitly 
look for the old key and fail if it's found.
  
 Make the use of Slice and Shard consistent across the code and document base
 

 Key: SOLR-4998
 URL: https://issues.apache.org/jira/browse/SOLR-4998
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Anshum Gupta
 Attachments: SOLR-4998.patch


 The interchangeable use of Slice and Shard is pretty confusing at times. We 
 should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-07-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717528#comment-13717528
 ] 

Yonik Seeley commented on SOLR-3076:


bq. Indeed! Yonik Seeley we don't need _root_ if we can submit two queries for 
deletion: ToChild(parentid:foo) and TQ(parentid:foo)!

Since solr wouldn't know how to create those queries, it seems like the user 
would need to provide them (which doesn't seem very friendly).
Also, IndexWriter currently only allows atomically specifying a term with the 
document block... deleteByQuery wouldn't be atomic.
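
(For reference, the atomic path is IndexWriter.updateDocuments(Term, docs), 
which deletes by a single term and adds the replacement block in one 
operation; the sketch below is illustrative, not the SOLR-3076 code, and the 
field handling is an assumption.)

{code:java}
import java.io.IOException;
import java.util.Arrays;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

class BlockUpdater {
  // Replace a whole parent/child block atomically: the old block matching
  // the _root_ term is deleted and the new block added in one operation.
  static void replaceBlock(IndexWriter writer, String rootId,
                           Document child, Document parent) throws IOException {
    child.add(new StringField("_root_", rootId, Field.Store.NO));
    parent.add(new StringField("_root_", rootId, Field.Store.NO));
    // Children first, parent last, per the block-join convention.
    writer.updateDocuments(new Term("_root_", rootId),
                           Arrays.asList(child, parent));
  }
}
{code}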

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 5.0, 4.5

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-23 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717532#comment-13717532
 ] 

Anshum Gupta commented on SOLR-4998:


Sure, I will add an alias for it, perhaps with a WARN log saying it's to be 
deprecated?

 Make the use of Slice and Shard consistent across the code and document base
 

 Key: SOLR-4998
 URL: https://issues.apache.org/jira/browse/SOLR-4998
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Anshum Gupta
 Attachments: SOLR-4998.patch


 The interchangeable use of Slice and Shard is pretty confusing at times. We 
 should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717539#comment-13717539
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506248 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506248 ]

LUCENE-4335: add empty target in common-build.xml
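
(Presumably an empty fallback so every module inherits a regenerate target; a 
guess at the minimal Ant snippet, not the actual commit:)

{code:xml}
<!-- Modules with nothing to regenerate inherit this no-op target instead of
     failing the top-level "ant regenerate". -->
<target name="regenerate"/>
{code}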

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5004) Allow a shard to be split into 'n' sub-shards

2013-07-23 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717535#comment-13717535
 ] 

Anshum Gupta commented on SOLR-5004:


Any preference on the parameter name here:
splits, splitcount, subshards, or numsubshards?
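
(Whatever the name ends up being, the call would presumably look something 
like this - numsubshards here is a placeholder, not a committed param name:)

{code}
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&numsubshards=4
{code}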

 Allow a shard to be split into 'n' sub-shards
 -

 Key: SOLR-5004
 URL: https://issues.apache.org/jira/browse/SOLR-5004
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.3, 4.3.1
Reporter: Anshum Gupta

 As of now, a SPLITSHARD call is hardcoded to create 2 sub-shards from the 
 parent one. Accept a parameter to split into n sub-shards.
 Default it to 2 and perhaps also have an upper bound to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717540#comment-13717540
 ] 

Joel Bernstein commented on SOLR-4787:
--

Kranti,

Odd that the pjoin cache is making things slower. I'll do some testing and see 
if I can turn up the same results. 

The join query runs first and builds a data structure in memory that is used to 
post filter the main query. The main query then runs and the post filter is 
applied.
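
(A rough sketch of that mechanism - not the actual SOLR-4787 code; the field 
name "id_i" and the pre-built key set are assumptions for illustration:)

{code:java}
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.FieldCache;
import org.apache.solr.search.DelegatingCollector;

// Post filter: the "from" core query has already produced the set of
// allowed join keys; drop any main-query doc whose "to" key isn't in it.
class JoinKeyCollector extends DelegatingCollector {
  private final Set<Integer> allowedKeys;
  private FieldCache.Ints toKeys;

  JoinKeyCollector(Set<Integer> allowedKeys) {
    this.allowedKeys = allowedKeys;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    super.setNextReader(context);
    toKeys = FieldCache.DEFAULT.getInts(context.reader(), "id_i", false);
  }

  @Override
  public void collect(int doc) throws IOException {
    if (allowedKeys.contains(toKeys.get(doc))) {
      super.collect(doc); // only joined docs reach the delegating collector
    }
  }
}
{code}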

I'm exploring another scenario that will perform 5x faster than the current 
pjoin. But the tradeoff is a longer warmup time when a new searcher is opened.

Do you have real-time indexing requirements, or can you live with some warm-up 
time?

 

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather than join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 <queryParser name="pjoin" 
 class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml.
 <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  <cache name="join"
    class="solr.LRUCache"
    size="4096"
    initialSize="1024"
    />
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 <valueSourceParser name="vjoin" 
 class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />

--
This message 

Solr.xml parameters

2013-07-23 Thread Erick Erickson
I'm trying to finalize some of the documentation for the release of
the docs that'll
happen Real Soon Now so I need to nail these down.

How close are these definitions for the following parameters?

distribUpdateConnTimeout - the time any update will wait for a node to
respond to an
  indexing request.

distribUpdateSoTimeout - The socket read timeout before the thread
assumes the read
operation will never complete due to some kind of networking problem.

leaderVoteWait - when SolrCloud is starting up, how long we'll wait
before assuming that
no leader will identify itself.

genericCoreNodeNames - I have no idea.

managementPath - no clue

roles - why do I care to set this parameter?

coreNodeName - how is this different than name? Is it something anyone
should mess with
 and why?

logging watcher threshold - No clue what this does.
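
For whatever it's worth, these all live in the new-style solr.xml; roughly 
(values and exact element types from memory, so treat this as a sketch, not 
gospel):

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    <int name="leaderVoteWait">180000</int>
    <int name="distribUpdateConnTimeout">60000</int>
    <int name="distribUpdateSoTimeout">600000</int>
  </solrcloud>
  <logging>
    <watcher>
      <int name="size">50</int>
      <str name="threshold">WARN</str>
    </watcher>
  </logging>
</solr>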

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-07-23 Thread Kranti Parisa (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717553#comment-13717553
 ] 

Kranti Parisa commented on SOLR-4787:
-

Joel,

Thanks for the details. Yes, we do some real-time indexing; say, every 30 min 
we get deltas. How much warm-up time are we looking at for 5M docs?

Also, if we have more than one pjoin in the fq, each pointing to its own core, 
can those pjoins be executed in parallel, with the intersection finally 
applied as a filter for the main query?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 5.0, 4.5

 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787-pjoin-long-keys.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *PostFilterJoinQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the to and from core.
 The second difference is that the pjoin builds memory structures that are 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 join to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
 plugin is referenced by the string pjoin rather than join.
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the from 
 field that will be used to filter the main query. Only records from the main 
 query, where the to field is present in the from list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin.
 <queryParser name="pjoin" 
 class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml.
 <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 The solrconfig.xml in the fromcore must have the join SolrCache configured.
  <cache name="join"
    class="solr.LRUCache"
    size="4096"
    initialSize="1024"
    />
 *ValueSourceJoinParserPlugin aka vjoin*
 The second implementation is the ValueSourceJoinParserPlugin aka vjoin. 
 This implements a ValueSource function query that can return a value from a 
 second core based on join keys and limiting query. The limiting query can be 
 used to select a specific subset of data from the join core. This allows 
 customer specific relevance data to be stored in a separate core and then 
 joined in the main query.
 The vjoin is called using the vjoin function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the fromVal from the fromCore. The 
 fromKey and toKey are used to link the records from the main query to the 
 records in the fromCore. The query is used to select a specific set of 
 records to join with in fromCore.
 Currently the fromKey and toKey must be longs but this will change in future 
 versions. Like the pjoin, the join SolrCache is used to hold the join 
 memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 <valueSourceParser name="vjoin" 
 class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: 

[jira] [Commented] (SOLR-5061) 4.4 refguide pages new solr.xml format

2013-07-23 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717550#comment-13717550
 ] 

Cassandra Targett commented on SOLR-5061:
-

[~erickerickson] I'm adding comments to each page as I read through to help you 
with some formatting issues & content suggestions.

The pages under "Solr Cores and solr.xml" need to be ordered better - they're 
in alpha order now. The order I'd suggest is the order I discussed them in my 
initial proposal - up to you, but it's ideal to have them flow together on 
screen and in the PDF:

a. Format of solr.xml
b. Legacy solr.xml Configuration
c. Moving to the New solr.xml Format
d. CoreAdminHandler Parameters and Usage

(To re-order pages, go to Browse and then Pages (up by your name at the top).
Then choose Tree. You'll see a hierarchical list of pages and can re-order
them there.)

 4.4 refguide pages new solr.xml format
 --

 Key: SOLR-5061
 URL: https://issues.apache.org/jira/browse/SOLR-5061
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Erick Erickson
 Fix For: 5.0, 4.5


 breaking off from parent issue...
 * 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 ** SOLR-4757: Change the example to use the new solr.xml format and core 
 discovery by directory structure. (Mark Miller)
 *** CT: There is a page on solr.xml: 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml.
  This should be updated to show the new format and still include information 
 on the old format for anyone with the old format who uses this guide for 
 reference.
 ** SOLR-4655: Add option to have Overseer assign generic node names so that 
 new addresses can host shards without naming confusion. (Mark Miller, Anshum 
 Gupta)
 *** CT: I think this only needs to be added to any new content for solr.xml 
 at 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 It should also be noted that Cassandra posted some additional detailed 
 suggestions in a comment on the existing page in the ref guide...
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml?focusedCommentId=33296160#comment-33296160

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717567#comment-13717567
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506258 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506258 ]

LUCENE-4335: fix generators to match recent code changes to the gen'd files

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717620#comment-13717620
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506281 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506281 ]

LUCENE-4335: add -r 623 to instructions for checking out jflex

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5128) Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException

2013-07-23 Thread crocket (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717621#comment-13717621
 ] 

crocket commented on LUCENE-5128:
-

Wait until this weekend. I'm going to check the stacktrace this saturday.

 Calling IndexSearcher.searchAfter beyond the number of stored documents 
 causes ArrayIndexOutOfBoundsException
 -

 Key: LUCENE-5128
 URL: https://issues.apache.org/jira/browse/LUCENE-5128
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.2
Reporter: crocket
Assignee: Shai Erera
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5128.patch, LUCENE-5128.patch


 ArrayIndexOutOfBoundsException makes it harder to reason about the cause.
 Is there a better way to notify programmers of the cause?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717623#comment-13717623
 ] 

Mark Miller commented on SOLR-5060:
---

Any comments on a good location for sticking this? It seems perhaps another top 
level topic. It simply lets you put the index, lock and transaction log in hdfs 
rather than on the local filesystem.
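For reference, a minimal sketch of what enabling it might look like in
solrconfig.xml (based on my reading of SOLR-4916; the exact parameter names
should be double-checked against the patch):
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>
with the index lock type set to hdfs in the indexConfig section.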

 4.4 refguide pages on hdfs support
 --

 Key: SOLR-5060
 URL: https://issues.apache.org/jira/browse/SOLR-5060
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 5.0, 4.5


 breaking off from parent...
 * Completely new docs about the HDFS SolrCloud support ... somewhere
 ** SOLR-4916: Add support to write and read Solr index files and transaction 
 log files to and from HDFS. (phunt, Mark Miller, Greg Chanan)
 *** CT: Without studying this more, it's hard to know where this should go. 
 It's not really SolrCloud, and it's not really a client, but depending on why 
 it's being done it could overlap with either...If someone writes up what 
 you'd tell someone about using it, I could give a better idea of where it 
 fits in the existing page organization (if it does).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717625#comment-13717625
 ] 

Robert Muir commented on LUCENE-4335:
-

Cool Mike: regenerate seems to be working!

But now I think we need to edit [~thetaphi]'s groovy script to be a macro that 
also fails if any files were modified.
We should use this to verify that the regenerated sources have not changed.
I think we should also use this in jenkins after running tests.

The precommit test can keep it off as it does now, but jenkins can be more 
strict.

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717627#comment-13717627
 ] 

ASF subversion and git services commented on LUCENE-4335:
-

Commit 1506284 from [~mikemccand] in branch 'dev/branches/lucene4335'
[ https://svn.apache.org/r1506284 ]

LUCENE-4335: don't regenerate for precommit

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support

2013-07-23 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717646#comment-13717646
 ] 

Cassandra Targett commented on SOLR-5060:
-

Maybe under https://cwiki.apache.org/confluence/display/solr/Managing+Solr? 
It's a teeny bit of a stretch for what's already there, but not wildly so 
(since logging is under there).

If not there, I don't think there's a big problem with a top level topic for 
now.

 4.4 refguide pages on hdfs support
 --

 Key: SOLR-5060
 URL: https://issues.apache.org/jira/browse/SOLR-5060
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 5.0, 4.5


 breaking off from parent...
 * Completely new docs about the HDFS SolrCloud support ... somewhere
 ** SOLR-4916: Add support to write and read Solr index files and transaction 
 log files to and from HDFS. (phunt, Mark Miller, Greg Chanan)
 *** CT: Without studying this more, it's hard to know where this should go. 
 It's not really SolrCloud, and it's not really a client, but depending on why 
 it's being done it could overlap with either...If someone writes up what 
 you'd tell someone about using it, I could give a better idea of where it 
 fits in the existing page organization (if it does).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717650#comment-13717650
 ] 

Mark Miller commented on SOLR-5060:
---

Thanks, Managing+Solr looks good.

 4.4 refguide pages on hdfs support
 --

 Key: SOLR-5060
 URL: https://issues.apache.org/jira/browse/SOLR-5060
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 5.0, 4.5


 breaking off from parent...
 * Completely new docs about the HDFS SolrCloud support ... somewhere
 ** SOLR-4916: Add support to write and read Solr index files and transaction 
 log files to and from HDFS. (phunt, Mark Miller, Greg Chanan)
 *** CT: Without studying this more, it's hard to know where this should go. 
 It's not really SolrCloud, and it's not really a client, but depending on why 
 it's being done it could overlap with either...If someone writes up what 
 you'd tell someone about using it, I could give a better idea of where it 
 fits in the existing page organization (if it does).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5060) 4.4 refguide pages on hdfs support

2013-07-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717679#comment-13717679
 ] 

Hoss Man commented on SOLR-5060:


bq. It simply lets you put the index, lock and transaction log in hdfs rather 
than on the local filesystem.

how is it enabled/configured?  would it make sense just to mention the details 
on the appropriate sub-pages of 
https://cwiki.apache.org/confluence/display/solr/Configuring+solrconfig.xml ?

 4.4 refguide pages on hdfs support
 --

 Key: SOLR-5060
 URL: https://issues.apache.org/jira/browse/SOLR-5060
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Mark Miller
 Fix For: 5.0, 4.5


 breaking off from parent...
 * Completely new docs about the HDFS SolrCloud support ... somewhere
 ** SOLR-4916: Add support to write and read Solr index files and transaction 
 log files to and from HDFS. (phunt, Mark Miller, Greg Chanan)
 *** CT: Without studying this more, it's hard to know where this should go. 
 It's not really SolrCloud, and it's not really a client, but depending on why 
 it's being done it could overlap with either...If someone writes up what 
 you'd tell someone about using it, I could give a better idea of where it 
 fits in the existing page organization (if it does).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #917: POMs out of sync

2013-07-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/917/

2 tests failed.
FAILED:  
org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.cloud.BasicDistributedZkTest: 
   1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.cloud.BasicDistributedZkTest: 
   1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:365)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:291)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
at __randomizedtesting.SeedInfo.seed([7F33FE98D353BCE3]:0)


FAILED:  
org.apache.solr.cloud.BasicDistributedZkTest.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=8077, name=recoveryCmdExecutor-4833-thread-1, state=RUNNABLE, 
group=TGRP-BasicDistributedZkTest]
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at 

[jira] [Commented] (LUCENE-4335) Builds should regenerate all generated sources

2013-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717740#comment-13717740
 ] 

Robert Muir commented on LUCENE-4335:
-

{code}
regenerateAndCheck:

BUILD SUCCESSFUL
Total time: 57 seconds
{code}

 Builds should regenerate all generated sources
 --

 Key: LUCENE-4335
 URL: https://issues.apache.org/jira/browse/LUCENE-4335
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4335.patch, LUCENE-4335.patch


 We have more and more sources that are generated programmatically (query 
 parsers, fuzzy levN tables from Moman, packed ints specialized decoders, 
 etc.), and it's dangerous because developers may directly edit the generated 
 sources and forget to edit the meta-source.  It's happened to me several 
 times ... most recently just after landing the BlockPostingsFormat branch.
 I think we should re-gen all of these in our builds and fail the build if 
 this creates a difference.  I know some generators (eg JavaCC) embed 
 timestamps and so always create mods ... we can leave them out of this for 
 starters (or maybe post-process the sources to remove the timestamps) ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5063) 4.4 refguide improvements on new doc adding screen in ui

2013-07-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-5063.


   Resolution: Fixed
Fix Version/s: (was: 4.5)
   (was: 5.0)
 Assignee: Hoss Man  (was: Grant Ingersoll)

Pretty sure Grant is traveling at the moment ... Cassandra's changes all looked 
good to me, so I updated the doc

 4.4 refguide improvements on new doc adding screen in ui
 

 Key: SOLR-5063
 URL: https://issues.apache.org/jira/browse/SOLR-5063
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Hoss Man

 breaking off from parent issue...
 * https://cwiki.apache.org/confluence/display/solr/Core-Specific+Tools
 ** SOLR-4921: Admin UI now supports adding documents to Solr (gsingers, 
 steffkes)
 ** stub page with screenshot exists, but it needs verbiage explaining how it 
 works and what the diff options mean

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5061) 4.4 refguide pages new solr.xml format

2013-07-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717814#comment-13717814
 ] 

Hoss Man commented on SOLR-5061:


FYI: I went ahead and re-ordered the pages as Cassandra suggested, and cleaned 
up a bunch of the formatting -- both based on the suggestions in Cassandra's 
various comments, as well as some other minor formatting nits.

The meat of the content is still pretty much the same.

 4.4 refguide pages new solr.xml format
 --

 Key: SOLR-5061
 URL: https://issues.apache.org/jira/browse/SOLR-5061
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Erick Erickson
 Fix For: 5.0, 4.5


 breaking off from parent issue...
 * 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 ** SOLR-4757: Change the example to use the new solr.xml format and core 
 discovery by directory structure. (Mark Miller)
 *** CT: There is a page on solr.xml: 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml.
  This should be updated to show the new format and still include information 
 on the old format for anyone with the old format who uses this guide for 
 reference.
 ** SOLR-4655: Add option to have Overseer assign generic node names so that 
 new addresses can host shards without naming confusion. (Mark Miller, Anshum 
 Gupta)
 *** CT: I think this only needs to be added to any new content for solr.xml 
 at 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 It should also be noted that Cassandra posted some additional detailed 
 suggestions in a comment on the existing page in the ref guide...
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml?focusedCommentId=33296160#comment-33296160

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5057) queryResultCache should not related with the order of fq's list

2013-07-23 Thread Feihong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717827#comment-13717827
 ] 

Feihong Huang commented on SOLR-5057:
-

Hi, Erick. I think that approach can re-use the filterCache when the fq
clauses are ordered differently. I am a newcomer learning Solr, so if there
are points I have not considered sufficiently, I apologize.
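One way to picture the fix (a toy sketch, not the actual SOLR-5057 patch):
make the cache key canonical by sorting the fq strings before combining them
with the main query.

{code}
import java.util.Arrays;

public class FqCacheKey {
  // Build an order-insensitive cache key: sort the fq clauses first.
  static String key(String q, String[] fqs) {
    String[] sorted = fqs.clone();
    Arrays.sort(sorted);
    return q + "|" + String.join("&", sorted);
  }

  public static void main(String[] args) {
    String[] a = {"field1:value1", "field2:value2"};
    String[] b = {"field2:value2", "field1:value1"};
    // Both orderings now produce the same key:
    System.out.println(key("*:*", a).equals(key("*:*", b))); // true
  }
}
{code}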

 queryResultCache should not related with the order of fq's list
 ---

 Key: SOLR-5057
 URL: https://issues.apache.org/jira/browse/SOLR-5057
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-5057.patch, SOLR-5057.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 There are two queries with the same meaning below, but case2 can't use the 
 queryResultCache entry created when case1 is executed.
 case1: q=*:*&fq=field1:value1&fq=field2:value2
 case2: q=*:*&fq=field2:value2&fq=field1:value1
 I think the queryResultCache should not depend on the order of the fq list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5065) ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent

2013-07-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717835#comment-13717835
 ] 

Hoss Man commented on SOLR-5065:


Would it be worthwhile to add a java-mode init param to all of the 
Parse[NumberClass]FieldUpdateProcessorFactories that was mutually exclusive 
with locale and generated UpdateProcessors that use the appropriate 
NumberClass.valueOf(String) instead of a NumberFormat?

And assuming that would be worthwhile ... would it also make sense to change 
the default behavior of {{<processor 
class="solr.ParseDoubleFieldUpdateProcessorFactory" />}} from assuming 
{{locale=ROOT}} to {{java-mode=true}}?
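For what it's worth, a tiny demonstration of what the java-mode path would
rely on: Double.valueOf accepts all three exponent spellings from the report,
including the '+' that the locale-based parse chokes on.

{code}
public class ExponentParse {
  public static void main(String[] args) {
    // Double.valueOf handles the full Java/JSON scientific notation:
    for (String s : new String[] {"4.5E10", "4.5E-10", "4.5E+10"}) {
      System.out.println(s + " -> " + Double.valueOf(s));
    }
  }
}
{code}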

 ParseDoubleFieldUpdateProcessorFactory is unable to parse + in exponent
 -

 Key: SOLR-5065
 URL: https://issues.apache.org/jira/browse/SOLR-5065
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.4
Reporter: Jack Krupansky

 The ParseDoubleFieldUpdateProcessorFactory is unable to parse the full syntax 
 of Java/JSON scientific notation. Parse fails for 4.5E+10, but does succeed 
 for 4.5E10 and 4.5E-10.
 Using the schema and config from example-schemaless, I added this data:
 {code}
   curl 'http://localhost:8983/solr/update?commit=true' \
   -H 'Content-type:application/json' -d '
   [{"id": "doc-1",
     "a1": "Hello World",
     "a2": 123,
     "a3": 123.0,
     "a4": 1.23,
     "a5": 4.5E+10,
     "a6": "123",
     "a7": true,
     "a8": false,
     "a9": "true",
     "a10": "2013-07-22",
     "a11": "4.5E10",
     "a12": "4.5E-10",
     "a13": "4.5E+10",
     "a14": 4.5E10,
     "a15": 4.5E-10}]'
 {code}
 A query returns:
 {code}
   <doc>
     <str name="id">doc-1</str>
     <arr name="a1">
       <str>Hello World</str>
     </arr>
     <arr name="a2">
       <long>123</long>
     </arr>
     <arr name="a3">
       <double>123.0</double>
     </arr>
     <arr name="a4">
       <double>1.23</double>
     </arr>
     <arr name="a5">
       <double>4.5E10</double>
     </arr>
     <arr name="a6">
       <long>123</long>
     </arr>
     <arr name="a7">
       <bool>true</bool>
     </arr>
     <arr name="a8">
       <bool>false</bool>
     </arr>
     <arr name="a9">
       <bool>true</bool>
     </arr>
     <arr name="a10">
       <date>2013-07-22T00:00:00Z</date>
     </arr>
     <arr name="a11">
       <double>4.5E10</double>
     </arr>
     <arr name="a12">
       <double>4.5E-10</double>
     </arr>
     <arr name="a13">
       <str>4.5E+10</str>
     </arr>
     <arr name="a14">
       <double>4.5E10</double>
     </arr>
     <arr name="a15">
       <double>4.5E-10</double>
     </arr>
     <long name="_version_">1441308941516537856</long>
   </doc>
 {code}
 The input value of a13 was the same as a5, but was treated as a string, 
 rather than parsed as a double. So, JSON/Java was able to parse 4.5E+10, 
 but this update processor was not.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5061) 4.4 refguide pages new solr.xml format

2013-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5061.
--

Resolution: Fixed

Thanks Hoss and Cassandra for helping!

I'm declaring victory here, we can re-open these as necessary.

 4.4 refguide pages new solr.xml format
 --

 Key: SOLR-5061
 URL: https://issues.apache.org/jira/browse/SOLR-5061
 Project: Solr
  Issue Type: Sub-task
  Components: documentation
Reporter: Hoss Man
Assignee: Erick Erickson
 Fix For: 5.0, 4.5


 breaking off from parent issue...
 * 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 ** SOLR-4757: Change the example to use the new solr.xml format and core 
 discovery by directory structure. (Mark Miller)
 *** CT: There is a page on solr.xml: 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml.
  This should be updated to show the new format and still include information 
 on the old format for anyone with the old format who uses this guide for 
 reference.
 ** SOLR-4655: Add option to have Overseer assign generic node names so that 
 new addresses can host shards without naming confusion. (Mark Miller, Anshum 
 Gupta)
 *** CT: I think this only needs to be added to any new content for solr.xml 
 at 
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml
 It should also be noted that Cassandra posted some additional detailed 
 suggestions in a comment on the existing page in the ref guide...
 https://cwiki.apache.org/confluence/display/solr/Core+Admin+and+Configuring+solr.xml?focusedCommentId=33296160#comment-33296160

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4542) Add entries to CHANGES.txt and Wiki for the obsoleting solr.xml and lots of cores

2013-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4542.
--

   Resolution: Fixed
Fix Version/s: 4.3

We updated CHANGES.txt an embarrassingly long time ago; I'm finally getting 
this JIRA closed.

 Add entries to CHANGES.txt and Wiki for the obsoleting solr.xml and lots 
 of cores
 -

 Key: SOLR-4542
 URL: https://issues.apache.org/jira/browse/SOLR-4542
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.3


 Marker to be sure I don't forget...
 SOLR-4196, SOLR-4401, etc. Several new capabilities need to be elucidated 
 both on the Wiki and in CHANGES.txt:
 1) rapidly opening/closing cores
 2) discovery-based core enumeration

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717911#comment-13717911
 ] 

Han Jiang commented on LUCENE-3069:
---

bq. You should not need to .getPosition / .setPosition on the fstReader:

Oh, yes! I'll fix.

bq. I think we can't really make use of it, which is fine (it's an optional 
optimization).

OK, actually I was quite curious why we don't make use of commonPrefixRef 
in CompiledAutomaton. Maybe we can determinize the input Automaton first, then
get commonPrefixRef via SpecialOperations? Is it too slow, or is the prefix
not always long enough to be worth taking into consideration?

bq. But this can only be done if that FST node's arcs are array'd right?

Yes, array arcs only, and we might need methods like advance(label) to do the 
search,
and here gossip search might work better than traditional binary search.

{quote}
Separately, supporting ord w/ FST terms dict should in theory be not
so hard; you'd need to use getByOutput to seek by ord. Maybe (later,
eventually) we can make this a write-time option. We should open a
separate issue ...
{quote}

Ah, yes, but it seems that getByOutput doesn't rewind/reuse the previous
state? We always have to start from the first arc during every seek. However,
I'm not sure in what kinds of use cases we need the ord information.


I'll commit current version first, so we can iterate.
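
For context, a rough sketch of the seek-by-ord idea using the util.fst API
(assuming an FST whose outputs are ords; this is just an illustration, not
the terms-dict code itself):

{code}
import java.io.IOException;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.Util;

public class SeekByOrd {
  // Walks the FST from the root, following arcs whose summed outputs
  // lead to the target ord; there is no cursor to rewind, which is the
  // "always start from the first arc" limitation mentioned above.
  static IntsRef termForOrd(FST<Long> fst, long ord) throws IOException {
    return Util.getByOutput(fst, ord);
  }
}
{code}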

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 5.0, 4.5

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717912#comment-13717912
 ] 

ASF subversion and git services commented on LUCENE-3069:
-

Commit 1506385 from [~billy] in branch 'dev/branches/lucene3069'
[ https://svn.apache.org/r1506385 ]

LUCENE-3069: support intersect operations

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 5.0, 4.5

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717922#comment-13717922
 ] 

Noble Paul commented on SOLR-5069:
--

bq. reduce() can start only when all mappers are finished

Why? Why can't reduce start as soon as the mappers start producing? Whatever 
is emitted by the mapper is up for the reducer to chew on.

All said, a map-side combiner is definitely useful and would reduce 
memory/network IO
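
To make the combiner point concrete, a toy sketch (plain Java, not SOLR-5069
code): aggregating word counts locally before emitting means one pair per
distinct word crosses the network instead of one pair per occurrence.

{code}
import java.util.HashMap;
import java.util.Map;

public class MapSideCombiner {
  // Aggregate counts locally before emitting to the reduce node.
  static Map<String, Integer> combine(String[] words) {
    Map<String, Integer> local = new HashMap<>();
    for (String w : words) {
      local.merge(w, 1, Integer::sum);
    }
    return local;
  }

  public static void main(String[] args) {
    // 6 words collapse to 4 emitted pairs: {not=1, to=2, or=1, be=2}
    System.out.println(combine("to be or not to be".split(" ")));
  }
}
{code}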

 MapReduce for SolrCloud
 ---

 Key: SOLR-5069
 URL: https://issues.apache.org/jira/browse/SOLR-5069
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Noble Paul

 Solr currently does not have a way to run long-running computational tasks 
 across the cluster. We can piggyback on the mapreduce paradigm so that users 
 have a smooth learning curve.
  * The mapreduce component will be written as a RequestHandler in Solr
  * Works only in SolrCloud mode. (No support for standalone mode) 
  * Users can write MapReduce programs in Javascript or Java. First cut would 
 be JS ( ? )
 h1. sample word count program
 h2.how to invoke?
 http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX
 h3. params 
 * map :  A javascript implementation of the map program
 * reduce : a Javascript implementation of the reduce program
 * sink : The collection to which the output is written. If this is not 
 passed, the request is redirected to the reduce node, where it waits till 
 the process is complete, and the output of the reduce program is emitted as 
 a standard solr response. If the sink param is passed, the response will 
 contain an id of the run which can be used to query the status in another 
 command.
 * reduceNode : Node name where the reduce is run. If not passed, an 
 arbitrary node is chosen.
 The node which receives the command first identifies one replica from each 
 slice where the map program is executed. It also identifies one other node 
 from the same collection where the reduce program is run. Each run is given 
 an id, and the details of the nodes participating in the run are written to 
 ZK (as an ephemeral node).
 h4. map script 
 {code:JavaScript}
 var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
 while(res.hasMore()){
   var doc = res.next();
   var txt = doc.get("txt"); // the field on which word count is performed
   var words = txt.split(" ");
   for(i = 0; i < words.length; i++){
     $.map(words[i], {'count': 1}); // this sends the pair over to the reduce host
   }
 }
 {code}
 Essentially two threads are created in the 'map' hosts . One for running the 
 program and the other for co-ordinating with the 'reduce' host . The maps 
 emitted are streamed live over an http connection to the reduce program
 h4. reduce script
 This script is run in one node . This node accepts http connections from map 
 nodes and the 'maps' that are sent are collected in a queue which will be 
 polled and fed into the reduce program. This also keeps the 'reduced' data in 
 memory till the whole run is complete. It expects a done message from all 
 'map' nodes before it declares the tasks are complete. After  reduce program 
 is executed for all the input it proceeds to write out the result to the 
 'sink' collection or it is written straight out to the response.
 {code:JavaScript}
 var pair = $.nextMap();
 var reduced = $.getCtx().getReducedMap(); // a hashmap
 var count = reduced.get(pair.key());
 if(count === null) {
   count = {"count": 0};
   reduced.put(pair.key(), count);
 }
 count.count += pair.val().count;
 h4.example output
 {code:JavaScript}
 {
   "result": [
     "wordx": {
       "count": 15876765
     },
     "wordy": {
       "count": 24657654
     }
   ]
 }
 TBD
 * The format in which the output is written to the target collection, I 
 assume the reducedMap will have values mapping to the schema of the collection
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13717926#comment-13717926
 ] 

ASF subversion and git services commented on LUCENE-3069:
-

Commit 1506389 from [~billy] in branch 'dev/branches/lucene3069'
[ https://svn.apache.org/r1506389 ]

LUCENE-3069: no need to reseek FSTReader, update nocommits

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 5.0, 4.5

 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
 LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5071) Solrcloud change core to another shard issue

2013-07-23 Thread Illu Y Ying (JIRA)
Illu Y Ying created SOLR-5071:
-

 Summary: Solrcloud change core to another shard issue
 Key: SOLR-5071
 URL: https://issues.apache.org/jira/browse/SOLR-5071
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Illu Y Ying


I have a solrcloud cluster with one collection and two shards.
One core is a replica for shard1; I stop it and change its solr.xml like this:
<core name="collection1" instanceDir="collection1" shard="shard2"/>
This core should now be a shard2 replica, but when I restart it and open the
cloud graph page, you can see this core both as a down replica still in
shard1 and as an active replica in shard2.
So I would like to suggest removing the down-replica information from
clusterStatus.json.
It is confusing for one core to have a status in two shards.
In this one core has two statuses scenario, I suggest that we remove the
down-replica information for the other shard from clusterStatus.json.
I remember that when a core changes to active status it sends the overseer an
active-status message, so this logic could be added to the part where the
overseer changes a core to active status.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue

2013-07-23 Thread Illu Y Ying (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illu Y Ying updated SOLR-5071:
--

Attachment: 2013-7-24 11-55-45.png

 Solrcloud change core to another shard issue
 

 Key: SOLR-5071
 URL: https://issues.apache.org/jira/browse/SOLR-5071
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Illu Y Ying
 Attachments: 2013-7-24 11-55-45.png


 I have a solrcloud cluster with one collection and two shards.
 One core is a replica for shard1; I stop it and change its solr.xml like this:
 <core name="collection1" instanceDir="collection1" shard="shard2"/>
 This core should now be a shard2 replica, but when I restart it and open the
 cloud graph page, you can see this core both as a down replica still in
 shard1 and as an active replica in shard2.
 So I would like to suggest removing the down-replica information from
 clusterStatus.json.
 It is confusing for one core to have a status in two shards.
 In this one core has two statuses scenario, I suggest that we remove the
 down-replica information for the other shard from clusterStatus.json.
 I remember that when a core changes to active status it sends the overseer an
 active-status message, so this logic could be added to the part where the
 overseer changes a core to active status.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue

2013-07-23 Thread Illu Y Ying (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illu Y Ying updated SOLR-5071:
--

Description: 
I have a solrcloud cluster with one collection and two shards.
One core is a replica for shard1; I stop it and change its solr.xml like this:
<core name="collection1" instanceDir="collection1" shard="shard2"/>
This core should now be a shard2 replica, but when I restart it and open the
cloud graph page (see attachment), you can see this core both as a down
replica still in shard1 and as an active replica in shard2.
So I would like to suggest removing the down-replica information from
clusterStatus.json.
It is confusing for one core to have a status in two shards.
In this one core has two statuses scenario, I suggest that we remove the
down-replica information for the other shard from clusterStatus.json.
I remember that when a core changes to active status it sends the overseer an
active-status message, so this logic could be added to the part where the
overseer changes a core to active status.

  was:
I have a solrcloud cluster with one collection and two shards.
One core is a replica for shard1; I stop it and change its solr.xml like this:
<core name="collection1" instanceDir="collection1" shard="shard2"/>
This core should now be a shard2 replica, but when I restart it and open the
cloud graph page, you can see this core both as a down replica still in
shard1 and as an active replica in shard2.
So I would like to suggest removing the down-replica information from
clusterStatus.json.
It is confusing for one core to have a status in two shards.
In this one core has two statuses scenario, I suggest that we remove the
down-replica information for the other shard from clusterStatus.json.
I remember that when a core changes to active status it sends the overseer an
active-status message, so this logic could be added to the part where the
overseer changes a core to active status.


 Solrcloud change core to another shard issue
 

 Key: SOLR-5071
 URL: https://issues.apache.org/jira/browse/SOLR-5071
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Illu Y Ying
 Attachments: 2013-7-24 11-55-45.png


 I have a solrcloud cluster with one collection and two shards.
 One core is a replica for shard1; I stop it and change its solr.xml like this:
 <core name="collection1" instanceDir="collection1" shard="shard2"/>
 This core should now be a shard2 replica, but when I restart it and open the
 cloud graph page (see attachment), you can see this core both as a down
 replica still in shard1 and as an active replica in shard2.
 So I would like to suggest removing the down-replica information from
 clusterStatus.json.
 It is confusing for one core to have a status in two shards.
 In this one core has two statuses scenario, I suggest that we remove the
 down-replica information for the other shard from clusterStatus.json.
 I remember that when a core changes to active status it sends the overseer an
 active-status message, so this logic could be added to the part where the
 overseer changes a core to active status.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5071) Solrcloud change core to another shard issue

2013-07-23 Thread Illu Y Ying (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illu Y Ying updated SOLR-5071:
--

Description: 
I have a solrcloud cluster with one collection and two shards.
One core is a replica for shard1; I stop it and change its solr.xml like this:
<core name="collection1" instanceDir="collection1" shard="shard2"/>
This core should now be a shard2 replica, but when I restart it and open the
cloud graph page (see attachment), you can see this core both as a down
replica still in shard1 and as an active replica in shard2.

It is confusing for one core to have a status in two shards.
In this one core has two statuses scenario, I suggest that we remove the
down-replica information for the other shard from clusterStatus.json.
I remember that when a core changes to active status it sends the overseer an
active-status message, so this logic could be added to the part where the
overseer changes a core to active status.

  was:
I have a solrcloud cluster with one collection and two shards.
One core is a replica for shard1; I stop it and change its solr.xml like this:
<core name="collection1" instanceDir="collection1" shard="shard2"/>
This core should now be a shard2 replica, but when I restart it and open the
cloud graph page (see attachment), you can see this core both as a down
replica still in shard1 and as an active replica in shard2.
So I would like to suggest removing the down-replica information from
clusterStatus.json.
It is confusing for one core to have a status in two shards.
In this one core has two statuses scenario, I suggest that we remove the
down-replica information for the other shard from clusterStatus.json.
I remember that when a core changes to active status it sends the overseer an
active-status message, so this logic could be added to the part where the
overseer changes a core to active status.


 Solrcloud change core to another shard issue
 

 Key: SOLR-5071
 URL: https://issues.apache.org/jira/browse/SOLR-5071
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Illu Y Ying
 Attachments: 2013-7-24 11-55-45.png


 I have a solrcloud cluster with one collection and two shards.
 One core is a replica for shard1; I stop it and change its solr.xml like this:
 <core name="collection1" instanceDir="collection1" shard="shard2"/>
 This core should now be a shard2 replica, but when I restart it and open the
 cloud graph page (see attachment), you can see this core both as a down
 replica still in shard1 and as an active replica in shard2.
 It is confusing for one core to have a status in two shards.
 In this one core has two statuses scenario, I suggest that we remove the
 down-replica information for the other shard from clusterStatus.json.
 I remember that when a core changes to active status it sends the overseer an
 active-status message, so this logic could be added to the part where the
 overseer changes a core to active status.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jrockit-jdk1.6.0_45-R28.2.7-4.1.0) - Build # 6627 - Failure!

2013-07-23 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6627/
Java: 32bit/jrockit-jdk1.6.0_45-R28.2.7-4.1.0 -XnoOpt

2 tests failed.
REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
at __randomizedtesting.SeedInfo.seed([61ED01840FE48A5F:EF3C65BE62A5D23A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:738)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:774)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:683)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:44)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:662)


REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxUpdate

Error Message:
No mbean found for SolrIndexSearcher

Stack Trace:

[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b99) - Build # 6628 - Still Failing!

2013-07-23 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6628/
Java: 64bit/jdk1.8.0-ea-b99 -XX:+UseCompressedOops -XX:+UseParallelGC

2 tests failed.
FAILED:  org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
at __randomizedtesting.SeedInfo.seed([CDCE5B4F951B0045:431F3F75F85A5820]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:491)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:724)


FAILED:  org.apache.solr.core.TestJmxIntegration.testJmxUpdate

Error Message:
No mbean found for SolrIndexSearcher

Stack Trace:

[jira] [Updated] (SOLR-5069) MapReduce for SolrCloud

2013-07-23 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5069:
-

Description: 
Solr currently has no way to run long-running computational tasks across the 
cluster. We can piggyback on the MapReduce paradigm so that users have a smooth 
learning curve.

 * The MapReduce component will be written as a RequestHandler in Solr
 * Works only in SolrCloud mode (no support for standalone mode)
 * Users can write MapReduce programs in JavaScript or Java; the first cut would 
be JS (?)

h1. sample word count program

h2. how to invoke?

http://host:port/solr/collection-x/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params 
* map : a JavaScript implementation of the map program
* reduce : a JavaScript implementation of the reduce program
* sink : the collection to which the output is written. If no sink is passed, the 
request is redirected to the reduce node, waits till the process is complete, and 
the output of the reduce program is emitted as a standard Solr response. If the 
sink param is passed, the response contains an id for the run, which can be used 
to query its status in another command. (A minimal invocation sketch follows this 
list.)
* reduceNode : node name where the reduce is run. If not passed, an arbitrary 
node is chosen.
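As a minimal invocation sketch (hypothetical host/port; the script bodies are 
placeholders and must be URL-encoded when passed as request parameters):

{code:JavaScript}
// Build the request URL for the proposed /mapreduce handler.
var mapScript    = "...";  // body of the map script shown below
var reduceScript = "...";  // body of the reduce script shown below
var url = "http://localhost:8983/solr/collection-x/mapreduce"
        + "?map="    + encodeURIComponent(mapScript)
        + "&reduce=" + encodeURIComponent(reduceScript)
        + "&sink=collectionX";  // omit 'sink' to get the reduce output in the response
{code}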


The node which receives the command first identifies one replica from each slice, 
on which the map program is executed. It also identifies one other node from the 
same collection where the reduce program is run. Each run is given an id, and the 
details of the nodes participating in the run are written to ZK (as an ephemeral 
node). 
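Purely as a sketch, the per-run details in ZK might look like the following; the 
path and field names are assumptions, not part of this proposal:

{code:JavaScript}
// Hypothetical content of the ephemeral node for one run,
// e.g. under /mapreduce/runs/<run-id>.
var runInfo = {
  "runId": "run-42",
  "mapNodes": ["host1:8983_solr", "host2:8983_solr"],  // one replica per slice
  "reduceNode": "host3:8983_solr",
  "sink": "collectionX"
};
{code}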

h4. map script 

{code:JavaScript}
var res = $.streamQuery("*:*"); // this is not run across the cluster, only on this index
while (res.hasMore()) {
  var doc = res.next();
  var txt = doc.get("txt"); // the field on which word count is performed
  var words = txt.split(" ");
  for (var i = 0; i < words.length; i++) {
    $.map(words[i], {"count": 1}); // this sends the map over to the reduce host
  }
}
{code}

Essentially two threads are created on each 'map' host: one runs the program and 
the other coordinates with the 'reduce' host. The emitted maps are streamed live 
over an HTTP connection to the reduce program.
h4. reduce script

This script runs on one node. That node accepts HTTP connections from the map 
nodes; the 'maps' that are sent are collected in a queue, which is polled and fed 
into the reduce program. The node also keeps the 'reduced' data in memory till the 
whole run is complete. It expects a 'done' message from all 'map' nodes before it 
declares the task complete. After the reduce program has been executed for all the 
input, the result is written out to the 'sink' collection, or straight out to the 
response.

{code:JavaScript}
var pair = $.nextMap();
var reduced = $.getCtx().getReducedMap(); // a hashmap
var count = reduced.get(pair.key());
if (count === null) {
  count = {"count": 0};
  reduced.put(pair.key(), count);
}
count.count += pair.val().count;
{code}

h4. example output
{code:JavaScript}
{
  "result": {
    "wordx": {"count": 15876765},
    "wordy": {"count": 24657654}
  }
}
{code}

TBD
* The format in which the output is written to the target collection; I assume 
the reducedMap will have values mapping to the schema of the collection. (A 
possible sketch follows.)
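One possibility, purely as a sketch: flush the reduced map to the sink as one 
document per key, assuming the sink schema has 'id' and 'count' fields ($.addDoc 
below is an assumed helper, not an existing API):

{code:JavaScript}
// Iterate the reduced map (assumed to be a Java HashMap exposed to the
// script) and index one document per word into the sink collection.
var reduced = $.getCtx().getReducedMap();
var keys = reduced.keySet().toArray();
for (var i = 0; i < keys.length; i++) {
  $.addDoc({"id": keys[i], "count": reduced.get(keys[i]).count});
}
{code}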


 



  was:
Solr currently has no way to run long-running computational tasks across the 
cluster. We can piggyback on the MapReduce paradigm so that users have a smooth 
learning curve.

 * The MapReduce component will be written as a RequestHandler in Solr
 * Works only in SolrCloud mode (no support for standalone mode)
 * Users can write MapReduce programs in JavaScript or Java; the first cut would 
be JS (?)

h1. sample word count program

h2. how to invoke?

http://host:port/solr/mapreduce?map=map-script&reduce=reduce-script&sink=collectionX

h3. params 
* map : a JavaScript implementation of the map program
* reduce : a JavaScript implementation of the reduce program
* sink : the collection to which the output is written. If no sink is passed, the 
request is redirected to the reduce node, waits till the process is complete, and 
the output of the reduce program is emitted as a standard Solr response. If the 
sink param is passed, the response contains an id for the run, which can be used 
to query its status in another command.
* reduceNode : node name where the reduce is run. If not passed, an arbitrary 
node is chosen.


The node which receives the command first identifies one replica from each slice, 
on which the map program is executed. It also identifies one other node from the 
