[jira] [Commented] (SOLR-2592) Custom Hashing

2013-07-05 Thread Leonid Krogliak (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701256#comment-13701256
 ] 

Leonid Krogliak commented on SOLR-2592:
---

I don't see these changes in the code. 
I use Solr 4.3.1, but I also looked for these changes in Solr 4.1. 


 Custom Hashing
 --

 Key: SOLR-2592
 URL: https://issues.apache.org/jira/browse/SOLR-2592
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Affects Versions: 4.0-ALPHA
Reporter: Noble Paul
Assignee: Yonik Seeley
 Fix For: 4.1, 5.0

 Attachments: dbq_fix.patch, pluggable_sharding.patch, 
 pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, 
 SOLR-2592_collectionProperties.patch, SOLR-2592.patch, 
 SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, 
 SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
 SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch


 If the data in a cloud can be partitioned on some criteria (say range, hash, 
 attribute value, etc.), it will be easy to narrow down the search to a smaller 
 subset of shards and, in effect, achieve more efficient search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2592) Custom Hashing

2013-01-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555459#comment-13555459
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1434401

- Make complex SOLR-2592 changes entry not get converted to a single wrapped 
line in Changes.html.
- 'Via' -> 'via' (merged lucene_solr_4_1 r1434389)





[jira] [Commented] (SOLR-2592) Custom Hashing

2013-01-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555463#comment-13555463
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[branch_4x commit] Steven Rowe
http://svn.apache.org/viewvc?view=revision&revision=1434402

- Make complex SOLR-2592 changes entry not get converted to a single wrapped 
line in Changes.html.
- 'Via' -> 'via' (merged lucene_solr_4_1 r1434389)





[jira] [Commented] (SOLR-2592) Custom Hashing

2013-01-14 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553058#comment-13553058
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1433082

SOLR-2592: changes entry for doc routing


 Custom Hashing
 --

 Key: SOLR-2592
 URL: https://issues.apache.org/jira/browse/SOLR-2592
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Affects Versions: 4.0-ALPHA
Reporter: Noble Paul
Assignee: Yonik Seeley
 Fix For: 4.1

 Attachments: dbq_fix.patch, pluggable_sharding.patch, 
 pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, 
 SOLR-2592_collectionProperties.patch, SOLR-2592.patch, 
 SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, 
 SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
 SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch


 If the data in a cloud can be partitioned on some criteria (say range, hash, 
 attribute value, etc.), it will be easy to narrow down the search to a smaller 
 subset of shards and, in effect, achieve more efficient search.




[jira] [Commented] (SOLR-2592) Custom Hashing

2013-01-14 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553066#comment-13553066
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[branch_4x commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1433084

SOLR-2592: changes entry for doc routing





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-29 Thread Shreejay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541048#comment-13541048
 ] 

Shreejay commented on SOLR-2592:


{quote}Right - I've made good progress on that front in trunk, and we'll figure 
out how to get it ported back to 4x. I'm in the process of adding more tests 
right now.
The separator: I went with ! by default since it doesn't require URL 
encoding, but is less common than underscore. It also reminded me of the bang 
paths of old-style UUCP email addresses (like bigco!user).{quote}

Can the separator be provided as an option instead of being hard-coded? I have 
been using Michael's patch and am now trying to move to 4.1, but I would have 
to re-index all my data with a changed unique key, which currently contains an 
underscore. 

If not, will changing the separator to an underscore in CompositeIdRouter.java 
be enough? 


--Shreejay
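
Editor's sketch of the separator idea discussed above: a document id such as 
bigco!user is split at the first separator, and the part before it is used as 
the routing key, so ids that share a prefix land on the same shard. This is a 
minimal illustration and not the code in CompositeIdRouter.java; the class 
name, the method names, and the use of CRC32 with a modulo are assumptions made 
for the example, and Solr's real router may hash differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; not Solr's CompositeIdRouter.
class CompositeIdSketch {
    // Default separator taken from the quoted comment ('!').
    static final char SEPARATOR = '!';

    // Returns the part of the id before the first separator,
    // or the whole id when no separator is present.
    static String routingKey(String id, char separator) {
        int idx = id.indexOf(separator);
        return idx < 0 ? id : id.substring(0, idx);
    }

    // Assumed shard choice: CRC32 of the routing key modulo the shard count.
    static int shardFor(String id, char separator, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(routingKey(id, separator).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards);
    }

    public static void main(String[] args) {
        // Same prefix, same shard, whatever the suffix is.
        System.out.println(routingKey("bigco!user", SEPARATOR));
        System.out.println(
            shardFor("bigco!user1", SEPARATOR, 4) == shardFor("bigco!user2", SEPARATOR, 4));
        // An underscore-separated key only yields the prefix if the separator is '_'.
        System.out.println(routingKey("bigco_user", '_'));
    }
}
```

With the separator passed in as a parameter, an id like bigco_user yields the 
routing key bigco only when '_' is the configured separator; with '!' the whole 
id is treated as the key. Whether a one-line change in CompositeIdRouter.java 
has the same effect in Solr is not something this sketch can confirm.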




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-17 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534003#comment-13534003
 ] 

Markus Jelsma commented on SOLR-2592:
-

Additional note for anyone: you will also see a similar exception with an older 
SolrJ application:
{code}
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
    at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:291)
    at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:263)
    at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:274)
    at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:142)
    at org.apache.nutch.indexer.solr.SolrUtils.getCloudServer(SolrUtils.java:107)
    at org.apache.nutch.indexer.solr.SolrUtils.getSolrServers(SolrUtils.java:65)
    at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:65)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:448)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
{code}




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-14 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532219#comment-13532219
 ] 

Markus Jelsma commented on SOLR-2592:
-

I think this belongs to this issue. With today's trunk and a clean ZooKeeper, 
we're seeing the following exception. For some reason, only one node exhibits 
this behavior.

{code}
2012-12-14 10:10:27,355 ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher 
java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
    at org.apache.solr.common.cloud.ClusterState.makeSlices(ClusterState.java:246)
    at org.apache.solr.common.cloud.ClusterState.collectionFromObjects(ClusterState.java:231)
    at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:219)
    at org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:207)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
{code}




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-14 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532241#comment-13532241
 ] 

Markus Jelsma commented on SOLR-2592:
-

Ah! For some reason this specific node did not get the upgrade from December 
13th to the 14th while all others did. This proves the change is indeed 
backward incompatible ;) The problem is solved so please ignore my previous 
comment.
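
Editor's sketch of why a node on the older build can fail this way once the 
cluster state gains an extra level for collection properties: a reader that 
casts every entry under a collection to a Map will hit a plain String 
property. The two layouts, the property name "router", its value, and all 
class and method names below are assumptions for illustration; they are not 
copied from clusterstate.json or from ClusterState.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Solr's ClusterState.
class ClusterStateCastSketch {
    // Reader written for the older layout: every entry is assumed to be a slice map.
    static int countSlicesOldReader(Map<String, Object> collection) {
        int slices = 0;
        for (Object value : collection.values()) {
            @SuppressWarnings("unchecked")
            Map<String, Object> slice = (Map<String, Object>) value; // throws if value is a String
            if (slice != null) {
                slices++;
            }
        }
        return slices;
    }

    // Assumed older layout: collection -> slice name -> slice data.
    static Map<String, Object> oldLayout() {
        Map<String, Object> collection = new LinkedHashMap<>();
        collection.put("shard1", new LinkedHashMap<String, Object>());
        collection.put("shard2", new LinkedHashMap<String, Object>());
        return collection;
    }

    // Assumed newer layout: slices move one level down, next to String properties.
    static Map<String, Object> newLayout() {
        Map<String, Object> collection = new LinkedHashMap<>();
        collection.put("shards", oldLayout());
        collection.put("router", "compositeId"); // assumed property name and value
        return collection;
    }

    // True when the old-style reader throws ClassCastException on the given data.
    static boolean oldReaderFailsOn(Map<String, Object> collection) {
        try {
            countSlicesOldReader(collection);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countSlicesOldReader(oldLayout()));
        System.out.println(oldReaderFailsOn(newLayout()));
    }
}
```

The old-style reader handles the older layout without trouble and throws on the 
newer one, which matches the symptom of a single node that missed the upgrade.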




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-13 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531450#comment-13531450
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1421499

SOLR-2592: add additional level in clusterstate.json for collection properties





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-13 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531455#comment-13531455
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[branch_4x commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1421506

SOLR-2592: add additional level in clusterstate.json for collection properties





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-13 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531936#comment-13531936
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1421644

SOLR-2592: test expected number of requests to servers





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-13 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532051#comment-13532051
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[branch_4x commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1421670

SOLR-2592: test expected number of requests to servers





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-12 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530162#comment-13530162
 ] 

Shawn Heisey commented on SOLR-2592:


I use the hot shard concept in Solr 3.5.0.  For the cold shards, I split 
documents using a MOD on the CRC32 hash of a MySQL bigint autoincrement field - 
my MySQL query does the CRC32 and the MOD.  That field's actual value is 
translated to a tlong field in the schema.  For the hot shard, I simply use a 
split point on the actual value of that field.  Everything less than or equal 
to the split point goes to the cold shards, everything greater than the split 
point goes to the hot shard.  Multiple shards are handled by a single Solr 
instance - seven shards live on two servers.

This arrangement requires that I do a daily distribute process where I index 
(from MySQL) data between the old split point and the new split point to the 
cold shards, then delete that data from the hot shard. Full reindexes are done 
with the dataimport handler and controlled by SolrJ, everything else (including 
the distribute) is done directly with SolrJ.

How much of that could be automated and put server-side with the features added 
by this issue?  If I have to track shard and core names myself in order to do 
the distribute, then I will have to decide whether the other automation I would 
gain is worth switching to SolrCloud.

If I could avoid the client-side distribute indexing and have Solr shuffle the 
data around itself, that would be awesome, but I'm not sure that's possible, 
and it may be somewhat complicated by the fact that I have a number of unstored 
fields that I search on.

At some point I will test performance on an index where I do not have a hot 
shard, where the data is simply hashed between several large shards.  This 
entire concept was implemented for fast indexing of new data - because Solr 1.4 
did not have NRT features.
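
Editor's sketch of the hot/cold routing rule described in the comment above: 
ids greater than the split point go to the hot shard, and everything else is 
spread over the cold shards by CRC32 modulo the cold shard count. The original 
setup computes the hash in a MySQL query; this Java version assumes the hash is 
taken over the decimal string form of the id, and the class name, the HOT_SHARD 
marker, and the cold shard count of 6 used in main are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration of the hot/cold split; not code from the comment's author.
class HotColdRoutingSketch {
    // Marker value for the hot shard (assumed convention for this sketch).
    static final int HOT_SHARD = -1;

    // Ids above the split point go to the hot shard. Ids at or below it go to
    // cold shard CRC32(id) MOD coldShardCount, hashing the id's decimal string.
    static int shardFor(long id, long splitPoint, int coldShardCount) {
        if (id > splitPoint) {
            return HOT_SHARD;
        }
        CRC32 crc = new CRC32();
        crc.update(Long.toString(id).getBytes(StandardCharsets.US_ASCII));
        return (int) (crc.getValue() % coldShardCount);
    }

    public static void main(String[] args) {
        long splitPoint = 1_000_000L; // assumed value
        int coldShards = 6;           // assumed count
        System.out.println(shardFor(splitPoint + 1, splitPoint, coldShards)); // hot shard marker
        System.out.println(shardFor(splitPoint, splitPoint, coldShards));     // some cold shard index
    }
}
```

Under this rule the daily distribute step amounts to raising the split point: 
documents between the old and new split points change from the hot shard to a 
cold shard index, which is why they must be indexed into the cold shards and 
then deleted from the hot one.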


 Custom Hashing
 --

 Key: SOLR-2592
 URL: https://issues.apache.org/jira/browse/SOLR-2592
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Affects Versions: 4.0-ALPHA
Reporter: Noble Paul
Assignee: Yonik Seeley
 Fix For: 4.1

 Attachments: dbq_fix.patch, pluggable_sharding.patch, 
 pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, 
 SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, 
 SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, 
 SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch


 If the data in a cloud can be partitioned on some criteria (say range, hash, 
 attribute value, etc.), it will be easy to narrow down the search to a smaller 
 subset of shards and, in effect, achieve more efficient search.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-12 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530378#comment-13530378
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[branch_4x commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1420992

SOLR-2592: SOLR-1028: SOLR-3922: SOLR-3911: sync trunk with 4x





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-11 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529124#comment-13529124
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1420284

SOLR-2592: cleanups - remove unneeded range info, rename getLeaderProps to 
getLeaderRetry





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-11 Thread Shreejay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529146#comment-13529146
 ] 

Shreejay commented on SOLR-2592:


Hi Yonik,

What configuration (solrconfig / solr.xml / schema.xml / ZK?) do I have to 
change in order to use these changes in trunk? I have a string field that I 
want to group by, so I want all documents with the same value of that field to 
be on the same shard. 

I was using a 4x branch with Michael's patch, but ran into heap space issues 
mentioned in https://issues.apache.org/jira/browse/SOLR-4144 . I was not able 
to merge trunk to 4x and just saw your comment that you also had issues while 
doing this.  

So my next step might be to use the trunk directly with the patch supplied for 
4144. How do I specify that I want to shard based on a specific string field? 

Thanks. 

--Shreejay




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529150#comment-13529150
 ] 

Mark Miller commented on SOLR-2592:
---

Great to see someone that can help with additional testing :)

This will get merged back to 4x very soon for 4.1. It's a pain because of some 
work I did around full support for custom Directories. Merging both back at the 
same time will probably be easiest.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-11 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529240#comment-13529240
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1420338

SOLR-2592: avoid splitting composite router hash domains





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529248#comment-13529248
 ] 

Yonik Seeley commented on SOLR-2592:


So with the composite key hashing strategy of using the first part of the ID as 
the upper bits (what I've started calling a hash domain) and the second part as 
the lower bits, one needed to use an initial numShards equal to a power of two 
*if* one wanted to absolutely prevent a hash domain from being split over more 
than one shard.  For most people, this wouldn't be an issue - you only need to 
guarantee that domains are not split if you're using a feature that doesn't 
work across shards (like pseudo-join for example).

I just checked in a change to remove that limitation.  You can now use any 
numShards and the resulting shard ranges will be rounded to the nearest hash 
domain to avoid splitting them (this amounts to a maximum size rounding error 
of 1/65536th of the hash ring, or 0.002%).
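
To make the rounding concrete, here is a minimal Python sketch. It is an 
illustration under stated assumptions, not the code that was checked in. It 
assumes an unsigned 32-bit hash ring whose upper 16 bits identify the hash 
domain (hence 65536 domains), and partition_ring is a hypothetical helper name. 
Each ideal shard boundary is snapped to the nearest domain boundary, so no 
domain straddles two shards.

```python
BITS = 32
DOMAIN_BITS = 16                          # assumption: upper 16 bits select the hash domain
DOMAIN_SIZE = 1 << (BITS - DOMAIN_BITS)   # 65536 hash values per domain
RING_SIZE = 1 << BITS                     # unsigned ring, chosen for readability


def partition_ring(num_shards):
    """Split [0, 2^32) into num_shards contiguous (start, end) ranges
    whose boundaries all fall on hash domain boundaries."""
    boundaries = [0]
    for i in range(1, num_shards):
        ideal = i * RING_SIZE / num_shards
        # Snap the ideal boundary to the nearest multiple of DOMAIN_SIZE.
        boundaries.append(round(ideal / DOMAIN_SIZE) * DOMAIN_SIZE)
    boundaries.append(RING_SIZE)
    return [(boundaries[i], boundaries[i + 1] - 1) for i in range(num_shards)]


ranges = partition_ring(3)
sizes = [end - start + 1 for start, end in ranges]
```

With numShards=3, each shard's size differs from the ideal third of the ring by 
less than one domain, which is the 1/65536th bound mentioned above.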





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-09 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527523#comment-13527523
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1419019

SOLR-2592: fix implicit router to return found slice, test querying via shards





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-08 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527263#comment-13527263
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418755

SOLR-2592: integration tests for routing






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-08 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527275#comment-13527275
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418762

SOLR-2592: realtime-get support






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-08 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527342#comment-13527342
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418814

SOLR-2592: deleteByQuery routing support






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Shreejay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526502#comment-13526502
 ] 

Shreejay commented on SOLR-2592:


Hi Michael, 

I ran the tests for solr after compiling it with the patch. A few of the tests 
failed. I am trying to fix these. If anyone else has used this patch 
successfully without errors, please let me know. 

[junit4:junit4] Completed in 0.10s, 1 test, 1 skipped
[junit4:junit4]
[junit4:junit4]
[junit4:junit4] Tests with failures:
[junit4:junit4]   - org.apache.solr.handler.admin.ShowFileRequestHandlerTest.testGetRawFile
[junit4:junit4]   - org.apache.solr.handler.admin.ShowFileRequestHandlerTest.testDirList
[junit4:junit4]   - org.apache.solr.cloud.ZkCLITest.testBootstrap
[junit4:junit4]   - org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch
[junit4:junit4]   - org.apache.solr.cloud.BasicDistributedZkTest (suite)
[junit4:junit4]
[junit4:junit4]
[junit4:junit4] JVM J0: 1.87 ..  2645.62 =  2643.75s
[junit4:junit4] Execution time total: 44 minutes 10 seconds
[junit4:junit4] Tests summary: 238 suites, 983 tests, 3 suite-level errors, 3 errors, 1 failure, 426 ignored (4 assumptions)

BUILD FAILED




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526522#comment-13526522
 ] 

Yonik Seeley commented on SOLR-2592:


bq. That being said, the implementation is going in a different direction that 
will move the custom hashing configuration to be persisted within Zookeeper to 
allow a client to be aware of the custom hashing on the collection without 
having to parse the solrconfig. I have not yet had the chance to review the 
work Yonik has committed so far.

Right - I've made good progress on that front in trunk, and we'll figure out 
how to get it ported back to 4x.  I'm in the process of adding more tests right 
now.

The separator: I went with ! by default since it doesn't require URL 
encoding, but is less common than underscore.  It also reminded me of the bang 
paths of old-style UUCP email addresses (like bigco!user).

A composite id router is enabled by default since the bangs are optional.

For querying, I used the param that Michael started with, shard.keys.
So you can do shard.keys=bigco!,littleco! at query time.
(The bangs matter!  Leaving the bang out will simply query the shard containing 
a simple document id, while putting it in will query the range of documents 
covered by that domain.)
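
A rough Python sketch of how the bang changes what shard.keys selects. This is 
an illustration only: zlib.crc32 stands in for whatever hash function the 
router really uses, the 16/16 bit split is assumed from the description of hash 
domains, and the function names are made up.

```python
import zlib

SEPARATOR = "!"


def h32(s):
    """Stand-in 32-bit hash (assumption: the real router uses a different function)."""
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id):
    """Upper 16 bits come from the part before '!', lower 16 bits from the rest.
    An id without a bang is hashed as a whole."""
    if SEPARATOR not in doc_id:
        return h32(doc_id)
    domain, rest = doc_id.split(SEPARATOR, 1)
    return (h32(domain) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


def key_range(shard_key):
    """Hash range selected by one shard.keys entry."""
    if shard_key.endswith(SEPARATOR):
        # Trailing bang: the whole hash domain of that prefix.
        upper = h32(shard_key[:-1]) & 0xFFFF0000
        return (upper, upper | 0xFFFF)
    # No bang: the single point where a simple document id would land.
    point = composite_hash(shard_key)
    return (point, point)


# shard.keys=bigco!,littleco!
selected = [key_range(k) for k in "bigco!,littleco!".split(",")]
```

Every id of the form bigco!something hashes into the range returned for 
bigco!, whereas bigco without the bang selects a single hash value.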

I've added distributed short-circuiting as well... if you only need to query a 
single shard, and you send the query directly to a replica of that shard that 
is active, the distributed search phase will be skipped and we'll drop directly 
to a local search.
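
The short-circuit condition can be sketched as a single predicate. The function 
and argument names below are hypothetical, and the real check lives in Solr's 
Java code; this only restates the rule in Python.

```python
def can_short_circuit(target_shards, local_shard, local_state):
    """True when the query targets exactly one shard and the core receiving
    the request is an active replica of that shard."""
    return (
        len(target_shards) == 1
        and local_shard in target_shards
        and local_state == "active"
    )


# A query routed to shard1 only, received by an active replica of shard1.
skip_distributed_phase = can_short_circuit({"shard1"}, "shard1", "active")
```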





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527051#comment-13527051
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418507

SOLR-2592: more DocCollection refactoring in tests






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527053#comment-13527053
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418337

SOLR-2592: fix composite id router for full-range






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527054#comment-13527054
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418116

SOLR-2592: fix composite id router slice selection






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527055#comment-13527055
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418100

SOLR-2592: fix shifting by 32 bits in compositeId router






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527056#comment-13527056
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418093

SOLR-2592: add hash router tests, fix compositeId bit specification parsing






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527058#comment-13527058
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418043

SOLR-2592: refactor routers to separate public classes






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-07 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527059#comment-13527059
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1418030

SOLR-2592: avoid distrib search if current core is active and hosting targeted 
shard






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-06 Thread Shreejay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511602#comment-13511602
 ] 

Shreejay commented on SOLR-2592:


Hi All, 

I am trying to use the patches mentioned here, specifically the one by Michael 
which hashes on a combined field 
(https://issues.apache.org/jira/browse/SOLR-2592?focusedCommentId=13280334&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13280334).

My requirement is exactly as mentioned by Andy: 
https://issues.apache.org/jira/browse/SOLR-2592?focusedCommentId=13294837&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13294837

I have downloaded the latest nightly build but am not sure how to configure it. 
Are the patches in the nightly build? 
(https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/)

How and where in solrconfig do I configure <shardKeyParserFactory 
class="solr.ShardKeyParserFactory"/>?








[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-06 Thread Michael Garski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514651#comment-13514651
 ] 

Michael Garski commented on SOLR-2592:
--

Hi Shreejay - 

I have not updated my patch in a few months so it may not apply cleanly to the 
latest in branch_4x.  The last patch I updated is SOLR-2592_r1384367.patch. If 
it does apply cleanly you can configure the shard key parser as a child of the 
'config' element. Here is an example that will work with a composite unique id 
delimited with underscores such as 123_abc, and the document will be hashed 
based on 123. The clause section of the config will be used in a delete by 
query, and in the example below the value of the clause for the field foo_field 
will be hashed to determine which shard the delete will be forwarded to. If the 
clause is required and a delete by query is submitted by a client without it, 
the request will return an error.

{code:xml}
<shardKeyParserFactory class="org.apache.solr.common.cloud.CompositeIdShardKeyParser">
  <str name="clause">foo_field</str>
  <bool name="clauseRequired">false</bool>
</shardKeyParserFactory>
{code}
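
As an illustration of the hashing rule described above (not the patch's actual 
code), the sketch below takes the part of the id before the first underscore 
and hashes only that. zlib.crc32 and the modulo step are stand-ins chosen for 
brevity, and the function names are hypothetical.

```python
import zlib


def shard_key(unique_id, delimiter="_"):
    """Return the part of the id before the first delimiter,
    e.g. '123' for '123_abc'. An id without a delimiter is its own key."""
    return unique_id.split(delimiter, 1)[0]


def shard_for(unique_id, num_shards):
    """Pick a shard index from the shard key only (stand-in hash)."""
    return zlib.crc32(shard_key(unique_id).encode("utf-8")) % num_shards


# Both documents share the key 123, so they land on the same shard.
same_shard = shard_for("123_abc", 4) == shard_for("123_xyz", 4)
```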

That being said, the implementation is going in a different direction that will 
move the custom hashing configuration to be persisted within Zookeeper to allow 
a client to be aware of the custom hashing on the collection without having to 
parse the solrconfig. I have not yet had the chance to review the work Yonik 
has committed so far.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-06 Thread Michael Garski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526035#comment-13526035
 ] 

Michael Garski commented on SOLR-2592:
--

Shreejay, in your example the value that would be hashed to determine shard 
membership would be the guidid, and documents whose values hash to the same 
value would be on the same shard. The foo_field is a separate field from the 
unique id, and its value would be the same as the value that is hashed. For 
example, in a system that keeps all documents belonging to a specific user id 
on the same shard, the unique, composite id would be userId_otherIdData and the 
clause field would contain the user id. A delete by query that uses the user id 
as a clause can then be forwarded to only the shards that contain that data.
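
A minimal sketch of the composite id idea described above, for illustration 
only. The class and method names are hypothetical and are not taken from the 
patch; the '_' separator follows the userId_otherIdData example, and 
String.hashCode() with a modulo is only a stand-in for whatever hash function 
and hash ranges the real implementation uses.

```java
// Hypothetical sketch: names are illustrative, not from the SOLR-2592 patch.
class CompositeIdRouting {

    // Assumes ids look like "userId_otherIdData"; the part before the first
    // '_' is treated as the value to hash. Ids without '_' are hashed whole.
    static String shardKey(String compositeId) {
        int sep = compositeId.indexOf('_');
        return sep < 0 ? compositeId : compositeId.substring(0, sep);
    }

    // Maps a shard key to a shard index in [0, numShards).
    // String.hashCode() is a stand-in for the real hash function.
    static int shardFor(String shardKey, int numShards) {
        return Math.floorMod(shardKey.hashCode(), numShards);
    }
}
```

With this shape, a document with id user42_doc1 and a delete by query whose 
clause field holds user42 both resolve to the same shard index, which is what 
allows the delete to be forwarded to a single shard.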

Did all of the tests pass on your build? There are tests in my patch that will 
verify it works properly.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-06 Thread Shreejay (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526060#comment-13526060
 ] 

Shreejay commented on SOLR-2592:


Thanks Michael. So my configuration seems to be good. I just downloaded the 
source and applied the patch again. 
I am running the tests again now. 


Last time I ran them, they got stuck at these messages for a long time, so I 
cancelled the run. This time I will let it run to the finish. 

only … [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:32:15, stalled for 2508s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:33:16, stalled for 2569s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:34:16, stalled for 2629s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:35:16, stalled for 2689s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:36:16, stalled for 2749s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:37:16, stalled for 2809s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads
[junit4:junit4] HEARTBEAT J0: 2012-12-06T13:38:16, stalled for 2869s at: TestIndexWriterNRTIsCurrent.testIsCurrentWithThreads





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509428#comment-13509428
 ] 

Yonik Seeley commented on SOLR-2592:


bq. Something is off though

False alarm - there was some sort of non-reproducible build issue.  I've 
committed the latest patch.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-03 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509432#comment-13509432
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1416744

SOLR-2592: delegate query-time shard selection to collection router






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-02 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508307#comment-13508307
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1416216

SOLR-2592: refactor doc routers, use implicit router when implicitly creating 
collection, use collection router to find correct shard when indexing






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507945#comment-13507945
 ] 

Yonik Seeley commented on SOLR-2592:


The latest issue I think I'm running into is manual shard assignment... when a 
core comes up and says "hey, I'm foo-shard of bar-collection", and that 
collection is dynamically created on the fly since it doesn't exist yet.  I 
think we have to assume custom sharding in that case... assigning hash ranges 
doesn't make sense.  But our current indexing code happily splits up all of the 
shards at the time (without recording any hash range, because numShards was 
never passed) and indexes to them based on that (which is a no-no if new shards 
can be manually added at any time).

I guess in situations like this, we should create the collection with an 
"implicit" document router.  The shard the document belongs to is defined by 
the shard it's sent to (it's custom sharding, with the shard essentially coming 
from the URL).
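
To make the distinction concrete, here is a rough sketch of the two routing 
styles. It is an illustration under assumptions, not the DocRouter code that 
was committed: the interface, class names and method signature are all 
hypothetical, and String.hashCode() with a modulo stands in for real hash 
range assignment.

```java
import java.util.List;

// Hypothetical interface; not the DocRouter API committed for SOLR-2592.
interface SketchRouter {
    String targetShard(String docId, String requestedShard, List<String> shards);
}

// "Implicit" routing: the document belongs to whichever shard the request
// named, so shards can be added manually at any time without rehashing.
class ImplicitSketchRouter implements SketchRouter {
    public String targetShard(String docId, String requestedShard, List<String> shards) {
        if (requestedShard == null || !shards.contains(requestedShard)) {
            throw new IllegalArgumentException("unknown shard: " + requestedShard);
        }
        return requestedShard;
    }
}

// Hash routing: the shard is derived from the id alone, so the result
// changes if the shard list changes. This is the "no-no" case above.
class HashSketchRouter implements SketchRouter {
    public String targetShard(String docId, String requestedShard, List<String> shards) {
        return shards.get(Math.floorMod(docId.hashCode(), shards.size()));
    }
}
```

The point of the sketch is only that the implicit router never consults the 
document id, which is why it remains valid when a new shard registers itself 
later.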

Before I diverge too much from trunk, I think I may try to first clean up what 
I have and get that committed back (the introduction of collection properties 
and associated cleanups).




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-01 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508004#comment-13508004
 ] 

Commit Tag Bot commented on SOLR-2592:
--

[trunk commit] Yonik Seeley
http://svn.apache.org/viewvc?view=revision&revision=1416025

SOLR-2592: progress - introduce DocCollection, add properties for collections, 
add a router collection property, add some builtin routers, rename 
HashPartitioner to DocRouter






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508020#comment-13508020
 ] 

Yonik Seeley commented on SOLR-2592:


Hmmm, I've committed to trunk, but having issues merging back to 4x.
I got a conflict in AbstractFullDistribZkTestBase.java (and 4x does look like 
it varies a lot from trunk).  Getting test failures in things like 
BasicDistributedZkTest even after resolving conflicts, so this could be a 
deeper issue of too much divergence between 4x and trunk.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-01 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508053#comment-13508053
 ] 

Mark Miller commented on SOLR-2592:
---

I just noticed that with these changes, the cloud viz ui needs to be updated - 
it sees "properties" as a shard name and displays it along with the actual 
shards.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-12-01 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508072#comment-13508072
 ] 

Mark Miller commented on SOLR-2592:
---

If I remember right, when I got the directory with replication stuff working 
and more tests using ram dir and what not, I started seeing unrelated test 
failures pop up and had to do some test work to address them. I probably could 
have back ported all of those, but hadn't yet. I was hoping to merge all that 
stuff back by now but have been waiting for it to bake a bit.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503807#comment-13503807
 ] 

Yonik Seeley commented on SOLR-2592:


Hi Michael, while I was reviewing your patch, I realized that we want something 
that clients (like cloud aware SolrJ) can easily get to.
So instead of config in solrconfig.xml that defines a parser for the core, it 
seems like this should be a property of the collection?

The HashPartitioner object is currently on the ClusterState object, but it 
really seems like this needs to be per-collection (or per config-set?).

Currently, there is a /collections/collection1 node with
{code}
{"configName":"myconf"}
{code}

We could add a partitioner or shardPartitioner attribute.

And of course there is /clusterstate.json with
{code}
{"collection1":{
    "shard1":{
      "range":"8000-",
      "replicas":{"192.168.1.109:8983_solr_collection1":{
          "shard":"shard1",
          "roles":null,
          "state":"active",
          "core":"collection1",
          "collection":"collection1",
          "node_name":"192.168.1.109:8983_solr",
          "base_url":"http://192.168.1.109:8983/solr",
          "leader":"true"}}},
    "shard2":{
      "range":"0-7fff",
      "replicas":{
{code}
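
As a rough illustration of how a per-shard "range" entry like the ones in 
the excerpt above could be used, the sketch below finds the shard whose range 
contains a given 32-bit hash. Everything here is an assumption made for the 
example: the class and method names are hypothetical, the "min-max" hex 
format is inferred from the excerpt, and any concrete range values passed in 
are the caller's own rather than values from a real cluster.

```java
import java.util.Map;

// Hypothetical sketch of hash-range lookup; not the ClusterState API.
class HashRangeLookup {

    // Inclusive range over the unsigned 32-bit hash space. Bounds are held
    // in longs so that values above 0x7fffffff compare correctly.
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        // Parses a "min-max" pair of hex numbers (format assumed from the excerpt).
        static Range parse(String s) {
            int dash = s.indexOf('-');
            return new Range(Long.parseLong(s.substring(0, dash), 16),
                             Long.parseLong(s.substring(dash + 1), 16));
        }

        boolean contains(int hash) {
            long unsigned = hash & 0xFFFFFFFFL; // reinterpret the int as unsigned
            return unsigned >= min && unsigned <= max;
        }
    }

    // Returns the name of the first shard whose range covers the hash,
    // or null when no range matches.
    static String shardFor(int hash, Map<String, Range> ranges) {
        for (Map.Entry<String, Range> e : ranges.entrySet()) {
            if (e.getValue().contains(hash)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

A client that can read these ranges itself can route a document without 
asking any Solr node, which is the motivation for making the information 
easy for something like a cloud aware SolrJ to get to.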

Now, currently the ClusterState object is created by reading clusterstate.json, 
but I don't believe it reads /collections/collection_name, but perhaps we 
should change this?






[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503837#comment-13503837
 ] 

Mark Miller commented on SOLR-2592:
---

bq. Now, currently the ClusterState object is created by reading 
clusterstate.json, but I don't believe it reads /collections/collection_name, 
but perhaps we should change this?

Depending on the number of collections, it may add a lot of reads on cluster 
changes, but the Overseer could probably grab this from the collection node and 
add it to the cluster state json file? It could perhaps even use a watcher per 
collection and not read when it does not have to.

Or it could be read separately like live nodes - that gets a bit messy though.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503839#comment-13503839
 ] 

Yonik Seeley commented on SOLR-2592:


bq. the Overseer could probably grab this from the collection node and add it 
to the cluster state json file?

Yeah, that's the other way, and might be initially easier.  To keep back 
compat, we would need to add a magic name to the map that we know isn't a 
shard: "properties"?




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Michael Garski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503978#comment-13503978
 ] 

Michael Garski commented on SOLR-2592:
--

bq. I realized that we want something that clients (like cloud aware SolrJ) can 
easily get to. So instead of config in solrconfig.xml that defines a parser for 
the core, it seems like this should be a property of the collection?

Agreed - that makes perfect sense and was an oversight on my part. Persisting 
that information in /collections/collection_name makes sense as it is a 
collection level option.

One thing about my patch that has been on my mind is the parsing of the value 
to hash on from a delete by query on the client side. Currently it is very 
simple and uses a regex to extract the value, as using any of the query parsers 
would require the cloud aware SolrJ client to also have the Lucene jar, which 
I'm not sure is desired. 
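
For readers who have not looked at the patch, this is roughly the kind of 
regex extraction meant here. It is a simplified sketch written for this 
comment, not the code in the patch: the class name, method name and the 
pattern itself are hypothetical, and it only handles a plain field:value or 
field:"quoted value" clause.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the patch's actual pattern may differ.
class DeleteQueryShardKey {

    // Returns the value of the first `field:value` or `field:"quoted value"`
    // clause found in the query string, or null when the field is absent.
    static String extract(String query, String field) {
        Pattern p = Pattern.compile(
            "(?:^|[\\s(+])"               // field must start a clause
            + Pattern.quote(field)
            + ":(?:\"([^\"]*)\"|([^\\s)]+))"); // quoted or bare value
        Matcher m = p.matcher(query);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```

The appeal is that this needs nothing beyond java.util.regex on the client. 
The cost is that it does not understand the full query syntax (nested 
clauses, escapes, negation), which is the trade-off I am unsure about.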




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504033#comment-13504033
 ] 

Yonik Seeley commented on SOLR-2592:


FWIW, I'm trying out github for this patch.

My current changes to add a DocCollection abstraction to ClusterState are on 
the custom_hash branch in my lucene-solr fork:
https://github.com/yonik/lucene-solr/tree/custom_hash
https://github.com/yonik/lucene-solr/commit/5f82a7917862a1f9e70d6d268c44b23af18aca3b

Warning: it doesn't work yet... I just got it to compile, but tests are failing.





[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504305#comment-13504305
 ] 

Yonik Seeley commented on SOLR-2592:


OK, fixed the bugs: 
https://github.com/yonik/lucene-solr/commit/3c0de9db3f737ee24a8f47ab3db9c573a173ce7d
branch is back to working.




[jira] [Commented] (SOLR-2592) Custom Hashing

2012-11-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502577#comment-13502577
 ] 

Yonik Seeley commented on SOLR-2592:


Just want to recap thoughts on all the different types/levels of custom 
sharding/hashing:

1) custom sharding with complete user control... user is responsible for 
adding/removing shards, and specifying what shard an update is targeted toward 
(SOLR-4059)
  - example: http://.../update?shard=NY_NJ_area
  - Solr still keeps track of the leader, and still forwards updates to the 
    leader, which forwards to replicas
  - replicas can still be added on the fly
  - Users could also provide a pre-built shard and we could still replicate it 
    out 
  - search side of this is already implemented:  ?shards=NY_NJ_area,SF_area
  - OPTIONAL: we could still provide a shard splitting service for custom shards
  - OPTIONAL: somehow specify a syntax for including the shard with the 
    document to support bulk loading, multiple docs per request (perhaps magic 
    field _shard_?)
    - if we go with _shard_, perhaps we should change the request param name to 
      match (like we do with _version_?). Example: 
      http://.../update?_shard_=NY_NJ_area

2) custom sharding based on plugin: a superset of #1
  - a plugin looks at the document being indexed and somehow determines what 
    shard the document belongs to (including possibly consulting an external 
    system)
  - an implementation that takes the shard name from a given field
  - IDEA: plugin should have access to request parameters to make decision 
    based on that also (i.e. this may be how _shard_ is implemented above?)
  - atomic updates, realtime-get, deletes, etc. would need to specify enough 
    info to determine shard (and not change info that determines shard)
    - trickier for parameters that already specify a comma separated list of 
      ids... how do we specify the additional info to determine shard?
  - OPTIONAL: allow some mechanism for the plugin to indicate that the location 
    of a document has changed (i.e. a delete should be issued to the old shard?)
  - is there a standard mechanism to provide enough information to determine 
    shard (for example on a delete)?  It would seem this is dependent on the 
    plugin specifics and thus all clients must know the details.

3) Time-based sharding.  A user could do time based sharding based on #1 or #2, 
or we could provide more specific support (perhaps this is just a specific 
implementation of #2).
  - automatically route to the correct shard by time field
  - on search side, allow a time range to be specified and all shards covering 
    part of the range will be selected
  - OPTIONAL: automatically add a filter to restrict to exactly the given time 
    range (as opposed to just the shards)
  - OPTIONAL: allow automatically reducing the replication level of older 
    shards (and down to 0 would mean complete removal - they have been aged out)

4) Custom hashing based on plugin
  - The plugin determines the hash code, the hash code determines the shard 
    (based on the hash range stored in the shard descriptor today)
  - This is very much like option #2, but the output is hash instead of shard

5) Hash based on field (a specific implementation of #4?)
  - collection defines field to hash on
  - OPTIONAL: could define multiple fields in comma separated list.  The hash 
    value would be constructed by concatenation of the values. 
  - how to specify the restriction/range on the query side?  

6) Hash based on first part of ID (composite id)
  - Like #5, but the hash key value is contained in the ID.
  - very transparent and unobtrusive - can be enabled by default since there 
    should be no additional requirements or restrictions


For both custom hashing options #5 and #6, instead of deriving the complete 
hash value from the hash key, we could use it for only the top bits of the hash 
value, with the remainder coming from the ID.  The advantage of this is that it 
allows for splitting of over-allocated groups.  If the hash code is derived 
only from the hashKey, then all of a certain customer's records would share the 
exact same hash value and there would be no way to split it later.
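
A small sketch of the "top bits from the hash key, remainder from the ID" 
idea, to make the bit layout concrete. The details are assumptions picked for 
the example and nothing here is decided: the 16/16 split, the class and 
method names, and the use of String.hashCode() plus a generic bit mixer in 
place of the real hash function.

```java
// Hypothetical sketch; the bit split and hash function are assumptions.
class CompositeHash {

    // Number of high bits taken from the hash key (assumed value).
    static final int KEY_BITS = 16;

    // Combines the two hashes: the top KEY_BITS come from the hash key,
    // the remaining low bits come from the full document id.
    static int hash(String hashKey, String id) {
        int keyMask = -1 << (32 - KEY_BITS);        // 0xFFFF0000 for 16 bits
        int top = mix(hashKey.hashCode()) & keyMask;
        int bottom = mix(id.hashCode()) & ~keyMask;
        return top | bottom;
    }

    // Generic 32-bit finalizer-style mixer, used only to spread the bits of
    // String.hashCode(); a stand-in for the real hash function.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

All documents for one hash key land in a contiguous slice of the hash space 
(same top bits), so they stay together on one shard, but the low bits still 
vary by ID, so that slice can be split into sub-ranges later.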

I think eventually, we want *all* of these options.
It still seems natural to go with #6 first since it's the only one that can 
actually be enabled by default w/o any configuration.

Other things to think about:
Where should the hashing/sharding specification/configuration be kept?
 a) as a collection property (like configName)?
 b) as part of the standard config referenced by configName (either part of 
the schema or a separate file more amenable to live updating)
 
Handling grouping of documents by more than one dimension:
Let's say you have multiple customers (and you want to group each customer's 
documents together), but you also want to do time based sharding.


 Custom Hashing
 --

 Key: