[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701256#comment-13701256 ] Leonid Krogliak commented on SOLR-2592: --- I don't see this changesin the code. I use solr 4.3.1, but I looked for this changes in solr 4.1 too. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1, 5.0 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555459#comment-13555459 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1434401 - Make complex SOLR-2592 changes entry not get converted to a single wrapped line in Changes.html. - 'Via' - 'via' (merged lucene_solr_4_1 r1434389) Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1, 5.0 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555463#comment-13555463 ] Commit Tag Bot commented on SOLR-2592: -- [branch_4x commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1434402 - Make complex SOLR-2592 changes entry not get converted to a single wrapped line in Changes.html. - 'Via' - 'via' (merged lucene_solr_4_1 r1434389) Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1, 5.0 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13553058#comment-13553058 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1433082 SOLR-2592: changes entry for doc routing Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13553066#comment-13553066 ] Commit Tag Bot commented on SOLR-2592: -- [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1433084 SOLR-2592: changes entry for doc routing Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1, 5.0 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13541048#comment-13541048 ] Shreejay commented on SOLR-2592: {quote}Right - I've made good progress on that front in trunk, and we'll figure out how to get it ported back to 4x. I'm in the process of adding more tests right now. The separator: I went with ! by default since it doesn't require URL encoding, but is less common than underscore. It also reminded me of the bang paths of old-style UUCP email addresses (like bigco!user).{quote} Can the separator be provided as an option instead of being hard coded? I have been using Michaels patch and now I am trying to move to 4.1. But I will have to re-index all my data by changing the unique key, which right now contains a underscore. If not, will changing the separator to a underscore in CompositeIdRouter.java be enough? --Shreejay Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534003#comment-13534003 ] Markus Jelsma commented on SOLR-2592: - Additional note for anyone: you will also see a similar exception with an older SolrJ application: {code} java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:291) at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:263) at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:274) at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:142) at org.apache.nutch.indexer.solr.SolrUtils.getCloudServer(SolrUtils.java:107) at org.apache.nutch.indexer.solr.SolrUtils.getSolrServers(SolrUtils.java:65) at org.apache.nutch.indexer.solr.SolrWriter.open(SolrWriter.java:65) at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:42) at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.init(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {code} Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532219#comment-13532219 ] Markus Jelsma commented on SOLR-2592: - I think this belong to this issue. We're seeing with today's trunk and a clean Zookeeper the following exception. For some reason, only one node exhibits this behavior. {code} 2012-12-14 10:10:27,355 ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.common.cloud.ClusterState.makeSlices(ClusterState.java:246) at org.apache.solr.common.cloud.ClusterState.collectionFromObjects(ClusterState.java:231) at org.apache.solr.common.cloud.ClusterState.load(ClusterState.java:219) at org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:207) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) {code} Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532241#comment-13532241 ] Markus Jelsma commented on SOLR-2592: - Ah! For some reason this specific node did not get the upgrade from December 13th to the 14th while all others did. This proves the change is indeed backward incompatible ;) The problem is solved so please ignore my previous comment. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531450#comment-13531450 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1421499 SOLR-2592: add additional level in clusterstate.json for collection properties Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531455#comment-13531455 ] Commit Tag Bot commented on SOLR-2592: -- [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1421506 SOLR-2592: add additional level in clusterstate.json for collection properties Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531936#comment-13531936 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1421644 SOLR-2592: test expected number of requests to servers Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532051#comment-13532051 ] Commit Tag Bot commented on SOLR-2592: -- [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1421670 SOLR-2592: test expected number of requests to servers Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592_collectionProperties.patch, SOLR-2592_collectionProperties.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530162#comment-13530162 ] Shawn Heisey commented on SOLR-2592: I use the hot shard concept in Solr 3.5.0. For the cold shards, I split documents using a MOD on the CRC32 hash of a MySQL bigint autoincrement field - my MySQL query does the CRC32 and the MOD. That field's actual value is translated to a tlong field in the schema. For the hot shard, I simply use a split point on the actual value of that field. Everything less than or equal to the split point goes to the cold shards, everything greater than the split point goes to the hot shard. Multiple shards are handled by a single Solr instance - seven shards live on two servers. This arrangement requires that I do a daily distribute process where I index (from MySQL) data between the old split point and the new split point to the cold shards, then delete that data from the hot shard. Full reindexes are done with the dataimport handler and controlled by SolrJ, everything else (including the distribute) is done directly with SolrJ. How much of that could be automated and put server-side with the features added by this issue? If I have to track shard and core names myself in order to do the distribute, then I will have to decide whether the other automation I would gain is worth switching to SolrCloud. If I could avoid the client-side distribute indexing and have Solr shuffle the data around itself, that would be awesome, but I'm not sure that's possible, and it may be somewhat complicated by the fact that I have a number of unstored fields that I search on. At some point I will test performance on an index where I do not have a hot shard, where the data is simply hashed between several large shards. This entire concept was implemented for fast indexing of new data - because Solr 1.4 did not have NRT features. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530378#comment-13530378 ] Commit Tag Bot commented on SOLR-2592: -- [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1420992 SOLR-2592: SOLR-1028: SOLR-3922: SOLR-3911: sync trunk with 4x Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529124#comment-13529124 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1420284 SOLR-2592: cleanups - remove unneeded range info, rename getLeaderProps to getLeaderRetry Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529146#comment-13529146 ] Shreejay commented on SOLR-2592: Hi Yonik, What configuration (solrconfig / solr.xml /schema.xml / ZK ?) do I have to change in order to use these changes in trunk? I have a string field on which I want to do group by and hence I want all documents with same value of the string field to be on same side. I was using a 4x branch with Michael's patch, but ran into heap space issues mentioned in https://issues.apache.org/jira/browse/SOLR-4144 . I was not able to merge trunk to 4x and just saw your comment that you also had issues while doing this. So my next step might be to use the trunk directly with the patch supplied for 4144. How do I specify that I want to shard based on a specific string field? Thanks. --Shreejay Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529150#comment-13529150 ] Mark Miller commented on SOLR-2592: --- Great to see someone that can help with additional testing :) This will get merged back to 4x very soon for 4.1. It's a pain because of some work I did around full support for custom Directories. Merging both back at the same time will probably be easiest. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529240#comment-13529240 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1420338 SOLR-2592: avoid splitting composite router hash domains Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529248#comment-13529248 ] Yonik Seeley commented on SOLR-2592: So with the composite key hashing strategy of using the first part of the ID as the upper bits (what I've started calling a hash domain) and the second part the lower bits, one needed to use an initial numShards equal to a power of two *if* one wanted to absolutely avoid a hash domain from being split over more than one shard. For most people, this wouldn't be an issue - you only need to guarantee splitting domains if you're using a feature that doesn't work across shards (like pseudo-join for example). I just checked in a change to remove that limitation. You can now use any numShards and the resulting shard ranges will be rounded to the nearest hash domain to avoid splitting them (this amounts to a maximum size rounding error of 1/65536th of the hash ring, or 0.002%). Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527523#comment-13527523 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1419019 SOLR-2592: fix implicit router to return found slice, test querying via shards Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527263#comment-13527263 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418755 SOLR-2592: integration tests for routing Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527275#comment-13527275 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418762 SOLR-2592: realtime-get support Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527342#comment-13527342 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418814 SOLR-2592: deleteByQuery routing support Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526502#comment-13526502 ] Shreejay commented on SOLR-2592: Hi Michael, I ran the tests for solr after compiling it with the patch. A few of the tests failed. I am trying to fix these. If anyone else has used this patch successfully without errors, please let me know. [junit4:junit4] Completed in 0.10s, 1 test, 1 skipped [junit4:junit4] [junit4:junit4] [junit4:junit4] Tests with failures: [junit4:junit4] - org.apache.solr.handler.admin.ShowFileRequestHandlerTest.tes tGetRawFile [junit4:junit4] - org.apache.solr.handler.admin.ShowFileRequestHandlerTest.tes tDirList [junit4:junit4] - org.apache.solr.cloud.ZkCLITest.testBootstrap [junit4:junit4] - org.apache.solr.cloud.BasicDistributedZkTest.testDistribSear ch [junit4:junit4] - org.apache.solr.cloud.BasicDistributedZkTest (suite) [junit4:junit4] [junit4:junit4] [junit4:junit4] JVM J0: 1.87 .. 2645.62 = 2643.75s [junit4:junit4] Execution time total: 44 minutes 10 seconds [junit4:junit4] Tests summary: 238 suites, 983 tests, 3 suite-level errors, 3 er rors, 1 failure, 426 ignored (4 assumptions) BUILD FAILED Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526522#comment-13526522 ] Yonik Seeley commented on SOLR-2592: bq. That being said the implementation is going in a different direction that will move the custom hashing configuration to be persisted within Zookeeper to allow a client to be aware the custom hashing on the collection without having to parse the solrconfig. I have not yet had the chance to review the work Yonik has committed so far. Right - I've made good progress on that front in trunk, and we'll figure out how to get it ported back to 4x. I'm in the process of adding more tests right now. The separator: I went with ! by default since it doesn't require URL encoding, but is less common than underscore. It also reminded me of the bang paths of old-style UUCP email addresses (like bigco!user). A composite id router is enabled by default since the bangs are optional. For querying, I used the param that Michael started with, shard.keys. So you can do shard.keys=bigco!,littleco! at query time. (the bangs matter! leaving it out will simply query the shard containing a simple document id, while putting it in will query a range of documents all covered by that domain). I've added distributed short-circuiting as well... if you only need to query a single shard, and you send the query directly to a replica of that shard that is active, the distributed search phase will be skipped and we'll drop directly to a local search. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527051#comment-13527051 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418507 SOLR-2592: more DocCollection refactoring in tests Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527053#comment-13527053 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418337 SOLR-2592: fix composite id router for full-range Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527054#comment-13527054 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418116 SOLR-2592: fix composite id router slice selection Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527055#comment-13527055 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418100 SOLR-2592: fix shifting by 32 bits in compositeId router Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527056#comment-13527056 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418093 SOLR-2592: add hash router tests, fix compositeId bit specification parsing Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527058#comment-13527058 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418043 SOLR-2592: refactor routers to separate public classes Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13527059#comment-13527059 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1418030 SOLR-2592: avoid distrib search if current core is active and hosting targeted shard Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511602#comment-13511602 ] Shreejay commented on SOLR-2592: Hi All, I am trying to use the patches mentioned here. Specifically the one by Michael which hashes on a combined field. (https://issues.apache.org/jira/browse/SOLR-2592?focusedCommentId=13280334page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13280334) My requirement is exactly as mentioned by Andy https://issues.apache.org/jira/browse/SOLR-2592?focusedCommentId=13294837page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13294837 I have downloaded the latest Nightly build but am not sure how to configure it. Are the patches on the nightly build? (https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/) How and where in solrconfig do I configure shardKeyParserFactory class=solr.ShardKeyParserFactory/ ? Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13514651#comment-13514651 ] Michael Garski commented on SOLR-2592: -- Hi Shreejay - I have not updated my patch in a few months so it may not apply cleanly to the latest in branch_4x. The last patch I updated is SOLR-2592_r1384367.patch. If it does apply cleanly you can configure the shard key parser as a child of the 'config' element. Here is an example that will work with a composite unique id delimited with underscores such as 123_abc, and the document will be hashed based on 123. The clause section of the config will be used in a delete by query, and in the example below the value of clause for the field foo_field will be used to hash on to determine which shard the delete will be forwarded to. If the clause is required and a delete by query is submitted by a client the request will return an error. {code:xml} shardKeyParserFactory class=org.apache.solr.common.cloud.CompositeIdShardKeyParser str name=clausefoo_field/str bool name=clauseRequiredfalse/bool /shardKeyParserFactory {code} That being said the implementation is going in a different direction that will move the custom hashing configuration to be persisted within Zookeeper to allow a client to be aware the custom hashing on the collection without having to parse the solrconfig. I have not yet had the chance to review the work Yonik has committed so far. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526035#comment-13526035 ] Michael Garski commented on SOLR-2592: -- Shreejay, in your example the value that would be hashed to determine shard membership would be the guidid, and those that hashed to the same value would be on the same shard. The foo_field is a different field from the unique id and its value would be the same as that is hashed. For example, in a system that will keep all documents that belong to a specific user id on the same shard, the unique, composite id would be userId_otherIdData and the clause field would contain the user id to allow for a delete by query using the user id as a clause in the query to forward the delete to only the shards that contain that data. Did all of the tests pass on your build? There are tests in my patch that will verify it works properly. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13526060#comment-13526060 ] Shreejay commented on SOLR-2592: Thanks Michael. So my configuration seems to be good. I just downloaded the source and applied the patch again. I am running the tests again now. Last time I ran them it got stuck at these messages for a long time. So I cancelled it. I will let it run this time till the finish. only … [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:32:15, stalled for 2508s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:33:16, stalled for 2569s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:34:16, stalled for 2629s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:35:16, stalled for 2689s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:36:16, stalled for 2749s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:37:16, stalled for 2809s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads [junit4:junit4] HEARTBEAT J0: 2012-12-06T13:38:16, stalled for 2869s at: TestInd exWriterNRTIsCurrent.testIsCurrentWithThreads Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509428#comment-13509428 ] Yonik Seeley commented on SOLR-2592: bq. Something is off though False alarm - there was some sort of non-reproducible build issue. I've committed the latest patch. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13509432#comment-13509432 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1416744 SOLR-2592: delegate query-time shard selection to collection router Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_query_try1.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508307#comment-13508307 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1416216 SOLR-2592: refactor doc routers, use implicit router when implicity creating collection, use collection router to find correct shard when indexing Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507945#comment-13507945 ] Yonik Seeley commented on SOLR-2592: Latest issue I think I'm running into is manual shard assignment... when a core comes up and says hey, I'm foo-shard of bar-collection, and that collection is dynamically created on the fly since it doesn't exist yet. I think we have to assume custom sharding in that case... assigning hash ranges doesn't make sense. But our current indexing code happily splits up all of the shards at the time (without recording any hash range because numShards was never passed) and indexes to them based on that (which is a no-no if new shards can be manually added at any time). I guess in situations like this, we should create the collection with an implicit document router. The shard the document belongs to is defined by the shard it's sent to (it's custom sharding with the shard coming from the URL essentially). Before I diverge too much from trunk, I think I may try to first clean up what I have and get that committed back (the introduction of collection properties and associated cleanups). Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508004#comment-13508004 ] Commit Tag Bot commented on SOLR-2592: -- [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revisionrevision=1416025 SOLR-2592: progress - introduce DocCollection, add properties for collections, add a router collection property, add some builtin routers, rename HashPartitioner to DocRouter Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508020#comment-13508020 ] Yonik Seeley commented on SOLR-2592: Hmmm, I've committed to trunk, but having issues merging back to 4x. I got a conflict in AbstractFullDistribZkTestBase.java (and 4x does look like it varies a lot from trunk). Getting test failures in things like BasicDistributedZkTest even after resolving conflicts, so this could be a deeper issue of too much divergence between 4x and trunk. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508053#comment-13508053 ] Mark Miller commented on SOLR-2592: --- I just noticed that with these changes, the cloud viz ui needs to be updated - it sees properties as a shard name and displays it along with the actual shards. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13508072#comment-13508072 ] Mark Miller commented on SOLR-2592: --- If I remember right, when I got the directory with replication stuff working and more tests using ram dir and what not, I started seeing unrelated test failures pop up and has to do some test work to address them. I probably could have back ported all of those, but hadn't yet. I was hoping to merge all that stuff back by now but have been waiting for it to bake a bit. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Fix For: 4.1 Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_progress.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503807#comment-13503807 ] Yonik Seeley commented on SOLR-2592: Hi Michael, while I was reviewing your patch, I realized that we want something that clients (like cloud aware SolrJ) can easily get to. So instead of config in solrconfig.xml that defines a parser for the core, it seems like this should be a property of the collection? The HashPartitioner object is currently on the ClusterState object, but it really seems like this needs to be per-collection (or per config-set?). Currently, there is a /collections/collection1 node with {code} {configName:myconf} {code} We could add a partitioner or shardPartitioner attribute. And of course there is /clusterstate.json with {code} {collection1:{ shard1:{ range:8000-, replicas:{192.168.1.109:8983_solr_collection1:{ shard:shard1, roles:null, state:active, core:collection1, collection:collection1, node_name:192.168.1.109:8983_solr, base_url:http://192.168.1.109:8983/solr;, leader:true}}}, shard2:{ range:0-7fff, replicas:{ {code} Now, currently the ClusterState object is created by reading clusterstate.json, but I don't believe it reads /collections/collection_name, but perhaps we should change this? Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503837#comment-13503837 ] Mark Miller commented on SOLR-2592: --- bq. Now, currently the ClusterState object is created by reading clusterstate.json, but I don't believe it reads /collections/collection_name, but perhaps we should change this? Depending on the number of collections, it may add a lot of reads on cluster changes, but the Overseer could probably grab this from the collection node and add it to the cluster state json file? It could perhaps even use a watcher per collection and not read when it does not have to. Or it could be read separately like live nodes - that gets a bit messy though. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503839#comment-13503839 ] Yonik Seeley commented on SOLR-2592: bq. the Overseer could probably grab this from the collection node and add it to the cluster state json file? Yeah, that's the other way, and might be initially easier. To keep back compat, we would need to add a magic name to the map that we know isn't a shard. properties? Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13503978#comment-13503978 ] Michael Garski commented on SOLR-2592: -- bq. I realized that we want something that clients (like cloud aware SolrJ) can easily get to. So instead of config in solrconfig.xml that defines a parser for the core, it seems like this should be a property of the collection? Agreed - that makes perfect sense and was an oversight on my part. Persisting that information in /collections/collection_name makes sense as it is a collection level option. One thing about my patch that has been on my mind has been the parsing of the value to hash on from a delete by query on the client side. Currently it is very simple and uses a regex to extract the value as using any of the query parsers would require the cloud aware SolrJ client to also have the Lucene jar, which I'm not sure is desired. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504033#comment-13504033 ] Yonik Seeley commented on SOLR-2592: FWIW, I'm trying out github for this patch. My current changes to add a DocCollection abstraction to ClusterState are on the custom_hash branch in my lucene-solr fork: https://github.com/yonik/lucene-solr/tree/custom_hash https://github.com/yonik/lucene-solr/commit/5f82a7917862a1f9e70d6d268c44b23af18aca3b Warning: it doesn't work yet... I just got it to compile, but tests are failing. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13504305#comment-13504305 ] Yonik Seeley commented on SOLR-2592: OK, fixed the bugs: https://github.com/yonik/lucene-solr/commit/3c0de9db3f737ee24a8f47ab3db9c573a173ce7d branch is back to working. Custom Hashing -- Key: SOLR-2592 URL: https://issues.apache.org/jira/browse/SOLR-2592 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.0-ALPHA Reporter: Noble Paul Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value etc) It will be easy to narrow down the search to a smaller subset of shards and in effect can achieve more efficient search. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2592) Custom Hashing
[ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13502577#comment-13502577 ] Yonik Seeley commented on SOLR-2592: Just want to recap thoughts on all the different types/levels of custom sharding/hashing: 1) custom sharding with complete user control... user is responsible for adding/removing shards, and specifying what shard an update is targeted toward (SOLR-4059) - example: http://.../update?shard=NY_NJ_area - Solr still keeps track of leader, still forward updates to the leader which forwards to replicas - replicas can still be added on the fly - Users could also provide a pre-built shard and we could still replicate it out - search side of this is already implemented: ?shards=NY_NJ_area,SF_area - OPTIONAL: we could still provide a shard splitting service for custom shards - OPTIONAL: somehow specify a syntax for including the shard with the document to support bulk loading, multiple docs per request (perhaps magic field \_shard\_?) - if we go with _shard_, perhaps we should change the request param name to match (like we do with _version_?). Example: http://.../update?_shard_=NY_NJ_area 2) custom sharding based on plugin: a superset of #1 - a plugin looks at the document being indexed and somehow determines what shard the document belongs to (including possibly consulting an external system) - an implementation that takes the shard name from a given field - IDEA: plugin should have access to request parameters to make decision based on that also (i.e. this may be how _shard_ is implemented above?) - atomic updates, realtime-get, deletes, etc. would need to specify enough info to determine shard (and not change info that determines shard) - trickier for parameters that already specify a comma separated list of ids... how do we specify the additional info to determine shard? - OPTIONAL: allow some mechanism for the plugin to indicate that the location of a document has changed (i.e. a delete should be issued to the old shard?) - is there a standard mechanism to provide enough information to determine shard (for example on a delete)? It would seem this is dependent on the plugin specifics and thus all clients must know the details. 3) Time-based sharding. A user could do time based sharding based on #1 or #2, or we could provide more specific support (perhaps this is just a specific implementation of #2). - automatically route to the correct shard by time field - on search side, allow a time range to be specified and all shards covering part of the range will be selected - OPTIONAL: automatically add a filter to restrict to exactly the given time range (as opposed to just the shards) - OPTIONAL: allow automatically reducing the replication level of older shards (and down to 0 would mean complete removal - they have been aged out) 4) Custom hashing based on plugin - The plugin determines the hash code, the hash code determines the shard (based on the hash range stored in the shard descriptor today) - This is very much like option #2, but the output is hash instead of shard 5) Hash based on field (a specific implementation of #4?) - collection defines field to hash on - OPTIONAL: could define multiple fields in comma separated list. the hash value would be constructed by catenation of the values. - how to specify the restriction/range on the query side? 6) Hash based on first part of ID (composite id) - Like #5, but the hask key value is contained in the ID. - very transparent and unobtrusive - can be enabled by default since there should be no additional requirements or restrictions For both custom hashing options #5 and #6, instead of deriving the complete hash value from the hash key, we could use it for only the top bits of the hash value with the remainder coming from the ID. The advantage of this is that it allows for splitting of over-allocated groups. If the hash code is derived only from the hashKey then all of a certain customer's records would share the exact same hash value and there would be no way to split it later. I think eventually, we want *all* of these options. It still seems natural to go with #6 first since it's the only one that can actually be enabled by default w/o any configuration. Other things to think about: Where should the hashing/sharding specification/configuration be kept? a) as a collection property (like configName)? b) as part of the standard config referenced by configName (either part of the schema or a separate file more amenable to live updating) Handing grouping of documents by more than one dimension: Let's say you have multiple customers (and you want to group each customers documents together), but you also want to do time based sharding. Custom Hashing -- Key: