Thanks for the quick response. With that change (I have not set numShards yet), shard1 got updated. But now, when I execute the following queries, I get the document back from both shards, which doesn't seem right:

http://localhost:7574/solr/select/?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>

http://localhost:8983/solr/select?q=*:*
    <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
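("That change" is presumably the explicit per-core commit suggested in the quoted mail below; a minimal SolrJ sketch of that step - the client class and core URLs are assumptions based on the earlier test, not code from the thread.)

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // Commits are not distributed on the branch yet, so after the distributed update
    // each core is committed explicitly. URLs match the two cores used in the test.
    public class ExplicitCommits {
        public static void main(String[] args) throws Exception {
            SolrServer shard1 = new CommonsHttpSolrServer("http://localhost:8983/solr/collection1");
            SolrServer shard2 = new CommonsHttpSolrServer("http://localhost:7574/solr/collection1");
            shard1.commit();
            shard2.commit();
        }
    }
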
On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Hmm... sorry about that - so my first guess is that right now we are not distributing a commit (easy to add, we just have not done it yet).
>
> Right now I explicitly commit on each server for tests.
>
> Can you try explicitly committing on server1 after updating the doc on server 2?
>
> I can start distributing commits tomorrow - I've been meaning to do it for my own convenience anyhow.
>
> Also, you want to pass the sys property numShards=1 on startup. I think it defaults to 3. That will give you one leader and one replica.
>
> - Mark
>
> On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:
>
>> So I couldn't resist and attempted this tonight. I used the solrconfig you mentioned (as is, no modifications), set up a 2-shard cluster in collection1, sent 1 doc to one of the shards, then updated it and sent the update to the other. I don't see the modification, though; I only see the original document. The following is the test:
>>
>>     public void update() throws Exception {
>>         String key = "1";
>>
>>         SolrInputDocument solrDoc = new SolrInputDocument();
>>         solrDoc.setField("key", key);
>>         solrDoc.addField("content", "initial value");
>>
>>         SolrServer server = servers
>>                 .get("http://localhost:8983/solr/collection1");
>>         server.add(solrDoc);
>>         server.commit();
>>
>>         solrDoc = new SolrInputDocument();
>>         solrDoc.addField("key", key);
>>         solrDoc.addField("content", "updated value");
>>
>>         server = servers.get("http://localhost:7574/solr/collection1");
>>
>>         UpdateRequest ureq = new UpdateRequest();
>>         ureq.setParam("update.chain", "distrib-update-chain");
>>         ureq.add(solrDoc);
>>         ureq.setParam("shards",
>>                 "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
>>         ureq.setParam("self", "foo");
>>         ureq.setAction(ACTION.COMMIT, true, true);
>>         server.request(ureq);
>>         System.out.println("done");
>>     }
>>
>> key is my unique field in schema.xml.
>>
>> What am I doing wrong?
>>
>> On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> Yes, the ZK method seems much more flexible. Adding a new shard would simply mean updating the range assignments in ZK. Where is this currently on the list of things to accomplish? I don't have time to work on this now, but if you (or anyone) could provide direction I'd be willing to work on it when I have spare time. I guess a JIRA detailing where/how to do this would help. Not sure the design has been thought out that far, though.
>>>
>>> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>> Right now let's say you have one shard - everything there hashes to range X.
>>>>
>>>> Now you want to split that shard with an Index Splitter.
>>>>
>>>> You divide range X in two - giving you two ranges - then you start splitting. This is where the current Splitter needs a little modification. You decide which doc should go into which new index by rehashing each doc id in the index you are splitting - if its hash is greater than X/2, it goes into index1; if it's less, index2.
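(A toy sketch of the split decision just described, purely to make the rehashing step concrete; the hash function and range arithmetic are placeholders, not the branch's actual implementation.)

    // Toy illustration of hash-based splitting: rehash each doc id and decide which of
    // the two new indexes it belongs to. Placeholder logic only - not Solr's code.
    public class HashSplitSketch {

        // Stand-in for hashing a document's unique key into the shard's hash range.
        static int hash(String docId) {
            return docId.hashCode();
        }

        // Docs whose hash lands above the midpoint of the old range ("X/2") go to
        // index1; the rest go to index2.
        static boolean goesToIndex1(String docId, int rangeStart, int rangeEnd) {
            int mid = (int) (((long) rangeStart + (long) rangeEnd) / 2); // avoid int overflow
            return hash(docId) > mid;
        }

        public static void main(String[] args) {
            System.out.println(goesToIndex1("1", Integer.MIN_VALUE, Integer.MAX_VALUE));
        }
    }

However the real hash is computed, the per-document decision during a split is just this comparison against the midpoint of the old range.
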
>>>> I think there are a couple of current Splitter impls, but one of them does something like: give me an id - if the ids in the index are above that id, they go to index1; if below, index2. We need to instead do a quick hash rather than a simple id compare.
>>>>
>>>> Why do you need to do this on every shard?
>>>>
>>>> The other part we need that we don't have is to store hash range assignments in ZooKeeper - we don't do that yet because it's not needed yet. Instead we currently just calculate that on the fly (too often at the moment - on every request :) I intend to fix that of course).
>>>>
>>>> At the start, ZK would say: for range X, go to this shard. After the split, it would say: for the range less than X/2, go to the old node; for the range greater than X/2, go to the new node.
>>>>
>>>> - Mark
>>>>
>>>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:
>>>>
>>>>> Hmm... this doesn't sound like the hashing algorithm that's on the branch, right? The algorithm you're mentioning sounds like there is some logic which is able to tell that a particular range should be distributed between 2 shards instead of 1. So it seems like a trade-off between repartitioning the entire index (on every shard) and having a custom hashing algorithm which is able to handle the situation where 2 or more shards map to a particular range.
>>>>>
>>>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>
>>>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:
>>>>>>
>>>>>>> I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards' indexes based on the hash algorithm.
>>>>>>
>>>>>> Not something I've thought deeply about myself yet, but I think the idea would be to split as many as you felt you needed to.
>>>>>>
>>>>>> If you wanted to keep the full balance always, this would mean splitting every shard at once, yes. But this depends on how many boxes (partitions) you are willing/able to add at a time.
>>>>>>
>>>>>> You might just split one index to start - now its hash range would be handled by two shards instead of one (if you have 3 replicas per shard, this would mean adding 3 more boxes). When you needed to expand again, you would split another index that was still handling its full starting range. As you grow, once you split every original index, you'd start again, splitting one of the now-half ranges.
>>>>>>
>>>>>>> Is there also an index merger in contrib which could be used to merge indexes? I'm assuming this would be the process?
>>>>>>
>>>>>> You can merge with IndexWriter.addIndexes (Solr also has an admin command that can do this). But I'm not sure where this fits in?
>>>>>>
>>>>>> - Mark
>>>>>>
>>>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>> Not yet - we don't plan on working on this until a lot of other stuff is working solidly. But someone else could jump in!
>>>>>>>>
>>>>>>>> There are a couple of ways to go about it that I know of:
>>>>>>>>
>>>>>>>> A longer-term solution may be to start using micro shards - each index starts as multiple indexes. This makes it pretty fast to move micro shards around as you decide to change partitions. It's also less flexible, as you are limited by the number of micro shards you start with.
>>>>>>>>
>>>>>>>> A simpler and likely first step is to use an index splitter. We already have one in Lucene contrib - we would just need to modify it so that it splits based on the hash of the document id. This is super flexible, but splitting will obviously take a little while on a huge index. The current index splitter is a multi-pass splitter - good enough to start with, but with most files under codec control these days, we may be able to make a single-pass splitter soon as well.
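(Tying the splitter back to the earlier point about storing hash-range assignments in ZooKeeper, here is a toy sketch of how those assignments might be modelled and how a split would update them; the plain Java map stands in for the ZK data and the shard names are made up.)

    import java.util.TreeMap;

    // Toy model of hash-range -> shard assignments. A TreeMap keyed by each range's upper
    // bound stands in for the data kept in ZooKeeper; splitting a shard just replaces its
    // single entry with two entries covering the two half ranges.
    public class RangeAssignmentSketch {

        static final TreeMap<Integer, String> ranges = new TreeMap<Integer, String>();

        static String shardFor(int hash) {
            // The smallest upper bound >= hash owns this document.
            return ranges.ceilingEntry(hash).getValue();
        }

        public static void main(String[] args) {
            // Before the split: one shard owns the whole range X.
            ranges.put(Integer.MAX_VALUE, "shard1");

            // After splitting shard1 at X/2 (0 here): the lower half stays on the old
            // node, the upper half is served by the new one.
            ranges.clear();
            ranges.put(0, "shard1");
            ranges.put(Integer.MAX_VALUE, "shard1_new");

            System.out.println(shardFor(-42)); // -> shard1
            System.out.println(shardFor(42));  // -> shard1_new
        }
    }

Stored this way, adding a shard only touches the entries for the range being split; assignments for every other shard stay as they are.
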
>>>>>>>> Eventually you could imagine using both options - micro shards that could also be split as needed. Though I still wonder whether micro shards will be worth the extra complications myself...
>>>>>>>>
>>>>>>>> Right now though, the idea is that you should pick a good number of partitions to start with, given your expected data ;) Adding more replicas is trivial though.
>>>>>>>>
>>>>>>>> - Mark
>>>>>>>>
>>>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Another question: is there any support for repartitioning of the index if a new shard is added? What is the recommended approach for handling this? It seemed that the hashing algorithm (and probably any other) would require the index to be repartitioned should a new shard be added.
>>>>>>>>>
>>>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>> Thanks, I will try this first thing in the morning.
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am currently looking at the latest solrcloud branch and was wondering if there is any documentation on configuring the DistributedUpdateProcessor. What specifically in solrconfig.xml needs to be added/modified to make distributed indexing work?
>>>>>>>>>>>
>>>>>>>>>>> Hi Jamie - take a look at solrconfig-distrib-update.xml in solr/core/src/test-files.
>>>>>>>>>>>
>>>>>>>>>>> You need to enable the update log, add an empty replication handler def, and an update chain with solr.DistributedUpdateProcessFactory in it.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> - Mark
>>>>>>>>>>> http://www.lucidimagination.com
>
> - Mark Miller
> lucidimagination.com