Ah, okay - you are setting the shards in solr.xml. That's still an option to force a node to a particular shard, but if you take that out, shards will be auto-assigned.
By the way, because of the version code, distrib deletes don't work at the moment - I will get to that next week.

- Mark

On Fri, Dec 2, 2011 at 1:16 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> So I'm a fool. I did set numShards; the issue was so trivial it's embarrassing. I did indeed have it set up as a replica - the shard names in solr.xml were both shard1. This now works as I expected.
>
> On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >
> > They are unused params, so removing them wouldn't help anything.
> >
> > You might just want to wait till we are further along before playing with it.
> >
> > Or if you submit your full self-contained test, I can see what's going on (e.g. it's still unclear if you have started setting numShards?).
> >
> > I can do a similar set of actions in my tests and it works fine. The only reason I could see things working like this is if it thinks you have one shard - a leader and a replica.
> >
> > - Mark
> >
> > On Dec 2, 2011, at 12:41 PM, Jamie Johnson wrote:
> >
> >> Glad to hear I don't need to set shards/self, but removing them didn't seem to change what I'm seeing. Doing this still results in 2 documents: 1 on 8983 and 1 on 7574.
> >>
> >>     String key = "1";
> >>
> >>     SolrInputDocument solrDoc = new SolrInputDocument();
> >>     solrDoc.setField("key", key);
> >>     solrDoc.addField("content_mvtxt", "initial value");
> >>
> >>     SolrServer server = servers.get("http://localhost:8983/solr/collection1");
> >>
> >>     UpdateRequest ureq = new UpdateRequest();
> >>     ureq.setParam("update.chain", "distrib-update-chain");
> >>     ureq.add(solrDoc);
> >>     ureq.setAction(ACTION.COMMIT, true, true);
> >>     server.request(ureq);
> >>     server.commit();
> >>
> >>     solrDoc = new SolrInputDocument();
> >>     solrDoc.addField("key", key);
> >>     solrDoc.addField("content_mvtxt", "updated value");
> >>
> >>     server = servers.get("http://localhost:7574/solr/collection1");
> >>
> >>     ureq = new UpdateRequest();
> >>     ureq.setParam("update.chain", "distrib-update-chain");
> >>     ureq.add(solrDoc);
> >>     ureq.setAction(ACTION.COMMIT, true, true);
> >>     server.request(ureq);
> >>     server.commit();
> >>
> >>     server = servers.get("http://localhost:8983/solr/collection1");
> >>     server.commit();
> >>
> >>     System.out.println("done");
> >>
> >> On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >>> So I dunno. You are running a zk server and running in zk mode, right?
> >>>
> >>> You don't need to / shouldn't set a shards or self param. The shards are figured out from ZooKeeper.
> >>>
> >>> You always want to use the distrib-update-chain. Eventually it will probably be part of the default chain and turn on automatically in zk mode.
> >>>
> >>> If you are running in zk mode attached to a zk server, this should work no problem. You can add docs to any server and they will be forwarded to the correct shard leader and then versioned and forwarded to replicas.
> >>>
> >>> You can also use the CloudSolrServer solrj client - that way you don't even have to choose a server to send docs to - in which case, if it went down, you would have to choose another manually - CloudSolrServer automatically finds one that is up through ZooKeeper. Eventually it will also be smart and do the hashing itself so that it can send directly to the shard leader that the doc would be forwarded to anyway.
> >>>
> >>> - Mark
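For reference, a minimal sketch of the CloudSolrServer approach described above. The class is real SolrJ, but the exact constructor and method names, the ZooKeeper address, the collection name, and the need to select the update chain by hand are assumptions about the state of the branch at the time - treat this as illustrative, not as the branch's confirmed API:

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Connect through ZooKeeper instead of picking a node by hand;
            // CloudSolrServer discovers live nodes from the cluster state.
            CloudSolrServer server = new CloudSolrServer("localhost:9983"); // zkHost is an assumption
            server.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("key", "1");
            doc.addField("content_mvtxt", "initial value");

            // At this stage of the branch the distrib chain still has to be
            // selected explicitly per request.
            UpdateRequest ureq = new UpdateRequest();
            ureq.setParam("update.chain", "distrib-update-chain");
            ureq.add(doc);
            ureq.setAction(ACTION.COMMIT, true, true);
            server.request(ureq);
        }
    }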
> >>>
> >>> On Fri, Dec 2, 2011 at 12:09 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>
> >>>> Really just trying to do a simple add and update test; the missing chain is just proof of my not understanding exactly how this is supposed to work. I modified the code to this:
> >>>>
> >>>>     String key = "1";
> >>>>
> >>>>     SolrInputDocument solrDoc = new SolrInputDocument();
> >>>>     solrDoc.setField("key", key);
> >>>>     solrDoc.addField("content_mvtxt", "initial value");
> >>>>
> >>>>     SolrServer server = servers.get("http://localhost:8983/solr/collection1");
> >>>>
> >>>>     UpdateRequest ureq = new UpdateRequest();
> >>>>     ureq.setParam("update.chain", "distrib-update-chain");
> >>>>     ureq.add(solrDoc);
> >>>>     ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
> >>>>     ureq.setParam("self", "foo");
> >>>>     ureq.setAction(ACTION.COMMIT, true, true);
> >>>>     server.request(ureq);
> >>>>     server.commit();
> >>>>
> >>>>     solrDoc = new SolrInputDocument();
> >>>>     solrDoc.addField("key", key);
> >>>>     solrDoc.addField("content_mvtxt", "updated value");
> >>>>
> >>>>     server = servers.get("http://localhost:7574/solr/collection1");
> >>>>
> >>>>     ureq = new UpdateRequest();
> >>>>     ureq.setParam("update.chain", "distrib-update-chain");
> >>>>     // ureq.deleteById("8060a9eb-9546-43ee-95bb-d18ea26a6285");
> >>>>     ureq.add(solrDoc);
> >>>>     ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
> >>>>     ureq.setParam("self", "foo");
> >>>>     ureq.setAction(ACTION.COMMIT, true, true);
> >>>>     server.request(ureq);
> >>>>     // server.add(solrDoc);
> >>>>     server.commit();
> >>>>
> >>>>     server = servers.get("http://localhost:8983/solr/collection1");
> >>>>     server.commit();
> >>>>
> >>>>     System.out.println("done");
> >>>>
> >>>> but I'm still seeing the doc appear on both shards. After the first commit I see the doc on 8983 with "initial value". After the second commit I see the updated value on 7574 and the old value on 8983. After the final commit the doc on 8983 gets updated.
> >>>>
> >>>> Is there something wrong with my test?
> >>>>
> >>>> On Thu, Dec 1, 2011 at 11:17 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>> Getting late - I didn't really pay attention to your code, I guess - why are you adding the first doc without specifying the distrib update chain? This is not really supported. It's going to just go to the server you specified - even with everything set up right, the update might then go to that same server or the other one depending on how it hashes. You really want to just always use the distrib update chain. I guess I don't yet understand what you are trying to test.
> >>>>>
> >>>>> Sent from my iPad
> >>>>>
> >>>>> On Dec 1, 2011, at 10:57 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>
> >>>>>> Not sure offhand - but things will be funky if you don't specify the correct numShards.
> >>>>>>
> >>>>>> The instance to shard assignment should be using numShards to assign. But then the hash to shard mapping actually goes on the number of shards it finds registered in ZK (it doesn't have to, but really these should be equal).
> >>>>>>
> >>>>>> So basically you are saying "I want 3 partitions", but you are only starting up 2 nodes, and the code is just not happy about that, I'd guess. For the system to work properly, you have to fire up at least as many servers as numShards.
> >>>>>>
> >>>>>> What are you trying to do? 2 partitions with no replicas, or one partition with one replica?
> >>>>>>
> >>>>>> In either case, I think you will have better luck if you fire up at least as many servers as the numShards setting. Or lower the numShards setting.
> >>>>>>
> >>>>>> This is all a work in progress by the way - what you are trying to test should work if things are set up right though.
> >>>>>>
> >>>>>> - Mark
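To make the relationship between numShards and the hash-to-shard mapping concrete, here is an illustrative routing function. The hash function (String.hashCode as a stand-in) and the equal-width range slicing are assumptions for illustration only, not the branch's actual algorithm:

    public final class ShardRoutingSketch {
        /** Map a doc id into one of numShards contiguous hash ranges. */
        public static int shardForId(String docId, int numShards) {
            long hash = docId.hashCode() & 0xffffffffL; // treat the hash as an unsigned 32-bit value
            long rangeWidth = (1L << 32) / numShards;   // width of each shard's hash range
            int shard = (int) (hash / rangeWidth);
            return Math.min(shard, numShards - 1);      // guard the top edge when 2^32 % numShards != 0
        }

        public static void main(String[] args) {
            // With numShards=3 but only two nodes started, ids that land in the
            // third range have no live shard to go to - hence the advice above to
            // start at least numShards servers (or lower numShards).
            for (String id : new String[] {"1", "2", "3"}) {
                System.out.println(id + " -> shard" + (shardForId(id, 3) + 1));
            }
        }
    }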
> >>>>>>
> >>>>>> On Dec 1, 2011, at 10:40 PM, Jamie Johnson wrote:
> >>>>>>
> >>>>>>> Thanks for the quick response. With that change (I have not done numShards yet) shard1 got updated. But now when executing the following queries I get information back from both, which doesn't seem right:
> >>>>>>>
> >>>>>>> http://localhost:7574/solr/select/?q=*:*
> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
> >>>>>>>
> >>>>>>> http://localhost:8983/solr/select?q=*:*
> >>>>>>> <doc><str name="key">1</str><str name="content_mvtxt">updated value</str></doc>
> >>>>>>>
> >>>>>>> On Thu, Dec 1, 2011 at 10:21 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>> Hmm... sorry about that - so my first guess is that right now we are not distributing a commit (easy to add, just have not done it).
> >>>>>>>>
> >>>>>>>> Right now I explicitly commit on each server for tests.
> >>>>>>>>
> >>>>>>>> Can you try explicitly committing on server1 after updating the doc on server2?
> >>>>>>>>
> >>>>>>>> I can start distributing commits tomorrow - I've been meaning to do it for my own convenience anyhow.
> >>>>>>>>
> >>>>>>>> Also, you want to pass the sys property numShards=1 on startup. I think it defaults to 3. That will give you one leader and one replica.
> >>>>>>>>
> >>>>>>>> - Mark
> >>>>>>>>
> >>>>>>>> On Dec 1, 2011, at 9:56 PM, Jamie Johnson wrote:
> >>>>>>>>
> >>>>>>>>> So I couldn't resist - I attempted to do this tonight. I used the solrconfig you mentioned (as is, no modifications), set up a 2-shard cluster in collection1, sent 1 doc to 1 of the shards, then updated it and sent the update to the other. I don't see the modifications though; I only see the original document.
> >>>>>>>>> The following is the test:
> >>>>>>>>>
> >>>>>>>>>     public void update() throws Exception {
> >>>>>>>>>         String key = "1";
> >>>>>>>>>
> >>>>>>>>>         SolrInputDocument solrDoc = new SolrInputDocument();
> >>>>>>>>>         solrDoc.setField("key", key);
> >>>>>>>>>         solrDoc.addField("content", "initial value");
> >>>>>>>>>
> >>>>>>>>>         SolrServer server = servers.get("http://localhost:8983/solr/collection1");
> >>>>>>>>>         server.add(solrDoc);
> >>>>>>>>>         server.commit();
> >>>>>>>>>
> >>>>>>>>>         solrDoc = new SolrInputDocument();
> >>>>>>>>>         solrDoc.addField("key", key);
> >>>>>>>>>         solrDoc.addField("content", "updated value");
> >>>>>>>>>
> >>>>>>>>>         server = servers.get("http://localhost:7574/solr/collection1");
> >>>>>>>>>
> >>>>>>>>>         UpdateRequest ureq = new UpdateRequest();
> >>>>>>>>>         ureq.setParam("update.chain", "distrib-update-chain");
> >>>>>>>>>         ureq.add(solrDoc);
> >>>>>>>>>         ureq.setParam("shards", "localhost:8983/solr/collection1,localhost:7574/solr/collection1");
> >>>>>>>>>         ureq.setParam("self", "foo");
> >>>>>>>>>         ureq.setAction(ACTION.COMMIT, true, true);
> >>>>>>>>>         server.request(ureq);
> >>>>>>>>>
> >>>>>>>>>         System.out.println("done");
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>> key is my unique field in schema.xml.
> >>>>>>>>>
> >>>>>>>>> What am I doing wrong?
> >>>>>>>>>
> >>>>>>>>> On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>> Yes, the ZK method seems much more flexible. Adding a new shard would simply be updating the range assignments in ZK. Where is this currently on the list of things to accomplish? I don't have time to work on this now, but if you (or anyone) could provide direction I'd be willing to work on this when I have spare time. I guess a JIRA detailing where/how to do this would help. Not sure if the design has been thought out that far though.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Dec 1, 2011 at 8:15 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>> Right now let's say you have one shard - everything there hashes to range X.
> >>>>>>>>>>>
> >>>>>>>>>>> Now you want to split that shard with an Index Splitter.
> >>>>>>>>>>>
> >>>>>>>>>>> You divide range X in two - giving you two ranges - then you start splitting. This is where the current Splitter needs a little modification. You decide which doc should go into which new index by rehashing each doc id in the index you are splitting - if its hash is greater than X/2, it goes into index1 - if it's less, index2. I think there are a couple of current Splitter impls, but one of them does something like: give me an id - now if the ids in the index are above that id, go to index1; if below, index2. We need to instead do a quick hash rather than a simple id compare.
> >>>>>>>>>>>
> >>>>>>>>>>> Why do you need to do this on every shard?
> >>>>>>>>>>>
> >>>>>>>>>>> The other part we need that we don't have is to store hash range assignments in ZooKeeper - we don't do that yet because it's not needed yet. Instead we currently just calculate that on the fly (too often at the moment - on every request :) I intend to fix that of course).
> >>>>>>>>>>>
> >>>>>>>>>>> At the start, zk would say: for range X, go to this shard. After the split, it would say: for range less than X/2, go to the old node; for range greater than X/2, go to the new node.
> >>>>>>>>>>>
> >>>>>>>>>>> - Mark
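A small sketch of the range bookkeeping described above: each shard owns a contiguous hash range, a ZooKeeper-style assignment maps a range start to a shard, and splitting a shard at the midpoint of its range hands the upper half to a new node. The data structure and names here are illustrative assumptions, not the branch's code:

    import java.util.Map;
    import java.util.TreeMap;

    public final class HashRangeSketch {
        // range start (inclusive) -> shard name; a doc's hash finds its owner
        // via the greatest range start that is <= the hash.
        private final TreeMap<Long, String> rangeStarts = new TreeMap<Long, String>();

        public void assign(long rangeStart, String shardName) {
            rangeStarts.put(rangeStart, shardName);
        }

        public String shardForHash(long hash) {
            Map.Entry<Long, String> owner = rangeStarts.floorEntry(hash);
            return owner == null ? null : owner.getValue();
        }

        public static void main(String[] args) {
            HashRangeSketch ranges = new HashRangeSketch();

            // At the start: all of range X (here the whole 32-bit space) -> shard1.
            ranges.assign(0L, "shard1");
            System.out.println(ranges.shardForHash(3000000000L)); // shard1

            // After the split: hashes below X/2 stay on the old node,
            // hashes at or above X/2 go to the new node.
            ranges.assign(1L << 31, "shard1_split");
            System.out.println(ranges.shardForHash(3000000000L)); // shard1_split
            System.out.println(ranges.shardForHash(42L));         // shard1
        }
    }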
> >>>>>>>>>>>
> >>>>>>>>>>> On Dec 1, 2011, at 7:44 PM, Jamie Johnson wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hmmm... this doesn't sound like the hashing algorithm that's on the branch, right? The algorithm you're mentioning sounds like there is some logic which is able to tell that a particular range should be distributed between 2 shards instead of 1. So it seems like a trade-off between repartitioning the entire index (on every shard) and having a custom hashing algorithm which is able to handle the situation where 2 or more shards map to a particular range.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:34 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards' indexes based on the hash algorithm.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Not something I've thought deeply about myself yet, but I think the idea would be to split as many as you felt you needed to.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you wanted to keep the full balance always, this would mean splitting every shard at once, yes. But this depends on how many boxes (partitions) you are willing/able to add at a time.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You might just split one index to start - now its hash range would be handled by two shards instead of one (if you have 3 replicas per shard, this would mean adding 3 more boxes). When you needed to expand again, you would split another index that was still handling its full starting range. As you grow, once you split every original index, you'd start again, splitting one of the now half ranges.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is there also an index merger in contrib which could be used to merge indexes? I'm assuming this would be the process?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You can merge with IndexWriter.addIndexes (Solr also has an admin command that can do this). But I'm not sure where this fits in?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - Mark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 7:18 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>> Not yet - we don't plan on working on this until a lot of other stuff is working solid at this point. But someone else could jump in!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There are a couple of ways to go about it that I know of:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> A more long-term solution may be to start using micro shards - each index starts as multiple indexes. This makes it pretty fast to move micro shards around as you decide to change partitions. It's also less flexible, as you are limited by the number of micro shards you start with.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> A simpler and likely first step is to use an index splitter. We already have one in lucene contrib - we would just need to modify it so that it splits based on the hash of the document id. This is super flexible, but splitting will obviously take a little while on a huge index. The current index splitter is a multi-pass splitter - good enough to start with, but with most files under codec control these days, we may be able to make a single-pass splitter soon as well.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Eventually you could imagine using both options - micro shards that could also be split as needed. Though I still wonder if micro shards will be worth the extra complications myself...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Right now though, the idea is that you should pick a good number of partitions to start with, given your expected data ;) Adding more replicas is trivial though.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Mark
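And the per-document decision for the modified splitter described above - rehash each doc id and compare it against the midpoint of the shard's current range, instead of comparing raw ids. Only the decision logic is sketched here, with the same stand-in hash as before; the actual contrib splitter operates at the Lucene index level and is not shown:

    public final class SplitDecisionSketch {
        /** True if the doc should move to the new index covering the upper half of [rangeStart, rangeEnd). */
        public static boolean goesToUpperHalf(String docId, long rangeStart, long rangeEnd) {
            long hash = docId.hashCode() & 0xffffffffL;               // stand-in 32-bit hash
            long midpoint = rangeStart + (rangeEnd - rangeStart) / 2; // X/2 in the description above
            return hash >= midpoint;
        }

        public static void main(String[] args) {
            // Splitting the full 32-bit range at its midpoint: upper half -> index1
            // (the new shard), lower half -> index2 (stays with the old shard).
            long start = 0L, end = 1L << 32;
            for (String id : new String[] {"1", "2", "3", "4"}) {
                String target = goesToUpperHalf(id, start, end) ? "index1 (new shard)" : "index2 (old shard)";
                System.out.println(id + " -> " + target);
            }
        }
    }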
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:35 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Another question, is there any support for repartitioning of the index if a new shard is added? What is the recommended approach for handling this? It seemed that the hashing algorithm (and probably any) would require the index to be repartitioned should a new shard be added.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 6:32 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>> Thanks I will try this first thing in the morning.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the DistributedUpdateProcessor? What specifically in solrconfig.xml needs to be added/modified to make distributed indexing work?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi Jaime - take a look at solrconfig-distrib-update.xml in solr/core/src/test-files.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> You need to enable the update log, add an empty replication handler def, and an update chain with solr.DistributedUpdateProcessFactory in it.
--
- Mark

http://www.lucidimagination.com