Re: Configuring the Distributed

2011-12-05 Thread Jamie Johnson
What does the version field need to look like? Something like this? <field name="_version_" type="string" indexed="true" stored="true" required="true"/> On Sun, Dec 4, 2011 at 2:00 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller markrmil...@gmail.com wrote:

Re: Configuring the Distributed

2011-12-05 Thread Yonik Seeley
On Mon, Dec 5, 2011 at 9:21 AM, Jamie Johnson jej2...@gmail.com wrote: What does the version field need to look like? It's in the example schema: <field name="_version_" type="long" indexed="true" stored="true"/> -Yonik http://www.lucidimagination.com
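For context, the field Yonik points to sits alongside the ordinary fields in schema.xml; a sketch of the placement (the id field shown around it is illustrative, only the _version_ line comes from the thread):

```xml
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <!-- required by the distributed update code on the solrcloud branch -->
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
```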

Re: Configuring the Distributed

2011-12-05 Thread Jamie Johnson
Thanks Yonik, must have just missed it. A question about adding a new shard to the index. I am definitely not a hashing expert, but the goal is to have a uniform distribution of buckets based on what we're hashing. If that happens then our shards would reach capacity at approximately the same
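The uniform-distribution goal Jamie describes can be sketched as hash-mod-numShards assignment. This is illustrative only: the branch uses its own hash function, not String.hashCode, but the load-balancing property is the same.

```java
import java.util.HashMap;
import java.util.Map;

public class ShardAssignment {
    // Map a document key to one of numShards buckets. With a uniform
    // hash, each shard receives roughly equal load, so all shards
    // approach capacity at about the same time.
    static int shardFor(String key, int numShards) {
        // mask off the sign bit so the modulo result is non-negative
        return (key.hashCode() & 0x7fffffff) % numShards;
    }

    public static void main(String[] args) {
        int numShards = 4;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            counts.merge(shardFor("doc-" + i, numShards), 1, Integer::sum);
        }
        // with a reasonable hash, each of the 4 shards holds roughly 25,000 docs
        System.out.println(counts);
    }
}
```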

Re: Configuring the Distributed

2011-12-05 Thread Yonik Seeley
On Mon, Dec 5, 2011 at 1:29 PM, Jamie Johnson jej2...@gmail.com wrote: In this situation I don't think splitting one shard would help us we'd need to split every shard to reduce the load on the burdened systems right? Sure... but if you can split one, you can split them all :-) -Yonik

Re: Configuring the Distributed

2011-12-05 Thread Jamie Johnson
Yes completely agree, just wanted to make sure I wasn't missing the obvious :) On Mon, Dec 5, 2011 at 1:39 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Dec 5, 2011 at 1:29 PM, Jamie Johnson jej2...@gmail.com wrote: In this situation I don't think splitting one shard would help us

Re: Configuring the Distributed

2011-12-04 Thread Yonik Seeley
On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller markrmil...@gmail.com wrote: On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the

Re: Configuring the Distributed

2011-12-04 Thread Yonik Seeley
On Fri, Dec 2, 2011 at 10:48 AM, Mark Miller markrmil...@gmail.com wrote: You always want to use the distrib-update-chain. Eventually it will probably be part of the default chain and auto turn on in zk mode. I'm working on this now... -Yonik http://www.lucidimagination.com

Re: Configuring the Distributed

2011-12-03 Thread Mark Miller
bq. A few questions: if a master goes down, does a replica get promoted? Right - if the leader goes down there is a leader election and one of the replicas takes over. bq. If a new shard needs to be added, is it just a matter of starting a new Solr instance with a higher numShards? Eventually,

Re: Configuring the Distributed

2011-12-03 Thread Jamie Johnson
Again great stuff. Once distributed update/delete works (sounds like it's not far off) I'll have to reevaluate our current stack. You had mentioned storing the shard hash assignments in ZK; is there a JIRA around this? I'll keep my eyes on the JIRA tickets. Right now the distributed

Re: Configuring the Distributed

2011-12-03 Thread Mark Miller
On Sat, Dec 3, 2011 at 1:31 PM, Jamie Johnson jej2...@gmail.com wrote: Again great stuff. Once distributed update/delete works (sounds like it's not far off) Yeah, I only realized it was not working with the Version code on Friday as I started adding tests for it - the work to fix it is not

Re: Configuring the Distributed

2011-12-02 Thread Mark Miller
So I dunno. You are running a zk server and running in zk mode, right? You don't need to / shouldn't set a shards or self param. The shards are figured out from ZooKeeper. You always want to use the distrib-update-chain. Eventually it will probably be part of the default chain and auto turn on in zk
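The distrib-update-chain itself is declared in solrconfig.xml; a rough sketch of the wiring (the processor class names are assumptions based on the branch's example solrconfig and may not match exactly):

```xml
<!-- sketch only: class names may differ from the branch's example config -->
<updateRequestProcessorChain name="distrib-update-chain">
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

An update request then opts in by selecting the chain by name, e.g. with the update.chain request parameter.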

Re: Configuring the Distributed

2011-12-02 Thread Jamie Johnson
Glad to hear I don't need to set shards/self, but removing them didn't seem to change what I'm seeing. Doing this still results in 2 documents: 1 on 8983 and 1 on 7574. String key = "1"; SolrInputDocument solrDoc = new SolrInputDocument(); solrDoc.setField(key,

Re: Configuring the Distributed

2011-12-02 Thread Mark Miller
They are unused params, so removing them wouldn't help anything. You might just want to wait till we are further along before playing with it. Or if you submit your full self-contained test, I can see what's going on (e.g. it's still unclear if you have started setting numShards?). I can do a

Re: Configuring the Distributed

2011-12-02 Thread Jamie Johnson
So I'm a fool. I did set the numShards, the issue was so trivial it's embarrassing. I did indeed have it set up as a replica; the shard names in solr.xml were both shard1. This works as I expected now. On Fri, Dec 2, 2011 at 1:02 PM, Mark Miller markrmil...@gmail.com wrote: They are unused

Re: Configuring the Distributed

2011-12-02 Thread Mark Miller
Ah, okay - you are setting the shards in solr.xml - that's still an option to force a node to a particular shard - but if you take that out, shards will be auto-assigned. By the way, because of the version code, distrib deletes don't work at the moment - will get to that next week. - Mark On

Re: Configuring the Distributed

2011-12-02 Thread Jamie Johnson
How does it determine the number of shards to create? How many replicas to create? On Fri, Dec 2, 2011 at 4:30 PM, Mark Miller markrmil...@gmail.com wrote: Ah, okay - you are setting the shards in solr.xml - that's still an option to force a node to a particular shard - but if you take that

Re: Configuring the Distributed

2011-12-02 Thread Jamie Johnson
I think I see it. So if I understand this correctly, you specify numShards as a system property; as new nodes come up they check ZK to see if they should be a new shard or a replica, based on whether numShards is met. A few questions: if a master goes down, does a replica get promoted? If a new shard
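The shard-vs-replica decision Jamie describes can be sketched in pure logic. The real branch code (ZK registration, leader election) is far more involved; the shard-naming and tie-breaking here are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardOrReplica {
    // A starting node counts the distinct shards already registered in
    // ZK. If fewer than numShards exist, it becomes a new shard;
    // otherwise it joins an existing shard as a replica (here: the
    // shard with the fewest replicas, ties broken by encounter order).
    static String assign(List<String> registeredShards, int numShards) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : registeredShards) counts.merge(s, 1, Integer::sum);
        if (counts.size() < numShards) {
            return "shard" + (counts.size() + 1); // become a new shard
        }
        String best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() < counts.get(best)) best = e.getKey();
        }
        return best; // join the least-replicated shard as a replica
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("shard1"), 2));           // shard2: numShards not yet met
        System.out.println(assign(List.of("shard1", "shard2"), 2)); // shard1: joins as a replica
    }
}
```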

Re: Configuring the Distributed

2011-12-02 Thread Jamie Johnson
So I just tried this out, seems like it does the things I asked about. Really really cool stuff, it's progressed quite a bit in the time since I took a snapshot of the branch. Last question, how do you change numShards? Is there a command you can use to do this now? I understand there will be

Configuring the Distributed

2011-12-01 Thread Jamie Johnson
I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the DistributedUpdateProcessor? What specifically in solrconfig.xml needs to be added/modified to make distributed indexing work?

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at the latest solrcloud branch and was wondering if there was any documentation on configuring the DistributedUpdateProcessor? What specifically in solrconfig.xml needs to be added/modified to make

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Thanks I will try this first thing in the morning. On Thu, Dec 1, 2011 at 3:39 PM, Mark Miller markrmil...@gmail.com wrote: On Thu, Dec 1, 2011 at 10:08 AM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at the latest solrcloud branch and was wondering if there was any

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Another question, is there any support for repartitioning of the index if a new shard is added? What is the recommended approach for handling this? It seemed that the hashing algorithm (and probably any) would require the index to be repartitioned should a new shard be added. On Thu, Dec 1,

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Not yet - we don't plan on working on this until a lot of other stuff is working solidly at this point. But someone else could jump in! There are a couple of ways to go about it that I know of. A more long-term solution may be to start using micro-shards - each index starts as multiple indexes. This

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards' indexes based on the hash algorithm. Is there also an index merger in contrib which could be used to merge indexes? I'm

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
On Dec 1, 2011, at 7:20 PM, Jamie Johnson wrote: I am not familiar with the index splitter that is in contrib, but I'll take a look at it soon. So the process sounds like it would be to run this on all of the current shards indexes based on the hash algorithm. Not something I've thought

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Hmmm. This doesn't sound like the hashing algorithm that's on the branch, right? The algorithm you're mentioning sounds like there is some logic which is able to tell that a particular range should be distributed between 2 shards instead of 1. So it seems like a trade-off between repartitioning

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Right now let's say you have one shard - everything there hashes to range X. Now you want to split that shard with an Index Splitter. You divide range X in two - giving you two ranges - then you start splitting. This is where the current Splitter needs a little modification. You decide which
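Mark's range-splitting description reduces to a per-document decision: pick a midpoint of the shard's range and route each document's hash to the lower or upper half. A minimal sketch, simplifying ranges to non-negative ints (the branch's actual hash and range representation differ):

```java
public class RangeSplit {
    // A shard covers the hash range [min, max]. Splitting it in two
    // means choosing a midpoint and sending each document to the
    // lower (0) or upper (1) half -- the decision a modified
    // IndexSplitter would make per document.
    static int half(int hash, int min, int max) {
        int mid = min + (max - min) / 2;
        return hash <= mid ? 0 : 1;
    }

    public static void main(String[] args) {
        // treat the full non-negative int range as the shard's range X
        int min = 0, max = Integer.MAX_VALUE;
        System.out.println(half(100, min, max));               // 0 (lower half)
        System.out.println(half(Integer.MAX_VALUE, min, max)); // 1 (upper half)
    }
}
```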

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Yes, the ZK method seems much more flexible. Adding a new shard would be simply updating the range assignments in ZK. Where is this currently on the list of things to accomplish? I don't have time to work on this now, but if you (or anyone) could provide direction I'd be willing to work on this

Re: Configuring the Distributed

2011-12-01 Thread Ted Dunning
Of course, resharding is almost never necessary if you use micro-shards. Micro-shards are shards small enough that you can fit 20 or more on a node. If you have that many on each node, then adding a new node consists of moving some shards to the new machine rather than moving lots of little
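Ted's micro-shard arithmetic can be sketched directly; the shard and node counts below are illustrative, not from the thread.

```java
public class MicroShards {
    // With a fixed total of totalShards spread across nodes, adding a
    // node means moving whole shards until each node holds roughly
    // totalShards / nodes of them -- no document-level resharding.
    static int shardsToMove(int totalShards, int oldNodes) {
        int newNodes = oldNodes + 1;
        return totalShards / newNodes; // the new node's fair share
    }

    public static void main(String[] args) {
        // 20 micro-shards per node on 4 nodes = 80 shards total.
        // A 5th node takes over 80 / 5 = 16 shards, pulled evenly
        // from the existing nodes (4 from each).
        System.out.println(shardsToMove(80, 4)); // 16
    }
}
```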

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
In this case we are still talking about moving a whole index at a time rather than lots of little documents. You split the index into two, and then ship one of them off. The extra cost you can avoid with micro sharding will be the cost of splitting the index - which could be significant for a

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Sorry - missed something - you also have the added cost of shipping the new half index to all of the replicas of the original shard with the splitting method. Unless you somehow split on every replica at the same time - then of course you wouldn't be able to avoid the 'busy' replica, and it

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
So I couldn't resist - I attempted to do this tonight. I used the solrconfig you mentioned (as is, no modifications), I set up a 2-shard cluster in collection1, I sent 1 doc to 1 of the shards, updated it and sent the update to the other. I don't see the modifications, though; I only see the original

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
It's not full of details yet, but there is a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-2595 On Thu, Dec 1, 2011 at 8:51 PM, Jamie Johnson jej2...@gmail.com wrote: Yes, the ZK method seems much more flexible. Adding a new shard would be simply updating the range assignments

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Hmm...sorry bout that - so my first guess is that right now we are not distributing a commit (easy to add, just have not done it). Right now I explicitly commit on each server for tests. Can you try explicitly committing on server1 after updating the doc on server 2? I can start distributing

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Thanks for the quick response. With that change (have not done numShards yet) shard1 got updated. But now when executing the following queries I get information back from both, which doesn't seem right: http://localhost:7574/solr/select/?q=*:* returns <doc><str name="key">1</str><str name="content_mvtxt">updated

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Not sure offhand - but things will be funky if you don't specify the correct numShards. The instance to shard assignment should be using numShards to assign. But then the hash to shard mapping actually goes on the number of shards it finds registered in ZK (it doesn't have to, but really these

Re: Configuring the Distributed

2011-12-01 Thread Mark Miller
Getting late - didn't really pay attention to your code I guess - why are you adding the first doc without specifying the distrib update chain? This is not really supported. It's going to just go to the server you specified - even with everything set up right, the update might then go to that

Re: Configuring the Distributed

2011-12-01 Thread Jamie Johnson
Really just trying to do a simple add and update test; the chain missing is just proof of my not understanding exactly how this is supposed to work. I modified the code to this: String key = "1"; SolrInputDocument solrDoc = new SolrInputDocument();

Re: Configuring the Distributed

2011-12-01 Thread Ted Dunning
Well, this goes both ways. It is not that unusual to take a node down for maintenance of some kind or even to have a node failure. In that case, it is very nice to have the load from the lost node be spread fairly evenly across the remaining cluster. Regarding the cost of having several

Re: Configuring the Distributed

2011-12-01 Thread Ted Dunning
With micro-shards, you can use random numbers for all placements with minor constraints like avoiding replicas sitting in the same rack. Since the number of shards never changes, things stay very simple. On Thu, Dec 1, 2011 at 6:44 PM, Mark Miller markrmil...@gmail.com wrote: Sorry - missed
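Ted's random-placement-with-constraints idea can be sketched as follows; the rack:host naming and the data are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class ReplicaPlacement {
    // Random placement with one minor constraint: a shard's replicas
    // must not land on the same rack. Nodes are "rack:host" strings.
    static List<String> place(List<String> nodes, int replicas, Random rnd) {
        List<String> pool = new ArrayList<>(nodes);
        Collections.shuffle(pool, rnd); // random placement
        List<String> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        for (String node : pool) {
            String rack = node.split(":")[0];
            if (usedRacks.add(rack)) { // constraint: one replica per rack
                chosen.add(node);
                if (chosen.size() == replicas) break;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("r1:a", "r1:b", "r2:c", "r3:d");
        // two replicas, guaranteed to sit on distinct racks
        System.out.println(place(nodes, 2, new Random(42)));
    }
}
```

Because the total shard count is fixed, this placement step is all that ever changes when nodes come and go.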