Re: Distributed Indexing

2011-02-14 Thread Yonik Seeley

On Mon, Feb 14, 2011 at 10:04 AM, Alex Cowell wrote: > There seem to be some nuances which we have yet to encounter/discover like > the way you've implemented the processCommit() method to wait for all the > adds/deletes to complete before sending the commits. Are these things which > you were awa
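The ordering Yonik describes can be sketched in plain Java: every forwarded add/delete must finish before the commits go out. This is not the SOLR-2358 patch or Solr's API. PendingUpdates, submitUpdate and commit are hypothetical names, and each Runnable stands in for an HTTP request to a shard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: tracks forwarded adds/deletes so that a commit is
// only sent after all of them have completed.
class PendingUpdates {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final List<Future<?>> pending = new ArrayList<>();

    // Forward an add or delete asynchronously (the Runnable stands in for a call to a shard).
    synchronized void submitUpdate(Runnable forward) {
        pending.add(pool.submit(forward));
    }

    // Block until every outstanding update is done, then run the commit.
    void commit(Runnable sendCommit) throws InterruptedException, ExecutionException {
        List<Future<?>> toWait;
        synchronized (this) {
            toWait = new ArrayList<>(pending);
            pending.clear();
        }
        for (Future<?> f : toWait) {
            f.get(); // rethrows (wrapped in ExecutionException) if a forwarded update failed
        }
        sendCommit.run();
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Waiting inside commit() rather than after each document keeps the adds concurrent. Only the commit acts as a barrier.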

Re: Distributed Indexing

2011-02-14 Thread Alex Cowell

I've uploaded a patch of what we've done so far: https://issues.apache.org/jira/browse/SOLR-2358 It's still very much a work in progress and there are some obvious issues which are being resolved at the moment (such as the inefficient method of waiting for all the docs to be processed before distri

Re: Distributed Indexing

2011-02-09 Thread Yonik Seeley

I haven't had time to follow all of this discussion, but this issue might help: https://issues.apache.org/jira/browse/SOLR-2355 It's an implementation of the basic http://localhost:8983/solr/update/csv?shards=shard1,shard2... -Yonik http://lucidimagination.com On Mon, Feb 7, 2011 at 8:55 AM, Upa
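For reference, a shards parameter like the one in Yonik's example could be split into individual shard addresses as shown below. ShardsParam.parse is a hypothetical helper, not the code in SOLR-2355. It assumes a comma-separated list and treats a missing parameter as a non-distributed request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a "shards" request parameter such as
// "shard1,shard2" into a list of shard addresses.
class ShardsParam {
    static List<String> parse(String raw) {
        List<String> shards = new ArrayList<>();
        if (raw == null) {
            return shards; // no shards param: treat as a local, non-distributed update
        }
        for (String part : raw.split(",")) {
            String s = part.trim();
            if (!s.isEmpty()) { // skip empty entries from stray commas
                shards.add(s);
            }
        }
        return shards;
    }
}
```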

Re: Distributed Indexing

2011-02-07 Thread Upayavira

Surely you want to be implementing an UpdateRequestProcessor, rather than a RequestHandler. The ContentStreamHandlerBase, in the handleRequestBody method gets an UpdateRequestProcessor and uses it to process the request. What we need is that handleRequestBody method to, as you have suggested, chec
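The UpdateRequestProcessor suggestion can be made concrete with a simplified, self-contained model of a distributing step in an update chain. DocProcessor and DistributingProcessor are hypothetical stand-ins. Solr's real UpdateRequestProcessor works on command objects such as AddUpdateCommand. In this model, documents are plain maps, the shard-choosing policy is passed in as a function, and the sender callback replaces an HTTP request to each shard.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Simplified stand-in for one step of an update chain (not Solr's class).
interface DocProcessor {
    void processAdd(Map<String, Object> doc);

    void finish();
}

// Hypothetical distributing step: groups documents by target shard and
// forwards each group once the whole request has been read.
class DistributingProcessor implements DocProcessor {
    private final Function<Map<String, Object>, String> policy;         // picks a shard per document
    private final BiConsumer<String, List<Map<String, Object>>> sender; // forwards one shard's batch
    private final Map<String, List<Map<String, Object>>> buffers = new LinkedHashMap<>();

    DistributingProcessor(Function<Map<String, Object>, String> policy,
                          BiConsumer<String, List<Map<String, Object>>> sender) {
        this.policy = policy;
        this.sender = sender;
    }

    @Override
    public void processAdd(Map<String, Object> doc) {
        buffers.computeIfAbsent(policy.apply(doc), shard -> new ArrayList<>()).add(doc);
    }

    @Override
    public void finish() {
        buffers.forEach(sender);
        buffers.clear();
    }
}
```

Buffering until finish() is simple, but nothing is forwarded until the entire request has been read, and every document is held in memory until then.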

Re: Distributed Indexing

2011-02-07 Thread Upayavira

I'm saying that deterministic policies are a requirement that *some* people will want. Others might want a random spread. Thus, I'd have deterministic based on ID and random as the two initial implementations. Upayavira NB. In case folks haven't worked it out already, I have been tasked to mentor
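The two initial policies Upayavira proposes could look roughly like this. The interface name comes from SOLR-2341. The shardFor(String, int) signature and both class names are assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical shape of ShardDistributionPolicy; the signature is assumed.
interface ShardDistributionPolicy {
    // Returns the index (0..numShards-1) of the shard that should receive the document.
    int shardFor(String docId, int numShards);
}

// Deterministic: the same id always maps to the same shard, so later
// updates and deletes for that id reach the shard holding the original.
class HashShardPolicy implements ShardDistributionPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

// Random spread: balances load, but a later update for the same id may land
// on a different shard and leave a stale copy behind.
class RandomShardPolicy implements ShardDistributionPolicy {
    private final Random random;

    RandomShardPolicy(Random random) {
        this.random = random;
    }

    @Override
    public int shardFor(String docId, int numShards) {
        return random.nextInt(numShards);
    }
}
```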

Re: Distributed Indexing

2011-02-06 Thread Alex Cowell

Hey, We're making good progress, but our DistributedUpdateRequestHandler is having a bit of an identity crisis, so we thought we'd ask what other people's opinions are. The current situation is as follows: We've added a method to ContentStreamHandlerBase to check if an update request is distribut

Re: Distributed Indexing

2011-02-06 Thread William Mayor

Hi, Good call about the policies being deterministic; should've thought of that earlier. We've changed the patch to include this and I've removed the random assignment one (for obvious reasons). Take a look and let me know what needs doing. (https://issues.apache.org/jira/browse/SOLR-2341) Cheers

Re: Distributed Indexing

2011-02-03 Thread Upayavira

On Thu, 03 Feb 2011 15:12 +, "Alex Cowell" wrote: Hi all, Just a couple of questions that have arisen. 1. For handling non-distributed update requests (shards param is not present or is invalid), our code currently * assumes the user would like the data indexed, so gets the req

Re: Distributed Indexing

2011-02-03 Thread Alex Cowell

Hi all, Just a couple of questions that have arisen. 1. For handling non-distributed update requests (shards param is not present or is invalid), our code currently - assumes the user would like the data indexed, so gets the request handler assigned to "/update" - executes the request u
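The fallback Alex describes is that a missing or invalid shards param means the data is indexed locally via "/update". One way to express it is a small routing check. UpdateRouting is hypothetical. The validity rule used here, that each entry must parse as host:port/path, is an assumption and not the rule from the patch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

// Hypothetical routing decision: distribute only when every listed shard looks
// like host:port/path; otherwise fall back to the local "/update" handler.
class UpdateRouting {
    enum Route { LOCAL_UPDATE_HANDLER, DISTRIBUTED }

    static boolean isValidShard(String shard) {
        try {
            URI uri = new URI("http://" + shard);
            return uri.getHost() != null && uri.getPort() != -1;
        } catch (URISyntaxException ex) {
            return false; // e.g. spaces or other illegal characters
        }
    }

    static Route choose(List<String> shards) {
        if (shards == null || shards.isEmpty()) {
            return Route.LOCAL_UPDATE_HANDLER;
        }
        for (String s : shards) {
            if (!isValidShard(s)) {
                return Route.LOCAL_UPDATE_HANDLER;
            }
        }
        return Route.DISTRIBUTED;
    }
}
```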

Re: Distributed Indexing

2011-02-02 Thread Upayavira

On Tue, 01 Feb 2011 19:52 -0800, "Lance Norskog" wrote: > Another use case is that N indexers operate independently, all pulling > data from the same database. Each has a separate query to get the > documents in its policy. But surely in this case, you are externalising the policy, and Solr do

Re: Distributed Indexing

2011-02-01 Thread Lance Norskog

Another use case is that N indexers operate independently, all pulling data from the same database. Each has a separate query to get the documents in its policy. On Tue, Feb 1, 2011 at 12:38 PM, Upayavira wrote: > > On Tue, 01 Feb 2011 19:04 +, "Alex Cowell" wrote: > > I noticed there is a
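In Lance's setup of N independent indexers, the policy lives outside Solr, in each indexer's query. A hypothetical version of that split, assuming numeric primary keys, is shown below. The SQL in the comment is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partitioning for N independent indexers reading the same table:
// indexer k only takes rows whose numeric id satisfies id mod N == k
// (in SQL this might be a WHERE clause such as MOD(id, N) = k).
class IndexerPartition {
    static List<Long> idsForIndexer(List<Long> allIds, int indexer, int numIndexers) {
        List<Long> mine = new ArrayList<>();
        for (long id : allIds) {
            if (Math.floorMod(id, (long) numIndexers) == indexer) {
                mine.add(id);
            }
        }
        return mine;
    }
}
```

Because every id falls into exactly one residue class, the N indexers cover the table without overlap.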

Re: Distributed Indexing

2011-02-01 Thread Upayavira

On Tue, 01 Feb 2011 19:04 +, "Alex Cowell" wrote: I noticed there is a comment in the org.apache.solr.servlet.DirectSolrConnection class which reads, "//Find a way to turn List into File/SolrDocument". Did anyone find a way to do this? Turns out that comment was left over from som

Re: Distributed Indexing

2011-02-01 Thread Alex Cowell

> > I noticed there is a comment in the > org.apache.solr.servlet.DirectSolrConnection class which reads, "//Find a > way to turn List into File/SolrDocument". Did anyone find a > way to do this? > Turns out that comment was left over from some experimenting one of our team was doing. But I suppos

Re: Distributed Indexing

2011-02-01 Thread Alex Cowell

> > Your code looks fine to me, except it should take in a SolrDocument > object or list of, rather than strings. Then, for your Hash version, you > can take a hash of the "id" field. > As far as I can see I have access to a List that > represents all of the files being POSTed. Do I want to open t

Re: Distributed Indexing

2011-02-01 Thread William Mayor

Hello, Thanks for your prompt reply. With regard to using a SolrDocument instead of Strings (and I agree that List doesn't seem to be the best way to go), how do I get a reference to a SolrDoc? As far as I can see I have access to a List that represents all of the files being POSTed. Do I want to

Re: Distributed Indexing

2011-02-01 Thread Upayavira

On Tue, 01 Feb 2011 00:26 +, "William Mayor" wrote: > Hi Guys > > I've had a go at creating the ShardDistributionPolicy interface and a > few implementations. I've created a patch > (https://issues.apache.org/jira/browse/SOLR-2341) let me know what > needs doing. > Currently I assume that t

Re: Distributed Indexing

2011-01-31 Thread Soheb Mahmood

(I'm sending this on behalf of William, a guy on our team working on ShardDistributionPolicy): Hi Guys, I've had a go at creating the ShardDistributionPolicy interface and a few implementations. I've created a patch (https://issues.apache.org/jira/browse/SOLR-2341); let me know what needs doing. Cu

Re: Distributed Indexing

2011-01-31 Thread William Mayor

Hi Guys, I've had a go at creating the ShardDistributionPolicy interface and a few implementations. I've created a patch (https://issues.apache.org/jira/browse/SOLR-2341); let me know what needs doing. Currently I assume that the documents passed to the policy will be represented by some kind of id

Re: Distributed Indexing

2011-01-29 Thread Upayavira

Lance, Firstly, we're proposing a ShardDistributionPolicy interface for which there is a default (mod of the doc ID) but other implementations are possible. Another easy implementation would be a randomised or round robin one. As to threading, the first task would be to put all of the source docu
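Of the alternatives Upayavira lists, round robin is the simplest to sketch. RoundRobinShardPicker is a hypothetical name. Unlike the default mod-of-ID policy, it never looks at the document. That spreads load evenly, but it cannot route a later update for the same ID back to the same shard.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin assignment: ignores the document entirely and
// cycles through shards 0, 1, ..., numShards-1, 0, 1, ...
class RoundRobinShardPicker {
    private final AtomicInteger counter = new AtomicInteger();

    int nextShard(int numShards) {
        // getAndIncrement is thread-safe; floorMod keeps the index non-negative
        // even after the counter overflows past Integer.MAX_VALUE.
        return Math.floorMod(counter.getAndIncrement(), numShards);
    }
}
```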

Re: Distributed Indexing

2011-01-29 Thread Lance Norskog

I would suggest that a DistributedRequestUpdateHandler run single-threaded, doing only one document at a time. If I want more than one, I run it twice or N times with my own program. Also, this should have a policy object which decides exactly how documents are distributed. There are different tec

Re: Distributed Indexing

2011-01-29 Thread Soheb Mahmood

Hello Yonik, On Thu, 2011-01-27 at 08:01 -0500, Yonik Seeley wrote: > Making it easy for clients I think is key... one should be able to > update any node in the solr cluster and have solr take care of the > hard part about updating all relevant shards. This will most likely > involve an update p

Re: Distributed Indexing

2011-01-28 Thread Alex Cowell

Hi Yonik and Upayavira, Thank you both for your insightful responses. We now have a much better understanding of how to implement distributed indexing, although no doubt more issues will emerge along the way. Just to clarify (and for critique), our approach goes something like this: We will use a

Re: Distributed Indexing

2011-01-28 Thread Yonik Seeley

On Fri, Jan 28, 2011 at 7:55 AM, Upayavira wrote: > > On Thu, 27 Jan 2011 16:01 +, "Alex Cowell" wrote: > > Making it easy for clients I think is key... one should be able to > update any node in the solr cluster and have solr take care of the > hard part about updating all relevant shards.

Re: Distributed Indexing

2011-01-28 Thread Upayavira

Another point that will need some thought, as I have heard alluded to, is error handling. Currently, as I understand it, if you post 500 documents to Solr, and one has an error, the whole batch will fail. Leaving aside whether that is the best behaviour, it is a behaviour that will be impossible
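Once a batch is split across shards, part of it can succeed while another part fails, so a single all-or-nothing result no longer describes what happened. One hypothetical way to surface this is to record an outcome per shard. BatchOutcome and the Predicate-based sender are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical per-shard outcome tracking: each shard's sub-batch is sent
// independently and its result recorded, instead of one pass/fail for the batch.
class BatchOutcome {
    static Map<String, String> forward(List<String> shards, Predicate<String> sendToShard) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String shard : shards) {
            try {
                results.put(shard, sendToShard.test(shard) ? "ok" : "failed");
            } catch (RuntimeException ex) {
                results.put(shard, "error: " + ex.getMessage());
            }
        }
        return results;
    }

    static boolean allSucceeded(Map<String, String> results) {
        return results.values().stream().allMatch("ok"::equals);
    }
}
```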

Re: Distributed Indexing

2011-01-28 Thread Upayavira

On Thu, 27 Jan 2011 16:01 +, "Alex Cowell" wrote: Making it easy for clients I think is key... one should be able to update any node in the solr cluster and have solr take care of the hard part about updating all relevant shards. This will most likely involve an update process

Re: Distributed Indexing

2011-01-28 Thread Upayavira

Hi Soheb, On Wed, 26 Jan 2011 16:29 +, "Soheb Mahmood" wrote: > We are going to implement distributed indexing for Solr - without the > use of SolrCloud (so it can be easily up-scaled). We have a deadline by > February to get this done, so we need to get cracking ;) :-) > So far, we've h

Re: Distributed Indexing

2011-01-27 Thread Alex Cowell

> > Making it easy for clients I think is key... one should be able to > update any node in the solr cluster and have solr take care of the > hard part about updating all relevant shards. This will most likely > involve an update processor. This approach allows all existing update > methods (incl

Re: Distributed Indexing

2011-01-27 Thread Yonik Seeley

On Wed, Jan 26, 2011 at 11:29 AM, Soheb Mahmood wrote: > We were wondering if there was a simple way of > applying these changes we wrote in Java across all the other languages. Making it easy for clients I think is key... one should be able to update any node in the solr cluster and have solr ta

Re: Distributed Indexing

2011-01-26 Thread Soheb Mahmood

Hey guys! On Thu, 2011-01-27 at 10:04 +1300, Todd Nine wrote: > Just throwing in my 2 cents. If you're on a tight deadline have you > had a look at Solandra? We were already using Cassandra, so it was > incredibly easy to get a scalable Solr installation up and running. > In short: We are doin

Re: Distributed Indexing

2011-01-26 Thread Todd Nine

Just throwing in my 2 cents. If you're on a tight deadline have you had a look at Solandra? We were already using Cassandra, so it was incredibly easy to get a scalable Solr installation up and running. On 27 January 2011 08:17, Alex Cowell wrote: > Hi Soheb, > > Sounds good! A few things I th

Re: Distributed Indexing

2011-01-26 Thread Alex Cowell

Hi Soheb, Sounds good! A few things I thought of: With regard to #1, would the list of shards to index to (if present) be exclusive or would we assume that the shard the update request was sent to should also be included? For example, say, using the example you gave, an update request was sent li
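Alex's question is whether the shards list is exclusive or also includes the shard that received the request. The two readings can be captured as a single flag. TargetShards.resolve and includeSelf are hypothetical names. The sketch only shows how the two readings differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolution of the target list: with includeSelf=false the shards
// param is exclusive; with includeSelf=true the shard that received the request
// also gets a share of the documents (added only if it is not already listed).
class TargetShards {
    static List<String> resolve(List<String> requested, String self, boolean includeSelf) {
        List<String> targets = new ArrayList<>(requested);
        if (includeSelf && !targets.contains(self)) {
            targets.add(self);
        }
        return targets;
    }
}
```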
