I've uploaded a patch of what we've done so far:
https://issues.apache.org/jira/browse/SOLR-2358
It's still very much work in progress and there are some obvious issues
which are being resolved at the moment (such as the inefficient method of
waiting for all the docs to be processed before
On Mon, Feb 14, 2011 at 10:04 AM, Alex Cowell alxc...@gmail.com wrote:
There seem to be some nuances which we have yet to encounter/discover like
the way you've implemented the processCommit() method to wait for all the
adds/deletes to complete before sending the commits. Are these things which
I haven't had time to follow all of this discussion, but this issue might help:
https://issues.apache.org/jira/browse/SOLR-2355
It's an implementation of the basic
http://localhost:8983/solr/update/csv?shards=shard1,shard2...
-Yonik
http://lucidimagination.com
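The shards parameter in the URL above is a comma-separated list. A minimal sketch of splitting such a value might look like the following; the class and method names are hypothetical and not taken from either patch, and treating a missing or empty value as "no shards" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SOLR-2355 or SOLR-2358.
class ShardsParam {
    // Splits a comma-separated shards value into a list of shard names.
    // Assumption: a null or blank value yields an empty list, which a caller
    // could treat as a non-distributed update.
    static List<String> parse(String value) {
        List<String> shards = new ArrayList<>();
        if (value == null) {
            return shards;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                shards.add(trimmed);
            }
        }
        return shards;
    }
}
```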
On Mon, Feb 7, 2011 at 8:55 AM,
I'm saying that deterministic policies are a requirement that
*some* people will want. Others might want a random spread. Thus,
I'd have deterministic based on ID and random as the two initial
implementations.
Upayavira
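As a rough sketch of the two initial policies described above (one deterministic on ID, one random), something like the following could work. The interface name echoes the ShardDistributionPolicy mentioned in this thread, but the method signature and both implementation classes are assumptions, not the code in the SOLR-2341 patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical signature: the real interface in the patch may differ.
interface ShardDistributionPolicy {
    String shardFor(String docId, List<String> shards);
}

// Deterministic: the same id maps to the same shard as long as the
// shard list (and its order) does not change.
class HashShardPolicy implements ShardDistributionPolicy {
    @Override
    public String shardFor(String docId, List<String> shards) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(docId.hashCode(), shards.size());
        return shards.get(index);
    }
}

// Random spread: ignores the id and picks any shard.
class RandomShardPolicy implements ShardDistributionPolicy {
    private final Random random;

    RandomShardPolicy(Random random) {
        this.random = random;
    }

    @Override
    public String shardFor(String docId, List<String> shards) {
        return shards.get(random.nextInt(shards.size()));
    }
}

class ShardPolicyDemo {
    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2", "shard3");
        ShardDistributionPolicy byId = new HashShardPolicy();
        ShardDistributionPolicy spread = new RandomShardPolicy(new Random());
        System.out.println("hash:   " + byId.shardFor("doc-42", shards));
        System.out.println("random: " + spread.shardFor("doc-42", shards));
    }
}
```

With the hash policy, a later update or delete for the same id goes to the shard that holds the original document; the random policy gives no such guarantee.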
NB. In case folks haven't worked it out already, I have been
tasked to mentor
Surely you want to be implementing an UpdateRequestProcessor,
rather than a RequestHandler.
The ContentStreamHandlerBase, in the handleRequestBody method
gets an UpdateRequestProcessor and uses it to process the
request. What we need is that handleRequestBody method to, as you
have suggested,
Hi
Good call about the policies being deterministic, should've thought of that
earlier.
We've changed the patch to include this and I've removed the random
assignment one (for obvious reasons).
Take a look and let me know what's to do. (
https://issues.apache.org/jira/browse/SOLR-2341)
Cheers
Hey,
We're making good progress, but our DistributedUpdateRequestHandler is
having a bit of an identity crisis, so we thought we'd ask what other
people's opinions are. The current situation is as follows:
We've added a method to ContentStreamHandlerBase to check if an update
request is
Hi all,
Just a couple of questions that have arisen.
1. For handling non-distributed update requests (shards param is not present
or is invalid), our code currently
- assumes the user would like the data indexed, so gets the request
handler assigned to /update
- executes the request
On Thu, 03 Feb 2011 15:12 +, Alex Cowell
alxc...@gmail.com wrote:
Hi all,
Just a couple of questions that have arisen.
1. For handling non-distributed update requests (shards param
is not present or is invalid), our code currently
* assumes the user would like the data indexed, so
On Tue, 01 Feb 2011 19:52 -0800, Lance Norskog goks...@gmail.com
wrote:
Another use case is that N indexers operate independently, all pulling
data from the same database. Each has a separate query to get the
documents in its policy.
But surely in this case, you are externalising the
On Tue, 01 Feb 2011 00:26 +, William Mayor
m...@williammayor.co.uk wrote:
Hi Guys
I've had a go at creating the ShardDistributionPolicy interface and a
few implementations. I've created a patch
(https://issues.apache.org/jira/browse/SOLR-2341) let me know what
needs doing.
Currently
Hello
Thanks for your prompt reply.
With regard to using a SolrDocument instead of Strings (and I agree
that List<String> doesn't seem to be the best way to go), how do I
get a reference to a SolrDoc?
As far as I can see I have access to a List<ContentStream> that
represents all of the files being
Your code looks fine to me, except it should take in a SolrDocument
object, or a list of them, rather than strings. Then, for your Hash
version, you can take a hash of the id field.
As far as I can see I have access to a List<ContentStream> that
represents all of the files being POSTed. Do I want to
I noticed there is a comment in the
org.apache.solr.servlet.DirectSolrConnection class which reads, //Find a
way to turn List<ContentStream> into File/SolrDocument. Did anyone find a
way to do this?
Turns out that comment was left over from some experimenting one of our team
was doing. But I
Another use case is that N indexers operate independently, all pulling
data from the same database. Each has a separate query to get the
documents in its policy.
On Tue, Feb 1, 2011 at 12:38 PM, Upayavira u...@odoko.co.uk wrote:
On Tue, 01 Feb 2011 19:04 +, Alex Cowell alxc...@gmail.com
Hi Guys
I've had a go at creating the ShardDistributionPolicy interface and a
few implementations. I've created a patch
(https://issues.apache.org/jira/browse/SOLR-2341) let me know what
needs doing.
Currently I assume that the documents passed to the policy will be
represented by some kind of
(I'm sending this on behalf of William, a guy on our team working on
ShardDistributionPolicy):
Hi Guys
I've had a go at creating the ShardDistributionPolicy interface and a
few implementations. I've created a patch
(https://issues.apache.org/jira/browse/SOLR-2341) let me know what
needs doing.
Hello Yonik,
On Thu, 2011-01-27 at 08:01 -0500, Yonik Seeley wrote:
Making it easy for clients I think is key... one should be able to
update any node in the solr cluster and have solr take care of the
hard part about updating all relevant shards. This will most likely
involve an update
I would suggest that a DistributedRequestUpdateHandler run
single-threaded, doing only one document at a time. If I want more
than one, I run it twice or N times with my own program.
Also, this should have a policy object which decides exactly how
documents are distributed. There are different
Hi Soheb,
On Wed, 26 Jan 2011 16:29 +, Soheb Mahmood
soheb.luc...@gmail.com wrote:
We are going to implement distributed indexing for Solr - without the
use of SolrCloud (so it can be easily up-scaled). We have a deadline by
February to get this done, so we need to get cracking ;)
:-)
On Thu, 27 Jan 2011 16:01 +, Alex Cowell
alxc...@gmail.com wrote:
Making it easy for clients I think is key... one should be able to
update any node in the solr cluster and have solr take care of the
hard part about updating all relevant shards. This will most likely
involve
Another point that will need some thought, as I have heard alluded to,
is error handling.
Currently, as I understand it, if you post 500 documents to Solr, and
one has an error, the whole batch will fail.
Leaving aside whether that is the best behaviour, it is a behaviour that
will be impossible
On Fri, Jan 28, 2011 at 7:55 AM, Upayavira u...@odoko.co.uk wrote:
On Thu, 27 Jan 2011 16:01 +, Alex Cowell alxc...@gmail.com wrote:
Making it easy for clients I think is key... one should be able to
update any node in the solr cluster and have solr take care of the
hard part about
Hi Yonik and Upayavira,
Thank you both for your insightful responses. We now have a much better
understanding of how to implement distributed indexing, although no doubt
more issues will emerge along the way.
Just to clarify (and for critique), our approach goes something like this:
We will use
On Wed, Jan 26, 2011 at 11:29 AM, Soheb Mahmood soheb.luc...@gmail.com wrote:
We were wondering if there was a simple way of
applying these changes we wrote in Java across all the other languages.
Making it easy for clients I think is key... one should be able to
update any node in the solr
Making it easy for clients I think is key... one should be able to
update any node in the solr cluster and have solr take care of the
hard part about updating all relevant shards. This will most likely
involve an update processor. This approach allows all existing update
methods (including
Hi Soheb,
Sounds good! A few things I thought of:
With regard to #1, would the list of shards to index to (if present) be
exclusive or would we assume that the shard the update request was sent to
should also be included? For example, say, using the example you gave, an
update request was sent
Just throwing in my 2 cents. If you're on a tight deadline have you had a
look at Solandra? We were already using Cassandra, so it was incredibly
easy to get a scalable Solr installation up and running.
On 27 January 2011 08:17, Alex Cowell alxc...@gmail.com wrote:
Hi Soheb,
Sounds good! A
Hey guys!
On Thu, 2011-01-27 at 10:04 +1300, Todd Nine wrote:
Just throwing in my 2 cents. If you're on a tight deadline have you
had a look at Solandra? We were already using Cassandra, so it was
incredibly easy to get a scalable Solr installation up and running.
In short: We are doing