Dennis,
You should consider splitting based on a function that will give you a more
uniform distribution (e.g., MD5(url)). That way, you should see much less
variation in the number of documents per partition.
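Concretely, the partition function can be as small as this (plain Java;
the class and method names are invented for the sketch):

import java.security.MessageDigest;

// Sketch: map a url to one of numPartitions pieces via MD5 so that
// documents spread nearly uniformly, independent of host or crawl order.
public class UrlPartitioner {
  public static int partition(String url, int numPartitions) {
    try {
      byte[] d = MessageDigest.getInstance("MD5").digest(url.getBytes("UTF-8"));
      // Fold the first four digest bytes into a non-negative int.
      int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
            | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
      return (h & Integer.MAX_VALUE) % numPartitions;
    } catch (Exception e) {  // MD5 and UTF-8 are guaranteed by the JDK
      throw new RuntimeException(e);
    }
  }
}

Wired in as the partitioner of the indexing job, this decides which piece
a document lands in before the index is ever built, instead of carving up
a finished index afterwards.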
Chad
On 12/4/06 3:29 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote:
> All,
>
> We have two versions of a type of index splitter. The first version
> would run an indexing job and then, using the completed index as input,
> would read the number of documents in the index and take a requested
> split size. From this it used a custom index input format to create
> splits accordingly. [...]
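To make that first version concrete, here is a sketch under Lucene
2.x-era APIs (FilterIndexReader, IndexWriter.addIndexes(IndexReader[])):
wrap the source reader so that everything outside a doc-id range looks
deleted, and let the merge copy only that range. This is an illustration,
not the actual Nutch code; it is the same masking trick that later showed
up in Lucene's MultiPassIndexSplitter contrib tool.

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermPositions;

// Sketch: split a finished index into fixed-size pieces by doc-id range.
public class IndexRangeSplitter {

  // Presents the wrapped reader as if every doc outside [start, end)
  // were deleted, so addIndexes() copies only that range.
  static class RangeReader extends FilterIndexReader {
    final int start, end;

    RangeReader(IndexReader in, int start, int end) {
      super(in);
      this.start = start;
      this.end = end;
    }

    // Assumes the source index has no deletions of its own.
    public int numDocs() { return end - start; }

    public boolean hasDeletions() { return true; }

    public boolean isDeleted(int n) {
      return n < start || n >= end || in.isDeleted(n);
    }

    // Merging reads postings through termPositions() in this era, so
    // postings must also skip the masked-out documents.
    public TermPositions termPositions() throws IOException {
      return new FilterTermPositions(in.termPositions()) {
        public boolean next() throws IOException {
          while (this.in.next()) {
            int d = this.in.doc();
            if (d >= start && d < end) return true;
          }
          return false;
        }
      };
    }
  }

  public static void split(String src, String destPrefix, int splitSize)
      throws IOException {
    IndexReader reader = IndexReader.open(src);
    int maxDoc = reader.maxDoc();
    int numSplits = (maxDoc + splitSize - 1) / splitSize;  // ceiling
    for (int i = 0; i < numSplits; i++) {
      int start = i * splitSize;
      int end = Math.min(start + splitSize, maxDoc);
      IndexWriter writer =
          new IndexWriter(destPrefix + "-" + i, new StandardAnalyzer(), true);
      writer.addIndexes(new IndexReader[] { new RangeReader(reader, start, end) });
      writer.close();
    }
    reader.close();
  }
}

Note that this approach makes a full extra pass over a finished index,
which is why partitioning at indexing time (as suggested above) is
attractive.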
Dennis Kubes wrote:
[...]
Having a new index on each machine and having to create separate
indexes is not the most elegant way to accomplish this architecture.
The best way that we have found is to have a splitter job that
indexes and splits the index and [...]

Have you implemented a Lucene index splitter?
The distributed searching section assumes that you have split the index
into multiple pieces and there is a piece on each machine. The tutorial
doesn't tell you how to split the indexes because there is no tool to
do that yet. I was trying to lay out a general architecture for how to
do distributed searching.
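As a rough illustration of that architecture with plain Lucene 2.x APIs
(RemoteSearchable over RMI plus MultiSearcher): each machine exports its
piece, and the front end searches them as one logical index. Nutch's own
distributed search code works differently; the class names, binding name,
and field name below are invented for the sketch.

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RemoteSearchable;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

// Run on each search machine: export the local index piece over RMI.
public class SearchNode {
  public static void main(String[] args) throws Exception {
    IndexSearcher piece = new IndexSearcher(args[0]);  // path to local piece
    LocateRegistry.createRegistry(1099);
    Naming.rebind("//localhost/piece", new RemoteSearchable(piece));
  }
}

// Run on the front end: search every piece as one logical index.
// Usage: SearchClient <query> <host1> <host2> ...
class SearchClient {
  public static void main(String[] args) throws Exception {
    Searchable[] pieces = new Searchable[args.length - 1];
    for (int i = 1; i < args.length; i++) {
      pieces[i - 1] = (Searchable) Naming.lookup("//" + args[i] + "/piece");
    }
    Searcher searcher = new MultiSearcher(pieces);
    Query q = new QueryParser("content", new StandardAnalyzer()).parse(args[0]);
    Hits hits = searcher.search(q);
    System.out.println(hits.length() + " hits across " + pieces.length + " pieces");
  }
}

MultiSearcher merges and re-sorts the per-piece hits by score, which is
what makes splitting the index by document (rather than by term)
workable for this kind of architecture.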