Re: distributed search

2006-12-05 Thread Dennis Kubes
Andrzej Bialecki wrote: Dennis Kubes wrote: All, We have two versions of a type of index splitter. The first version would run an indexing job and then, using the completed index as input, would read the number of documents in the index and take a requested split size. From this it used a c

Re: distributed search

2006-12-05 Thread Andrzej Bialecki
Dennis Kubes wrote: All, We have two versions of a type of index splitter. The first version would run an indexing job and then, using the completed index as input, would read the number of documents in the index and take a requested split size. From this it used a custom index input format to

Re: distributed search

2006-12-04 Thread Chad Walters
Dennis, You should consider splitting based on a function that will give you a more uniform distribution (e.g., MD5(url)). That way, you should see much less variation in the number of documents per partition. Chad On 12/4/06 3:29 PM, "Dennis Kubes" <[EMAIL PROTECTED]> wrote: > All, > > W
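
A minimal sketch of the hash-based partitioning Chad suggests, in Python for brevity. It is not code from Nutch or from this thread: the function name `partition_for_url`, the partition count of 4, and the example.com URLs are all assumptions made for illustration. The idea is only that an MD5 digest of the URL, reduced modulo the number of partitions, spreads documents far more evenly than a split that depends on crawl or index order.

```python
import hashlib
from collections import Counter


def partition_for_url(url: str, num_partitions: int) -> int:
    """Map a URL to a partition number in [0, num_partitions).

    Hypothetical helper: hashes the URL with MD5 and reduces the
    first 8 bytes of the digest modulo the partition count.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Made-up URLs, used only to look at how evenly the partitions fill.
urls = [f"http://example.com/page/{i}" for i in range(10000)]
counts = Counter(partition_for_url(u, 4) for u in urls)

for partition in sorted(counts):
    print(partition, counts[partition])
```

Because the partition depends only on the URL, the same document always lands in the same partition, which also makes it possible to tell which index piece should hold a given page.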

Re: distributed search

2006-12-04 Thread Dennis Kubes
All, We have two versions of a type of index splitter. The first version would run an indexing job and then, using the completed index as input, would read the number of documents in the index and take a requested split size. From this it used a custom index input format to create splits accor
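
The snippet above is cut off before it says how the splits were formed. One plausible reading, and it is only an assumption here, is that the document count and the requested split size are turned into contiguous ranges of document numbers. A rough Python sketch of that arithmetic, with the hypothetical helper `split_ranges` (not the custom input format from the message):

```python
def split_ranges(num_docs: int, split_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) document-number ranges.

    Hypothetical helper: every range holds split_size documents
    except possibly the last, which holds the remainder.
    """
    if num_docs < 0:
        raise ValueError("num_docs must not be negative")
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        (start, min(start + split_size, num_docs))
        for start in range(0, num_docs, split_size)
    ]


# 10 documents with a requested split size of 4 give three ranges.
print(split_ranges(10, 4))
```

Each range could then be handed to one task that copies the corresponding documents into its own index piece.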

Re: distributed search

2006-12-04 Thread Andrzej Bialecki
Dennis Kubes wrote: [...] Having a new index on each machine and having to create separate indexes is not the most elegant way to accomplish this architecture. The best way that we have found is to have a splitter job that indexes and splits the index and Have you implemented a Lucene ind

Re: distributed search

2006-12-04 Thread Dennis Kubes
The distributed searching section assumes that you have split the index into multiple pieces and there is a piece on each machine. The tutorial doesn't tell you how to split the indexes because there is no tool to do that yet. I was trying to lay out a general architecture for how to do distr