Re: [pylucene-dev] pylucene and recommendations for RAM

David Pratt Thu, 05 Apr 2007 11:35:58 -0700

I was reading about scaling in Lucene with the Remote Parallel Multisearcher. I have not tried this beast yet and would be interested in hearing from anyone who has attempted to use it. I see that there were some previous posts about it a couple of years back. I think if something like this could work, a cluster arrangement with PyLucene may be possible.
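For readers unfamiliar with the idea: a parallel multisearcher sends the same query to several sub-indexes concurrently and merges their hits into one ranked list. The toy sketch below shows only that scatter-gather pattern in plain Python. The in-memory `Shard` class, its term-count scoring, and `parallel_search` are hypothetical stand-ins, not Lucene's or PyLucene's API, and a real engine must also reconcile per-index scoring statistics, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


class Shard:
    """Hypothetical in-memory sub-index: maps doc id -> document text."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def search(self, query, k):
        # Toy score: how many distinct query terms appear in the document.
        terms = set(query.lower().split())
        hits = []
        for doc_id, text in self.docs.items():
            words = set(text.lower().split())
            score = len(terms & words)
            if score > 0:
                hits.append((score, self.name, doc_id))
        # Best k hits from this shard only.
        return heapq.nlargest(k, hits)


def parallel_search(shards, query, k=10):
    # Scatter: run the same query against every shard concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda s: s.search(query, k), shards))
    # Gather: merge the per-shard lists into a single global top-k.
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


shards = [
    Shard("a", {"1": "lucene index merge", "2": "python bindings"}),
    Shard("b", {"3": "distributed lucene search", "4": "ram servers"}),
]
print(parallel_search(shards, "lucene search", k=2))
# [(2, 'b', '3'), (1, 'a', '1')]
```

Each shard only has to return its own top k, because no document outside a shard's local top k can be in the global top k.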


Regards,
David



Pete wrote:

On Thursday April 5 2007 10:43 am, David Pratt wrote:
Hi Pete. Many thanks for this advice. It would seem that perhaps a
cluster spread over a number of lower-end servers would best solve
this. From what I read on large-scale indexing, this seems to be the
approach (but with as much RAM as possible per server). I am looking at
costs, so the lower-end 2GB RAM servers are attractive; I would just use
more of them.

I have only used pylucene for tests on smaller indexes. Is a cluster
arrangement possible using pylucene? I am not a Java programmer, so I would
like to stay with what I know. Many thanks.
For indexing? Not really sure how that'd work. If you want to serve all searches for all of the documents off one box, you're gonna have to move all of the indexes together at some point. It's possible to use multiple servers to create indexes, ship them to a single box and then merge.
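The "build on several servers, then merge on one box" step can be pictured with a toy inverted index. In the sketch below, each machine numbers its documents from 0, so merging has to shift every part's doc ids past the documents of the parts before it. `build_index` and `merge_indexes` are hypothetical helpers for illustration only; in Lucene itself this job is done by `IndexWriter`'s `addIndexes` method, whose exact signature depends on the Lucene version.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index (term -> sorted local doc ids) on one machine."""
    index = defaultdict(list)
    for local_id, text in enumerate(docs):
        for term in sorted(set(text.lower().split())):
            index[term].append(local_id)
    # Ship the postings together with the document count, needed for remapping.
    return dict(index), len(docs)


def merge_indexes(parts):
    """Merge per-machine indexes, offsetting each part's doc ids past earlier parts."""
    merged = defaultdict(list)
    offset = 0
    for index, doc_count in parts:
        for term, postings in index.items():
            # Offsets only grow, so each merged posting list stays sorted.
            merged[term].extend(doc_id + offset for doc_id in postings)
        offset += doc_count
    return dict(merged)


machine_a = build_index(["lucene index", "python bindings"])
machine_b = build_index(["merge lucene index"])
merged = merge_indexes([machine_a, machine_b])
print(merged["lucene"])  # [0, 2]
```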
As for searching a collection this large, your options are either Big Iron or distribution. Google's pretty convincingly demonstrated that the latter is the way to go. Hadoop (http://lucene.apache.org/hadoop/about.html) is a Lucene-based platform for doing exactly this, but it's a) Java and b) nowhere near done. I believe http://hyperestraier.sourceforge.net/ has support for distribution (and Python bindings), but I haven't tried it.
The short version: if you can partition your index into logically distinct chunks and have no need to perform searches across these chunks, distribution is pretty straightforward; it's really just setting up a bunch of small servers. If you can't partition your data this way, the problem is much harder. AFAIK (and I've done quite a lot of research), there is no mature OSS package to do this in any language (and certainly not in Python). There are a number of commercial solutions, including http://www.dieselpoint.com/ (Java, but interoperable).
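In the easy, partitioned case, the only distributed logic is routing: the same partition key decides which server indexes a document and which server answers a query about it, so no query ever spans servers. The sketch below assumes a hypothetical per-partition key (for example, one customer's documents) and hypothetical host names; it uses a stable CRC32 hash, although an explicit lookup table from partition to server works just as well.

```python
import zlib

# Hypothetical search servers, each holding its own independent index.
SERVERS = [
    "search-0.example.internal",
    "search-1.example.internal",
    "search-2.example.internal",
]


def server_for(partition_key: str) -> str:
    """Return the server that owns a partition, for both indexing and searching."""
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is salted per interpreter run.
    return SERVERS[zlib.crc32(partition_key.encode("utf-8")) % len(SERVERS)]


# Index and search requests for one partition always land on the same box.
print(server_for("customer-42") == server_for("customer-42"))  # True
```

Note that with modulo hashing, adding a server moves most partitions to new hosts, so a lookup table (or consistent hashing) is easier to grow.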
See my message titled "Distributed Indexes, Pycon, was Re: [pylucene-dev] Is there PyNutch?" from February 19 in the archives for a discussion of some of these issues.

_______________________________________________
pylucene-dev mailing list
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
