On Fri, 2014-06-06 at 14:05 +0200, Vineet Mishra wrote:

> Could you state which indexing mechanism you are using? I started
> with EmbeddedSolrServer, but it became pretty slow after a few GB (~30+)
> of indexing.

I suspect that is due to too-frequent commits, too small a heap or some
third factor unrelated to EmbeddedSolrServer itself. Under the surface
it is just the same as a standalone Solr.

We're building our ~1TB indexes individually, using standalone workers
for the heavy part of the analysis (Tika). The delivery from the workers
to the Solr server is over the network, using the Solr binary protocol.
My colleague Thomas Egense just created a small write-up at
https://github.com/netarchivesuite/netsearch

> I started indexing a week ago and the index is still only 37GB,
> although I assume the HttpPost mechanism will be sluggish due to
> network latency and waiting for the response.

That may be the case if you send the documents one at a time, but if you
bundle them in larger updates, the POST method should be fine.
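
To illustrate the bundling, here is a minimal Python sketch, not our actual
setup (we use the Solr binary protocol from Java workers). It assumes the
documents are plain dicts and that they are posted as a JSON array to Solr's
update handler. The batch size of 1000, the helper names and the URL in the
usage note are placeholders picked for the example.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def update_bodies(docs, size=1000):
    """Serialize each batch as one JSON array, i.e. one update request body."""
    for chunk in batches(docs, size):
        yield json.dumps(chunk).encode("utf-8")


def post_batches(docs, update_url, size=1000):
    """Send one HTTP POST per batch instead of one per document.

    `update_url` is an assumption, e.g. the /update handler of a core.
    No commit is requested here; committing is left to the server's
    autoCommit settings or to a single explicit commit at the end.
    """
    for body in update_bodies(docs, size):
        req = urllib.request.Request(
            update_url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response before sending the next batch
```

Usage would be something like
post_batches(my_docs, "http://localhost:8983/solr/mycore/update"), where both
the host and the core name are made up. With 1000 documents per request, the
network round-trip is paid once per thousand documents rather than once per
document.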

- Toke Eskildsen, State and University Library, Denmark

