On Fri, 2014-06-06 at 14:05 +0200, Vineet Mishra wrote:
> Could you state what indexing mechanism you are using? I started
> with EmbeddedSolrServer, but it was pretty slow after a few GB (~30+) of
> indexing.
I suspect that is due to too-frequent commits, too small a heap or
something else entirely, unrelated to EmbeddedSolrServer itself.
Underneath the surface it is just the same as a standalone Solr.

We're building our ~1TB indexes individually, using standalone workers
for the heavy part of the analysis (Tika). The delivery from the workers
to the Solr server is over the network, using the Solr binary protocol.
My colleague Thomas Egense just created a small write-up at
https://github.com/netarchivesuite/netsearch

> I started indexing a week ago and it's still only at 37GB, although I
> assume the HttpPost mechanism will be very slow due to network latency
> and waiting for the response.

That may be the case if you send the documents one at a time, but if you
bundle them into larger updates, the POST method should be fine.

- Toke Eskildsen, State and University Library, Denmark
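To illustrate the point about bundling documents into larger updates, here is a minimal Java sketch of client-side batching. `UpdateSender`, `BatchingIndexer` and the batch size of 1000 are hypothetical names and values chosen for illustration, not part of Solr or SolrJ; in a real setup the sender would be whatever issues the HTTP POST (or SolrJ call) to Solr. The sketch only shows the buffering logic: N documents cost roughly N / batchSize requests instead of N.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Hypothetical stand-in for whatever actually posts an update to Solr
    // (e.g. an HTTP POST or a SolrJ client); not a real Solr API.
    interface UpdateSender {
        void send(List<String> batch);
    }

    // Buffers documents and sends them in batches of batchSize, so that
    // N documents cost about N / batchSize round trips instead of N.
    static final class BatchingIndexer {
        private final UpdateSender sender;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        BatchingIndexer(UpdateSender sender, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.sender = sender;
            this.batchSize = batchSize;
        }

        // Adds one document; sends the buffer once it reaches batchSize.
        void add(String doc) {
            buffer.add(doc);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Sends any remaining buffered documents; call once at the end.
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sender.send(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        // Record the size of each "request" instead of talking to a server.
        List<Integer> batchSizes = new ArrayList<>();
        BatchingIndexer indexer = new BatchingIndexer(batch -> batchSizes.add(batch.size()), 1000);
        for (int i = 0; i < 2500; i++) {
            indexer.add("doc-" + i);
        }
        indexer.flush();
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

With 2,500 documents and a batch size of 1000 this issues three requests (1000, 1000 and 500 documents) rather than 2,500. A suitable batch size depends on document size and available heap, so 1000 is only a starting point to tune.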