If you're concerned about throughput, consider moving all the
SolrCell (Tika) processing off the server. SolrCell is way cool
for showing what can be done, but its downside is that all the
parsing of the structured documents happens on the same machine
that's doing the indexing. Pretty soon, especially with large
files, you're spending all your CPU cycles parsing the files
rather than indexing them...
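Roughly, "off the server" means the client does the reading and parsing and sends Solr only the finished fields. The sketch below is hypothetical and assumes a schema with `id`, `title` and `content` fields. `ClientSideExtraction`, `TextExtractor`, `PLAIN_TEXT` and `toFields` are illustrative names, not part of Solr. A real client would typically plug Tika's `AutoDetectParser` into the extractor and turn the map into a SolrJ `SolrInputDocument`. Here only the JDK is used, so it compiles on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parse files on the client and send Solr only the fields.
// TextExtractor stands in for a real parser (e.g. Tika) so this needs only the JDK.
public class ClientSideExtraction {

    /** Hypothetical hook that turns raw file bytes into plain text. */
    public interface TextExtractor {
        String extract(byte[] raw) throws IOException;
    }

    /** Trivial extractor for UTF-8 text files; a real client would call Tika here. */
    public static final TextExtractor PLAIN_TEXT =
            raw -> new String(raw, StandardCharsets.UTF_8);

    /**
     * Builds the field map that would become one Solr document.
     * The field names id/title/content are assumptions about the schema.
     */
    public static Map<String, Object> toFields(Path file, TextExtractor extractor)
            throws IOException {
        byte[] raw = Files.readAllBytes(file);   // read on the client...
        String text = extractor.extract(raw);    // ...and parse on the client's CPU
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", file.toAbsolutePath().toString());
        fields.put("title", file.getFileName().toString());
        fields.put("content", text);
        return fields;                           // only this would be sent to Solr
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello solr".getBytes(StandardCharsets.UTF_8));
        System.out.println(toFields(tmp, PLAIN_TEXT));
        Files.delete(tmp);
    }
}
```

The point of the split is that Solr never sees the PDF or Word bytes, so the expensive parsing no longer competes with indexing for the server's CPU.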

As it happens, there's a blog post about this:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/

By moving the parsing to N clients, you can keep increasing
throughput until Solr itself is working hard on the indexing...
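One way to picture the "N clients" idea is a fixed pool of worker threads in a single client program, each extracting its share of the files and sending documents in batches. This is a hypothetical sketch: `ParallelIndexer`, `indexAll`, `extract` and `send` are illustrative names. In a real client, `send` would wrap a SolrJ `add(...)` call on a client instance that is safe to share across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: spread client-side extraction over N threads and send
// documents in batches. `send` is called from several threads at once, so it
// must be thread-safe.
public class ParallelIndexer {

    public static <T> int indexAll(List<T> inputs, int threads, int batchSize,
                                   Function<T, Map<String, Object>> extract,
                                   Consumer<List<Map<String, Object>>> send)
            throws InterruptedException, ExecutionException {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger sent = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                futures.add(pool.submit(() -> {
                    List<Map<String, Object>> batch = new ArrayList<>(batchSize);
                    // Worker t handles inputs t, t + threads, t + 2*threads, ...
                    for (int i = worker; i < inputs.size(); i += threads) {
                        batch.add(extract.apply(inputs.get(i)));
                        if (batch.size() == batchSize) {
                            send.accept(batch);          // hand off a full batch
                            sent.addAndGet(batch.size());
                            batch = new ArrayList<>(batchSize);
                        }
                    }
                    if (!batch.isEmpty()) {              // flush the remainder
                        send.accept(batch);
                        sent.addAndGet(batch.size());
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();                                 // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }
}
```

Each worker holds at most `batchSize` extracted documents at a time, so the batch size, not the total number of files, bounds the client's memory per thread. The right thread count depends on the hardware; raise it until the Solr machine's CPU is busy rather than guessing a number up front.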

Best
Erick

On Mon, Sep 24, 2012 at 10:04 AM,  <johannes.schwendin...@blum.com> wrote:
> Hi,
>
> I'm currently experimenting with Solr Cell to index files to Solr. While
> doing this, some questions came up.
>
> 1. Is it possible (and wise) to connect to Solr Cell with multiple threads
> to index several documents at the same time?
> This question came up because my program takes about 6 hours to index
> around 35,000 docs. (This is not a production environment, only the example
> Solr on a small desktop machine, but I think that's very slow, and I know
> Solr isn't the bottleneck (yet).)
>
> 2. If 1 is possible, how many threads should do this, and how much memory
> does Solr need? I've tried it, but I ran into an out-of-memory exception.
>
> Thanks in advance
>
> Best Regards
> Johannes
