Are you by any chance committing after every file is indexed? That
could explain the speed issues.
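
If that is the case, try sending all the documents without a commit and
committing once at the end (or use commitWithin). Roughly something like
this SolrJ sketch (untested; the Solr URL, handler path and literal.id
value are just the defaults I assumed, adjust to your setup):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class BatchedExtractIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        for (File f : new File(args[0]).listFiles()) {
            // One extract request per file, but no commit here.
            ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
            req.addContentStream(new ContentStreamBase.FileStream(f));
            req.setParam("literal.id", f.getName());
            server.request(req);
        }
        // A single commit after all files have been sent.
        server.commit();
        server.shutdown();
    }
}

If you need documents to become searchable sooner than one big commit at
the end, commitWithin on the requests is a middle ground.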

Also, have you tried tuning your indexer's Java memory parameters? I
use these settings for mine, which used to run out of memory as well:
java -server -Xms512m -Xmx2048m

Regards,
   Alex.
P.S. I may still have some issues with mine, so treat this as a pointer
in the right direction rather than a complete answer.
P.P.S. I have not tried this myself, but you may be able to run multiple
Tika instances in parallel queues/processes and then feed their output
into a single queue that sends to Solr.
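
Very roughly, and again untested, the idea would look something like
this: a few worker threads run Tika locally and put plain
SolrInputDocuments onto a blocking queue, and a single sender thread
drains the queue into Solr. The field names, thread count and Solr URL
below are made-up placeholders:

import java.io.File;
import java.util.concurrent.*;
import org.apache.tika.Tika;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelTikaIndexer {
    public static void main(String[] args) throws Exception {
        final SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        final BlockingQueue<SolrInputDocument> queue =
            new LinkedBlockingQueue<SolrInputDocument>(1000);
        // Shared Tika facade; the AutoDetectParser underneath is thread-safe.
        final Tika tika = new Tika();

        // Single sender thread: drains the queue and pushes docs to Solr.
        Thread sender = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        server.add(queue.take()); // could also batch these
                    }
                } catch (Exception e) { /* interrupted on shutdown */ }
            }
        });
        sender.setDaemon(true);
        sender.start();

        // Worker pool: parse files with Tika in parallel.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (final File f : new File(args[0]).listFiles()) {
            workers.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", f.getName());
                        doc.addField("content", tika.parseToString(f));
                        queue.put(doc);
                    } catch (Exception e) { /* log and skip the file */ }
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
        // Crude drain check for a sketch, then commit once at the end.
        while (!queue.isEmpty()) Thread.sleep(100);
        server.commit();
        server.shutdown();
    }
}

The single sender keeps Solr from being hit by many concurrent update
requests, while the Tika parsing, which is usually the expensive part,
happens in parallel.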

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous, via GTD book)


On Mon, Sep 24, 2012 at 10:04 AM,  <johannes.schwendin...@blum.com> wrote:
> Hi,
>
> I'm currently experimenting with Solr Cell to index files into Solr, and
> during this some questions came up.
>
> 1. Is it possible (and wise) to connect to Solr Cell with multiple threads
> at the same time, to index several documents at once?
> This question came up because my program takes about 6 hours to index
> around 35,000 docs. (No production environment, only the example Solr and a
> little desktop machine, but I think it's very slow, and I know Solr isn't
> the bottleneck (yet).)
>
> 2. If 1 is possible, how many threads should do this, and how much memory
> does Solr need? I tried it, but I ran into an out-of-memory exception.
>
> Thanks in advance
>
> Best Regards
> Johannes
