Sub-entities can slow down indexing considerably. What is that
datasource? A DB? If so, try using CachedSqlEntityProcessor.
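
A plain SQL sub-entity runs its query once per parent row, so with ~6M
documents that is millions of round trips. CachedSqlEntityProcessor runs
the sub-entity query once and serves later lookups from memory. A rough
sketch of what that could look like is below. I have not seen your schema,
so the entity, table and column names are made up, and the cached rows need
to fit in the heap:

```xml
<!-- Sketch only: entity, table and column names are hypothetical. -->
<entity name="item" query="SELECT id, title FROM item">
  <!-- The sub-entity query runs once; rows are cached and matched on item_id. -->
  <entity name="description"
          processor="CachedSqlEntityProcessor"
          query="SELECT item_id, description FROM item_description"
          where="item_id=item.id"/>
</entity>
```

Note that the query has no WHERE clause of its own. The join condition goes
in the "where" attribute so the processor can use it as the cache key.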

On Tue, Jun 14, 2011 at 8:31 PM, Mark <static.void....@gmail.com> wrote:
> Hello all,
>
> We are using DIH to index our data (~6M documents) and it's taking an
> extremely long time (~24 hours). I am trying to find ways that we can speed
> this up. I've been reading through older posts and it's my understanding
> this should not take that long.
>
> One probable bottleneck is that we have a sub entity pulling in item
> descriptions from a separate datasource which we then strip html from.
> Before stripping the html we run it through JTidy. Our data-config looks
> something like this: http://pastie.org/2067011
>
> I've heard about entity threads and I was wondering if this would help in my
> case? I haven't been able to find any good documentation on this.
>
> Another possible bottleneck is the number of sub entities we have... 5
> (only 1 of which is CachedSqlEntityProcessor). Any ideas?
>
> Thanks for the help
>
>
>



-- 
-----------------------------------------------------
Noble Paul
