Re: mergeFactor / indexing speed

Chantal Ackermann Thu, 06 Aug 2009 09:47:22 -0700

Hi all,

to keep this thread up to date... ;-)



d) jdbc batch size
changed to 10. (Was default: 500, then 1000)

The problem with my DIH setup is that the root entity query returns a huge set (all ids that shall be indexed). A larger fetch size would be good for that query. The nested entity, however, returns only up to 9 rows, ever. The constraints are so strict (by id) that there is no way that any additional data could be pre-fetched.
(Actually, anyone using DIH with nested entities should run into that problem?)

After changing to 10, I cannot see that this low batch size slowed the indexer down (significantly).

As I would like to stick with DIH (instead of dumping the data into CSV and then importing it), here is my question:

Do you think it's possible to return (in the nested entity) rows independent of the unique id, and let the processor decide when a document is complete?
The examples in the wiki always use an ID to get the data for the nested entity, so I'm not sure it was planned with that in mind. But as I'm already handling multiple db rows for one document, it might not be too difficult to change to handling the unique id correctly, as well?
Of course, I would need something like a look ahead to know whether the next row is already part of the next document.
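To make the look-ahead idea concrete, here is a minimal sketch in plain Java. It is not DIH code and does not use any DIH API: RowGrouper and the "id" column name are hypothetical, and it assumes the rows arrive sorted by the unique id (e.g. via ORDER BY in the query), so that all rows of one document are adjacent.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

/**
 * Hypothetical helper (not part of DIH): turns a stream of rows that is
 * sorted by a unique id into one group of rows per document.
 */
public class RowGrouper implements Iterator<List<Map<String, Object>>> {
    private final Iterator<Map<String, Object>> rows;
    private final String idColumn;
    // The look-ahead: a row that was already read but not yet emitted.
    private Map<String, Object> lookAhead;

    public RowGrouper(Iterator<Map<String, Object>> rows, String idColumn) {
        this.rows = rows;
        this.idColumn = idColumn;
        this.lookAhead = rows.hasNext() ? rows.next() : null;
    }

    @Override
    public boolean hasNext() {
        return lookAhead != null;
    }

    /** Returns all consecutive rows that share the id of the look-ahead row. */
    @Override
    public List<Map<String, Object>> next() {
        if (lookAhead == null) {
            throw new NoSuchElementException();
        }
        Object currentId = lookAhead.get(idColumn);
        List<Map<String, Object>> group = new ArrayList<>();
        // Keep consuming rows until the id changes or the input ends.
        while (lookAhead != null
                && Objects.equals(currentId, lookAhead.get(idColumn))) {
            group.add(lookAhead);
            lookAhead = rows.hasNext() ? rows.next() : null;
        }
        return group;
    }
}
```

With something like this, a single query with a large fetch size could feed all documents, and each returned group would be the set of rows for one document. If the rows are not sorted by id, the same id shows up in several groups, so the ordering assumption matters.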



Cheers,
Chantal



Concerning the other settings (just fyi):

a) mergeFactor 10 (and also tried 100)

I don't think that changed anything for the worse, rather for the better. So, I'll stick with 10 from now on.


b) ramBufferSizeMB

tried 512, 1024. RAM usage went up when I increased from 256 to 512. Not sure about 1024. I'll stick to 512.
