Hi all,

to keep this thread up to date... ;-)


d) jdbc batch size
Changed to 10 (was the default of 500, then 1000).

The problem with my DIH setup is that the root entity query returns a huge result set (all ids that shall be indexed). A larger fetch size would be good for that query. The nested entity, however, only ever returns up to 9 rows. The constraints are so strict (by id) that there is no way any additional data could be pre-fetched. (Actually, shouldn't anyone using DIH with nested entities run into that problem?)

After changing to 10, I cannot see that this low batch size slowed the indexer down (significantly).

As I would like to stick with DIH (instead of dumping the data into CSV and importing it afterwards), here is my question:

Do you think it's possible to return rows in the nested entity independent of the unique id, and let the processor decide when a document is complete? The examples in the wiki always use an id to get the data for the nested entity, so I'm not sure it was planned with that in mind. But as I'm already handling multiple db rows for one document, it might not be too difficult to also handle the unique id correctly? Of course, I would need something like a look-ahead to know whether the next row already belongs to the next document.
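
To make the look-ahead idea concrete, here is a rough sketch in plain Python. It is not DIH code and uses no Solr API; the function name, the "id" and "tag" fields, and the sample rows are made up for illustration. It assumes the rows arrive sorted by the unique id (e.g. via ORDER BY), so a document is complete as soon as the id of the next row differs.

```python
from itertools import groupby
from operator import itemgetter


def rows_to_documents(rows, id_field="id"):
    """Collapse consecutive rows that share an id into one document each.

    Assumption: rows are sorted by id_field. groupby ends a group when the
    key of the next row changes, which is the look-ahead described above.
    """
    for doc_id, group in groupby(rows, key=itemgetter(id_field)):
        doc = {id_field: doc_id}
        for row in group:
            for field, value in row.items():
                if field == id_field:
                    continue
                # Collect every non-id column as a multi-valued field.
                doc.setdefault(field, []).append(value)
        yield doc


# Hypothetical rows, as one sorted query might return them.
rows = [
    {"id": 1, "tag": "a"},
    {"id": 1, "tag": "b"},
    {"id": 2, "tag": "c"},
]
docs = list(rows_to_documents(rows))
```

With this approach a single query with a large fetch size could feed all documents, instead of one small query per id.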


Cheers,
Chantal



Concerning the other settings (just fyi):

a) mergeFactor 10 (and also tried 100)
I don't think that changed anything for the worse; if anything, for the better. So I'll stick with 10 from now on.

b) ramBufferSizeMB
Tried 512 and 1024. RAM usage went up when I increased from 256 to 512. Not sure about 1024. I'll stick with 512.

