CursorMark, batch size/speed

Markus Jelsma Wed, 12 Jun 2019 14:59:35 -0700

Hello,

One of our collections hates CursorMark, it really does. Under very heavy 
load, the nodes can occasionally consume gigabytes of additional heap for no 
clear reason, immediately after the entire corpus has been downloaded.


Although the additional heap consumption is a separate problem that I hope 
someone can shed some light on, there is another strange behaviour I would like 
to see explained.

Under light load and with a batch size of just a few hundred, the download 
speed crawls along at 150 docs/s at most. But when I increase the batch size to 
absurd numbers such as 20k, the speed jumps to 2.5k docs/s, cutting the total 
time from days to just a few hours.
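
For reference, a minimal sketch of the kind of cursorMark download loop being 
discussed, not necessarily the client actually used. The names iter_cursor and 
solr_fetch, the base URL in the comment, and "id" as the uniqueKey sort field 
are assumptions for illustration; only the q, sort, rows, cursorMark and 
nextCursorMark parameters come from Solr itself. The batch size is the rows 
argument.

```python
import json
import urllib.parse
import urllib.request


def solr_fetch(base_url, params):
    """Send one /select request and return the decoded JSON response.

    base_url is hypothetical, e.g. "http://localhost:8983/solr/logs".
    """
    url = base_url + "/select?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_cursor(fetch, rows, sort="id asc", query="*:*"):
    """Yield every matching document, one cursorMark page of `rows` at a time.

    `fetch` is any callable taking the request params and returning the
    parsed response; the sort must include the collection's uniqueKey
    (assumed here to be "id").
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": rows,
                  "cursorMark": cursor, "wt": "json"}
        data = fetch(params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark once the result set is exhausted.
            return
        cursor = next_cursor
```

Each page costs one round trip: for every million documents, 200 rows per 
request means 5,000 requests, while 20,000 rows per request means 50.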

We only really see the heap and speed differences with one big collection of 
millions of small documents. They are just query, click and view logs with 
additional metadata fields such as time, digests, ranks, dates, uids, view time 
etc.

Can anyone here shed some light on these vague subjects?

Many thanks,
Markus
