Hey Markus,

What are you sorting on? Do you have docValues enabled on the sort field?
On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hello,
>
> We have been using a simple Python tool for a long time that eases movement
> of data between Solr collections; it uses CursorMark to fetch small or large
> pieces of data. It recently stopped working when moving data from a
> production collection to my local machine for testing: the Solr nodes began
> to run OOM.
>
> I added 500M to the 3G heap and now it works again, but it is slow
> (240 docs/s) and costs 3G of the entire heap just to move 32k docs out of
> 76m total.
>
> Solr 8.6.0 is running with two shards (1 leader + 1 replica); each shard has
> 38m docs with almost no deletions (0.4%), taking up ~10.6g of disk space.
> The documents are very small: they are logs of various interactions of users
> with our main text search engine.
>
> I monitored all four nodes with VisualVM during the transfer; all four went
> up to 3g heap consumption very quickly. After the transfer it took a while
> for two nodes to (forcefully) release the heap space no longer needed for
> the transfer. The two other nodes, now, 17 minutes later, still think they
> have to hang on to their heap. When I start the same transfer again, the
> nodes that already have high memory consumption just seem to reuse it,
> without consuming additional heap. At least the second time it went at
> 920 docs/s, whereas we are used to transferring these tiny documents at
> light speed, multiple thousands per second.
>
> What is going on? We do not need additional heap, Solr is clearly not asking
> for more, and GC activity is minimal. Why did it become so slow? Regular
> queries on the collection are still fast, but paging with CursorMark through
> even a tiny portion takes time and memory.
>
> Many thanks,
> Markus

--
Anshum Gupta
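For readers unfamiliar with the mechanism under discussion: CursorMark deep paging works by sending `cursorMark=*` on the first request and then passing each response's `nextCursorMark` back on the next request, until Solr echoes back the same cursor you sent (which signals the end; it also requires a sort that includes the uniqueKey as a tiebreaker, which is why the sort field and its docValues setting matter here). Below is a minimal sketch of that loop in Python. All names are hypothetical, not taken from Markus's tool, and the HTTP call to Solr's `/select` handler is replaced by an injected `fetch_page` function so the paging logic stands alone.

```python
def cursor_fetch_all(fetch_page, rows=100):
    """Yield docs page by page using Solr's cursorMark protocol.

    fetch_page(cursor, rows) must return (docs, next_cursor), mirroring a
    Solr response's `response.docs` list and `nextCursorMark` value. In a
    real client this would be an HTTP GET against /select with parameters
    like q, sort (ending in the uniqueKey field), rows, and cursorMark.
    """
    cursor = "*"  # Solr's conventional "start of results" cursor
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        yield from docs
        # Solr signals the end by returning the same cursor we sent.
        if next_cursor == cursor:
            break
        cursor = next_cursor


def make_fake_solr(total):
    """Fake backend serving `total` tiny docs in sorted-id order,
    so the loop above can be exercised without a running Solr."""
    def fetch_page(cursor, rows):
        start = 0 if cursor == "*" else int(cursor)
        docs = [{"id": i} for i in range(start, min(start + rows, total))]
        # Echo the incoming cursor when there is nothing left to return.
        next_cursor = str(start + len(docs)) if docs else cursor
        return docs, next_cursor
    return fetch_page


docs = list(cursor_fetch_all(make_fake_solr(250), rows=100))
```

The key property of this loop is that each request carries only the opaque cursor, never a growing `start` offset, which is what normally makes CursorMark cheap on the server side compared to deep `start`/`rows` paging.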