Spoke too soon; it looks like it leaks memory.  After about 1.3m docs the
old-gen GC times went through the roof and Solr was almost unresponsive, so we
had to abort.  We're going to write our own implementation, running outside of
Solr, to copy data from one core to another.
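
A minimal sketch of what such an external copy might look like, for anyone
following along. It pages through the source core with cursorMark and posts
each batch to the target core's JSON update endpoint. Assumptions that are not
from this thread: the uniqueKey field is called `id`, every field we care about
is stored (so it comes back from /select), and the names `copy_core`,
`solr_fetcher` and `solr_indexer` are hypothetical helpers invented for this
illustration.

```python
import json
import urllib.parse
import urllib.request


def copy_core(fetch_page, index_batch, rows=500, sort="id asc"):
    """Stream every document from a source to a target using a Solr cursor.

    fetch_page(params) -> parsed JSON response of a /select call
    index_batch(docs)  -> sends one batch of documents to the target
    Both callables are hypothetical hooks so the loop can be tested offline.
    The sort must include the uniqueKey field (assumed here to be `id`).
    """
    cursor = "*"  # Solr's documented starting value for cursorMark
    copied = 0
    while True:
        response = fetch_page(
            {"q": "*:*", "rows": rows, "sort": sort, "cursorMark": cursor, "wt": "json"}
        )
        docs = response["response"]["docs"]
        # _version_ belongs to the source index; drop it so the update is not
        # treated as an optimistic-concurrency check on the target.
        batch = [{k: v for k, v in doc.items() if k != "_version_"} for doc in docs]
        if batch:
            index_batch(batch)
            copied += len(batch)
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: the result set is exhausted
            return copied
        cursor = next_cursor


def solr_fetcher(core_url):
    """Build a fetch_page callable that queries core_url + '/select'."""
    def fetch_page(params):
        url = core_url + "/select?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch_page


def solr_indexer(core_url):
    """Build an index_batch callable that POSTs JSON docs to core_url + '/update'."""
    def index_batch(docs):
        req = urllib.request.Request(
            core_url + "/update",
            data=json.dumps(docs).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response; urlopen raises on HTTP errors
    return index_batch
```

It would be called as copy_core(solr_fetcher(source_url), solr_indexer(target_url)),
with both URLs pointing at the relevant cores. The sketch does not commit, so
the target would need autoCommit configured or an explicit commit afterwards.
Because only one page of `rows` documents is held at a time, memory use in the
copying process should stay roughly flat regardless of index size.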

On 06/02/2020, 09:57, "Karl Stoney" <karl.sto...@autotrader.co.uk> wrote:

    I cannot believe how much of a difference that cursorMark and sort order
    made.
    Previously it died at about 800k docs, now we're at 1.2m without any
    slowdown.

    Thank you so much

    On 06/02/2020, 08:14, "Mikhail Khludnev" <m...@apache.org> wrote:

        Hello, Karl.
        Please check these:
        
        https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html#constraints-when-using-cursors

        https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor
         cursorMark="true"
        Good luck.


        On Wed, Feb 5, 2020 at 10:06 PM Karl Stoney
        <karl.sto...@autotrader.co.uk.invalid> wrote:

        > Hey All,
        > I'm trying to implement a simplistic reindex strategy to copy all of
        > the data out of one collection, into another, on a single node (no
        > distributed queries).
        >
        > It's approx 4 million documents, with an index size of 26gig.  Based
        > on your experience, I'm wondering what people feel sensible values
        > for the SolrEntityProcessor are (to give me a sensible starting
        > point, to save me iterating over loads of them).
        >
        > This is where I'm at right now.  I know `rows` would increase memory
        > pressure but speed up the copy, I can't really find anywhere online
        > where people have benchmarked different values for rows and the
        > default (50) seems quite low.
        >
        > <dataConfig>
        > <document>
        >    <entity name="solr_doc" processor="SolrEntityProcessor"
        >      query="*:*"
        >      rows="100"
        >      fl="*,old_version:_version_"
        >      wt="javabin"
        >      url="http://127.0.0.1/solr/at-uk">
        >    </entity>
        > </document>
        > </dataConfig>
        >
        > Any suggestions are welcome.
        > Thanks
        > This e-mail is sent on behalf of Auto Trader Group Plc, Registered
        > Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN
        > (Registered in England No. 9439967). This email and any files
        > transmitted with it are confidential and may be legally privileged,
        > and intended solely for the use of the individual or entity to whom
        > they are addressed. If you have received this email in error please
        > notify the sender. This email message has been swept for the
        > presence of computer viruses.
        >


        --
        Sincerely yours
        Mikhail Khludnev




