On 1/8/2013 2:10 AM, vijeshnair wrote:
Solr version : 4.0 (running with 9GB of RAM)
MySQL : 5.5
JDBC : mysql-connector-java-5.1.22-bin.jar

I am trying to run a full import of my catalog data, which is roughly
13 million products. The DIH ran smoothly for 18 hours and processed
roughly 10 million records, but then it broke with a JDBC exception:
communication failure with the server. I did extensive googling on this
topic, and there are multiple recommendations to use "readOnly=true",
"autoCommit=true", etc. If I understand it correctly, the likely cause
is that DIH pauses indexing during segment merging and then tries to
reconnect to the server. When the index is fairly large and multiple
merges happen at the same time, DIH stops indexing for a while, and by
the time it restarts, MySQL has already dropped the connection. So I am
going to increase wait_timeout on the MySQL side from the default of 120
seconds to something larger, to see whether that solves the issue. I
will only know the result of that approach after completing one full
run, which I will report tomorrow. In the meantime I thought I would
validate my approach and check with you for any other fixes that exist.
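For what it's worth, the parameters mentioned above are usually set on the DIH dataSource. A minimal sketch, assuming MySQL Connector/J (host, database name, and credentials below are placeholders, and netTimeoutForStreamingResults is a Connector/J property that raises MySQL's net_write_timeout for streaming result sets, one commonly suggested fix for connections killed during long pauses):

```xml
<!-- goes in the DIH data-config file; connection details are placeholders -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/catalog?netTimeoutForStreamingResults=3600"
            user="solr"
            password="secret"
            readOnly="true"
            autoCommit="true"/>
```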


This is how I fixed it. On version 4, this goes in the indexConfig section. On 3.x it goes into indexDefaults:

  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">4</int>
  </mergeScheduler>

A recent Jira issue (LUCENE-4661) changed maxThreadCount to 1 for better performance, so I'm not sure whether both of my changes above are actually required or whether maxMergeCount alone will fix it. I commented on the issue to find out.

https://issues.apache.org/jira/browse/LUCENE-4661

If I don't get a definitive answer soon, I'll go ahead and test for myself.

Side question: you're already setting batchSize to a negative number, right?
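For anyone following along: with the MySQL driver, batchSize="-1" makes DIH call setFetchSize(Integer.MIN_VALUE), which tells Connector/J to stream rows one at a time instead of buffering the entire multi-million-row result set in memory. A minimal sketch (connection details are placeholders):

```xml
<!-- batchSize="-1" enables row streaming with MySQL Connector/J -->
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost:3306/catalog"
            user="solr"
            password="secret"
            batchSize="-1"/>
```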

Thanks,
Shawn
