Can you post the dataconfig.xml? You probably didn't set a batch size.
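
For reference, here is a minimal sketch of where the batch size goes in a DIH config, assuming a MySQL source read through JdbcDataSource. The URL, credentials, entity name, columns, and query are placeholders and are not taken from this thread:

```xml
<dataConfig>
  <!-- Placeholder connection details: replace url, user, and password. -->
  <!-- batchSize="-1" is the setting in question. With the MySQL driver, DIH
       passes it on as a fetch size of Integer.MIN_VALUE, which asks the
       driver to stream rows instead of buffering the whole result set. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/exampledb"
              user="solr"
              password="changeme"
              batchSize="-1"/>
  <document>
    <!-- Placeholder entity and query standing in for the redacted SELECT. -->
    <entity name="record" query="SELECT id, title FROM records">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

If streaming is already enabled and the connection still drops hours into the read, the server-side write timeout may be worth checking as well. Connector/J has a netTimeoutForStreamingResults URL property for that case.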
On Apr 21, 2011, at 5:09 PM, Scott Bigelow <eph...@gmail.com> wrote:

> Thanks for the e-mail. I probably should have provided more details,
> but I was more interested in making sure I was approaching the problem
> correctly (using DIH, with one big SELECT statement for millions of
> rows) instead of solving this specific problem. Here's a partial
> stacktrace from this specific problem:
>
> ...
> Caused by: java.io.EOFException: Can not read response from server.
> Expected to read 4 bytes, read 0 bytes before connection was
> unexpectedly lost.
>        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
>        ... 22 more
> Apr 21, 2011 3:53:28 AM
> org.apache.solr.handler.dataimport.EntityProcessorBase getNext
> SEVERE: getNext() failed for query 'REDACTED'
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> The last packet successfully received from the server was 128
> milliseconds ago. The last packet sent successfully to the server was
> 25,273,484 milliseconds ago.
> ...
>
>
> A custom indexer, so that's a fairly common practice? So when you are
> dealing with these large indexes, do you try not to fully rebuild them
> when you can? It's not a nightly thing, but something to do in case of
> a disaster? Is there a difference in the performance of an index that
> was built all at once vs. one that has had delta inserts and updates
> applied over a period of months?
>
> Thank you for your insight.
>
>
> On Thu, Apr 21, 2011 at 4:31 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>>
>> : For a new project, I need to index about 20M records (30 fields) and I
>> : have been running into issues with MySQL disconnects, right around
>> : 15M. I've tried several remedies I've found on blogs, changing
>>
>> If you can provide some concrete error/log messages and the details of how
>> you are configuring your datasource, that might help folks provide better
>> suggestions -- you've said you run into a problem but you haven't provided
>> any details for people to go on in giving you feedback.
>>
>> : resolved the issue. It got me wondering: Is this the way everyone does
>> : it? What about 100M records up to 1B; are those all pulled using DIH
>> : and a single query?
>>
>> I've only recently started using DIH, and while it definitely has a lot
>> of quirks/annoyances, it seems like a pretty good 80/20 solution for
>> indexing with Solr -- but that doesn't mean it's perfect for all
>> situations.
>>
>> Writing custom indexer code can certainly make sense in a lot of cases --
>> particularly where you already have a data publishing system that you want
>> to tie into directly -- the trick is to ensure you have a decent strategy
>> for rebuilding the entire index should the need arise (but this is really
>> only an issue if your primary indexing solution is incremental -- many use
>> cases can be satisfied just fine with a brute force "full rebuild
>> periodically" implementation).
>>
>>
>> -Hoss