Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Alexey Serba
{quote} ... Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)

Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Shawn Heisey
I am running into this problem as well, but only sporadically, and only in my 3.1 test environment, not in 1.4.1 production. I may have narrowed things down; I am now interested in learning whether this is a problem with the MySQL connector or with DIH. On 4/21/2011 6:09 PM, Scott Bigelow wrote:

Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Scott Bigelow
Alex, thanks for your response. I suspect you're right about autoCommit; I ended up solving the problem by merely moving the entire Solr install, untouched, to a significantly larger instance (EC2 m1.small to m1.large). I think it is appropriately sized now for the quantity and intensity of

Re: Indexing 20M documents from MySQL with DIH

2011-04-24 Thread Scott Bigelow
Thank you everyone for your help. I ended up getting the index to work using the exact same config file on a (substantially) larger instance. On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson erickerick...@gmail.com wrote: {{{A custom indexer, so that's a fairly common practice? So when you are

Re: Indexing 20M documents from MySQL with DIH

2011-04-22 Thread Erick Erickson
{{{A custom indexer, so that's a fairly common practice? So when you are dealing with these large indexes, do you try not to fully rebuild them when you can? It's not a nightly thing, but something to do in case of a disaster? Is there a difference in the performance of an index that was built all

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Robert Gründler
We're indexing around 10M records from a MySQL database into a single Solr core. The DataImportHandler needs to join 3 sub-entities to denormalize the data. We ran into some trouble on the first 2 attempts, but setting batchSize=-1 for the dataSource resolved the issues. Do you need a lot
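For reference, the batchSize=-1 setting mentioned above goes on the dataSource element of the DIH configuration file. The following is only a minimal sketch, assuming the MySQL Connector/J driver; the host, database name, credentials, entity name, columns, and query are hypothetical placeholders and do not come from this thread:

```xml
<dataConfig>
  <!-- batchSize="-1" makes DIH's JdbcDataSource use Integer.MIN_VALUE as the
       JDBC fetch size, which tells Connector/J to stream rows one at a time
       instead of buffering the whole result set in memory.
       URL, user, and password below are placeholders. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb"
              user="solr"
              password="secret"
              batchSize="-1"/>
  <document>
    <!-- Hypothetical entity: one large SELECT over the source table -->
    <entity name="item" query="SELECT id, title FROM item">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Streaming keeps memory use on the Solr side roughly flat regardless of row count, at the cost of holding the MySQL connection open for the whole import.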

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Scott Bigelow
Thanks for your response! I think the issue is that the records are being returned TOO fast from MySQL. I can dump them to CSV in about 30 minutes, but building the Solr index takes hours on the system I'm using. I may just need to use a more powerful Solr instance so it doesn't leave MySQL

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Chris Hostetter
: For a new project, I need to index about 20M records (30 fields) and I
: have been running into issues with MySQL disconnects, right around
: 15M. I've tried several remedies I've found on blogs, changing
if you can provide some concrete error/log messages and the details of how you are

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Scott Bigelow
Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the problem correctly (using DIH, with one big SELECT statement for millions of rows) instead of solving this specific problem. Here's a partial stacktrace from this

Re: Indexing 20M documents from MySQL with DIH

2011-04-21 Thread Li
Can you post the dataconfig.xml? You probably didn't set the batch size. Sent from my iPhone On Apr 21, 2011, at 5:09 PM, Scott Bigelow eph...@gmail.com wrote: Thanks for the e-mail. I probably should have provided more details, but I was more interested in making sure I was approaching the