[ https://issues.apache.org/jira/browse/SOLR-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648685#action_12648685 ]
Glen Newton commented on SOLR-812:
----------------------------------

This is a related issue, but since I just got involved with Solr yesterday and got a jira account today, I am reluctant to make a career-limiting error! :-) If it is indeed valid, perhaps someone else can make it a full-fledged separate issue!

Perusing: JdbcDataSource
@version $Id: JdbcDataSource.java 696539 2008-09-18 02:16:26Z ryan

Issue: MySQL fetchSize driver bug

Both in my experience and according to:
  http://benjchristensen.wordpress.com/2008/05/27/mysql-jdbc-memory-usage-on-large-resultset/
MySQL does not properly handle any fetchSize > Integer.MIN_VALUE: the entire ResultSet is transferred and loaded into memory, which for large ResultSets can result in an out-of-memory error.

In JdbcDataSource.java:
  175: stmt.setFetchSize(batchSize);
where
  57: private int batchSize = FETCH_SIZE;
and
  326: private static final int FETCH_SIZE = 500;

As is, this code will trigger this bug on MySQL for large ResultSets. Even for smaller ResultSets that do not cause an out-of-memory error, holding the whole ResultSet in memory uses memory unnecessarily.

The workaround for this MySQL issue is:
  stmt.setFetchSize(Integer.MIN_VALUE);

From the blog entry, see also:
 * http://javaquirks.blogspot.com/2007/12/mysql-streaming-result-set.html
 * http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

> JDBC optimizations: setReadOnly, setMaxRows
> -------------------------------------------
>
>                 Key: SOLR-812
>                 URL: https://issues.apache.org/jira/browse/SOLR-812
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>            Reporter: David Smiley
>
> I'm looking at the DataImport code as of Solr v1.3 and using it with Postgres
> and very large data sets, and there are some improvement suggestions I have.
> 1. call setReadOnly(true) on the connection. DIH doesn't change the data so
> this is obvious.
> 2. call setAutoCommit(false) on the connection. (this is needed by Postgres
> to ensure that the fetchSize hint actually works)
> 3. call setMaxRows(X) on the statement, which is to be used when the
> dataimport.jsp debugger is only grabbing X rows. fetchSize is just a hint
> and alone it isn't sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
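
For illustration only, the MySQL workaround from the comment and the three suggestions from the issue could be combined roughly as below. This is a sketch, not the actual JdbcDataSource code: the class FetchSizeSketch and the methods fetchSizeFor and openStatement are hypothetical names, and detecting MySQL by the "jdbc:mysql:" URL prefix is an assumption about how the driver might be identified.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Solr.
class FetchSizeSketch {
    // Same default as FETCH_SIZE in JdbcDataSource.java.
    static final int FETCH_SIZE = 500;

    // Picks the fetch size hint for a given JDBC URL.
    // Assumption: MySQL is recognised by the "jdbc:mysql:" URL prefix.
    static int fetchSizeFor(String jdbcUrl, int batchSize) {
        if (jdbcUrl != null && jdbcUrl.startsWith("jdbc:mysql:")) {
            // Asks MySQL Connector/J to stream rows instead of buffering them all.
            return Integer.MIN_VALUE;
        }
        return batchSize;
    }

    // Applies the settings proposed in SOLR-812 and returns a configured statement.
    // maxRows <= 0 means "no limit" (setMaxRows is skipped).
    static Statement openStatement(Connection conn, String jdbcUrl,
                                   int batchSize, int maxRows) throws SQLException {
        conn.setReadOnly(true);     // suggestion 1: the import never writes
        conn.setAutoCommit(false);  // suggestion 2: lets Postgres honour the fetchSize hint
        // Connector/J streams only for forward-only, read-only statements.
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(fetchSizeFor(jdbcUrl, batchSize));
        if (maxRows > 0) {
            stmt.setMaxRows(maxRows); // suggestion 3: hard cap, e.g. for the debugger
        }
        return stmt;
    }
}
```

With this sketch, fetchSizeFor("jdbc:mysql://localhost/db", FETCH_SIZE) yields Integer.MIN_VALUE, while a Postgres URL keeps the configured batch size.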