Re: DIH idle in transaction forever
Btw, I removed the batchSize but performance is better with batchSize=1. I haven't done further testing to see what the best setting is, but the difference between setting it at 1 and not setting it is almost double the indexing time (~20 minutes vs ~37 minutes) On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor jasper.fl...@m4n.nl wrote: Actually, the readOnly=true makes things worse. What it does (among other things) is: c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED); which leads to: Caused by: org.postgresql.util.PSQLException: Cannot change transaction isolation level in the middle of a transaction. because the connection is idle in transaction. I found this issue: https://issues.apache.org/jira/browse/SOLR-2045 Patching DIH with the code they suggest seems to work. mvg, Jasper On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote: Try readOnly=true in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource) Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal. This might be better than 1. There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects. I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false. Not sure abour PostgreSql. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jasper Floor [mailto:jasper.fl...@m4n.nl] Sent: Thursday, June 14, 2012 8:21 AM To: solr-user@lucene.apache.org Subject: DIH idle in transaction forever Hi all, It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work but then it goes idle again. datasource definition: dataSource name=df-stream-store-ds jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource autoCommit=false batchSize=1 / We have a datasource defined in the jndi: no-tx-datasource jndi-nameext_solr_datafeeds_dba/jndi-name security-domainext_solr_datafeeds_dba_realm/security-domain connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url min-pool-size0/min-pool-size max-pool-size5/max-pool-size transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation driver-classorg.postgresql.Driver/driver-class blocking-timeout-millis3/blocking-timeout-millis idle-timeout-minutes5/idle-timeout-minutes new-connection-sqlSELECT 1/new-connection-sql check-valid-connection-sqlSELECT 1/check-valid-connection-sql /no-tx-datasource If we set autocommit to true then we get an OOM on indexing so that is not an option. Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end. mvg, JAsper
DIH idle in transaction forever
Hi all, It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work but then it goes idle again. datasource definition: dataSource name=df-stream-store-ds jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource autoCommit=false batchSize=1 / We have a datasource defined in the jndi: no-tx-datasource jndi-nameext_solr_datafeeds_dba/jndi-name security-domainext_solr_datafeeds_dba_realm/security-domain connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url min-pool-size0/min-pool-size max-pool-size5/max-pool-size transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation driver-classorg.postgresql.Driver/driver-class blocking-timeout-millis3/blocking-timeout-millis idle-timeout-minutes5/idle-timeout-minutes new-connection-sqlSELECT 1/new-connection-sql check-valid-connection-sqlSELECT 1/check-valid-connection-sql /no-tx-datasource If we set autocommit to true then we get an OOM on indexing so that is not an option. Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end. mvg, JAsper
RE: DIH idle in transaction forever
Try readOnly=true in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource) Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal. This might be better than 1. There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects. I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false. Not sure abour PostgreSql. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jasper Floor [mailto:jasper.fl...@m4n.nl] Sent: Thursday, June 14, 2012 8:21 AM To: solr-user@lucene.apache.org Subject: DIH idle in transaction forever Hi all, It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work but then it goes idle again. datasource definition: dataSource name=df-stream-store-ds jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource autoCommit=false batchSize=1 / We have a datasource defined in the jndi: no-tx-datasource jndi-nameext_solr_datafeeds_dba/jndi-name security-domainext_solr_datafeeds_dba_realm/security-domain connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url min-pool-size0/min-pool-size max-pool-size5/max-pool-size transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation driver-classorg.postgresql.Driver/driver-class blocking-timeout-millis3/blocking-timeout-millis idle-timeout-minutes5/idle-timeout-minutes new-connection-sqlSELECT 1/new-connection-sql check-valid-connection-sqlSELECT 1/check-valid-connection-sql /no-tx-datasource If we set autocommit to true then we get an OOM on indexing so that is not an option. Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end. mvg, JAsper
Re: DIH idle in transaction forever
Actually, the readOnly=true makes things worse. What it does (among other things) is: c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED); which leads to: Caused by: org.postgresql.util.PSQLException: Cannot change transaction isolation level in the middle of a transaction. because the connection is idle in transaction. I found this issue: https://issues.apache.org/jira/browse/SOLR-2045 Patching DIH with the code they suggest seems to work. mvg, Jasper On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote: Try readOnly=true in the dataSource configuration. This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource) Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal. This might be better than 1. There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects. I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false. Not sure abour PostgreSql. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Jasper Floor [mailto:jasper.fl...@m4n.nl] Sent: Thursday, June 14, 2012 8:21 AM To: solr-user@lucene.apache.org Subject: DIH idle in transaction forever Hi all, It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work but then it goes idle again. datasource definition: dataSource name=df-stream-store-ds jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource autoCommit=false batchSize=1 / We have a datasource defined in the jndi: no-tx-datasource jndi-nameext_solr_datafeeds_dba/jndi-name security-domainext_solr_datafeeds_dba_realm/security-domain connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url min-pool-size0/min-pool-size max-pool-size5/max-pool-size transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation driver-classorg.postgresql.Driver/driver-class blocking-timeout-millis3/blocking-timeout-millis idle-timeout-minutes5/idle-timeout-minutes new-connection-sqlSELECT 1/new-connection-sql check-valid-connection-sqlSELECT 1/check-valid-connection-sql /no-tx-datasource If we set autocommit to true then we get an OOM on indexing so that is not an option. Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end. mvg, JAsper