Re: DIH idle in transaction forever

2012-06-15 Thread Jasper Floor
Btw, I removed the batchSize but performance is better with
batchSize=1. I haven't done further testing to see what the best
setting is, but the difference between setting it at 1 and not
setting it is almost double the indexing time (~20 minutes vs ~37
minutes)

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
 Actually, the readOnly=true makes things worse.
 What it does (among other things) is:
            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

 which leads to:
 Caused by: org.postgresql.util.PSQLException: Cannot change
 transaction isolation level in the middle of a transaction.

 because the connection is idle in transaction.

 I found this issue:
 https://issues.apache.org/jira/browse/SOLR-2045

 Patching DIH with the code they suggest seems to work.

 mvg,
 Jasper

 On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com 
 wrote:
 Try readOnly=true in the dataSource configuration.  This causes several 
 defaults to get set in the JDBC connection, and often will solve problems 
 like this. (see 
 http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
 Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
 optimal.  This might be better than 1.

 There is also an issue in that it doesn't explicitly close the resultset but 
 relies on closing the connection to implicily close the child objects.  I 
 know when I tried using DIH with Derby a while back this had at the least 
 caused some log warnings, and it wouldn't work at all without 
 readOnly=false.  Not sure abour PostgreSql.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
 Sent: Thursday, June 14, 2012 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: DIH idle in transaction forever

 Hi all,

 It seems that DIH always holds two connections open to the database.
 One of them is almost always 'idle in transaction'. It may sometimes
 seem to do a little work but then it goes idle again.


 datasource definition:
        dataSource name=df-stream-store-ds
 jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource
 autoCommit=false batchSize=1 /

 We have a datasource defined in the jndi:
        no-tx-datasource
                jndi-nameext_solr_datafeeds_dba/jndi-name
                
 security-domainext_solr_datafeeds_dba_realm/security-domain
                
 connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url
                min-pool-size0/min-pool-size
                max-pool-size5/max-pool-size
                
 transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation
                driver-classorg.postgresql.Driver/driver-class
                blocking-timeout-millis3/blocking-timeout-millis
                idle-timeout-minutes5/idle-timeout-minutes
                new-connection-sqlSELECT 1/new-connection-sql
                check-valid-connection-sqlSELECT 
 1/check-valid-connection-sql
        /no-tx-datasource


 If we set autocommit to true then we get an OOM on indexing so that is
 not an option.

 Does anyone have any idea why this happens? I would guess that DIH
 doesn't close the connection, but reading the code I can't be sure of
 this. The ResultSet object should close itself once it reaches the
 end.

 mvg,
 JAsper


DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:
dataSource name=df-stream-store-ds
jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource
autoCommit=false batchSize=1 /

We have a datasource defined in the jndi:
no-tx-datasource
jndi-nameext_solr_datafeeds_dba/jndi-name
security-domainext_solr_datafeeds_dba_realm/security-domain

connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url
min-pool-size0/min-pool-size
max-pool-size5/max-pool-size

transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation
driver-classorg.postgresql.Driver/driver-class
blocking-timeout-millis3/blocking-timeout-millis
idle-timeout-minutes5/idle-timeout-minutes
new-connection-sqlSELECT 1/new-connection-sql
check-valid-connection-sqlSELECT 
1/check-valid-connection-sql
/no-tx-datasource


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper


RE: DIH idle in transaction forever

2012-06-14 Thread Dyer, James
Try readOnly=true in the dataSource configuration.  This causes several 
defaults to get set in the JDBC connection, and often will solve problems like 
this. (see 
http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
optimal.  This might be better than 1.

There is also an issue in that it doesn't explicitly close the resultset but 
relies on closing the connection to implicily close the child objects.  I know 
when I tried using DIH with Derby a while back this had at the least caused 
some log warnings, and it wouldn't work at all without readOnly=false.  Not 
sure abour PostgreSql.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jasper Floor [mailto:jasper.fl...@m4n.nl] 
Sent: Thursday, June 14, 2012 8:21 AM
To: solr-user@lucene.apache.org
Subject: DIH idle in transaction forever

Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:
dataSource name=df-stream-store-ds
jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource
autoCommit=false batchSize=1 /

We have a datasource defined in the jndi:
no-tx-datasource
jndi-nameext_solr_datafeeds_dba/jndi-name
security-domainext_solr_datafeeds_dba_realm/security-domain

connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url
min-pool-size0/min-pool-size
max-pool-size5/max-pool-size

transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation
driver-classorg.postgresql.Driver/driver-class
blocking-timeout-millis3/blocking-timeout-millis
idle-timeout-minutes5/idle-timeout-minutes
new-connection-sqlSELECT 1/new-connection-sql
check-valid-connection-sqlSELECT 
1/check-valid-connection-sql
/no-tx-datasource


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper


Re: DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Actually, the readOnly=true makes things worse.
What it does (among other things) is:
c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:
Caused by: org.postgresql.util.PSQLException: Cannot change
transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.

I found this issue:
https://issues.apache.org/jira/browse/SOLR-2045

Patching DIH with the code they suggest seems to work.

mvg,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote:
 Try readOnly=true in the dataSource configuration.  This causes several 
 defaults to get set in the JDBC connection, and often will solve problems 
 like this. (see 
 http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
 Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
 optimal.  This might be better than 1.

 There is also an issue in that it doesn't explicitly close the resultset but 
 relies on closing the connection to implicily close the child objects.  I 
 know when I tried using DIH with Derby a while back this had at the least 
 caused some log warnings, and it wouldn't work at all without readOnly=false. 
  Not sure abour PostgreSql.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
 Sent: Thursday, June 14, 2012 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: DIH idle in transaction forever

 Hi all,

 It seems that DIH always holds two connections open to the database.
 One of them is almost always 'idle in transaction'. It may sometimes
 seem to do a little work but then it goes idle again.


 datasource definition:
        dataSource name=df-stream-store-ds
 jndiName=java:ext_solr_datafeeds_dba type=JdbcDataSource
 autoCommit=false batchSize=1 /

 We have a datasource defined in the jndi:
        no-tx-datasource
                jndi-nameext_solr_datafeeds_dba/jndi-name
                security-domainext_solr_datafeeds_dba_realm/security-domain
                
 connection-urljdbc:postgresql://db1.live.mbuyu.nl/datafeeds/connection-url
                min-pool-size0/min-pool-size
                max-pool-size5/max-pool-size
                
 transaction-isolationTRANSACTION_READ_COMMITTED/transaction-isolation
                driver-classorg.postgresql.Driver/driver-class
                blocking-timeout-millis3/blocking-timeout-millis
                idle-timeout-minutes5/idle-timeout-minutes
                new-connection-sqlSELECT 1/new-connection-sql
                check-valid-connection-sqlSELECT 
 1/check-valid-connection-sql
        /no-tx-datasource


 If we set autocommit to true then we get an OOM on indexing so that is
 not an option.

 Does anyone have any idea why this happens? I would guess that DIH
 doesn't close the connection, but reading the code I can't be sure of
 this. The ResultSet object should close itself once it reaches the
 end.

 mvg,
 JAsper