Re: DIH idle in transaction forever

2012-06-15 Thread Jasper Floor
Btw, I removed the batchSize setting, but performance is better with
batchSize=1. I haven't done further testing to see what the best
setting is, but the difference between setting it to 1 and not
setting it at all is almost double the indexing time (~20 minutes vs.
~37 minutes).
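
For context: DIH's batchSize is essentially the JDBC fetch size (JdbcDataSource
passes it to Statement.setFetchSize). A minimal sketch of the equivalent raw
JDBC, assuming the stock PostgreSQL driver and a hypothetical documents table;
this is not DIH's actual code:

    import java.sql.*;

    public class FetchSizeSketch {
        public static void main(String[] args) throws SQLException {
            // Connection details are placeholders.
            Connection c = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/datafeeds", "user", "secret");
            c.setAutoCommit(false);  // PostgreSQL only honors a fetch size with autocommit off
            Statement st = c.createStatement();
            st.setFetchSize(1);      // roughly what batchSize=1 translates to
            ResultSet rs = st.executeQuery("SELECT id FROM documents");
            while (rs.next()) {
                // hand each row to the indexer ...
            }
            rs.close();
            st.close();
            c.commit();              // end the transaction so it is not left idle
            c.close();
        }
    }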

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
 Actually, setting readOnly=true makes things worse.
 What it does (among other things) is:
            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

 which leads to:
 Caused by: org.postgresql.util.PSQLException: Cannot change
 transaction isolation level in the middle of a transaction.

 because the connection is idle in transaction.

 I found this issue:
 https://issues.apache.org/jira/browse/SOLR-2045

 Patching DIH with the code they suggest seems to work.

 mvg,
 Jasper

 On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com 
 wrote:
 Try readOnly=true in the dataSource configuration.  This causes several 
 defaults to get set in the JDBC connection, and often will solve problems 
 like this. (see 
 http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
 Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
 optimal.  This might be better than 1.

 There is also an issue in that it doesn't explicitly close the ResultSet but
 relies on closing the connection to implicitly close the child objects.  I
 know when I tried using DIH with Derby a while back this had at the least
 caused some log warnings, and it wouldn't work at all without
 readOnly=false.  Not sure about PostgreSQL.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -----Original Message-----
 From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
 Sent: Thursday, June 14, 2012 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: DIH idle in transaction forever

 Hi all,

 It seems that DIH always holds two connections open to the database.
 One of them is almost always 'idle in transaction'. It may sometimes
 seem to do a little work but then it goes idle again.


 datasource definition:

        <dataSource name="df-stream-store-ds"
                    jndiName="java:ext_solr_datafeeds_dba"
                    type="JdbcDataSource"
                    autoCommit="false"
                    batchSize="1" />

 We have a datasource defined in the jndi:

        <no-tx-datasource>
                <jndi-name>ext_solr_datafeeds_dba</jndi-name>
                <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
                <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
                <min-pool-size>0</min-pool-size>
                <max-pool-size>5</max-pool-size>
                <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
                <driver-class>org.postgresql.Driver</driver-class>
                <blocking-timeout-millis>3</blocking-timeout-millis>
                <idle-timeout-minutes>5</idle-timeout-minutes>
                <new-connection-sql>SELECT 1</new-connection-sql>
                <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
        </no-tx-datasource>


 If we set autocommit to true then we get an OOM on indexing so that is
 not an option.

 Does anyone have any idea why this happens? I would guess that DIH
 doesn't close the connection, but reading the code I can't be sure of
 this. The ResultSet object should close itself once it reaches the
 end.

 mvg,
 Jasper


DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:

    <dataSource name="df-stream-store-ds"
                jndiName="java:ext_solr_datafeeds_dba"
                type="JdbcDataSource"
                autoCommit="false"
                batchSize="1" />

We have a datasource defined in the jndi:

    <no-tx-datasource>
        <jndi-name>ext_solr_datafeeds_dba</jndi-name>
        <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
        <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
        <min-pool-size>0</min-pool-size>
        <max-pool-size>5</max-pool-size>
        <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
        <driver-class>org.postgresql.Driver</driver-class>
        <blocking-timeout-millis>3</blocking-timeout-millis>
        <idle-timeout-minutes>5</idle-timeout-minutes>
        <new-connection-sql>SELECT 1</new-connection-sql>
        <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
    </no-tx-datasource>


If we set autocommit to true then we get an OOM on indexing so that is
not an option.
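
A note on why autocommit matters here: the PostgreSQL JDBC driver materializes
an entire ResultSet in the JVM heap unless autocommit is off, in which case a
non-zero fetch size makes it stream rows through a server-side cursor. A
minimal sketch of the streaming mode (connection details and the table name
are placeholders, not the actual schema):

    import java.sql.*;

    public class StreamingSketch {
        public static void main(String[] args) throws SQLException {
            Connection c = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/datafeeds", "user", "secret");
            // With autoCommit=true the driver would buffer the whole result
            // set in memory, which is the OOM on a multi-million-row import.
            c.setAutoCommit(false);
            Statement st = c.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            st.setFetchSize(1000);  // hypothetical rows per round trip
            ResultSet rs = st.executeQuery("SELECT * FROM documents");
            while (rs.next()) {
                // feed the row to the indexer ...
            }
            rs.close();
            st.close();
            c.commit();             // closes the cursor's transaction
            c.close();
        }
    }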

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
Jasper


Re: DIH idle in transaction forever

2012-06-14 Thread Jasper Floor
Actually, setting readOnly=true makes things worse.
What it does (among other things) is:
c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:
Caused by: org.postgresql.util.PSQLException: Cannot change
transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.
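
A minimal sketch of that failure mode against a plain PostgreSQL JDBC
connection (URL and credentials are placeholders): the driver refuses to
change the isolation level once a transaction is underway, so the change has
to happen before the first statement, or after a commit or rollback.

    import java.sql.*;

    public class IsolationSketch {
        public static void main(String[] args) throws SQLException {
            Connection c = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/datafeeds", "user", "secret");
            c.setAutoCommit(false);
            c.createStatement().execute("SELECT 1");
            // A transaction is now open; the session shows up as
            // "idle in transaction" once the statement returns.

            // This is the call that fails mid-transaction:
            //   c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
            //   -> PSQLException: Cannot change transaction isolation level
            //      in the middle of a transaction.

            c.rollback();  // end the open transaction first ...
            c.setTransactionIsolation(
                    Connection.TRANSACTION_READ_UNCOMMITTED);  // ... then it works
            c.close();
        }
    }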

I found this issue:
https://issues.apache.org/jira/browse/SOLR-2045

Patching DIH with the code they suggest seems to work.
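
The gist of the suggested fix, as I read it, is explicit cleanup along these
lines: commit (or roll back) and close the connection once the import is done
with it, instead of leaving the transaction open. A sketch only; the method
name and structure here are illustrative, not the actual DIH internals:

    import java.sql.Connection;
    import java.sql.SQLException;

    final class ConnectionCleanup {
        static void commitAndClose(Connection conn) {
            if (conn == null) return;
            try {
                conn.commit();  // end the open read transaction
            } catch (SQLException e) {
                // log and fall through; we still want the close
            }
            try {
                conn.close();   // releases the "idle in transaction" session
            } catch (SQLException e) {
                // log
            }
        }
    }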

mvg,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote:
 Try readOnly=true in the dataSource configuration.  This causes several 
 defaults to get set in the JDBC connection, and often will solve problems 
 like this. (see 
 http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  
 Also, try a batch size of 0 to let your jdbc driver pick what it thinks is 
 optimal.  This might be better than 1.

 There is also an issue in that it doesn't explicitly close the ResultSet but
 relies on closing the connection to implicitly close the child objects.  I
 know when I tried using DIH with Derby a while back this had at the least
 caused some log warnings, and it wouldn't work at all without readOnly=false.
 Not sure about PostgreSQL.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -----Original Message-----
 From: Jasper Floor [mailto:jasper.fl...@m4n.nl]
 Sent: Thursday, June 14, 2012 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: DIH idle in transaction forever

 Hi all,

 It seems that DIH always holds two connections open to the database.
 One of them is almost always 'idle in transaction'. It may sometimes
 seem to do a little work but then it goes idle again.


 datasource definition:

        <dataSource name="df-stream-store-ds"
                    jndiName="java:ext_solr_datafeeds_dba"
                    type="JdbcDataSource"
                    autoCommit="false"
                    batchSize="1" />

 We have a datasource defined in the jndi:

        <no-tx-datasource>
                <jndi-name>ext_solr_datafeeds_dba</jndi-name>
                <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
                <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
                <min-pool-size>0</min-pool-size>
                <max-pool-size>5</max-pool-size>
                <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
                <driver-class>org.postgresql.Driver</driver-class>
                <blocking-timeout-millis>3</blocking-timeout-millis>
                <idle-timeout-minutes>5</idle-timeout-minutes>
                <new-connection-sql>SELECT 1</new-connection-sql>
                <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
        </no-tx-datasource>


 If we set autocommit to true then we get an OOM on indexing so that is
 not an option.

 Does anyone have any idea why this happens? I would guess that DIH
 doesn't close the connection, but reading the code I can't be sure of
 this. The ResultSet object should close itself once it reaches the
 end.

 mvg,
 Jasper


Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
The slave index does indeed grow over a period of time regardless of
restarts. We do run on 1.4, however. We will be updating to 3.6 very
soon, so I will see how that works out. Actually, we should be
able to see this on our staging platform.

thanks everyone.

mvg,
Jasper

On Mon, May 14, 2012 at 4:40 PM, Bill Bell billnb...@gmail.com wrote:
 This is a known issue in 1.4, especially on Windows. Some of it was resolved
 in 3.x.

 Bill Bell
 Sent from mobile


 On May 14, 2012, at 5:54 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, replication will require up to twice the space of the
 index _temporarily_, just checking if that's what you're seeing.
 But that should go away reasonably soon. Out of curiosity, what
 happens if you restart your server, do the extra files go away?

 But it sounds like your index is growing over a longer period of time
 than just a single replication, is that true?

 Best
 Erick

 On Fri, May 11, 2012 at 6:03 AM, Jasper Floor jasper.fl...@m4n.nl wrote:
 Hi,

 On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Jasper,

 Sorry, I should've added more technical info without being prompted.

 Solr does handle that for you.  Some more stuff to share:

 * Solr version?

 1.4

 * JVM version?
 1.7 update 2

 * OS?
 Debian (2.6.32-5-xen-amd64)

 * Java replication?
 yes

 * Errors in Solr logs?
 no

 * deletion policy section in solrconfig.xml?
 Missing, I would say, but I don't see this on the replication wiki page.

 This is what we have configured for replication:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
         <str name="masterUrl">${solr.master.url}/df-stream-store/replication</str>
         <str name="pollInterval">00:20:00</str>
         <str name="compression">internal</str>
         <str name="httpConnTimeout">5000</str>
         <str name="httpReadTimeout">1</str>
     </lst>
 </requestHandler>

 We will be updating to 3.6 fairly soon, however. To be honest, from
 what I've read, SolrCloud is what we really want in the future,
 but we will have to be patient for that.

 thanks in advance

 mvg,
 Jasper

 You may also want to look at your Index report in SPM 
 (http://sematext.com/spm) before/during/after replication and share what 
 you see.

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm



 - Original Message -
 From: Jasper Floor jasper.fl...@m4n.nl
 To: solr-user@lucene.apache.org
 Cc:
 Sent: Thursday, May 10, 2012 9:08 AM
 Subject: slave index not cleaned

 Perhaps I am missing the obvious, but our slaves tend to run out of
 disk space. The index sizes grow to multiple times the size of the
 master. So I just toss all the data and trigger a replication.
 However, can't Solr handle this for me?

 I'm sorry if I've missed a simple setting which does this for me, but
 if it's there then I have missed it.

 mvg
 Jasper



Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
Btw, confirmed that this doesn't happen on our development stage with 3.6.

On Wed, May 16, 2012 at 3:59 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
 The slave index does indeed grow over a period of time regardless of
 restarts. We do run on 1.4, however. We will be updating to 3.6 very
 soon, so I will see how that works out. Actually, we should be
 able to see this on our staging platform.

 thanks everyone.

 mvg,
 Jasper

 On Mon, May 14, 2012 at 4:40 PM, Bill Bell billnb...@gmail.com wrote:
 This is a known issue in 1.4, especially on Windows. Some of it was resolved
 in 3.x.

 Bill Bell
 Sent from mobile


 On May 14, 2012, at 5:54 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, replication will require up to twice the space of the
 index _temporarily_, just checking if that's what you're seeing.
 But that should go away reasonably soon. Out of curiosity, what
 happens if you restart your server, do the extra files go away?

 But it sounds like your index is growing over a longer period of time
 than just a single replication, is that true?

 Best
 Erick

 On Fri, May 11, 2012 at 6:03 AM, Jasper Floor jasper.fl...@m4n.nl wrote:
 Hi,

 On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 Hi Jasper,

 Sorry, I should've added more technical info without being prompted.

 Solr does handle that for you.  Some more stuff to share:

 * Solr version?

 1.4

 * JVM version?
 1.7 update 2

 * OS?
 Debian (2.6.32-5-xen-amd64)

 * Java replication?
 yes

 * Errors in Solr logs?
 no

 * deletion policy section in solrconfig.xml?
 Missing, I would say, but I don't see this on the replication wiki page.

 This is what we have configured for replication:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
         <str name="masterUrl">${solr.master.url}/df-stream-store/replication</str>
         <str name="pollInterval">00:20:00</str>
         <str name="compression">internal</str>
         <str name="httpConnTimeout">5000</str>
         <str name="httpReadTimeout">1</str>
     </lst>
 </requestHandler>

 We will be updating to 3.6 fairly soon, however. To be honest, from
 what I've read, SolrCloud is what we really want in the future,
 but we will have to be patient for that.

 thanks in advance

 mvg,
 Jasper

 You may also want to look at your Index report in SPM 
 (http://sematext.com/spm) before/during/after replication and share what 
 you see.

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm



 - Original Message -
 From: Jasper Floor jasper.fl...@m4n.nl
 To: solr-user@lucene.apache.org
 Cc:
 Sent: Thursday, May 10, 2012 9:08 AM
 Subject: slave index not cleaned

 Perhaps I am missing the obvious, but our slaves tend to run out of
 disk space. The index sizes grow to multiple times the size of the
 master. So I just toss all the data and trigger a replication.
 However, can't Solr handle this for me?

 I'm sorry if I've missed a simple setting which does this for me, but
 if it's there then I have missed it.

 mvg
 Jasper



Re: slave index not cleaned

2012-05-11 Thread Jasper Floor
Hi,

On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi Jasper,

Sorry, I should've added more technical info without being prompted.

 Solr does handle that for you.  Some more stuff to share:

 * Solr version?

1.4

 * JVM version?
1.7 update 2

 * OS?
Debian (2.6.32-5-xen-amd64)

 * Java replication?
yes

 * Errors in Solr logs?
no

 * deletion policy section in solrconfig.xml?
Missing, I would say, but I don't see this on the replication wiki page.
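
For reference, a deletion policy section of that era sits inside the
mainIndex block of solrconfig.xml and looks roughly like the sketch below.
The values shown are the stock defaults, not taken from this setup:

    <deletionPolicy class="solr.SolrDeletionPolicy">
        <!-- keep only the most recent commit point -->
        <str name="maxCommitsToKeep">1</str>
        <!-- and no extra optimized commit points -->
        <str name="maxOptimizedCommitsToKeep">0</str>
    </deletionPolicy>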

This is what we have configured for replication:

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
        <str name="masterUrl">${solr.master.url}/df-stream-store/replication</str>
        <str name="pollInterval">00:20:00</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">1</str>
    </lst>
</requestHandler>

We will be updating to 3.6 fairly soon, however. To be honest, from
what I've read, SolrCloud is what we really want in the future,
but we will have to be patient for that.

thanks in advance

mvg,
Jasper

 You may also want to look at your Index report in SPM 
 (http://sematext.com/spm) before/during/after replication and share what you 
 see.

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm



 - Original Message -
 From: Jasper Floor jasper.fl...@m4n.nl
 To: solr-user@lucene.apache.org
 Cc:
 Sent: Thursday, May 10, 2012 9:08 AM
 Subject: slave index not cleaned

 Perhaps I am missing the obvious, but our slaves tend to run out of
 disk space. The index sizes grow to multiple times the size of the
 master. So I just toss all the data and trigger a replication.
 However, can't Solr handle this for me?

 I'm sorry if I've missed a simple setting which does this for me, but
 if it's there then I have missed it.

 mvg
 Jasper



Re: Suddenly OOM

2012-05-11 Thread Jasper Floor
Our ramBuffer is the default. The Xmx is 75% of the available memory
on the machine, which is 4GB. We've tried increasing it to 85% and even
gave the machine 10GB of memory, so we more than doubled the memory.
The amount of data didn't double, but where the memory used to be
enough, it now seems to never be enough.

mvg,
Jasper

On Thu, May 10, 2012 at 6:03 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Jasper,

 The simple answer is to increase -Xmx :)
 What is your ramBufferSizeMB (solrconfig.xml) set to?  Default is 32 (MB).

 That autocommit you mentioned is a DB commit?  Not a Solr one, right?  If so,
 why is commit needed when you *read* data from the DB?

 Otis
 
 Performance Monitoring for Solr / ElasticSearch / HBase - 
 http://sematext.com/spm



 - Original Message -
 From: Jasper Floor jasper.fl...@m4n.nl
 To: solr-user@lucene.apache.org
 Cc:
 Sent: Thursday, May 10, 2012 9:06 AM
 Subject: Suddenly OOM

 Hi all,

 we've been running Solr 1.4 for about a year with no real problems. As
 of Monday it became impossible to do a full import on our master
 because of an OOM. Now what I think is strange is that even after we
 more than doubled the available memory there would still always be an
 OOM.  We seem to have reached a magic number of documents beyond which
 Solr requires infinite memory (or at least more than 2.5x what it
 previously needed, which is the same as infinite unless we invest in
 more resources).

 We have solved the immediate problem by changing autocommit=false,
 holdability=CLOSE_CURSORS_AT_COMMIT, batchSize=1. Now, holdability
 in this case I don't think does very much, as I believe this is the
 default behavior. BatchSize certainly has a direct effect on
 performance (about a 3x difference in indexing time between the batch
 sizes we tried). The autocommit is a problem for us, however. This
 leaves transactions active in the db which may block other processes.

 We have about 5.1 million documents in the index, which is about 2.2
 gigabytes.

 A full index is a rare operation with us but when we need it we also
 need it to work (thank you captain obvious).

 With the settings above, a full index takes 15 minutes. We anticipate
 we will be handling at least 10x the amount of data in the future. I
 actually hope to have Solr 4 by then, but I can't sell a product here
 which isn't finalized yet.


 Thanks for any insight you can give.

 mvg,
 Jasper



Suddenly OOM

2012-05-10 Thread Jasper Floor
Hi all,

we've been running Solr 1.4 for about a year with no real problems. As
of Monday it became impossible to do a full import on our master
because of an OOM. Now what I think is strange is that even after we
more than doubled the available memory there would still always be an
OOM.  We seem to have reached a magic number of documents beyond which
Solr requires infinite memory (or at least more than 2.5x what it
previously needed, which is the same as infinite unless we invest in
more resources).

We have solved the immediate problem by changing autocommit=false,
holdability=CLOSE_CURSORS_AT_COMMIT, batchSize=1. Now, holdability
in this case I don't think does very much, as I believe this is the
default behavior. BatchSize certainly has a direct effect on
performance (about a 3x difference in indexing time between the batch
sizes we tried). The autocommit is a problem for us, however. This
leaves transactions active in the db which may block other processes.
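
Spelled out, that combination would sit on the DIH dataSource element roughly
as below. This is a sketch that assumes holdability and autoCommit are passed
through as dataSource attributes; the name and JNDI binding are borrowed from
the config shown in the DIH thread above:

    <dataSource name="df-stream-store-ds"
                type="JdbcDataSource"
                jndiName="java:ext_solr_datafeeds_dba"
                autoCommit="false"
                holdability="CLOSE_CURSORS_AT_COMMIT"
                batchSize="1" />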

We have about 5.1 million documents in the index, which is about 2.2 gigabytes.

A full index is a rare operation with us but when we need it we also
need it to work (thank you captain obvious).

With the settings above, a full index takes 15 minutes. We anticipate
we will be handling at least 10x the amount of data in the future. I
actually hope to have Solr 4 by then, but I can't sell a product here
which isn't finalized yet.


Thanks for any insight you can give.

mvg,
Jasper


slave index not cleaned

2012-05-10 Thread Jasper Floor
Perhaps I am missing the obvious, but our slaves tend to run out of
disk space. The index sizes grow to multiple times the size of the
master. So I just toss all the data and trigger a replication.
However, can't Solr handle this for me?

I'm sorry if I've missed a simple setting which does this for me, but
if it's there then I have missed it.

mvg
Jasper


Re: Solr for routing a webapp

2012-05-03 Thread Jasper Floor
Why not pass the parameters using ?parameter1=value1&parameter2=value2 ?


mvg,
Jasper

On Thu, Apr 26, 2012 at 9:03 PM, Paul Libbrecht p...@hoplahup.net wrote:
 Or write your own query component: map /solr/* in the web.xml, expose
 the request through a thread-local set by a filter, and read it to set
 the appropriate query parameters...

 Performance-wise, this seems quite reasonable, I think.
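
 A minimal sketch of that filter idea (the class name and web.xml mapping are
 made up, and error handling is omitted):

     import java.io.IOException;
     import javax.servlet.*;
     import javax.servlet.http.HttpServletRequest;

     // Exposes the current request to downstream components via a thread-local.
     public class RequestExposingFilter implements Filter {
         private static final ThreadLocal<HttpServletRequest> CURRENT =
                 new ThreadLocal<HttpServletRequest>();

         public static HttpServletRequest current() { return CURRENT.get(); }

         public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                 throws IOException, ServletException {
             CURRENT.set((HttpServletRequest) req);
             try {
                 chain.doFilter(req, res);  // a custom query component can call current()
             } finally {
                 CURRENT.remove();          // don't leak requests across pooled threads
             }
         }

         public void init(FilterConfig config) {}
         public void destroy() {}
     }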

 paul


 On 26 Apr 2012 at 16:58, Paul Libbrecht wrote:

 Have you tried using mod_rewrite for this?

 paul


 On 26 Apr 2012 at 15:16, Björn Zapadlo wrote:

 Hello,

 I'm thinking about using a Solr index for routing a webapp.

 I have pregenerated base URLs in my index, e.g.
 /foo/bar1
 /foo/bar2
 /foo/bar3
 /foo/bar4
 /bar/foo1
 /bar/foo2
 /bar/foo3

 I try to find a way to match /foo/bar3/parameter1/value1/parameter2/value2
 without knowing that parameter and value are not part of the base URL. In
 fact, I need the best hit from the beginning.
 Is that possible, and are there any performance issues?

 I hope my problem is understandable!

 Thanks in advance and best regards,
 Bjoern