Re: DIH idle in transaction forever
Btw, I removed the batchSize setting, but performance is better with batchSize=1. I haven't done further testing to find the best value, but the difference between setting it to 1 and not setting it at all is almost double the indexing time (~20 minutes vs. ~37 minutes).

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
> Actually, readOnly=true makes things worse.
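DIH's batchSize is handed to the JDBC driver as Statement.setFetchSize(), so its effect can be measured outside Solr with a minimal timing harness like the sketch below (assuming a reachable PostgreSQL database; the URL, credentials and query are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class FetchSizeTiming {
        // Times a full scan at several fetch sizes. With the PostgreSQL
        // driver, autoCommit must be off for the fetch size to take effect
        // at all; otherwise the whole result is buffered client-side.
        public static void main(String[] args) throws Exception {
            for (int fetchSize : new int[] {1, 500, 10000}) {
                try (Connection c = DriverManager.getConnection(
                        "jdbc:postgresql://localhost/datafeeds", "user", "secret")) {
                    c.setAutoCommit(false);
                    long start = System.nanoTime();
                    long rows = 0;
                    try (Statement st = c.createStatement()) {
                        st.setFetchSize(fetchSize);
                        try (ResultSet rs = st.executeQuery("SELECT id FROM documents")) {
                            while (rs.next()) rows++;
                        }
                    }
                    c.rollback(); // end the transaction explicitly
                    System.out.printf("fetchSize=%d: %d rows in %.1f s%n",
                            fetchSize, rows, (System.nanoTime() - start) / 1e9);
                }
            }
        }
    }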
DIH idle in transaction forever
Hi all,

It seems that DIH always holds two connections open to the database. One of them is almost always 'idle in transaction'. It may sometimes seem to do a little work, but then it goes idle again.

The datasource definition:

    <dataSource name="df-stream-store-ds"
                jndiName="java:ext_solr_datafeeds_dba"
                type="JdbcDataSource"
                autoCommit="false"
                batchSize="1" />

We have a datasource defined in JNDI:

    <no-tx-datasource>
      <jndi-name>ext_solr_datafeeds_dba</jndi-name>
      <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
      <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
      <min-pool-size>0</min-pool-size>
      <max-pool-size>5</max-pool-size>
      <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
      <driver-class>org.postgresql.Driver</driver-class>
      <blocking-timeout-millis>3</blocking-timeout-millis>
      <idle-timeout-minutes>5</idle-timeout-minutes>
      <new-connection-sql>SELECT 1</new-connection-sql>
      <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
    </no-tx-datasource>

If we set autoCommit to true we get an OOM while indexing, so that is not an option.

Does anyone have any idea why this happens? I would guess that DIH doesn't close the connection, but reading the code I can't be sure of this. The ResultSet object should close itself once it reaches the end.

mvg,
Jasper
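To see the stuck sessions from the database side, a minimal check against pg_stat_activity; this assumes the pre-9.2 column layout (procpid/current_query) that was current at the time (newer servers use pid/state/query instead), and the connection details are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IdleInTransactionCheck {
        // Lists backends whose session is stuck "idle in transaction".
        public static void main(String[] args) throws Exception {
            try (Connection c = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/datafeeds", "user", "secret");
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT procpid, current_query FROM pg_stat_activity " +
                     "WHERE current_query = '<IDLE> in transaction'")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + ": " + rs.getString(2));
                }
            }
        }
    }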
Re: DIH idle in transaction forever
Actually, readOnly=true makes things worse. What it does (among other things) is:

    c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:

    Caused by: org.postgresql.util.PSQLException: Cannot change transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.

I found this issue: https://issues.apache.org/jira/browse/SOLR-2045
Patching DIH with the code suggested there seems to work.

mvg,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James james.d...@ingrambook.com wrote:
> Try readOnly=true in the dataSource configuration. This causes several
> defaults to get set in the JDBC connection, and often will solve problems
> like this. (See
> http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)
>
> Also, try a batch size of 0 to let your JDBC driver pick what it thinks
> is optimal. This might be better than 1.
>
> There is also an issue in that DIH doesn't explicitly close the ResultSet
> but relies on closing the connection to implicitly close the child
> objects. When I tried using DIH with Derby a while back, this at the
> least caused some log warnings, and it wouldn't work at all without
> readOnly=false. Not sure about PostgreSQL.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
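The explicit clean-up James describes is easy to express in plain JDBC. A minimal sketch, closing each resource deterministically and ending the open transaction first so PostgreSQL does not keep the session 'idle in transaction' (this paraphrases the idea behind the SOLR-2045 patch, not its exact code):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public final class JdbcCleanup {
        // Close ResultSet and Statement explicitly instead of relying on
        // Connection.close() to tear down child objects, and finish the
        // open transaction before closing, which ends the
        // "idle in transaction" state on the PostgreSQL side.
        static void closeQuietly(ResultSet rs, Statement stmt, Connection c) {
            try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
            try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
            try {
                if (c != null) {
                    if (!c.getAutoCommit()) {
                        c.rollback(); // a read-only import has nothing to commit
                    }
                    c.close();
                }
            } catch (SQLException ignored) { }
        }
    }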
Re: slave index not cleaned
The slave index does indeed grow over a period of time, regardless of restarts. We do run on 1.4, however. We will be updating to 3.6 very soon, so I will see how that works out; we should be able to see this on our staging platform.

Thanks everyone.

mvg,
Jasper

On Mon, May 14, 2012 at 4:40 PM, Bill Bell billnb...@gmail.com wrote:
> This is a known issue in 1.4, especially on Windows. Some of it was
> resolved in 3.x.
>
> Bill Bell
> Sent from mobile
>
> On May 14, 2012, at 5:54 AM, Erick Erickson erickerick...@gmail.com wrote:
>> Hmmm, replication will require up to twice the space of the index
>> _temporarily_, just checking if that's what you're seeing. But that
>> should go away reasonably soon. Out of curiosity, what happens if you
>> restart your server, do the extra files go away? But it sounds like
>> your index is growing over a longer period of time than just a single
>> replication, is that true?
>>
>> Best,
>> Erick
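For anyone who wants to see what is accumulating before upgrading, a small sketch that lists the index directories under a slave's data dir with their sizes, assuming (per the known 1.4 behavior) that stale index.<timestamp> directories are left behind; the path is a placeholder:

    import java.io.File;
    import java.util.Arrays;

    public class StaleIndexDirs {
        // Lists index* directories under a slave's data dir, newest first,
        // so you can see which old ones replication never cleaned up.
        public static void main(String[] args) {
            File dataDir = new File("/var/solr/df-stream-store/data");
            File[] dirs = dataDir.listFiles(
                    f -> f.isDirectory() && f.getName().startsWith("index"));
            if (dirs == null) return;
            Arrays.sort(dirs, (a, b) -> Long.compare(b.lastModified(), a.lastModified()));
            for (File d : dirs) {
                long bytes = 0;
                File[] files = d.listFiles();
                if (files != null) for (File f : files) bytes += f.length();
                System.out.printf("%s  %.1f MB  (last modified %tF)%n",
                        d.getName(), bytes / 1e6, d.lastModified());
            }
        }
    }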
Re: slave index not cleaned
Btw, confirmed that this doesn't happen on our development stage with 3.6.

On Wed, May 16, 2012 at 3:59 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
> The slave index does indeed grow over a period of time, regardless of
> restarts.
Re: slave index not cleaned
Hi,

Sorry, I should've added more technical info without being prompted.

On Thu, May 10, 2012 at 5:59 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Hi Jasper,
>
> Solr does handle that for you. Some more stuff to share:
>
> * Solr version?

1.4

> * JVM version?

1.7 update 2

> * OS?

Debian (2.6.32-5-xen-amd64)

> * Java replication?

Yes.

> * Errors in Solr logs?

No.

> * deletion policy section in solrconfig.xml?

Missing, I would say, but I don't see this on the replication wiki page. This is what we have configured for replication:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">${solr.master.url}/df-stream-store/replication</str>
        <str name="pollInterval">00:20:00</str>
        <str name="compression">internal</str>
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">1</str>
      </lst>
    </requestHandler>

We will be updating to 3.6 fairly soon, however. To be honest, from what I've read, SolrCloud is what we really want in the future, but we will have to be patient for that.

Thanks in advance.

mvg,
Jasper

> You may also want to look at your Index report in SPM
> (http://sematext.com/spm) before/during/after replication and share
> what you see.
>
> Otis
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
Re: Suddenly OOM
Our ramBufferSizeMB is the default. The Xmx is 75% of the available memory on the machine, which is 4GB. We've tried increasing it to 85% and even gave the machine 10GB of memory, so we more than doubled the memory. The amount of data wasn't double, but where it used to be enough, now it seems to never be enough.

mvg,
Jasper

On Thu, May 10, 2012 at 6:03 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> Jasper,
>
> The simple answer is to increase -Xmx :)
>
> What is your ramBufferSizeMB (solrconfig.xml) set to? Default is 32 (MB).
>
> That autocommit you mentioned is a DB commit, not a Solr one, right? If
> so, why is a commit needed when you *read* data from the DB?
>
> Otis
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
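For reference, ramBufferSizeMB controls the Lucene indexing buffer that sits underneath Solr: documents accumulate in memory and are flushed to disk when the buffer fills. A minimal sketch against the current Lucene API (Solr 1.4 exposes the same knob as ramBufferSizeMB in solrconfig.xml; the index path here is a placeholder):

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class RamBufferExample {
        public static void main(String[] args) throws Exception {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            cfg.setRAMBufferSizeMB(32.0); // Solr's default of 32 MB
            try (IndexWriter w = new IndexWriter(
                    FSDirectory.open(Paths.get("/tmp/index")), cfg)) {
                // documents added here buffer in RAM up to the limit above
                w.commit();
            }
        }
    }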
Suddenly OOM
Hi all,

We've been running Solr 1.4 for about a year with no real problems. As of Monday it became impossible to do a full import on our master because of an OOM. What I think is strange is that even after we more than doubled the available memory, there would still always be an OOM. We seem to have reached a magic number of documents beyond which Solr requires infinite memory (or at least more than 2.5x what it previously needed, which is the same as infinite unless we invest in more resources).

We have solved the immediate problem by setting autocommit=false, holdability=CLOSE_CURSORS_AT_COMMIT, and batchSize=1. The holdability in this case I don't think does very much, as I believe it is the default behavior. The batchSize certainly has a direct effect on performance (about a 3x time difference between 1 and 1). The autocommit is a problem for us, however: it leaves transactions active in the db, which may block other processes.

We have about 5.1 million documents in the index, which is about 2.2 gigabytes. A full index is a rare operation for us, but when we need it, we also need it to work (thank you, Captain Obvious). With the settings above, a full index takes 15 minutes. We anticipate we will be handling at least 10x this amount of data in the future. I actually hope to have Solr 4 by then, but I can't sell a product that isn't finalized yet here.

Thanks for any insight you can give.

mvg,
Jasper
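The three workaround settings translate directly to plain JDBC calls; a sketch of what they mean at the driver level, not DIH's actual code (URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ImportConnection {
        static Connection open() throws SQLException {
            Connection c = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/datafeeds", "user", "secret");
            // autocommit=false: lets the PostgreSQL driver stream rows with
            // a cursor instead of buffering the whole ResultSet (the OOM
            // above), at the cost of keeping a transaction open for the
            // duration of the import.
            c.setAutoCommit(false);
            // holdability: close server-side cursors when the transaction
            // ends (as noted above, likely the default already).
            c.setHoldability(ResultSet.CLOSE_CURSORS_AT_COMMIT);
            // batchSize corresponds to Statement.setFetchSize(...), set per
            // statement rather than per connection.
            return c;
        }
    }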
slave index not cleaned
Perhaps I am missing the obvious, but our slaves tend to run out of disk space. The index sizes grow to multiple times the size of the master, so I just toss all the data and trigger a replication. Can't Solr handle this for me? I'm sorry if I've missed a simple setting that does this, but if it's there, then I have missed it.

mvg,
Jasper
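Until a real fix, the manual toss-and-replicate step can at least be scripted; a sketch that triggers a fetch through the ReplicationHandler's HTTP API (command=fetchindex), with the host and core name as placeholders:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class TriggerReplication {
        // Asks a slave to pull the index from its master right away,
        // instead of waiting for the next poll interval.
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://slave-host:8983/solr/df-stream-store/"
                    + "replication?command=fetchindex");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) System.out.println(line);
            }
        }
    }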
Re: Solr for routing a webapp
Why not pass the parameters using ?parameter1=value1&parameter2=value2 ?

mvg,
Jasper

On Thu, Apr 26, 2012 at 9:03 PM, Paul Libbrecht p...@hoplahup.net wrote:
> Or write your own query component: map /solr/* in the web.xml, expose
> the request by a thread-local through a filter, and read this to set the
> appropriate query parameters. Performance-wise, this seems quite
> reasonable, I think.
>
> Paul
>
> On 26 Apr 2012, at 16:58, Paul Libbrecht wrote:
>> Have you tried using mod_rewrite for this?
>>
>> Paul
>>
>> On 26 Apr 2012, at 15:16, Björn Zapadlo wrote:
>>> Hello,
>>>
>>> I'm thinking about using a Solr index for routing a webapp. I have
>>> pregenerated base URLs in my index, e.g.:
>>>
>>> /foo/bar1
>>> /foo/bar2
>>> /foo/bar3
>>> /foo/bar4
>>> /bar/foo1
>>> /bar/foo2
>>> /bar/foo3
>>>
>>> I am trying to find a way to match
>>> /foo/bar3/parameter1/value1/parameter2/value2 without knowing that
>>> "parameter" and "value" are not part of the base URL. In fact I need
>>> the best hit from the beginning. Is that possible, and are there any
>>> performance issues?
>>>
>>> I hope my problem is understandable!
>>>
>>> Thanks in advance and best regards,
>>> Bjoern
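One way to get the "best hit from the beginning" without special index support is to do it client-side: generate every prefix of the request path, longest first, and look each one up until a known base URL matches. A sketch of just that idea, not tied to any particular Solr API:

    import java.util.ArrayList;
    import java.util.List;

    public class PrefixCandidates {
        // Returns all path prefixes, longest first; the router would try
        // each against the index of base URLs until one matches.
        static List<String> prefixes(String path) {
            List<String> grow = new ArrayList<>();
            StringBuilder sb = new StringBuilder();
            for (String p : path.split("/")) {
                if (p.isEmpty()) continue;
                sb.append('/').append(p);
                grow.add(sb.toString());
            }
            List<String> out = new ArrayList<>();
            for (int i = grow.size() - 1; i >= 0; i--) out.add(grow.get(i));
            return out;
        }

        public static void main(String[] args) {
            System.out.println(prefixes("/foo/bar3/parameter1/value1/parameter2/value2"));
            // -> [/foo/bar3/parameter1/value1/parameter2/value2, ..., /foo]
        }
    }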