Re: DataImportHandler using new connection on each query
: > Noble? Shalin? what's the point of throwing away a connection that's been
: > in use for more than 10 seconds?
:
: Hoss, as others have noted, DIH throws away connections which have been idle
: for more than the timeout value (10 seconds). The JDBC standard way of
: checking for a valid connection is not implemented or incorrectly
: implemented by many drivers. So, either you can execute a query and get an
: exception and try to determine if the exception was a case of an invalid
: connection (which again is sometimes different from driver to driver) or
: take the easy way out and throw away connections idle for more than 10
: seconds, which is what we went for.

Hmmm...

a) at a minimum this seems like it should be a config option -- why punish people using good JDBC drivers?

b) you keep referring to this timeout in relation to connections being *idle* longer than 10 seconds, but unless i'm missing something that's not what it's doing at all.

The only time connLastUsed is assigned is when getConnection() is called -- so even if a connection has only been idle for 1 picosecond, it will still be closed/reopened if the total amount of time it was used before being idle was more than 10 seconds -- that was the scenario described in the first message of this thread...

  second 000: app starts
  second 006: ResultSetIterator constructed on queryA
  second 007: getConnection() called, conn initialized, connLastUsed = 007
  ... conn in use for a while while iterating over results ...
  second 099: done iterating over ResultSetIterator
  second 100: ResultSetIterator constructed on queryB
  second 101: getConnection() called again...

...at second 101, that connection has really only been idle for 2 seconds, but connLastUsed hasn't been updated for 94 seconds, so it forces a new connection for no reason.

If the goal is to track how long the connection has been idle, shouldn't every method in ResultSetIterator update connLastUsed?

-Hoss
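The suggestion above, measuring idleness from the last real use rather than from the last getConnection() call, could look roughly like the sketch below. This is not the DIH source: the class IdleAwareHolder and its methods get(), touch() and createdCount() are hypothetical names invented for illustration, the clock value is passed in explicitly so the timeline above can be replayed, and closing the replaced resource is left out for brevity.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not the DIH implementation: a holder that measures
 * idle time from the last *use* of a resource, not from the last hand-out.
 */
final class IdleAwareHolder<T> {
    private final Supplier<T> factory;  // creates a fresh resource, e.g. a JDBC Connection
    private final long idleTimeoutMs;   // maximum tolerated idle time in milliseconds
    private T resource;
    private long lastUsedMs;
    private int created;                // number of resources the factory has produced

    IdleAwareHolder(Supplier<T> factory, long idleTimeoutMs) {
        this.factory = factory;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Returns the current resource, replacing it only if it has really been idle too long. */
    synchronized T get(long nowMs) {
        if (resource == null || nowMs - lastUsedMs > idleTimeoutMs) {
            // A real version would close the old resource here before replacing it.
            resource = factory.get();
            created++;
        }
        lastUsedMs = nowMs;
        return resource;
    }

    /** Meant to be called by every operation that uses the resource (e.g. each next()). */
    synchronized void touch(long nowMs) {
        lastUsedMs = nowMs;
    }

    synchronized int createdCount() {
        return created;
    }
}
```

Replaying the timeline: get() at second 7, touch() at second 99 when iteration finishes, get() again at second 101. Only 2 seconds of idle time are measured, so the same resource is handed back instead of a new one being created.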
Re: DataImportHandler using new connection on each query
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : I am not sure if current version has this, but DIH used to reload
> : connections after some idle time
> :
> :   if (currTime - connLastUsed > CONN_TIME_OUT) {
> :     synchronized (this) {
> :       Connection tmpConn = factory.call();
> :       closeConnection();
> :       connLastUsed = System.currentTimeMillis();
> :       return conn = tmpConn;
> :     }
> :   }
> :
> : Where CONN_TIME_OUT = 10 seconds
>
> ...oh wow. i saw the CONN_TIME_OUT constant but i thought (foolishly evidently) that CONN was "connect", as in a timeout on creating a connection, not a timeout on how long DIH is willing to use a perfectly good connection.
>
> I honestly can't make heads or tails of why that code would exist.
>
> Noble? Shalin? what's the point of throwing away a connection that's been in use for more than 10 seconds?

Hoss, as others have noted, DIH throws away connections which have been idle for more than the timeout value (10 seconds). The JDBC standard way of checking for a valid connection is not implemented or incorrectly implemented by many drivers. So, either you can execute a query and get an exception and try to determine if the exception was a case of an invalid connection (which again is sometimes different from driver to driver) or take the easy way out and throw away connections idle for more than 10 seconds, which is what we went for.

--
Regards,
Shalin Shekhar Mangar.
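For readers unfamiliar with the validity check mentioned above: in JDBC 4.0 the standard call is Connection.isValid(int timeoutSeconds). A minimal sketch of a defensive wrapper follows, assuming that a driver which throws from the check, or which predates JDBC 4.0 and lacks the method entirely, should simply be treated as "replace the connection". The class ConnectionChecks and the method isUsable() are hypothetical names; nothing here is taken from DIH.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper, not part of DIH: a defensive wrapper around Connection.isValid(). */
final class ConnectionChecks {
    private ConnectionChecks() {}

    /**
     * Returns true only if the driver positively reports the connection as valid.
     * Any failure of the check itself is treated as "unknown, so replace it".
     */
    static boolean isUsable(Connection conn, int timeoutSeconds) {
        if (conn == null) {
            return false;
        }
        try {
            return conn.isValid(timeoutSeconds);
        } catch (SQLException | AbstractMethodError | UnsupportedOperationException e) {
            // SQLException: the driver rejected the check (e.g. a negative timeout).
            // AbstractMethodError: a pre-JDBC-4.0 driver that never implemented isValid().
            // UnsupportedOperationException: a driver that stubs the method out.
            return false;
        }
    }
}
```

This only helps with drivers that implement isValid() honestly, which is exactly the caveat raised above; a driver that always answers true would still need the query-and-inspect-the-exception approach or a timeout.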
Re: DataImportHandler using new connection on each query
I am not sure if the current version has this, but DIH used to reload connections after some idle time:

  if (currTime - connLastUsed > CONN_TIME_OUT) {
    synchronized (this) {
      Connection tmpConn = factory.call();
      closeConnection();
      connLastUsed = System.currentTimeMillis();
      return conn = tmpConn;
    }
  }

Where CONN_TIME_OUT = 10 seconds

On Fri, Sep 2, 2011 at 12:36 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : However, I tested this against a slower SQL Server and I saw
> : dramatically worse results. Instead of re-using their database connection, each of
> : the sub-entities is recreating a connection each time the query runs.
>
> are you seeing any specific errors logged before these new connections are created?
>
> I don't *think* there's anything in the DIH JDBC/SQL code that causes it to time out existing connections -- is it possible this is something specific to the JDBC driver you are using?
>
> Or maybe you are using the DIH "threads" option along with a JNDI/JDBC based pool of connections that is configured to create new Connections on demand, and with the fast DB it can reuse them but on the slow DB it does enough stuff in parallel to keep asking for new connections to be created?
>
> If it's DIH creating new connections over and over then i'm pretty sure you should see an INFO level log message like this for each connection...
>
>   LOG.info("Creating a connection for entity "
>       + context.getEntityAttribute(DataImporter.NAME)
>       + " with URL: " + url);
>
> ...are those messages different against your fast DB and your slow DB?
>
> -Hoss
Re: DataImportHandler using new connection on each query
: I am not sure if current version has this, but DIH used to reload
: connections after some idle time
:
:   if (currTime - connLastUsed > CONN_TIME_OUT) {
:     synchronized (this) {
:       Connection tmpConn = factory.call();
:       closeConnection();
:       connLastUsed = System.currentTimeMillis();
:       return conn = tmpConn;
:     }
:   }
:
: Where CONN_TIME_OUT = 10 seconds

...oh wow. i saw the CONN_TIME_OUT constant but i thought (foolishly evidently) that CONN was "connect", as in a timeout on creating a connection, not a timeout on how long DIH is willing to use a perfectly good connection.

I honestly can't make heads or tails of why that code would exist.

Noble? Shalin? what's the point of throwing away a connection that's been in use for more than 10 seconds?

-Hoss
Re: DataImportHandler using new connection on each query
On 9/2/2011 1:59 PM, Chris Hostetter wrote:

> : I am not sure if current version has this, but DIH used to reload
> : connections after some idle time
> :
> :   if (currTime - connLastUsed > CONN_TIME_OUT) {
> :     synchronized (this) {
> :       Connection tmpConn = factory.call();
> :       closeConnection();
> :       connLastUsed = System.currentTimeMillis();
> :       return conn = tmpConn;
> :     }
> :   }
> :
> : Where CONN_TIME_OUT = 10 seconds
>
> ...oh wow. i saw the CONN_TIME_OUT constant but i thought (foolishly evidently) that CONN was "connect", as in a timeout on creating a connection, not a timeout on how long DIH is willing to use a perfectly good connection.
>
> I honestly can't make heads or tails of why that code would exist.
>
> Noble? Shalin? what's the point of throwing away a connection that's been in use for more than 10 seconds?

I use DIH with MySQL. When things are going well, a full rebuild will leave connections open and active for over two hours. This is the case with 1.4.0, 1.4.1, 3.1.0, and 3.2.0. Due to some kind of problem on the database server, last night I had a rebuild going for more than 11 hours with no problems, verified from the processlist on the server.

Thanks,
Shawn
Re: DataImportHandler using new connection on each query
On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey s...@elyograg.org wrote:

> [...]
> I use DIH with MySQL. When things are going well, a full rebuild will leave connections open and active for over two hours. This is the case with 1.4.0, 1.4.1, 3.1.0, and 3.2.0. Due to some kind of problem on the database server, last night I had a rebuild going for more than 11 hours with no problems, verified from the processlist on the server.

Will second that. Have had DIH connections open to both MySQL and MS-SQL for 8-10h. Dropped connections could be traced to network issues, or some other exception.

Regards,
Gora
Re: DataImportHandler using new connection on each query
Take care: running for 10 hours != idling for 10 seconds and trying again. Those are different cases. It is not dropping *used* connections (good to know it works that well, thanks for reporting!), it is just not reusing connections that have been idle for more than 10 seconds.

On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty g...@mimirtech.com wrote:

> On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey s...@elyograg.org wrote:
>
> > [...]
> > I use DIH with MySQL. When things are going well, a full rebuild will leave connections open and active for over two hours. This is the case with 1.4.0, 1.4.1, 3.1.0, and 3.2.0. Due to some kind of problem on the database server, last night I had a rebuild going for more than 11 hours with no problems, verified from the processlist on the server.
>
> Will second that. Have had DIH connections open to both MySQL and MS-SQL for 8-10h. Dropped connections could be traced to network issues, or some other exception.
>
> Regards,
> Gora
Re: DataImportHandler using new connection on each query
: However, I tested this against a slower SQL Server and I saw
: dramatically worse results. Instead of re-using their database connection, each of
: the sub-entities is recreating a connection each time the query runs.

are you seeing any specific errors logged before these new connections are created?

I don't *think* there's anything in the DIH JDBC/SQL code that causes it to time out existing connections -- is it possible this is something specific to the JDBC driver you are using?

Or maybe you are using the DIH "threads" option along with a JNDI/JDBC based pool of connections that is configured to create new Connections on demand, and with the fast DB it can reuse them but on the slow DB it does enough stuff in parallel to keep asking for new connections to be created?

If it's DIH creating new connections over and over then i'm pretty sure you should see an INFO level log message like this for each connection...

  LOG.info("Creating a connection for entity "
      + context.getEntityAttribute(DataImporter.NAME)
      + " with URL: " + url);

...are those messages different against your fast DB and your slow DB?

-Hoss
DataImportHandler using new connection on each query
I have a data import handler that is importing data in full mode from SQL Server. It has one main entity and three sub-entities. Against a good database, it appears to open 4 connections in total: one for the main query, and one for each of the 3 subqueries, which then just re-use their connections. This works well enough.

However, I tested this against a slower SQL Server and I saw dramatically worse results. Instead of re-using their database connection, each of the sub-entities is recreating a connection each time the query runs. This results in terrible performance.

My guess is that it is some sort of timeout. The DataImportHandler seems to be interpreting the slow connection as a dead connection and re-creating the DB connection. The connection is slow, but it does return data, so it is not dead.

I tried to apply the SOLR-2233 patch to Solr 1.4.1, but that did not seem to have much of an effect.