Re: DataImportHandler using new connection on each query

2011-09-30 Thread Chris Hostetter

:  Noble? Shalin?  what's the point of throwing away a connection that's been
:  in use for more than 10 seconds?

: Hoss, as others have noted, DIH throws away connections which have been idle
: for more than the timeout value (10 seconds). The jdbc standard way of
: checking for a valid connection is not implemented or incorrectly
: implemented by many drivers. So, either you can execute a query and get an
: exception and try to determine if the exception was a case of an invalid
: connection (which again is sometimes different from driver to driver) or
: take the easy way out and throw away connections idle for more than 10
: seconds, which is what we went for.

Hmmm...

a) at a minimum this seems like it should be a config option -- why punish 
people using good jdbc drivers?

b) you keep referring to this timeout in relation to connections being
*idle* longer than 10 seconds, but unless i'm missing something that's not
what it's doing at all.

The only time connLastUsed is assigned to is when getConnection() is
called - so even if a connection has only been idle for 1 pico-second, it
will still be closed/reopened if the total amount of time it was used
before being idle was more than 10 seconds -- that was the scenario
described in the first message of this thread...

second 000: app starts
second 006: ResultSetIterator constructed on queryA
second 007: getConnection() called, conn initialized, connLastUsed = 007
   ... conn in use for a while while iterating over results...
second 099: done iterating over ResultSetIterator
second 100: ResultSetIterator constructed on queryB
second 101: getConnection() called again...

...at second #101, that connection has really only been idle for 2 
seconds, but connLastUsed hasn't been updated for 94 seconds, so it 
forces a new connection for no reason.

If the goal is to track how long the connection has been idle, shouldn't 
every method in ResultSetIterator update connLastUsed ?
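
To make the difference concrete, here is a minimal sketch (not the actual DIH
code) that replays the timeline above against a fake clock. IdleTracker,
touch(), isStale() and IdleTrackerDemo are hypothetical names invented for
this illustration; the timeout is passed in as a constructor argument, which
is one possible way of making it configurable as suggested in (a).

```java
import java.util.function.LongSupplier;

/** Hypothetical idle tracker for illustration; NOT the real DIH JdbcDataSource. */
final class IdleTracker {
    private final long timeoutMillis;   // assumption: configurable instead of a fixed constant
    private final LongSupplier clock;   // injected so the timeline can be replayed without sleeping
    private long lastUsed;

    IdleTracker(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastUsed = clock.getAsLong();
    }

    /** Record a use of the connection (checkout, or any row fetched). */
    void touch() {
        lastUsed = clock.getAsLong();
    }

    /** True if nothing has touched the connection for longer than the timeout. */
    boolean isStale() {
        return clock.getAsLong() - lastUsed > timeoutMillis;
    }
}

final class IdleTrackerDemo {
    /** Checkout at second 7, iterate until second 99, check again at second 101. */
    static boolean staleAtSecondCheckout(boolean touchWhileIterating) {
        long[] now = {7_000L};
        IdleTracker tracker = new IdleTracker(10_000L, () -> now[0]);
        tracker.touch();                       // first getConnection() at second 007
        for (long t = 8_000L; t <= 99_000L; t += 1_000L) {
            now[0] = t;                        // one simulated row per second
            if (touchWhileIterating) {
                tracker.touch();               // the behaviour proposed in this message
            }
        }
        now[0] = 101_000L;                     // second getConnection() at second 101
        return tracker.isStale();
    }

    public static void main(String[] args) {
        System.out.println("touch only on checkout: stale=" + staleAtSecondCheckout(false));
        System.out.println("touch on every use:     stale=" + staleAtSecondCheckout(true));
    }
}
```

With the timestamp updated only at checkout, the tracker sees 94 seconds of
"idleness" and would force a reconnect; with the timestamp updated on every
use, it sees the real 2 seconds of idle time and keeps the connection.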





-Hoss


Re: DataImportHandler using new connection on each query

2011-09-23 Thread Shalin Shekhar Mangar
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : I am not sure if current version has this, but  DIH used to reload
 : connections after some idle time
 :
 : if (currTime - connLastUsed > CONN_TIME_OUT) {
 :   synchronized (this) {
 :   Connection tmpConn = factory.call();
 :   closeConnection();
 :   connLastUsed = System.currentTimeMillis();
 :   return conn = tmpConn;
 :   }
 :
 :
 : Where CONN_TIME_OUT = 10 seconds

 ...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
 evidently) that CONN was connect, as in a timeout on creating a
 connection, not a timeout on how long DIH is willing to use a perfectly
 good connection.

 I honestly can't make heads or tails of why that code would exist.

 Noble? Shalin?  what's the point of throwing away a connection that's been
 in use for more than 10 seconds?


Hoss, as others have noted, DIH throws away connections which have been idle
for more than the timeout value (10 seconds). The jdbc standard way of
checking for a valid connection is not implemented or incorrectly
implemented by many drivers. So, either you can execute a query and get an
exception and try to determine if the exception was a case of an invalid
connection (which again is sometimes different from driver to driver) or
take the easy way out and throw away connections idle for more than 10
seconds, which is what we went for.
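
For reference, the "jdbc standard way" mentioned above is presumably
Connection.isValid(int), added in JDBC 4.0 (Java 6). The following is only a
rough sketch of a defensive wrapper around it, not something DIH does;
ConnectionCheck and looksValid are made-up names for this illustration. It
treats a driver that does not implement the check the same as an invalid
connection, and it cannot help with a driver that wrongly answers true.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of DIH or of any JDBC driver. */
final class ConnectionCheck {
    /**
     * Returns true only if the driver positively reports the connection as usable.
     * Any failure to answer is treated as "not valid" so the caller can reconnect.
     */
    static boolean looksValid(Connection conn, int timeoutSeconds) {
        if (conn == null) {
            return false;
        }
        try {
            // isValid(int) is the JDBC 4.0 validity check; the timeout is in seconds.
            return !conn.isClosed() && conn.isValid(timeoutSeconds);
        } catch (SQLException | AbstractMethodError | UnsupportedOperationException e) {
            // Pre-JDBC-4 drivers typically fail with AbstractMethodError here;
            // others may throw instead of answering. Either way: unknown, so reconnect.
            return false;
        }
    }
}
```

A caller could then reconnect only when looksValid(conn, 1) returns false,
instead of on a fixed timer.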

-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
I am not sure if current version has this, but  DIH used to reload
connections after some idle time

if (currTime - connLastUsed > CONN_TIME_OUT) {
  synchronized (this) {
    Connection tmpConn = factory.call();
    closeConnection();
    connLastUsed = System.currentTimeMillis();
    return conn = tmpConn;
  }


Where CONN_TIME_OUT = 10 seconds



On Fri, Sep 2, 2011 at 12:36 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : However, I tested this against a slower SQL Server and I saw
 : dramatically worse results. Instead of re-using their database, each of
 : the sub-entities is recreating a connection each time the query runs.

 are you seeing any specific errors logged before these new connections are
 created?

 I don't *think* there's anything in the DIH JDBC/SQL code that causes it
 to timeout existing connections -- is it possible this is something
 specific to the JDBC Driver you are using?

 Or maybe you are using the DIH threads option along with a JNDI/JDBC
 based pool of connections that is configured to create new Connections on
 demand, and with the fast DB it can reuse them but on the slow DB it does
 enough stuff in parallel to keep asking for new connections to be created?


 If it's DIH creating new connections over and over then i'm pretty sure
 you should see an INFO level log message like this for each connection...

        LOG.info("Creating a connection for entity "
                + context.getEntityAttribute(DataImporter.NAME) + " with URL: "
                + url);

 ...are those messages different against your fast DB and your slow DB?

 -Hoss



Re: DataImportHandler using new connection on each query

2011-09-02 Thread Chris Hostetter

: I am not sure if current version has this, but  DIH used to reload
: connections after some idle time
: 
: if (currTime - connLastUsed > CONN_TIME_OUT) {
:   synchronized (this) {
:   Connection tmpConn = factory.call();
:   closeConnection();
:   connLastUsed = System.currentTimeMillis();
:   return conn = tmpConn;
:   }
: 
: 
: Where CONN_TIME_OUT = 10 seconds

...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
evidently) that CONN was connect, as in a timeout on creating a
connection, not a timeout on how long DIH is willing to use a perfectly
good connection.

I honestly can't make heads or tails of why that code would exist.

Noble? Shalin?  what's the point of throwing away a connection that's been
in use for more than 10 seconds?



-Hoss


Re: DataImportHandler using new connection on each query

2011-09-02 Thread Shawn Heisey

On 9/2/2011 1:59 PM, Chris Hostetter wrote:

: I am not sure if current version has this, but  DIH used to reload
: connections after some idle time
:
: if (currTime - connLastUsed > CONN_TIME_OUT) {
:   synchronized (this) {
:   Connection tmpConn = factory.call();
:   closeConnection();
:   connLastUsed = System.currentTimeMillis();
:   return conn = tmpConn;
:   }
:
:
: Where CONN_TIME_OUT = 10 seconds

...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
evidently) that CONN was connect, as in a timeout on creating a
connection, not a timeout on how long DIH is willing to use a perfectly
good connection.

I honestly can't make heads or tails of why that code would exist.

Noble? Shalin?  what's the point of throwing away a connection that's been
in use for more than 10 seconds?


I use DIH with MySQL.  When things are going well, a full rebuild will 
leave connections open and active for over two hours.  This is the case 
with 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the 
database server, last night I had a rebuild going for more than 11 hours 
with no problems, verified from the processlist on the server.


Thanks,
Shawn



Re: DataImportHandler using new connection on each query

2011-09-02 Thread Gora Mohanty
On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey s...@elyograg.org wrote:
[...]
 I use DIH with MySQL.  When things are going well, a full rebuild will leave
 connections open and active for over two hours.  This is the case with
 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the database
 server, last night I had a rebuild going for more than 11 hours with no
 problems, verified from the processlist on the server.

Will second that. Have had DIH connections open to both
mysql, and MS-SQL for 8-10h. Dropped connections could
be traced to network issues, or some other exception.

Regards,
Gora


Re: DataImportHandler using new connection on each query

2011-09-02 Thread eks dev
take care, running 10 hours != idling 10 seconds and trying again.
Those are different cases.

It is not dropping *used* connections (good to know it works that
well, thanks for reporting!), it just does not reuse connections that have
been idle for more than 10 seconds.



On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty g...@mimirtech.com wrote:
 On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey s...@elyograg.org wrote:
 [...]
 I use DIH with MySQL.  When things are going well, a full rebuild will leave
 connections open and active for over two hours.  This is the case with
 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the database
 server, last night I had a rebuild going for more than 11 hours with no
 problems, verified from the processlist on the server.

 Will second that. Have had DIH connections open to both
 mysql, and MS-SQL for 8-10h. Dropped connections could
 be traced to network issues, or some other exception.

 Regards,
 Gora



Re: DataImportHandler using new connection on each query

2011-09-01 Thread Chris Hostetter

: However, I tested this against a slower SQL Server and I saw 
: dramatically worse results. Instead of re-using their database, each of 
: the sub-entities is recreating a connection each time the query runs. 

are you seeing any specific errors logged before these new connections are 
created?

I don't *think* there's anything in the DIH JDBC/SQL code that causes it
to timeout existing connections -- is it possible this is something
specific to the JDBC Driver you are using?

Or maybe you are using the DIH threads option along with a JNDI/JDBC 
based pool of connections that is configured to create new Connections on 
demand, and with the fast DB it can reuse them but on the slow DB it does 
enough stuff in parallel to keep asking for new connections to be created?


If it's DIH creating new connections over and over then i'm pretty sure 
you should see an INFO level log message like this for each connection...

        LOG.info("Creating a connection for entity "
                + context.getEntityAttribute(DataImporter.NAME) + " with URL: "
                + url);

...are those messages different against your fast DB and your slow DB?

-Hoss


DataImportHandler using new connection on each query

2011-08-18 Thread Kevin Osborn
I have a data import handler that is importing data in full mode from SQL 
Server. It has one main entity and three sub-entities. Against a good database, 
it appears to open 4 connections total. One for the main query and the other 3 
subqueries just re-use their connections. This works well enough.

However, I tested this against a slower SQL Server and I saw dramatically worse
results. Instead of re-using their connections, each of the sub-entities is
recreating a connection each time the query runs. So, this is resulting in
terrible performance. My guess is that it is some sort of timeout. The
DataImportHandler is interpreting the slow connection as a dead connection, and
re-creating the db connection. However, it is a slow connection (and does
return data), but it is not a dead connection.

I tried to apply the SOLR-2233 patch to Solr 1.4.1, but that did not seem to 
have much of an effect.