Re: [JDBC] Re: Proposal to fix Statement.executeBatch()

2001-09-05 Thread Barry Lind

Rene,

This writeup looks good.  As for your two followup questions:

1)  I agree with your opinion on the select issue.  Since selects are 
the only way to call functions, and functions could themselves be doing 
inserts/updates, disallowing selects would be a bad idea IMHO.

2)  The 8K limitation was removed from the database in 7.0 and from the 
JDBC driver in 7.0.2.  Therefore I don't think we should code for the 
case of a 7.2 JDBC driver needing to support a 6.5 database.

thanks,
--Barry


Rene Pijlman wrote:
 On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
 
I do not know what the server does if you have autocommit enabled 
and you issue multiple statements in one try.

 
 As you know, Peter Eisentraut said on hackers that all
 statements in a semicolon-separated query string are processed
 as one single transaction. So, if S2 fails in S1;S2;S3, then
 S1, S2 and S3 are all rolled back and in effect they have all
 failed.
 
 I think this means we can implement your proposal. I've removed
 the paragraph saying that we can't and I've added the following
 to http://lab.applinet.nl/postgresql-jdbc/#Batch
 
 -+-+-
 
 The current implementation of Statement.executeBatch() in the
 JDBC driver does not provide any performance improvement
 compared to processing statements individually. This is because
 the driver executes the statements one-by-one when
 executeBatch() is called, using one round trip per statement. We
 intend to reimplement executeBatch() in the following way.
 
 Statement.executeBatch() will send all statements in a single
 semicolon-separated query string, with only one round trip to
 the backend. This will provide a performance improvement, as
 intended by the JDBC specification.
 
 The updateCounts array will be set as described below. Note that
 the JDBC spec defines the meaning of the following special
 values:
 
 -2: the statement was executed successfully but the number of
 affected rows is unknown
 -3: the statement failed to execute successfully
 
 If all statements succeed, executeBatch() returns an
 updateCounts array with a row count for each statement in the
 batch; however, the value for all but the last statement will be
 -2. The value for the last statement will be a proper update
 count.
 If a statement fails, executeBatch() throws a
 BatchUpdateException containing an updateCounts array with a row
 count of -3 for each statement in the batch.
 
 Note that the behaviour will be the same when autocommit is
 enabled and when it is disabled. Even with autocommit enabled,
 the backend will commit or rollback all statements in the
 semicolon-separated query string as a unit.
 
 The new implementation of executeBatch() will cause a change in
 behaviour of the driver: the driver will no longer return update
 counts for all statements in a batch like it currently does; it
 will return -2 (unknown) for most statements. However, this
 behaviour is allowed by the JDBC spec and applications should be
 prepared to handle it.
 
 -+-+-
 
 I see two more issues we need to decide on...
 
 1) The JDBC spec requires Statement.executeBatch() to throw a
 BatchUpdateException if any of the statements does not return an
 update count (e.g. is a SELECT). How can we implement this? Do
 we need to parse the statements in the JDBC driver to detect
 SELECTs? It is a matter of interpretation, but it seems OK to
 me to just ignore this and return -2/-3 for SELECTs as well. In
 fact, perhaps we should allow SELECTs for function calls!?
 
 2) The reimplementation may cause the driver to send very long
 statements to the backend. I heard something about an 8K limit.
 In what version of the backend was this limitation removed? I
 guess we should implement the new algorithm conditionally, so
 we'll only send multi-statement query strings to a backend that
 has no statement length limitation.
 
 Regards,
 René Pijlman [EMAIL PROTECTED]
 



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [JDBC] Re: Proposal to fix Statement.executeBatch()

2001-09-01 Thread Rene Pijlman

On Mon, 27 Aug 2001 22:57:13 -0700, Barry Lind wrote:
I do not know what the server does if you have autocommit enabled 
and you issue multiple statements in one try.

As you know, Peter Eisentraut said on hackers that all
statements in a semicolon-separated query string are processed
as one single transaction. So, if S2 fails in S1;S2;S3, then
S1, S2 and S3 are all rolled back and in effect they have all
failed.

I think this means we can implement your proposal. I've removed
the paragraph saying that we can't and I've added the following
to http://lab.applinet.nl/postgresql-jdbc/#Batch

-+-+-

The current implementation of Statement.executeBatch() in the
JDBC driver does not provide any performance improvement
compared to processing statements individually. This is because
the driver executes the statements one-by-one when
executeBatch() is called, using one round trip per statement. We
intend to reimplement executeBatch() in the following way.

Statement.executeBatch() will send all statements in a single
semicolon-separated query string, with only one round trip to
the backend. This will provide a performance improvement, as
intended by the JDBC specification.

The updateCounts array will be set as described below. Note that
the JDBC spec defines the meaning of the following special
values:

-2: the statement was executed successfully but the number of
affected rows is unknown
-3: the statement failed to execute successfully

If all statements succeed, executeBatch() returns an
updateCounts array with a row count for each statement in the
batch; however, the value for all but the last statement will be
-2. The value for the last statement will be a proper update
count.
If a statement fails, executeBatch() throws a
BatchUpdateException containing an updateCounts array with a row
count of -3 for each statement in the batch.

Note that the behaviour will be the same when autocommit is
enabled and when it is disabled. Even with autocommit enabled,
the backend will commit or rollback all statements in the
semicolon-separated query string as a unit.

The new implementation of executeBatch() will cause a change in
behaviour of the driver: the driver will no longer return update
counts for all statements in a batch like it currently does; it
will return -2 (unknown) for most statements. However, this
behaviour is allowed by the JDBC spec and applications should be
prepared to handle it.

-+-+-
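To make the -2/-3 semantics above concrete, here is a small illustrative helper for interpreting such an updateCounts array on the application side. The class and method names are hypothetical, not part of the driver; only the -2/-3 values come from the JDBC spec.

```java
// Illustrative helper for interpreting an updateCounts array under the
// proposed semantics. The class and method names are hypothetical (not
// part of the driver); only the -2/-3 values come from the JDBC spec.
public class BatchResult {
    static final int SUCCESS_NO_INFO = -2; // succeeded, row count unknown
    static final int EXECUTE_FAILED  = -3; // statement failed

    // Sum of the known row counts; unknown (-2) entries contribute zero.
    static int knownRowCount(int[] updateCounts) {
        int total = 0;
        for (int c : updateCounts) {
            if (c >= 0) total += c;
        }
        return total;
    }

    // True if any statement in the batch is marked as failed (-3).
    static boolean anyFailed(int[] updateCounts) {
        for (int c : updateCounts) {
            if (c == EXECUTE_FAILED) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Successful batch of three: only the last count is a real number.
        int[] ok = { -2, -2, 1 };
        System.out.println(knownRowCount(ok)); // 1
        System.out.println(anyFailed(ok));     // false

        // Failed batch: every entry is -3, per the proposal.
        int[] failed = { -3, -3, -3 };
        System.out.println(anyFailed(failed)); // true
    }
}
```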

I see two more issues we need to decide on...

1) The JDBC spec requires Statement.executeBatch() to throw a
BatchUpdateException if any of the statements does not return an
update count (e.g. is a SELECT). How can we implement this? Do
we need to parse the statements in the JDBC driver to detect
SELECTs? It is a matter of interpretation, but it seems OK to
me to just ignore this and return -2/-3 for SELECTs as well. In
fact, perhaps we should allow SELECTs for function calls!?

2) The reimplementation may cause the driver to send very long
statements to the backend. I heard something about an 8K limit.
In what version of the backend was this limitation removed? I
guess we should implement the new algorithm conditionally, so
we'll only send multi-statement query strings to a backend that
has no statement length limitation.
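Per Barry's reply elsewhere in this thread, the 8K limit was removed from the backend in 7.0 (and from the driver in 7.0.2), so the conditional logic could reduce to a major-version check. A sketch, purely illustrative; the version parsing here is simplified and is not the driver's actual version-handling code:

```java
// Illustrative sketch of the conditional batching decision: the 8K query
// limit was removed in backend 7.0, so multi-statement batching is only
// safe from 7.0 on. Version parsing here is simplified for illustration;
// it is not the driver's actual version-handling code.
public class VersionCheck {

    // Returns true when the backend version ("major.minor...") has no
    // statement length limit, i.e. 7.0 or later.
    static boolean supportsMultiStatementBatch(String serverVersion) {
        String[] parts = serverVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        return major >= 7;
    }

    public static void main(String[] args) {
        System.out.println(supportsMultiStatementBatch("6.5")); // false
        System.out.println(supportsMultiStatementBatch("7.2")); // true
    }
}
```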

Regards,
René Pijlman [EMAIL PROTECTED]




Re: [JDBC] Re: Proposal to fix Statement.executeBatch()

2001-08-28 Thread Bruce Momjian


No problem.  Just checking.  Patch will remain in the queue and be
applied.

 Bruce,
 
 I think the existing patch can be applied as is.  The issues I raised 
 below are further improvements in the functionality that can be done and 
 don't directly relate to the patch that was submitted.  Sorry if I 
 confused things.
 
 --Barry
 
 Bruce Momjian wrote:
  Can someone suggest what is to be done with the proposed patch?
  
  
   What exactly is the behaviour of the backend in that scenario?
   Does it commit every separate SQL statement in the
   semicolon-separated list, or does it commit the list as a whole?
   Does it abort processing the statement list when an error occurs
   in one statement? And if it continues, does it return an error
   when only one statement in the middle of the list had an error?
 
 I do not know what the server does if you have autocommit enabled and 
 you issue multiple statements in one try.  However, I would be OK with 
 the driver issuing the statements one by one with autocommit on.  If you 
 are running in this mode you just wouldn't get any performance improvement.
 
   However, it would mean a change in behaviour of the driver that
   may break existing JDBC applications: the driver will no longer
   return update counts for all statements in a batch like it
   currently does; it will return unknown for most statements.
   I'm not sure if the performance improvement justifies this
   non-backwardly-compatible change, though I agree this is the
   intention of the feature. What do you think?
 
 I wouldn't worry about this 'change in behavior' because if the caller 
 is JDBC compliant it should be coded to handle the new behavior, as it is 
 compliant with the spec.
 
 thanks,
 --Barry
 
 
 
 
 Rene Pijlman wrote:
 
 On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
 [executeBatch() implemented as one round trip]
 
 
 Here is how I would suggest this be done in a way that is spec 
 compliant (Note: I haven't looked at the patch you submitted yet, so 
 forgive me if you have already done it this way, but based on your 
 comments in this email, my guess is that you have not).
 
 
 Indeed, I have not implemented this.
 
 
 
 Statements should be batched together in a single statement with 
 semicolons separating the individual statements (this will allow the 
 backend to process them all in one round trip).
 
 The result array should return an element with the row count for each 
 statement; however, the value for all but the last statement will be 
 '-2'.  (-2 is defined by the spec to mean the statement was processed 
 successfully but the number of affected rows is unknown).
 
 
 Ah, I see. I hadn't thought of that solution.
 
 
 
 In the event of an error, the driver should return an array the 
 size of the submitted batch with values of -3 for all elements. -3 is 
 defined by the spec to mean the corresponding statement failed to execute 
 successfully, or could not be processed for some 
 reason.  Since in postgres when one statement fails (in non-autocommit 
 mode) the entire transaction is aborted, this is consistent with a 
 return value of -3 in my reading of the spec.
 
 
 Not quite. A statement in a batch may also fail because it's a
 successful SELECT as far as the server is concerned (you can't have
 selects in a batch). But that situation can also be handled
 correctly by setting the update count for that particular
 statement to -3. It's then up to the application to decide if it
 wants to rollback, I would say.
 
 But what to do when an error occurs with autocommit enabled?
 This is not recommended, but allowed by the spec, if I
 understand it correctly.
 
 What exactly is the behaviour of the backend in that scenario?
 Does it commit every separate SQL statement in the
 semicolon-separated list, or does it commit the list as a whole?
 Does it abort processing the statement list when an error occurs
 in one statement? And if it continues, does it return an error
 when only one statement in the middle of the list had an error?
 
 
 
 I believe this approach makes the most sense because:
 1) It implements batches in one round trip (the intention of the feature)
 2) It is compliant with the standard
 3) It is compliant with the current functionality of the backend
 
 
 If we can come up with an acceptable solution for an error with
 autocommit enabled, I agree. Otherwise, I'm not sure.
 
 However, it would mean a change in behaviour of the driver that
 may break existing JDBC applications: the driver will no longer
 return update counts for all statements in a batch like it
 currently does; it will return unknown for most statements.
 I'm not sure if the performance improvement justifies this
 non-backwardly-compatible change, though I agree this is the
 intention of the feature. What do you think?
 
 Regards,
 René Pijlman
 
 
 
 
 

[JDBC] Re: Proposal to fix Statement.executeBatch()

2001-08-27 Thread Barry Lind

Rene,

I see your statements below as incorrect:

  The intended behaviour is to send a set of update/insert/delete/DDL
  statements in one round trip to the database. Unfortunately,
  this optional JDBC feature cannot be implemented correctly with
  PostgreSQL, since the backend only returns the update count of
  the last statement sent in one call with multiple statements.
  JDBC requires it to return an array with the update counts of
  all statements in the batch.

The intended behaviour is certainly to send all of the statements in one 
round trip.  And the JDBC2.1 spec certainly allows postgres to do just 
that.  Here is how I would suggest this be done in a way that is spec 
compliant (Note: I haven't looked at the patch you submitted yet, so 
forgive me if you have already done it this way, but based on your 
comments in this email, my guess is that you have not).


Statements should be batched together in a single statement with 
semicolons separating the individual statements (this will allow the 
backend to process them all in one round trip).

The result array should return an element with the row count for each 
statement; however, the value for all but the last statement will be 
'-2'.  (-2 is defined by the spec to mean the statement was processed 
successfully but the number of affected rows is unknown).

In the event of an error, the driver should return an array the 
size of the submitted batch with values of -3 for all elements. -3 is 
defined by the spec to mean the corresponding statement failed to execute 
successfully, or could not be processed for some 
reason.  Since in postgres when one statement fails (in non-autocommit 
mode) the entire transaction is aborted, this is consistent with a 
return value of -3 in my reading of the spec.

I believe this approach makes the most sense because:
1) It implements batches in one round trip (the intention of the feature)
2) It is compliant with the standard
3) It is compliant with the current functionality of the backend
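As a sketch of this approach (all names hypothetical, not actual driver internals): the queued statements are joined with semicolons for a single round trip, and the result array is filled per the -2/-3 rules described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the strategy described above (not actual driver
// internals): join the queued statements into one semicolon-separated
// query string for a single round trip, then report -2 for every
// statement except the last, whose real update count the backend returns.
public class BatchSketch {

    // Combine the queued statements into one multi-statement query string.
    static String joinBatch(List<String> statements) {
        return String.join("; ", statements);
    }

    // updateCounts for a batch that succeeded: -2 (unknown) for all but
    // the last entry, which gets the backend's actual count.
    static int[] successCounts(int batchSize, int lastUpdateCount) {
        int[] counts = new int[batchSize];
        Arrays.fill(counts, -2);
        counts[batchSize - 1] = lastUpdateCount;
        return counts;
    }

    // updateCounts for a failed batch: -3 everywhere, since the backend
    // aborts the whole multi-statement string as a unit.
    static int[] failureCounts(int batchSize) {
        int[] counts = new int[batchSize];
        Arrays.fill(counts, -3);
        return counts;
    }

    public static void main(String[] args) {
        List<String> batch =
            Arrays.asList("DELETE FROM a", "DELETE FROM b", "INSERT INTO c VALUES (1)");
        System.out.println(joinBatch(batch));
        System.out.println(Arrays.toString(successCounts(batch.size(), 1)));
        System.out.println(Arrays.toString(failureCounts(batch.size())));
    }
}
```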

thanks,
--Barry


Rene Pijlman wrote:
 I've finished the section on batch updates in the JDBC 2.0
 compliance documentation on
 http://lab.applinet.nl/postgresql-jdbc/ (see the quote of the
 relevant part below).
 
 In the short term I think two things need to be fixed:
 1) don't begin, commit or rollback a transaction implicitly in
 Statement.executeBatch()
 2) have executeBatch() throw a BatchUpdateException when it is
 required to do so by the JDBC spec
 
 If there are no objections from this list I intend to submit a
 patch that fixes 1), and perhaps also 2). 
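
 For fix 2), the application-side contract is to catch BatchUpdateException and read getUpdateCounts(). A sketch of that pattern; the exception is constructed directly here (no backend involved) purely to illustrate, and the helper name is hypothetical:

```java
import java.sql.BatchUpdateException;

// Illustrates the contract that fix 2) enables: when a batched statement
// fails, executeBatch() throws BatchUpdateException, and the caller reads
// per-statement results from getUpdateCounts(). The exception is built
// directly here (no database) purely for illustration.
public class BatchFailureDemo {

    // Count the entries marked -3 (failed) in a BatchUpdateException.
    static int failedStatements(BatchUpdateException e) {
        int failed = 0;
        for (int c : e.getUpdateCounts()) {
            if (c == -3) failed++;
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulate the failure mode proposed in this thread: all -3.
        BatchUpdateException e =
            new BatchUpdateException(new int[] { -3, -3, -3 });
        System.out.println(failedStatements(e)); // 3
    }
}
```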
 
 Note that this may cause backward compatibility issues with JDBC
 applications that have come to rely on the incorrect behaviour.
 OTOH, there have been complaints on this list before, and those
 people would certainly be happy about the fix. E.g.
 http://fts.postgresql.org/db/mw/msg.html?mid=83832
 
 In the long run it would be nice if the backend would support
 returning one update count (and perhaps an OID) per statement
 sent in a semicolon-separated multi-statement call. Would this
 be something for the backend TODO list? OTOH, I'm not sure if
 this (small?) performance improvement is worth the trouble.
 
 Batch updates
 
 The driver supports batch updates with the addBatch, clearBatch
 and executeBatch methods of Statement, PreparedStatement and
 CallableStatement. DatabaseMetaData.supportsBatchUpdates()
 returns true.
 
 However, executing statements in a batch does not provide a
 performance improvement with PostgreSQL, since all statements
 are internally sent to the backend and processed one-by-one.
 That defeats the purpose of the batch methods. The intended
 behaviour is to send a set of update/insert/delete/DDL
 statements in one round trip to the database. Unfortunately,
 this optional JDBC feature cannot be implemented correctly with
 PostgreSQL, since the backend only returns the update count of
 the last statement sent in one call with multiple statements.
 JDBC requires it to return an array with the update counts of
 all statements in the batch. Even though the batch processing
 feature currently provides no performance improvement, it should
 not be removed from the driver for reasons of backward
 compatibility.
 
 The current implementation of Statement.executeBatch() in
 PostgreSQL starts a new transaction and commits or aborts it.
 This is not in compliance with the JDBC specification, which
 does not mention transactions in the description of
 Statement.executeBatch() at all. The confusion is probably
 caused by a JDBC tutorial from Sun with example code which
 disables autocommit before calling executeBatch so that the
 transaction will not be automatically committed or rolled back
 when the method executeBatch is called. This comment in the
 tutorial appears to be a misunderstanding. A good reason to
 disable autocommit before calling executeUpdate() is to be able
 to commit or rollback all statements in a 

Re: [JDBC] Re: Proposal to fix Statement.executeBatch()

2001-08-27 Thread Rene Pijlman

On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
[executeBatch() implemented as one round trip]
Here is how I would suggest this be done in a way that is spec 
compliant (Note: I haven't looked at the patch you submitted yet, so 
forgive me if you have already done it this way, but based on your 
comments in this email, my guess is that you have not).

Indeed, I have not implemented this.

Statements should be batched together in a single statement with 
semicolons separating the individual statements (this will allow the 
backend to process them all in one round trip).

The result array should return an element with the row count for each 
statement; however, the value for all but the last statement will be 
'-2'.  (-2 is defined by the spec to mean the statement was processed 
successfully but the number of affected rows is unknown).

Ah, I see. I hadn't thought of that solution.

In the event of an error, the driver should return an array the 
size of the submitted batch with values of -3 for all elements. -3 is 
defined by the spec to mean the corresponding statement failed to execute 
successfully, or could not be processed for some 
reason.  Since in postgres when one statement fails (in non-autocommit 
mode) the entire transaction is aborted, this is consistent with a 
return value of -3 in my reading of the spec.

Not quite. A statement in a batch may also fail because it's a
successful SELECT as far as the server is concerned (you can't have
selects in a batch). But that situation can also be handled
correctly by setting the update count for that particular
statement to -3. It's then up to the application to decide if it
wants to rollback, I would say.

But what to do when an error occurs with autocommit enabled?
This is not recommended, but allowed by the spec, if I
understand it correctly.

What exactly is the behaviour of the backend in that scenario?
Does it commit every separate SQL statement in the
semicolon-separated list, or does it commit the list as a whole?
Does it abort processing the statement list when an error occurs
in one statement? And if it continues, does it return an error
when only one statement in the middle of the list had an error?

I believe this approach makes the most sense because:
1) It implements batches in one round trip (the intention of the feature)
2) It is compliant with the standard
3) It is compliant with the current functionality of the backend

If we can come up with an acceptable solution for an error with
autocommit enabled, I agree. Otherwise, I'm not sure.

However, it would mean a change in behaviour of the driver that
may break existing JDBC applications: the driver will no longer
return update counts for all statements in a batch like it
currently does; it will return unknown for most statements.
I'm not sure if the performance improvement justifies this
non-backwardly-compatible change, though I agree this is the
intention of the feature. What do you think?

Regards,
René Pijlman




Re: [JDBC] Re: Proposal to fix Statement.executeBatch()

2001-08-27 Thread Barry Lind

  What exactly is the behaviour of the backend in that scenario?
  Does it commit every separate SQL statement in the
  semicolon-separated list, or does it commit the list as a whole?
  Does it abort processing the statement list when an error occurs
  in one statement? And if it continues, does it return an error
  when only one statement in the middle of the list had an error?

I do not know what the server does if you have autocommit enabled and 
you issue multiple statements in one try.  However, I would be OK with 
the driver issuing the statements one by one with autocommit on.  If you 
are running in this mode you just wouldn't get any performance improvement.

  However, it would mean a change in behaviour of the driver that
  may break existing JDBC applications: the driver will no longer
  return update counts for all statements in a batch like it
  currently does; it will return unknown for most statements.
  I'm not sure if the performance improvement justifies this
  non-backwardly-compatible change, though I agree this is the
  intention of the feature. What do you think?

I wouldn't worry about this 'change in behavior' because if the caller 
is JDBC compliant it should be coded to handle the new behavior, as it is 
compliant with the spec.

thanks,
--Barry




Rene Pijlman wrote:
 On Mon, 27 Aug 2001 11:07:55 -0700, you wrote:
 [executeBatch() implemented as one round trip]
 
Here is how I would suggest this be done in a way that is spec 
compliant (Note: I haven't looked at the patch you submitted yet, so 
forgive me if you have already done it this way, but based on your 
comments in this email, my guess is that you have not).

 
 Indeed, I have not implemented this.
 
 
Statements should be batched together in a single statement with 
semicolons separating the individual statements (this will allow the 
backend to process them all in one round trip).

The result array should return an element with the row count for each 
statement; however, the value for all but the last statement will be 
'-2'.  (-2 is defined by the spec to mean the statement was processed 
successfully but the number of affected rows is unknown).

 
 Ah, I see. I hadn't thought of that solution.
 
 
In the event of an error, the driver should return an array the 
size of the submitted batch with values of -3 for all elements. -3 is 
defined by the spec to mean the corresponding statement failed to execute 
successfully, or could not be processed for some 
reason.  Since in postgres when one statement fails (in non-autocommit 
mode) the entire transaction is aborted, this is consistent with a 
return value of -3 in my reading of the spec.

 
 Not quite. A statement in a batch may also fail because it's a
 successful SELECT as far as the server is concerned (you can't have
 selects in a batch). But that situation can also be handled
 correctly by setting the update count for that particular
 statement to -3. It's then up to the application to decide if it
 wants to rollback, I would say.
 
 But what to do when an error occurs with autocommit enabled?
 This is not recommended, but allowed by the spec, if I
 understand it correctly.
 
 What exactly is the behaviour of the backend in that scenario?
 Does it commit every separate SQL statement in the
 semicolon-separated list, or does it commit the list as a whole?
 Does it abort processing the statement list when an error occurs
 in one statement? And if it continues, does it return an error
 when only one statement in the middle of the list had an error?
 
 
I believe this approach makes the most sense because:
1) It implements batches in one round trip (the intention of the feature)
2) It is compliant with the standard
3) It is compliant with the current functionality of the backend

 
 If we can come up with an acceptable solution for an error with
 autocommit enabled, I agree. Otherwise, I'm not sure.
 
 However, it would mean a change in behaviour of the driver that
 may break existing JDBC applications: the driver will no longer
 return update counts for all statements in a batch like it
 currently does; it will return unknown for most statements.
 I'm not sure if the performance improvement justifies this
 non-backwardly-compatible change, though I agree this is the
 intention of the feature. What do you think?
 
 Regards,
 René Pijlman
 
 


