Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Leon Smith
Hey, this thread was pointed out to me just a few days ago, and I'll start
by saying that I think it's on exactly the right track.  I don't like the
callback API, and think that PQsetSingleRowMode should be offered in place
of it.  But I do have one question, and one suggestion.

On Sat, Jun 16, 2012 at 10:22 AM, Marko Kreen mark...@gmail.com wrote:

 The function can be called only after PQsend* and before any
 rows have arrived.  This guarantees there will be no surprises
 to PQexec* users who expect full resultset at once.


Ok, I'm guessing you mean that it must be called before you call PQgetResult
or PQgetRowData, or maybe before a call to PQgetResult or PQgetRowData
returns a result or partial result.  Because it would be a race condition if
you meant exactly what you said: the client has no way to know whether rows
have already arrived.  (Though I don't understand how that stricter
guarantee could possibly be implemented without some source of concurrency,
which libpq doesn't provide.)  Maybe this is a little overly pedantic, but I
do want to confirm the intention here.
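
For concreteness, here's a minimal C sketch of the call ordering I have in
mind, per the patch as I read it (PQsetSingleRowMode taking the PGconn and
returning 1 on success, and a PGRES_SINGLE_TUPLE status per row; the query
and column names are just for illustration, and error handling is elided):

    #include <stdio.h>
    #include <libpq-fe.h>

    static void stream_rows(PGconn *conn)
    {
        PGresult *res;

        if (!PQsendQuery(conn, "SELECT id, payload FROM big_table"))
            return;

        /* After PQsendQuery, but before consuming any result rows. */
        if (!PQsetSingleRowMode(conn))
            return;

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("row: %s\n", PQgetvalue(res, 0, 1));
            /* The final PGRES_TUPLES_OK result carries zero rows. */
            PQclear(res);
        }
    }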

One other possibility: Tom Lane fretted ever so slightly about the use of
malloc/free per row... what if, instead of PQsetSingleRowMode, you had a
PQsetChunkedRowMode that takes a chunkSize parameter?  A chunkSize of 0
would be equivalent to what we have today, a chunkSize of 1 would mean you
get what you have from PQsetSingleRowMode, and larger chunkSizes would wait
until n rows have been received before returning them all in a single
result.  I don't know that this suggestion is all that important, but
it seems like an obvious generalization that might possibly be useful.
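
In rough terms, I'm imagining something like this (a purely hypothetical
prototype, not part of the patch under review):

    /* Hypothetical.  chunkSize = 0 keeps today's collect-everything
     * behavior, chunkSize = 1 behaves like PQsetSingleRowMode, and
     * chunkSize = n hands back a PGresult of up to n rows at a time,
     * amortizing the per-PGresult malloc/free overhead. */
    int PQsetChunkedRowMode(PGconn *conn, int chunkSize);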

Best,
Leon


Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-07-30 Thread Leon Smith
On Mon, Jul 30, 2012 at 9:59 PM, Jan Wieck janwi...@yahoo.com wrote:

 On 7/30/2012 8:11 PM, Leon Smith wrote:

 One other possibility,  Tom Lane fretted ever so slightly about the use
 of malloc/free per row... what about instead of PQsetSingleRowMode,  you
 have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize
 = 0 would be equivalent to what we have today,   a chunkSize of 1 means
 you get what you have from PQsetSingleRowMode,  and larger chunkSizes
 would wait until n rows have been received before returning them all in
 a single result.  I don't know that this suggestion is all that
 important, but it seems like an obvious generalization that might
 possibly be useful.


 It is questionable if that actually adds any useful functionality.


This is true; I'm not sure my suggestion is necessarily useful.  I'm just
throwing it out there.


 Any collecting of multiple rows will only run the risk of stalling receipt
 of the following rows while processing the batch. Processing each row as
 soon as it is available will ensure the best use of network buffers.


This is not necessarily true, on multiple levels.  I mean, some of the
programs I write are highly concurrent, and this form of batching would have
almost no risk of stalling the network buffer.  And the plausible use case
is when you are dealing with very small rows, where there would typically be
several rows inside a single network packet or network buffer.  (For
example, with rows a few dozen bytes wide, a single 1500-byte packet can
easily carry 30 or more rows.)



 Collecting multiple rows, like the FETCH command for cursors does, makes
 sense when each batch introduces a network round trip, as it does for
 FETCH. But does it make any sense for a true streaming mode, like what is
 discussed here?


Maybe?  I mean, I anticipate that there are (probably) still use cases for
FETCH, even when the row-at-a-time interface is a viable option and the
transport between postgres and the client has reasonable flow control.

Leon


[HACKERS] Transactions over pathological TCP connections

2012-06-18 Thread Leon Smith
Out of (mostly idle) curiosity, when exactly does a transaction commit,
especially with respect to a TCP connection that a pathological demon will
cut off at the worst possible moment?

The thing is,  I'm using PostgreSQL as a queue,  using asynchronous
notifications and following the advice of Marko Tiikkaja in this post:

http://johtopg.blogspot.com/2010/12/queues-in-sql.html

I'm using a stored procedure to reduce the round trips between the database
and the client, and then running it in a bare transaction, that is, as
SELECT dequeue_element(); with an implicit BEGIN/COMMIT, to mark a row in
the queue as taken and return it.
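
On the client side this is nothing more than the following sketch (error
handling elided; dequeue_element() is the stored procedure mentioned above):

    /* A single round trip; the lone statement gets an implicit
     * BEGIN/COMMIT around it on the server side. */
    PGresult *res = PQexec(conn, "SELECT dequeue_element()");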

My question is, would it be theoretically possible for an element of the
queue to become marked but not delivered, or delivered but not marked, if
the TCP connection between the backend and the client were interrupted at
the worst possible moment?  Will the backend wait for the delivery of the
row to be acknowledged before the transaction is committed?  Or should the
truly paranoid use an explicit transaction block, and not consider the row
taken until confirmation that the transaction has committed has been
received?

Best,
Leon


Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Leon Smith
On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  The transaction would be committed before a command success report is
  delivered to the client, so I don't think delivered-and-not-marked is
  possible.

 ...unless you have configured synchronous_commit=off, or fsync=off.

 Or unless your disk melts into a heap of slag and you have to restore
 from backup.  You can protect against that last case using synchronous
 replication.



But hard disk failure isn't in the failure model I was concerned about. =)
To be perfectly honest, I'm not too concerned with either hard drive failure
or network failure, as we are deploying on a RAID 1+0 database server
talking to the client over a redundant LAN, and using asynchronous (Slony)
replication to an identical database server just in case.  No single point
of failure is a key philosophy of this app from top to bottom.  Like I said,
this is mostly idle curiosity.

But I'm also accustomed to trying to get work done on shockingly unreliable
internet connections.   As a result, network failure is something I think
about quite a lot when writing networked applications.   So this is not
entirely idle curiosity either.

And thinking about this a bit more, it's clear that the database has to
commit before the result is sent, on the off chance that the transaction
fails and needs to be retried.  And an explicit transaction block isn't
really a solution either, because a BEGIN; SELECT dequeue_row(); would get
the row to the client without the taken mark being committed, but the
pathological TCP disconnect could then attack the following COMMIT;, leaving
the client to think that the row has not actually been taken when in fact it
has.

It's not clear to me that this is even a solvable problem without modifying
the schema to include both a taken and a finished processing state, and
then letting elements be re-delivered after a period of time.  But this
would then allow a pathological demon with the power to cut TCP connections
to have a single element delivered and processed multiple times.

In any case, thanks for the responses...

Best,
Leon


Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Leon Smith
I just realized this is essentially an instance of the Two Generals'
Problem, which is something I feel should have been more obvious to me: over
an unreliable link, no finite exchange of acknowledgements can leave both
sides certain of the other's final state.

On Tue, Jun 19, 2012 at 5:50 PM, Leon Smith leon.p.sm...@gmail.com wrote:

 [...]



[HACKERS] Exporting closePGconn from libpq

2011-05-14 Thread Leon Smith
A minor issue has come up in creating low-level bindings to libpq for
safe garbage-collected languages, namely that PQfinish is the only
(AFAICT) way to close a connection, but it also de-allocates the memory
used to represent the database connection.  It would be preferable to call
PQfinish to free the memory in a finalizer, but applications need a way to
disconnect from the database at a predictable and deterministic point in
time, whereas leaving a bit of memory around until the GC finally gets to
it is relatively harmless.  The low-level binding has a few options:

1.  Ignore the issue and allow for the possibility of a segfault if
the library is used incorrectly,  which is not a good situation for
safe languages.

2.  Create a wrapper that tracks whether or not PQfinish has ever been
called, so that attempts to use a connection afterwards can be turned
into native exceptions or other forms of error signaling (see the sketch
after this list).  Wrappers of this kind can introduce their own minor
issues.

3.  Hack libpq to export closePGconn, so that the low-level bindings can
safely signal an error when a connection is used after it has been
disconnected, reserving PQfinish to run in a GC-triggered finalizer.
While this is the technically preferable solution, without getting the
change into the upstream sources it is also a deployment nightmare.
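
For reference, option 2 might look something like the following hypothetical
sketch, assuming the binding mediates every libpq call (error signaling
simplified to returning NULL):

    typedef struct
    {
        PGconn *conn;
        int     finished;   /* nonzero once PQfinish has been called */
    } WrappedConn;

    /* Guarded call: use-after-finish becomes a detectable error that
     * can be turned into a native exception, rather than a segfault. */
    PGresult *wrapped_exec(WrappedConn *w, const char *query)
    {
        if (w->finished)
            return NULL;
        return PQexec(w->conn, query);
    }

    void wrapped_finish(WrappedConn *w)
    {
        if (!w->finished)
        {
            PQfinish(w->conn);
            w->finished = 1;
        }
    }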

Is there any particular reason why closePGconn should not be exported
from libpq?

Best,
Leon


Re: [HACKERS] Exporting closePGconn from libpq

2011-05-14 Thread Leon Smith
On Sat, May 14, 2011 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yes: it'd introduce a new externally-visible state that libpq now has to
 worry about supporting in all its operations, ie connection closed but
 not gone.  This state is guaranteed to be poorly tested and hence buggy.

If you connect to a database over an unreliable network, you can lose
the connection without warning at any time.  Thus libpq must already
support a connection 'closed' but not gone state, and I'm fine with
making the explicitly disconnected state indistinguishable from the
connection-lost state.
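
As far as I can tell, a binding can already rely on something like the
following after an unexpected disconnect (a sketch using only documented
libpq calls):

    /* After a lost connection the PGconn object remains valid until
     * PQfinish(); subsequent commands fail and the status reflects it. */
    if (PQstatus(conn) == CONNECTION_BAD)
    {
        fprintf(stderr, "connection gone: %s\n", PQerrorMessage(conn));
        /* ...but PQfinish(conn) must still be called eventually. */
    }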

 I think you need a wrapper object.  Given the context you're describing,
 I'd be willing to lay a side bet that you'll end up needing a wrapper
 anyway, even if it seems like you could avoid it right now.  Language
 embeddings of libpq tend to accrete features...

The intention of the low-level bindings I'm working on is to keep
features to an absolute minimum: to bind calls to C in a 1-1 fashion,
and to handle the memory management and error signaling associated with
foreign calls.  Of course, such a library is not intended to be
particularly attractive for application development, but rather to serve
as a library that can be wrapped up into a higher-level database access
library that's free to accrete features.  :)

Best,
Leon


[HACKERS] Simplified VALUES parameters

2014-02-26 Thread Leon Smith
Hi, I'm the maintainer and a primary author of a PostgreSQL client library
for Haskell, called postgresql-simple, and I recently investigated
improving support for VALUES expressions in this library.  As a result, I'd
like to suggest two changes to PostgreSQL:

1.   Allow type specifications inside AS clauses,  for example

(VALUES (1,'hello'),(2,'world')) AS update(x int, y text)

2.  Have an explicit syntax for representing VALUES expressions that
contain no rows, such as VALUES ().  (The precise syntax isn't important
to me.)

My claim is that these changes would make it simpler for client libraries
to properly support parameterized VALUES expressions.  If you care, I've
included a postscript with a brief background and a link to my analysis
and motivations.

Best,
Leon


P.S.

https://github.com/lpsmith/postgresql-simple/issues/61

Not entirely unlike many other client libraries, such as psycopg2,
postgresql-simple generates queries by expanding values of particular
Haskell types into fragments of SQL syntax.  So for example, you can
currently write:

executeMany conn [sql|
    UPDATE tbl SET tbl.y = upd.y
      FROM (VALUES (?,?)) AS upd(x,y)
     WHERE tbl.x = upd.x
  |] [(1, "hello"), (2, "world")]


Which will issue the query:

UPDATE tbl SET tbl.y = upd.y
  FROM (VALUES (1,'hello'),(2,'world')) AS upd(x,y)
 WHERE tbl.x = upd.x

The issue, however, is that postgresql-simple cannot currently parameterize
more complex queries that have multiple VALUES expressions, or a VALUES
expression alongside other parameters, as might occur with a writable CTE
or other complex query.

Also, when presented with an empty list of arguments, executeMany does not
issue a query at all and simply returns 0, which is (usually?) the right
thing to do given its intended use cases, but is not the right thing to
do in more general settings.

So, what I'd like to do is to be able to write something like:

execute conn [sql|
    UPDATE tbl SET tbl.y = upd.y
      FROM ? AS upd(x,y)
     WHERE tbl.x = upd.x
       AND tbl.z = ?
  |] (Values [(1, "hello"), (2, "world")], False)


and have it issue a similar query.  However, the problems with this
approach are specifying the PostgreSQL types of the expanded values and
handling the zero-row case properly; the two suggestions above (type
specifications in the AS clause, e.g. AS upd(x int, y text), and an
explicit zero-row syntax) would address exactly those problems.


Re: [HACKERS] Simplified VALUES parameters

2014-02-26 Thread Leon Smith
On Wed, Feb 26, 2014 at 1:54 PM, Josh Berkus j...@agliodbs.com wrote:

 And thank you for writing that driver!


You are welcome!

 I have no opinion about your request for VALUES() stuff, though.  It
 looks fairly complex as far as the grammar and libpq are concerned.


Actually, my suggestions wouldn't necessarily impact libpq at all.  For
better or worse, postgresql-simple does not currently support
protocol-level parameters at all.  While it's clear to me that I do
eventually need to work on supporting protocol-level parameters and the
binary formats, it's also become clear to me since I first wrote it that
protocol-level parameters are not a total replacement either, and that
postgresql-simple will still need to support direct parameter expansion in
some cases (e.g. for VALUES parameters, for identifier parameters (which
aren't yet supported due to the need to drop support for libpq 8.4), etc.).

Best,
Leon


Re: [HACKERS] Adding unsigned 256 bit integers

2014-04-10 Thread Leon Smith
pgmp is also worth mentioning here,   and it's likely to be more efficient
than the numeric type or something you hack up yourself:

http://pgmp.projects.pgfoundry.org/

Best,
Leon


On Thu, Apr 10, 2014 at 10:11 AM, k...@rice.edu k...@rice.edu wrote:

 On Thu, Apr 10, 2014 at 09:13:47PM +0800, Olivier Lalonde wrote:
  I was wondering if there would be any way to do the following in
 PostgreSQL:
 
  UPDATE cryptotable SET work = work + 'some big hexadecimal number'
 
  where work is an unsigned 256 bit integer. Right now my column is a
  character varying(64) column (hexadecimal representation of the number)
 but
  I would be happy to switch to another data type if it lets me do the
  operation above.
 
  If it's not possible with vanilla PostgreSQL, are there extensions that
  could help me?
 
  --
  - Oli
 
  Olivier Lalonde
  http://www.syskall.com -- connect with me!
 

 Hi Olivier,

 Here are some sample pl/pgsql helper functions that I have written for
 other purposes. They use integers but can be adapted to use numeric.

 Regards,
 Ken
 ---
 CREATE OR REPLACE FUNCTION hex2dec(t text) RETURNS integer AS $$
 DECLARE
   r RECORD;
 BEGIN
   FOR r IN EXECUTE 'SELECT x'''||t||'''::integer AS hex' LOOP
 RETURN r.hex;
   END LOOP;
 END
 $$ LANGUAGE plpgsql IMMUTABLE STRICT;
 ---

 ---
 CREATE OR REPLACE FUNCTION bytea2int (
   in_string BYTEA
 ) RETURNS INTEGER AS $$

 DECLARE

   b1 INTEGER := 0;
   b2 INTEGER := 0;
   b3 INTEGER := 0;
   b4 INTEGER := 0;
   out_int INTEGER := 0;

 BEGIN

   CASE OCTET_LENGTH(in_string)
 WHEN 1 THEN
   b4 := get_byte(in_string, 0);
 WHEN 2 THEN
   b3 := get_byte(in_string, 0);
   b4 := get_byte(in_string, 1);
 WHEN 3 THEN
   b2 := get_byte(in_string, 0);
   b3 := get_byte(in_string, 1);
   b4 := get_byte(in_string, 2);
 WHEN 4 THEN
   b1 := get_byte(in_string, 0);
   b2 := get_byte(in_string, 1);
   b3 := get_byte(in_string, 2);
   b4 := get_byte(in_string, 3);
   END CASE;

   out_int := (b1 << 24) + (b2 << 16) + (b3 << 8) + b4;

   RETURN(out_int);
 END;
 $$ LANGUAGE plpgsql IMMUTABLE;
 ---

