Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-17 Thread P. Christeas
On Thursday 14 February 2013, Manlio Perillo wrote:
 Il 14/02/2013 14:06, Albe Laurenz ha scritto:
  Manlio Perillo wrote:
  Sorry for the question, but where can I find the libpq test suite?
  I can not find it in the PostgreSQL sources; it seems that there are
  only some examples, in src/test/examples.
  
 For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:
 
 1) always use PQsendQueryParams functions.
 
This will avoid having to escape parameters, as it is done in
psycopg2
(IMHO it still use simple query protocol for compatibility purpose)
 
 2) when the driver detects a Python string is being sent to the
database, use binary format.
 
As a special case, this will avoid having to use PQescapeByteaConn
when sending binary string (e.g. byte strings in Python 3.x)
 

Perhaps you could also see some attempt I'd made to support binary protocol 
inside psycopg2, some time ago:

https://github.com/xrg/psycopg/tree/execparams2



-- 
Say NO to spam and viruses. Stop using Microsoft Windows!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-15 Thread Manlio Perillo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 15/02/2013 02:45, Andrew McNamara ha scritto:
 For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:
 
 I suggest you have a look at my Python ocpgdb driver:
 
 http://code.google.com/p/ocpgdb/
 

Thanks, I did not know it.

 It uses the v3 binary protocol exclusively (to avoid the usual escaping
 security issues). A number of gotchyas were discovered along the way -
 in particular, you must be a lot more careful about types (as you note
 in a later reply).

Note that this query:
  curs.execute('SELECT * FROM foo WHERE bah  %s', '2006-1-1')

is IMHO incorrect, as per DBAPI 2.0:
http://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors


 There were also some issues with the v3 protocol,
 most of which have been fixed now.
 

I hope the issues are discussed in the commit log messages; I'm creating
a git local mirror of the svn repository.

 ocpgdb does not implement everything, just the bits I needed. That said,
 other people/projects are using it in production, and it's proven to be
 fast and stable.
 


Regards  Manlio
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlEemTYACgkQscQJ24LbaUTujgCfZhrNTsqy/PvRJ4qwLVqy8QVT
lNwAnjsJooEv/vss32RNMKEISOdZ16F1
=nKO5
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-15 Thread Peter Eisentraut
On 2/14/13 2:42 PM, Marko Tiikkaja wrote:
 I think the reason this doesn't work is that in order to prepare a query
 you need to know the parameter types, but you don't know that in Python,
 or at least with the way the DB-API works.  For example, if you write

 cur.execute(SELECT * FROM tbl WHERE a = %s AND b = %s, (val1, val2))

 what types will you pass to PQsendQueryParams?
 
 Pardon me if this is obvious, but why would you need to pass any types
 at all?  Assuming we're still talking about PQsendQueryParams and not an
 explicit prepare/execute cycle..

Well, PQsendQueryParams() requires types to be passed, doesn't it?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-15 Thread Daniele Varrazzo
On Fri, Feb 15, 2013 at 9:28 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 2/14/13 2:42 PM, Marko Tiikkaja wrote:
 I think the reason this doesn't work is that in order to prepare a query
 you need to know the parameter types, but you don't know that in Python,
 or at least with the way the DB-API works.  For example, if you write

 cur.execute(SELECT * FROM tbl WHERE a = %s AND b = %s, (val1, val2))

 what types will you pass to PQsendQueryParams?

 Pardon me if this is obvious, but why would you need to pass any types
 at all?  Assuming we're still talking about PQsendQueryParams and not an
 explicit prepare/execute cycle..

 Well, PQsendQueryParams() requires types to be passed, doesn't it?

No, not necessarily: they are inferred by the context if they are not specified.

I've had in mind for a long time to use the *Params() functions in
psycopg (although it would be largely not backwards compatible, hence
to be done on user request and not by default). Psycopg has all the
degrees of freedom in keeping the two implementations alive (the
non-*params for backward compatibility, the *params for future usage).
I'd drafted a plan on the psycopg ML some times ago. But I don't have
a timeline for that: it's a major work and without pressing
motivations to do it.

-- Daniele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Manlio Perillo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 14/02/2013 14:06, Albe Laurenz ha scritto:
 Manlio Perillo wrote:
 Sorry for the question, but where can I find the libpq test suite?
 I can not find it in the PostgreSQL sources; it seems that there are
 only some examples, in src/test/examples.
 
 The regression tests are in src/interfaces/libpq/test
 and currently contain only URL parsing tests.
 

Ok, thanks.

Since I'm not sure if I should add a new test here, I'll use the test
suite of my project, since it contains an (almost) 1:1 wrapper around libpq.

 I'm planning to add some new features to libpq:

 * make PQsendPrepare send a Describe Portal protocol message
 * add support for setting per column binary result format
 
 I suggested exactly that here:
 http://www.postgresql.org/message-id/d960cb61b694cf459dcfb4b0128514c208a4e...@exadv11.host.magwien.gv.at
 and met resistance:
 - one can use libpqtypes
 - I couldn't find a convincing use case
 - it clutters up the API
 

For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:

1) always use PQsendQueryParams functions.

   This will avoid having to escape parameters, as it is done in
   psycopg2
   (IMHO it still use simple query protocol for compatibility purpose)

2) when the driver detects a Python string is being sent to the
   database, use binary format.

   As a special case, this will avoid having to use PQescapeByteaConn
   when sending binary string (e.g. byte strings in Python 3.x)

3) enable use of prepared statements, but only if the user requested it,
   using setinputsizes function (used to set the Oids of the parameters)

4) when using a prepared statement, check the Oids of the result tuple.

   In order to make this efficient, I proposed a patch to send a
   Describe Portal message in PQsendPrepare function.

   When the driver detects that one of the result column is a string
   type, set the result format for that column to binary.

   As a special case, this will avoid having to use PQunescapeBytea
   when receiving a bytea data.

   This is currently impossible, using libpq API.

5) when returning the result set of a query, after a call to
   cursor.fetchall(), do not convert all the data to Python objects.

   This will be done only on request.

   This should optimize memory usage, as reported in:
   http://wiki.postgresql.org/wiki/Python_PostgreSQL_Driver_TODO

6) make available the use of PQsetSingleRowMode, to optimize large
   result set (as an option to the connection.cursor method)

7) as a generalization of PQsetSingleRowMode, expose in libpq API some
   of protocol internal portal API.

   One possible idea is to add a PQsetRowSize function, that will set
   the size of the result set, to be used in the Execute protocol
   message (currently libpq always set it to 0, to get the entire
   result set, and it does not support the Portal Suspended message)

   This will avoid having to use named cursor, as it is done in psycopg.

   I'll try to make a patch to check if this is feasible, can be
   done efficiently, and the new API has a minimal impact on existing
   API

Note that I will have to code these features, in order to check they
will work as I expect.


 [...]

 [1] A new Python PostgreSQL driver, implemented following
 http://wiki.postgresql.org/wiki/Driver_development
 and with many optimization (compared to psycopg2) enabled by the
 use of the extended query protocol
 
 I think that you'll need to explain in more detail why
 your proposed additions would be necessary for your project.
 Especially since many good drivers have been written against
 libpq as it is.
 
 Yours,
 Laurenz Albe
 

Thanks   Manlio Perillo

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlEc81MACgkQscQJ24LbaURO9ACfctOREoaAtMDm06Sg+qv5jesj
iW0An1CVAOaHzYaSn+P1AIJvXpI7nVT0
=rK4j
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Jonathan Rogers
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

A number of the described features sound quite useful. Is it not
practical to extend an existing library such as psycopg2? What method
will you use to call libpq functions? As you are no doubt aware,
psycopg2 uses the traditional CPython API but there is a fork which uses
ctypes instead, enabling it to work with CPython and Pypy.

Manlio Perillo wrote:

 For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:
 
 2) when the driver detects a Python string is being sent to the
database, use binary format.

What exactly do you mean by Python string?


- -- 
Jonathan Ross Rogers
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iEYEARECAAYFAlEdHFAACgkQVmXTv6uMqqOlzgCgxcGtjwFqK1AMKAED9rK5YSOa
3AMAoJhw4197HPBrjpxG/iSLK43B1B3j
=RnaP
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Manlio Perillo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 14/02/2013 18:18, Jonathan Rogers ha scritto:
 A number of the described features sound quite useful. Is it not
 practical to extend an existing library such as psycopg2?

I suspect there are compatibility issues.

 What method
 will you use to call libpq functions? As you are no doubt aware,
 psycopg2 uses the traditional CPython API but there is a fork which uses
 ctypes instead, enabling it to work with CPython and Pypy.
 

I'm implementing a prototype version, in pure Python with ctypes.
When the prototype is ready, I will implement a CPython extension module
(since the purpose of writing yet another driver is to make it more
efficient than the current best driver).

I will also discuss about porting some of the features to psycopg2 (but
first I need to have a working driver).

 Manlio Perillo wrote:
 
 For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:
 
 2) when the driver detects a Python string is being sent to the
database, use binary format.
 
 What exactly do you mean by Python string?
 

A Python string object.

The libpq interface will implement the functions
  pg_str_encode(string object) - bytes
  pg_str_decode(bytes, result object) - string object
as described in
http://wiki.postgresql.org/wiki/Driver_development

but with some differences.

The pg_str_encode function will return a tuple with the raw bytes and
the suggested parameter format.

As an example, pg_str_encode(Python 3.x byte string) will return the
byte string and 1 (since it is a binary string, and I want to avoid to
use PQescapeBytea function).

For the DBAPI interface, I plan to implement a generic API to map
PostgreSQL types to Python types (different from the one implemented in
psycopg2); something like:

pgtype = connection.create_type_map(
python type object, postgresql type name,
input text function, output text function,
input binary function - optional,
input binary function - optional)

but I have yet to decide how to detect the param format to use.

Maybe there will be only one output function, that will decide the best
format to be used.



Regards  Manlio
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlEdIYQACgkQscQJ24LbaUSfBACfWR7eD6pdyipC3/fljUiuelx6
GV4An3agVt4tx0E/JDUvO0iM8/BiZT1o
=xckI
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Peter Eisentraut
On 2/14/13 9:23 AM, Manlio Perillo wrote:
 1) always use PQsendQueryParams functions.
 
This will avoid having to escape parameters, as it is done in
psycopg2
(IMHO it still use simple query protocol for compatibility purpose)

I think the reason this doesn't work is that in order to prepare a query
you need to know the parameter types, but you don't know that in Python,
or at least with the way the DB-API works.  For example, if you write

cur.execute(SELECT * FROM tbl WHERE a = %s AND b = %s, (val1, val2))

what types will you pass to PQsendQueryParams?

You can make some obvious heuristics, but this sort of thing can get
complicated pretty quickly.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Manlio Perillo
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 14/02/2013 20:01, Peter Eisentraut ha scritto:
 On 2/14/13 9:23 AM, Manlio Perillo wrote:
 1) always use PQsendQueryParams functions.

This will avoid having to escape parameters, as it is done in
psycopg2
(IMHO it still use simple query protocol for compatibility purpose)
 
 I think the reason this doesn't work is that in order to prepare a query
 you need to know the parameter types, but you don't know that in Python,
 or at least with the way the DB-API works.

Hint: .setinputsizes.

In my implementation, prepared queries are **only** used if user calls
setinputsizes; if setinputsizes is not called, preparing a query can
cause performance loss, so it is better to not prepare one.

  For example, if you write
 
 cur.execute(SELECT * FROM tbl WHERE a = %s AND b = %s, (val1, val2))
 
 what types will you pass to PQsendQueryParams?
 

1) if setinputsizes is used, use the type specified here.

2) if setinputsizes is not used, query the driver's type catalog, to
   get the postgresql type oid associated with a Python object; e.g.:

   pg_type = connection.get_type_by_object(val1)
   buf, format = pg_type.output_function(val1)
   param_oid = pg_type.oid

 You can make some obvious heuristics, but this sort of thing can get
 complicated pretty quickly.

A non trivial case if when val is a list, that should be mapped to a
PostgreSQL array.

However, you can always set the oid to 0, and let PostgreSQL deduce the
type, as it is done in psycopg2.  If user called setinputsizes, we are
happy.


Regards   Manlio
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlEdO54ACgkQscQJ24LbaURFWACgkG3dkDYUA3tHooiXpGDf8hm0
Fz0AnjhFDwJU/XSqtgPyHwuHw9+GiRlv
=m68+
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Marko Tiikkaja

On 14/02/2013 20:01, Peter Eisentraut wrote:

On 2/14/13 9:23 AM, Manlio Perillo wrote:

1) always use PQsendQueryParams functions.

This will avoid having to escape parameters, as it is done in
psycopg2
(IMHO it still use simple query protocol for compatibility purpose)


I think the reason this doesn't work is that in order to prepare a query
you need to know the parameter types, but you don't know that in Python,
or at least with the way the DB-API works.  For example, if you write

cur.execute(SELECT * FROM tbl WHERE a = %s AND b = %s, (val1, val2))

what types will you pass to PQsendQueryParams?


Pardon me if this is obvious, but why would you need to pass any types 
at all?  Assuming we're still talking about PQsendQueryParams and not an 
explicit prepare/execute cycle..



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [RFC] ideas for a new Python DBAPI driver (was Re: [HACKERS] libpq test suite)

2013-02-14 Thread Andrew McNamara
For my Python DBAPI2 PostgreSQL driver I plan the following optimizations:

I suggest you have a look at my Python ocpgdb driver:

http://code.google.com/p/ocpgdb/

It uses the v3 binary protocol exclusively (to avoid the usual escaping
security issues). A number of gotchyas were discovered along the way -
in particular, you must be a lot more careful about types (as you note
in a later reply). There were also some issues with the v3 protocol,
most of which have been fixed now.

ocpgdb does not implement everything, just the bits I needed. That said,
other people/projects are using it in production, and it's proven to be
fast and stable.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers