Re: [HACKERS] plpython SPI cursors

2011-12-10 Thread Peter Eisentraut
On mån, 2011-12-05 at 13:12 -0500, Bruce Momjian wrote:
 Jan Urbański wrote:
  On 05/12/11 18:58, Peter Eisentraut wrote:
   On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
   On 20/11/11 19:14, Steve Singer wrote:
   Responding now to all questions and attaching a revised patch based on 
   your comments.
   
   Committed
   
   Please refresh the other patch.
  
  Great, thanks!
  
  I'll try to send an updated version of the other patch this evening.
 
 I assume this is _not_ related to this TODO item:
 
   Add a DB-API compliant interface on top of the SPI interface 

No, but this is:
http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-12-05 Thread Peter Eisentraut
On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
 On 20/11/11 19:14, Steve Singer wrote:
  On 11-10-15 07:28 PM, Jan Urbański wrote:
  Hi,
 
  attached is a patch implementing the usage of SPI cursors in PL/Python.
  Currently when trying to process a large table in PL/Python you have
  slurp it all into memory (that's what plpy.execute does).
 
  J
 
  I found a few bugs (see my testing section below) that will need fixing
  + a few questions about the code
 
 Responding now to all questions and attaching a revised patch based on 
 your comments.

Committed

Please refresh the other patch.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-12-05 Thread Jan Urbański
On 05/12/11 18:58, Peter Eisentraut wrote:
 On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
 On 20/11/11 19:14, Steve Singer wrote:
 Responding now to all questions and attaching a revised patch based on 
 your comments.
 
 Committed
 
 Please refresh the other patch.

Great, thanks!

I'll try to send an updated version of the other patch this evening.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-12-05 Thread Bruce Momjian
Jan Urba??ski wrote:
 On 05/12/11 18:58, Peter Eisentraut wrote:
  On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
  On 20/11/11 19:14, Steve Singer wrote:
  Responding now to all questions and attaching a revised patch based on 
  your comments.
  
  Committed
  
  Please refresh the other patch.
 
 Great, thanks!
 
 I'll try to send an updated version of the other patch this evening.

I assume this is _not_ related to this TODO item:

Add a DB-API compliant interface on top of the SPI interface 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-12-05 Thread Jan Urbański
On 05/12/11 19:12, Bruce Momjian wrote:
 Jan Urbański wrote:
 On 05/12/11 18:58, Peter Eisentraut wrote:
 On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
 On 20/11/11 19:14, Steve Singer wrote:
 Responding now to all questions and attaching a revised patch based on 
 your comments.

 Committed

 Please refresh the other patch.

 Great, thanks!

 I'll try to send an updated version of the other patch this evening.
 
 I assume this is _not_ related to this TODO item:
 
   Add a DB-API compliant interface on top of the SPI interface 

No,  not related.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-11-28 Thread Peter Eisentraut
On lör, 2011-11-26 at 11:21 -0500, Steve Singer wrote:
 I've looked over the revised version of the patch and it now seems
 fine.
 
 Ready for committer. 

I can take it from here.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-11-26 Thread Steve Singer

On 11-11-23 01:58 PM, Jan Urbański wrote:

On 20/11/11 19:14, Steve Singer wrote:

On 11-10-15 07:28 PM, Jan Urbański wrote:

Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).

J


I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code


Responding now to all questions and attaching a revised patch based on 
your comments.




I've looked over the revised version of the patch and it now seems fine.

Ready for committer.



Do we like the name plpy.cursor or would we rather call it something 
like

plpy.execute_cursor(...) or plpy.cursor_open(...) or
plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is
worth
some opinions on. I like cursor() but if anyone disagrees now is the 
time.


We use plpy.subtransaction() to create Subxact objects, so I though 
plpy.cursor() would be most appropriate.



This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that and I don't see anything that preculdes 
someone else

adding that later if they see a need.


My idea is to add keyword arguments to plpy.cursor() that will allow 
you to decide whether you want a scrollable cursor and after that 
provide a move() method.



The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database
access
functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring. That should be done as a seperate patch and
shouldn't be a barrier to committing this one.


Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet 
summoned force to overhaul them.



in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;



I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need 
to be

volatile here.


It looked like you could drop volatile there (and in 
PLy_spi_execute_plan, where this is copied from (did I mention there's 
quite some code duplication in PL/Python?)) but digging in git I found 
this commit:


http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 



that added the original volatile qualification, so I guess there's a 
reason.



line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, s, query))
+ return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()


No, PyArg_ParseTuple returns a string on the stack, I check that 
repeatedly creating a cursor with a plan argument does not leak memory 
and that adding PyMem_Free there promptly leads to a segfault.




I tested both python 2.6 and 3 on a Linux system

[test cases demonstrating bugs]


Turns out it's a really bad idea to store pointers to Portal 
structures, because they get invalidated by the subtransaction abort 
hooks.


I switched to storing the cursor name and looking it up in the 
appropriate hash table every time it's used. The examples you sent 
(which I included as regression tests) now cause a ValueError to be 
raised with a message stating that the cursor has been created in an 
aborted subtransaction.


Not sure about the wording of the error message, though.

Thanks again for the review!

Cheers,
Jan







Re: [HACKERS] plpython SPI cursors

2011-11-23 Thread Jan Urbański

On 20/11/11 19:14, Steve Singer wrote:

On 11-10-15 07:28 PM, Jan Urbański wrote:

Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).

J


I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code


Responding now to all questions and attaching a revised patch based on 
your comments.



Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or
plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is
worth
some opinions on. I like cursor() but if anyone disagrees now is the time.


We use plpy.subtransaction() to create Subxact objects, so I though 
plpy.cursor() would be most appropriate.



This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.


My idea is to add keyword arguments to plpy.cursor() that will allow you 
to decide whether you want a scrollable cursor and after that provide a 
move() method.



The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database
access
functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring. That should be done as a seperate patch and
shouldn't be a barrier to committing this one.


Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet 
summoned force to overhaul them.



in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;



I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.


It looked like you could drop volatile there (and in 
PLy_spi_execute_plan, where this is copied from (did I mention there's 
quite some code duplication in PL/Python?)) but digging in git I found 
this commit:


http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000

that added the original volatile qualification, so I guess there's a reason.


line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, s, query))
+ return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()


No, PyArg_ParseTuple returns a string on the stack, I check that 
repeatedly creating a cursor with a plan argument does not leak memory 
and that adding PyMem_Free there promptly leads to a segfault.




I tested both python 2.6 and 3 on a Linux system

[test cases demonstrating bugs]


Turns out it's a really bad idea to store pointers to Portal structures, 
because they get invalidated by the subtransaction abort hooks.


I switched to storing the cursor name and looking it up in the 
appropriate hash table every time it's used. The examples you sent 
(which I included as regression tests) now cause a ValueError to be 
raised with a message stating that the cursor has been created in an 
aborted subtransaction.


Not sure about the wording of the error message, though.

Thanks again for the review!

Cheers,
Jan
diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index eda2bbf..d08c3d1 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -892,6 +892,15 @@ $$ LANGUAGE plpythonu;
   /para
 
   para
+Note that calling literalplpy.execute/literal will cause the entire
+result set to be read into memory. Only use that function when you are sure
+that the result set will be relatively small.  If you don't want to risk
+excessive memory usage when fetching large results,
+use literalplpy.cursor/literal rather
+than literalplpy.execute/literal.
+  /para
+
+  para
For example:
 programlisting
 rv = plpy.execute(SELECT * FROM my_table, 5)
@@ -958,6 +967,77 @@ $$ LANGUAGE plpythonu;
 
   /sect2
 
+  sect2
+titleAccessing data with cursors/title
+
+  para
+The literalplpy.cursor/literal function accepts the same arguments
+as literalplpy.execute/literal (except for literallimit/literal)
+and returns a cursor object, which allows you to process large result sets
+in smaller chunks.  As with literalplpy.execute/literal, either a query
+string or a plan object along with a list of arguments can be used.  The
+cursor object provides a literalfetch/literal method that accepts an
+integer paramter and returns a result object.  Each time you
+call literalfetch/literal, the returned object will contain the next
+

Re: [HACKERS] plpython SPI cursors

2011-11-20 Thread Steve Singer

On 11-10-15 07:28 PM, Jan Urbański wrote:

Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).

J


I found a few bugs  (see my testing section below) that will need fixing 
+ a few questions about the code



Overview  Feature Review
---
This patch adds cursor support to plpython.  SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once.  This is a good thing.  Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch.This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach.   I think the feature is worth having


Usability Review
--
 The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on.  I like cursor() but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move.  The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.


Documentation Review
-

The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl 
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page 
does.  I think the organization of the perl page is

clearer than the python one and we should think about a doing some
documentaiton refactoring.  That should be done as a seperate patch and
shouldn't be a barrier to committing this one.




Code Review


in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portalportal;
+ char   *volatile nulls;
+ volatile int j;
+
+ if (nargs  0)
+ nulls = palloc(nargs * sizeof(char));
+ else
+ nulls = NULL;
+
+ for (j = 0; j  nargs; j++)
+ {
+ PyObject   *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.

line 444
 PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char*query;
+ PyObject*plan;
+ PyObject   *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, s, query))
+ return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()



Testing
---

I tested both python 2.6 and 3 on a Linux system



create or replace function x() returns text as $$
cur=None
try:
  with plpy.subtransaction():
cur=plpy.cursor('select generate_series(1,1000)')
rows=cur.fetch(10);
plpy.execute('select f()')

except plpy.SPIError:
  rows=cur.fetch(10);
  return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG:  server 
process (PID 3166) was terminated by signal 11: Segmentation fault


The below test gives me a strange error message:

create or replace function x1() returns text as $$
plan=None
try:
  with plpy.subtransaction():
plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select * FROM z')
plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
cur=plpy.cursor(plan)
rows=cur.fetch(10)
return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();

test=# select x1()
test-# ;
ERROR:  TypeError: Expected sequence of 82187072 arguments, got 0: NULL
CONTEXT:  Traceback (most recent call last):
  PL/Python function x1, line 9, in module
cur=plpy.cursor(plan)
PL/Python function x1
STATEMENT:  select x1()


I was expecting an error from the function just a bit more useful one.


Performance
---
I did not do any specific performance testing but I don't see this patch 
as having any impact to performance




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpython SPI cursors

2011-11-20 Thread Jan Urbański

On 20/11/11 19:14, Steve Singer wrote:

On 11-10-15 07:28 PM, Jan Urbański wrote:

Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.



I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code


Hi Steve,

thanks a lot for the review, I'll investigate the errors you were 
getting and post a follow-up.


Good catch on trying cursors with explicit subtransactions, I didn't 
think about how they would interact.


Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers