Re: [HACKERS] plpython SPI cursors
On mån, 2011-12-05 at 13:12 -0500, Bruce Momjian wrote: Jan Urbański wrote: On 05/12/11 18:58, Peter Eisentraut wrote: On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote: On 20/11/11 19:14, Steve Singer wrote: Responding now to all questions and attaching a revised patch based on your comments. Committed Please refresh the other patch. Great, thanks! I'll try to send an updated version of the other patch this evening. I assume this is _not_ related to this TODO item: Add a DB-API compliant interface on top of the SPI interface No, but this is: http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote: On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Responding now to all questions and attaching a revised patch based on your comments. Committed Please refresh the other patch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 05/12/11 18:58, Peter Eisentraut wrote: On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote: On 20/11/11 19:14, Steve Singer wrote: Responding now to all questions and attaching a revised patch based on your comments. Committed Please refresh the other patch. Great, thanks! I'll try to send an updated version of the other patch this evening. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
Jan Urba??ski wrote: On 05/12/11 18:58, Peter Eisentraut wrote: On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote: On 20/11/11 19:14, Steve Singer wrote: Responding now to all questions and attaching a revised patch based on your comments. Committed Please refresh the other patch. Great, thanks! I'll try to send an updated version of the other patch this evening. I assume this is _not_ related to this TODO item: Add a DB-API compliant interface on top of the SPI interface -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 05/12/11 19:12, Bruce Momjian wrote: Jan Urbański wrote: On 05/12/11 18:58, Peter Eisentraut wrote: On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote: On 20/11/11 19:14, Steve Singer wrote: Responding now to all questions and attaching a revised patch based on your comments. Committed Please refresh the other patch. Great, thanks! I'll try to send an updated version of the other patch this evening. I assume this is _not_ related to this TODO item: Add a DB-API compliant interface on top of the SPI interface No, not related. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On lör, 2011-11-26 at 11:21 -0500, Steve Singer wrote: I've looked over the revised version of the patch and it now seems fine. Ready for committer. I can take it from here. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 11-11-23 01:58 PM, Jan Urbański wrote: On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Responding now to all questions and attaching a revised patch based on your comments. I've looked over the revised version of the patch and it now seems fine. Ready for committer. Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. We use plpy.subtransaction() to create Subxact objects, so I though plpy.cursor() would be most appropriate. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that preculdes someone else adding that later if they see a need. My idea is to add keyword arguments to plpy.cursor() that will allow you to decide whether you want a scrollable cursor and after that provide a move() method. The patch includes documentation updates that describes the new feature. The Database Access page doesn't provide a API style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about a doing some documentaiton refactoring. That should be done as a seperate patch and shouldn't be a barrier to committing this one. Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet summoned force to overhaul them. in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; I am probably not seeing a code path or misunderstanding something about the setjmp/longjump usages but I don't see why nulls and j need to be volatile here. It looked like you could drop volatile there (and in PLy_spi_execute_plan, where this is copied from (did I mention there's quite some code duplication in PL/Python?)) but digging in git I found this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 that added the original volatile qualification, so I guess there's a reason. line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, s, query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free() No, PyArg_ParseTuple returns a string on the stack, I check that repeatedly creating a cursor with a plan argument does not leak memory and that adding PyMem_Free there promptly leads to a segfault. I tested both python 2.6 and 3 on a Linux system [test cases demonstrating bugs] Turns out it's a really bad idea to store pointers to Portal structures, because they get invalidated by the subtransaction abort hooks. I switched to storing the cursor name and looking it up in the appropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause a ValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction. Not sure about the wording of the error message, though. Thanks again for the review! Cheers, Jan
Re: [HACKERS] plpython SPI cursors
On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Responding now to all questions and attaching a revised patch based on your comments. Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. We use plpy.subtransaction() to create Subxact objects, so I though plpy.cursor() would be most appropriate. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that preculdes someone else adding that later if they see a need. My idea is to add keyword arguments to plpy.cursor() that will allow you to decide whether you want a scrollable cursor and after that provide a move() method. The patch includes documentation updates that describes the new feature. The Database Access page doesn't provide a API style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about a doing some documentaiton refactoring. That should be done as a seperate patch and shouldn't be a barrier to committing this one. Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet summoned force to overhaul them. in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; I am probably not seeing a code path or misunderstanding something about the setjmp/longjump usages but I don't see why nulls and j need to be volatile here. It looked like you could drop volatile there (and in PLy_spi_execute_plan, where this is copied from (did I mention there's quite some code duplication in PL/Python?)) but digging in git I found this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000 that added the original volatile qualification, so I guess there's a reason. line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, s, query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free() No, PyArg_ParseTuple returns a string on the stack, I check that repeatedly creating a cursor with a plan argument does not leak memory and that adding PyMem_Free there promptly leads to a segfault. I tested both python 2.6 and 3 on a Linux system [test cases demonstrating bugs] Turns out it's a really bad idea to store pointers to Portal structures, because they get invalidated by the subtransaction abort hooks. I switched to storing the cursor name and looking it up in the appropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause a ValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction. Not sure about the wording of the error message, though. Thanks again for the review! Cheers, Jan diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml index eda2bbf..d08c3d1 100644 --- a/doc/src/sgml/plpython.sgml +++ b/doc/src/sgml/plpython.sgml @@ -892,6 +892,15 @@ $$ LANGUAGE plpythonu; /para para +Note that calling literalplpy.execute/literal will cause the entire +result set to be read into memory. Only use that function when you are sure +that the result set will be relatively small. If you don't want to risk +excessive memory usage when fetching large results, +use literalplpy.cursor/literal rather +than literalplpy.execute/literal. + /para + + para For example: programlisting rv = plpy.execute(SELECT * FROM my_table, 5) @@ -958,6 +967,77 @@ $$ LANGUAGE plpythonu; /sect2 + sect2 +titleAccessing data with cursors/title + + para +The literalplpy.cursor/literal function accepts the same arguments +as literalplpy.execute/literal (except for literallimit/literal) +and returns a cursor object, which allows you to process large result sets +in smaller chunks. As with literalplpy.execute/literal, either a query +string or a plan object along with a list of arguments can be used. The +cursor object provides a literalfetch/literal method that accepts an +integer paramter and returns a result object. Each time you +call literalfetch/literal, the returned object will contain the next +
Re: [HACKERS] plpython SPI cursors
On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. Currently when trying to process a large table in PL/Python you have slurp it all into memory (that's what plpy.execute does). J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Overview Feature Review --- This patch adds cursor support to plpython. SPI cursors will allow a plpython function to read process a large results set without having to read it all into memory at once. This is a good thing. Without this patch I think you could accomplish the same with using SQL DECLARE CURSOR and SQL fetch.This feature allows you to use a python cursor as an iterator resulting in much cleaner python code than the SQL FETCH approach. I think the feature is worth having Usability Review -- The patch adds the user methods cursor=plpy.cursor(query_or_plan) cursor.fetch(100) cursor.close() Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that preculdes someone else adding that later if they see a need. Documentation Review - The patch includes documentation updates that describes the new feature. The Database Access page doesn't provide a API style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about a doing some documentaiton refactoring. That should be done as a seperate patch and shouldn't be a barrier to committing this one. Code Review in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portalportal; + char *volatile nulls; + volatile int j; + + if (nargs 0) + nulls = palloc(nargs * sizeof(char)); + else + nulls = NULL; + + for (j = 0; j nargs; j++) + { + PyObject *elem; I am probably not seeing a code path or misunderstanding something about the setjmp/longjump usages but I don't see why nulls and j need to be volatile here. line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char*query; + PyObject*plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, s, query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free() Testing --- I tested both python 2.6 and 3 on a Linux system create or replace function x() returns text as $$ cur=None try: with plpy.subtransaction(): cur=plpy.cursor('select generate_series(1,1000)') rows=cur.fetch(10); plpy.execute('select f()') except plpy.SPIError: rows=cur.fetch(10); return rows[0]['generate_series'] return 'none' $$ language plpythonu; select x(); crashes the backend test=# select x(); The connection to the server was lost. Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault The below test gives me a strange error message: create or replace function x1() returns text as $$ plan=None try: with plpy.subtransaction(): plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)') plan=plpy.prepare('select * FROM z') plpy.execute('select * FROM does_not_exist') except plpy.SPIError, e: cur=plpy.cursor(plan) rows=cur.fetch(10) return rows[0]['generate_series'] return '1' $$ language plpythonu; select x1(); test=# select x1() test-# ; ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: NULL CONTEXT: Traceback (most recent call last): PL/Python function x1, line 9, in module cur=plpy.cursor(plan) PL/Python function x1 STATEMENT: select x1() I was expecting an error from the function just a bit more useful one. Performance --- I did not do any specific performance testing but I don't see this patch as having any impact to performance -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpython SPI cursors
On 20/11/11 19:14, Steve Singer wrote: On 11-10-15 07:28 PM, Jan Urbański wrote: Hi, attached is a patch implementing the usage of SPI cursors in PL/Python. I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Hi Steve, thanks a lot for the review, I'll investigate the errors you were getting and post a follow-up. Good catch on trying cursors with explicit subtransactions, I didn't think about how they would interact. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers