Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06/08/12 13:59, Heikki Linnakangas wrote: On 20.07.2012 10:13, Jan Urbański wrote: On 20/07/12 08:59, Jan Urbański wrote: On 18/07/12 17:17, Heikki Linnakangas wrote: On 14.07.2012 17:50, Jan Urbański wrote: If pg_do_encoding_conversion() throws an error, you don't get a chance to call Py_DECREF() to release the string. Is that a problem? If an error occurs in PLy_traceback(), after incrementing recursion_depth, you don't get a chance to decrement it again. I'm not sure if the Py* function calls can fail, but at least seemingly trivial things like initStringInfo() can throw an out-of-memory error. Of course you're right (on both accounts). Here's a version with a bunch of PG_TRies thrown in. Silly me, playing tricks with postincrements before fully waking up. Here's v3, with a correct inequality test for exceeding the traceback recursion test. Committed the convert-via-UTF-8 part of this. I'll take a closer look at the recursion check next. Thanks! Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 20.07.2012 10:13, Jan Urbański wrote: On 20/07/12 08:59, Jan Urbański wrote: On 18/07/12 17:17, Heikki Linnakangas wrote: On 14.07.2012 17:50, Jan Urbański wrote: If pg_do_encoding_conversion() throws an error, you don't get a chance to call Py_DECREF() to release the string. Is that a problem? If an error occurs in PLy_traceback(), after incrementing recursion_depth, you don't get a chance to decrement it again. I'm not sure if the Py* function calls can fail, but at least seemingly trivial things like initStringInfo() can throw an out-of-memory error. Of course you're right (on both accounts). Here's a version with a bunch of PG_TRies thrown in. Silly me, playing tricks with postincrements before fully waking up. Here's v3, with a correct inequality test for exceeding the traceback recursion test. Committed the convert-via-UTF-8 part of this. I'll take a closer look at the recursion check next. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 18/07/12 17:17, Heikki Linnakangas wrote: On 14.07.2012 17:50, Jan Urbański wrote: If pg_do_encoding_conversion() throws an error, you don't get a chance to call Py_DECREF() to release the string. Is that a problem? If an error occurs in PLy_traceback(), after incrementing recursion_depth, you don't get a chance to decrement it again. I'm not sure if the Py* function calls can fail, but at least seemingly trivial things like initStringInfo() can throw an out-of-memory error. Of course you're right (on both accounts). Here's a version with a bunch of PG_TRies thrown in. Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c new file mode 100644 index c375ac0..9944f72 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** static char *get_source_line(const char *** 28,33 --- 28,41 /* + * Guard agains re-entrant calls to PLy_traceback, which can happen if + * traceback formatting functions raise Python errors. + */ + #define TRACEBACK_RECURSION_LIMIT 2 + static int recursion_depth = 0; + + + /* * Emit a PG error or notice, together with any available info about * the current Python error, previously set by PLy_exception_set(). * This should be used to propagate Python errors into PG. If fmt is *** static char *get_source_line(const char *** 38,46 void PLy_elog(int elevel, const char *fmt,...) { ! char *xmsg; ! char *tbmsg; ! int tb_depth; StringInfoData emsg; PyObject *exc, *val, --- 46,54 void PLy_elog(int elevel, const char *fmt,...) { ! char *xmsg = NULL; ! char *tbmsg = NULL; ! int tb_depth = 0; StringInfoData emsg; PyObject *exc, *val, *** PLy_elog(int elevel, const char *fmt,... *** 62,68 } PyErr_Restore(exc, val, tb); ! PLy_traceback(xmsg, tbmsg, tb_depth); if (fmt) { --- 70,90 } PyErr_Restore(exc, val, tb); ! if (recursion_depth++ = TRACEBACK_RECURSION_LIMIT) ! { ! PG_TRY(); ! { ! PLy_traceback(xmsg, tbmsg, tb_depth); ! } ! PG_CATCH(); ! { ! recursion_depth--; ! PG_RE_THROW(); ! } ! PG_END_TRY(); ! } ! ! recursion_depth--; if (fmt) { diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c new file mode 100644 index 4aabafc..94c035e *** a/src/pl/plpython/plpy_util.c --- b/src/pl/plpython/plpy_util.c *** PLy_free(void *ptr) *** 61,126 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *rv; ! const char *serverenc; ! /* ! * Map PostgreSQL encoding to a Python encoding name. ! */ ! switch (GetDatabaseEncoding()) { ! case PG_SQL_ASCII: ! /* ! * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's ! * 'ascii' means true 7-bit only ASCII, while PostgreSQL's ! * SQL_ASCII means that anything is allowed, and the system doesn't ! * try to interpret the bytes in any way. But not sure what else ! * to do, and we haven't heard any complaints... ! */ ! serverenc = ascii; ! break; ! case PG_WIN1250: ! serverenc = cp1250; ! break; ! case PG_WIN1251: ! serverenc = cp1251; ! break; ! case PG_WIN1252: ! serverenc = cp1252; ! break; ! case PG_WIN1253: ! serverenc = cp1253; ! break; ! case PG_WIN1254: ! serverenc = cp1254; ! break; ! case PG_WIN1255: ! serverenc = cp1255; ! break; ! case PG_WIN1256: ! serverenc = cp1256; ! break; ! case PG_WIN1257: ! serverenc = cp1257; ! break; ! case PG_WIN1258: ! serverenc = cp1258; ! break; ! case PG_WIN866: ! serverenc = cp866; ! break; ! case PG_WIN874: ! serverenc = cp874; ! break; ! default: ! /* Other encodings have the same name in Python. */ ! serverenc = GetDatabaseEncodingName(); ! break; } ! rv = PyUnicode_AsEncodedString(unicode, serverenc, strict); ! if (rv == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding); return rv; } --- 61,106 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *bytes, *rv; ! char *utf8string, *encoded; ! /* first encode the Python unicode object with UTF-8 */ ! bytes = PyUnicode_AsUTF8String(unicode); ! if (bytes == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to bytes); ! ! utf8string = PyBytes_AsString(bytes); ! if (utf8string == NULL) { ! Py_DECREF(bytes); ! PLy_elog(ERROR, could not extract bytes from encoded string); ! } ! ! /* then convert to server encoding */ ! PG_TRY(); { ! encoded = (char *) pg_do_encoding_conversion( ! (unsigned char *) utf8string, ! strlen(utf8string), ! PG_UTF8, ! GetDatabaseEncoding()); ! } ! PG_CATCH(); ! { ! Py_DECREF(bytes); ! PG_RE_THROW(); } + PG_END_TRY(); ! /* finally, build a bytes object in the server encoding */ ! rv = PyBytes_FromStringAndSize(encoded, strlen(encoded)); ! ! /*
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 20/07/12 08:59, Jan Urbański wrote: On 18/07/12 17:17, Heikki Linnakangas wrote: On 14.07.2012 17:50, Jan Urbański wrote: If pg_do_encoding_conversion() throws an error, you don't get a chance to call Py_DECREF() to release the string. Is that a problem? If an error occurs in PLy_traceback(), after incrementing recursion_depth, you don't get a chance to decrement it again. I'm not sure if the Py* function calls can fail, but at least seemingly trivial things like initStringInfo() can throw an out-of-memory error. Of course you're right (on both accounts). Here's a version with a bunch of PG_TRies thrown in. Silly me, playing tricks with postincrements before fully waking up. Here's v3, with a correct inequality test for exceeding the traceback recursion test. J diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c new file mode 100644 index c375ac0..0ad8358 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** static char *get_source_line(const char *** 28,33 --- 28,41 /* + * Guard agains re-entrant calls to PLy_traceback, which can happen if + * traceback formatting functions raise Python errors. + */ + #define TRACEBACK_RECURSION_LIMIT 2 + static int recursion_depth = 0; + + + /* * Emit a PG error or notice, together with any available info about * the current Python error, previously set by PLy_exception_set(). * This should be used to propagate Python errors into PG. If fmt is *** static char *get_source_line(const char *** 38,46 void PLy_elog(int elevel, const char *fmt,...) { ! char *xmsg; ! char *tbmsg; ! int tb_depth; StringInfoData emsg; PyObject *exc, *val, --- 46,54 void PLy_elog(int elevel, const char *fmt,...) { ! char *xmsg = NULL; ! char *tbmsg = NULL; ! int tb_depth = 0; StringInfoData emsg; PyObject *exc, *val, *** PLy_elog(int elevel, const char *fmt,... *** 62,68 } PyErr_Restore(exc, val, tb); ! PLy_traceback(xmsg, tbmsg, tb_depth); if (fmt) { --- 70,90 } PyErr_Restore(exc, val, tb); ! if (recursion_depth++ TRACEBACK_RECURSION_LIMIT) ! { ! PG_TRY(); ! { ! PLy_traceback(xmsg, tbmsg, tb_depth); ! } ! PG_CATCH(); ! { ! recursion_depth--; ! PG_RE_THROW(); ! } ! PG_END_TRY(); ! } ! ! recursion_depth--; if (fmt) { diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c new file mode 100644 index 4aabafc..94c035e *** a/src/pl/plpython/plpy_util.c --- b/src/pl/plpython/plpy_util.c *** PLy_free(void *ptr) *** 61,126 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *rv; ! const char *serverenc; ! /* ! * Map PostgreSQL encoding to a Python encoding name. ! */ ! switch (GetDatabaseEncoding()) { ! case PG_SQL_ASCII: ! /* ! * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's ! * 'ascii' means true 7-bit only ASCII, while PostgreSQL's ! * SQL_ASCII means that anything is allowed, and the system doesn't ! * try to interpret the bytes in any way. But not sure what else ! * to do, and we haven't heard any complaints... ! */ ! serverenc = ascii; ! break; ! case PG_WIN1250: ! serverenc = cp1250; ! break; ! case PG_WIN1251: ! serverenc = cp1251; ! break; ! case PG_WIN1252: ! serverenc = cp1252; ! break; ! case PG_WIN1253: ! serverenc = cp1253; ! break; ! case PG_WIN1254: ! serverenc = cp1254; ! break; ! case PG_WIN1255: ! serverenc = cp1255; ! break; ! case PG_WIN1256: ! serverenc = cp1256; ! break; ! case PG_WIN1257: ! serverenc = cp1257; ! break; ! case PG_WIN1258: ! serverenc = cp1258; ! break; ! case PG_WIN866: ! serverenc = cp866; ! break; ! case PG_WIN874: ! serverenc = cp874; ! break; ! default: ! /* Other encodings have the same name in Python. */ ! serverenc = GetDatabaseEncodingName(); ! break; } ! rv = PyUnicode_AsEncodedString(unicode, serverenc, strict); ! if (rv == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding); return rv; } --- 61,106 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *bytes, *rv; ! char *utf8string, *encoded; ! /* first encode the Python unicode object with UTF-8 */ ! bytes = PyUnicode_AsUTF8String(unicode); ! if (bytes == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to bytes); ! ! utf8string = PyBytes_AsString(bytes); ! if (utf8string == NULL) { ! Py_DECREF(bytes); ! PLy_elog(ERROR, could not extract bytes from encoded string); ! } ! ! /* then convert to server encoding */ ! PG_TRY(); { ! encoded = (char *) pg_do_encoding_conversion( ! (unsigned char *) utf8string, ! strlen(utf8string), ! PG_UTF8, ! GetDatabaseEncoding()); ! } ! PG_CATCH(); ! { !
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 14.07.2012 17:50, Jan Urbański wrote: On 13/07/12 13:38, Jan Urbański wrote: On 12/07/12 11:08, Heikki Linnakangas wrote: On 07.07.2012 00:12, Jan Urbański wrote: So you're in favour of doing unicode - bytes by encoding with UTF-8 and then using the server's encoding functions? Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 should be quite fast, and it would be good to be consistent in the way we do conversions in both directions. I'll implement that than (sorry for not following up on that eariler). Here's a patch that always encodes Python unicode objects using UTF-8 and then uses Postgres's internal functions to produce bytes in the server encoding. Thanks. If pg_do_encoding_conversion() throws an error, you don't get a chance to call Py_DECREF() to release the string. Is that a problem? If an error occurs in PLy_traceback(), after incrementing recursion_depth, you don't get a chance to decrement it again. I'm not sure if the Py* function calls can fail, but at least seemingly trivial things like initStringInfo() can throw an out-of-memory error. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 13/07/12 13:38, Jan Urbański wrote: On 12/07/12 11:08, Heikki Linnakangas wrote: On 07.07.2012 00:12, Jan Urbański wrote: So you're in favour of doing unicode - bytes by encoding with UTF-8 and then using the server's encoding functions? Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 should be quite fast, and it would be good to be consistent in the way we do conversions in both directions. I'll implement that than (sorry for not following up on that eariler). Here's a patch that always encodes Python unicode objects using UTF-8 and then uses Postgres's internal functions to produce bytes in the server encoding. Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c new file mode 100644 index c375ac0..c2b3cb8 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** static char *get_source_line(const char *** 28,33 --- 28,41 /* + * Guard agains re-entrant calls to PLy_traceback, which can happen if + * traceback formatting functions raise Python errors. + */ + #define TRACEBACK_RECURSION_LIMIT 2 + static int recursion_depth = 0; + + + /* * Emit a PG error or notice, together with any available info about * the current Python error, previously set by PLy_exception_set(). * This should be used to propagate Python errors into PG. If fmt is *** PLy_traceback(char **xmsg, char **tbmsg, *** 147,166 StringInfoData xstr; StringInfoData tbstr; /* * get the current exception */ PyErr_Fetch(e, v, tb); /* ! * oops, no exception, return */ ! if (e == NULL) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; return; } --- 155,177 StringInfoData xstr; StringInfoData tbstr; + recursion_depth++; + /* * get the current exception */ PyErr_Fetch(e, v, tb); /* ! * oops, no exception or recursion depth exceeded, return */ ! if (e == NULL || recursion_depth TRACEBACK_RECURSION_LIMIT) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; + recursion_depth--; return; } *** PLy_traceback(char **xmsg, char **tbmsg, *** 326,331 --- 337,344 (*tb_depth)++; } + recursion_depth--; + /* Return the traceback. */ *tbmsg = tbstr.data; diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c new file mode 100644 index 4aabafc..fe43312 *** a/src/pl/plpython/plpy_util.c --- b/src/pl/plpython/plpy_util.c *** PLy_free(void *ptr) *** 61,126 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *rv; ! const char *serverenc; ! /* ! * Map PostgreSQL encoding to a Python encoding name. ! */ ! switch (GetDatabaseEncoding()) { ! case PG_SQL_ASCII: ! /* ! * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's ! * 'ascii' means true 7-bit only ASCII, while PostgreSQL's ! * SQL_ASCII means that anything is allowed, and the system doesn't ! * try to interpret the bytes in any way. But not sure what else ! * to do, and we haven't heard any complaints... ! */ ! serverenc = ascii; ! break; ! case PG_WIN1250: ! serverenc = cp1250; ! break; ! case PG_WIN1251: ! serverenc = cp1251; ! break; ! case PG_WIN1252: ! serverenc = cp1252; ! break; ! case PG_WIN1253: ! serverenc = cp1253; ! break; ! case PG_WIN1254: ! serverenc = cp1254; ! break; ! case PG_WIN1255: ! serverenc = cp1255; ! break; ! case PG_WIN1256: ! serverenc = cp1256; ! break; ! case PG_WIN1257: ! serverenc = cp1257; ! break; ! case PG_WIN1258: ! serverenc = cp1258; ! break; ! case PG_WIN866: ! serverenc = cp866; ! break; ! case PG_WIN874: ! serverenc = cp874; ! break; ! default: ! /* Other encodings have the same name in Python. */ ! serverenc = GetDatabaseEncodingName(); ! break; } ! rv = PyUnicode_AsEncodedString(unicode, serverenc, strict); ! if (rv == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding); return rv; } --- 61,96 PyObject * PLyUnicode_Bytes(PyObject *unicode) { ! PyObject *bytes, *rv; ! char *utf8string, *encoded; ! /* first encode the Python unicode object with UTF-8 */ ! bytes = PyUnicode_AsUTF8String(unicode); ! if (bytes == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to bytes); ! ! utf8string = PyBytes_AsString(bytes); ! if (utf8string == NULL) { ! Py_DECREF(bytes); ! PLy_elog(ERROR, could not extract bytes from encoded string); ! } ! ! /* then convert to server encoding */ ! encoded = (char *) pg_do_encoding_conversion((unsigned char *) utf8string, ! strlen(utf8string), ! PG_UTF8, ! GetDatabaseEncoding()); ! ! /* finally, build a bytes object in the server encoding */ ! rv = PyBytes_FromStringAndSize(encoded,
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 12/07/12 11:08, Heikki Linnakangas wrote: On 07.07.2012 00:12, Jan Urbański wrote: On 06/07/12 22:47, Peter Eisentraut wrote: On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote: What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? We could convert to UTF-8, and use the PostgreSQL functions to convert from UTF-8 to the server encoding. Double conversion might be slow, but I think it would be better than failing. Actually, we already do the other direction that way (PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to always use this. I would hesitate to use this as a kind of fallback, because then we would sometimes be using PostgreSQL's recoding tables and sometimes Python's recoding tables, which could became confusing. So you're in favour of doing unicode - bytes by encoding with UTF-8 and then using the server's encoding functions? Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 should be quite fast, and it would be good to be consistent in the way we do conversions in both directions. I'll implement that than (sorry for not following up on that eariler). J -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 07.07.2012 00:12, Jan Urbański wrote: On 06/07/12 22:47, Peter Eisentraut wrote: On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote: What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? We could convert to UTF-8, and use the PostgreSQL functions to convert from UTF-8 to the server encoding. Double conversion might be slow, but I think it would be better than failing. Actually, we already do the other direction that way (PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to always use this. I would hesitate to use this as a kind of fallback, because then we would sometimes be using PostgreSQL's recoding tables and sometimes Python's recoding tables, which could became confusing. So you're in favour of doing unicode - bytes by encoding with UTF-8 and then using the server's encoding functions? Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 should be quite fast, and it would be good to be consistent in the way we do conversions in both directions. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06.07.2012 00:54, Jan Urbański wrote: On 05/07/12 23:30, Peter Eisentraut wrote: On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote: The problem is that PLyUnicode_Bytes is (via an ifdef) used as PyString_ToString on Python3, which means that there are numerous call sites and new ones might appear in any moment. I'm not that keen on invoking the traceback machinery on low-level encoding errors. Why not? Because it can lead to recursion errors, like the one this patch was supposed to fix. The traceback machinery calls into the encoding functions, because it converts Python strings (like function names) into C strings. In the backend elog routines, there is a global variable 'recursion_depth', which is incremented when an error-handling routine is entered, and decremented afterwards. Can we use a similar mechinism in PLy_elog() to detect and stop recursion? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06/07/12 10:05, Heikki Linnakangas wrote: On 06.07.2012 00:54, Jan Urbański wrote: On 05/07/12 23:30, Peter Eisentraut wrote: On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote: The problem is that PLyUnicode_Bytes is (via an ifdef) used as PyString_ToString on Python3, which means that there are numerous call sites and new ones might appear in any moment. I'm not that keen on invoking the traceback machinery on low-level encoding errors. Why not? Because it can lead to recursion errors, like the one this patch was supposed to fix. The traceback machinery calls into the encoding functions, because it converts Python strings (like function names) into C strings. In the backend elog routines, there is a global variable 'recursion_depth', which is incremented when an error-handling routine is entered, and decremented afterwards. Can we use a similar mechinism in PLy_elog() to detect and stop recursion? I guess we can, I'll try to do some tests in order to see if there's an easy user-triggereable way of causing PLy_elog to recurse and if not then a guard like this should be enough as a safety measure against as yet unknown conditions (as opposed to something we expect to happen regularly). Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06/07/12 10:14, Jan Urbański wrote: On 06/07/12 10:05, Heikki Linnakangas wrote: In the backend elog routines, there is a global variable 'recursion_depth', which is incremented when an error-handling routine is entered, and decremented afterwards. Can we use a similar mechinism in PLy_elog() to detect and stop recursion? I guess we can, I'll try to do some tests in order to see if there's an easy user-triggereable way of causing PLy_elog to recurse and if not then a guard like this should be enough as a safety measure against as yet unknown conditions (as opposed to something we expect to happen regularly). Attached is a patch that stores the recursion level of PLy_traceback and prevents it from running if it's too deep (PLy_traceback is the one doing heavy lifting, that's why I chose to put the logic to skip running there). I tried a few things and was not able to easily invoke the infinite recursion condition, but I did notice that there are two more encodings that have different names in Postgres and in Python (KOI8-R and KOI8-U) and added them to the switch. There's still trouble with EUC_TW and MULE_INTERNAL which don't have Python equivalents. EUC-TW has been discussed in http://bugs.python.org/issue2066 and rejected (see http://bugs.python.org/issue2066#msg113731). If you use any of these encodings, you *will* get into the recursion trouble described eariler, just as before the path you'd get into it with CP1252 as your encoding. What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? Cheers, Jan diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c new file mode 100644 index c375ac0..c2b3cb8 *** a/src/pl/plpython/plpy_elog.c --- b/src/pl/plpython/plpy_elog.c *** static char *get_source_line(const char *** 28,33 --- 28,41 /* + * Guard agains re-entrant calls to PLy_traceback, which can happen if + * traceback formatting functions raise Python errors. + */ + #define TRACEBACK_RECURSION_LIMIT 2 + static int recursion_depth = 0; + + + /* * Emit a PG error or notice, together with any available info about * the current Python error, previously set by PLy_exception_set(). * This should be used to propagate Python errors into PG. If fmt is *** PLy_traceback(char **xmsg, char **tbmsg, *** 147,166 StringInfoData xstr; StringInfoData tbstr; /* * get the current exception */ PyErr_Fetch(e, v, tb); /* ! * oops, no exception, return */ ! if (e == NULL) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; return; } --- 155,177 StringInfoData xstr; StringInfoData tbstr; + recursion_depth++; + /* * get the current exception */ PyErr_Fetch(e, v, tb); /* ! * oops, no exception or recursion depth exceeded, return */ ! if (e == NULL || recursion_depth TRACEBACK_RECURSION_LIMIT) { *xmsg = NULL; *tbmsg = NULL; *tb_depth = 0; + recursion_depth--; return; } *** PLy_traceback(char **xmsg, char **tbmsg, *** 326,331 --- 337,344 (*tb_depth)++; } + recursion_depth--; + /* Return the traceback. */ *tbmsg = tbstr.data; diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c new file mode 100644 index bf29532..ea4ecdf *** a/src/pl/plpython/plpy_util.c --- b/src/pl/plpython/plpy_util.c *** PLyUnicode_Bytes(PyObject *unicode) *** 112,117 --- 112,123 case PG_WIN874: serverenc = cp874; break; + case PG_KOI8R: + serverenc = koi8-r; + break; + case PG_KOI8U: + serverenc = koi8-u; + break; default: /* Other encodings have the same name in Python. */ serverenc = GetDatabaseEncodingName(); *** PLyUnicode_Bytes(PyObject *unicode) *** 120,135 rv = PyUnicode_AsEncodedString(unicode, serverenc, strict); if (rv == NULL) ! { ! /* ! * Use a plain ereport instead of PLy_elog to avoid recursion, if ! * the traceback formatting functions try to do unicode to bytes ! * conversion again. ! */ ! ereport(ERROR, ! (errcode(ERRCODE_INTERNAL_ERROR), ! errmsg(could not convert Python Unicode object to PostgreSQL server encoding))); ! } return rv; } --- 126,132 rv = PyUnicode_AsEncodedString(unicode, serverenc, strict); if (rv == NULL) ! PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding); return rv; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06.07.2012 18:01, Jan Urbański wrote: There's still trouble with EUC_TW and MULE_INTERNAL which don't have Python equivalents. EUC-TW has been discussed in http://bugs.python.org/issue2066 and rejected (see http://bugs.python.org/issue2066#msg113731). If you use any of these encodings, you *will* get into the recursion trouble described eariler, just as before the path you'd get into it with CP1252 as your encoding. What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? We could convert to UTF-8, and use the PostgreSQL functions to convert from UTF-8 to the server encoding. Double conversion might be slow, but I think it would be better than failing. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote: What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? We could convert to UTF-8, and use the PostgreSQL functions to convert from UTF-8 to the server encoding. Double conversion might be slow, but I think it would be better than failing. Actually, we already do the other direction that way (PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to always use this. I would hesitate to use this as a kind of fallback, because then we would sometimes be using PostgreSQL's recoding tables and sometimes Python's recoding tables, which could became confusing. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 06/07/12 22:47, Peter Eisentraut wrote: On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote: What shall we do about those? Ignore them? Document that if you're sing one of these encodings then PL/Python with Python 2 will be crippled and with Python 3 just won't work? We could convert to UTF-8, and use the PostgreSQL functions to convert from UTF-8 to the server encoding. Double conversion might be slow, but I think it would be better than failing. Actually, we already do the other direction that way (PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to always use this. I would hesitate to use this as a kind of fallback, because then we would sometimes be using PostgreSQL's recoding tables and sometimes Python's recoding tables, which could became confusing. So you're in favour of doing unicode - bytes by encoding with UTF-8 and then using the server's encoding functions? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote: The problem is that PLyUnicode_Bytes is (via an ifdef) used as PyString_ToString on Python3, which means that there are numerous call sites and new ones might appear in any moment. I'm not that keen on invoking the traceback machinery on low-level encoding errors. Why not? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
On 05/07/12 23:30, Peter Eisentraut wrote: On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote: The problem is that PLyUnicode_Bytes is (via an ifdef) used as PyString_ToString on Python3, which means that there are numerous call sites and new ones might appear in any moment. I'm not that keen on invoking the traceback machinery on low-level encoding errors. Why not? Because it can lead to recursion errors, like the one this patch was supposed to fix. The traceback machinery calls into the encoding functions, because it converts Python strings (like function names) into C strings. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers