Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-08-09 Thread Jan Urbański

On 06/08/12 13:59, Heikki Linnakangas wrote:

On 20.07.2012 10:13, Jan Urbański wrote:

On 20/07/12 08:59, Jan Urbański wrote:

On 18/07/12 17:17, Heikki Linnakangas wrote:

On 14.07.2012 17:50, Jan Urbański wrote:

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.


Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.


Silly me, playing tricks with postincrements before fully waking up.

Here's v3, with a correct inequality test for exceeding the traceback
recursion test.


Committed the convert-via-UTF-8 part of this. I'll take a closer look at
the recursion check next.


Thanks!

Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-08-06 Thread Heikki Linnakangas

On 20.07.2012 10:13, Jan Urbański wrote:

On 20/07/12 08:59, Jan Urbański wrote:

On 18/07/12 17:17, Heikki Linnakangas wrote:

On 14.07.2012 17:50, Jan Urbański wrote:

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.


Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.


Silly me, playing tricks with postincrements before fully waking up.

Here's v3, with a correct inequality test for exceeding the traceback
recursion test.


Committed the convert-via-UTF-8 part of this. I'll take a closer look at 
the recursion check next.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-20 Thread Jan Urbański

On 18/07/12 17:17, Heikki Linnakangas wrote:

On 14.07.2012 17:50, Jan Urbański wrote:

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.


Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.

Cheers,
Jan
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
new file mode 100644
index c375ac0..9944f72
*** a/src/pl/plpython/plpy_elog.c
--- b/src/pl/plpython/plpy_elog.c
*** static char *get_source_line(const char
*** 28,33 
--- 28,41 
  
  
  /*
+  * Guard agains re-entrant calls to PLy_traceback, which can happen if
+  * traceback formatting functions raise Python errors.
+  */
+ #define TRACEBACK_RECURSION_LIMIT 2
+ static int	recursion_depth = 0;
+ 
+ 
+ /*
   * Emit a PG error or notice, together with any available info about
   * the current Python error, previously set by PLy_exception_set().
   * This should be used to propagate Python errors into PG.	If fmt is
*** static char *get_source_line(const char
*** 38,46 
  void
  PLy_elog(int elevel, const char *fmt,...)
  {
! 	char	   *xmsg;
! 	char	   *tbmsg;
! 	int			tb_depth;
  	StringInfoData emsg;
  	PyObject   *exc,
  			   *val,
--- 46,54 
  void
  PLy_elog(int elevel, const char *fmt,...)
  {
! 	char	   *xmsg = NULL;
! 	char	   *tbmsg = NULL;
! 	int			tb_depth = 0;
  	StringInfoData emsg;
  	PyObject   *exc,
  			   *val,
*** PLy_elog(int elevel, const char *fmt,...
*** 62,68 
  	}
  	PyErr_Restore(exc, val, tb);
  
! 	PLy_traceback(xmsg, tbmsg, tb_depth);
  
  	if (fmt)
  	{
--- 70,90 
  	}
  	PyErr_Restore(exc, val, tb);
  
! 	if (recursion_depth++ = TRACEBACK_RECURSION_LIMIT)
! 	{
! 		PG_TRY();
! 		{
! 			PLy_traceback(xmsg, tbmsg, tb_depth);
! 		}
! 		PG_CATCH();
! 		{
! 			recursion_depth--;
! 			PG_RE_THROW();
! 		}
! 		PG_END_TRY();
! 	}
! 
! 	recursion_depth--;
  
  	if (fmt)
  	{
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
new file mode 100644
index 4aabafc..94c035e
*** a/src/pl/plpython/plpy_util.c
--- b/src/pl/plpython/plpy_util.c
*** PLy_free(void *ptr)
*** 61,126 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject   *rv;
! 	const char *serverenc;
  
! 	/*
! 	 * Map PostgreSQL encoding to a Python encoding name.
! 	 */
! 	switch (GetDatabaseEncoding())
  	{
! 		case PG_SQL_ASCII:
! 			/*
! 			 * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's
! 			 * 'ascii' means true 7-bit only ASCII, while PostgreSQL's
! 			 * SQL_ASCII means that anything is allowed, and the system doesn't
! 			 * try to interpret the bytes in any way. But not sure what else
! 			 * to do, and we haven't heard any complaints...
! 			 */
! 			serverenc = ascii;
! 			break;
! 		case PG_WIN1250:
! 			serverenc = cp1250;
! 			break;
! 		case PG_WIN1251:
! 			serverenc = cp1251;
! 			break;
! 		case PG_WIN1252:
! 			serverenc = cp1252;
! 			break;
! 		case PG_WIN1253:
! 			serverenc = cp1253;
! 			break;
! 		case PG_WIN1254:
! 			serverenc = cp1254;
! 			break;
! 		case PG_WIN1255:
! 			serverenc = cp1255;
! 			break;
! 		case PG_WIN1256:
! 			serverenc = cp1256;
! 			break;
! 		case PG_WIN1257:
! 			serverenc = cp1257;
! 			break;
! 		case PG_WIN1258:
! 			serverenc = cp1258;
! 			break;
! 		case PG_WIN866:
! 			serverenc = cp866;
! 			break;
! 		case PG_WIN874:
! 			serverenc = cp874;
! 			break;
! 		default:
! 			/* Other encodings have the same name in Python. */
! 			serverenc = GetDatabaseEncodingName();
! 			break;
  	}
  
! 	rv = PyUnicode_AsEncodedString(unicode, serverenc, strict);
! 	if (rv == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding);
  	return rv;
  }
  
--- 61,106 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject	*bytes, *rv;
! 	char		*utf8string, *encoded;
  
! 	/* first encode the Python unicode object with UTF-8 */
! 	bytes = PyUnicode_AsUTF8String(unicode);
! 	if (bytes == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to bytes);
! 
! 	utf8string = PyBytes_AsString(bytes);
! 	if (utf8string == NULL) {
! 		Py_DECREF(bytes);
! 		PLy_elog(ERROR, could not extract bytes from encoded string);
! 	}
! 
! 	/* then convert to server encoding */
! 	PG_TRY();
  	{
! 		encoded = (char *) pg_do_encoding_conversion(
! 			(unsigned char *) utf8string,
! 			strlen(utf8string),
! 			PG_UTF8,
! 			GetDatabaseEncoding());
! 	}
! 	PG_CATCH();
! 	{
! 		Py_DECREF(bytes);
! 		PG_RE_THROW();
  	}
+ 	PG_END_TRY();
  
! 	/* finally, build a bytes object in the server encoding */
! 	rv = PyBytes_FromStringAndSize(encoded, strlen(encoded));
! 
! 	/* 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-20 Thread Jan Urbański

On 20/07/12 08:59, Jan Urbański wrote:

On 18/07/12 17:17, Heikki Linnakangas wrote:

On 14.07.2012 17:50, Jan Urbański wrote:

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.


Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.


Silly me, playing tricks with postincrements before fully waking up.

Here's v3, with a correct inequality test for exceeding the traceback 
recursion test.


J
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
new file mode 100644
index c375ac0..0ad8358
*** a/src/pl/plpython/plpy_elog.c
--- b/src/pl/plpython/plpy_elog.c
*** static char *get_source_line(const char
*** 28,33 
--- 28,41 
  
  
  /*
+  * Guard agains re-entrant calls to PLy_traceback, which can happen if
+  * traceback formatting functions raise Python errors.
+  */
+ #define TRACEBACK_RECURSION_LIMIT 2
+ static int	recursion_depth = 0;
+ 
+ 
+ /*
   * Emit a PG error or notice, together with any available info about
   * the current Python error, previously set by PLy_exception_set().
   * This should be used to propagate Python errors into PG.	If fmt is
*** static char *get_source_line(const char
*** 38,46 
  void
  PLy_elog(int elevel, const char *fmt,...)
  {
! 	char	   *xmsg;
! 	char	   *tbmsg;
! 	int			tb_depth;
  	StringInfoData emsg;
  	PyObject   *exc,
  			   *val,
--- 46,54 
  void
  PLy_elog(int elevel, const char *fmt,...)
  {
! 	char	   *xmsg = NULL;
! 	char	   *tbmsg = NULL;
! 	int			tb_depth = 0;
  	StringInfoData emsg;
  	PyObject   *exc,
  			   *val,
*** PLy_elog(int elevel, const char *fmt,...
*** 62,68 
  	}
  	PyErr_Restore(exc, val, tb);
  
! 	PLy_traceback(xmsg, tbmsg, tb_depth);
  
  	if (fmt)
  	{
--- 70,90 
  	}
  	PyErr_Restore(exc, val, tb);
  
! 	if (recursion_depth++  TRACEBACK_RECURSION_LIMIT)
! 	{
! 		PG_TRY();
! 		{
! 			PLy_traceback(xmsg, tbmsg, tb_depth);
! 		}
! 		PG_CATCH();
! 		{
! 			recursion_depth--;
! 			PG_RE_THROW();
! 		}
! 		PG_END_TRY();
! 	}
! 
! 	recursion_depth--;
  
  	if (fmt)
  	{
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
new file mode 100644
index 4aabafc..94c035e
*** a/src/pl/plpython/plpy_util.c
--- b/src/pl/plpython/plpy_util.c
*** PLy_free(void *ptr)
*** 61,126 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject   *rv;
! 	const char *serverenc;
  
! 	/*
! 	 * Map PostgreSQL encoding to a Python encoding name.
! 	 */
! 	switch (GetDatabaseEncoding())
  	{
! 		case PG_SQL_ASCII:
! 			/*
! 			 * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's
! 			 * 'ascii' means true 7-bit only ASCII, while PostgreSQL's
! 			 * SQL_ASCII means that anything is allowed, and the system doesn't
! 			 * try to interpret the bytes in any way. But not sure what else
! 			 * to do, and we haven't heard any complaints...
! 			 */
! 			serverenc = ascii;
! 			break;
! 		case PG_WIN1250:
! 			serverenc = cp1250;
! 			break;
! 		case PG_WIN1251:
! 			serverenc = cp1251;
! 			break;
! 		case PG_WIN1252:
! 			serverenc = cp1252;
! 			break;
! 		case PG_WIN1253:
! 			serverenc = cp1253;
! 			break;
! 		case PG_WIN1254:
! 			serverenc = cp1254;
! 			break;
! 		case PG_WIN1255:
! 			serverenc = cp1255;
! 			break;
! 		case PG_WIN1256:
! 			serverenc = cp1256;
! 			break;
! 		case PG_WIN1257:
! 			serverenc = cp1257;
! 			break;
! 		case PG_WIN1258:
! 			serverenc = cp1258;
! 			break;
! 		case PG_WIN866:
! 			serverenc = cp866;
! 			break;
! 		case PG_WIN874:
! 			serverenc = cp874;
! 			break;
! 		default:
! 			/* Other encodings have the same name in Python. */
! 			serverenc = GetDatabaseEncodingName();
! 			break;
  	}
  
! 	rv = PyUnicode_AsEncodedString(unicode, serverenc, strict);
! 	if (rv == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding);
  	return rv;
  }
  
--- 61,106 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject	*bytes, *rv;
! 	char		*utf8string, *encoded;
  
! 	/* first encode the Python unicode object with UTF-8 */
! 	bytes = PyUnicode_AsUTF8String(unicode);
! 	if (bytes == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to bytes);
! 
! 	utf8string = PyBytes_AsString(bytes);
! 	if (utf8string == NULL) {
! 		Py_DECREF(bytes);
! 		PLy_elog(ERROR, could not extract bytes from encoded string);
! 	}
! 
! 	/* then convert to server encoding */
! 	PG_TRY();
  	{
! 		encoded = (char *) pg_do_encoding_conversion(
! 			(unsigned char *) utf8string,
! 			strlen(utf8string),
! 			PG_UTF8,
! 			GetDatabaseEncoding());
! 	}
! 	PG_CATCH();
! 	{
! 		

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-18 Thread Heikki Linnakangas

On 14.07.2012 17:50, Jan Urbański wrote:

On 13/07/12 13:38, Jan Urbański wrote:

On 12/07/12 11:08, Heikki Linnakangas wrote:

On 07.07.2012 00:12, Jan Urbański wrote:

So you're in favour of doing unicode - bytes by encoding with UTF-8
and
then using the server's encoding functions?


Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
should be quite fast, and it would be good to be consistent in the way
we do conversions in both directions.



I'll implement that than (sorry for not following up on that eariler).


Here's a patch that always encodes Python unicode objects using UTF-8
and then uses Postgres's internal functions to produce bytes in the
server encoding.


Thanks.

If pg_do_encoding_conversion() throws an error, you don't get a chance 
to call Py_DECREF() to release the string. Is that a problem?


If an error occurs in PLy_traceback(), after incrementing 
recursion_depth, you don't get a chance to decrement it again. I'm not 
sure if the Py* function calls can fail, but at least seemingly trivial 
things like initStringInfo() can throw an out-of-memory error.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-14 Thread Jan Urbański

On 13/07/12 13:38, Jan Urbański wrote:

On 12/07/12 11:08, Heikki Linnakangas wrote:

On 07.07.2012 00:12, Jan Urbański wrote:

So you're in favour of doing unicode - bytes by encoding with UTF-8 and
then using the server's encoding functions?


Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
should be quite fast, and it would be good to be consistent in the way
we do conversions in both directions.



I'll implement that than (sorry for not following up on that eariler).


Here's a patch that always encodes Python unicode objects using UTF-8 
and then uses Postgres's internal functions to produce bytes in the 
server encoding.


Cheers,
Jan
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
new file mode 100644
index c375ac0..c2b3cb8
*** a/src/pl/plpython/plpy_elog.c
--- b/src/pl/plpython/plpy_elog.c
*** static char *get_source_line(const char
*** 28,33 
--- 28,41 
  
  
  /*
+  * Guard agains re-entrant calls to PLy_traceback, which can happen if
+  * traceback formatting functions raise Python errors.
+  */
+ #define TRACEBACK_RECURSION_LIMIT 2
+ static int	recursion_depth = 0;
+ 
+ 
+ /*
   * Emit a PG error or notice, together with any available info about
   * the current Python error, previously set by PLy_exception_set().
   * This should be used to propagate Python errors into PG.	If fmt is
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 147,166 
  	StringInfoData xstr;
  	StringInfoData tbstr;
  
  	/*
  	 * get the current exception
  	 */
  	PyErr_Fetch(e, v, tb);
  
  	/*
! 	 * oops, no exception, return
  	 */
! 	if (e == NULL)
  	{
  		*xmsg = NULL;
  		*tbmsg = NULL;
  		*tb_depth = 0;
  
  		return;
  	}
  
--- 155,177 
  	StringInfoData xstr;
  	StringInfoData tbstr;
  
+ 	recursion_depth++;
+ 
  	/*
  	 * get the current exception
  	 */
  	PyErr_Fetch(e, v, tb);
  
  	/*
! 	 * oops, no exception or recursion depth exceeded, return
  	 */
! 	if (e == NULL || recursion_depth  TRACEBACK_RECURSION_LIMIT)
  	{
  		*xmsg = NULL;
  		*tbmsg = NULL;
  		*tb_depth = 0;
  
+ 		recursion_depth--;
  		return;
  	}
  
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 326,331 
--- 337,344 
  		(*tb_depth)++;
  	}
  
+ 	recursion_depth--;
+ 
  	/* Return the traceback. */
  	*tbmsg = tbstr.data;
  
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
new file mode 100644
index 4aabafc..fe43312
*** a/src/pl/plpython/plpy_util.c
--- b/src/pl/plpython/plpy_util.c
*** PLy_free(void *ptr)
*** 61,126 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject   *rv;
! 	const char *serverenc;
  
! 	/*
! 	 * Map PostgreSQL encoding to a Python encoding name.
! 	 */
! 	switch (GetDatabaseEncoding())
  	{
! 		case PG_SQL_ASCII:
! 			/*
! 			 * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's
! 			 * 'ascii' means true 7-bit only ASCII, while PostgreSQL's
! 			 * SQL_ASCII means that anything is allowed, and the system doesn't
! 			 * try to interpret the bytes in any way. But not sure what else
! 			 * to do, and we haven't heard any complaints...
! 			 */
! 			serverenc = ascii;
! 			break;
! 		case PG_WIN1250:
! 			serverenc = cp1250;
! 			break;
! 		case PG_WIN1251:
! 			serverenc = cp1251;
! 			break;
! 		case PG_WIN1252:
! 			serverenc = cp1252;
! 			break;
! 		case PG_WIN1253:
! 			serverenc = cp1253;
! 			break;
! 		case PG_WIN1254:
! 			serverenc = cp1254;
! 			break;
! 		case PG_WIN1255:
! 			serverenc = cp1255;
! 			break;
! 		case PG_WIN1256:
! 			serverenc = cp1256;
! 			break;
! 		case PG_WIN1257:
! 			serverenc = cp1257;
! 			break;
! 		case PG_WIN1258:
! 			serverenc = cp1258;
! 			break;
! 		case PG_WIN866:
! 			serverenc = cp866;
! 			break;
! 		case PG_WIN874:
! 			serverenc = cp874;
! 			break;
! 		default:
! 			/* Other encodings have the same name in Python. */
! 			serverenc = GetDatabaseEncodingName();
! 			break;
  	}
  
! 	rv = PyUnicode_AsEncodedString(unicode, serverenc, strict);
! 	if (rv == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding);
  	return rv;
  }
  
--- 61,96 
  PyObject *
  PLyUnicode_Bytes(PyObject *unicode)
  {
! 	PyObject	*bytes, *rv;
! 	char		*utf8string, *encoded;
  
! 	/* first encode the Python unicode object with UTF-8 */
! 	bytes = PyUnicode_AsUTF8String(unicode);
! 	if (bytes == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to bytes);
! 
! 	utf8string = PyBytes_AsString(bytes);
! 	if (utf8string == NULL) {
! 		Py_DECREF(bytes);
! 		PLy_elog(ERROR, could not extract bytes from encoded string);
! 	}
! 
! 	/* then convert to server encoding */
! 	encoded = (char *) pg_do_encoding_conversion((unsigned char *) utf8string,
!  strlen(utf8string),
!  PG_UTF8,
!  GetDatabaseEncoding());
! 
! 	/* finally, build a bytes object in the server encoding */
! 	rv = PyBytes_FromStringAndSize(encoded, 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-13 Thread Jan Urbański

On 12/07/12 11:08, Heikki Linnakangas wrote:

On 07.07.2012 00:12, Jan Urbański wrote:

On 06/07/12 22:47, Peter Eisentraut wrote:

On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:

What shall we do about those? Ignore them? Document that if you're
sing
one of these encodings then PL/Python with Python 2 will be crippled
and
with Python 3 just won't work?


We could convert to UTF-8, and use the PostgreSQL functions to convert
from UTF-8 to the server encoding. Double conversion might be slow, but
I think it would be better than failing.


Actually, we already do the other direction that way
(PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to
always use this.

I would hesitate to use this as a kind of fallback, because then we
would sometimes be using PostgreSQL's recoding tables and sometimes
Python's recoding tables, which could became confusing.


So you're in favour of doing unicode - bytes by encoding with UTF-8 and
then using the server's encoding functions?


Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
should be quite fast, and it would be good to be consistent in the way
we do conversions in both directions.



I'll implement that than (sorry for not following up on that eariler).

J

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-12 Thread Heikki Linnakangas

On 07.07.2012 00:12, Jan Urbański wrote:

On 06/07/12 22:47, Peter Eisentraut wrote:

On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:

What shall we do about those? Ignore them? Document that if you're sing
one of these encodings then PL/Python with Python 2 will be crippled
and
with Python 3 just won't work?


We could convert to UTF-8, and use the PostgreSQL functions to convert
from UTF-8 to the server encoding. Double conversion might be slow, but
I think it would be better than failing.


Actually, we already do the other direction that way
(PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to
always use this.

I would hesitate to use this as a kind of fallback, because then we
would sometimes be using PostgreSQL's recoding tables and sometimes
Python's recoding tables, which could became confusing.


So you're in favour of doing unicode - bytes by encoding with UTF-8 and
then using the server's encoding functions?


Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2 
should be quite fast, and it would be good to be consistent in the way 
we do conversions in both directions.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Heikki Linnakangas

On 06.07.2012 00:54, Jan Urbański wrote:

On 05/07/12 23:30, Peter Eisentraut wrote:

On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:

The problem is that PLyUnicode_Bytes is (via an ifdef) used as
PyString_ToString on Python3, which means that there are numerous call
sites and new ones might appear in any moment. I'm not that keen on
invoking the traceback machinery on low-level encoding errors.


Why not?


Because it can lead to recursion errors, like the one this patch was
supposed to fix. The traceback machinery calls into the encoding
functions, because it converts Python strings (like function names) into
C strings.


In the backend elog routines, there is a global variable 
'recursion_depth', which is incremented when an error-handling routine 
is entered, and decremented afterwards. Can we use a similar mechinism 
in PLy_elog() to detect and stop recursion?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Jan Urbański

On 06/07/12 10:05, Heikki Linnakangas wrote:

On 06.07.2012 00:54, Jan Urbański wrote:

On 05/07/12 23:30, Peter Eisentraut wrote:

On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:

The problem is that PLyUnicode_Bytes is (via an ifdef) used as
PyString_ToString on Python3, which means that there are numerous call
sites and new ones might appear in any moment. I'm not that keen on
invoking the traceback machinery on low-level encoding errors.


Why not?


Because it can lead to recursion errors, like the one this patch was
supposed to fix. The traceback machinery calls into the encoding
functions, because it converts Python strings (like function names) into
C strings.


In the backend elog routines, there is a global variable
'recursion_depth', which is incremented when an error-handling routine
is entered, and decremented afterwards. Can we use a similar mechinism
in PLy_elog() to detect and stop recursion?


I guess we can, I'll try to do some tests in order to see if there's an 
easy user-triggereable way of causing PLy_elog to recurse and if not 
then a guard like this should be enough as a safety measure against as 
yet unknown conditions (as opposed to something we expect to happen 
regularly).


Cheers,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Jan Urbański

On 06/07/12 10:14, Jan Urbański wrote:

On 06/07/12 10:05, Heikki Linnakangas wrote:

In the backend elog routines, there is a global variable
'recursion_depth', which is incremented when an error-handling routine
is entered, and decremented afterwards. Can we use a similar mechinism
in PLy_elog() to detect and stop recursion?


I guess we can, I'll try to do some tests in order to see if there's an
easy user-triggereable way of causing PLy_elog to recurse and if not
then a guard like this should be enough as a safety measure against as
yet unknown conditions (as opposed to something we expect to happen
regularly).


Attached is a patch that stores the recursion level of PLy_traceback and 
prevents it from running if it's too deep (PLy_traceback is the one 
doing heavy lifting, that's why I chose to put the logic to skip running 
there).


I tried a few things and was not able to easily invoke the infinite 
recursion condition, but I did notice that there are two more encodings 
that have different names in Postgres and in Python (KOI8-R and KOI8-U) 
and added them to the switch.


There's still trouble with EUC_TW and MULE_INTERNAL which don't have 
Python equivalents. EUC-TW has been discussed in 
http://bugs.python.org/issue2066 and rejected (see 
http://bugs.python.org/issue2066#msg113731).


If you use any of these encodings, you *will* get into the recursion 
trouble described eariler, just as before the path you'd get into it 
with CP1252 as your encoding.


What shall we do about those? Ignore them? Document that if you're sing 
one of these encodings then PL/Python with Python 2 will be crippled and 
with Python 3 just won't work?


Cheers,
Jan
diff --git a/src/pl/plpython/plpy_elog.c b/src/pl/plpython/plpy_elog.c
new file mode 100644
index c375ac0..c2b3cb8
*** a/src/pl/plpython/plpy_elog.c
--- b/src/pl/plpython/plpy_elog.c
*** static char *get_source_line(const char
*** 28,33 
--- 28,41 
  
  
  /*
+  * Guard agains re-entrant calls to PLy_traceback, which can happen if
+  * traceback formatting functions raise Python errors.
+  */
+ #define TRACEBACK_RECURSION_LIMIT 2
+ static int	recursion_depth = 0;
+ 
+ 
+ /*
   * Emit a PG error or notice, together with any available info about
   * the current Python error, previously set by PLy_exception_set().
   * This should be used to propagate Python errors into PG.	If fmt is
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 147,166 
  	StringInfoData xstr;
  	StringInfoData tbstr;
  
  	/*
  	 * get the current exception
  	 */
  	PyErr_Fetch(e, v, tb);
  
  	/*
! 	 * oops, no exception, return
  	 */
! 	if (e == NULL)
  	{
  		*xmsg = NULL;
  		*tbmsg = NULL;
  		*tb_depth = 0;
  
  		return;
  	}
  
--- 155,177 
  	StringInfoData xstr;
  	StringInfoData tbstr;
  
+ 	recursion_depth++;
+ 
  	/*
  	 * get the current exception
  	 */
  	PyErr_Fetch(e, v, tb);
  
  	/*
! 	 * oops, no exception or recursion depth exceeded, return
  	 */
! 	if (e == NULL || recursion_depth  TRACEBACK_RECURSION_LIMIT)
  	{
  		*xmsg = NULL;
  		*tbmsg = NULL;
  		*tb_depth = 0;
  
+ 		recursion_depth--;
  		return;
  	}
  
*** PLy_traceback(char **xmsg, char **tbmsg,
*** 326,331 
--- 337,344 
  		(*tb_depth)++;
  	}
  
+ 	recursion_depth--;
+ 
  	/* Return the traceback. */
  	*tbmsg = tbstr.data;
  
diff --git a/src/pl/plpython/plpy_util.c b/src/pl/plpython/plpy_util.c
new file mode 100644
index bf29532..ea4ecdf
*** a/src/pl/plpython/plpy_util.c
--- b/src/pl/plpython/plpy_util.c
*** PLyUnicode_Bytes(PyObject *unicode)
*** 112,117 
--- 112,123 
  		case PG_WIN874:
  			serverenc = cp874;
  			break;
+ 		case PG_KOI8R:
+ 			serverenc = koi8-r;
+ 			break;
+ 		case PG_KOI8U:
+ 			serverenc = koi8-u;
+ 			break;
  		default:
  			/* Other encodings have the same name in Python. */
  			serverenc = GetDatabaseEncodingName();
*** PLyUnicode_Bytes(PyObject *unicode)
*** 120,135 
  
  	rv = PyUnicode_AsEncodedString(unicode, serverenc, strict);
  	if (rv == NULL)
! 	{
! 		/*
! 		 * Use a plain ereport instead of PLy_elog to avoid recursion, if
! 		 * the traceback formatting functions try to do unicode to bytes
! 		 * conversion again.
! 		 */
! 		ereport(ERROR,
! (errcode(ERRCODE_INTERNAL_ERROR),
!  errmsg(could not convert Python Unicode object to PostgreSQL server encoding)));
! 	}
  	return rv;
  }
  
--- 126,132 
  
  	rv = PyUnicode_AsEncodedString(unicode, serverenc, strict);
  	if (rv == NULL)
! 		PLy_elog(ERROR, could not convert Python Unicode object to PostgreSQL server encoding);
  	return rv;
  }
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Heikki Linnakangas

On 06.07.2012 18:01, Jan Urbański wrote:

There's still trouble with EUC_TW and MULE_INTERNAL which don't have
Python equivalents. EUC-TW has been discussed in
http://bugs.python.org/issue2066 and rejected (see
http://bugs.python.org/issue2066#msg113731).

If you use any of these encodings, you *will* get into the recursion
trouble described eariler, just as before the path you'd get into it
with CP1252 as your encoding.

What shall we do about those? Ignore them? Document that if you're sing
one of these encodings then PL/Python with Python 2 will be crippled and
with Python 3 just won't work?


We could convert to UTF-8, and use the PostgreSQL functions to convert 
from UTF-8 to the server encoding. Double conversion might be slow, but 
I think it would be better than failing.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Peter Eisentraut
On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:
  What shall we do about those? Ignore them? Document that if you're sing
  one of these encodings then PL/Python with Python 2 will be crippled and
  with Python 3 just won't work?
 
 We could convert to UTF-8, and use the PostgreSQL functions to convert 
 from UTF-8 to the server encoding. Double conversion might be slow, but 
 I think it would be better than failing. 

Actually, we already do the other direction that way
(PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to
always use this.

I would hesitate to use this as a kind of fallback, because then we
would sometimes be using PostgreSQL's recoding tables and sometimes
Python's recoding tables, which could became confusing.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-06 Thread Jan Urbański

On 06/07/12 22:47, Peter Eisentraut wrote:

On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:

What shall we do about those? Ignore them? Document that if you're sing
one of these encodings then PL/Python with Python 2 will be crippled and
with Python 3 just won't work?


We could convert to UTF-8, and use the PostgreSQL functions to convert
from UTF-8 to the server encoding. Double conversion might be slow, but
I think it would be better than failing.


Actually, we already do the other direction that way
(PLyUnicode_FromStringAndSize) , so maybe it would be more consistent to
always use this.

I would hesitate to use this as a kind of fallback, because then we
would sometimes be using PostgreSQL's recoding tables and sometimes
Python's recoding tables, which could became confusing.


So you're in favour of doing unicode - bytes by encoding with UTF-8 and 
then using the server's encoding functions?


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-05 Thread Peter Eisentraut
On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:
 The problem is that PLyUnicode_Bytes is (via an ifdef) used as 
 PyString_ToString on Python3, which means that there are numerous call
 sites and new ones might appear in any moment. I'm not that keen on 
 invoking the traceback machinery on low-level encoding errors.

Why not?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

2012-07-05 Thread Jan Urbański

On 05/07/12 23:30, Peter Eisentraut wrote:

On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:

The problem is that PLyUnicode_Bytes is (via an ifdef) used as
PyString_ToString on Python3, which means that there are numerous call
sites and new ones might appear in any moment. I'm not that keen on
invoking the traceback machinery on low-level encoding errors.


Why not?


Because it can lead to recursion errors, like the one this patch was 
supposed to fix. The traceback machinery calls into the encoding 
functions, because it converts Python strings (like function names) into 
C strings.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers