Stefan Behnel, 26.08.2011 20:28:
"Martin v. Löwis", 26.08.2011 18:56:
I agree with your observation that something should be done about error
handling, and will update the PEP shortly. I propose that
PyUnicode_Ready should be explicitly called on input where raising an
exception is feasible. In contexts where it is not feasible (such
as reading a character, or reading the length or the kind), failing to
ready the string should cause a fatal error.
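A minimal sketch of the split described above, written against a mock object rather than the real PEP 393 structures. Every `demo_*` name is hypothetical, the mock always stores 4-byte characters, and `abort()` merely stands in for a fatal error such as Py_FatalError():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for a legacy-created string; NOT the real layout. */
typedef struct {
    const wchar_t *wstr;   /* legacy buffer */
    unsigned int *data;    /* canonical buffer, NULL until readied */
    size_t length;
} demo_str;

/* Stand-in for PyUnicode_Ready(): returns 0 on success, -1 on failure. */
static int demo_ready(demo_str *s)
{
    size_t i;
    assert(s != NULL);
    if (s->data != NULL)
        return 0;                       /* already readied */
    if (s->wstr == NULL)
        return -1;                      /* nothing to convert */
    s->data = malloc((s->length + 1) * sizeof *s->data);
    if (s->data == NULL)
        return -1;                      /* a real caller could raise here */
    for (i = 0; i < s->length; i++)
        s->data[i] = (unsigned int)s->wstr[i];
    return 0;
}

/* Low-level accessor: it has no way to report an error, so an
   unreadied string is treated as fatal. */
static unsigned int demo_read_char(const demo_str *s, size_t i)
{
    if (s->data == NULL) {
        fprintf(stderr, "fatal: string has not been readied\n");
        abort();
    }
    return s->data[i];
}

/* Entry point: raising an exception is feasible here, so the string
   is readied explicitly on input and failure is propagated as -1. */
static long demo_count(demo_str *s, unsigned int ch)
{
    long n = 0;
    size_t i;
    if (demo_ready(s) < 0)
        return -1;
    for (i = 0; i < s->length; i++)
        if (demo_read_char(s, i) == ch)
            n++;
    return n;
}
```

The point of the sketch is only the division of labour: functions that can fail call the ready step once on input, and the accessors below them never try to recover.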
[...]
My gut feeling leans towards a KISS approach. If you go the route of
requiring an explicit point for triggering PyUnicode_Ready() calls, why not
just go all the way and make it completely explicit in *all* cases? I.e.
remove all implicit calls from the macros and make it part of the new API
semantics that users *must* call PyUnicode_FAST_READY() before doing
anything with a new string data layout. Far fewer surprises.
Note that there isn't currently an official macro for figuring out that
the flexible string layout has not been initialised yet, i.e. that wstr is
set but str is not. If the implicit PyUnicode_Ready() calls get removed,
PyUnicode_KIND() could take that place by simply returning
PyUnicode_WCHAR_KIND.
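To illustrate, here is a rough sketch of what caller code might look like under fully explicit semantics. It uses a mock object, not the real header: all `mock_*` and `MOCK_*` names are hypothetical, and only the kind values (0 to 3) and the `1 << (kind - 1)` character size mirror the definitions in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Kind values mirror the patch; the names themselves are hypothetical. */
enum { MOCK_WCHAR_KIND = 0, MOCK_1BYTE_KIND = 1,
       MOCK_2BYTE_KIND = 2, MOCK_4BYTE_KIND = 3 };

typedef struct {
    int kind;              /* MOCK_WCHAR_KIND until readied */
    size_t length;
    const wchar_t *wstr;   /* legacy buffer */
    void *data;            /* canonical buffer, NULL until readied */
} mock_str;

/* No implicit ready call: the macro only reports the current state. */
#define MOCK_KIND(op) ((op)->kind)

/* Stand-in for the ready step: picks the narrowest kind and converts. */
static int mock_ready(mock_str *s)
{
    uint32_t max_char = 0;
    size_t i, width;
    int kind;
    if (s->kind != MOCK_WCHAR_KIND)
        return 0;
    if (s->wstr == NULL)
        return -1;
    for (i = 0; i < s->length; i++)
        if ((uint32_t)s->wstr[i] > max_char)
            max_char = (uint32_t)s->wstr[i];
    kind = max_char < 0x100 ? MOCK_1BYTE_KIND
         : max_char < 0x10000 ? MOCK_2BYTE_KIND : MOCK_4BYTE_KIND;
    width = (size_t)1 << (kind - 1);    /* 1, 2 or 4 bytes per character */
    s->data = malloc((s->length + 1) * width);
    if (s->data == NULL)
        return -1;
    for (i = 0; i < s->length; i++) {
        uint32_t ch = (uint32_t)s->wstr[i];
        switch (kind) {
        case MOCK_1BYTE_KIND: ((uint8_t *)s->data)[i] = (uint8_t)ch; break;
        case MOCK_2BYTE_KIND: ((uint16_t *)s->data)[i] = (uint16_t)ch; break;
        default:              ((uint32_t *)s->data)[i] = ch; break;
        }
    }
    s->kind = kind;
    return 0;
}

/* Caller code: check the kind, ready explicitly, then read. */
static int mock_char_at(mock_str *s, size_t i, uint32_t *out)
{
    if (MOCK_KIND(s) == MOCK_WCHAR_KIND && mock_ready(s) < 0)
        return -1;                      /* error is reported, not hidden */
    if (i >= s->length)
        return -1;
    switch (MOCK_KIND(s)) {
    case MOCK_1BYTE_KIND: *out = ((uint8_t *)s->data)[i]; break;
    case MOCK_2BYTE_KIND: *out = ((uint16_t *)s->data)[i]; break;
    default:              *out = ((uint32_t *)s->data)[i]; break;
    }
    return 0;
}
```

With this shape, the only place a conversion can fail is the one explicit call, so every macro below it stays a plain field access.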
Here's a patch that updates only the header file, to make it clear what I mean.
Stefan
# HG changeset patch
# User Stefan Behnel <sco...@users.berlios.de>
# Date 1314388513 -7200
# Branch pep-393
# Node ID 247e45f0c26f6f0f6a552f2eddb3598ae643adf1
# Parent 675e2004b38e809f12171750388b00620e1967c4
simplify new PyUnicode_*() macros by removing implicit calls to PyUnicode_Ready(); minor cleanups
diff -r 675e2004b38e -r 247e45f0c26f Include/unicodeobject.h
--- a/Include/unicodeobject.h Fri Aug 26 14:21:14 2011 -0400
+++ b/Include/unicodeobject.h Fri Aug 26 21:55:13 2011 +0200
@@ -282,18 +282,15 @@
#define SSTATE_IS_COMPACT 0x10
-/* String contains only wstr byte characters. This is only possible
- when the string was created with a legacy API and PyUnicode_Ready()
- has not been called yet. Note that PyUnicode_KIND() calls
- PyUnicode_FAST_READY() so PyUnicode_WCHAR_KIND is only possible as a
- intialized value not as a result of PyUnicode_KIND(). */
-#define PyUnicode_WCHAR_KIND 0
-
/* Return values of the PyUnicode_KIND() macro: */
-
#define PyUnicode_1BYTE_KIND 1
#define PyUnicode_2BYTE_KIND 2
#define PyUnicode_4BYTE_KIND 3
+#define PyUnicode_WCHAR_KIND 0 /* String contains only wstr byte
+ characters. This is the case when
+ the string was created with a legacy
+ API and PyUnicode_Ready() has not
+ been called yet. */
/* Return the number of bytes the string uses to represent single characters,
@@ -301,11 +298,10 @@
#define PyUnicode_CHARACTER_SIZE(op) \
(1 << (((SSTATE_KIND_MASK & ((PyUnicodeObject *)(op))->state) >> 2) - 1))
-/* Return pointers to the canonical representation casted as unsigned char,
- Py_UCS2, or Py_UCS4 for direct character access.
- No checks are performed, use PyUnicode_CHARACTER_SIZE or
- PyUnicode_KIND() before to ensure these will work correctly. */
-
+/* Return pointers to the canonical representation cast as Py_UCS1,
+ Py_UCS2, or Py_UCS4 for direct character access. No checks are
+ performed, use PyUnicode_FAST_READY() before to ensure these will
+ work correctly. */
#define PyUnicode_1BYTE_DATA(op) (((PyUnicodeObject*)op)->data.latin1)
#define PyUnicode_2BYTE_DATA(op) (((PyUnicodeObject*)op)->data.ucs2)
#define PyUnicode_4BYTE_DATA(op) (((PyUnicodeObject*)op)->data.ucs4)
@@ -315,18 +311,16 @@
#define PyUnicode_IS_COMPACT(op) \
(((op)->state & SSTATE_COMPACT_MASK) == SSTATE_IS_COMPACT)
-/* Return one of the PyUnicode_*_KIND values defined above.
- This macro calls PyUnicode_FAST_READY() before returning the kind. */
+/* Return one of the PyUnicode_*_KIND values defined above. */
#define PyUnicode_KIND(op) \
(assert(PyUnicode_Check(op)), \
- PyUnicode_FAST_READY((PyUnicodeObject *)(op)), \
((SSTATE_KIND_MASK & (((PyUnicodeObject *)(op))->state)) >> 2))
-/* Return a void pointer to the raw unicode buffer.
- This macro calls PyUnicode_FAST_READY() before returning the pointer. */
+/* Return a void pointer to the raw unicode buffer. The result is
+ potentially NULL if it has not been initialised, in which case
+ PyUnicode_AS_UNICODE() returns the pointer to the wstr buffer. */
#define PyUnicode_DATA(op) \
(assert(PyUnicode_Check(op)), \
- PyUnicode_FAST_READY((PyUnicodeObject *)(op)), \
((((PyUnicodeObject *)(op))->data.any)))
/* Write into the canonical representation, this macro does not do any sanity
@@ -366,8 +360,9 @@
/* PyUnicode_READ_CHAR() is less efficient than PyUnicode_READ() because it
calls PyUnicode_KIND() and might call it twice. For single reads, use
- PyUnicode_READ_CHAR, for multiple consecutive reads callers should
- cache kind and use PyUnicode_READ instead. */
+ PyUnicode_READ_CHAR(), for multiple consecutive reads callers should
+ cache kind and use PyUnicode_READ() instead.
+ Requires that the canonical representation has been initialised. */
#define PyUnicode_READ_CHAR(unicode, index) \
((Py_UCS4) \
(PyUnicode_KIND((unicode)) == PyUnicode_1BYTE_KIND ? \
@@ -410,11 +405,12 @@
/* Return a maximum character value which is suitable for creating another
string based on op. This is always an approximation but more efficient
- than interating over the string. */
+ than iterating over the string.
+ Requires that the canonical representation has been initialised. */
#define PyUnicode_MAX_CHAR_VALUE(op) \
- (PyUnicode_FAST_READY((op)), \
+ (assert(PyUnicode_Check(op)), \
(PyUnicode_KIND(op) == PyUnicode_1BYTE_KIND ? \
- (((PyUnicodeObject *)(op))->data.any == \
+ (((PyUnicodeObject *)(op))->data.latin1 == \
((PyUnicodeObject *)(op))->utf8 ? \
(0x7f) : (0xff) \
) : \