[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Gregory P. Smith

Gregory P. Smith added the comment:

Committed revision 58493

--
resolution:  -> accepted
status: open -> closed

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Good! Check it in before I change my mind. :-)

The words can be tweaked later.

> 04b is the same as 04, i just fixed the docstrings that i had missed in
> stringlib/transmogrify.h to use 'B' instead of 'S' and say they return a
> "modified copy of B" instead of a "string"
>
> wording could probably be improved further if anyone has any ideas.

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Very impressive!

(Apologies if these lines are occasionally out of order.)

+extern PyObject* _bytes_isspace(const char *cptr, const Py_ssize_t len);

IMO all these functions should have names starting with _Py or _Py_,
since they are visible to the linker.

Also, why 'const Py_ssize_t'? It's a scalar!

+/* these store their len sized answer in the given preallocated *result
arg */

Mind using a Capital first letter?

+extern const char isspace__doc__[];

This needs a _Py or _Py_ prefix too.

+extern const unsigned int ctype_table[256];

Now this is no longer static, it also needs a _Py or _Py_ prefix.

+extern const unsigned char ctype_tolower[256];
+extern const unsigned char ctype_toupper[256];

Ditto (the macros are fine though, since they are only visible to code
that #includes this file, which is only a few files).

+Return True if all characters in S are whitespace\n\

Shouldn't that be bytes instead of characters (and consistently
throughout)?  And probably use B for the variable name instead of S.

+/* --- GPS - */

What's this?  Your initials? :-)  I don't think it's needed.  I'm
guessing you 

 /* The nullbytes are used by the stringlib during partition.
  * If partition is removed from bytes, nullbytes and its helper

This comment block refers to something that ain't gonna happen
(partition being removed).  IMO the whole comment block can go, it's a
red herring.

+/* XXX: this depends on a signed integer overflow to < 0 */
+/* C compilers, including gcc, do -NOT- guarantee this. */

(And repeated twice more.)  Wouldn't it be better to write this in a way
that doesn't require this XXX comment?  ISTR that we already had one
area where we had to fight with gcc because it had proved to itself that
something could never be negative, even though it could be due to
overflow.  The GCC authors won. :-(

+/* TODO(gps):

Don't you mean XXX(gps)? :-)

+ * These methods still need implementing (porting over from
stringobject.c):
+ *
+ * IN PROGRESS:
+ * .capitalize(), .title(), .upper(), .lower(), .center(), .zfill(),
+ * .expandtabs(), .ljust(), .rjust(), .swapcase(), .splitlines(), 
+ */
+

Hmmm... Aren't these done now?

+/* XXX(gps): the docstring above says any iterable int will do but the
+ * bytes_setslice code only accepts something supporting PEP 3118.
+ * Is a list or tuple of 0 <= ints <= 255 also supposed to work? */

Yes, it is, and it's a bug that it currently doesn't.

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-15 Thread Guido van Rossum

Guido van Rossum added the comment:

Is it worth my time to review this yet?

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-11 Thread Gregory P. Smith

Changes by Gregory P. Smith:


--
nosy:  -gps

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-11 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-11 Thread Gregory P. Smith

Gregory P. Smith added the comment:

> > Patch updated.  It now implements the is*() methods for PyBytes.  It
> > moves common code into a shared bytes_ctype.c and .h file so that
> > stringobject.c and bytesobject.c can share as much as possible.
>
> Did you move this into the stringlib subdirectory? That's more for
> sharing between PyString and PyUnicode, but I think there are more
> opportunities for sharing still, and PyString/PyBytes sharing makes
> sense separately.

Good idea, I haven't done that yet. At the moment it lives in
Include/bytes_ctype.h and Object/bytes_ctype.c directly.  stringlib is a
good place for it and is something I pondered but hadn't gotten to.  I'll do
that as I implement the remaining missing PyBytes_ methods to be in the next
update to this patch.

-gps

--
nosy: +gps

__
Tracker <[EMAIL PROTECTED]>

__> Patch 
updated.  It now implements the is*() methods for 
PyBytes.  It> moves common code into a shared bytes_ctype.c 
and .h file so that
> stringobject.c and bytesobject.c can share as much as 
possible.Did you move this into the stringlib subdirectory? That's 
more forsharing between PyString and PyUnicode, but I think there are 
more
opportunities for sharing still, and PyString/PyBytes sharing makessense 
separately.Good idea, I haven't done that yet. At the 
moment it lives in Include/bytes_ctype.h and Object/bytes_ctype.c 
directly.  stringlib is a good place for it and is something I pondered 
but hadn't gotten to.  I'll do that as I implement the remaining 
missing PyBytes_ methods to be in the next update to this patch.
-gps

___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-11 Thread Guido van Rossum

Guido van Rossum added the comment:

> Patch updated.  It now implements the is*() methods for PyBytes.  It
> moves common code into a shared bytes_ctype.c and .h file so that
> stringobject.c and bytesobject.c can share as much as possible.

Did you move this into the stringlib subdirectory? That's more for
sharing between PyString and PyUnicode, but I think there are more
opportunities for sharing still, and PyString/PyBytes sharing makes
sense separately.

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods

2007-10-11 Thread Gregory P. Smith

Changes by Gregory P. Smith:


__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods use PEP 3118 buffer API

2007-10-11 Thread Guido van Rossum

Changes by Guido van Rossum:


--
assignee:  -> gvanrossum
nosy: +gvanrossum

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1261] PEP 3137: make bytesobject.c methods use PEP 3118 buffer API

2007-10-11 Thread Gregory P. Smith

New submission from Gregory P. Smith:

This makes all existing bytesobject.c methods use the buffer API rather
than explicitly requiring bytes objects as input.  It also fixes input
to append() and remove() that was not strict enough and improves a few
unit tests in that area.

NOTE: this patch likely depends on http://bugs.python.org/issue1260
removing the buffer API from the unicode type in order for all unit
tests to pass (i only tested it with that applied since thats where
we're headed).

--
files: bytes-methods-use-buffer-api-01.diff.txt
keywords: patch, py3k
messages: 56341
nosy: gregory.p.smith
severity: normal
status: open
title: PEP 3137: make bytesobject.c methods use PEP 3118 buffer API
versions: Python 3.0

__
Tracker <[EMAIL PROTECTED]>

__Index: Objects/bytesobject.c
===
--- Objects/bytesobject.c   (revision 58412)
+++ Objects/bytesobject.c   (working copy)
@@ -37,15 +37,20 @@
 static int
 _getbytevalue(PyObject* arg, int *value)
 {
-PyObject *intarg = PyNumber_Int(arg);
-if (! intarg)
+long face_value;
+
+if (PyInt_Check(arg)) {
+face_value = PyInt_AsLong(arg);
+if (face_value < 0 || face_value >= 256) {
+PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
+return 0;
+}
+} else {
+PyErr_Format(PyExc_TypeError, "an integer is required");
 return 0;
-*value = PyInt_AsLong(intarg);
-Py_DECREF(intarg);
-if (*value < 0 || *value >= 256) {
-PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
-return 0;
 }
+
+*value = face_value;
 return 1;
 }
 
@@ -80,9 +85,7 @@
 {
 PyBufferProcs *buffer = Py_Type(obj)->tp_as_buffer;
 
-if (buffer == NULL ||
-PyUnicode_Check(obj) ||
-buffer->bf_getbuffer == NULL)
+if (buffer == NULL || buffer->bf_getbuffer == NULL)
 {
 PyErr_Format(PyExc_TypeError,
  "Type %.100s doesn't support the buffer API",
@@ -1088,6 +1102,23 @@
 return res;
 }
 
+/* TODO(gps):
+ * These methods need implementing (porting over from stringobject.c):
+ *
+ * .capitalize(), .center(), 
+ * .expandtabs(), .isalnum(), .isalpha(), .isdigit(),
+ * .islower(), .isspace(), .istitle(), .isupper(), 
+ * .rjust(), 
+ * .splitlines(), .swapcase(), .title(),
+ * .upper(), .zfill()
+ *
+ * XXX(gps) the code is -shared- for so many of these, thats gross.  I wish
+ * we had templates or generics or OO inheritance here.  A .h file with the
+ * methods as big CPP macros as templates would work but is ugly (especially
+ * when debugging).  Or can we do an (evil?) common substructure hack to
+ * allow us to write generic methods that work on both buffer (PyBytes_*)
+ * and bytes (PyString_*) objects?
+ */
 
 PyDoc_STRVAR(find__doc__,
 "B.find(sub [,start [,end]]) -> int\n\
@@ -1118,27 +1149,25 @@
 bytes_count(PyBytesObject *self, PyObject *args)
 {
 PyObject *sub_obj;
-const char *str = PyBytes_AS_STRING(self), *sub;
-Py_ssize_t sub_len;
+const char *str = PyBytes_AS_STRING(self);
 Py_ssize_t start = 0, end = PY_SSIZE_T_MAX;
+Py_buffer vsub;
+PyObject *count_obj;
 
 if (!PyArg_ParseTuple(args, "O|O&O&:count", &sub_obj,
 _PyEval_SliceIndex, &start, _PyEval_SliceIndex, &end))
 return NULL;
 
-if (PyBytes_Check(sub_obj)) {
-sub = PyBytes_AS_STRING(sub_obj);
-sub_len = PyBytes_GET_SIZE(sub_obj);
-}
-/* XXX --> use the modern buffer interface */
-else if (PyObject_AsCharBuffer(sub_obj, &sub, &sub_len))
+if (_getbuffer(sub_obj, &vsub) < 0)
 return NULL;
 
 _adjust_indices(&start, &end, PyBytes_GET_SIZE(self));
 
-return PyInt_FromSsize_t(
-stringlib_count(str + start, end - start, sub, sub_len)
+count_obj = PyInt_FromSsize_t(
+stringlib_count(str + start, end - start, vsub.buf, vsub.len)
 );
+PyObject_ReleaseBuffer(sub_obj, &vsub);
+return count_obj;
 }
 
 
@@ -1210,36 +1239,39 @@
  Py_ssize_t end, int direction)
 {
 Py_ssize_t len = PyBytes_GET_SIZE(self);
-Py_ssize_t slen;
-const char* sub;
 const char* str;
+Py_buffer vsubstr;
+int rv;
 
-if (PyBytes_Check(substr)) {
-sub = PyBytes_AS_STRING(substr);
-slen = PyBytes_GET_SIZE(substr);
-}
-/* XXX --> Use the modern buffer interface */
-else if (PyObject_AsCharBuffer(substr, &sub, &slen))
-return -1;
 str = PyBytes_AS_STRING(self);
 
+if (_getbuffer(substr, &vsubstr) < 0)
+return -1;
+
 _adjust_indices(&start, &end, len);
 
 if (direction < 0) {
 /* startswith */
-if (start+slen > len)
-return 0;
+if (start+vsubstr.len > len) {
+rv = 0;
+goto done;
+}
 } else {