[issue15958] bytes.join() should allow arbitrary buffer objects

2013-01-22 Thread Glyph Lefkowitz

Changes by Glyph Lefkowitz gl...@twistedmatrix.com:


--
nosy: +glyph

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yet one issue. You forgot to add join.h to BYTESTR_DEPS in Makefile.pre.in.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-18 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 388e43bb519d by Antoine Pitrou in branch 'default':
Followup to issue #15958: add join.h to Makefile dependencies for byte strings
http://hg.python.org/cpython/rev/388e43bb519d

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a new patch checking that the sequence size didn't change. I also 
refactored the join() implementation into a shared function in stringlib.

--
Added file: http://bugs.python.org/file27594/bytes_join_buffers3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I added new comments. :-(

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I added new comments. :-(

Thanks. I think I will commit after adding the missing #undef :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 16285c1b4dda by Antoine Pitrou in branch 'default':
Issue #15958: bytes.join and bytearray.join now accept arbitrary buffer objects.
http://hg.python.org/cpython/rev/16285c1b4dda

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Done now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Well done.

However check at top of Objects/stringlib/join.h does not protect from using 
the file with asciilib or ucs1lib.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 However check at top of Objects/stringlib/join.h does not protect from using 
 the file with asciilib or ucs1lib.

I'm not sure that's a problem. Someone would have to go out of their way
to use join.h with only UCS1 unicode strings. Also tests would probably
start failing, since str.join doesn't have the right signature :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-15 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is an updated patch using PySequence_Fast_GET_SIZE to avoid problems when 
the sequence is resized during iteration.

--
Added file: http://bugs.python.org/file27585/bytes_join_buffers2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Here is a patch with tests.

--
Added file: http://bugs.python.org/file27554/bytes_join_buffers.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Patch LGTM, however...

$ ./python -m timeit -s a=[b'a']*10  b','.join(a)

Vanilla: 3.69 msec per loop
Patched: 11.6 msec per loop

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Patch LGTM, however...
 
 $ ./python -m timeit -s a=[b'a']*10  b','.join(a)
 
 Vanilla: 3.69 msec per loop
 Patched: 11.6 msec per loop

True. It is a bit of a pathological case, though.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch with restored performance. Is not it too complicated?

--
Added file: http://bugs.python.org/file27557/bytes_join_buffers_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___diff -r 51ce9830d85a Lib/test/test_bytes.py
--- a/Lib/test/test_bytes.pySat Oct 13 11:58:23 2012 -0400
+++ b/Lib/test/test_bytes.pySun Oct 14 01:06:42 2012 +0300
@@ -288,8 +288,22 @@
 self.assertEqual(self.type2test(b).join(lst), babc)
 self.assertEqual(self.type2test(b).join(tuple(lst)), babc)
 self.assertEqual(self.type2test(b).join(iter(lst)), babc)
-self.assertEqual(self.type2test(b.).join([bab, bcd]), bab.cd)
-# XXX more...
+dot_join = self.type2test(b.:).join
+self.assertEqual(dot_join([bab, bcd]), bab.:cd)
+self.assertEqual(dot_join([memoryview(bab), bcd]), bab.:cd)
+self.assertEqual(dot_join([bab, memoryview(bcd)]), bab.:cd)
+self.assertEqual(dot_join([bytearray(bab), bcd]), bab.:cd)
+self.assertEqual(dot_join([bab, bytearray(bcd)]), bab.:cd)
+# Stress it with many items
+seq = [babc] * 1000
+expected = babc + b.:abc * 999
+self.assertEqual(dot_join(seq), expected)
+# Error handling and cleanup when some item in the middle of the
+# sequence has the wrong type.
+with self.assertRaises(TypeError):
+dot_join([bytearray(bab), cd, bef])
+with self.assertRaises(TypeError):
+dot_join([memoryview(bab), cd, bef])
 
 def test_count(self):
 b = self.type2test(b'mississippi')
diff -r 51ce9830d85a Objects/bytearrayobject.c
--- a/Objects/bytearrayobject.c Sat Oct 13 11:58:23 2012 -0400
+++ b/Objects/bytearrayobject.c Sun Oct 14 01:06:42 2012 +0300
@@ -2569,75 +2569,126 @@
 in between each pair, and return the result as a new bytearray.);
 
 static PyObject *
-bytearray_join(PyByteArrayObject *self, PyObject *it)
+bytearray_join(PyObject *self, PyObject *orig)
 {
-PyObject *seq;
-Py_ssize_t mysize = Py_SIZE(self);
-Py_ssize_t i;
-Py_ssize_t n;
-PyObject **items;
-Py_ssize_t totalsize = 0;
-PyObject *result;
-char *dest;
-
-seq = PySequence_Fast(it, can only join an iterable);
-if (seq == NULL)
+char *sep = PyByteArray_AS_STRING(self);
+const Py_ssize_t seplen = PyByteArray_GET_SIZE(self);
+PyObject *res = NULL;
+char *p;
+Py_ssize_t seqlen = 0;
+Py_ssize_t sz = 0;
+Py_ssize_t i, j, nitems, nbufs;
+PyObject *seq, *item;
+Py_buffer *buffers = NULL;
+#define NB_STATIC_BUFFERS 10
+Py_buffer static_buffers[NB_STATIC_BUFFERS];
+
+seq = PySequence_Fast(orig, can only join an iterable);
+if (seq == NULL) {
 return NULL;
-n = PySequence_Fast_GET_SIZE(seq);
-items = PySequence_Fast_ITEMS(seq);
-
-/* Compute the total size, and check that they are all bytes */
-/* XXX Shouldn't we use _getbuffer() on these items instead? */
-for (i = 0; i  n; i++) {
-PyObject *obj = items[i];
-if (!PyByteArray_Check(obj)  !PyBytes_Check(obj)) {
-PyErr_Format(PyExc_TypeError,
- can only join an iterable of bytes 
- (item %ld has type '%.100s'),
- /* XXX %ld isn't right on Win64 */
- (long)i, Py_TYPE(obj)-tp_name);
+}
+
+seqlen = PySequence_Size(seq);
+if (seqlen == 0) {
+Py_DECREF(seq);
+return PyByteArray_FromStringAndSize(, 0);
+}
+if (seqlen  NB_STATIC_BUFFERS) {
+buffers = PyMem_NEW(Py_buffer, seqlen);
+if (buffers == NULL) {
+Py_DECREF(seq);
+return NULL;
+}
+}
+else {
+buffers = static_buffers;
+}
+
+/* There is at least one thing to join.
+ * Do a pre-pass to figure out the total amount of space we'll
+ * need (sz), and see whether all arguments are buffer-compatible.
+ */
+for (i = nitems = nbufs = 0; i  seqlen; i++) {
+Py_ssize_t itemlen;
+item = PySequence_Fast_GET_ITEM(seq, i);
+if (PyBytes_CheckExact(item)) {
+Py_INCREF(item);
+itemlen = PyBytes_GET_SIZE(item);
+}
+else {
+if (_getbuffer(item, buffers[nbufs])  0) {
+PyErr_Format(PyExc_TypeError,
+sequence item %zd: expected bytes, bytearray, 
+or an object with a buffer interface, %.80s 
found,
+i, Py_TYPE(item)-tp_name);
+goto error;
+}
+itemlen = buffers[nbufs].len;
+nbufs++;
+}
+nitems = i + 1;  /* for error cleanup */
+if (itemlen  PY_SSIZE_T_MAX - sz) {
+PyErr_SetString(PyExc_OverflowError,
+ 

[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

The problem with your approach is that the sequence could be mutated while 
another thread is running (_getbuffer() may release the GIL). Then the 
pre-computed size gets wrong.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 The problem with your approach is that the sequence could be mutated while
 another thread is running (_getbuffer() may release the GIL). Then the
 pre-computed size gets wrong.

Well, then I withdraw my patch.

But what if the sequence will be mutated and PySequence_Size(seq) will become 
less seqlen? Then using PySequence_Fast_GET_ITEM() will be incorrect.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 But what if the sequence will be mutated and PySequence_Size(seq) will become 
 less seqlen? Then using PySequence_Fast_GET_ITEM() will be incorrect.

Perhaps we should detect that case and raise, then.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-10-13 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


Removed file: http://bugs.python.org/file27557/bytes_join_buffers_2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Ezio Melotti

Ezio Melotti added the comment:

Attached patch adds support for memoryviews to bytes.join:

 b''.join([memoryview(b'foo'), b'bar'])
b'foobar'

The implementation currently has some duplication, because it does a first pass 
to calculate the total size to allocate, and another pass to create the result 
that calculates the individual sizes again.  Py_SIZE(item) can't be used here 
for memoryviews, so during the first pass it's now necessary to check for 
memoryviews and extract the len from the memoryview buffer, and then do it 
again during the second pass.
If necessary this could be optimized/improved.

I also tried to check for multi-dimensional and non-contiguous buffers, but I 
didn't manage to obtain a failure so I removed the check for now.
More tests should probably be added to cover these cases, and possibly the 
patch should be adjusted accordingly.

If/when the patch is OK I'll do the same for bytearrays.

--
keywords: +patch
nosy: +ezio.melotti
stage: needs patch - patch review
Added file: http://bugs.python.org/file27257/issue15958.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think memoryview here is example only, and Antoine had in mind arbitrary 
buffer objects.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Ezio Melotti

Ezio Melotti added the comment:

Indeed.  Attached new patch.
Tests still need to be improved; bytearrays are still not changed.

--
Added file: http://bugs.python.org/file27258/issue15958-2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Stefan Krah

Stefan Krah added the comment:

We would need to release the buffers and also check for format 'B'.
With issue15958-2.diff this is possible:

 import array
 a = array.array('d', [1.2345])
 b''.join([b'ABC', a])
b'ABC\x8d\x97n\x12\x83\xc0\xf3?'


It is unfortunate that a PyBUF_SIMPLE request does not guarantee 'B'.
I think that's probably a mistake, but it's certainly existing practice.

--
nosy: +skrah

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Stefan Krah

Stefan Krah added the comment:

Also, perhaps we can keep a fast path for bytes and bytearray, but I
didn't time the difference.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Well, given the following works:

 import array
 a = array.array('d', [1.2345])
 b'' + a
b'\x8d\x97n\x12\x83\xc0\xf3?'

It should also work for bytes.join().
I guess that means I'm against the strict-typedness of memoryviews. As the name 
suggests, it provides access to some memory area, not some structured array of 
data.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Ezio Melotti

Ezio Melotti added the comment:

Attached new refleakless patch.

--
Added file: http://bugs.python.org/file27259/issue15958-3.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Attached new refleakless patch.

Your approach is dangerous, because the buffers may change size between
two calls to PyObject_GetBuffer(). I think you should keep the
Py_buffers alive in an array, and only release them at the end (it may
also be slightly faster to do so).

A nit: you are adding a lot of newlines in test_bytes.py.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 I think you should keep the
 Py_buffers alive in an array, and only release them at the end (it may
 also be slightly faster to do so).

However allocation of this array may considerably slow down the function. We 
may need the special-case for bytes and bytearray. Stop, and the bytearray (or 
bytearray subclass) can change size between two calls to Py_SIZE()?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-17 Thread Antoine Pitrou

New submission from Antoine Pitrou:

This should ideally succeed:

 b''.join([memoryview(b'foo'), b'bar'])
Traceback (most recent call last):
  File stdin, line 1, in module
TypeError: sequence item 0: expected bytes, memoryview found

(ditto for bytearray.join)

--
components: Interpreter Core
messages: 170618
nosy: pitrou
priority: normal
severity: normal
stage: needs patch
status: open
title: bytes.join() should allow arbitrary buffer objects
type: enhancement
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15958] bytes.join() should allow arbitrary buffer objects

2012-09-17 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15958
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com