[issue17560] problem using multiprocessing with really big objects?

2018-11-06 Thread Antoine Pitrou


Change by Antoine Pitrou :


--
assignee: davin -> 
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.8 -Python 3.7




[issue17560] problem using multiprocessing with really big objects?

2018-11-06 Thread Antoine Pitrou


Antoine Pitrou  added the comment:


New changeset bccacd19fa7b56dcf2fbfab15992b6b94abb by Antoine Pitrou 
(Alexander Buchkovsky) in branch 'master':
bpo-17560: Too small type for struct.pack/unpack in mutliprocessing.Connection 
(GH-10305)
https://github.com/python/cpython/commit/bccacd19fa7b56dcf2fbfab15992b6b94abb


--




[issue17560] problem using multiprocessing with really big objects?

2018-11-05 Thread Oleksandr Buchkovskyi


Change by Oleksandr Buchkovskyi :


--
keywords: +patch
pull_requests: +9646
stage:  -> patch review




[issue17560] problem using multiprocessing with really big objects?

2018-09-19 Thread Shyam Saladi


Shyam Saladi  added the comment:

I apologize if this isn't the right forum to post this message, but Davin, if 
you have since put together the guide/blogpost mentioned in your message, would 
you be able to share a link to the material please?

I am interested in gaining a better understanding of multiprocessing's 
pluggable pickler and, more generally, in dealing with large objects (the 
comment about this being an anti-pattern is noted).

Thank you.

--
nosy: +saladi




[issue17560] problem using multiprocessing with really big objects?

2017-07-29 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Davin, when you write up a blog post, I think it would be helpful to mention 
that creating really large objects with multiprocessing is mostly an 
anti-pattern (the cost of pickling and interprocess communication tends to 
drown out the benefits of parallel processing).

--
nosy: +rhettinger




[issue17560] problem using multiprocessing with really big objects?

2017-07-29 Thread Antoine Pitrou

Antoine Pitrou added the comment:

> There's also the other multiprocessing limitation Antoine mentioned early on, 
> where queues/pipes used a 32-bit signed integer to encode object length.

> Is there a way or plan to get around this limitation?

As I said above in https://bugs.python.org/issue17560#msg185345, it should be 
easy to improve the current protocol to allow for larger than 2GB data.

--




[issue17560] problem using multiprocessing with really big objects?

2017-07-25 Thread Daniel Liu

Daniel Liu added the comment:

There's also the other multiprocessing limitation Antoine mentioned early on, 
where queues/pipes used a 32-bit signed integer to encode object length.

Is there a way or plan to get around this limitation?

--
nosy: +Daniel Liu




[issue17560] problem using multiprocessing with really big objects?

2017-05-17 Thread Igor

Changes by Igor :


--
nosy: +i3v




[issue17560] problem using multiprocessing with really big objects?

2017-03-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Pickle currently handles byte strings and unicode strings larger than 4GB only 
with protocol 4. But multiprocessing currently uses the default protocol, which 
currently equals 3. There have been suggestions to change the default pickle 
protocol (issue23403), to change the pickle protocol for multiprocessing 
(issue26507), or to customize the serialization method for multiprocessing 
(issue28053). There is also a patch that implements support for byte strings 
and unicode strings larger than 4GB with all protocols (issue25370).

Besides this, I think that using some kind of shared memory is a better way of 
transferring large data between subprocesses.
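
For illustration, a minimal sketch of that shared-memory approach, assuming 
Python 3.8+'s multiprocessing.shared_memory (which postdates this comment) and 
numpy; the array here is kept deliberately small:

import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def worker(name, shape, dtype):
    # Attach to the existing block by name; no array data crosses the pipe.
    shm = SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr *= 2              # operate in place; the parent sees the change
    del arr               # release the buffer before closing the mapping
    shm.close()

if __name__ == "__main__":
    data = np.random.rand(1000, 1000)           # stand-in for a huge array
    shm = SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data                             # one copy into shared memory
    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()
    print(shared.mean())
    del shared
    shm.close()
    shm.unlink()                                 # free the block when done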

--
nosy: +serhiy.storchaka
versions: +Python 3.7 -Python 3.4




[issue17560] problem using multiprocessing with really big objects?

2017-03-12 Thread artxyz

artxyz added the comment:

@davin Thanks for your answer! I will update to the current version.

--




[issue17560] problem using multiprocessing with really big objects?

2017-03-12 Thread Davin Potts

Davin Potts added the comment:

@artxyz: The current release of 2.7 is 2.7.13 -- if you are still using 2.7.5 
you might consider updating to the latest release.

As pointed out in the text of the issue, the multiprocessing pickler was made 
pluggable in 3.3, and more conveniently so in 3.6.  The issue reported here 
arises from the constraints of working with large objects and pickle, hence the 
enhanced ability to take control of the multiprocessing pickler in 3.x applies.

I'll assign this issue to myself as a reminder to create a blog post around 
this example and potentially include it as a motivating need for controlling 
the multiprocessing pickler in the documentation.

--
assignee:  -> davin
nosy: +davin




[issue17560] problem using multiprocessing with really big objects?

2017-03-12 Thread artxyz

artxyz added the comment:

This is still an issue in Python 2.7.5

Will it be fixed?

--
nosy: +artxyz




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

> In 3.3 you can do
>
> from multiprocessing.forking import ForkingPickler
> ForkingPickler.register(MyType, reduce_MyType)
>
> Is this sufficient for your needs?  This is private (and its definition has 
> moved in 3.4) but it could be made public.

Indeed, I forgot that the multiprocessing pickler was already made
pluggable in Python 3.3. I needed backward compatibility with Python 2.6 in
joblib, hence I had to rewrite a bunch of the class hierarchy.

--




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> I could submit the part that makes it possible to customize the picklers 
> of multiprocessing.pool.Pool instance to the standard library if people 
> are interested.

2.7 and 3.3 are in bugfix mode now, so they will not change.

In 3.3 you can do

from multiprocessing.forking import ForkingPickler
ForkingPickler.register(MyType, reduce_MyType)

Is this sufficient for your needs?  This is private (and its definition has 
moved in 3.4) but it could be made public.
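
For example, a custom reducer hooked in this way might look like the following 
sketch (MyType, reduce_MyType and rebuild_MyType are placeholder names, not 
real APIs; in 3.4+ the import would come from multiprocessing.reduction):

from multiprocessing.forking import ForkingPickler  # multiprocessing.reduction in 3.4+

class MyType(object):
    def __init__(self, payload):
        self.payload = payload

def rebuild_MyType(payload):
    # Runs in the receiving process to reconstruct the object.
    return MyType(payload)

def reduce_MyType(obj):
    # Same contract as __reduce__: return (callable, args).
    return rebuild_MyType, (obj.payload,)

ForkingPickler.register(MyType, reduce_MyType)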

--




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

I forgot to end a sentence in my last comment:

- detect mmap-backed numpy

should read:

- detect mmap-backed numpy arrays and pickle only the filename and other buffer 
metadata to reconstruct a mmap-backed array in the worker processes instead of 
copying the data around.

--




[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel

Olivier Grisel added the comment:

I have implemented a custom subclass of the multiprocessing Pool to be able to 
plug a custom pickling strategy for this specific use case in joblib:

https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327

In particular it can:

- detect mmap-backed numpy
- transform large memory backed numpy arrays into numpy.memmap instances prior 
to pickling using the /dev/shm partition when available or TMPDIR otherwise.

Here is some doc: 
https://github.com/joblib/joblib/blob/master/doc/parallel_numpy.rst

I could submit the part that makes it possible to customize the picklers of 
multiprocessing.pool.Pool instance to the standard library if people are 
interested.

The numpy specific stuff would stay in third party projects such as joblib but 
at least that would make it easier for people to plug their own optimizations 
without having to override half of the multiprocessing class hierarchy.
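
A rough sketch of the memmap hand-off idea described above (illustrative only, 
not joblib's actual API; dump_to_memmap and column_sums are made-up names), 
assuming numpy and a POSIX system where /dev/shm may be available:

import os
import tempfile
import numpy as np
from multiprocessing import Pool

def dump_to_memmap(arr):
    # Write the array once to shared-memory-backed storage (or the default
    # temp dir as a fallback) and return a small, cheap-to-pickle description.
    shm_dir = "/dev/shm" if os.path.isdir("/dev/shm") else None
    fd, path = tempfile.mkstemp(suffix=".mmap", dir=shm_dir)
    os.close(fd)
    mm = np.memmap(path, dtype=arr.dtype, shape=arr.shape, mode="w+")
    mm[:] = arr            # single copy into the backing file
    mm.flush()
    return path, arr.dtype, arr.shape

def column_sums(spec):
    # Workers receive only (path, dtype, shape) and map the data themselves.
    path, dtype, shape = spec
    data = np.memmap(path, dtype=dtype, shape=shape, mode="r")
    return data.sum(axis=0)

if __name__ == "__main__":
    big = np.random.rand(2000, 500)
    spec = dump_to_memmap(big)
    with Pool(2) as pool:
        print(pool.apply(column_sums, (spec,))[:5])
    os.unlink(spec[0])     # clean up the backing file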

--
nosy: +Olivier.Grisel




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 27/03/13 21:09, Charles-François Natali wrote:
> I could, but I don't have to: a shared memory won't incur any I/O or 
> copy (except if it is swapped). A file-backed mmap will incur a *lot* 
> of I/O: really, just try writing a 1GB file, and you'll see your disk 
> spin, or use cat /proc/diskstats.

You are right.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Charles-François Natali

Charles-François Natali added the comment:

> I meant when there is no memory pressure.

http://lwn.net/Articles/326552/
"""
The kernel page cache contains in-memory copies of data blocks
belonging to files kept in persistent storage. Pages which are written
to by a processor, but not yet written to disk, are accumulated in
cache and are known as "dirty" pages. The amount of dirty memory is
listed in /proc/meminfo. Pages in the cache are flushed to disk after
an interval of 30 seconds. Pdflush is a set of kernel threads which
are responsible for writing the dirty pages to disk, either explicitly
in response to a sync() call, or implicitly in cases when the page
cache runs out of pages, if the pages have been in memory for too
long, or there are too many dirty pages in the page cache (as
specified by /proc/sys/vm/dirty_ratio).
"""

>>> FreeBSD has a MAP_NOSYNC flag which gives Linux behaviour (otherwise
>>> dirty pages are flushed every 30-60).
>>
>> It's the same on Linux, depending on your mount options, data will be
>> committed to disk every 5 seconds or so, when the journal is
>> committed.
>
> Googling suggests that MAP_SHARED on Linux is equivalent to MAP_SHARED
> | MAP_NOSYNC on FreeBSD.  I don't think it has anything to do with mount
> options.

"""
MAP_NOSYNC   Causes data dirtied via this VM map to be flushed to
             physical media only when necessary (usually by the
             pager) rather than gratuitously.
[...]
"""

This just means that it will reduce synchronous writeback, but
writeback will still occur (by what they call the pager).

On Linux, writeback can be done by background kernel threads
(pdflush), or synchrously on behalf of the process.

The "mount option" thing is the following:
if the file system is mounted with data=journal or data=ordered, data
is written to disk before corresponding metadata is committed. And
metadata is written when the journal is committed, by default every 5
seconds:

man mount:
"""
ext3

    data={journal|ordered|writeback}
           Specifies the journalling mode for file data.  Metadata is
           always journaled.  To use modes other than ordered on the
           root filesystem, pass the mode to the kernel as boot
           parameter, e.g. rootflags=data=journal.

           journal
                  All data is committed into the journal prior to being
                  written into the main filesystem.

           ordered
                  This is the default mode.  All data is forced directly
                  out to the main file system prior to its metadata being
                  committed to the journal.

           writeback
                  Data ordering is not preserved - data may be written
                  into the main filesystem after its metadata has been
                  committed to the journal.  This is rumoured to be the
                  highest-throughput option.  It guarantees internal
                  filesystem integrity, however it can allow old data to
                  appear in files after a crash and journal recovery.

    commit=nrsec
           Sync all data and metadata every nrsec seconds.  The default
           value is 5 seconds.  Zero means default.
"""
> The Linux man page refuses to specify
>
>MAP_SHARED
>  Share this mapping. Updates to the mapping are visible to other
>  processes that map this file, and are carried through to the
>  underlying file. **The file may not actually be updated until
>  msync(2) or munmap() is called.**

*may*: just as fsync() is required to make sure data is committed to
disk for a file, msync() is required for a mapping. But data is
committed asynchronously or synchronously depending on different
criteria (ratio of dirty pages, free memory, dirty pages age, etc.).

> Can you demonstrate a slowdown with a benchmark?

I could, but I don't have to: a shared memory won't incur any I/O or
copy (except if it is swapped).
A file-backed mmap will incur a *lot* of I/O: really, just try
writing a 1GB file, and you'll see your disk spin, or use cat
/proc/diskstats.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 27/03/2013 8:14pm, Charles-François Natali wrote:
>
> Charles-François Natali added the comment:
>
>> Apart from creating, unlinking and resizing the file I don't think there
>> should be any disk I/O.
>>
>> On Linux disk I/O only occurs when fsync() or close() are called.
>
> What?
> Writeback occurs depending on the memory pressure, percentage of used
> pages, page modification time, etc. Try writing a large file without
> closing it, you'll see that there's disk activity (or use
> iostat/vmstat).

I meant when there is no memory pressure.

>> FreeBSD has a MAP_NOSYNC flag which gives Linux behaviour (otherwise
>> dirty pages are flushed every 30-60).
>
> It's the same on Linux, depending on your mount options, data will be
> committed to disk every 5 seconds or so, when the journal is
> committed.

Googling suggests that MAP_SHARED on Linux is equivalent to MAP_SHARED 
| MAP_NOSYNC on FreeBSD.  I don't think it has anything to do with mount 
options.

The Linux man page refuses to specify

   MAP_SHARED
 Share this mapping. Updates to the mapping are visible to other
 processes that map this file, and are carried through to the
 underlying file. **The file may not actually be updated until
 msync(2) or munmap() is called.**

>> Once the file has been unlink()ed then any sensible operating system
>> should realize it does not need to sync the file.
>
> Why?
> Even if you delete the file right after open, if you write data to it,
> when the amount of data written fills your caches, the data has to go
> somewhere, even if only to make it available to the current process
> upon read()...

Can you demonstrate a slowdown with a benchmark?

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Charles-François Natali

Charles-François Natali added the comment:

> Apart from creating, unlinking and resizing the file I don't think there
> should be any disk I/O.
>
> On Linux disk I/O only occurs when fsync() or close() are called.

What?
Writeback occurs depending on the memory pressure, percentage of used
pages, page modification time, etc. Try writing a large file without
closing it, you'll see that there's disk activity (or use
iostat/vmstat).

> FreeBSD has a MAP_NOSYNC flag which gives Linux behaviour (otherwise
> dirty pages are flushed every 30-60).

It's the same on Linux, depending on your mount options, data will be
committed to disk every 5 seconds or so, when the journal is
committed.

> Once the file has been unlink()ed then any sensible operating system
> should realize it does not need to sync the file.

Why?
Even if you delete the file right after open, if you write data to it,
when the amount of data written fills your caches, the data has to go
somewhere, even if only to make it available to the current process
upon read()...

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 27/03/2013 7:27pm, Charles-François Natali wrote:
>
> Charles-François Natali added the comment:
>
>> Through fork, yes, but "shared" rather than "copy-on-write".
>
> There's a subtlety: because of refcounting, just treating a COW object
> as read-only (e.g. iterating on the array) will trigger a copy
> anyway...

I mean "write-through" (as opposed to "read-only" or "copy-on-write").

>>   I don't think shm_open() really has any advantages over
>> using mmaps backed by "proper" files (since posix shared memory uses up
>> space in /dev/shm which is limited).
>
> File-backed mmap() will incur disk I/O (although some of the data will
> probably sit in the page cache), which would be much slower than a
> shared memory. Also, you need corresponding disk space.
> As for the /dev/shm limit, it's normally dimensioned according to the
> amount of RAM, which is in turn dimensioned
> according to the working set.

Apart from creating, unlinking and resizing the file I don't think there 
should be any disk I/O.

On Linux disk I/O only occurs when fsync() or close() are called. 
FreeBSD has a MAP_NOSYNC flag which gives Linux behaviour (otherwise 
dirty pages are flushed every 30-60 seconds).

Once the file has been unlink()ed then any sensible operating system 
should realize it does not need to sync the file.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Charles-François Natali

Charles-François Natali added the comment:

> Through fork, yes, but "shared" rather than "copy-on-write".

There's a subtlety: because of refcounting, just treating a COW object
as read-only (e.g. iterating on the array) will trigger a copy
anyway...

> I assume you mean "shared memory" and shm_open(), not "semaphores" and
> sem_open().

Yes ;-)

>  I don't think shm_open() really has any advantages over
> using mmaps backed by "proper" files (since posix shared memeory uses up
> space in /dev/shm which is limited).

File-backed mmap() will incur disk I/O (although some of the data will
probably sit in the page cache), which would be much slower than a
shared memory. Also, you need corresponding disk space.
As for the /dev/shm limit, it's normally dimensioned according to the
amount of RAM, which is in turn dimensioned
according to the working set.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 27/03/2013 5:47pm, Charles-François Natali wrote:
>> multiprocessing currently only allows sharing of such shared arrays
>> using inheritance.
>
> You mean through fork() COW?

Through fork, yes, but "shared" rather than "copy-on-write".

>>   Perhaps we need a picklable mmap type which can be sent over pipes
>> and queues.  (On Unix this would probably require fd passing.)
>
> If you use POSIX semaphores, you could pass the semaphore path and use
> sem_open in the other process (but that would mean you can't unlink it
> right after open).

I assume you mean "shared memory" and shm_open(), not "semaphores" and 
sem_open().  I don't think shm_open() really has any advantages over 
using mmaps backed by "proper" files (since posix shared memory uses up 
space in /dev/shm which is limited).

By using fd passing you can get the operating system to do ref counting 
on the mmaps and not worry about when to unlink.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread mrjbq7

mrjbq7 added the comment:

> Richard was saying that you shouldn't serialize such a large array,
> that's just a huge performance bottleneck. The right way would be 
> to use a shared memory.

Gotcha. For clarification, my original use case was to *create* them
in the other process (something which took some time, since they were 
calculated and not just random as in the example) and return them to the 
original process for further computation...

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Charles-François Natali

Charles-François Natali added the comment:

> Also, does pickle currently handle byte strings larger than 4GB?

The 2.7 failure is indeed a pickle limitation, which should now be fixed by 
issue #13555.

> On a machine with 256GB of RAM, it makes more sense to send arrays
> of this size than say on a laptop...

Richard was saying that you shouldn't serialize such a large array, that's just 
a huge performance bottleneck. The right way would be to use a shared memory.

> multiprocessing currently only allows sharing of such shared arrays
> using inheritance.

You mean through fork() COW?

>  Perhaps we need a picklable mmap type which can be sent over pipes
> and queues.  (On Unix this would probably require fd passing.)

If you use POSIX semaphores, you could pass the semaphore path and use sem_open 
in the other process (but that would mean you can't unlink it right after open).

--
nosy: +neologix




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 27/03/2013 5:13pm, mrjbq7 wrote:
> On a machine with 256GB of RAM, it makes more sense to send arrays
> of this size than say on a laptop...

I was thinking more of speed than memory consumption.

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread mrjbq7

mrjbq7 added the comment:

On a machine with 256GB of RAM, it makes more sense to send arrays of this size 
than say on a laptop...

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> I *think* we need to keep compatibility with the wire format, but perhaps 
> we could use a special length value (-1?) to introduce a longer (64-bit) 
> length value.

Yes we could, although that would not help on Windows pipe connections (where 
byte oriented messages are used instead).  Also, does pickle currently handle 
byte strings larger than 4GB?

But I can't help feeling that multigigabyte arrays should be transferred using 
shared mmaps rather than serialization.  numpy.frombuffer() could be used to 
recreate the array from the mmap.

multiprocessing currently only allows sharing of such shared arrays using 
inheritance.  Perhaps we need a picklable mmap type which can be sent over 
pipes and queues.  (On Unix this would probably require fd passing.)
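
A minimal sketch of what that inheritance-based sharing looks like today, 
assuming numpy and a fork start method on Unix (nothing here is a proposed 
API): the child fills a shared anonymous mmap, and the parent reads the result 
without any pickling of the data.

import mmap
import numpy as np
from multiprocessing import Process

SHAPE = (1000, 1000)
NBYTES = int(np.prod(SHAPE)) * np.dtype(np.float64).itemsize

def fill(buf, shape):
    # View the inherited mapping as an array and write into it in place.
    arr = np.frombuffer(buf, dtype=np.float64).reshape(shape)
    arr[:] = np.random.rand(*shape)   # visible to the parent, nothing sent back

if __name__ == "__main__":
    buf = mmap.mmap(-1, NBYTES)       # anonymous MAP_SHARED mapping
    p = Process(target=fill, args=(buf, SHAPE))  # inherited across fork
    p.start()
    p.join()
    result = np.frombuffer(buf, dtype=np.float64).reshape(SHAPE)
    print(result.mean())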

--




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

A multiprocessing queue currently uses a 32-bit signed int to encode object 
length (in bytes):

def _send_bytes(self, buf):
    # For wire compatibility with 3.2 and lower
    n = len(buf)
    self._send(struct.pack("!i", n))
    # The condition is necessary to avoid "broken pipe" errors
    # when sending a 0-length buffer if the other end closed the pipe.
    if n > 0:
        self._send(buf)

I *think* we need to keep compatibility with the wire format, but perhaps we 
could use a special length value (-1?) to introduce a longer (64-bit) length 
value.
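
Something along these lines, as a sketch of that idea (Connection methods shown 
standalone; this is only an illustration of the suggestion, not the patch that 
was eventually merged): lengths that fit keep the old "!i" header, and larger 
payloads send -1 followed by an unsigned 64-bit length.

import struct

def _send_bytes(self, buf):
    n = len(buf)
    if n <= 0x7fffffff:
        self._send(struct.pack("!i", n))     # unchanged wire format
    else:
        self._send(struct.pack("!i", -1))    # escape marker for big payloads
        self._send(struct.pack("!Q", n))     # real length, 64-bit unsigned
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    size, = struct.unpack("!i", self._recv(4).getvalue())
    if size == -1:
        size, = struct.unpack("!Q", self._recv(8).getvalue())
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)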

--
nosy: +pitrou, sbt
type:  -> enhancement
versions: +Python 3.4 -Python 3.3




[issue17560] problem using multiprocessing with really big objects?

2013-03-27 Thread mrjbq7

New submission from mrjbq7:

I ran into a problem using multiprocessing to create large data objects (in 
this case numpy float64 arrays with 90,000 columns and 5,000 rows) and return 
them to the original python process.

It breaks in both Python 2.7 and 3.3, using numpy 1.7.0 (but with different 
error messages).

Is it possible the array is too large to be serialized (450 million 64-bit 
numbers exceed a 32-bit limit)?
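
The attached multi.py is not reproduced in this archive; a reconstruction along 
these lines (make_data and the call at line 18 are taken from the traceback 
below, the array size from the description above) triggers the same failure:

import numpy as np
from multiprocessing import Pool

def make_data(i):
    # 5,000 rows x 90,000 columns of float64 is ~3.6 GB per result,
    # which overflows the 32-bit length prefix when sent back.
    return np.random.randn(5000, 90000)

if __name__ == "__main__":
    pool = Pool(5)
    results = pool.map_async(make_data, range(5)).get(999)
    print([r.shape for r in results])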


Python 2.7
==

Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 99, in worker
put((job, i, result))
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
return send(obj)
SystemError: NULL result without error in PyObject_Call


Python 3.3
==

Traceback (most recent call last):
  File "multi.py", line 18, in 
results = pool.map_async(make_data, range(5)).get(999)
  File "/usr/lib/python3.3/multiprocessing/pool.py", line 562, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[array([[ 
0.74628629,  0.36130663, -0.65984794, ..., -0.70921838,
 0.34389663, -1.7135126 ],
   [ 0.60266867, -0.40652402, -1.31590562, ...,  1.44896246,
-0.3922366 , -0.85012842],
   [ 0.59629641, -0.00623001, -0.12914128, ...,  0.99925511,
-2.30418136,  1.73414009],
   ..., 
   [ 0.24246639,  0.87519509,  0.24109069, ..., -0.48870107,
-0.20910332,  0.11749621],
   [ 0.62108937, -0.86217542, -0.47357384, ...,  1.59872243,
 0.76639995, -0.56711461],
   [ 0.90976471,  1.73566475, -0.18191821, ...,  0.19784432,
-0.29741643, -1.46375835]])]'. Reason: 'error("'i' format requires 
-2147483648 <= number <= 2147483647",)'

--
files: multi.py
messages: 185344
nosy: mrjbq7
priority: normal
severity: normal
status: open
title: problem using multiprocessing with really big objects?
versions: Python 3.3
Added file: http://bugs.python.org/file29595/multi.py
