Kyle Stanley <aeros...@gmail.com> added the comment:

DanilZ, could you take a look at the superseding issue
(https://bugs.python.org/issue37297) and see if the exception raised within
your job is the same?

If it's not, I would suggest opening a separate issue (and linking to it in a
comment here), as I don't think it's necessarily related to this one.
"state=finished raised error" doesn't indicate the specific exception that
occurred. A good format for the title would be something along the lines of:

"ProcessPoolExecutor.submit() <specific exception name here> while reading 
large object (4GB)"

It'd also be helpful in the separate issue to paste the full exception stack
trace and to specify your OS and the multiprocessing start method used (spawn,
fork, or forkserver). This information is necessary for replicating the issue
on our end.
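
If you're unsure which start method your code is using, you can check it with
multiprocessing.get_start_method():

    import multiprocessing
    # Default varies by platform, e.g. 'fork' on Linux, 'spawn' on Windows.
    print(multiprocessing.get_start_method())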

In the meantime, a workaround I would suggest trying would be to use the
*chunksize* parameter (or *iterator*) in pandas.read_csv(), and split the work
across several jobs (at least 4+, more if you have additional cores) instead of
doing it all within a single one; see the sketch below. It'd also be generally
helpful to know whether that alleviates the problem, as it could indicate an
issue with running out of memory when the dataframe is converted to pickle
format (which often increases the total size) within the process associated
with the job.
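
Here's a rough sketch of what I have in mind (the CSV path, chunk size, and
process_chunk() function are hypothetical placeholders; adapt them to your
actual workload):

    import concurrent.futures

    import pandas as pd

    def process_chunk(df):
        # Placeholder for whatever per-chunk work your job actually does.
        return len(df)

    def main():
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # Read the CSV incrementally instead of loading all ~4GB at once;
            # each chunk is a DataFrame of up to `chunksize` rows, and each
            # chunk is pickled separately when sent to a worker process.
            futures = [
                executor.submit(process_chunk, chunk)
                for chunk in pd.read_csv("data.csv", chunksize=100_000)
            ]
            results = [f.result() for f in futures]
        print(sum(results))

    if __name__ == "__main__":  # required for the spawn/forkserver start methods
        main()

This keeps the peak memory of each worker bounded by the chunk size rather
than the full file, which should also make it easier to tell whether the
failure is memory-related.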

----------
nosy: +aeros

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37294>
_______________________________________