[issue37294] concurrent.futures.ProcessPoolExecutor state=finished raised error

2020-10-01 Thread DanilZ

DanilZ  added the comment:

I think you have correctly identified the problem in the last part of your 
message: "as it could possibly indicate an issue with running out of memory 
when the dataframe is converted to pickle format (which often increases the 
total size) within the process associated with the job"

The function pd.read_csv performs without any problems inside a process; the 
error appears only when I try to extract the result from the finished process 
via:

    for f in concurrent.futures.as_completed(results):
        data = f.result()

or

    data = results.result()

It just does not pass a large result back from the results object.

I am sure that everything inside the worker process runs correctly, for two 
reasons:
1. If I change the function inside the process to just save the file (that had 
been read into memory) to disk, it completes without error.
2. If I reduce the file size, then it gets extracted from results.result() 
without error.

So I guess my question narrows down to: 
1. Can I increase the memory allocated to a process? 
2. Or at least understand what the limit is?
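As far as I can tell, ProcessPoolExecutor itself exposes no per-process memory setting; what matters is how large the result becomes once pickled. A rough way to measure that up front (the sample frame and sizes are illustrative):

```python
import pickle

import pandas as pd

# Illustrative: compare a frame's in-memory footprint with the size of the
# pickle stream that would be sent back from the worker process.
df = pd.DataFrame({"a": range(100_000), "b": range(100_000)})
in_memory = int(df.memory_usage(deep=True).sum())
pickled = len(pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL))
print(f"in-memory: {in_memory} bytes, pickled: {pickled} bytes")
```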

Regards,
Danil

> On 1 Oct 2020, at 03:11, Kyle Stanley  wrote:
> 
> 
> Kyle Stanley  added the comment:
> 
> DanilZ, could you take a look at the superseding issue 
> (https://bugs.python.org/issue37297) and see if your exception raised within 
> the job is the same?  
> 
> If it's not, I would suggest opening a separate issue (and linking to it in a 
> comment here), as I don't think it's necessarily related to this one. 
> "state=finished raised error" doesn't indicate the specific exception that 
> occurred. A good format for the name would be something along the lines of:
> 
> "ProcessPoolExecutor.submit()  while reading 
> large object (4GB)"
> 
> It'd also be helpful in the separate issue to paste the full exception stack 
> trace, specify OS, and multiprocessing start method used (spawn, fork, or 
> forkserver). This is necessary to know for replicating the issue on our end.
> 
> In the meantime, a workaround I would suggest trying would be to use the 
> *chunksize* parameter (or *Iterator*) in pandas.read_csv(), and split it 
> across several jobs (at least 4+, more if you have additional cores) instead 
> of within a single one. It'd also be generally helpful to see if that 
> alleviates the problem, as it could possibly indicate an issue with running 
> out of memory when the dataframe is converted to pickle format (which often 
> increases the total size) within the process associated with the job.
> 
> --
> nosy: +aeros
> 
> ___
> Python tracker 
> 
> ___

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




[issue37294] concurrent.futures.ProcessPoolExecutor state=finished raised error

2020-09-30 Thread Kyle Stanley


Kyle Stanley  added the comment:

DanilZ, could you take a look at the superseding issue 
(https://bugs.python.org/issue37297) and see if your exception raised within 
the job is the same?  

If it's not, I would suggest opening a separate issue (and linking to it in a 
comment here), as I don't think it's necessarily related to this one. 
"state=finished raised error" doesn't indicate the specific exception that 
occurred. A good format for the name would be something along the lines of:

"ProcessPoolExecutor.submit()  while reading 
large object (4GB)"

It'd also be helpful in the separate issue to paste the full exception stack 
trace, specify OS, and multiprocessing start method used (spawn, fork, or 
forkserver). This is necessary to know for replicating the issue on our end.

In the meantime, a workaround I would suggest trying would be to use the 
*chunksize* parameter (or *Iterator*) in pandas.read_csv(), and split it across 
several jobs (at least 4+, more if you have additional cores) instead of within 
a single one. It'd also be generally helpful to see if that alleviates the 
problem, as it could possibly indicate an issue with running out of memory when 
the dataframe is converted to pickle format (which often increases the total 
size) within the process associated with the job.

--
nosy: +aeros

___
Python tracker 

___



[issue37294] concurrent.futures.ProcessPoolExecutor state=finished raised error

2020-09-28 Thread DanilZ


DanilZ  added the comment:

After executing a single task inside a process, the result is returned with 
state=finished raised error.

The error happens when trying to load a big dataset (over 5 GB). The same 
dataset reduced to a smaller nrows executes and returns from result() without 
errors.

    import concurrent.futures

    import pandas as pd

    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        results = executor.submit(pd.read_csv, path)

    data = results.result()

--
components: +2to3 (2.x to 3.x conversion tool) -Library (Lib)
nosy: +DanilZ
title: concurrent.futures.ProcessPoolExecutor and multiprocessing.pool.Pool 
fail with super -> concurrent.futures.ProcessPoolExecutor state=finished raised 
error

___
Python tracker 

___