[issue33081] multiprocessing Queue leaks a file descriptor associated with the pipe writer

2019-01-31 Thread Chris Langton


Chris Langton  added the comment:

Interestingly, while you would expect Process or Queue to close their resource 
file descriptors, they don't, because a dev decided to defer that to the user 
and let them manage gc themselves. The interesting thing is that if you 
'upgrade' your code to use a Pool, the process fds will be closed, because the 
pool destroys its worker objects (so they are gc'd more often).

Say your OS limits you to a little over 1000 file descriptors; you can do this:

###

import multiprocessing
import json


def process(data):
    with open('/tmp/fd/%d.json' % data['name'], 'w') as f:
        f.write(json.dumps(data))
    return 'processed %d' % data['name']

if __name__ == '__main__':
    pool = multiprocessing.Pool(1000)
    try:
        for _ in range(1000):
            x = {'name': _}
            pool.apply(process, args=(x,))
    finally:
        pool.close()
        del pool

###

Only the pool's fd hangs around longer than it should, which is a huge 
improvement, and you are unlikely to hit a scenario where you need many Pool objects.
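
As an aside, the leftover pool fds can be released more deterministically by 
joining the pool after closing it, rather than del'ing it and waiting for gc. 
A minimal sketch, using only the documented Pool.close()/Pool.join() methods:

###

import multiprocessing


def work(n):
    return n * n


if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    try:
        results = pool.map(work, range(10))
    finally:
        pool.close()  # stop accepting new tasks
        pool.join()   # wait for the workers to exit, closing their pipe fds
    print(results)

###

After join() the worker processes are gone; whether the parent-side pipe fds 
are also released before gc is version dependent, so treat this as a sketch 
rather than a guarantee.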

--

___
Python tracker 
<https://bugs.python.org/issue33081>
___



[issue33081] multiprocessing Queue leaks a file descriptor associated with the pipe writer

2019-01-31 Thread Chris Langton


Chris Langton  added the comment:

@pitrou I am interested in a fix for Python 2.7 because, in Python 3.x, the 
way arithmetic is output is not arbitrarily precise.

So I will continue using Python 2.7 until another language I am familiar with, 
one whose arbitrary-precision arithmetic is superior to Python 3.x's, reaches 
the point where its library ecosystem is as mature as Python 2.7's.

I heavily use multiprocessing and have many use cases where I work around this 
issue, because I run into it almost every time I need multiprocessing. 
Basically, I decide I need multiprocessing when I have too many external 
resources being processed by one CPU, which means multiprocessing ends up 
managing thousands of external resources from the moment it is used!

I work around this issue with the following solution instead of the Queue, 
which always failed:



#!/usr/bin/env python

import multiprocessing, time

ARBITRARY_DELAY = 10

processes = []
for data in parse_file(zonefile_path, regex):
    t = multiprocessing.Process(target=write_to_json, args=(data, ))
    processes.append(t)

i = 0
for one_process in processes:
    i += 1
    if i % 1000 == 0:
        time.sleep(ARBITRARY_DELAY)
    one_process.start()

for one_process in processes:
    one_process.join()
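
A variant of the same workaround joins each batch instead of sleeping for an 
arbitrary delay, so no more than a fixed number of child processes (and their 
fds) exist at once. This is only a sketch; run_in_batches is a hypothetical 
helper, and write_to_json / parse_file are the script's own functions from above:

###

import multiprocessing


def run_in_batches(target, work_items, batch_size=500):
    # Start at most batch_size Process objects, then join them all before
    # starting the next batch, so the children (and their fds) are reclaimed
    # promptly instead of piling up.
    batch = []
    for data in work_items:
        p = multiprocessing.Process(target=target, args=(data,))
        p.start()
        batch.append(p)
        if len(batch) >= batch_size:
            for p in batch:
                p.join()
            batch = []
    for p in batch:  # final, possibly partial, batch
        p.join()

# usage with the names from the script above:
# run_in_batches(write_to_json, parse_file(zonefile_path, regex))

###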



At the time (years ago) I don't think I knew enough about file descriptors to 
solve the root cause (or to be elegant), and I've reused this code every time 
Queue failed me (which is basically every time I use Queue).

To be frank, anyone I ask says Queue is flawed.

Now that I am older and had some free time, I decided to fix my zonefile 
parsing scripts with more elegant solutions. I finally looked at the old code 
I had reused in many projects and identified that it was actually an fd issue 
(yay for knowledge), and I was VERY disappointed to see here that you didn't 
care to solve the problem for Python 2.7.. very unprofessional..

Now I am disappointed to be unpythonic and add explicit gc to my script... 
your unprofessionalism now makes me be unprofessional.
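
For what it's worth, rather than forcing gc, the documented way to release a 
Queue's resources from user code is Queue.close() followed by 
Queue.join_thread(). Whether that actually frees the writer fd without a gc 
pass on 2.7 is exactly what this issue is about, so the sketch below 
illustrates the API rather than claiming a fix:

###

import multiprocessing


def worker(q):
    q.put('done')
    q.close()        # no more puts from this process; the feeder thread flushes and exits
    q.join_thread()  # wait for the feeder thread to finish writing


if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
    q.close()        # declare this process done with the queue as well
    q.join_thread()

###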

--
nosy: +Chris Langton

___
Python tracker 
<https://bugs.python.org/issue33081>
___