ma3mju wrote:
On 7 Aug, 16:02, MRAB <pyt...@mrabarnett.plus.com> wrote:
ma3mju wrote:
On 3 Aug, 09:36, ma3mju <matt.u...@googlemail.com> wrote:
On 2 Aug, 21:49, Piet van Oostrum <p...@cs.uu.nl> wrote:
MRAB <pyt...@mrabarnett.plus.com> (M) wrote:
M> I wonder whether one of the workers is raising an exception, perhaps due
M> to lack of memory, when there are large number of jobs to process.
But that wouldn't prevent the join. And you would probably get an
exception traceback printed.
I wonder if something fishy is happening in the multiprocessing
infrastructure. Or maybe the Fortran code goes wrong because it has no
protection against buffer overruns and similar problems, I think.
--
Piet van Oostrum <p...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org
I don't think it's a memory problem. The reason for the hard and easy
queues is that the larger examples use far more RAM. If I run all of
the workers on harder problems I do begin to run out of RAM and end up
spending all my time switching in and out of swap, so I limit the
number of harder problems I run at the same time. I've watched it run
to the end (a very boring couple of hours) and it stays out of my swap
space; everything appears to stay in RAM. It just hangs after "poison"
has been printed for each process.
The other thing is that I get the message "here" telling me I broke
out of the loop after seeing the poison pill in each process, and I
get everything that was queued listed as output. Surely, if I were
running out of memory, I wouldn't expect all of the jobs to be listed
as output.
I have a serial script that works fine, so I know the Fortran code
works for each individual example.
Thanks
Matt
Any ideas for a solution?
A workaround is to do them in small batches.

You could put each job in a queue with a flag to say whether it's hard
or easy, then:

     while have more jobs:
         move up to BATCH_SIZE jobs into worker queues
         create and start workers
         wait for workers to finish
         discard workers
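For illustration, that batch loop might look something like the sketch
below. The job function here is a made-up stand-in (the real code runs
Fortran), and BATCH_SIZE and the queue layout are assumptions, not
Matt's actual code:

```python
# Sketch of the batch workaround: run at most BATCH_SIZE jobs at a
# time, drain the results, then discard the workers. run_job is a
# hypothetical placeholder for the real (hard or easy) computation.
import multiprocessing as mp

BATCH_SIZE = 4

def run_job(job):
    flag, data = job        # flag says whether the job is hard or easy
    return data * 2         # stand-in for the real Fortran run

def worker(jobs, results):
    for job in jobs:
        results.put(run_job(job))

def process_in_batches(all_jobs):
    results = mp.Queue()
    collected = []
    while all_jobs:
        batch, all_jobs = all_jobs[:BATCH_SIZE], all_jobs[BATCH_SIZE:]
        workers = [mp.Process(target=worker, args=([job], results))
                   for job in batch]
        for w in workers:
            w.start()
        # Drain results *before* joining so the pipe never fills up.
        for _ in batch:
            collected.append(results.get())
        for w in workers:
            w.join()
    return collected

if __name__ == "__main__":
    jobs = [("hard" if i % 2 else "easy", i) for i in range(10)]
    print(sorted(process_in_batches(jobs)))
```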

Yeah, I was hoping for something with a bit more finesse. In the end I
used a Pool instead, with a callback function, and that has solved the
problem. I did find this snippet today:
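A minimal sketch of the Pool-with-callback approach (the job function
is again a hypothetical stand-in for the real Fortran runs):

```python
# Pool.apply_async with a callback: the callback runs in the parent's
# result-handler thread, so results are pulled out of the pool's
# internal queue as soon as each job finishes -- nothing backs up
# while the parent waits in join().
import multiprocessing as mp

def run_job(n):
    return n * n            # stand-in for the real computation

results = []

def collect(result):
    results.append(result)  # called in the parent for each finished job

if __name__ == "__main__":
    pool = mp.Pool(processes=4)
    for n in range(10):
        pool.apply_async(run_job, (n,), callback=collect)
    pool.close()
    pool.join()             # safe: results were drained as they arrived
    print(sorted(results))
```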

Joining processes that use queues

    Bear in mind that a process that has put items in a queue will
wait before terminating until all the buffered items are fed by the
“feeder” thread to the underlying pipe. (The child process can call
the Queue.cancel_join_thread() method of the queue to avoid this
behaviour.)

    This means that whenever you use a queue you need to make sure
that all items which have been put on the queue will eventually be
removed before the process is joined. Otherwise you cannot be sure
that processes which have put items on the queue will terminate.
Remember also that non-daemonic processes will be joined
automatically.


I don't know (not a computer scientist) but could it have been the
pipe getting full?

In case anyone else is affected by this, I've attached the new code to
show the changes I made to fix it.

[snip]
Maybe the reason is this:

Threads share an address space, so putting data into a queue simply
involves putting a reference there, but processes don't share an address
space, so a sender must continue to exist until the data itself has been
copied into the pipe that connects the processes. This pipe has a
limited capacity.

In your code you were waiting for the easy workers to terminate
without reading from the queue, and therefore probably without
emptying the pipe either, so with a large number of jobs the pipe was
becoming full.

In summary: the worker didn't terminate because the pipe was full; the
pipe was full because you weren't reading the results; you weren't
reading the results because the worker hadn't terminated.
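One way to break that cycle, keeping the poison-pill style of the
original code, is to have the parent drain results until it has seen
every worker's pill and only then join. This is a sketch under those
assumptions, not the actual attached code:

```python
# Poison-pill workers that echo the pill back on the result queue.
# The parent reads results (emptying the pipe) until all pills are
# seen, and only then joins -- so the full-pipe deadlock cannot occur.
import multiprocessing as mp

POISON = None

def worker(inq, outq):
    while True:
        job = inq.get()
        if job is POISON:
            outq.put(POISON)   # echo the pill so the parent knows we're done
            break
        outq.put(job * 10)     # stand-in for the real computation

if __name__ == "__main__":
    inq, outq = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(inq, outq)) for _ in range(2)]
    for w in workers:
        w.start()
    for job in range(8):
        inq.put(job)
    for _ in workers:
        inq.put(POISON)        # one pill per worker
    results, pills = [], 0
    while pills < len(workers):    # drain first...
        item = outq.get()
        if item is POISON:
            pills += 1
        else:
            results.append(item)
    for w in workers:              # ...then join: the pipe is empty
        w.join()
    print(sorted(results))
```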
--
http://mail.python.org/mailman/listinfo/python-list
