Re: [Pytables-users] PyTables hangs while opening file in worker process
Hmm, sorry to hear that, Owen. Let me know how it goes.

On Thu, Oct 11, 2012 at 11:07 AM, Owen Mackwood <owen.mackw...@bccn-berlin.de> wrote:
> Hi Anthony,
>
> I tried your suggestion and it has not solved the problem. It could be
> that it makes the problem go away in the test code because it changes the
> timing of the processes. I'll see if I can modify the test code to
> reproduce the hang even with reloading the tables module.
>
> Regards,
> Owen
>
> [...]

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Anthony,

I tried your suggestion and it has not solved the problem. It could be that it makes the problem go away in the test code because it changes the timing of the processes. I'll see if I can modify the test code to reproduce the hang even with reloading the tables module.

Regards,
Owen

On 10 October 2012 22:00, Anthony Scopatz wrote:
> [...]
Re: [Pytables-users] PyTables hangs while opening file in worker process
So Owen,

I am still not sure what the underlying problem is, but I altered your parallel function to forcibly reload PyTables each time it is called. This seemed to work perfectly on my larger system but not at all on my smaller one. If there is a way that you can isolate PyTables and not import it globally at all, it might work even better. Below is the code snippet. I hope this helps.

Be Well
Anthony

    def run_simulation_single((paramspace_pt, params)):
        import sys
        rmkeys = [key for key in sys.modules if key.startswith('tables')]
        for key in rmkeys:
            del sys.modules[key]
        import traceback
        import tables
        try:
            filename = params['results_file']

On Wed, Oct 10, 2012 at 2:06 PM, Owen Mackwood wrote:
> [...]
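[Editor's note: the snippet above is cut off in the quoting, but the purge-and-reimport trick it relies on is plain stdlib Python. A minimal runnable sketch, using json as a stand-in for the tables package; the helper name purge_module is ours, not from the thread.]

```python
import sys

def purge_module(prefix):
    """Drop every module whose name starts with `prefix` from sys.modules,
    so the next import re-executes it from scratch (fresh module state)."""
    rmkeys = [key for key in sys.modules if key.startswith(prefix)]
    for key in rmkeys:
        del sys.modules[key]
    return rmkeys

import json
before = json
removed = purge_module('json')   # in the thread, the prefix is 'tables'
import json                      # re-import creates a brand-new module object

print('json' in removed, json is not before)
```

Re-importing this way re-runs module initialization, which is exactly why it can sidestep C-library state (such as HDF5's global lock) inherited across a fork.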
Re: [Pytables-users] PyTables hangs while opening file in worker process
On 10 October 2012 20:08, Anthony Scopatz wrote:
> So just to confirm this behavior, having run your sample on a couple of my
> machines, what you see is that the code looks like it gets all the way to
> the end, and then it stalls right before it is about to exit, leaving some
> small number of processes (here named "python tables_test.py") in the OS.
> Is this correct?

More or less. What's really happening is that if your processor pool has N processes, then each time one of the workers hangs the pool will have N-1 processes running thereafter. Eventually, when all the tasks have completed (or all workers are hung, something that has happened to me when processing many tasks), the main process will just block waiting for the hung processes.

If you're running Linux, when the test is finished and the main process is still waiting on the hung processes, you can just kill the main process. The orphaned processes that are still there afterward are the ones of interest.

> It seems to be the case that these failures do not happen when I set the
> processor pool size to be less than or equal to the number of processors
> (physical or hyperthreaded) that I have on the machine. I was testing this
> both on a 32 proc cluster and my dual core laptop. Is this also
> the behavior you have seen?

No, I've never noticed that to be the case. It appears that the greater the true parallelism (i.e., physical cores on which there are workers executing in parallel), the greater the odds of there being a hang. I don't have any real proof of this though; as with most concurrency bugs, it's tough to be certain of anything.

Regards,
Owen
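[Editor's note: a hung worker silently shrinking the pool from N to N-1 is hard to spot. A per-result timeout on the master side at least turns a hang into a visible error instead of an indefinite block. A stdlib sketch of that guard for a modern Python; the function names and timeout value are ours, not from the thread.]

```python
import multiprocessing as mp

def task(x):
    # stands in for run_simulation_single; a hung worker would stall here
    return x * x

def run_with_timeout(args, nprocs=4, per_task_timeout=30):
    """Collect pool results with a timeout per result, so one hung worker
    raises multiprocessing.TimeoutError instead of blocking the master."""
    args = list(args)
    pool = mp.Pool(processes=nprocs, maxtasksperchild=1)
    results = []
    try:
        it = pool.imap_unordered(task, args)
        for _ in args:
            # IMapIterator.next accepts a timeout in seconds
            results.append(it.next(timeout=per_task_timeout))
    finally:
        pool.terminate()   # also reaps any stuck workers
        pool.join()
    return sorted(results)

if __name__ == '__main__':
    print(run_with_timeout(range(5)))
```

On a timeout you still lose that task's result, but the master survives and can log which parameter point never came back.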
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Owen,

So just to confirm this behavior, having run your sample on a couple of my machines, what you see is that the code looks like it gets all the way to the end, and then it stalls right before it is about to exit, leaving some small number of processes (here named "python tables_test.py") in the OS. Is this correct?

It seems to be the case that these failures do not happen when I set the processor pool size to be less than or equal to the number of processors (physical or hyperthreaded) that I have on the machine. I was testing this both on a 32 proc cluster and my dual core laptop. Is this also the behavior you have seen?

Be Well
Anthony

On Tue, Oct 9, 2012 at 8:08 AM, Owen Mackwood wrote:
> [...]
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Anthony,

I've created a reduced example which reproduces the error. I suppose the more processes you can run in parallel, the more likely it is you'll see the hang. On a machine with 8 cores, I see 5-6 processes hang out of 2000.

All of the hung tasks had a call stack that looked like this:

    #0  0x7fc8ecfd01fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #1  0x7fc8ebd9d215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
    #2  0x7fc8ebaacff0 in H5open () from /usr/lib/libhdf5.so.6
    #3  0x7fc8e224c6a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new
        (__pyx_v_self=0x28b35a0, __pyx_args=<value optimized out>,
        __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
    #4  0x004abf62 in ext_do_call (f=0x271f4c0, throwflag=<value optimized out>) at Python/ceval.c:4331
    #5  PyEval_EvalFrameEx (f=0x271f4c0, throwflag=<value optimized out>) at Python/ceval.c:2705
    #6  0x004ada51 in PyEval_EvalCodeEx (co=0x247aeb0, globals=<value optimized out>,
        locals=<value optimized out>, args=0x288cea0, argcount=0,
        kws=<value optimized out>, kwcount=0,
        defs=0x25ffd78, defcount=4, closure=0x0) at Python/ceval.c:3253

I've attached the code to reproduce this. It probably isn't quite minimal, but it is reasonably simple (and stereotypical of the kind of operations I use). Let me know if you need anything else, or have questions about my code.

Regards,
Owen

On 8 October 2012 17:37, Anthony Scopatz wrote:
> [...]

Attachment: tables_test.tar.gz (GNU Zip compressed data)
Re: [Pytables-users] PyTables hangs while opening file in worker process
On Mon, Oct 8, 2012 at 11:19 AM, Owen Mackwood wrote:
> [...]
>
> Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Hello Owen,

So __getitem__() calls read() on the items it needs. Both should return a copy in-memory of the data that is on disk.

Frankly, I am not really sure what is going on, given what you have said. A minimal example which reproduces the error would be really helpful. From the error that you have provided, though, the only thing that I can think of is that it is related to file opening on the worker processes.

Be Well
Anthony
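[Editor's note: since read() hands back an in-memory copy, anything built from it and passed through a Pool gets pickled, so each worker mutates only its own copy. A stdlib sketch of that isolation, with plain lists standing in for the arrays so it runs without PyTables; the function names are ours.]

```python
import multiprocessing as mp

def mutate(d):
    # runs in the child: rebinding here cannot touch the parent's dict,
    # because the dict arrived via pickling, not by shared reference
    d['data'] = [x * 2 for x in d['data']]
    return d['data']

def demo():
    args = {'data': [1, 2, 3]}   # stands in for a dict filled via node.read()
    with mp.Pool(2) as pool:
        child_result = pool.apply(mutate, (args,))
    # the child saw (and doubled) its own copy; the parent's data is intact
    return child_result, args['data']

if __name__ == '__main__':
    print(demo())
```

This is why passing read()-style copies is safe; the danger Anthony is probing for is an open file *handle* surviving the fork, not the data itself.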
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Anthony,

On 8 October 2012 15:54, Anthony Scopatz wrote:
> Hmmm, Are you actually copying the data (f.root.data[:]) or are you
> simply passing a reference as arguments (f.root.data)?

I call f.root.data.read() on any arrays to load them into the process target args dictionary. I had assumed this returns a copy of the data. The documentation doesn't specify which, or even if there is any difference from __getitem__.

> So if you are opening a file in the master process and then
> writing/creating/flushing from the workers this may cause a problem.
> Multiprocessing creates a fork of the original process, so you are relying on
> the file handle from the master process to not accidentally change somehow.
> Can you try to open the files in the workers rather than the master? I
> hope that this clears up the issue.

I am not accessing the master file from the worker processes. At least not by design, though as you say some kind of strange behaviour could be arising due to the copy-on-fork of Linux. In principle, each process has its own file and there is no sharing of files between processes.

> Basically, I am advocating a more conservative approach where all data
> that is read or written to in a worker must come from that worker, rather
> than being generated by the master. If you are *still* experiencing
> these problems, then we know we have a real problem.

I'm being about as conservative as can be with my system. Unless read() returns a reference to the master file, there should be absolutely no sharing between processes. And even if my args dictionary contains a reference to the in-memory HDF5 file, how could reading it possibly trigger a call to openFile?

Can you clarify the semantics of read() vs. __getitem__()? Thanks.

Regards,
Owen
Re: [Pytables-users] PyTables hangs while opening file in worker process
On Mon, Oct 8, 2012 at 5:13 AM, Owen Mackwood wrote:
> Hi Anthony,
>
> There is a single multiprocessing.Pool which usually has 6-8 processes,
> each of which is used to run a single task, after which a new process is
> created for the next task (maxtasksperchild=1 for the Pool constructor).
> There is a master process that regularly opens an HDF5 file to read out
> information for the worker processes (data that gets copied into a
> dictionary and passed as args to the worker's target function). There are
> no problems with the master process, it never hangs.

Hello Owen,

Hmmm, Are you actually copying the data (f.root.data[:]) or are you simply passing a reference as arguments (f.root.data)?

> The failure appears to be random, affecting less than 2% of my tasks (all
> tasks are highly similar and should call the same tables functions in the
> same order). This is running on Debian Squeeze, Python 2.7.3, PyTables
> 2.4.0. As far as the particular function that hangs... tough to say since I
> haven't yet been able to properly debug the issue. The interpreter hangs
> which limits my ability to diagnose the source of the problem. I call a
> number of functions in the tables module from the worker process, including
> openFile, createVLArray, createCArray, createGroup, flush, and of course
> close.

So if you are opening a file in the master process and then writing/creating/flushing from the workers, this may cause a problem. Multiprocessing creates a fork of the original process, so you are relying on the file handle from the master process to not accidentally change somehow. Can you try to open the files in the workers rather than the master? I hope that this clears up the issue.

Basically, I am advocating a more conservative approach where all data that is read or written to in a worker must come from that worker, rather than being generated by the master. If you are *still* experiencing these problems, then we know we have a real problem.

Also, if this doesn't fix it, if you could send us a small sample module which reproduces this issue, that would be great too!

Be Well
Anthony
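[Editor's note: Anthony's "open the files in the workers" advice boils down to this shape: the master only ever passes plain data (filenames, parameters), and each worker opens and closes its own file. A stdlib sketch, with open() standing in for tables.openFile() so it runs without PyTables; function and path names are ours.]

```python
import multiprocessing as mp
import os
import tempfile

def worker(task):
    """Open, write, and close the file entirely inside the worker;
    no file handle is ever inherited from the master across the fork."""
    filename, value = task
    with open(filename, 'w') as f:   # in the thread: tables.openFile(filename, 'w')
        f.write(str(value * value))
    return filename

def run(n=4):
    outdir = tempfile.mkdtemp()
    # the master hands out only strings and ints, never open handles
    tasks = [(os.path.join(outdir, 'result_%d.txt' % i), i) for i in range(n)]
    with mp.Pool(processes=2, maxtasksperchild=1) as pool:
        filenames = pool.map(worker, tasks)
    # master collects results only after all workers have finished
    return [open(fn).read() for fn in sorted(filenames)]

if __name__ == '__main__':
    print(run())
```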
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Anthony,

There is a single multiprocessing.Pool which usually has 6-8 processes, each of which is used to run a single task, after which a new process is created for the next task (maxtasksperchild=1 for the Pool constructor). There is a master process that regularly opens an HDF5 file to read out information for the worker processes (data that gets copied into a dictionary and passed as args to the worker's target function). There are no problems with the master process; it never hangs.

The failure appears to be random, affecting less than 2% of my tasks (all tasks are highly similar and should call the same tables functions in the same order). This is running on Debian Squeeze, Python 2.7.3, PyTables 2.4.0. As far as the particular function that hangs... tough to say, since I haven't yet been able to properly debug the issue. The interpreter hangs, which limits my ability to diagnose the source of the problem. I call a number of functions in the tables module from the worker process, including openFile, createVLArray, createCArray, createGroup, flush, and of course close.

I'll continue to try and find out more about when and how the hang occurs. I have to rebuild Python to allow the gdb pystack macro to work. If you have any suggestions for me, I'd love to hear them.

Regards,
Owen

On 7 October 2012 00:28, Anthony Scopatz wrote:
> [...]
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Owen,

How many pools do you have? Is this a random runtime failure? What kind of system is this one? Is there some particular function in Python that you are running? (It seems to be openFile(), but I can't be sure...) The error is definitely happening down in the H5open() routine. Now whether this is HDF5's fault or ours, I am not yet sure.

Be Well
Anthony

On Sat, Oct 6, 2012 at 4:56 AM, Owen Mackwood wrote:
> [...]
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hi Anthony, I'm not trying to write in parallel. Each worker process has its own file to write to. After all tasks are completed, I collect the results in the master process. So the problem I'm seeing (a hang in the worker process) shouldn't have anything to do with parallel writes. Do you have any other suggestions? Regards, Owen On 5 October 2012 18:38, Anthony Scopatz wrote: > Hello Owen, > > While you can use process pools to read from a file in parallel just fine, > writing is another story completely. While HDF5 itself supports parallel > writing though MPI, this comes at the high cost of compression no longer > being available and a much more complicated code base. So for the time > being, PyTables only supports the serial HDF5 library. > > Therefore if you want to write to a file in parallel, you adopt a strategy > where you have one process which is responsible for all of the writing and > all other processes send their data to this process instead of writing to > file directly. This is a very effective way of accomplishing basically > what you need. In fact, we have an example to do just that [1]. (As a > side note: HDF5 may soon be adding an API for exactly this pattern because > it comes up so often.) > > So if I were you, I would look at [1] and adopt it to my use case. > > Be Well > Anthony > > 1. > https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py > > On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood < > owen.mackw...@bccn-berlin.de> wrote: > >> Hello, >> >> I'm using a multiprocessing.Pool to parallelize a set of tasks which >> record their results into separate hdf5 files. Occasionally (less than 2% >> of the time) the worker process will hang. According to gdb, the problem >> occurs while opening the hdf5 file, when it attempts to obtain the >> associated mutex. 
>> Here's part of the backtrace:
>>
>> #0 0x7fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>> #1 0x7fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
>> #2 0x7fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
>> #3 0x7fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=, __pyx_kwds=) at tables/hdf5Extension.c:2820
>> #4 0x004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331
>>
>> Nothing else is trying to open this file, so can someone suggest why this
>> is occurring? This is a very annoying problem as there is no way to recover
>> from this error, and consequently the worker process is permanently
>> occupied, which effectively removes one of my processors from the pool.
>>
>> Regards,
>> Owen Mackwood
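The hang inside H5TS_mutex_lock is consistent with a generic fork hazard: a mutex that happens to be held at the moment a process forks is inherited by the child in its locked state, and the thread that would have released it does not exist in the child, so any later acquire blocks forever. This is only a plausible mechanism for the hang above, not a confirmed diagnosis. A minimal sketch of the mechanism, using a plain threading.Lock as a stand-in for HDF5's global lock (POSIX only; the timeout exists solely so the demo terminates):

```python
import os
import threading

lock = threading.Lock()
lock.acquire()                      # parent holds the lock at fork time

pid = os.fork()
if pid == 0:
    # Child: the lock was inherited already locked, and no thread in this
    # process will ever release it. Without the timeout this acquire would
    # block forever -- analogous to the worker stuck in H5open().
    acquired = lock.acquire(timeout=1)
    os._exit(0 if acquired else 42)

# Parent: reap the child and inspect how it exited.
_, status = os.waitpid(pid, 0)
child_exit = os.WEXITSTATUS(status)  # 42 => child could not take the lock
lock.release()
```

If this is what is happening, it would also explain why the hang is rare: it only bites when the fork lands in the narrow window while the lock is held.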
Re: [Pytables-users] PyTables hangs while opening file in worker process
Hello Owen,

While you can use process pools to read from a file in parallel just fine, writing is another story completely. While HDF5 itself supports parallel writing through MPI, this comes at the high cost of compression no longer being available and a much more complicated code base. So for the time being, PyTables only supports the serial HDF5 library.

Therefore, if you want to write to a file in parallel, you must adopt a strategy where one process is responsible for all of the writing, and all other processes send their data to that process instead of writing to the file directly. This is a very effective way of accomplishing basically what you need. In fact, we have an example that does just that [1]. (As a side note: HDF5 may soon be adding an API for exactly this pattern because it comes up so often.)

So if I were you, I would look at [1] and adapt it to my use case.

Be Well
Anthony

1. https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py

On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood wrote:
> Hello,
>
> I'm using a multiprocessing.Pool to parallelize a set of tasks which
> record their results into separate hdf5 files. Occasionally (less than 2%
> of the time) the worker process will hang. According to gdb, the problem
> occurs while opening the hdf5 file, when it attempts to obtain the
> associated mutex. Here's part of the backtrace:
>
> #0 0x7fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #1 0x7fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
> #2 0x7fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
> #3 0x7fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=, __pyx_kwds=) at tables/hdf5Extension.c:2820
> #4 0x004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331
>
> Nothing else is trying to open this file, so can someone suggest why this
> is occurring?
> This is a very annoying problem as there is no way to recover
> from this error, and consequently the worker process is permanently
> occupied, which effectively removes one of my processors from the pool.
>
> Regards,
> Owen Mackwood
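The single-writer pattern Anthony describes can be sketched compactly: workers never open the HDF5 file themselves; they push results onto a queue, and one process does all the writing. To keep the sketch runnable without PyTables installed, a plain dict stands in for the HDF5 file and the main process plays the writer role; names such as run_simulation are illustrative, not taken from the linked example.

```python
import multiprocessing as mp

def run_simulation(args):
    """Worker: compute a result and hand it to the writer via the queue."""
    point, result_queue = args
    result = point * point              # stand-in for the real computation
    result_queue.put((point, result))

ctx = mp.get_context("fork")            # fork: children inherit parent state
manager = ctx.Manager()                 # keep a reference so the manager
queue = manager.Queue()                 # process stays alive; its Queue
                                        # proxy is picklable, so it can be
                                        # passed through Pool.map
with ctx.Pool(processes=2) as pool:
    pool.map(run_simulation, [(p, queue) for p in range(4)])

results_file = {}                       # stand-in for the writer-owned file
while not queue.empty():                # safe here: all puts completed
    point, result = queue.get()
    results_file[point] = result        # only this process ever "writes"
```

In the real example a dedicated writer process holds the open tables.File and loops on the queue; the structure is the same, only the consumer moves out of the main process.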
[Pytables-users] PyTables hangs while opening file in worker process
Hello,

I'm using a multiprocessing.Pool to parallelize a set of tasks which record their results into separate hdf5 files. Occasionally (less than 2% of the time) the worker process will hang. According to gdb, the problem occurs while opening the hdf5 file, when it attempts to obtain the associated mutex. Here's part of the backtrace:

#0 0x7fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x7fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
#2 0x7fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
#3 0x7fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=, __pyx_kwds=) at tables/hdf5Extension.c:2820
#4 0x004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331

Nothing else is trying to open this file, so can someone suggest why this is occurring? This is a very annoying problem as there is no way to recover from this error, and consequently the worker process is permanently occupied, which effectively removes one of my processors from the pool.

Regards,
Owen Mackwood

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
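The setup described above can be sketched as follows: a Pool of workers, each writing results to its own file, with the master collecting everything afterwards. Plain text files stand in for the HDF5 files so the sketch runs without PyTables; in the real code the open() call would be tables.open_file(...), which is where the reported hang occurs. All names here are illustrative.

```python
import multiprocessing as mp
import os
import tempfile

OUTDIR = tempfile.mkdtemp()             # each worker gets its own file here

def run_task(task_id):
    # Each worker writes only to its own file -- no two processes ever
    # share a file, so no parallel writes are involved.
    path = os.path.join(OUTDIR, "result_%d.txt" % task_id)
    with open(path, "w") as f:          # in the real code: tables.open_file
        f.write(str(task_id * 10))      # stand-in for the real results
    return path

ctx = mp.get_context("fork")
with ctx.Pool(processes=2) as pool:
    paths = pool.map(run_task, range(4))

# The master process collects results only after all tasks have completed.
collected = [open(p).read() for p in sorted(paths)]
```

Even though the files are independent, every worker still goes through HDF5's process-wide initialization (H5open) when it opens its file, which is why the hang can appear despite there being no shared file.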