Francesc,

thank you very much for your speedy and well-explained response!

I modified the mock-up script I sent originally according to your guidelines
(lock, open, save and close for each worker) and it seems to be working fine.
I hope to translate the solution to my real problem successfully as well.
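
When carrying the "lock, open, save and close" sequence over to other code, it may help to keep the ordering in one place. The following is only a minimal sketch, not the script from this thread: the names exclusive_lock and append_result are hypothetical, and a plain text append stands in for the PyTables open/append/close calls. It assumes a POSIX system and a target file that already exists.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive lock on an already existing file (hypothetical helper)."""
    fd = os.open(path, os.O_RDWR)        # open a raw handle first
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)   # lock: blocks while another process holds it
        yield
    finally:
        os.close(fd)                     # closing the handle releases the lock


def append_result(path, line):
    """Lock, open, save and close, in that order (hypothetical stand-in)."""
    with exclusive_lock(path):
        # Assumption: in the real problem this block would open the
        # PyTables file, append the row, and close it again.
        with open(path, "a") as f:       # open
            f.write(line + "\n")         # save
        # leaving the inner block closes the file while still inside the lock block
```

Each worker would then call append_result (or its PyTables equivalent) once per result, so the file is never kept open between writes.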

Best,
Marko

On Nov 4, 2010, at 10:03 AM, Francesc Alted wrote:

> On Wednesday 03 November 2010 23:59:38 Marko Budisic wrote:
>> Dear all,
>> 
>> I am having some trouble with using pytables correctly, and I was
>> hoping for some guidance. I would like to have one central pytables
>> file, containing a VLArray that would be used by several "worker"
>> processes. Each process should perform some computation, and append
>> it as a new row to VLArray. Due to possible sizes of results, it
>> would be difficult to pass results to the main thread for it to
>> store into pytables file.
> [clip]
> 
> What you are trying to achieve is tricky, but fortunately, possible.  
> First, in order to avoid problems with internal caches, you need to 
> lock, open, save and close for *each* worker.  You are not doing this 
> currently.
> 
> Then, you need to respect the "lock, open, save and close" order if you 
> want to ensure that everything goes well.  This example should 
> illustrate the proper sequence:
> 
> #!/usr/bin/env python
> 
> from multiprocessing import Pool
> import fcntl
> import numpy
> import tables
> import os
> 
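> def work(i):
>    # Compute the result before taking the lock.
>    x = numpy.random.random((6,5000))
>    group = '/group%d/group%d' % (i, i)
>    dataset = 'dataset%d' % i
>    # 1. Lock: exclusive lock on an already opened file handle.
>    fhandle = os.open('/tmp/output.h5', os.O_RDWR)
>    fcntl.lockf(fhandle, fcntl.LOCK_EX)
>    # 2. Open: only now open the file with PyTables.
>    f = tables.openFile('/tmp/output.h5','a')
>    # moving lockf here instead will cause crashes!
>    # 3. Save.
>    arr = f.createArray(group, dataset, x, createparents=True)
>    # 4. Close: the PyTables file first, then the locked handle.
>    f.close()
>    os.close(fhandle)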
> 
> def main():
>    tables.openFile('/tmp/output.h5','w').close()
>    pool = Pool(processes=8)
>    pool.map(work, range(5000), chunksize=1)
> 
> if __name__ == '__main__':
>    main()
> 
> [please note the use of lockf over an opened filehandle]
> 
> Third, you will need at least PyTables 2.2 for the above to work.
> 
> You can get more info on this in:
> 
> http://pytables.org/trac/ticket/185
> 
> Hope this helps,
> 
> -- 
> Francesc Alted
> 
> ------------------------------------------------------------------------------
> The Next 800 Companies to Lead America's Growth: New Video Whitepaper
> David G. Thomson, author of the best-selling book "Blueprint to a 
> Billion" shares his insights and actions to help propel your 
> business during the next growth cycle. Listen Now!
> http://p.sf.net/sfu/SAP-dev2dev
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users

