Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Antonio Valentino
Hi list,

On 05/06/2013 00:38, Anthony Scopatz wrote:
> On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan  wrote:
>
>> I think I've seen this in the release notes of 3.0. This is actually
>> something that I'm looking into as well. So any experience/feedback about
>> creating files in memory would be much appreciated.
>>
>
> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I haven't
> ever used this personally, but it would be great to have an example script,
> if someone wants to write one ;)
>
> Be Well
> Anthony
>
> 1.
> http://pytables.github.io/usersguide/parameter_files.html#hdf5-driver-management
>


There is also a small usage example in the cookbook [1].


[1] http://pytables.github.io/cookbook/inmemory_hdf5_files.html


ciao

-- 
Antonio Valentino

--
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Antonio Valentino
Hi Tim,

On 05/06/2013 03:29, Tim Burgess wrote:
> I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an
> example based on what I was doing.
> I looked over the docs and it does mention that there is an option to throw
> away the 'file' rather than write it to disk.

Please see the DRIVER_CORE_BACKING_STORE parameter [1]

[1] 
http://pytables.github.io/usersguide/parameter_files.html#tables.parameters.DRIVER_CORE_BACKING_STORE
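A minimal sketch of how this parameter might be used, assuming PyTables 3.0 or later. The file name and the array contents are made up for illustration. With driver_core_backing_store=0 the in-memory image should be discarded on close, so nothing is expected to appear on disk:

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical path; with the backing store disabled it acts only as a label.
path = os.path.join(tempfile.mkdtemp(), "scratch.h5")

# H5FD_CORE keeps the whole HDF5 file in memory. A backing store of 0
# asks HDF5 to throw the contents away on close instead of writing them out.
h5f = tables.open_file(path, "w", driver="H5FD_CORE",
                       driver_core_backing_store=0)
h5f.create_array("/", "values", np.arange(10), title="demo data")
total = int(h5f.root.values.read().sum())
h5f.close()

exists_on_disk = os.path.exists(path)
print(total, exists_on_disk)
```

Leaving driver_core_backing_store at its default keeps the in-memory speed while still saving the file when it is closed.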


regards

> Not sure how to do that and can't actually think of a use case where I would
> want to :-)
>
> And be wary, it is H5FD_CORE.
>
>
> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz  wrote:
>>
>> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I haven't
>> ever used this personally, but it would be great to have an example script,
>> if someone wants to write one ;)
>>
>
> import numpy as np
> import tables
>
> CHUNKY = 30
> CHUNKX = 8640
>
> if __name__ == '__main__':
>
>     # create dataset and add global attrs
>     file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)
>
>     with tables.open_file(file_path, 'w',
>                           title='PyTables HDF5 In-memory example',
>                           driver='H5FD_CORE') as h5f:
>
>         # dummy some data
>         lats = np.empty([4320])
>         lons = np.empty([8640])
>
>         # create some simple arrays
>         lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>         lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>
>         # create a 365 x 4320 x 8640 CArray of 32bit float
>         shape = (365, 4320, 8640)
>         atom = tables.Float32Atom(dflt=np.nan)
>
>         # chunk into daily slices and then further chunk days
>         sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>                                      chunkshape=(1, CHUNKY, CHUNKX))
>
>         # dummy up an ndarray
>         sst = np.empty([4320, 8640], dtype=np.float32)
>         sst.fill(30.0)
>
>         # write ndarray to a 2D plane in the HDF5
>         sst_node[0] = sst
>
>


-- 
Antonio Valentino



Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Seref Arikan
You would be surprised to see how convenient HDF5 can be for small-scale data
:) There are cases where one may need binary serialization of a few
thousand items, but still need metadata, indexing and other nice
features provided by HDF5/PyTables.




On Wed, Jun 5, 2013 at 2:29 AM, Tim Burgess  wrote:

> I was playing around with in-memory HDF5 prior to the 3.0 release. Here's
> an example based on what I was doing.
> I looked over the docs and it does mention that there is an option to
> throw away the 'file' rather than write it to disk.
> Not sure how to do that and can't actually think of a use case where I
> would want to :-)
>
> And be wary, it is H5FD_CORE.
>
>
> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz  wrote:
>
>
> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
> haven't ever used this personally, but it would be great to have an example
> script, if someone wants to write one ;)
>
>
>


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 03:29, Tim Burgess wrote:
> I was playing around with in-memory HDF5 prior to the 3.0 release.
> Here's an example based on what I was doing.
> I looked over the docs and it does mention that there is an option to
> throw away the 'file' rather than write it to disk.
> Not sure how to do that and can't actually think of a use case where I
> would want to :-)
> 
> And be wary, it is H5FD_CORE.
> 
> 
> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz  wrote:
>>
>> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
>> haven't ever used this personally, but it would be great to have an
>> example script, if someone wants to write one ;)
>>
>  
> 

Thanks Tim,

I adapted your example for my use case (I'm using the EArray class,
because I need to continuously update my database), and it works well.
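An EArray variant of the example might look like the sketch below. The array name, the tiny shapes and the disabled backing store are assumptions chosen for illustration; Andreas's actual code is not shown in the thread.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file name, used only as a label for the in-memory file.
path = os.path.join(tempfile.mkdtemp(), "earray_demo.h5")

with tables.open_file(path, "w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5f:
    # A 0 in the shape marks the dimension along which the array can grow.
    atom = tables.Float32Atom()
    sst = h5f.create_earray(h5f.root, "sst", atom, shape=(0, 4, 8),
                            chunkshape=(1, 4, 8))

    # Append one "day" at a time, as one would when updating continuously.
    for day in range(3):
        plane = np.full((1, 4, 8), 30.0 + day, dtype=np.float32)
        sst.append(plane)

    final_shape = tuple(int(n) for n in sst.shape)
    last_mean = float(sst[2].mean())

print(final_shape, last_mean)
```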

However, when I use this with my own data (but also creating the arrays
like you did), I'm running into errors like "Could not wait on barrier".
It seems like the HDF5 library is spawning several threads.

Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
runtime?

Cheers, Andreas.




Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 09:15, Seref Arikan wrote:
> You would be surprised to see how convenient HDF5 can be for small-scale
> data :) There are cases where one may need binary serialization
> of a few thousand items, but still need metadata, indexing and other
> nice features provided by HDF5/PyTables.

You're right, Seref! That's why I wrote a small script that saves
the script which generates the H5 file into the H5 file itself, in a
file_node. That way, if you have the data file, you can
always see what you did to create it :)
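The general idea can be sketched with the filenode module that ships with PyTables. This is not the code from the repository linked below; the node name and the stand-in script text are hypothetical.

```python
import os
import tempfile

import tables
from tables.nodes import filenode

# Stand-in for the generating script's source. In real use one might
# read the bytes from the script file itself.
script_source = b"import tables\n# ... code that builds the data file ...\n"

# Hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "with_script.h5")

# Store the script text next to the data, in a file node.
with tables.open_file(path, "w") as h5f:
    fnode = filenode.new_node(h5f, where="/", name="generating_script")
    fnode.write(script_source)
    fnode.close()

# Later, anyone holding the data file can recover the script.
with tables.open_file(path, "r") as h5f:
    fnode = filenode.open_node(h5f.root.generating_script, "r")
    recovered = fnode.read()
    fnode.close()

print(recovered.decode("ascii"))
```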

You can find the script here:

   https://github.com/andreas-h/pyrepsci

It's not cleaned up, but it does the job. Currently, it works only via
pandas, but when I find the time I'll make it more general. Maybe you'll
find this useful.

-- Andreas.



Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Andreas Hilboll
On 05.06.2013 10:31, Andreas Hilboll wrote:
> On 05.06.2013 03:29, Tim Burgess wrote:
>> I was playing around with in-memory HDF5 prior to the 3.0 release.
>> Here's an example based on what I was doing.
>> I looked over the docs and it does mention that there is an option to
>> throw away the 'file' rather than write it to disk.
>> Not sure how to do that and can't actually think of a use case where I
>> would want to :-)
>>
>> And be wary, it is H5FD_CORE.
>>
>>
>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz  wrote:
>>>
>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
>>> haven't ever used this personally, but it would be great to have an
>>> example script, if someone wants to write one ;)
>>>
>>  
>>
> 
> Thanks Tim,
> 
> I adapted your example for my use case (I'm using the EArray class,
> because I need to continuously update my database), and it works well.
> 
> However, when I use this with my own data (but also creating the arrays
> like you did), I'm running into errors like "Could not wait on barrier".
> It seems like the HDF5 library is spawning several threads.
> 
> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
> runtime?

Update:

When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
seems to work as expected (but a bit on the slow side ...). With
max_blosc_threads=4, the error pops up.
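For reference, one way to pass these limits is as keyword arguments to open_file, which override the corresponding entries in tables.parameters for that file. This is a sketch with made-up data and file name, and the disabled backing store is an assumption; it is not the code that triggered the error.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "threads_demo.h5")

# Keyword arguments override MAX_BLOSC_THREADS and MAX_NUMEXPR_THREADS
# from tables.parameters when the file is opened.
with tables.open_file(path, "w", driver="H5FD_CORE",
                      driver_core_backing_store=0,
                      max_blosc_threads=2,
                      max_numexpr_threads=2) as h5f:
    filters = tables.Filters(complevel=5, complib="blosc")
    data = np.arange(1000, dtype=np.float32).reshape(10, 100)
    node = h5f.create_carray(h5f.root, "data", obj=data, filters=filters)
    roundtrip_ok = bool((node[:] == data).all())

print(roundtrip_ok)
```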

Cheers, Andreas.




Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Francesc Alted
On 6/5/13 11:45 AM, Andreas Hilboll wrote:
> On 05.06.2013 10:31, Andreas Hilboll wrote:
>> Thanks Tim,
>>
>> I adapted your example for my use case (I'm using the EArray class,
>> because I need to continuously update my database), and it works well.
>>
>> However, when I use this with my own data (but also creating the arrays
>> like you did), I'm running into errors like "Could not wait on barrier".
>> It seems like the HDF5 library is spawning several threads.
>>
>> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
>> runtime?
> Update:
>
> When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
> seems to work as expected (but a bit on the slow side ...). With
> max_blosc_threads=4, the error pops up.

Hmm, this seems like a bad interaction among threads in numexpr and
blosc.  I'm not sure why this is triggered, because the libraries should
execute at different times.  Is your app multi-threaded?

Although Blosc has implemented a lock for preventing this situation in 
the latest releases, numexpr still lacks this protection.  As the 
multithreading engine is the same for both packages, it should be 
relatively easy to implement the lock support to numexpr too. Volunteers?
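Until numexpr has such protection, a multi-threaded application could serialize its own numexpr calls. The sketch below is an application-level workaround, not the lock support inside numexpr described above; the lock and helper names are hypothetical, and whether this avoids the barrier error in a given setup is an assumption.

```python
import threading

import numexpr as ne
import numpy as np

# Hypothetical process-wide lock owned by the application.
_ne_lock = threading.Lock()


def locked_evaluate(expr, local_dict):
    """Run numexpr.evaluate while holding the application lock."""
    with _ne_lock:
        return ne.evaluate(expr, local_dict=local_dict)


def worker(results, idx):
    a = np.arange(1000, dtype=np.float64)
    doubled = locked_evaluate("a * 2", {"a": a})
    results[idx] = float(doubled.sum())


results = [None] * 4
threads = [threading.Thread(target=worker, args=(results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Only one thread at a time enters numexpr here, so its internal thread pool is never driven from two application threads at once.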

-- 
Francesc Alted




Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Francesc Alted
On 6/5/13 11:45 AM, Andreas Hilboll wrote:
> On 05.06.2013 10:31, Andreas Hilboll wrote:
>> Thanks Tim,
>>
>> I adapted your example for my use case (I'm using the EArray class,
>> because I need to continuously update my database), and it works well.
>>
>> However, when I use this with my own data (but also creating the arrays
>> like you did), I'm running into errors like "Could not wait on barrier".
>> It seems like the HDF5 library is spawning several threads.
>>
>> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
>> runtime?
> Update:
>
> When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
> seems to work as expected (but a bit on the slow side ...).

BTW, can you really notice the difference between using 1, 2 or 4 
threads?  Can you show some figures?  Just curious.
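A rough way to collect such figures is sketched below. The array size, compression settings and file name are arbitrary assumptions, and a single small write is far too noisy for real conclusions, so treat it as a starting point only.

```python
import os
import tempfile
import time

import numpy as np
import tables


def time_write(nthreads, data):
    """Time one Blosc-compressed write using at most `nthreads` threads."""
    path = os.path.join(tempfile.mkdtemp(), "bench.h5")  # hypothetical name
    filters = tables.Filters(complevel=5, complib="blosc")
    with tables.open_file(path, "w", driver="H5FD_CORE",
                          driver_core_backing_store=0) as h5f:
        # Set the limit after opening, in case opening a file resets it.
        tables.set_blosc_max_threads(nthreads)
        start = time.perf_counter()
        h5f.create_carray(h5f.root, "data", obj=data, filters=filters)
        h5f.flush()
        return time.perf_counter() - start


data = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
timings = {n: time_write(n, data) for n in (1, 2, 4)}

for n, seconds in sorted(timings.items()):
    print("%d thread(s): %.4f s" % (n, seconds))
```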

-- 
Francesc Alted




Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Antonio and Tim!

These are great. I think that one of these should definitely make it into
the examples/ dir.

Be Well
Anthony




Re: [Pytables-users] pytable 30 - encoding

2013-06-05 Thread Anthony Scopatz
Hi Jeff,

I have made some comments in the issue.  Thanks for investigating this
so thoroughly.

Be Well
Anthony
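
The condvars workaround Jeff mentions in the quoted message below is not spelled out in the thread. One plausible form, written with the PyTables 3.0 snake_case API, passes the comparison value as bytes instead of embedding a string literal in the condition. The table layout follows Jeff's test script; the variable name "target" and the file name are hypothetical.

```python
import os
import tempfile

import tables


class Row(tables.IsDescription):
    index = tables.Int64Col()
    column = tables.StringCol(25)


# Hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "test_select.h5")

with tables.open_file(path, "w") as h5f:
    table = h5f.create_table("/", "table", Row)
    row = table.row
    for i in range(10):
        row["index"] = i
        row["column"] = ("str-%d" % (i % 5)).encode("utf-8")
        row.append()
    table.flush()

    # The value to compare against travels through condvars as bytes,
    # so the condition string itself contains no string literal.
    hits = table.read_where("column == target",
                            condvars={"target": b"str-2"})
    matched = sorted(int(i) for i in hits["index"])

print(matched)
```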


On Tue, Jun 4, 2013 at 8:16 PM, Jeff Reback  wrote:

> Anthony,
>
> I created an issue with more info
>
> I am not sure if this is a bug, or just a way both ne/pytables treat
> strings that need to touch an encoded value;
>
> I found workaround by specifying the condvars to readWhere. Any more
> thoughts on this?
>
> thanks Jeff
>
>
> https://github.com/PyTables/PyTables/issues/265
>
> *From:* Anthony Scopatz 
> *To:* Jeff Reback 
> *Cc:* Discussion list for PyTables 
> *Sent:* Tuesday, June 4, 2013 6:39 PM
>
> *Subject:* Re: [Pytables-users] pytable 30 - encoding
>
> Hi Jeff,
>
> Hmmm, Could you try doing the same thing on just an in-memory numpy array
> using numexpr.  If this succeeds it tells us that the problem is in
> PyTables, not numexpr.
>
> Be Well
> Anthony
>
>
> On Tue, Jun 4, 2013 at 11:35 AM, Jeff Reback  wrote:
>
> Anthony,
>
> I am using numexpr 2.1 (latest)
>
> this is puzzling; it doesn't matter what I pass (bytes or str), same result?
>
> (column == 'str-2')
> > /mnt/code/arb/test/pytables-3.py(38)()
> -> result = handle.root.test.table.readWhere(selector)
> (Pdb) handle.root.test.table.readWhere(selector)
> *** TypeError: string argument without an encoding
> (Pdb) handle.root.test.table.readWhere(selector.encode(encoding))
> *** TypeError: string argument without an encoding
> (Pdb)
>
>
> *From:* Anthony Scopatz
> *To:* Jeff Reback ; Discussion list for PyTables <
> pytables-users@lists.sourceforge.net>
> *Sent:* Tuesday, June 4, 2013 12:25 PM
> *Subject:* Re: [Pytables-users] pytable 30 - encoding
>
> Hi Jeff,
>
> Have you also updated numexpr to the most recent version?  The error is
> coming from numexpr not compiling the expression correctly. Also, you might
> try making selector a str, rather than bytes:
>
> selector = "(column == 'str-2')"
>
> rather than
>
> selector = "(column == 'str-2')".encode(encoding)
>
> Be Well
> Anthony
>
>
> On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback  wrote:
>
> anthony, where am I going wrong here?
> #!/usr/local/bin/python3
> import tables
> import numpy as np
> import datetime, time
> encoding = 'UTF-8'
> test_file = 'test_select.h5'
> handle = tables.openFile(test_file, "w")
> node = handle.createGroup(handle.root, 'test')
> table = handle.createTable(node, 'table', dict(
>     index = tables.Int64Col(),
>     column = tables.StringCol(25),
>     values = tables.FloatCol(shape=(3)),
> ))
>
> # add data
> r = table.row
> for i in range(10):
>     r['index'] = i
>     r['column'] = ("str-%d" % (i % 5)).encode(encoding)
>     r['values'] = np.arange(3)
>     r.append()
> table.flush()
> handle.close()
> # read
> handle = tables.openFile(test_file,"r")
> result = handle.root.test.table.read()
> print("table data\n")
> print(result)
> # where
> print("\nselector\n")
> selector = "(column == 'str-2')".encode(encoding)
> print(selector)
> result = handle.root.test.table.readWhere(selector)
> print(result)
> and the following output:
>
> [sheep-jreback-/code/arb/test] python3 pytables-3.py
> table data
> [(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0])
> (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0])
> (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0])
> (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0])
> (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])]
> selector
> b"(column == 'str-2')"
> Traceback (most recent call last):
> File "pytables-3.py", line 37, in 
> result = handle.root.test.table.readWhere(selector)
> File
> "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py",
> line 35, in oldfunc
> return obj(*args, **kwargs)
> File
> "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py",
> line 1522, in read_where
> self._where(condition, condvars, start, stop, step)]
> File
> "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py",
> line 1484, in _where
> compiled = self._compile_condition(condition, condvars)
> File
> "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py",
> line 1358, in _compile_condition
> compiled = compile_condition(condition, typemap, indexedcols)
> File
> "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py",
> line 419, in compile_condition
> func = NumExpr(expr, signature)
> File
> "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py",
> line 559, in NumExpr
> precompile(ex, signature, context)
> File
> "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py",
> line 511, in precompile
> constants_order, constants = getConstants(ast)
> File
> "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py",
> li

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Tim Burgess
On Jun 06, 2013, at 04:19 AM, Anthony Scopatz  wrote:

> Thanks Antonio and Tim!
>
> These are great. I think that one of these should definitely make it into
> the examples/ dir.
>
> Be Well
> Anthony

OK. I have put up a pull request with the code added.
https://github.com/PyTables/PyTables/pull/266

Cheers, Tim


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Tim!  You are the best.  Hopefully I will get to this later tonight.

Be Well
Anthony


On Wed, Jun 5, 2013 at 9:20 PM, Tim Burgess  wrote:

>
>
> On Jun 06, 2013, at 04:19 AM, Anthony Scopatz  wrote:
>
> Thanks Antonio and Tim!
>
> These are great. I think that one of these should definitely make it into
> the examples/ dir.
>
> Be Well
> Anthony
>
>
> OK. I have put up a pull request with the code added.
> https://github.com/PyTables/PyTables/pull/266
>
> Cheers, Tim
>
>
>