Re: [Pytables-users] Question about Leaf.remove() method

2013-08-29 Thread Anthony Scopatz
Hello Premal,

This is just how HDF5 works.  When you delete a Leaf, the reference to that
node is removed and the space in the file becomes available for future use.
However, HDF5 will never shrink a file; it will only grow it.  New data may
eventually fill in the freed space, but the space itself doesn't go away.
It just sits there empty.

If you really want to get rid of this extraneous space you should use the
ptrepack or h5repack command line utilities to create a clean copy of the
file.
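
For example, something along these lines (file names here are just
placeholders; the trailing :/ means copy the whole tree) writes a freshly
packed copy that you can swap in for the original:

ptrepack olddata.h5:/ packed.h5:/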

Hope this helps.

Be Well
Anthony


On Thu, Aug 29, 2013 at 10:40 AM, Forafo San ppv.g...@gmail.com wrote:

 Hello All,
 I have some data in an HDF5 file that is created with PyTables.
 Occasionally, I update the data by reading in one of the tables and adding
 or deleting rows.  Then, I create a new table containing the updated data,
 give it a random name, and let it reside in the same group where the old
 table resides. I flush the new table, then use the table.remove() (or
 Leaf.remove()) method to delete the old table and table.rename() method to
 rename the randomly-named new table to the same name as the old table.

 Problem:
 In a small-sized table, the size of the HDF5 file doubles with the above
 process even when no new rows or other modifications are made (let's assume
 that the HDF5 file contains only this table). A ptdump indicates no trace
 of the old table.

 In a medium-sized table, the size of the HDF5 file rises substantially
 (20% or 30%) even when no new rows or columns are added.

 Am I right in understanding that table.remove() completely deletes the table?
 Does it leave some residue that I should be aware of?

 All help is appreciated. Thanks,
 Premal






Re: [Pytables-users] modifying a table column

2013-08-27 Thread Anthony Scopatz
Hey Sasha,

You probably want to look at the Expr class [1], where you can set the output
to be the same as the original array.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref/expr_class.html


On Tue, Aug 27, 2013 at 11:44 AM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

 Hi All:

 I have a huge table imported from other binary files to hdf, and I forgot
 to multiply the data by a factor in one case. Is there an easy way to
 multiply a column by a constant factor using pytables?
 To modify it in place?

 Thank you

 --
 Sasha






Re: [Pytables-users] modifying a table column

2013-08-27 Thread Anthony Scopatz
On Tue, Aug 27, 2013 at 6:50 PM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

 Hi Again:


 2013/8/27 Anthony Scopatz scop...@gmail.com

 Hey Sasha,

 You probably want to look at the Expr class [1], where you can set the output
 to be the same as the original array.

 Be Well
 Anthony

 1. http://pytables.github.io/usersguide/libref/expr_class.html



 I just wanted to check whether it is possible to use an assignment in
 expressions? (This gives me a syntax error exception; it complains about the
 equal sign in the expression.)


Hi Sasha,

Assignment is a statement, not an expression, so it is not possible to use it
here.  This is why you are getting a syntax error.



 h = tb.open_file(path, mode="a")
 varTable = h.get_node("/", var_name)
 coef = 3 * 60 * 60  # output step
 expr = tb.Expr("c = c * m", uservars={"c": varTable.cols.field, "m": coef})
 expr.eval()
 varTable.flush()
 h.close()

 Is this an optimal way of multiplying a column? (this one works, but I
 think it loads all the data into memory...right?)

 expr = tb.Expr("c * m", uservars={"c": varTable.cols.field, "m": coef})
 varTable.cols.field[:] = expr.eval()


You are right that this loads the entire computed array into memory and is
therefore not optimal.  I would do something like the following:

h = tb.open_file(path, mode="a")
varTable = h.get_node("/", var_name)
coef = 3 * 60 * 60  # output step
c = varTable.cols.field
expr = tb.Expr("c * m", uservars={"c": c, "m": coef})
expr.set_output(c)
expr.eval()
varTable.flush()
h.close()

Be Well
Anthony



 Thank you

 Cheers




  On Tue, Aug 27, 2013 at 11:44 AM, Oleksandr Huziy guziy.sa...@gmail.com
  wrote:

  Hi All:

 I have a huge table imported from other binary files to hdf, and I
 forgot to multiply the data by a factor in one case. Is there an easy way
 to multiply a column by a constant factor using pytables?
 To modify it in place?

 Thank you

 --
 Sasha














Re: [Pytables-users] modifying a table column

2013-08-27 Thread Anthony Scopatz
Glad I could help!


On Tue, Aug 27, 2013 at 7:44 PM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

 2013/8/27 Anthony Scopatz scop...@gmail.com


 You are right that this loads the entire computed array into memory and
 is therefore not optimal.  I would do something like the following:

 h = tb.open_file(path, mode="a")
 varTable = h.get_node("/", var_name)
 coef = 3 * 60 * 60  # output step
 c = varTable.cols.field
 expr = tb.Expr("c * m", uservars={"c": c, "m": coef})
 expr.set_output(c)
 expr.eval()
 varTable.flush()
 h.close()


 Aha, this is cool. Thanks Anthony.

 Cheers
 --
 Sasha



  On Tue, Aug 27, 2013 at 11:44 AM, Oleksandr Huziy 
 guziy.sa...@gmail.com wrote:

  Hi All:

 I have a huge table imported from other binary files to hdf, and I
 forgot to multiply the data by a factor in one case. Is there an easy way
 to multiply a column by a constant factor using pytables?
 To modify it in place?

 Thank you

 --
 Sasha






















Re: [Pytables-users] Numpy Arrays to Structure Array or Table

2013-08-08 Thread Anthony Scopatz
Hi David,

I think that you can do what you want in one, rather long line:

hfile.createTable(grp, 'signal', description=np.array(zip(*some_func(t, v)),
dtype=[('time', np.float64), ('value', np.float64)]))

Or two nicer lines:

arr = np.array(zip(*some_func(t, v)), dtype=[('time', np.float64), ('value',
np.float64)])
hfile.createTable(grp, 'signal', description=arr)

zip() is your friend =).  If zip is too slow and you don't want to make
more than one copy, you could try something like this:

temparr = np.array(some_func(t, v)).T
arr = temparr.view(dtype=[('time', np.float64), ('value', np.float64)])

This really only works because both columns have the same dtype.

Of course, you can always keep basically what you have and loop through the
column names programmatically:

for name, col in zip(A.dtype.names, some_func(t, v)):
    A[name] = col
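
If you want to skip both zip() and the view trickery entirely, numpy's record
helpers can assemble the structured array in one step.  A sketch, assuming
some_func, hfile, and grp as in the original message:

t, v = some_func(t, v)
arr = np.rec.fromarrays([t, v], dtype=[('time', np.float64),
                                       ('value', np.float64)])
hfile.createTable(grp, 'signal', description=arr)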

I hope this helps!

Be Well
Anthony

On Wed, Aug 7, 2013 at 5:58 PM, David Reed david.ree...@gmail.com wrote:

 Hi there,

 I have some generic functions that take time series data with 2 numpy
 array arguments, time and value, and return 2 numpy arrays of time and
 value.

 I would like to place these arrays into a Numpy structured array or
 directly into a new pytables table with fields, time and value.

 Now I've found I can do this:

 t, v = some_func(t, v)

 A = np.empty(len(t), dtype=[('time', np.float64), ('value',
 np.float64)])

 A['time'] = t
 A['value'] = v

 hfile.createTable(grp, 'signal', description=A)
 hfile.flush()

 But this seems rather clunky and inefficient.  Any suggestions to make
 this repackaging a little smoother?








Re: [Pytables-users] suitable for storing data like k-v style?

2013-08-07 Thread Anthony Scopatz
Hi Jason,

A key-value store pattern is definitely supported.  However, be forewarned
that groups are implemented using B-trees, not hash tables, so node lookup is
O(log n) rather than O(1).  That said, with data of your size most of the
access time will be spent in the leaf nodes, not in finding the group.  I'd
say try it out and see.
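
As a rough sketch of that layout (names and sizes are made up; PyTables 3.x
API):

import numpy as np
import tables as tb

with tb.open_file("user_logs.h5", "w") as h5:
    grp = h5.create_group("/", "user_log")
    for uid in range(1000):                       # imagine millions here
        h5.create_array(grp, "uid%d" % uid,
                        np.random.randint(0, 100, size=10000))

with tb.open_file("user_logs.h5", "r") as h5:
    log = h5.get_node("/user_log", "uid42")[:]    # direct lookup by node name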

Be Well
Anthony

On Wed, Aug 7, 2013 at 11:33 AM, Xianli Xu xiaolou.c...@gmail.com wrote:

 Hi all,

 I'm developing a data processing service and evaluating whether PyTables
 fits. Since HDF5 supports hierarchical data like a tree of folders, can I use
 such a tree-like structure as a K-V store, possibly storing millions of
 tables or arrays under one group and randomly accessing any one of them in
 O(1) time?
 e.g.

 root/
     user_log/
         uid1 - table / array (tens of thousands of rows / elements, ETL'ed user log info in int format)
         uid2 - table / array
         uid3 - table / array
         uid4 - table / array
         uid5 - table / array
         …… (perhaps a million users)

 Just wondering how the hierarchical structure is implemented and whether
 such a usage pattern is supported? If not, is there a workaround or a better
 way to store this type of information? We adopted PyTables because the data
 is stored at higher density, loads faster, and has no ACID / concurrency
 overhead, so traditional DBs and NoSQL DBs are not an option for us.

 Thanks,
 Jason




Re: [Pytables-users] dates and space

2013-08-05 Thread Anthony Scopatz
On Mon, Aug 5, 2013 at 1:38 PM, Oleksandr Huziy guziy.sa...@gmail.com wrote:

 Hi Pytables users and developers:

 I have a few questions to which I could not find the answer in the
 documentation. Thank you in advance for any help.

 1. If I store dates in Pytables, does it mean I could write queries like
 table.where('date.month == 5')? Is there a common way to pass from python's
 datetime to pytable's datetime and inversely?


Hello Sasha,

PyTables times are actually based on C time, not Python's datetimes.  This
is because they use the HDF5 time types.  So unfortunately you can't
write queries like the one above.  (You'd need to talk to numexpr about
getting that kind of query implemented ~_~.)

Instead I would suggest that you store your times as Float64Atoms and
Float64Cols and then use arithmetic to figure out the query:

table.where("(x / 3600 / 24) % 12 == 5")

This is not perfect...
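
For example, a sketch of the float-timestamp approach (column names are
invented, and `table` is assumed to be a Table built from this description):

import time
import tables as tb

class Record(tb.IsDescription):
    tstamp = tb.Float64Col()   # POSIX seconds since the epoch
    value  = tb.Float64Col()

# when filling rows: row['tstamp'] = time.mktime(dt.timetuple())

# select everything recorded in May 2013
t0 = time.mktime((2013, 5, 1, 0, 0, 0, 0, 0, -1))
t1 = time.mktime((2013, 6, 1, 0, 0, 0, 0, 0, -1))
may = [r['value'] for r in table.where("(tstamp >= t0) & (tstamp < t1)")]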


 2. I have several variables stored in the same file in a separate table
 for each variable. And I use separate columns year, month, day, hour,
 minute, second  - to mark the time for a record (the records are not
 necessarily ordered in time) and this is for each variable. I was thinking
 to put all the variables in the same table and put missing values for the
 variables which do not have outputs for a given time step. Is it possible
 to put None as a default value into a table (so I could easily filter dummy
 rows).


It is not possible to use None since that is a Python object of a
different type than the other integers you are trying to stick in the
column.  I would suggest that you use values with no actual meaning.  If
you are using normal ints you can use -1 to represent missing values.  If
you are using unsigned ints you have to pick other values, like 13 for a
month column, since no real month ever takes that value.
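
A sketch of that idea, using a sentinel default so dummy rows are easy to
filter out (the description here is hypothetical):

import tables as tb

class Output(tb.IsDescription):
    year  = tb.Int16Col(dflt=-1)   # -1 marks a missing / dummy value
    month = tb.Int8Col(dflt=-1)
    value = tb.Float64Col()

# real rows can later be selected with: table.where("month != -1")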


 But then again the data comes in chunks, so does this mean I would have to
 check whether a row with the same date already exists for a different variable?


No, you wouldn't; you can store the same data multiple times in different
rows.


 I don't really like the ideas in 2, which are intended to save space, but
 maybe all I need is a good compression level? Can somebody advise me on
 this?


Compression would definitely help here since the date numbers are all
fairly similar.  Probably even a compression level of 1 would work.  Keep
in mind that sometimes using compression actually speeds things up (see the
starving CPU problem).  You might just need to experiment with a few
different compression levels to see how things go; 0, 1, 5, and 9 give you a
good spread.
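
Turning compression on is just a matter of passing a Filters instance when
the table is created, e.g. (complevel/complib are only a starting point, and
h5file / Output are assumed from the sketch above):

import tables as tb

filters = tb.Filters(complevel=1, complib="zlib")
table = h5file.create_table("/", "outputs", Output, "model outputs",
                            filters=filters)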

Be Well
Anthony





 Cheers
 --
 Oleksandr (Sasha) Huziy






Re: [Pytables-users] Clear chunks from CArray

2013-08-05 Thread Anthony Scopatz
Hello Giovanni,

I think you may need to del that slice and then possibly repack.  Hope this
helps.

Be Well
Anthony


On Mon, Aug 5, 2013 at 2:09 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hello all,

 is there a way to clear out a chunk from a CArray? I noticed that setting
 the
 data to zero actually takes disk space, i.e.

 ***
 from tables import open_file, BoolAtom

 h5f = open_file('test.h5', 'w')
 ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), shape=(1000,1000),
 chunkshape=(1,1000))
 ca[:,:] = False
 h5f.close()
 ***

 The resulting file takes 249K ...

 Best,

 --
 Giovanni Luca Ciampaglia

 Postdoctoral fellow
 Center for Complex Networks and Systems Research
 Indiana University

 ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
 ☞ http://cnets.indiana.edu/
 ✉ gciam...@indiana.edu





Re: [Pytables-users] Clear chunks from CArray

2013-08-05 Thread Anthony Scopatz
On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hi Anthony,

 what do you mean precisely? I tried

 del ca[:,:]

 but CArray does not support __delitem__. Looking in the documentation I
 could
 only find a method called remove_rows, but it's in Table, not CArray.
 Maybe I am
 missing something?


Huh, it should...  This is definitely an oversight on our part.  If you
could please open an issue for this -- or better yet -- write a pull
request that implements __delitem__, that'd be great!

So I think you are right that there is no current way to delete rows from a
CArray.  Oops!  (Of course, I may still be missing something as well).

It looks like EArray has this problem too; otherwise I would just tell
you to use that.

Be Well
Anthony



 Thank,

 Giovanni

 On Mon 05 Aug 2013 03:43:42 PM EDT,
 pytables-users-requ...@lists.sourceforge.net
 wrote:
 
  Hello Giovanni, I think you may need to del that slice and then possibly
  repack. Hope this helps. Be Well Anthony On Mon, Aug 5, 2013 at 2:09 PM,
  Giovanni Luca Ciampaglia  glciamp...@gmail.com wrote:
  Hello all,
 
  is there a way to clear out a chunk from a CArray? I noticed that
 setting
  the
  data to zero actually takes disk space, i.e.
 
  ***
  from tables import open_file, BoolAtom
 
  h5f = open_file('test.h5', 'w')
  ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(),
 shape=(1000,1000),
  chunkshape=(1,1000))
  ca[:,:] = False
  h5f.close()
  ***
 
  The resulting file takes 249K ...
 
  Best,
 
  --
  Giovanni Luca Ciampaglia
 
  Postdoctoral fellow
  Center for Complex Networks and Systems Research
  Indiana University
 
  ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
  ☞ http://cnets.indiana.edu/
  ✉ gciam...@indiana.edu
 
 



 --
 Giovanni Luca Ciampaglia

 Postdoctoral fellow
 Center for Complex Networks and Systems Research
 Indiana University

 ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
 ☞ http://cnets.indiana.edu/
 ✉ gciam...@indiana.edu






Re: [Pytables-users] Tables vs Arrays

2013-07-28 Thread Anthony Scopatz
On Sun, Jul 28, 2013 at 8:38 PM, David Reed david.ree...@gmail.com wrote:

 I'm really trying to become more productive using PyTables, but am
 struggling with what I should be using.  Whats the difference between a
 table and an array?


Hi David,

The difference between Arrays and Tables, conceptually, is the same as the
difference between numpy arrays and numpy structured arrays.  The plain old
[Aa]rray is a contiguous block of a single data type.  Tables and
structured arrays have a more complex data type that is composed of a
contiguous sequence of other data types (i.e. the fields / columns).  Which
data structure you use really depends a lot on the type of problem you are
trying to solve and what kinds of questions you want to answer with that
data structure.

That said, the implementation of Tables is far more similar to EArrays than
to Arrays, so a lot of the performance trade-offs that you see are similar.
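
To make that concrete, a small sketch of both node types (PyTables 3.x names;
the data is made up):

import numpy as np
import tables as tb

with tb.open_file("example.h5", "w") as h5:
    # Array: one homogeneous, contiguous block of float64
    h5.create_array("/", "temperatures", np.random.rand(1000))

    # Table: every row carries several named, typed fields
    rows = np.zeros(1000, dtype=[("time", np.float64), ("temp", np.float32)])
    h5.create_table("/", "readings", rows)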

You should watch my HDF5 is for Lovers talk for more generic advice [1].
 I hope this helps!

Be Well
Anthony

1. http://www.youtube.com/watch?v=Nzx0HAd3FiI










Re: [Pytables-users] PyTables and Multiprocessing

2013-07-12 Thread Anthony Scopatz
On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois duboismathieu_g...@yahoo.fr
 wrote:

  Hi Anthony,

 Thank you very much for your answer (it works). I will try to remodel my
 code around this trick, but I'm not sure it's possible because I use a
 framework that needs arrays.


I think that this method still works.  You can always send a numpy
array that you pull out in a subprocess back to the main process.
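
For example, a sketch where each worker sends a whole column back as a plain
numpy array (the file and the X node follow the script quoted below):

import multiprocessing
import tables

def get_column(args):
    filename, column = args
    h5file = tables.openFile(filename, mode='r')
    data = h5file.root.X[:, column]   # plain numpy array, picklable
    h5file.close()
    return data

p = multiprocessing.Pool(2)
columns = p.map(get_column, [('test.hdf5', 0), ('test.hdf5', 1)])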


 Can somebody explain what is going on? I was thinking that PyTables keeps
 a weakref to the file for lazy loading, but I'm not sure.

 How

 In any case, the PyTables community is very helpful.


Glad to help!

Be Well
Anthony



 Thanks,
 Mathieu

 On 12/07/2013 00:44, Anthony Scopatz wrote:

 Hi Mathieu,

  I think you should try opening a new file handle per process.  The
 following works for me on v3.0:

 import tables
 import random
 import multiprocessing

 # Reload the data

 # Use multiprocessing to perform a simple computation (column average)

 def f(filename):
     h5file = tables.openFile(filename, mode='r')
     name = multiprocessing.current_process().name
     column = random.randint(0, 10)
     print '%s use column %i' % (name, column)
     rtn = h5file.root.X[:, column].mean()
     h5file.close()
     return rtn

 p = multiprocessing.Pool(2)
 col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

  Be well
 Anthony


 On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois 
 duboismathieu_g...@yahoo.fr wrote:

  On 11/07/2013 21:56, Anthony Scopatz wrote:




 On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois 
 duboismathieu_g...@yahoo.fr wrote:

 Hello,

 I wanted to use PyTables in conjunction with multiprocessing for some
 embarrassingly parallel tasks.

 However, it seems that it is not possible. In the following (very
 stupid) example, X is a Carray of size (100, 10) stored in the file
 test.hdf5:

 import tables

 import multiprocessing

 # Reload the data

 h5file = tables.openFile('test.hdf5', mode='r')

 X = h5file.root.X

 # Use multiprocessing to perform a simple computation (column average)

 def f(X):

  name = multiprocessing.current_process().name

  column = random.randint(0, n_features)

  print '%s use column %i' % (name, column)

  return X[:, column].mean()

 p = multiprocessing.Pool(2)

 col_mean = p.map(f, [X, X, X])

 When executing it the following error:

 Exception in thread Thread-2:

 Traceback (most recent call last):

File /usr/lib/python2.7/threading.py, line 551, in __bootstrap_inner

  self.run()

File /usr/lib/python2.7/threading.py, line 504, in run

  self.__target(*self.__args, **self.__kwargs)

File /usr/lib/python2.7/multiprocessing/pool.py, line 319, in
 _handle_tasks

  put(task)

 PicklingError: Can't pickle <type 'weakref'>: attribute lookup
 __builtin__.weakref failed


 I have googled for weakref and pickle but can't find a solution.

 Any help?


  Hello Mathieu,

  I have used multiprocessing and files opened in read mode many times so
 I am not sure what is going on here.

  Thanks for your answer. Maybe you can point me to a working example?


   Could you provide the test.hdf5 file so that we could try to reproduce
 this.

  Here is the script that I have used to generate the data:

 import tables

 import numpy

 # Create data & store it

 n_features = 10

 n_obs  = 100

 X = numpy.random.rand(n_obs, n_features)

 h5file = tables.openFile('test.hdf5', mode='w')

 Xatom = tables.Atom.from_dtype(X.dtype)

 Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)

 Xhdf5[:] = X

 h5file.close()


 I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
 12.04 (libhdf5 is 1.8.4patch1).




 By the way, I have noticed that by slicing a Carray, I get a numpy array
 (I created the HDF5 file with numpy). Therefore, everything is copied to
 memory. Is there a way to avoid that?


  Only the slice that you ask for is brought into memory and it is
 returned as a non-view numpy array.

  OK. I will be careful about that.



  Be Well
 Anthony



 Mathieu







Re: [Pytables-users] HDF5/PyTables/NumPy Question

2013-07-12 Thread Anthony Scopatz
Hi Robert,

Glad these materials can be helpful.  (Note: these questions really should
be asked on the pytables-users mailing list -- CC'd here -- so please join
that list: https://lists.sourceforge.net/lists/listinfo/pytables-users)

On Fri, Jul 12, 2013 at 12:48 PM, Robert Nelson 
rrnel...@atmos.colostate.edu wrote:

 Dr. Scopatz,

 I came across your SciPy 2012 HDF5 is for lovers video and thought you
 might be able to help me.

 I'm trying to read large (1 GB) HDF5 files and do multidimensional indexing
 (with repeated values) on them. I saw a post of yours
 (http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg02586.html)
 from over a year ago saying that the best solution would be to
 convert it to a NumPy array, but this takes too long.


I think that the strategy is the same as before.  The original asker (to the
best of my recollection) did not open an issue, so no changes have been made
to PyTables to handle this.

Also in this strategy, you should only be loading in the indices to start
with.  I doubt (though I could be wrong) that you have 1 Gb worth of index
data alone.  The whole idea here is to do a unique (set) and a sort
operation on the much smaller index data AND THEN use fancy indexing to
pull the actual data back out.
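
A rough sketch of that strategy (file, table, and column names are invented):

import numpy as np
import tables as tb

with tb.open_file("big.h5", "r") as h5:
    tbl = h5.root.measurements
    idx = tbl.col("sample_id")              # load only the (small) index column
    wanted = np.sort(np.unique(idx))[:10]   # unique + sort on the index data
    coords = np.nonzero(np.in1d(idx, wanted))[0]
    rows = tbl.read_coordinates(coords)     # then pull just those rows back out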

As always some sample code and a sample file would be extremely helpful.  I
don't think I can do much more for you without these.

Be Well
Anthony


 Have there been any updates in PyTables that would make this possible?

 Thank you!

 Robert Nelson
 Colorado State University
 rob.r.nel...@gmail.com
  763-354-8411



Re: [Pytables-users] PyTables and Multiprocessing

2013-07-11 Thread Anthony Scopatz
Hi Mathieu,

I think you should try opening a new file handle per process.  The
following works for me on v3.0:

import tables
import random
import multiprocessing

# Reload the data

# Use multiprocessing to perform a simple computation (column average)

def f(filename):
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 10)
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony


On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois duboismathieu_g...@yahoo.fr
 wrote:

  On 11/07/2013 21:56, Anthony Scopatz wrote:




 On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois 
 duboismathieu_g...@yahoo.fr wrote:

 Hello,

 I wanted to use PyTables in conjunction with multiprocessing for some
 embarrassingly parallel tasks.

 However, it seems that it is not possible. In the following (very
 stupid) example, X is a Carray of size (100, 10) stored in the file
 test.hdf5:

 import tables

 import multiprocessing

 # Reload the data

 h5file = tables.openFile('test.hdf5', mode='r')

 X = h5file.root.X

 # Use multiprocessing to perform a simple computation (column average)

 def f(X):

  name = multiprocessing.current_process().name

  column = random.randint(0, n_features)

  print '%s use column %i' % (name, column)

  return X[:, column].mean()

 p = multiprocessing.Pool(2)

 col_mean = p.map(f, [X, X, X])

 When executing it the following error:

 Exception in thread Thread-2:

 Traceback (most recent call last):

File /usr/lib/python2.7/threading.py, line 551, in __bootstrap_inner

  self.run()

File /usr/lib/python2.7/threading.py, line 504, in run

  self.__target(*self.__args, **self.__kwargs)

File /usr/lib/python2.7/multiprocessing/pool.py, line 319, in
 _handle_tasks

  put(task)

 PicklingError: Can't pickle <type 'weakref'>: attribute lookup
 __builtin__.weakref failed


 I have googled for weakref and pickle but can't find a solution.

 Any help?


  Hello Mathieu,

  I have used multiprocessing and files opened in read mode many times so
 I am not sure what is going on here.

 Thanks for your answer. Maybe you can point me to a working example?


   Could you provide the test.hdf5 file so that we could try to reproduce
 this.

 Here is the script that I have used to generate the data:

 import tables

 import numpy

 # Create data & store it

 n_features = 10

 n_obs  = 100

 X = numpy.random.rand(n_obs, n_features)

 h5file = tables.openFile('test.hdf5', mode='w')

 Xatom = tables.Atom.from_dtype(X.dtype)

 Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)

 Xhdf5[:] = X

 h5file.close()


 I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
 12.04 (libhdf5 is 1.8.4patch1).




 By the way, I have noticed that by slicing a Carray, I get a numpy array
 (I created the HDF5 file with numpy). Therefore, everything is copied to
 memory. Is there a way to avoid that?


  Only the slice that you ask for is brought into memory and it is returned
 as a non-view numpy array.

 OK. I will be careful about that.



  Be Well
 Anthony



 Mathieu











Re: [Pytables-users] `__iter__` state and `itertools.islice` when

2013-07-09 Thread Anthony Scopatz
On Tue, Jul 9, 2013 at 8:57 AM, Tony Yu tsy...@gmail.com wrote:




 On Tue, Jul 9, 2013 at 12:58 AM, Antonio Valentino 
 antonio.valent...@tiscali.it wrote:
 snip

 Yes, this is a bug IMO.
 Thank you for reporting and thank you for the small demonstration script.

 Can you please file a bug report on github [1]?
 Please also add info about the PyTables version you used for the test..


 Thanks for you quick reply. Ticket filed here:

 https://github.com/PyTables/PyTables/issues/267


Thanks Tony,

I have made my comments on the issue, but the short version is that I don't
think this is a bug, iteration needs a rewrite, and you should use
iterrows().
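
For example (the table and column name here are hypothetical):

import itertools

# iterrows() hands back a fresh iterator, so repeated slices don't share state
first_ten = [row['pressure'] for row in itertools.islice(table.iterrows(), 10)]

# or let PyTables do the slicing natively
first_ten = [row['pressure'] for row in table.iterrows(start=0, stop=10)]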

Be Well
Anthony

PS you should upgrade to 3.0 and use the new API :)




 Best,
 -Tony




 [1] https://github.com/PyTables/PyTables/issues

 --
 Antonio Valentino







Re: [Pytables-users] Storing large images in PyTable

2013-07-05 Thread Anthony Scopatz
On Fri, Jul 5, 2013 at 8:40 AM, Francesc Alted fal...@gmail.com wrote:

 On 7/5/13 1:33 AM, Mathieu Dubois wrote:
  tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
 
  tables.exceptions.HDF5ExtError: Problems creating the table
 
  I think that the size of the column is too large (if I remove the
  Image
  field, everything works perfectly).
 
 
  Hi Mathieu,
 
  This shouldn't be the case.  What is the value of IMAGE_SIZE?
 
  IMAGE_SIZE is a tuple containing (121, 145, 121).

 This is a bit large for a row in the Table object.  My recommendation
 for these cases is to use an associated EArray with shape (0, 121, 145,
 121) and then append the images there.  You can always refer to the
 image by issuing a __getitem__() operation on the EArray object with the
 index of the row in the table.  Easy as a pie and you will allow the
 compression library (in case you are using compression) to work much
 more efficiently for the table.



Hi Francesc,

I disagree that this shape is too large for a table.  Here is a minimal
example that works for me:

import tables as tb
import numpy as np

images = np.ones(100, dtype=[('id', np.uint16),
                             ('image', np.float32, (121, 145, 121))])

with tb.open_file('temp.h5', 'w') as f:
    f.create_table('/', 'images', images)

I think that there is something else going on with the initialization but
Mathieu hasn't given us enough information to figure it out =/.  A minimal
failing script would be super helpful here!

(BTW Mathieu, Tables can also take advantage of compression.  Though
Francesc's solution is nicer for a lot of reasons too.)
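
For reference, a sketch of Francesc's EArray suggestion (file and node names
are made up; row i of the EArray is kept in step with row i of the subject
table):

import numpy as np
import tables as tb

f = tb.open_file('subjects.h5', 'w')
images = f.create_earray('/', 'images', tb.Float32Atom(),
                         shape=(0, 121, 145, 121),
                         filters=tb.Filters(complevel=1))
for i in range(10):
    images.append(np.ones((1, 121, 145, 121), dtype=np.float32))

img3 = images[3]   # __getitem__ pulls back one (121, 145, 121) image
f.close()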

Be Well
Anthony



 HTH,

 -- Francesc Alted





Re: [Pytables-users] Storing large images in PyTable

2013-07-05 Thread Anthony Scopatz
Thanks Mathieu!

I am glad this is working for you now.  File this one under Mysterious
Errors of the Universe :).

Be Well
Anthony


On Fri, Jul 5, 2013 at 6:51 PM, Mathieu Dubois
duboismathieu_g...@yahoo.fr wrote:

  Hi,

 Sorry for the late response.

 First of all, I have managed to achieve what I wanted to do differently.

 Then the code Francesc sent works well (I had to adapt it because I use
 version 2.3.1 under Ubuntu 12.04).

 I was able to reproduce something similar with a class like this (copied &
 pasted from the tutorial):

 import tables as tb

 import numpy as np

 class Subject(tb.IsDescription):

     # Subject information

     Id    = tb.UInt16Col()

     Image = tb.Float32Col(shape=(121, 145, 121))

 h5file = tb.openFile("tutorial1.h5", mode="w", title="Test file")

 group = h5file.createGroup("/", 'subject', 'Subject information')

 table = h5file.createTable(group, 'readout', Subject, "Readout example")

 subject = table.row

 for i in xrange(10):

     subject['Id'] = i

     subject['Image'] = np.ones((121, 145, 121))

     subject.append()

 This code works well  too.

 So I don't really know why nothing was working yesterday: this was the
 same class and a very similar program. I will try to investigate this
 later.

 Thanks for everything,
 Mathieu

 On 05/07/2013 16:54, Anthony Scopatz wrote:




 On Fri, Jul 5, 2013 at 8:40 AM, Francesc Alted fal...@gmail.com wrote:

 On 7/5/13 1:33 AM, Mathieu Dubois wrote:
  tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
 
  tables.exceptions.HDF5ExtError: Problems creating the table
 
  I think that the size of the column is too large (if I remove the
  Image
  field, everything works perfectly).
 
 
  Hi Mathieu,
 
  This shouldn't be the case.  What is the value of IMAGE_SIZE?
 
  IMAGE_SIZE is a tuple containing (121, 145, 121).

  This is a bit large for a row in the Table object.  My recommendation
 for these cases is to use an associated EArray with shape (0, 121, 145,
 121) and then append the images there.  You can always refer to the
 image by issuing a __getitem__() operation on the EArray object with the
 index of the row in the table.  Easy as a pie and you will allow the
 compression library (in case you are using compression) to work much
 more efficiently for the table.



  Hi Francesc,

  I disagree that this shape is too large for a table.  Here is a minimal
 example that works for me:

  import tables as tb
  import numpy as np

  images = np.ones(100, dtype=[('id', np.uint16),
                               ('image', np.float32, (121, 145, 121))])

  with tb.open_file('temp.h5', 'w') as f:
      f.create_table('/', 'images', images)

  I think that there is something else going on with the initialization
 but Mathieu hasn't given us enough information to figure it out =/.  A
 minimal failing script would be super helpful here!

  (BTW Mathieu, Tables can also take advantage of compression.  Though
 Francesc's solution is nicer for a lot of reason too.)

  Be Well
  Anthony



 HTH,

 -- Francesc Alted

















Re: [Pytables-users] Storing large images in PyTable

2013-07-04 Thread Anthony Scopatz
On Thu, Jul 4, 2013 at 4:13 PM, Mathieu Dubois
duboismathieu_g...@yahoo.fr wrote:

 Hello,

 I'm a beginner with PyTables.

 I wanted to store a database in an HDF5 file using PyTables. The DB is
 made up of a CSV file (which contains the subject information) and a lot of
 images (I work on MRI, so the images are 3-dimensional float32 arrays of
 shape (121, 145, 121)). The relation is very simple: there are 3
 images per subject.

 My first idea was to create a class Subject like this:
 class Subject(tables.IsDescription):
     # Subject information
     Id    = tables.UInt16Col()
     ...
     Image = tables.Float32Col(shape=IMAGE_SIZE)

 And then proceed like in the tutorial (open a file, create a group and a
 table associated with the Subject class, and then append data to this table).

 Unfortunately I got an error when creating the table (even before
 inserting data):
 HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
#000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to
 create dataset
  major: Dataset
  minor: Unable to initialize object
#001: ../../../src/H5Dint.c line 428 in H5D_create_named(): unable to
 create and link to dataset
  major: Dataset
  minor: Unable to initialize object
#002: ../../../src/H5L.c line 1639 in H5L_link_object(): unable to
 create new link to object
  major: Links
  minor: Unable to initialize object
#003: ../../../src/H5L.c line 1862 in H5L_create_real(): can't insert
 link
  major: Symbol table
  minor: Unable to insert object
#004: ../../../src/H5Gtraverse.c line 877 in H5G_traverse(): internal
 path traversal failed
  major: Symbol table
  minor: Object not found
#005: ../../../src/H5Gtraverse.c line 703 in H5G_traverse_real():
 traversal operator failed
  major: Symbol table
  minor: Callback failed
#006: ../../../src/H5L.c line 1685 in H5L_link_cb(): unable to create
 object
  major: Object header
  minor: Unable to initialize object
#007: ../../../src/H5O.c line 2677 in H5O_obj_create(): unable to
 open object
  major: Object header
  minor: Can't open object
#008: ../../../src/H5Doh.c line 296 in H5O_dset_create(): unable to
 create dataset
  major: Dataset
  minor: Unable to initialize object
#009: ../../../src/H5Dint.c line 1034 in H5D_create(): can't update
 the metadata cache
  major: Dataset
  minor: Unable to initialize object
#010: ../../../src/H5Dint.c line 799 in H5D_update_oh_info(): unable
 to update new fill value header message
  major: Dataset
  minor: Unable to initialize object
#011: ../../../src/H5Omessage.c line 188 in H5O_msg_append_oh():
 unable to create new message in header
  major: Attribute
  minor: Unable to insert object
#012: ../../../src/H5Omessage.c line 228 in H5O_msg_append_real():
 unable to create new message
  major: Object header
  minor: No space available for allocation
#013: ../../../src/H5Omessage.c line 1940 in H5O_msg_alloc(): unable
 to allocate space for message
  major: Object header
  minor: Unable to initialize object
#014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header
 message is too large
  major: Object header
  minor: Unable to initialize object
 Traceback (most recent call last):
File 00_build_dataset.tmp.py, line 52, in module
  dump_in_hdf5(**vars(args))
File 00_build_dataset.tmp.py, line 32, in dump_in_hdf5
  data_api.Subject)
File /usr/lib/python2.7/dist-packages/tables/file.py, line 770, in
 createTable
  chunkshape=chunkshape, byteorder=byteorder)
File /usr/lib/python2.7/dist-packages/tables/table.py, line 832, in
 __init__
  byteorder, _log)
File /usr/lib/python2.7/dist-packages/tables/leaf.py, line 291, in
 __init__
  super(Leaf, self).__init__(parentNode, name, _log)
File /usr/lib/python2.7/dist-packages/tables/node.py, line 296, in
 __init__
  self._v_objectID = self._g_create()
File /usr/lib/python2.7/dist-packages/tables/table.py, line 983, in
 _g_create
  self._v_new_title, self.filters.complib or '', obversion )
File tableExtension.pyx, line 195, in
 tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
 tables.exceptions.HDF5ExtError: Problems creating the table

 I think that the size of the column is too large (if I remove the Image
 field, everything works perfectly).


Hi Mathieu,

This shouldn't be the case.  What is the value of IMAGE_SIZE?

Be Well
Anthony



 Therefore what is the best way to store the images (while keeping the
 relation)? I have read various posts about this subject on the web but
 could not find a definitive answer (the most helpful was

 http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables
 ).

 I was thinking of creating an extensible array and storing each image in the
 same order as the subjects. However, I 

Re: [Pytables-users] writing metadata

2013-06-25 Thread Anthony Scopatz
Also, depending on how much metadata you really need to store, you could
just use attributes.  That is what they are there for.
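
For instance, a sketch of hanging the fit settings off an existing node as
attributes (the layout follows the message below; the values are invented):

import tables as tb

h5file = tb.open_file("fits.h5", "a")
stat = h5file.root.data_1.stat          # an existing Array node
stat.attrs.seed = 12345
stat.attrs.fit_range = (0.2, 1.4)
stat.attrs.initial_guess = [1.0, 0.5, 0.1]
h5file.close()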


On Tue, Jun 25, 2013 at 10:06 AM, Josh Ayers josh.ay...@gmail.com wrote:

 Another option is to create a Python object - dict, list, or whatever
 works - containing the metadata and then store a pickled version of it in a
 PyTables array.  It's nice for this sort of thing because you have the full
 flexibility of Python's data containers.

 For example, if the Python object is called 'fit', then
 numpy.frombuffer(pickle.dumps(fit), 'u1') will pickle it and convert the
 result to a NumPy array of unsigned bytes.  It can be stored in a PyTables
 array using a UInt8Atom.  To retrieve the Python object, just use
 pickle.loads(hdf5_file.root.data_1.fit[:]).

 It gets a little more complicated if you want to be able to modify the
 Python object, because the length of the pickle will change.  In that case,
 you can use an EArray (for the case when the pickle grows), and store the
 number of bytes as an attribute.  Storing the number of bytes handles the
 case when the pickle shrinks and doesn't use the full length of the on-disk
 array.  To load it, use
 pickle.loads(hdf5_file.root.data_1.fit[:num_bytes]), where num_bytes is the
 previously stored attribute.  To modify it, just overwrite the array with
 the new version, expanding if necessary, then update the num_bytes
 attribute.

 Using a PyTables VLArray with an 'object' atom uses a similar technique
 under the hood, so that may be easier.  It doesn't allow resizing though.

 Hope that helps,
 Josh



 On Tue, Jun 25, 2013 at 1:33 AM, Andreas Hilboll li...@hilboll.de wrote:

 On 25.06.2013 10:26, Andre' Walker-Loud wrote:
  Dear PyTables users,
 
  I am trying to figure out the best way to write some metadata into some
 files I have.
 
  The hdf5 file looks like
 
  /root/data_1/stat
  /root/data_1/sys
 
  where stat and sys are Arrays containing statistical and systematic
 fluctuations of numerical fits to some data I have.  What I would like to
 do is add another object
 
  /root/data_1/fit
 
  where fit is just a metadata key that describes all the choices I
 made in performing the fit, such as seed for the random number generator,
 and many choices for fitting options, like initial guess values of
 parameters, fitting range, etc.
 
  I began to follow the example in the PyTables manual, in Section 1.2
 The Object Tree, where first a class is defined
 
  class Particle(tables.IsDescription):
   identity = tables.StringCol(itemsize=22, dflt=" ", pos=0)
...
 
  and then this class is used to populate a table.
 
  In my case, I won't have a table, but really just want a single object
 containing my metadata.  I am wondering if there is a recommended way to do
 this?  The Table does not seem optimal, but I don't see what else I would
 use.

 For complex information I'd probably indeed use a table object. It
 doesn't matter if the table only has one row, but still you have all the
 information there nicely structured.

 -- Andreas.











Re: [Pytables-users] Speed of CArray writing sparse matrices

2013-06-24 Thread Anthony Scopatz
Hello Giovanni,

Great to hear that everything is working much better for you now and that
everything is much faster and smaller than NPY ;)

Do you know how the default value is set btw?


This is computed via a magical heuristic algorithm written by Francesc (?)
called computechunksize().

This is really optimized for dense data (Tables), so it is
not surprising that it performs poorly in your case.  Any updates you want
to make to PyTables to also handle sparse data well out of the box would be
very welcome ;)

1. https://github.com/PyTables/PyTables/blob/develop/tables/idxutils.py#L54



On Mon, Jun 24, 2013 at 10:51 AM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hi Anthony,

 thanks for the explanation and the links, it's much clearer now. So without
 compression a CArray is really a smarter type of sparse file, but you have
 to
 set a sensible chunk shape. Do you know how the default value is set btw?
 I am
 asking because I didn't see any change in performance from using the
 default
 value and using (1, N), where (N,N) is the shape of the matrix. I guess
 that the
 write performance depends crucially on the size of the I/O buffer, so the
 default must be choosing a similar setting.

 Anyway I have played a bit with other values of the chunk shape in
 conjunction
 with the compression level and using a shape (1,100) and a complevel=5
 gives
 speeds that are only 10-15% slower than what I get at shape=(1,1) and
 complevel=0. The resulting file is 10 times smaller, and something like 35
 times
 smaller than a NPY sparse file, btw!

 Thanks!

 Giovanni

  On 06/24/2013 05:25 AM, pytables-users-request@lists.sourceforge.net wrote:
  Hi Giovanni!
 
  I think that you may have some misunderstanding about how chunking works,
  which is leading you to get terrible performance.  In fact what you
  describe is a great strategy (write all and zip) for using normal Arrays.
 
  However, chunking and CArrays don't work like this.  If a chunk contains
 no
  data, it is not written at all!  Also, all zipping takes place on the
 chunk
  level.  Thus for very small chunks you can actually increase the file
 size
  and access time by using compression.
 
  For sparse matrices and CArrays, you need to play around with the
  chunkshape argument to create_carray()  and compression.  Performance is
  going to be affected by how dense the matrix is and how grouped it is.  For
  example, for a very dense and randomly distributed matrix, chunkshape=1
 and
  no compression is best.  For block diagonal matrices, the chunkshape
 should
  be the nominal block shape.  Compression is only useful here if the
 blocks
  all have similar values or the block shape is large.  For example
 
  1 1 0 0 0 0
  1 1 0 0 0 0
  0 0 1 1 0 0
  0 0 1 1 0 0
  0 0 0 0 1 1
  0 0 0 0 1 1
 
  is well suited to a chunkshape=(2, 2)
 
  For more information on the HDF model please see my talk slides and video
[1,2]  I hope this helps.
 
  Be Well
  Anthony
 
  PS. Glad to see you using the new API
 
  1.https://github.com/scopatz/hdf5-is-for-lovers
  2.http://www.youtube.com/watch?v=Nzx0HAd3FiI


 --
 Giovanni Luca Ciampaglia

 Postdoctoral fellow
 Center for Complex Networks and Systems Research
 Indiana University

 ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
 ☞ http://cnets.indiana.edu/
 ✉ gciam...@indiana.edu



 --
 This SF.net email is sponsored by Windows:

 Build for Windows Store.

 http://p.sf.net/sfu/windows-dev2dev
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Speed of in-kernel Full-Table Search

2013-06-24 Thread Anthony Scopatz
On Mon, Jun 24, 2013 at 4:25 AM, Wagner Sebastian 
sebastian.wagner...@ait.ac.at wrote:

  Dear PyTables-Users,


 For testing purposes I use a PyTables DB with 4 columns (1x Uint8 and
 3xFloat) with 750k rows; the total file size is about 90MB. As the free
 version does not support indexing, I thought that a search (full-table) on
 this database would take at least one or two seconds, because the file has
 to be loaded first (I/O bottleneck), and then the search over ~20k rows
 can begin. But PyTables took only 0.05 seconds for a full table search
 (in-kernel, so near C-speed, but nevertheless full table), while my
 bisecting algorithm with a precomputed sorted list wrapped around PyTables
 (but saved in there), took about 0.5 seconds.


 So the thing I don’t understand: How can PyTables be so fast without any
 Indexing?


Hi Sebastian,

First, there is no longer a non-free version of PyTables and v3.0 *does* have
indexing capabilities.  However, you have to enable them so you probably
weren't using them.

PyTables is fast because HDF5 is a binary format, it uses pthreads under
the covers to parallelize some tasks, and it uses numexpr (which is also
parallel) to evaluate many expressions.  All of these things help make
PyTables great!
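
For reference, an in-kernel query is just a condition string handed to
Table.where(); numexpr compiles it once and evaluates it chunk by chunk at
near C speed.  A small sketch with made-up column names:

    import tables

    with tables.open_file('data.h5', 'r') as h5f:
        table = h5f.root.measurements   # hypothetical table with 'kind' and 'value' columns
        hits = [row['value'] for row in table.where('(kind == 2) & (value > 0.5)')]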

Be Well
Anthony


 


 I’m using 3.0.0rc2 coming with WinPython


 Regards,

 Sebastian


 --
 This SF.net email is sponsored by Windows:

 Build for Windows Store.

 http://p.sf.net/sfu/windows-dev2dev
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Speed of CArray writing sparse matrices

2013-06-23 Thread Anthony Scopatz
Hi Giovanni!

I think that you may have some misunderstanding about how chunking works,
which is leading you to get terrible performance.  In fact what you
describe is a great strategy (write all and zip) for using normal Arrays.

However, chunking and CArrays don't work like this.  If a chunk contains no
data, it is not written at all!  Also, all zipping takes place on the chunk
level.  Thus for very small chunks you can actually increase the file size
and access time by using compression.

For sparse matrices and CArrays, you need to play around with the
chunkshape argument to create_carray()  and compression.  Performance is
going to be affected by how dense the matrix is and how grouped it is.  For
example, for a very dense and randomly distributed matrix, chunkshape=1 and
no compression is best.  For block diagonal matrices, the chunkshape should
be the nominal block shape.  Compression is only useful here if the blocks
all have similar values or the block shape is large.  For example

1 1 0 0 0 0
1 1 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 0 0 1 1
0 0 0 0 1 1

is well suited to a chunkshape=(2, 2)
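
As a concrete sketch of the advice above (the array size, chunkshape and
compression settings are illustrative only):

    import numpy as np
    import tables

    with tables.open_file('sparse.h5', 'w') as h5f:
        filters = tables.Filters(complevel=5, complib='zlib')   # worthwhile mainly for larger/regular chunks
        adj = h5f.create_carray('/', 'adj', atom=tables.Float32Atom(dflt=0.0),
                                shape=(6, 6), chunkshape=(2, 2), filters=filters)
        # only chunks that actually receive data get written to disk
        adj[0:2, 0:2] = np.ones((2, 2), dtype=np.float32)
        adj[2:4, 2:4] = np.ones((2, 2), dtype=np.float32)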

For more information on the HDF model please see my talk slides and video
:) [1,2]  I hope this helps.

Be Well
Anthony

PS. Glad to see you using the new API ;)

1. https://github.com/scopatz/hdf5-is-for-lovers
2. http://www.youtube.com/watch?v=Nzx0HAd3FiI


On Sat, Jun 22, 2013 at 6:34 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hi all,

 I have a sparse 3.4M x 3.4M adjacency matrix with nnz = 23M and wanted
 to see if CArray was an appropriate solution for storing it. Right now I
 am using the NumPy binary format for storing the data in coordinate
 format and loading the matrix with Scipy's sparse coo_matrix class. As
 far as I understand, with CArray the matrix would be written in full
 (zeros included) but a) since it's chunked accessing it does not take
 memory and b) with compression enabled it would be possible to keep the
 size of the file reasonable.

 If my assumptions are correct, then here is my problem: I am running
 into problems when writing the CArray to disk. I adapted the example
 from the documentation [1] and when I run the code on a 6000x6000 matrix
 with nnz = 17K I achieve a decent speed of roughly 4100 elements/s.
 However, when I try it on the full matrix the writing speed drops to 4
 elements/s. Am I doing something wrong? Any feedback would be greatly
 appreciated!

 Code: https://gist.github.com/junkieDolphin/5843064

 Cheers,

 Giovanni

 [1]

 http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-carray-class

 --
 Giovanni Luca Ciampaglia

 ☞ http://www.inf.usi.ch/phd/ciampaglia/
 ✆ (812) 287-3471
 ✉ glciamp...@gmail.com



 --
 This SF.net email is sponsored by Windows:

 Build for Windows Store.

 http://p.sf.net/sfu/windows-dev2dev
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] append to multiple tables

2013-06-10 Thread Anthony Scopatz
Hi Ed,

Are you inside of a nested loop?  You probably just need to flush after the
innermost loop.
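
For example, something along these lines (a self-contained sketch; the column
layout and batch sizes are invented):

    import tables

    class ADesc(tables.IsDescription):
        x = tables.Float64Col()

    class BDesc(tables.IsDescription):
        y = tables.Float64Col()

    with tables.open_file('two_tables.h5', 'w') as h5f:
        t1 = h5f.create_table('/', 'cv1', ADesc)
        t2 = h5f.create_table('/', 'cv2', BDesc)
        for batch in range(100):              # outer loop over units of work
            r1, r2 = t1.row, t2.row
            for i in range(1000):             # innermost loop: append only, no flush here
                r1['x'] = batch + i
                r1.append()
                r2['y'] = batch - i
                r2.append()
            # flush both tables once per batch rather than after every append
            t1.flush()
            t2.flush()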

Do you have some sample code you can share?

Be Well
Anthony


On Mon, Jun 10, 2013 at 1:44 PM, Edward Vogel edwardvog...@gmail.com wrote:

 I have a dataset that I want to split between two tables. But, when I
 iterate over the data and append to both tables, I get a warning:

 /usr/local/lib/python2.7/site-packages/tables/table.py:2967:
 PerformanceWarning: table ``/cv2`` is being preempted from alive nodes
 without its buffers being flushed or with some index being dirty.  This may
 lead to very ineficient use of resources and even to fatal errors in
 certain situations.  Please do a call to the .flush() or .reindex_dirty()
 methods on this table before start using other nodes.

 However, if I flush after every append, I get awful performance.
 Is there a correct way to append to two tables without doing a flush?
 Note, I don't have any indices defined, so it seems reindex_dirty()
 doesn't apply.

 Thanks,
 Ed


 --
 This SF.net email is sponsored by Windows:

 Build for Windows Store.

 http://p.sf.net/sfu/windows-dev2dev
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Anthony Scopatz
Thanks Antonio and Tim!

These are great. I think that one of these should definitely make it into
the examples/ dir.

Be Well
Anthony


On Wed, Jun 5, 2013 at 8:10 AM, Francesc Alted fal...@gmail.com wrote:

 On 6/5/13 11:45 AM, Andreas Hilboll wrote:
  On 05.06.2013 10:31, Andreas Hilboll wrote:
  On 05.06.2013 03:29, Tim Burgess wrote:
  I was playing around with in-memory HDF5 prior to the 3.0 release.
  Here's an example based on what I was doing.
  I looked over the docs and it does mention that there is an option to
  throw away the 'file' rather than write it to disk.
  Not sure how to do that and can't actually think of a use case where I
  would want to :-)
 
  And be wary, it is H5FD_CORE.
 
 
  On Jun 05, 2013, at 08:38 AM, Anthony Scopatz scop...@gmail.com
 wrote:
  I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
  haven't ever used this personally, but it would be great to have an
  example script, if someone wants to write one ;)
 
 
 
  import numpy as np
  import tables
 
  CHUNKY = 30
  CHUNKX = 8640
 
  if __name__ == '__main__':
 
   # create dataset and add global attrs
 
   file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)
 
    with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory example',
                          driver='H5FD_CORE') as h5f:
 
   # dummy some data
   lats = np.empty([4320])
   lons = np.empty([8640])
 
   # create some simple arrays
   lat_node = h5f.create_array('/', 'lat', lats,
 title='latitude')
   lon_node = h5f.create_array('/', 'lon', lons,
 title='longitude')
 
   # create a 365 x 4320 x 8640 CArray of 32bit float
   shape = (365, 4320, 8640)
   atom = tables.Float32Atom(dflt=np.nan)
 
   # chunk into daily slices and then further chunk days
   sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
  chunkshape=(1, CHUNKY, CHUNKX))
 
   # dummy up an ndarray
   sst = np.empty([4320, 8640], dtype=np.float32)
   sst.fill(30.0)
 
   # write ndarray to a 2D plane in the HDF5
   sst_node[0] = sst
  Thanks Tim,
 
  I adapted your example for my use case (I'm using the EArray class,
  because I need to continuously update my database), and it works well.
 
  However, when I use this with my own data (but also creating the arrays
  like you did), I'm running into errors like "Could not wait on barrier".
  It seems like the HDF library is spawning several threads.
 
  Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at
  runtime?
  Update:
 
  When setting max_blosc_threads=2 and max_numexpr_threads=2, everything
  seems to work as expected (but a bit on the slow side ...).
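
 For reference, a sketch of what capping the thread counts can look like,
 using the module-level defaults from tables/parameters.py (whether this is
 exactly how Andreas set them is an assumption):

     import tables

     # limit the threads used by Blosc and numexpr before opening any file
     tables.parameters.MAX_BLOSC_THREADS = 2
     tables.parameters.MAX_NUMEXPR_THREADS = 2

     h5f = tables.open_file('demofile.h5', 'w', driver='H5FD_CORE')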

 BTW, can you really notice the difference between using 1, 2 or 4
 threads?  Can you show some figures?  Just curious.

 --
 Francesc Alted



 --
 How ServiceNow helps IT people transform IT departments:
 1. A cloud service to automate IT design, transition and operations
 2. Dashboards that offer high-level views of enterprise services
 3. A single system of record for all IT processes
 http://p.sf.net/sfu/servicenow-d2d-j
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] pytable 30 - encoding

2013-06-05 Thread Anthony Scopatz
Hi Jeff,

I have made some comments in the issue.  Thanks for investigating this
so thoroughly.

Be Well
Anthony


On Tue, Jun 4, 2013 at 8:16 PM, Jeff Reback jreb...@yahoo.com wrote:

 Anthony,

 I created an issue with more info

 I am not sure if this is a bug, or just a way both ne/pytables treat
 strings that need to touch an encoded value;

  I found a workaround by specifying condvars to readWhere. Any more
 thoughts on this?
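
  For reference, a sketch of that condvars workaround, reusing the handle and
  table from the script quoted further down this thread (the variable name is
  invented; read_where is the 3.0 spelling of readWhere):

      # pass the comparison value through condvars instead of embedding it in the condition string
      wanted = 'str-2'.encode('UTF-8')
      result = handle.root.test.table.read_where('column == wanted',
                                                 condvars={'wanted': wanted})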

 thanks Jeff


 https://github.com/PyTables/PyTables/issues/265

 I can be reached on my cell (917)971-6387
 *From:* Anthony Scopatz scop...@gmail.com
 *To:* Jeff Reback j...@reback.net
 *Cc:* Discussion list for PyTables pytables-users@lists.sourceforge.net
 *Sent:* Tuesday, June 4, 2013 6:39 PM

 *Subject:* Re: [Pytables-users] pytable 30 - encoding

 Hi Jeff,

 Hmmm, Could you try doing the same thing on just an in-memory numpy array
 using numexpr.  If this succeeds it tells us that the problem is in
 PyTables, not numexpr.

 Be Well
 Anthony


 On Tue, Jun 4, 2013 at 11:35 AM, Jeff Reback jreb...@yahoo.com wrote:

 Anthony,

 I am using numexpr 2.1 (latest)

 this is puzzling; doesn't matter what I pass (bytes or str) , same result?

 (column == 'str-2')
  /mnt/code/arb/test/pytables-3.py(38)module()
 - result = handle.root.test.table.readWhere(selector)
 (Pdb) handle.root.test.table.readWhere(selector)
 *** TypeError: string argument without an encoding
 (Pdb) handle.root.test.table.readWhere(selector.encode(encoding))
 *** TypeError: string argument without an encoding
 (Pdb)


*From:* Anthony Scopatz scop...@gmail.com
 *To:* Jeff Reback j...@reback.net; Discussion list for PyTables 
 pytables-users@lists.sourceforge.net
 *Sent:* Tuesday, June 4, 2013 12:25 PM
 *Subject:* Re: [Pytables-users] pytable 30 - encoding

 Hi Jeff,

 Have you also updated numexpr to the most recent version?  The error is
 coming from numexpr not compiling the expression correctly. Also, you might
 try making selector a str, rather than bytes:

  selector = "(column == 'str-2')"

 rather than

  selector = "(column == 'str-2')".encode(encoding)

 Be Well
 Anthony


 On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback jreb...@yahoo.com wrote:

 anthony,where am I going wrong here?
 #!/usr/local/bin/python3
 import tables
 import numpy as np
 import datetime, time
 encoding = 'UTF-8'
 test_file = 'test_select.h5'
 handle = tables.openFile(test_file, "w")
 node = handle.createGroup(handle.root, 'test')
 table = handle.createTable(node, 'table', dict(
     index = tables.Int64Col(),
     column = tables.StringCol(25),
     values = tables.FloatCol(shape=(3)),
 ))

 # add data
 r = table.row
 for i in range(10):
     r['index'] = i
     r['column'] = ("str-%d" % (i % 5)).encode(encoding)
     r['values'] = np.arange(3)
     r.append()
 table.flush()
 handle.close()
 # read
 handle = tables.openFile(test_file, "r")
 result = handle.root.test.table.read()
 print("table data\n")
 print(result)
 # where
 print("\nselector\n")
 selector = "(column == 'str-2')".encode(encoding)
 print(selector)
 result = handle.root.test.table.readWhere(selector)
 print(result)
 and the following out:

 [sheep-jreback-/code/arb/test] python3 pytables-3.py
 table data
 [(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0])
 (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0])
 (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0])
 (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0])
 (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])]
 selector
  b"(column == 'str-2')"
 Traceback (most recent call last):
  File "pytables-3.py", line 37, in <module>
 result = handle.root.test.table.readWhere(selector)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py,
 line 35, in oldfunc
 return obj(*args, **kwargs)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1522, in read_where
 self._where(condition, condvars, start, stop, step)]
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1484, in _where
 compiled = self._compile_condition(condition, condvars)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1358, in _compile_condition
 compiled = compile_condition(condition, typemap, indexedcols)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py,
 line 419, in compile_condition
 func = NumExpr(expr, signature)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 559, in NumExpr
 precompile(ex, signature, context)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 511, in precompile
 constants_order, constants = getConstants(ast)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 294, in getConstants

Re: [Pytables-users] pytable 30 - encoding

2013-06-04 Thread Anthony Scopatz
Hi Jeff,

Have you also updated numexpr to the most recent version?  The error is
coming from numexpr not compiling the expression correctly. Also, you might
try making selector a str, rather than bytes:

selector = "(column == 'str-2')"

rather than

selector = "(column == 'str-2')".encode(encoding)

Be Well
Anthony


On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback jreb...@yahoo.com wrote:

 anthony,where am I going wrong here?
 #!/usr/local/bin/python3
 import tables
 import numpy as np
 import datetime, time
 encoding = 'UTF-8'
 test_file = 'test_select.h5'
 handle = tables.openFile(test_file, "w")
 node = handle.createGroup(handle.root, 'test')
 table = handle.createTable(node, 'table', dict(
     index = tables.Int64Col(),
     column = tables.StringCol(25),
     values = tables.FloatCol(shape=(3)),
 ))

 # add data
 r = table.row
 for i in range(10):
     r['index'] = i
     r['column'] = ("str-%d" % (i % 5)).encode(encoding)
     r['values'] = np.arange(3)
     r.append()
 table.flush()
 handle.close()
 # read
 handle = tables.openFile(test_file, "r")
 result = handle.root.test.table.read()
 print("table data\n")
 print(result)
 # where
 print("\nselector\n")
 selector = "(column == 'str-2')".encode(encoding)
 print(selector)
 result = handle.root.test.table.readWhere(selector)
 print(result)
 and the following out:

 [sheep-jreback-/code/arb/test] python3 pytables-3.py
 table data
 [(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0])
 (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0])
 (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0])
 (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0])
 (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])]
 selector
  b"(column == 'str-2')"
 Traceback (most recent call last):
  File "pytables-3.py", line 37, in <module>
 result = handle.root.test.table.readWhere(selector)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py,
 line 35, in oldfunc
 return obj(*args, **kwargs)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1522, in read_where
 self._where(condition, condvars, start, stop, step)]
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1484, in _where
 compiled = self._compile_condition(condition, condvars)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py,
 line 1358, in _compile_condition
 compiled = compile_condition(condition, typemap, indexedcols)
 File
 /usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py,
 line 419, in compile_condition
 func = NumExpr(expr, signature)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 559, in NumExpr
 precompile(ex, signature, context)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 511, in precompile
 constants_order, constants = getConstants(ast)
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 294, in getConstants
 for a in constants_order]
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
  line 294, in <listcomp>
 for a in constants_order]
 File
 /usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py,
 line 284, in convertConstantToKind
 return kind_to_type[kind](x)
 TypeError: string argument without an encoding
 Closing remaining open files: test_select.h5... done


 --
 How ServiceNow helps IT people transform IT departments:
 1. A cloud service to automate IT design, transition and operations
 2. Dashboards that offer high-level views of enterprise services
 3. A single system of record for all IT processes
 http://p.sf.net/sfu/servicenow-d2d-j
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Hi Andreas,

First off, nothing should be this bad, but

What is the data type of the array?  Also are you selecting chunksize
manually or letting PyTables figure it out?

Here are some things that you can try:

1.  Query with fancy indexing, once.  That is, rather than using a list
comprehension just say, _a[zip(*idx)]

2. set _a.nrowsinbuf [1] to a much smaller value (1, 5, or 10) which is
more appropriate for pulling out individual indexes.

Lastly, it is my opinion that the iteration mechanics are slower than they
can / should be.  I have a bunch of ideas about how to make them faster AND
clean up the code base but I won't have a ton of time to work on them in
the near term.  However, if this is something that you are interested in,
that would be great!  I'd love to help out anyone who was willing to take
this on.

Be Well
Anthony

1.
http://pytables.github.io/usersguide/libref/hierarchy_classes.html#tables.Leaf.nrowsinbuf
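
A sketch of suggestion 2 applied to the snippet from the question below (the
file and node names are invented):

    import numpy as np
    import tables

    h5f = tables.open_file('big.h5', 'r')
    _a = h5f.root.data                      # hypothetical EArray of shape (5760, 2880, ~150)

    _a.nrowsinbuf = 5                       # a much smaller buffer suits point-wise access

    idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
    AA = np.vstack([_a[i, j] for i, j in zip(*idx)])

    h5f.close()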


On Mon, Jun 3, 2013 at 7:45 AM, Andreas Hilboll li...@hilboll.de wrote:

 On 03.06.2013 14:43, Andreas Hilboll wrote:
  Hi,
 
  I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
  (the last dimension represents time, and once per month there'll be one
  more 5760x2880 array to add to the end).
 
  Now, extracting timeseries at one index location is slow; e.g., for four
  indices, it takes several seconds:
 
 In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
 
 In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
 CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
 Wall time: 7.17 s
 
  I have the feeling that this performance could be improved, but I'm not
  sure about how to properly use the `chunkshape` parameter in my case.
 
  Any help is greatly appreciated :)
 
  Cheers, Andreas.

 PS: If I could get significant performance gains by not using an EArray
 and therefore re-creating the whole database each month, then this would
 also be an option.

 -- Andreas.



 --
 Get 100% visibility into Java/.NET code with AppDynamics Lite
 It's a free troubleshooting tool designed for production
 Get down to code-level detail for bottlenecks, with 2% overhead.
 Download for free and get started troubleshooting in minutes.
 http://p.sf.net/sfu/appdyn_d2d_ap2
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


[Pytables-users] Anyone want to present at PyData Boston, July 27-28th

2013-06-03 Thread Anthony Scopatz
Hey everyone,

Leah Silen (CC'd) of NumFOCUS was wondering if anyone wanted to give a talk
or tutorial about PyTables at PyData Boston [1].

I don't think that I'll be able to make it, but I highly encourage others
to take her up on this.  This sort of thing shouldn't be too hard to put
together since I have already assembled a repo of slides and exercises for
a 4 hour long tutorial [2].  Feel free to use them!

Be Well
Anthony

1. http://pydata.org/bos2013/
2. https://github.com/scopatz/hdf5-is-for-lovers
--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-03 Thread Anthony Scopatz
Oops!  I forgot to mention CArray!


On Mon, Jun 3, 2013 at 10:35 PM, Tim Burgess timburg...@mac.com wrote:

 My thoughts are:

 - try it without any compression. Assuming 32 bit floats, your monthly
 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at
 the least it will give you a baseline to work from - and will help if you
 are investigating IO tuning.

 - I have found with CArray that the auto chunksize works fairly well.
 Experiment with that chunksize and with some chunksizes that you think are
 more appropriate (maybe temporal rather than spatial in your case).
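
 A sketch of that experiment, comparing the automatic chunkshape with a
 time-oriented one (all numbers are illustrative):

     import tables

     with tables.open_file('sst.h5', 'w') as h5f:
         atom = tables.Float32Atom(dflt=float('nan'))
         shape = (5760, 2880, 0)                      # time is the extendable last axis

         auto = h5f.create_earray('/', 'sst_auto', atom, shape, expectedrows=150)
         print('automatic chunkshape:', auto.chunkshape)

         # a chunk that is small in space but long in time favours timeseries extraction
         manual = h5f.create_earray('/', 'sst_time', atom, shape,
                                    chunkshape=(32, 32, 150), expectedrows=150)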


 On Jun 03, 2013, at 10:45 PM, Andreas Hilboll li...@hilboll.de wrote:

 On 03.06.2013 14:43, Andreas Hilboll wrote:
  Hi,
 
  I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
  (the last dimension represents time, and once per month there'll be one
  more 5760x2880 array to add to the end).
 
  Now, extracting timeseries at one index location is slow; e.g., for four
  indices, it takes several seconds:
 
  In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
 
  In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
  CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
  Wall time: 7.17 s
 
  I have the feeling that this performance could be improved, but I'm not
  sure about how to properly use the `chunkshape` parameter in my case.
 
  Any help is greatly appreciated :)
 
  Cheers, Andreas.

 PS: If I could get significant performance gains by not using an EArray
 and therefore re-creating the whole database each month, then this would
 also be an option.

 -- Andreas.



 --
 Get 100% visibility into Java/.NET code with AppDynamics Lite
 It's a free troubleshooting tool designed for production
 Get down to code-level detail for bottlenecks, with 2% overhead.
 Download for free and get started troubleshooting in minutes.
 http://p.sf.net/sfu/appdyn_d2d_ap2
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users



 --
 How ServiceNow helps IT people transform IT departments:
 1. A cloud service to automate IT design, transition and operations
 2. Dashboards that offer high-level views of enterprise services
 3. A single system of record for all IT processes
 http://p.sf.net/sfu/servicenow-d2d-j
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] ANN: PyTables 3.0 final

2013-06-02 Thread Anthony Scopatz
Congratulations All!

This is a huge and important milestone for PyTables and I am glad to have
been a part of it!

Be Well
Anthony


On Sat, Jun 1, 2013 at 6:33 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 ===
   Announcing PyTables 3.0.0
 ===

 We are happy to announce PyTables 3.0.0.

 PyTables 3.0.0 comes after about 5 years from the last major release
 (2.0) and 7 months since the last stable release (2.4.0).

 This is a new major release and an important milestone for the PyTables
 project since it provides the long-awaited support for Python 3.x, which
 has been around for 4 years.

 Almost all of the core numeric/scientific packages for Python already
 support Python 3, so we are very happy that PyTables can now also provide
 this important feature.


 What's new
 ==

 A short summary of main new features:

 - Since this release, PyTables now provides full support to Python 3
 - The entire code base is now more compliant with coding style
guidelines described in PEP8.
 - Basic support for HDF5 drivers.  It now is possible to open/create an
HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE
drivers.
 - Basic support for in-memory image files.  An HDF5 file can be set
from or copied into a memory buffer.
 - Implemented methods to get/set the user block size in a HDF5 file.
 - All read methods now have an optional *out* argument that allows to
pass a pre-allocated array to store data.
 - Added support for the floating point data types with extended
precision (Float96, Float128, Complex192 and Complex256).
 - Consistent ``create_xxx()`` signatures.  Now it is possible to create
all data sets Array, CArray, EArray, VLArray, and Table from existing
Python objects.
 - Complete rewrite of the `nodes.filenode` module. Now it is fully
compliant with the interfaces defined in the standard `io` module.
Only non-buffered binary I/O is supported currently.

 Please refer to the RELEASE_NOTES document for a more detailed list of
 changes in this release.

 As always, a large amount of bugs have been addressed and squashed as well.

 In case you want to know more in detail what has changed in this
 version, please refer to: http://pytables.github.io/release_notes.html

 You can download a source package with generated PDF and HTML docs, as
 well as binaries for Windows, from:
 http://sourceforge.net/projects/pytables/files/pytables/3.0.0

 For an online version of the manual, visit:
 http://pytables.github.io/usersguide/index.html


 What it is?
 ===

 PyTables is a library for managing hierarchical datasets and
 designed to efficiently cope with extremely large amounts of data with
 support for full 64-bit file addressing.  PyTables runs on top of
 the HDF5 library and NumPy package for achieving maximum throughput and
 convenient use.  PyTables includes OPSI, a new indexing technology,
 allowing to perform data lookups in tables exceeding 10 gigarows
 (10**10 rows) in less than a tenth of a second.


 Resources
 =

 About PyTables: http://www.pytables.org

 About the HDF5 library: http://hdfgroup.org/HDF5/

 About NumPy: http://numpy.scipy.org/


 Acknowledgments
 ===

 Thanks to many users who provided feature improvements, patches, bug
 reports, support and suggestions.  See the ``THANKS`` file in the
 distribution package for a (incomplete) list of contributors.  Most
 specially, a lot of kudos go to the HDF5 and NumPy makers.
 Without them, PyTables simply would not exist.


 Share your experience
 =

 Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.


 

**Enjoy data!**

-- The PyTables Developers


 --
 Get 100% visibility into Java/.NET code with AppDynamics Lite
 It's a free troubleshooting tool designed for production
 Get down to code-level detail for bottlenecks, with 2% overhead.
 Download for free and get started troubleshooting in minutes.
 http://p.sf.net/sfu/appdyn_d2d_ap2
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] How much extra metadata does PyTables insert?

2013-05-26 Thread Anthony Scopatz
On Sun, May 26, 2013 at 11:04 AM, Nolan Phillips ncphillips...@gmail.com wrote:

 Hi,

 I have a question about the metadata that PyTables inserts into the HDF5
 files.

 Is this data stored in the files themselves, but just not user defined?
 The important question is, does this metadata make the HDF5 files
 inaccessible by other means, such as the standard C library or H5Py?


Hi Nolan,

The PyTables-specific metadata is for PyTables (and ViTables) consumption
only and does not (or should not) interfere with other methods of HDF5
consumption.  Since PyTables and h5py both link to the hdf5 library, I have
never had any interoperability problems.

Be Well
Anthony



 Thanks!

 Nolan


 --
 Try New Relic Now  We'll Send You this Cool Shirt
 New Relic is the only SaaS-based application performance monitoring service
 that delivers powerful full stack analytics. Optimize and monitor your
 browser, app,  servers with just a few lines of code. Try New Relic
 and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
Try New Relic Now  We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app,  servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_may___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Ideas for effective linear ND interpolation?

2013-05-10 Thread Anthony Scopatz
[dropping scipy-user]

Hello Andreas

PyTables is a great option and using compression (zlib, blosc, etc) will
probably help.  Additionally, I would note that since your values are
between [0, 100], you can probably get away with using 32-bit floats,
rather than 64-bit floats.  This size reduction will speed things up, but
you probably don't want to go down to 16-bit floats.

I would recommend that you store your dataset on disk and then use PyTables
Expressions [1,2] with the out argument to keep your results on disk as
well.  If this strategy fails because you need to simultaneously look at
multiple indexes in the same array, then I would use partially offset
iterators as described in this thread [3].  In both cases, since iterators
are automatically chunked, you never read in the whole dataset at one time
and what you are interpolating can be as large as you want :).
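
A minimal sketch of the Expr-with-on-disk-output idea (the expression and
sizes are only placeholders for the real interpolation weights):

    import numpy as np
    import tables

    with tables.open_file('interp.h5', 'w') as h5f:
        n = 10**6
        x = h5f.create_carray('/', 'x', obj=np.random.rand(n).astype('float32'))
        y = h5f.create_carray('/', 'y', obj=np.random.rand(n).astype('float32'))
        out = h5f.create_carray('/', 'blend', atom=tables.Float32Atom(), shape=(n,))

        expr = tables.Expr('0.25 * x + 0.75 * y', uservars={'x': x, 'y': y})
        expr.set_output(out)    # results are streamed blockwise straight back to disk
        expr.eval()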

Let us know if you have further specific questions.

Be Well
Anthony

1.
http://pytables.github.io/usersguide/libref.html#the-expr-class-a-general-purpose-expression-evaluator
2.
https://github.com/scopatz/hdf5-is-for-lovers/blob/master/hdf5-is-for-lovers.pdf?raw=true
3.  Nested Iteration of HDF5 using PyTables
http://blog.gmane.org/gmane.comp.python.pytables.user/month=20130101


On Fri, May 10, 2013 at 4:58 AM, Andreas Hilboll li...@hilboll.de wrote:

 Hi,

 I'll have to code multilinear interpolation in n dimensions, n~7. My
 data space is quite large, ~10**9 points. The values are given on a
 rectangular (but not square) grid. The values are numbers in a range of
 approx. [0.0, 100.0].

 The challenge is to do this efficiently, and it would be great if the
 whole thing would be able to run fast on a machine with only 8G (or
 better 4G) RAM.

 A common task will be to interpolate 10**6 points, which souldn't take
 too long.

 Any ideas on how to do this efficiently are welcome:

 * which dtype to use?
 * is using pytables/blosc an option? How can this be integrated in the
 interpolation?
 * you name it ... ;)

 Cheers, Andreas.


 --
 Learn Graph Databases - Download FREE O'Reilly Book
 Graph Databases is the definitive new guide to graph databases and
 their applications. This 200-page book is written by three acclaimed
 leaders in the field. The early access version is available now.
 Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Learn Graph Databases - Download FREE O'Reilly Book
Graph Databases is the definitive new guide to graph databases and 
their applications. This 200-page book is written by three acclaimed 
leaders in the field. The early access version is available now. 
Download your free book today! http://p.sf.net/sfu/neotech_d2d_may___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Row.append()

2013-05-03 Thread Anthony Scopatz
On Fri, May 3, 2013 at 1:15 PM, Jim Knoll jim.kn...@spottradingllc.com wrote:

 I am trying to make this better / faster…  

 Data comes faster than I can store it on one box.  So my thought was to
 have many boxes each storing their own part in their own table.

 Later I would concatenate the tables together with something like this:


 dest_h5f = pt.openFile(path + 'big_mater.h5', 'a')
 for source_path in source_h5_path_list:
     h5f = pt.openFile(source_path, 'r')
     for node in h5f.root:
         dest_table = dest_h5f.getNode('/', name=node.name)
         print node.nrows
         if node.nrows > 0 and node.nrows < 100:   # found I needed to limit the max size or I would crash
             dest_table.append(node.read())
             dest_table.flush()
     h5f.close()
 dest_h5f.close()


  I could add the logic to iterate in chunks over the source data to overcome
  the crash, but I suspect there could be a better way.


Hi Jim,

You can just iterate over each row in the table (ie for row in node).
 This is slow, but would solve the problem.
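
Alternatively, copying in slices, as you suggest, avoids both the per-row
overhead and reading a whole table at once; a sketch (the chunk size is
arbitrary):

    step = 100000                          # rows copied per iteration
    for start in range(0, node.nrows, step):
        dest_table.append(node.read(start, start + step))
    dest_table.flush()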


 

  Take a table in one h5 file and append it to a table in another h5
  file.   It looked like Table.copy() would do the trick, but I don't see how to get
  it to append to an existing table.


You could append directly by using the append_where() method with the
condition 'True' to append the whole table.  This will automatically do
the chunking for you.

Be Well
Anthony



 My h5 files have 4 rec arrays all stored in root.


 Any suggestions?


 --

  Jim Knoll
  DBA/Developer II

  Spot Trading L.L.C
  440 South LaSalle St., Suite 2800
  Chicago, IL 60605
  Office: 312.362.4550
  Direct: 312-362-4798
  Fax: 312.362.4551
  jim.kn...@spottradingllc.com
  www.spottradingllc.com
 --

 The information contained in this message may be privileged and
 confidential and protected from disclosure. If the reader of this message
 is not the intended recipient, or an employee or agent responsible for
 delivering this message to the intended recipient, you are hereby notified
 that any dissemination, distribution or copying of this communication is
 strictly prohibited. If you have received this communication in error,
 please notify us immediately by replying to the message and deleting it
 from your computer. Thank you. Spot Trading, LLC




 --
 Get 100% visibility into Java/.NET code with AppDynamics Lite
 It's a free troubleshooting tool designed for production
 Get down to code-level detail for bottlenecks, with 2% overhead.
 Download for free and get started troubleshooting in minutes.
 http://p.sf.net/sfu/appdyn_d2d_ap2
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users


--
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with 2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] ANN: PyTables 3.0 beta1

2013-04-27 Thread Anthony Scopatz
Whoo hoo!  Thanks for all of your hard work Antonio!

PyTables users, we'd really appreciate it if you could try out this beta
release, run the test suite:

$ python -c "import tables as tb; tb.test()"

And let us know if there are any issues.  Additionally, if you are feeling
brave, any help you can give closing out the last remaining issues [1]
would be great!

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues?milestone=4state=open


On Sat, Apr 27, 2013 at 6:51 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 =
   Announcing PyTables 3.0.0b1
 =

 We are happy to announce PyTables 3.0.0b1.

 PyTables 3.0.0b1 comes after about 5 years from the last major release
 (2.0) and 7 months since the last stable release (2.4.0).

 This is a new major release and an important milestone for the PyTables
 project since it provides the long-awaited support for Python 3.x, which
 has been around for 4 years now.

 Almost all of the main numeric/scientific packages for Python already
 support Python 3, so we are very happy that PyTables can now also provide
 this important feature.


 What's new
 ==

 A short summary of main new features:

 - Since this release PyTables provides full support to Python 3
 - The entire code base is now more compliant with coding style
   guidelines described in PEP 8.
 - Basic support for HDF5 drivers.  Now it is possible to open/create an
HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE
drivers.
 - Basic support for in-memory image files.  An HDF5 file can be set
from or copied into a memory buffer.
 - Implemented methods to get/set the user block size in a HDF5 file.
 - All read methods now have an optional *out* argument that allows to
pass a pre-allocated array to store data.
 - Added support for the floating point data types with extended
precision (Float96, Float128, Complex192 and Complex256).

 Please refer to the RELEASE_NOTES document for a more detailed list of
 changes in this release.

 As always, a large amount of bugs have been addressed and squashed as well.

 In case you want to know more in detail what has changed in this
 version, please refer to:
 http://pytables.github.io/release_notes.html

 You can download a source package with generated PDF and HTML docs, as
 well as binaries for Windows, from:
 http://sourceforge.net/projects/pytables/files/pytables/3.0.0b1

 For an online version of the manual, visit:
 http://pytables.github.io/usersguide/index.html


 What it is?
 ===

 PyTables is a library for managing hierarchical datasets and
 designed to efficiently cope with extremely large amounts of data with
 support for full 64-bit file addressing.  PyTables runs on top of
 the HDF5 library and NumPy package for achieving maximum throughput and
 convenient use.  PyTables includes OPSI, a new indexing technology,
 allowing to perform data lookups in tables exceeding 10 gigarows
 (10**10 rows) in less than a tenth of a second.


 Resources
 =

 About PyTables: http://www.pytables.org

 About the HDF5 library: http://hdfgroup.org/HDF5/

 About NumPy: http://numpy.scipy.org/


 Acknowledgments
 ===

 Thanks to many users who provided feature improvements, patches, bug
 reports, support and suggestions.  See the ``THANKS`` file in the
 distribution package for a (incomplete) list of contributors.  Most
 specially, a lot of kudos go to the HDF5 and NumPy makers.
 Without them, PyTables simply would not exist.


 Share your experience
 =

 Let us know of any bugs, suggestions, gripes, kudos, etc. you may
 have.


 

**Enjoy data!**


 --
 The PyTables Team


 --
 Try New Relic Now  We'll Send You this Cool Shirt
 New Relic is the only SaaS-based application performance monitoring service
 that delivers powerful full stack analytics. Optimize and monitor your
 browser, app,  servers with just a few lines of code. Try New Relic
 and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Try New Relic Now  We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app,  servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] ANN: numexpr 2.1 (Python 3 support is here!)

2013-04-27 Thread Anthony Scopatz
Congrats Francesc!


On Sat, Apr 27, 2013 at 5:07 AM, Francesc Alted fal...@gmail.com wrote:

 
   Announcing Numexpr 2.1
 

 Numexpr is a fast numerical expression evaluator for NumPy.  With it,
 expressions that operate on arrays (like 3*a+4*b) are accelerated
 and use less memory than doing the same calculation in Python.

 It wears multi-threaded capabilities, as well as support for Intel's
 VML library (included in Intel MKL), which allows an extremely fast
 evaluation of transcendental functions (sin, cos, tan, exp, log...)
 while squeezing the last drop of performance out of your multi-core
 processors.

 Its only dependency is NumPy (MKL is optional), so it works well as an
 easy-to-deploy, easy-to-use, computational kernel for projects that
 don't want to adopt other solutions that require more heavy
 dependencies.
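
 For anyone new to it, the basic usage pattern is a one-liner (a tiny sketch):

     import numpy as np
     import numexpr as ne

     a = np.random.rand(10**6)
     b = np.random.rand(10**6)

     # evaluated in blocks by a multi-threaded virtual machine, without large temporaries
     c = ne.evaluate("3*a + 4*b")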

 What's new
 ==

 The main feature of this version is that it adds a much needed
 **compatibility for Python 3**

 Many thanks to Antonio Valentino for his fine work on this.
 Also, Christoph Gohlke quickly provided feedback and binaries for
 Windows and Mark Wiebe and Gaëtan de Menten provided many small
 (but important!) fixes and improvements.  All of you made numexpr 2.1
 the best release ever.  Thanks!

 In case you want to know more in detail what has changed in this
 version, see:

 http://code.google.com/p/numexpr/wiki/ReleaseNotes

 or have a look at RELEASE_NOTES.txt in the tarball.

 Where I can find Numexpr?
 =

 The project is hosted at Google code in:

 http://code.google.com/p/numexpr/

 You can get the packages from PyPI as well:

 http://pypi.python.org/pypi/numexpr

 Share your experience
 =

 Let us know of any bugs, suggestions, gripes, kudos, etc. you may
 have.


 Enjoy data!

 Francesc Alted



 --
 Try New Relic Now  We'll Send You this Cool Shirt
 New Relic is the only SaaS-based application performance monitoring service
 that delivers powerful full stack analytics. Optimize and monitor your
 browser, app,  servers with just a few lines of code. Try New Relic
 and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr
 ___
 Pytables-users mailing list
 Pytables-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Try New Relic Now  We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app,  servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] ANN: PyTables 3.0 beta1

2013-04-27 Thread Anthony Scopatz
On Sat, Apr 27, 2013 at 2:26 PM, Andreas Hilboll li...@hilboll.de wrote:

  On 27.04.2013 19:42, Anthony Scopatz wrote:
   On Sat, Apr 27, 2013 at 12:35 PM, Andreas Hilboll li...@hilboll.de wrote:
 
   On 27.04.2013 19:17, Anthony Scopatz wrote:
   Whoo hoo!  Thanks for all of your hard work Antonio!
  
   PyTables users, we'd really appreciate it if you could try out
  this beta
   release, run the test suite:
  
    $ python -c "import tables as tb; tb.test()"
  
   And let us know if there are any issues.  Additionally, if you are
   feeling brave, any help you can give closing out the last remaining
   issues [1] would be great!
 
  $ virtualenv --system-site-packages .virtualenvs/pytables-test
  (pytables-test) $ python -c "import tables; tables.test()"
  Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "tables/__init__.py", line 82, in <module>
  from tables.utilsextension import (get_pytables_version,
  get_hdf5_version,
  ImportError: No module named utilsextension
 
 
  This seems like you didn't compile and install PyTables first.  So to be
  more clear:
 
  ~ $ cd pytables
  ~/pytables $ python setup.py install
  ~/pytables $ cd ..
  ~ $ python -c "import tables; tables.test()"
 
  Be Well
  Anthony
 
 
 
  -- Andreas.
 
 

 Sorry, didn't write down that line. Actually, I did compile and install
 pytables using python setup.py install from within the virtualenv.

 The problem was that I ran that command from within the installation
 directory, so that `import tables` didn't import the installed version.
 I keep making that mistake with every project at least twice :-/

 When you try to do that in scipy, it gives a warning. Maybe it would be
 a good idea to do this in pytables as well?


That is a good idea!



 The tests all ran well:


Glad they passed!

Be Well
Anthony



 $ python -c "import tables as tb; tb.test()"

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 PyTables version:  3.0.0b1
 HDF5 version:  1.8.4-patch1
 NumPy version: 1.6.1
 Numexpr version:   1.4.2 (not using Intel's VML/MKL)
 Zlib version:  1.2.3.4 (in Python interpreter)
 BZIP2 version: 1.0.6 (6-Sept-2010)
 Blosc version: 1.2.1-rc1 (2013-04-24)
 Cython version:0.15.1
 Python version:2.7.3 (default, Aug  1 2012, 05:14:39)
 [GCC 4.6.3]
 Platform:  linux2-x86_64
 Byte-ordering: little
 Detected cores:2
 Default encoding:  ascii

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 [...]
 Ran 5242 tests in 493.636s
 OK


 and

 $ python -c "import tables as tb; tb.test()"

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 PyTables version:  3.0.0b1
 HDF5 version:  1.8.4-patch1
 NumPy version: 1.6.1
 Numexpr version:   2.1 (not using Intel's VML/MKL)
 Zlib version:  1.2.3.4 (in Python interpreter)
 BZIP2 version: 1.0.6 (6-Sept-2010)
 Blosc version: 1.2.1-rc1 (2013-04-24)
 Cython version:0.19
 Python version:3.2.3 (default, Oct 19 2012, 20:10:41)
 [GCC 4.6.3]
 Platform:  linux2-x86_64
 Byte-ordering: little
 Detected cores:2
 Default encoding:  utf-8

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 [...]
 Ran 5217 tests in 526.794s
 OK


 --
 -- Andreas.

--
Try New Relic Now  We'll Send You this Cool Shirt
New Relic is the only SaaS-based application performance monitoring service 
that delivers powerful full stack analytics. Optimize and monitor your
browser, app,  servers with just a few lines of code. Try New Relic
and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] In-kernel searches not returning values?

2013-04-26 Thread Anthony Scopatz
Hello Giovanni!

This definitely seems like a bug.  How was the column indexed?  Could you
send a sample script that reproduces the problem from start to finish?
Thanks.

Be Well
Anthony


On Fri, Apr 26, 2013 at 6:14 PM, Giovanni Luca Ciampaglia 
glciamp...@gmail.com wrote:

 Hi,

 I am new to PyTables and I like it very much though there are still some
 problems I am trying to solve. The latest is that I am seeing a strange
 behavior
 when using in-kernel searches. The seach condition is a simple equality
 test on
 a single column. Basically, when the column is indexed, in-kernel searches
 don't
 return the expected result, that is:

 In [150]: [ row['visits'] for row in ap.where('rid == 665689') ]
 Out[150]: []

 In [151]: [ row['visits'] for row in ap if row['rid'] == 665689 ]
 Out[151]: [18L]

 When I remove the index, it works again:

 In [153]: ap.cols.rid.removeIndex()

 In [154]: [ row['visits'] for row in ap.where('rid == 665689') ]
 Out[154]: [18L]

 Am I doing something wrong? This is an excerpt of the contents of the file:

 - % h5ls -ld test.h5|head
 AllPages Dataset {529000/Inf}
 Data:
 (0) {year=2008, month=1, day=1, hour=0, minute=0, epoch=1199145600,
 rid=665689,
 (0) visits=18},
 (1) {year=2008, month=1, day=1, hour=0, minute=0, epoch=1199145600, rid=2,
 (1) visits=11},
 (2) {year=2008, month=1, day=1, hour=0, minute=0, epoch=1199145600, rid=12,
 (2) visits=1},
 (3) {year=2008, month=1, day=1, hour=0, minute=0, epoch=1199145600,
 rid=612075,
 (3) visits=8},

 And this is the table description:

 Out[152]:
 /AllPages (Table(529000,), shuffle, zlib(5)) ''
 description := {
 year: UInt16Col(shape=(), dflt=0, pos=0),
 month: UInt8Col(shape=(), dflt=0, pos=1),
 day: UInt8Col(shape=(), dflt=0, pos=2),
 hour: UInt8Col(shape=(), dflt=0, pos=3),
 minute: UInt8Col(shape=(), dflt=0, pos=4),
 epoch: UInt32Col(shape=(), dflt=0, pos=5),
 rid: UInt32Col(shape=(), dflt=0, pos=6),
 visits: UInt32Col(shape=(), dflt=0, pos=7)}
 byteorder := 'little'
 chunkshape := (233016,)
 autoIndex := True
 colindexes := {
 rid: Index(1, light, shuffle, zlib(1)).is_CSI=False}

 Thanks!

 --
 Giovanni Luca Ciampaglia

 Postdoctoral fellow
 Center for Complex Networks and Systems Research
 Indiana University

 ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
 ☞ http://cnets.indiana.edu/
 ✉ gciam...@indiana.edu






Re: [Pytables-users] sourceforge downloads corrupted?

2013-04-24 Thread Anthony Scopatz
Hey Matt,

is this related?  https://github.com/PyTables/PyTables/issues/223

Be Well
Anthony


On Wed, Apr 24, 2013 at 3:09 PM, Matt Terry matt.te...@gmail.com wrote:

 Hello,

 The source tarball for pytables 2.4 on sourceforge appears to be broken.
 The file size is suspiciously small (800 kB vs 8.5MB on PyPI), the tarball
 doesn't untar, and the md5 doesn't match.

 -matt






Re: [Pytables-users] Documentation for stable releases?

2013-04-22 Thread Anthony Scopatz
Hello Gaëtan,

Thanks for bringing this up and I think that older versions of the docs are
a fairly important thing to have. I have opened an issue for this on github
[1].  However, I doubt that I will have an opportunity to take care of this
in the short term.  So if you want to take care of this issue for the
benefit of yourself and all, I would love to see a pull request ;).

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/236


On Mon, Apr 22, 2013 at 6:09 AM, Gaëtan de Menten gdemen...@gmail.comwrote:

 Hello all,

 TL;DR: It would be nice to have online documentation for stable versions
 and have pytables.github.io point to the doc for the latest stable
 release by default.
 

 I just tried to use the new out= argument to table.read, only to find out
 it did not work in my version (2.3.1). Then I tried to update my version to
 2.4 since I thought it was implemented in that version because of the
 2.4.0+1.dev name at the top of the page which I thought meant dev
 version leading to 2.4, or maybe to 2.4.1, but certainly not the next
 major release. I got even more confused because, after the initial failure
 with my 2.3.1 release, I checked the release notes... which I thought were
 for 2.4 because the title of the release notes page is Release notes for
 PyTables 2.4 series when it is in fact for the next major version...

 Here are a couple suggestions:
 * doc for stable releases (default to latest stable), bonus points to be
 able to switch easily from one version to another, a-la Python stdlib.
 * change 2.4.0+1.dev to 3.0-dev or 3.0-pre, and all mentions of 2.4.x
 * have new arguments to functions documented in the docstring for the
 functions (like in Python stdlib): a "new in pytables 3.0" note in the docstring
 for table.read() would have worked wonders.

 Thanks in advance,
 --
 Gaëtan de Menten







Re: [Pytables-users] Call for help: PyTables 3.0 release (w/ Python 3.x support)

2013-04-22 Thread Anthony Scopatz
Hello Thadeus,

Thanks for posting this PR!  Once it is fixed for Python 3, we'd love to
see it merged in.

Be Well
Anthony


On Mon, Apr 22, 2013 at 2:52 PM, Thadeus Burgess thade...@thadeusb.comwrote:

 Hopefully this pull request can be included in the next version? It is
 keeping us from using the CSI functionality of PyTables.

 https://github.com/PyTables/PyTables/pull/238

 --
 Thadeus



 On Tue, Apr 16, 2013 at 4:10 PM, Anthony Scopatz scop...@gmail.comwrote:

 Hello PyTables Users,

 To let you know, we are hoping to do a PyTables 3.0-beta release here in
 the next week or two.  This will include the long awaited Python 3 support
 thanks to the heroic efforts of Antonio Valentino, who did the lion's share
 of the porting work for both PyTables  AND one of our dependencies, numexpr.

 However, to really make this release the best possible, we are asking for
 your help in cleaning up and closing some of the remaining issues.  You can
 see our list of open issues for this release here [1].  You can also see
 out todo list for this release here [2].

 *If you have a feature that you'd really love to see make it into the
 code base, now is the time to implement it.*  If you have always wanted
 to contribute, but weren't sure how to get going, please fork the repo on
 github and then issue a pull request.  If you have any questions about this
 process feel free to ask in this thread.

 Here is to a great next release!
 The PyTables Developers

 1. https://github.com/PyTables/PyTables/issues?milestone=4state=open
 2. https://github.com/PyTables/PyTables/wiki/NextReleaseTodo










Re: [Pytables-users] Row.append() performance

2013-04-15 Thread Anthony Scopatz
Hello Shyam,

Can you please post the full traceback?  In any event, I am fairly certain
that this error is coming from the np.fromiter step.  The problem here is
that you are trying to read your entire SQL query into a single numpy array
in memory.  This is impossible because you don't have enough RAM.
Therefore, you are going to need to read and write in chunks.  Something
like the following:

def getDataAndWriteHDF5(table):
    databaseConn = pyodbc.connect("connection string", "password")
    cursor = databaseConn.cursor()
    cursor.execute("SQL Query")
    dt = np.dtype([('name', numpy.str_, 180), ('address', numpy.str_, 4200),
                   ('email', numpy.str_, 180), ('phone', numpy.str_, 256)])
    citer = iter(cursor)
    chunksize = 4096  # This is just a guess, other values might work better
    crange = range(chunksize)
    while True:
        # zip() stops at the shorter sequence, so each pass pulls at most
        # `chunksize` rows from the cursor
        resultSet = np.fromiter((tuple(row) for i, row in zip(crange, citer)),
                                dtype=dt)
        table.append(resultSet)
        if len(resultSet) < chunksize:
            break

You may want to tweak some things, but that is the basic strategy.

Be Well
Anthony
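
An alternative sketch of the same chunking idea uses the DB-API cursor.fetchmany() call instead of slicing an iterator (same hypothetical connection string and dtype as above):

def getDataAndWriteHDF5(table):
    databaseConn = pyodbc.connect("connection string", "password")
    cursor = databaseConn.cursor()
    cursor.execute("SQL Query")
    dt = np.dtype([('name', numpy.str_, 180), ('address', numpy.str_, 4200),
                   ('email', numpy.str_, 180), ('phone', numpy.str_, 256)])
    chunksize = 4096
    while True:
        rows = cursor.fetchmany(chunksize)  # at most `chunksize` rows in memory
        if not rows:
            break
        table.append(np.fromiter((tuple(r) for r in rows), dtype=dt,
                                 count=len(rows)))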


On Mon, Apr 15, 2013 at 10:16 PM, Shyam Parimal Katti spk...@nyu.eduwrote:

 Hello Anthony,


 Thank you for your suggestions. When I mentioned that I am reading the data
 from a database, I meant a DB2 database, not an HDF5 database/file.


 I followed your suggestions, so the code looks as follows:


 def createHDF5File():
     h5File = tables.openFile("file name", mode="a")
     table = h5File.createTable(h5File.root, "Contact", Contact,
                                "Contact", expectedrows=700)
     ...

 def getDataAndWriteHDF5(table):
     databaseConn = pyodbc.connect("connection string", "password")
     cursor = databaseConn.cursor()
     cursor.execute("SQL Query")
     resultSet = np.fromiter((tuple(row) for row in cursor),
                             dtype=[('name', numpy.str_, 180),
                                    ('address', numpy.str_, 4200),
                                    ('email', numpy.str_, 180),
                                    ('phone', numpy.str_, 256)])
     table.append(resultSet)



 Error message: MemoryError: cannot allocate array memory.



 I am setting the `expectedrows` parameter when creating the table in HDF5 
 file, and yet encounter the error above. Looking forward to suggestions.





  Hello Anthony,
 
  Thank you for replying back with suggestions.
 
  In response to your suggestions, I am *not reading the data from a file
  in the first step, but instead a database*.
 

 Hello Shyam,

Not to put too fine a point on it: HDF5 databases are files.  And reading from
any kind of file incurs the same disk read overhead.


   I did try out your 1st suggestion of doing a table.append(list of
  tuples), which took a little more than the executed time I got with the
  original code. Can you please guide me in how to chunk the data (that I got
  from database and stored as a list of tuples in Python) ?
 

 Ahh, so you should not be using a list of tuples.  These are Pythonic types,
 and conversion between HDF5 types and Python types is what is slowing you
 down.  You should be passing a numpy structured array into append().  Numpy
 types are very similar to (and often exactly the same as) HDF5 types.  For
 large, continuous, structured data you want to avoid the Python interpreter
 as much as possible.  Use Python here as the glue code to compose a series
 of fast operations using the APIs exposed by numpy, pytables, etc.

 Be Well
 Anthony
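
As a rough illustration of that approach (the column names and sizes below are placeholders, not the real Contact schema), building one structured array and appending it in a single call looks something like:

import numpy as np
import tables

dt = np.dtype([('name', 'S180'), ('email', 'S180')])

# rows gathered from wherever (a plain Python list here, just for the example)
rows = [("alice", "alice@example.com"), ("bob", "bob@example.com")]

with tables.openFile("contacts.h5", "w") as h5f:
    table = h5f.createTable("/", "contact", dt)  # a numpy dtype works as a description
    table.append(np.array(rows, dtype=dt))       # one append, no per-row Python loop
    table.flush()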



 On Thu, Apr 11, 2013 at 6:16 PM, Shyam Parimal Katti spk...@nyu.eduwrote:

 Hello Anthony,

 Thank you for replying back with suggestions.

 In response to your suggestions, I am *not reading the data from a file
 in the first step, but instead a database*. I did try out your 1st
 suggestion of doing a table.append(list of tuples), which took a little
 more than the executed time I got with the original code. Can you please
 guide me in how to chunk the data (that I got from database and stored as a
 list of tuples in Python) ?


 Thanks,
 Shyam


 Hi Shyam,

 The pattern that you are using to write to a table is basically one for
 writing Python data to HDF5.  However, your data is already in a machine /
 HDF5 native format.  Thus what you are doing here is an excessive amount of
 work:  read data from file -> convert to Python data structures -> convert
 back to HDF5 data structures -> write to file.

 When reading from a table you get back a numpy structured array (look them
 up on the numpy website).

 Then instead of using rows to write back the data, just use Table.append()
 [1] which lets you pass in a bunch of rows simultaneously.  (Note that your
 data in this case is too large to fit into memory, so you may have to split
 it up into chunks or use the new iterators which are in the development
 branch.)

 Additionally, if all you are doing is copying a table wholesale, you should
 use the Table.copy(). [2]  Or if you only want to copy some subset based on
 a conditional you provide, use whereAppend() [3].
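
Hedged sketches of those two calls, with made-up node names and an assumed numeric column for the condition (see the library reference for the exact signatures):

# wholesale copy of a table (destination group and name are made up)
new = f.root.data.copy(f.root, "data_copy", overwrite=True)

# copy only the rows matching a condition into an existing destination table
dst = f.createTable(f.root, "recent", f.root.data.description)
f.root.data.whereAppend(dst, "age > 30")  # assumes a numeric column 'age'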

 Finally, if you want to do math or evaluate expressions on 

Re: [Pytables-users] Some method like a table.readWhereSorted

2013-04-11 Thread Anthony Scopatz
Thanks for bringing this up, Julio.

Hmm I don't think that this exists currently, but since there are
readWhere() and readSorted() it shouldn't be too hard to implement.  I have
opened issue #225 to this effect.  Pull requests welcome!

https://github.com/PyTables/PyTables/issues/225

Be Well
Anthony
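
Until something like that exists, one workaround is to do the selection with readWhere() and then sort the resulting structured array in memory; a sketch, with made-up column names:

import numpy as np

rows = table.readWhere("price > 10.0")       # unsorted structured array
rows = rows[np.argsort(rows["timestamp"])]   # then sort by the desired column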


On Wed, Apr 10, 2013 at 1:02 PM, Dr. Louis Wicker louis.wic...@noaa.govwrote:

 I am also interested in the this capability, if it exists in some way...

 Lou

 On Apr 10, 2013, at 12:35 PM, Julio Trevisan juliotrevi...@gmail.com
 wrote:

  Hi,
 
  Is there a way that I could have the ability of readWhere (i.e., specify
 condition, and fast result) but also using a CSIndex so that the rows come
 sorted in a particular order?
 
  I checked readSorted() but it is iterative and does not allow to specify
 a condition.
 
  Julio
 


 
 | Dr. Louis J. Wicker
 | NSSL/WRDD  Rm 4366
 | National Weather Center
 | 120 David L. Boren Boulevard, Norman, OK 73072
 |
 | E-mail:   louis.wic...@noaa.gov
 | HTTP:http://www.nssl.noaa.gov/~lwicker
 | Phone:(405) 325-6340
 | Fax:(405) 325-6780
 |
 |
 I  For every complex problem, there is a solution that is simple,
 |  neat, and wrong.
 |
 |   -- H. L. Mencken
 |

 
 | The contents  of this message are mine personally and
 | do not reflect any position of  the Government or NOAA.

 






Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 7:44 AM, Julio Trevisan juliotrevi...@gmail.comwrote:

 Hi,

 I am using a Time64Col called timestamp in a condition, and I noticed
 that the condition does not work (i.e., no rows are selected) if I write
 something as:

 for row in node.where("timestamp == %f" % t):
 ...

 However, I had this idea of dividing the values by, say 1000, and it does
 work:

 for row in node.where("timestamp/1000 == %f" % (t/1000)):
 ...

 However, this doesn't seem to be an elegant solution. Please could someone
 point out a better solution to this?


Hello Julio,

While this may not be the most elegant solution it is probably one of the
most appropriate.  The problem here likely stems from the fact that
floating point numbers (which are how Time64Cols are stored) are not exact
representations of the desired value.  For example:

In [1]: 1.1 + 2.2
Out[1]: 3.3000000000000003

So when you divide by some constant order of magnitude, you are chopping
off the error associated with floating point precision.   You are creating
a bin of this constant's size around the target value that is close
enough to count as equivalent.  There are other mechanisms for alleviating
this issue: dividing and multiplying back (x/10)*10 == y, right shifting
(platform dependent), or taking the difference and requiring it to be less
than some tolerance, x - y <= t.  You get the idea.  You have to mitigate
this effect somehow.

For more information please refer to:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
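
As a concrete illustration of the tolerance idea in a where() condition (eps is an arbitrary half-width chosen for the data at hand, not a recommended value):

eps = 0.5  # half a second of slack either side of the target; tune for the data
cond = "(timestamp >= %.17g) & (timestamp <= %.17g)" % (t - eps, t + eps)
matches = node.readWhere(cond)  # structured array of rows within the tolerance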


 Could this be related to the fact that my column name is timestamp? I
 ask this because I use a program called HDFView to browse the HDF5 file.
 This program refuses to show the first column when it is called
 timestamp, but shows it when it is called id. I don't know if the facts
 are related or not.


This is probably unrelated.

Be Well
Anthony



 I don't know if this is useful information, but the conversion of a
 typical t to string gives something like this:

  print "%f" % t
 1365597435.000000








Re: [Pytables-users] ReadWhere() with a Time64Col in the condition

2013-04-10 Thread Anthony Scopatz
On Wed, Apr 10, 2013 at 11:40 AM, Julio Trevisan juliotrevi...@gmail.comwrote:

 Hi Anthony

 Thanks again. If it is a problem related to floating-point precision, I
 might use an Int64Col instead, since I don't need the timestamp milliseconds.


Another good plan since integers are exact ;)




 Julio




 On Wed, Apr 10, 2013 at 1:17 PM, Anthony Scopatz scop...@gmail.comwrote:

 On Wed, Apr 10, 2013 at 7:44 AM, Julio Trevisan 
 juliotrevi...@gmail.comwrote:

 Hi,

 I am using a Time64Col called timestamp in a condition, and I noticed
 that the condition does not work (i.e., no rows are selected) if I write
 something as:

 for row in node.where("timestamp == %f" % t):
 ...

 However, I had this idea of dividing the values by, say 1000, and it
 does work:

 for row in node.where("timestamp/1000 == %f" % (t/1000)):
 ...

 However, this doesn't seem to be an elegant solution. Please could
 someone point out a better solution to this?


 Hello Julio,

 While this may not be the most elegant solution it is probably one of the
 most appropriate.  The problem here likely stems from the fact that
 floating point numbers (which are how Time64Cols are stored) are not exact
 representations of the desired value.  For example:

 In [1]: 1.1 + 2.2
 Out[1]: 3.3000000000000003

 So when you divide by some constant order of magnitude, you are chopping
 off the error associated with floating point precision.   You are creating
 a bin of this constant's size around the target value that is close
 enough to count as equivalent.  There are other mechanisms for alleviating
 this issue: dividing and multiplying back (x/10)*10 == y, right shifting
 (platform dependent), or taking the difference and requiring it to be less
 than some tolerance, x - y <= t.  You get the idea.  You have to mitigate
 this effect somehow.

 For more information please refer to:
 http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html


 Could this be related to the fact that my column name is timestamp? I
 ask this because I use a program called HDFView to browse the HDF5 file.
 This program refuses to show the first column when it is called
 timestamp, but shows it when it is called id. I don't know if the facts
 are related or not.


 This is probably unrelated.

 Be Well
 Anthony



 I don't know if this is useful information, but the conversion of a
 typical t to string gives something like this:

   print "%f" % t
  1365597435.000000
















Re: [Pytables-users] Reading single column from table

2013-03-22 Thread Anthony Scopatz
On Fri, Mar 22, 2013 at 7:11 AM, Julio Trevisan juliotrevi...@gmail.comwrote:

 Hi,

 I just joined this list, I am using PyTables for my project and it works
 great and fast.

 I am just trying to optimize some parts of the program and I noticed that
 zipping the tuples to get one tuple per column takes much longer than
 reading the data itself. The thing is that readWhere() returns one tuple
 per row, whereas I I need one tuple per column, so I have to use the zip()
 function to achieve this. Is there a way to skip this zip() operation?
 Please see below:


 def quote_GetData(self, period, name, dt1, dt2):
     """Returns timedata.Quotes object.

     Arguments:
       period -- value from within infogetter.QuotePeriod
       name -- quote symbol
       dt1, dt2 -- datetime.datetime or timestamp values

     """
     t = time.time()
     node = self.quote_GetNode(period, name)
     ts1 = misc.datetime2timestamp(dt1)
     ts2 = misc.datetime2timestamp(dt2)

     L = node.readWhere(
         "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" %
         (ts1/1000, ts2/1000))
     rowNum = len(L)
     Q = timedata.Quotes()
     print "%s: took %f seconds to do everything else" % (name,
                                                          time.time()-t)

     t = time.time()
     if rowNum > 0:
         (Q.timestamp, Q.open, Q.close, Q.high, Q.low, Q.volume,
          Q.numTrades) = zip(*L)
         print "%s: took %f seconds to ZIP" % (name, time.time()-t)
     return Q

 *And the printout:*
 BOVESPA.VISTA.PETR4: took 0.068788 seconds to do everything else
 BOVESPA.VISTA.PETR4: took 0.379910 seconds to ZIP


Hi Julio,

The problem here isn't zip (packing and un-packing are generally
fast operations -- they happen *all* the time in Python).  Nor is the
problem specifically with PyTables.  Rather, this is an issue with how you
are using numpy structured arrays (look them up).  Basically, this is slow
because you are creating a list of column tuples where every element is a
Python object of the corresponding type.  For example, upcasting every
32-bit integer to a Python int is very expensive!

What you *should* be doing is keeping the columns as numpy arrays, which
keeps the memory layout small, continuous, fast, and if done right does not
require a copy (which you are doing now).

The value of L here is a structured array.  So say I have some
other structured array i with 4 fields; the right way to do this is to pull
out each field individually by indexing

a, b, c, d = i['a'], i['b'], i['c'], i['d']

or more generally (for all fields):

a, b, c, d = map(lambda name: i[name], i.dtype.names)

or for some list of fields:

a, c, b = map(lambda name: i[name], ['a', 'c', 'b'])

Timing both your original method and the new one gives:

In [47]: timeit a, b, c, d = zip(*i)
1000 loops, best of 3: 1.3 ms per loop

In [48]: timeit a, b, c, d = map(lambda x: i[x], i.dtype.names)
10 loops, best of 3: 2.3 µs per loop

So the method I propose is 500x-1000x faster.  Using numpy
idiomatically is very important!

Be Well
Anthony
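
As a side note, readWhere() can also hand back a single column directly through its field argument, which avoids building the full row array when only one column is wanted; a sketch reusing the names from the snippet above (check the signature in the library reference for your version):

closes = node.readWhere(
    "(timestamp/1000 >= %f) & (timestamp/1000 <= %f)" % (ts1/1000, ts2/1000),
    field="close")  # a 1-D numpy array holding just this column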











Re: [Pytables-users] Writing to CArray

2013-03-11 Thread Anthony Scopatz
On Sun, Mar 10, 2013 at 8:47 PM, Tim Burgess timburg...@mac.com wrote:


 On 08/03/2013, at 2:51 AM, Anthony Scopatz wrote:

  Hey Tim,
 
  Awesome dataset! And neat image!
 
  As per your request, a couple of minor things I noticed were that you
 probably don't need to do the sanity check each time (great for debugging,
 but not needed always), you are using masked arrays which while sometimes
 convenient are generally slower than creating an array, a mask and applying
 the mask to the array, and you seem to be downcasting from float64 to
 float32 for some reason that I am not entirely clear on (size, speed?).
 
  To the more major question of write performance, one thing that you
 could try is compression.  You might want to do some timing studies to find
 the best compressor and level. Performance here can vary a lot based on how
 similar your data is (and how close similar data is to each other).  If you
 have got a bunch of zeros and only a few real data points, even zlib 1 is
 going to be blazing fast compared to writing all those zeros out explicitly.
 
  Another thing you could try doing is switching to EArray and using the
 append() method.  This might save PyTables, numpy, hdf5, etc from having to
 check that the shape of sst_node[qual_indices] is actually the same as
 the data you are giving it.  Additionally dumping a block of memory to the
 file directly (via append()) is generally faster than having to resolve
 fancy indexes (which are notoriously the slow part of even numpy).
 
  Lastly, as a general comment, you seem to be doing a lot of stuff in the
 inner most loop -- including writing to disk.  I would look at how you
 could restructure this to move as much as possible out of this loop.  Your
 data seems to be about 12 GB for a year, so this is probably too big to
 build up the full sst array completely in memory prior to writing.  That
 is, unless you have a computer much bigger than my laptop ;).  But issuing
 one fat write command is probably going to be faster than making 365 of
 them.
 
  Happy hacking!
  Be Well
  Anthony
 


 Thanks Anthony for being so responsive and touching on a number of points.

 The netCDF library gives me a masked array so I have to explicitly
 transform that into a regular numpy array.


Ahh interesting.  Depending on the netCDF version the file was made with,
you should be able to read the file directly from PyTables.  You could thus
directly get a normal numpy array.  This *should* be possible, but I have
never tried it ;)


 I've looked under the covers and have seen that the ma masked
 implementation is all pure Python and so there is a performance drawback.
 I'm not up to speed yet on where the numpy.na masking implementation is
 (started a new job here).

 I tried to do an implementation in memory (except for the final write) and
 found that I have about 2GB of indices when I extract the quality indices.
 Simply using those indexes, memory usage grows to over 64GB and I
 eventually run out of memory and start churning away in swap.

 For the moment, I have pulled down the latest git master and am using the
 new in-memory HDF feature. This seems to give be better performance and is
 code-wise pretty simple so for the moment, it's good enough.


Awesome! I am glad that this is working for you.


 Cheers and thanks again, Tim

 BTW I viewed your SciPy tutorial. Good stuff!


Thanks!









Re: [Pytables-users] Writing to CArray

2013-03-07 Thread Anthony Scopatz
Hey Tim,

Awesome dataset! And neat image!

As per your request, a couple of minor things I noticed were that you
probably don't need to do the sanity check each time (great for debugging,
but not needed always), you are using masked arrays which while
sometimes convenient are generally slower than creating an array, a mask
and applying the mask to the array, and you seem to be downcasting from
float64 to float32 for some reason that I am not entirely clear on (size,
speed?).

To the more major question of write performance, one thing that you could
try is compression
(http://pytables.github.com/usersguide/optimization.html#compression-issues).
 You might want to do some timing studies to find the best compressor and
level. Performance here can vary a lot based on how similar your data is
(and how close similar data is to each other).  If you have got a bunch of
zeros and only a few real data points, even zlib 1 is going to be blazing
fast compared to writing all those zeros out explicitly.

Another thing you could try doing is switching to EArray and using the
append() method.  This might save PyTables, numpy, hdf5, etc from having to
check that the shape of sst_node[qual_indices] is actually the same as
the data you are giving it.  Additionally dumping a block of memory to the
file directly (via append()) is generally faster than having to resolve
fancy indexes (which are notoriously the slow part of even numpy).

Lastly, as a general comment, you seem to be doing a lot of stuff in the
inner most loop -- including writing to disk.  I would look at how you
could restructure this to move as much as possible out of this loop.  Your
data seems to be about 12 GB for a year, so this is probably too big to
build up the full sst array completely in memory prior to writing.  That
is, unless you have a computer much bigger than my laptop ;).  But issuing
one fat write command is probably going to be faster than making 365 of
them.

Happy hacking!
Be Well
Anthony
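
A hedged sketch of what the compression-plus-EArray.append() combination could look like for this dataset (the filter choice and chunkshape are guesses to be tuned, and load_day() is a hypothetical helper that returns one day as a (4320, 8640) float32 array):

filters = tables.Filters(complevel=1, complib='zlib', shuffle=True)
atom = tables.Float32Atom(dflt=np.nan)

# extendable along the day axis: start with 0 days and append one slab per day
sst_node = h5f.createEArray(h5f.root, 'sst', atom, shape=(0, 4320, 8640),
                            chunkshape=(1, 1080, 240), filters=filters,
                            expectedrows=NUMDAYS)

for filename in files:
    day_sst = load_day(filename)               # hypothetical: (4320, 8640) float32
    sst_node.append(day_sst[np.newaxis, ...])  # one fat write per day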


On Wed, Mar 6, 2013 at 11:25 PM, Tim Burgess timburg...@mac.com wrote:

 I'm producing a large chunked HDF5 using CArray and want to clarify that
 the performance I'm getting is what would normally be expected.

 The source data is a large annual satellite dataset - 365 days x 4320
 latitiude by 8640 longitude of 32bit floats. I'm only interested in pixels
 of a certain quality so I am iterating over the source data (which is in
 daily files) and then determining the indices of all quality pixels in that
 day. There are usually about 2 million quality pixels in a day.

 I then set the equivalent CArray locations to the value of the quality
 pixels. As you can see in the code below, the source numpy array is a 1 x
 4320 x 8640. So for addressing the CArray, I simply take the first index
 and set it to the current day to map indices to the 365 x 4320 x 8640
 CArray.

 I've tried a couple of different chunkshapes. As I will be reading the HDF
 sequentially day by day and as the data comes from a polar-orbit, I'm using
 a 1 x 1080 x 240 chunk to try and optimize for chunks that will have no
 data (and therefore reduce the total filesize). You can see an image of an
 example day at


 http://data.nodc.noaa.gov/pathfinder/Version5.2/browse_images/2011/sea_surface_temperature/20110101001735-NODC-L3C_GHRSST-SSTskin-AVHRR_Pathfinder-PFV5.2_NOAA19_G_2011001_night-v02.0-fv01.0-sea_surface_temperature.png


 To produce a day takes about 2.5 minutes on a Linux (Ubuntu 12.04) machine
 with two SSDs in RAID 0. The system has 64GB of RAM but I don't think
 memory is a constraint here.
 Looking at a profile, most of that 2.5 minutes is spent in _g_writeCoords
 in tables.hdf5Extension.Array

 Here's the pertinent code:

 for year in range(2011, 2012):

 # create dataset and add global attrs
 annualfile_path =
 '%sPF4km/V5.2/hdf/annual/PF52-%d-c1080x240-test.h5' % (crwdir, year)
 print 'Creating ' + annualfile_path


 with tables.openFile(annualfile_path, 'w', title=('Pathfinder V5.2
 %d' % year)) as h5f:

 # write lat lons
 lat_node = h5f.createArray('/', 'lat', lats, title='latitude')
 lon_node = h5f.createArray('/', 'lon', lons, title='longitude')


 # glob all the region summaries in a year
 files = [glob.glob('%sPF4km/V5.2/%d/*night*' % (crwdir,
 year))[0]]
 print 'Found %d days' % len(files)
 files.sort()


 # create a 365 x 4320 x 8640 array
 shape = (NUMDAYS, 4320, 8640)
 atom = tables.Float32Atom(dflt=np.nan)
 # we chunk into daily slices and then further chunk days
 sst_node = h5f.createCArray(h5f.root, 'sst', atom, shape,
 chunkshape=(1, 1080, 240))


 for filename in files:

 # get day
 day = int(filename[-25:-22])
 print 'Processing %d day %d' % (year, day)

 ds = Dataset(filename)
 kelvin64 = 

Re: [Pytables-users] checksum always verified?

2013-02-27 Thread Anthony Scopatz
I think that the checksum is on the compressed data...
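
For reference, the checksum being discussed is the Fletcher32 filter, which is switched on through the Filters class when the dataset is created; a minimal sketch, where Description stands for whatever table description is in use:

filters = tables.Filters(complevel=5, complib='zlib', fletcher32=True)
table = f.createTable('/', 'data', Description, filters=filters)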


On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien no...@nouiz.org wrote:

 Hi,

 we just got some problem with our file server and this bring me
 question on how to detect corrupted files.

 There is a way to specify a filter when creating a table that add a
 checksum[1].

 My questions is, when a file is created with checksum, are they always
 verified when the chunks are uncompressed? Can we specify when we open
 the file if we want to check it or not? The examples I found only talk
 about it when we create the file.

 thanks

 Frédéric Bastien


 [1] http://pytables.github.com/usersguide/libref/helper_classes.html





Re: [Pytables-users] checksum always verified?

2013-02-27 Thread Anthony Scopatz
Sorry, I don't know.  I never have used this feature.   Maybe someone who
has can chime in.


On Wed, Feb 27, 2013 at 2:26 PM, Frédéric Bastien no...@nouiz.org wrote:

 That is fine with me. I just want to detect if my data got corrupted
 by hardware problems.

 Do someone know if it always get verified? Do you know if this cause
 significant speed difference?

 thanks

 Frédéric

 On Wed, Feb 27, 2013 at 3:21 PM, Anthony Scopatz scop...@gmail.com
 wrote:
  I think that the checksum is on the compressed data...
 
 
  On Wed, Feb 27, 2013 at 2:16 PM, Frédéric Bastien no...@nouiz.org
 wrote:
 
  Hi,
 
  we just got some problem with our file server and this bring me
  question on how to detect corrupted files.
 
  There is a way to specify a filter when creating a table that add a
  checksum[1].
 
  My questions is, when a file is created with checksum, are they always
  verified when the chunks are uncompressed? Can we specify when we open
  the file if we want to check it or not? The examples I found only talk
  about it when we create the file.
 
  thanks
 
  Frédéric Bastien
 
 
  [1] http://pytables.github.com/usersguide/libref/helper_classes.html
 
 
 
 
 
 
 
 





Re: [Pytables-users] Problem using HDFStore in pandas on Windows 64-bit Anaconda CE

2013-02-15 Thread Anthony Scopatz
Hi Jon,

Unfortunately, I have no way of testing this out.  I will say that I have
had problems with HDF5 and Anaconda on windows before since they only ship
the static *.lib hdf5 libraries.  So it may be the case that the pandas -
pytables / hdf5 interface hasn't been properly linked.  Barring someone on
this list who can test things out for you, you might try grabbing the
PyTables source from github and building it on top of your install of
Anaconda.  Sorry...

Be Well
Anthony


On Fri, Feb 15, 2013 at 3:29 AM, Jon Rowland rowland@gmail.com wrote:

 Hi - apologies if this is a duplicate, I had an error sending the
 first time and wasn't sure if it made it through.

 I have an issue using pandas/HDFStore/pytables in the Anaconda CE
 distribution on Windows 64-bit.

 After a little troubleshooting with the Anaconda/pandas lists, it's
 been suggested that it might be a pytables issue (or at least some
 kind of package mismatch causing pytables not to work).

 I have a clean install of Anaconda 1.3.1 64-bit CE edition on a
 Windows 64-bit machine.

 Running the pytables self-test gives the following output:


 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 PyTables version:  2.4.0
 HDF5 version:  1.8.9
 NumPy version: 1.6.2
 Numexpr version:   2.0.1 (not using Intel's VML/MKL)
 Zlib version:  1.2.3 (in Python interpreter)
 Blosc version: 1.1.3 (2010-11-16)
 Cython version:0.17.4
 Python version:2.7.3 |AnacondaCE 1.3.1 (64-bit)| (default, Jan  7
 2013, 09:47:12) [MSC v.1500 64 bit (AMD64)]
 Byte-ordering: little
 Detected cores:4

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

 Then I get a *lot* of output to standard error - pages and pages of it
 - that looks something like this:

 C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning:
 compression library ``bzip2`` is not available; using ``zlib`` instead
   % (complib, default_complib), FiltersWarning )
 C:\Anaconda\lib\site-packages\tables\filters.py:253: FiltersWarning:
 compression library ``lzo`` is not available; using ``zlib`` instead
   % (complib, default_complib), FiltersWarning )
 HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
   #000: ..\..\src\H5A.c line 241 in H5Acreate2(): not a type
 major: Invalid arguments to routine
 minor: Inappropriate type
 HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
   #000: ..\..\src\H5A.c line 920 in H5Awrite(): not an attribute
 major: Invalid arguments to routine
 minor: Inappropriate type
 EHDF5-DIAG: Error detected in HDF5 (1.8.9) thread 0:
   #000: ..\..\src\H5A.c line 241 in H5Acreate2(): not a type
 major: Invalid arguments to routine
 minor: Inappropriate type

 Is this something I'm doing wrong or is there something wrong with the
 package?

 Any help would be appreciated.

 Thanks,
 Jon





Re: [Pytables-users] Can't rename node with child node

2013-02-10 Thread Anthony Scopatz
Thanks for hunting this down Michka,

It was a pretty simple change, so I went ahead and merged it in.

Be Well
Anthony


On Sun, Feb 10, 2013 at 9:29 AM, Michka Popoff michkapop...@gmail.comwrote:

 After some (lng) code browsing I sent a pull request for my problem :
 https://github.com/PyTables/PyTables/pull/208

 Thanks, I was not sure if it was a bug or an intended functionality
 preventing me to rename nodes with children.

 Michka

 Le 10 févr. 2013 à 09:08, Anthony Scopatz a écrit :

 Hey Michka,

 This seems like a bug.  Please open an issue on github or submit a pull
 request if you figure out a fix.  Thanks!

 Be Well
 Anthony


 On Sat, Feb 9, 2013 at 4:44 AM, Michka Popoff michkapop...@gmail.comwrote:

 Hello

 I am not able to rename a node which has parent nodes. The doc doesn't
 specify any restriction to the usage of the renameNode method.
 Here is a small example script to show what I want to achieve :

 import tables

 # Create file and groups
 file = tables.openFile(test.hdf5, w)
 file.createGroup(/, data, Data)
 file.createGroup(/data, id, Single Data)
 file.createGroup(/data/id/, curves1, Curve 1)
 file.createGroup(/data/id/, curves2, Curve 2)

 # Rename (works)
 file.renameNode(/data/id/curves1, newcurve1)

 # Rename (doesn't work)
 file.renameNode(/data/id, newid)

 The first rename will work and rename /data/id/curves1 to
 /data/id/newcurve1
 The second rename will fail with the following traceback :

 Traceback (most recent call last):
   File Rename.py, line 14, in module
 file.renameNode(/data/id, newid)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py,
 line 1157, in renameNode
 obj._f_rename(newname, overwrite)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py,
 line 590, in _f_rename
 self._f_move(newname=newname, overwrite=overwrite)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py,
 line 674, in _f_move
 self._g_move(newparent, newname)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/group.py,
 line 565, in _g_move
 self._v_file._updateNodeLocations(oldPath, newPath)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py,
 line 2368, in _updateNodeLocations
 descendentNode._g_updateLocation(newNodePPath)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/node.py,
 line 414, in _g_updateLocation
 file_._refNode(self, newPath)
   File
 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tables/file.py,
 line 2287, in _refNode
 file already has a node with path ``%s`` % nodePath
 AssertionError: file already has a node with path ``/data``
 Closing remaining open files: test.hdf5... done
 Exception AttributeError: 'File' object has no attribute '_aliveNodes'
 in  ignored

 Perhaps I can not do what I want to do here, or is there another method I
 should use ?

 Thanks in advance

 Michka Popoff













Re: [Pytables-users] Using built-in slice objects on CArray

2013-01-23 Thread Anthony Scopatz
Hi Andreas,

I think that the problem here is that coord_slice is actually a list of
slices, which you can't index by.  (Though, you may be able to in numpy...)

Try something like _ds[coord_slice[0]] instead.

Be Well
Anthony




On Tue, Jan 22, 2013 at 8:44 AM, Andreas Hilboll li...@hilboll.de wrote:

 Hi,

 how can I use Python's built-in `slice` object on CArray? Currently, I'm
 trying

 In:  coord_slice
 Out: [slice(0, 31, None), slice(0, 5760, None), slice(0, 2880, None)]

 In:  _ds
 Out:  /data/mydata (CArray(31, 5760, 2880), shuffle, blosc(5)) ''
   atom := Float32Atom(shape=(), dflt=0.0)
   maindim := 0
   flavor  := 'numpy'
   byteorder := 'little'
   chunkshape := (1, 45, 2880)

 In: _ds[coord_slice]
 Out: *** TypeError: long() argument must be a string or a number,
 not 'slice'

 The problem is that I want to write something generic, and I don't know
 beforehand how many dimensions the CArray has. My current plan is to
 create a tuple of slice objects programatically (using list
 comprehension), and then use this tuple as index. But apparently it
 doesn't work with pytables 2.3.1.

 Any suggestions on how to accomplish my task are greatly appreciated :)

 Cheers, Andreas.





Re: [Pytables-users] Using built-in slice objects on CArray

2013-01-23 Thread Anthony Scopatz
Yeah, indexing with a list (rather than a tuple) has a different meaning.
The most notable place I have seen list-indexing used is with numpy
structured arrays.  In all other locations the tuple slicing is
for drilling down different dimensions, as you say.
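
A small sketch of that generic pattern, building one slice per dimension and indexing with a tuple (the dimension count is discovered from the array itself):

# one slice per dimension, without knowing ndim in advance
coord_slice = tuple(slice(0, n) for n in _ds.shape)
sub = _ds[coord_slice]   # a tuple of slices drills down each dimension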


On Wed, Jan 23, 2013 at 10:25 AM, Andreas Hilboll li...@hilboll.de wrote:

 Am Mi 23 Jan 2013 16:57:27 CET schrieb Anthony Scopatz:
  Hi Andreas,
 
  I think that the problem here is that coord_slice is actually a list
  of slices, which you can't index by.  (Though, you may be able to in
  numpy...)
 
  Try something like _ds[coord_slice[0]] instead.
 
  Be Well
  Anthony
 
 
 
 
  On Tue, Jan 22, 2013 at 8:44 AM, Andreas Hilboll li...@hilboll.de
  mailto:li...@hilboll.de wrote:
 
  Hi,
 
  how can I use Python's built-in `slice` object on CArray?
  Currently, I'm
  trying
 
  In:  coord_slice
  Out: [slice(0, 31, None), slice(0, 5760, None), slice(0, 2880,
  None)]
 
  In:  _ds
  Out:  /data/mydata (CArray(31, 5760, 2880), shuffle, blosc(5)) ''
atom := Float32Atom(shape=(), dflt=0.0)
maindim := 0
flavor  := 'numpy'
byteorder := 'little'
chunkshape := (1, 45, 2880)
 
  In: _ds[coord_slice]
  Out: *** TypeError: long() argument must be a string or a number,
  not 'slice'
 
  The problem is that I want to write something generic, and I don't
  know
  beforehand how many dimensions the CArray has. My current plan is to
  create a tuple of slice objects programatically (using list
  comprehension), and then use this tuple as index. But apparently it
  doesn't work with pytables 2.3.1.
 
  Any suggestions on how to accomplish my task are greatly
  appreciated :)
 
  Cheers, Andreas.
 
 
 
 
 
 
 

 Hi Anthony,

 thanks for your input. However, I need to slice in multiple dimensions
 simultaneously, because my array is very large and I don't want to clog
 memory.

 However, I found out that it works with a tuple of slice objects, so
 _ds[tuple(coord_slice)] works as expected.

 Cheers, Andreas.





Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread Anthony Scopatz
HI David,

Tables and table column iteration have been overhauled fairly recently [1].
 So you might try creating two iterators, offset by one, and then doing the
comparison.  I am hacking this out super quick so please forgive me:

from itertools import izip

with tb.openFile(...) as f:
    data = f.root.data
    data_i = iter(data)
    data_j = iter(data)
    data_i.next()  # throw the first value away
    for i, j in izip(data_i, data_j):
        compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27


On Thu, Jan 3, 2013 at 9:25 AM, David Reed david.ree...@gmail.com wrote:

 I was hoping someone could help me out here.

 This is from a post I put up on StackOverflow,

 I have a fairly large dataset that I store in HDF5 and access using
 PyTables. One operation I need to do on this dataset is pairwise
 comparisons between each of the elements. This requires 2 loops, one to
 iterate over each element, and an inner loop to iterate over every other
 element. This operation thus looks at N(N-1)/2 comparisons.

 For fairly small sets I found it to be faster to dump the contents into a
 multidimensional numpy array and then do my iteration. I run into problems
 with large sets because of memory issues and need to access each element of
 the dataset at run time.

 Putting the elements into an array gives me about 600 comparisons per
 second, while operating on hdf5 data itself gives me about 300 comparisons
 per second.

 Is there a way to speed this process up?

 Example follows (this is not my real code, just an example):

 *Small Set*:


 with tb.openFile(h5_file, 'r') as f:
     data = f.root.data

     N_elements = len(data)
     elements = np.empty((N_elements, int(1e5)))

     for ii, d in enumerate(data):
         elements[ii] = d['element']

     D = np.empty((N_elements, N_elements))
     for ii in xrange(N_elements):
         for jj in xrange(ii+1, N_elements):
             D[ii, jj] = compare(elements[ii], elements[jj])

  *Large Set*:


 with tb.openFile(h5_file, 'r') as f:
     data = f.root.data

     N_elements = len(data)

     D = np.empty((N_elements, N_elements))
     for ii in xrange(N_elements):
         for jj in xrange(ii+1, N_elements):
             D[ii, jj] = compare(data['element'][ii], data['element'][jj])





--
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. ON SALE this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122712___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread Anthony Scopatz
Yup, that is right, thanks Josh!


On Thu, Jan 3, 2013 at 12:29 PM, Josh Ayers josh.ay...@gmail.com wrote:

 David,

 The change in issue 27 was only for iteration over a tables.Column
 instance.  To use it, tweak Anthony's code as follows.  This will iterate
 over the element column, as in your original example.

 Note also that this will only work with the development version of
 PyTables available on github.  It will be very slow using the released
 v2.4.0.


 from itertools import izip

 with tb.openFile(...) as f:
     data = f.root.data.cols.element
     data_i = iter(data)
     data_j = iter(data)
     data_i.next()  # throw the first value away
     for i, j in izip(data_i, data_j):
         compare(i, j)


 Hope that helps,
 Josh



 On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz scop...@gmail.com wrote:

 HI David,

 Tables and table column iteration have been overhauled fairly recently
 [1].  So you might try creating two iterators, offset by one, and then
 doing the comparison.  I am hacking this out super quick so please forgive
 me:

 from itertools import izip

 with tb.openFile(...) as f:
     data = f.root.data
     data_i = iter(data)
     data_j = iter(data)
     data_i.next()  # throw the first value away
     for i, j in izip(data_i, data_j):
         compare(i, j)

 You get the idea ;)

 Be Well
 Anthony

 1. https://github.com/PyTables/PyTables/issues/27


 On Thu, Jan 3, 2013 at 9:25 AM, David Reed david.ree...@gmail.comwrote:

 I was hoping someone could help me out here.

 This is from a post I put up on StackOverflow,

 I have a fairly large dataset that I store in HDF5 and access using
 PyTables. One operation I need to do on this dataset is pairwise
 comparisons between each of the elements. This requires 2 loops, one to
 iterate over each element, and an inner loop to iterate over every other
 element. This operation thus looks at N(N-1)/2 comparisons.

 For fairly small sets I found it to be faster to dump the contents into
 a multidimensional numpy array and then do my iteration. I run into problems
 with large sets because of memory issues and need to access each element of
 the dataset at run time.

 Putting the elements into an array gives me about 600 comparisons per
 second, while operating on hdf5 data itself gives me about 300 comparisons
 per second.

 Is there a way to speed this process up?

 Example follows (this is not my real code, just an example):

 *Small Set*:



  with tb.openFile(h5_file, 'r') as f:
      data = f.root.data

      N_elements = len(data)
      elements = np.empty((N_elements, 1e5))

      for ii, d in enumerate(data):
          elements[ii] = d['element']

      D = np.empty((N_elements, N_elements))
      for ii in xrange(N_elements):
          for jj in xrange(ii+1, N_elements):
              D[ii, jj] = compare(elements[ii], elements[jj])

  *Large Set*:



  with tb.openFile(h5_file, 'r') as f:
      data = f.root.data

      N_elements = len(data)

      D = np.empty((N_elements, N_elements))
      for ii in xrange(N_elements):
          for jj in xrange(ii+1, N_elements):
              D[ii, jj] = compare(data[ii]['element'], data[jj]['element'])




Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3

2013-01-03 Thread Anthony Scopatz
On Thu, Jan 3, 2013 at 2:17 PM, David Reed david.ree...@gmail.com wrote:

 Thanks a lot for the help so far guys!

 Looking at itertools, I found what I believe to be the perfect function
 for what I need, itertools.combinations. This appears to be a valid
 replacement for the method proposed.


Yes, combinations is awesome!



 One small problem that I didn't mention is that my compare function
 actually takes 2 columns from the table as inputs. Like so:

 D = np.empty((N_elements, N_elements))
 for ii in xrange(N_elements):
     for jj in xrange(ii+1, N_elements):
         D[ii, jj] = compare(data[ii]['element1'], data[jj]['element1'],
                             data[ii]['element2'], data[jj]['element2'])

 Is there an efficient way of using itertools with this structure?


You can always make two other iterators for each column.  Since you have
two columns you would have 4 iterators.  I am not sure how fast this is
going to be but I am confident that there is definitely a way to do this in
one for-loop, which is going to be way faster than nested loops.
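For instance, here is a rough (and untested) sketch using combinations over row
indices; the column names 'element1'/'element2', h5_file, and compare() are taken
from your example, and reading cell by cell from the Column objects trades speed
for memory:

from itertools import combinations

import numpy as np
import tables as tb

with tb.openFile(h5_file, 'r') as f:
    table = f.root.data
    col1 = table.cols.element1
    col2 = table.cols.element2
    N = table.nrows
    D = np.empty((N, N))
    # combinations() yields each unordered index pair exactly once,
    # i.e. the N(N-1)/2 comparisons, without writing the nested loop yourself.
    for ii, jj in combinations(xrange(N), 2):
        D[ii, jj] = compare(col1[ii], col1[jj], col2[ii], col2[jj])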

Be Well
Anthony




 On Thu, Jan 3, 2013 at 1:29 PM, 
 pytables-users-requ...@lists.sourceforge.net wrote:

 Send Pytables-users mailing list submissions to
 pytables-users@lists.sourceforge.net

 To subscribe or unsubscribe via the World Wide Web, visit
 https://lists.sourceforge.net/lists/listinfo/pytables-users
 or, via email, send a message with subject or body 'help' to
 pytables-users-requ...@lists.sourceforge.net

 You can reach the person managing the list at
 pytables-users-ow...@lists.sourceforge.net

 When replying, please edit your Subject line so it is more specific
 than Re: Contents of Pytables-users digest...


 Today's Topics:

1. Re: Nested Iteration of HDF5 using PyTables (Josh Ayers)


 --

 Message: 1
 Date: Thu, 3 Jan 2013 10:29:33 -0800
 From: Josh Ayers josh.ay...@gmail.com
 Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
 To: Discussion list for PyTables
 pytables-users@lists.sourceforge.net
 Message-ID:
 
 cacob4anozyd7dafos7sxs07mchzb8zbripbbrvbazrv4weq...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 David,

 The change in issue 27 was only for iteration over a tables.Column
 instance.  To use it, tweak Anthony's code as follows.  This will iterate
 over the element column, as in your original example.

 Note also that this will only work with the development version of
 PyTables
 available on github.  It will be very slow using the released v2.4.0.


 from itertools import izip

 with tb.openFile(...) as f:
     data = f.root.data.cols.element
     data_i = iter(data)
     data_j = iter(data)
     data_i.next()  # throw the first value away
     for i, j in izip(data_i, data_j):
         compare(i, j)


 Hope that helps,
 Josh



 On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz scop...@gmail.com
 wrote:

  HI David,
 
  Tables and table column iteration have been overhauled fairly recently
  [1].  So you might try creating two iterators, offset by one, and then
  doing the comparison.  I am hacking this out super quick so please
 forgive
  me:
 
  from itertools import izip

  with tb.openFile(...) as f:
      data = f.root.data
      data_i = iter(data)
      data_j = iter(data)
      data_i.next()  # throw the first value away
      for i, j in izip(data_i, data_j):
          compare(i, j)
 
  You get the idea ;)
 
  Be Well
  Anthony
 
  1. https://github.com/PyTables/PyTables/issues/27
 
 
  On Thu, Jan 3, 2013 at 9:25 AM, David Reed david.ree...@gmail.com
 wrote:
 
  I was hoping someone could help me out here.
 
  This is from a post I put up on StackOverflow,
 
  I have a fairly large dataset that I store in HDF5 and access using
  PyTables. One operation I need to do on this dataset is pairwise
  comparisons between each of the elements. This requires 2 loops, one to
  iterate over each element, and an inner loop to iterate over every
 other
  element. This operation thus looks at N(N-1)/2 comparisons.
 
  For fairly small sets I found it to be faster to dump the contents
 into a
  multidimensional numpy array and then do my iteration. I run into
 problems
  with large sets because of memory issues and need to access each
 element of
  the dataset at run time.
 
  Putting the elements into an array gives me about 600 comparisons per
  second, while operating on hdf5 data itself gives me about 300
 comparisons
  per second.
 
  Is there a way to speed this process up?
 
  Example follows (this is not my real code, just an example):
 
  *Small Set*:
 
 
  with tb.openFile(h5_file, 'r') as f:
  data = f.root.data
 
  N_elements = len(data)
  elements = np.empty((N_irises, 1e5))
 
  for ii, d in enumerate(data):
  elements[ii] = data['element']
 
  D = np.empty((N_irises, N_irises))  for ii in xrange(N_elements):
  for jj in xrange(ii+1, N_elements

Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

2013-01-03 Thread Anthony Scopatz
Josh is right that you can just edit the code by hand (which works but
sucks).

However, on Windows -- on the rare occasion when I also have to develop on
it -- I typically use a distribution that includes a compiler, cython,
hdf5, and pytables already and then I install my development version from
github OVER this.  I recommend either EPD or Anaconda, though other
distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/


On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers josh.ay...@gmail.com wrote:

 The change was in pure Python code, so you should be able to just paste in
 the changes to your local copy.  Start with the table.Column.__iter__
 method (lines 3296-3310) here.


 https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

 It needs to be modified slightly because it uses some additional features
 that aren't available in the released version (the out=buf_slice argument
 to table.read).  The following should work.

 def __iter__(self):
     table = self.table
     itemsize = self.dtype.itemsize
     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
     max_row = len(self)
     for start_row in xrange(0, len(self), nrowsinbuf):
         end_row = min([start_row + nrowsinbuf, max_row])
         buf = table.read(start_row, end_row, 1, field=self.pathname)
         for row in buf:
             yield row


 I haven't tested this, but I think it will work.

 Josh



 On Thu, Jan 3, 2013 at 1:25 PM, David Reed david.ree...@gmail.com wrote:

 I apologize if I'm starting to sound helpless, but I'm forced to work on
 Windows 7 at work and have never had luck compiling Python source
 successfully.  I have had to rely on precompiled binaries and now it's
 biting me in the butt.

 Is there any quick fix I can do to improve this iteration using v2.4.0?


 On Thu, Jan 3, 2013 at 3:17 PM, 
 pytables-users-requ...@lists.sourceforge.net wrote:

 Send Pytables-users mailing list submissions to
 pytables-users@lists.sourceforge.net

 To subscribe or unsubscribe via the World Wide Web, visit
 https://lists.sourceforge.net/lists/listinfo/pytables-users
 or, via email, send a message with subject or body 'help' to
 pytables-users-requ...@lists.sourceforge.net

 You can reach the person managing the list at
 pytables-users-ow...@lists.sourceforge.net

 When replying, please edit your Subject line so it is more specific
 than Re: Contents of Pytables-users digest...


 Today's Topics:

1. Re: Pytables-users Digest, Vol 80, Issue 2 (David Reed)
2. Re: Pytables-users Digest, Vol 80, Issue 3 (David Reed)


 --

 Message: 1
 Date: Thu, 3 Jan 2013 13:44:29 -0500
 From: David Reed david.ree...@gmail.com
 Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
 To: pytables-users@lists.sourceforge.net
 Message-ID:
 CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
 ev...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 Thanks Anthony, but unless I'm missing something I don't think that method
 will work, since it will only compare the ith element with the (i+1)th
 element.  I still need 2 for loops, right?

 Using itertools might speed things up though, I've never used them so I
 will give it a shot and let you know how it goes.  Looks like I need to
 download the latest release before I do that too.  Thanks for the help.

 -Dave



 On Thu, Jan 3, 2013 at 12:12 PM, 
 pytables-users-requ...@lists.sourceforge.net wrote:

  Send Pytables-users mailing list submissions to
  pytables-users@lists.sourceforge.net
 
  To subscribe or unsubscribe via the World Wide Web, visit
  https://lists.sourceforge.net/lists/listinfo/pytables-users
  or, via email, send a message with subject or body 'help' to
  pytables-users-requ...@lists.sourceforge.net
 
  You can reach the person managing the list at
  pytables-users-ow...@lists.sourceforge.net
 
  When replying, please edit your Subject line so it is more specific
  than Re: Contents of Pytables-users digest...
 
 
  Today's Topics:
 
 1. Re: Nested Iteration of HDF5 using PyTables (Anthony Scopatz)
 
 
  --
 
  Message: 1
  Date: Thu, 3 Jan 2013 11:11:47 -0600
  From: Anthony Scopatz scop...@gmail.com
  Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
  To: Discussion list for PyTables
  pytables-users@lists.sourceforge.net
  Message-ID:
  CAPk-6T5b=
  1egagp4+jhjcd3_4fnvbxrob2jbhay45rwdqzy...@mail.gmail.com
  Content-Type: text/plain; charset=iso-8859-1
 
  HI David,
 
  Tables and table column iteration have been overhauled fairly recently
 [1].
   So you might try creating two iterators, offset by one, and then
 doing the
  comparison.  I am hacking this out

Re: [Pytables-users] pytables: could not find the HDF5 runtime

2012-12-10 Thread Anthony Scopatz
Try leaving the pytables source dir and then running IPython.


On Mon, Dec 10, 2012 at 9:20 AM, Jennifer Flegg jennifer.fl...@wwarn.orgwrote:

 Hi,
 I'm trying to install pytables and it's proving difficult (using Mac OS
 10.6.4).
 I have installed HDF5 in /usr/local/hdf5 and set the environment variable
 $HDF5_DIR to /usr/local/hdf5. When I run setup, I get a warning about
 not being able to find the HDF5 runtime.

 ndmmac149:tables-2.4.0 jflegg$ sudo python setup.py install
  --hdf5=/usr/local/hdf5
 * Found numpy 1.6.1 package installed.
 * Found numexpr 2.0.1 package installed.
 * Found Cython 0.17.2 package installed.
 * Found HDF5 headers at ``/usr/local/hdf5/include``,
  library at ``/usr/local/hdf5/lib``.
 .. WARNING:: Could not find the HDF5 runtime.
The HDF5 shared library was *not* found in the default library
paths. In case of runtime problems, please remember to install it.
 ld: library not found for -llzo2
 collect2: ld returned 1 exit status
 ld: library not found for -llzo2
 collect2: ld returned 1 exit status
 * Could not find LZO 2 headers and library; disabling support for it.
 ld: library not found for -llzo
 collect2: ld returned 1 exit status
 ld: library not found for -llzo
 collect2: ld returned 1 exit status
 * Could not find LZO 1 headers and library; disabling support for it.
 * Found bzip2 headers at ``/usr/include``, library at ``/usr/lib``.
 running install
 running build
 running build_py
 creating build
 creating build/lib.macosx-10.5-i386-2.7
 creating build/lib.macosx-10.5-i386-2.7/tables
 copying tables/__init__.py - build/lib.macosx-10.5-i386-2.7/tables
 copying tables/array.py - build/lib.macosx-10.5-i386-2.7/tables

 When I import pytables in python, I get the following error message

 In [1]: import tables
 -
 ImportError   Traceback (most recent call last)
 /Users/jflegg/ipython-input-1-389ecae14f10 in module()
  1 import tables

 /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-
 packages/tables/__init__.py in module()
  28
  29 # Necessary imports to get versions stored on the Pyrex extension

 --- 30 from tables.utilsExtension import getPyTablesVersion,
 getHDF5Version
  31
  32

 ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/7.3
 /lib/python2.7/site-packages/tables/utilsExtension.so, 2):
 Symbol not found: _H5E_CALLBACK_g  Referenced from:
 /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-
 packages/tables/utilsExtension.so
   Expected in: flat namespace
  in /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-
 packages/tables/utilsExtension.so


 Any help would be greatly appreciated.
 Jennifer







Re: [Pytables-users] pytables: could not find the HDF5 runtime

2012-12-10 Thread Anthony Scopatz
Hi Jennifer,

Yeah, that is right, they are not in EPD Free.  However, they are in
Anaconda CE (http://continuum.io/downloads.html). Note the CE rather than
the full version.

Be Well
Anthony


On Mon, Dec 10, 2012 at 4:07 PM, Jennifer Flegg jennifer.fl...@wwarn.orgwrote:

 Hi Anthony,
 Thanks for your reply. I installed HDF5 also from source. The
 reason I'm building hdf5 and pytables myself is that they don't
 seem to be available through EPD any more (at least in the free
 version: http://www.enthought.com/products/epdlibraries.php)
 They used to both come bundled in EPD, but not anymore, which is
 a pain.
 Many thanks,
 Jennifer






Re: [Pytables-users] Error reading attribute with compound data type

2012-11-27 Thread Anthony Scopatz
On Wed, Nov 28, 2012 at 1:03 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 Hi Anthony, hi dashesy,


 Il giorno 28/nov/2012, alle ore 00:57, Anthony Scopatz scop...@gmail.com
 ha scritto:

  This [1] seems to indicate that this kind of thing should be supported
 via numpy structured arrays.  However, I bet that this data set did not
 start out as a numpy structured array.  This might explain the problem if
 the flavor is wrong.  I would think that a fix should be relatively easy.
 
  Be Well
  Anthony
 
  1.
 http://pytables.github.com/usersguide/libref/declarative_classes.html?highlight=attr#the-attributeset-class
 

 I'm not sure that PyTables is able to handle variable length strings in
 compound data types at the moment.


Oops, I didn't notice that...  Antonio is right, the variable length part
of this is probably your issue.



 
  On Tue, Nov 27, 2012 at 5:17 PM, dashesy dash...@gmail.com wrote:
  I have a file that has attributes with nested compound type, when
  reading it with PyTables 2.4.0 I get this error:
 
  C:\Python27\lib\site-packages\tables\attributeset.py:293:
  DataTypeWarning: Unsupported type for attribute 'BmiRoot' in node '/'.
  Offending HDF5 class: 6
value = self._g_getAttr(self._v_node, name)
  C:\Python27\lib\site-packages\tables\attributeset.py:293:
  DataTypeWarning: Unsupported type for attribute 'BmiChanExt' in node
  'channel1'. Offending HDF5 class: 6
value = self._g_getAttr(self._v_node, name)
 

 Yes, it is not clear

  Hard to say what exactly happens, just wanted to know if this is not
  already fixed in newer versions I will be more than happy to work on
  it, any pointers as to where to look is appreciated.
 

 I don't think that there are changes that can impact this issue.
 Anyway, you can give the development branch [1] a try.

 Any help is very appreciated

 [1] https://github.com/PyTables/PyTables


  Here is the (partial) dump of the file (for brevity I deleted
  non-related data parts but can provide the full file if needed):
 
  HDF5 pause5-10-5.ns2.h5 {
  GROUP / {
 ATTRIBUTE BmiRoot {
DATATYPE  /BmiRootAttr_t
DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): {
  1,
  0,
  0,
  1,
  2008-12-02 22:57:02.251000,
  1 kS/s,
  
   }
}
 }
 DATATYPE BmiRootAttr_t H5T_COMPOUND {
H5T_STD_U32LE MajorVersion;
H5T_STD_U32LE MinorVersion;
H5T_STD_U32LE Flags;
H5T_STD_U32LE GroupCount;
H5T_STRING {
   STRSIZE H5T_VARIABLE;
   STRPAD H5T_STR_NULLTERM;
   CSET H5T_CSET_ASCII;
   CTYPE H5T_C_S1;
} Date;
H5T_STRING {
   STRSIZE H5T_VARIABLE;
   STRPAD H5T_STR_NULLTERM;
   CSET H5T_CSET_ASCII;
   CTYPE H5T_C_S1;
} Application;
H5T_STRING {
   STRSIZE H5T_VARIABLE;
   STRPAD H5T_STR_NULLTERM;
   CSET H5T_CSET_ASCII;
   CTYPE H5T_C_S1;
} Comment;
 }
 GROUP channel {
DATATYPE BmiChanAttr_t H5T_COMPOUND {
   H5T_STD_U16LE ID;
   H5T_IEEE_F32LE Clock;
   H5T_IEEE_F32LE SampleRate;
   H5T_STD_U8LE SampleBits;
}
DATATYPE BmiChanExt2Attr_t H5T_COMPOUND {
   H5T_STD_I32LE DigitalMin;
   H5T_STD_I32LE DigitalMax;
   H5T_STD_I32LE AnalogMin;
   H5T_STD_I32LE AnalogMax;
   H5T_STRING {
  STRSIZE 16;
  STRPAD H5T_STR_NULLTERM;
  CSET H5T_CSET_ASCII;
  CTYPE H5T_C_S1;
   } AnalogUnit;
}
DATATYPE BmiChanExtAttr_t H5T_COMPOUND {
   H5T_IEEE_F64LE NanoVoltsPerLSB;
   H5T_COMPOUND {
  H5T_STD_U32LE HighPassFreq;
  H5T_STD_U32LE HighPassOrder;
  H5T_STD_U16LE HighPassType;
  H5T_STD_U32LE LowPassFreq;
  H5T_STD_U32LE LowPassOrder;
  H5T_STD_U16LE LowPassType;
   } Filter;
   H5T_STD_U8LE PhysicalConnector;
   H5T_STD_U8LE ConnectorPin;
   H5T_STRING {
  STRSIZE H5T_VARIABLE;
  STRPAD H5T_STR_NULLTERM;
  CSET H5T_CSET_ASCII;
  CTYPE H5T_C_S1;
   } Label;
}
DATATYPE BmiChanFiltAttr_t H5T_COMPOUND {
   H5T_STD_U32LE HighPassFreq;
   H5T_STD_U32LE HighPassOrder;
   H5T_STD_U16LE HighPassType;
   H5T_STD_U32LE LowPassFreq;
   H5T_STD_U32LE LowPassOrder;
   H5T_STD_U16LE LowPassType;
}
GROUP channel1 {
   ATTRIBUTE BmiChan {
  DATATYPE  /channel/BmiChanAttr_t
  DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
  DATA {
  (0): {
1,
3,
1000,
16

Re: [Pytables-users] Histogramming 1000x too slow

2012-11-25 Thread Anthony Scopatz
On Mon, Nov 19, 2012 at 12:59 PM, Jon Wilson j...@fnal.gov wrote:

  Hi Anthony,




 On 11/17/2012 11:49 AM, Anthony Scopatz wrote:

  Hi Jon,

  Barring changes to numexpr itself, this is exactly what I am suggesting.
  Well,, either writing one query expr per bin or (more cleverly) writing
 one expr which when evaluated for a row returns the integer bin number (1,
 2, 3,...) this row falls in.  Then you can simply count() for each bin
 number.

  For example, if you wanted to histogram data which ran from [0,100] into
 10 bins, then the expr r/10 into a dtype=int would do the trick.  This
 has the advantage of only running over the data once.  (Also, I am not
 convinced that running over the data multiple times is less efficient than
 doing row-based iteration.  You would have to test it on your data to find
 out.)


  It is a reduction operation, and would greatly benefit from chunking, I
 expect. Not unlike sum(), which is implemented as a specially supported
 reduction operation inside numexpr (buggily, last I checked). I suspect
 that a substantial improvement in histogramming requires direct support
 from either pytables or from numexpr. I don't suppose that there might be a
 chunked-reduction interface exposed somewhere that I could hook into?


  This is definitively as feature to request from numexpr.

 I've been fiddling around with Stephen's code a bit, and it looks like the
 best way to do things is to read chunks (whether exactly of table.chunksize
 or not is a matter for optimization) of the data at a time, and create
 histograms of those chunks.  Then combining the histograms is a trivial sum
 operation.  This type of approach can be generically applied in many cases,
 I suspect, where row-by-row iteration is prohibitively slow, but the
 dataset is too large to fit into memory.  As I understand, this idea is the
 primary win of PyTables in the first place!

 So, I think it would be extraordinarily helpful to provide a
 chunked-iteration interface for this sort of use case.  It can be as simple
 as a wrapper around Table.read():

 class Table:
     def chunkiter(self, field=None):
         n = 0
         while n * self.chunksize < self.nrows:
             yield self.read(n * self.chunksize, (n + 1) * self.chunksize,
                             field=field)
             n += 1

 Then I can write something like
 bins = linspace(-1, 1, 101)
 hist = sum(histogram(chunk, bins=bins)[0] for chunk in
            mytable.chunkiter(myfield))

 Preliminary tests seem to indicate that, for a table with 1 column and 10M
 rows, reading in chunks of 10x chunksize gives the best
 read-time-per-row.  This is perhaps naive as regards chunksize black magic,
 though...
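 For reference, here is a self-contained sketch of that chunked pattern written
 against the released Table.read() API only; the helper name, the default chunk
 size, and the usage line at the bottom are made up for illustration:

 import numpy as np
 import tables as tb

 def chunked_hist(table, field, bins, chunksize=100000):
     # Sum the per-chunk histograms so the full column never has to sit in memory.
     counts = np.zeros(len(bins) - 1, dtype=np.int64)
     for start in xrange(0, table.nrows, chunksize):
         chunk = table.read(start, min(start + chunksize, table.nrows), field=field)
         counts += np.histogram(chunk, bins=bins)[0]
     return counts

 # e.g.: counts = chunked_hist(f.root.data, 'x', np.linspace(-1, 1, 101))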


Hello Jon,

Sorry about the slow reply, but I think that what is proposed in issue #27
[1] would solve the above by default, right?  Maybe you could pull Josh's
code and test it on the above example to make sure.  And then we could go
ahead and merge this in :).


 And of course, if implemented by numexpr, it could benefit from the nice
 automatic multithreading there.


This would be nice, but as you point out, not totally necessary here.



 Also, I might dig in a bit and see about extending the field argument to
 read so it can read multiple fields at once (to do N-dimensional
 histograms), as you suggested in a previous mail some months ago.


Also super cool, but not immediate ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27


 Best Regards,
 Jon



Re: [Pytables-users] What is the best way to copy a table from one file to another?

2012-11-08 Thread Anthony Scopatz
Hey Aquil,

I think File.copyNode() [1] with the newparent argument as group on another
file will do what you want.
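For example, a minimal sketch; the file and table names are made up, and the key
point is that newparent may be a Group that lives in a different, already-open file:

import tables as tb

old = tb.openFile('results_old.h5', mode='r')
new = tb.openFile('results_new.h5', mode='a')

# Copy the two tables that did not change; newparent points into the new file.
for name in ('table_a', 'table_b'):
    old.copyNode('/' + name, newparent=new.root)

new.flush()
old.close()
new.close()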

Be Well
Anthony

1.
http://pytables.github.com/usersguide/libref/file_class.html?highlight=copy#tables.File.copyNode


On Thu, Nov 8, 2012 at 10:02 AM, Aquil H. Abdullah aquil.abdul...@gmail.com
 wrote:

 I create the tables in an HDF5 file from three different python processes.
 I needed to modify one of the processes, but not the others. Is there an
 easy way to copy the two tables that did not change to the new file?

 --
 Aquil H. Abdullah
 I never think of the future. It comes soon enough - Albert Einstein





Re: [Pytables-users] Can PyTable use 7Zip

2012-11-08 Thread Anthony Scopatz
Hello Jim,

The major hurdle here is exposing 7Zip to HDF5.  Luckily it appears as if
this may have been taken care of for you by the HDF-group already [1].  You
should google around to see what has already been done and how hard it is
to install.  The next step is to expose this as a compression option for
filters [2].  I am fairly certain that this is just a matter of adding a
simple flag and making sure 7Zip works if available.  This should not be
too difficult at all and we would happily consider/review any pull request
that implemented this.  Barring any major concerns, I feel that it would
likely be accepted.
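For context, compressors are currently selected through the Filters class, and that
is the layer where such a flag would live; the '7zip' value in the last line below is
purely hypothetical and does not work in any released version:

import tables as tb

# What works today: complib in ('zlib', 'lzo', 'bzip2'), plus 'blosc' in newer builds.
filters = tb.Filters(complevel=9, complib='zlib', shuffle=True)
f = tb.openFile('quotes.h5', mode='w', filters=filters)

# The hypothetical addition discussed above would just be one more complib value:
# filters = tb.Filters(complevel=9, complib='7zip')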

Be Well
Anthony

1.
http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.6/hdf5-1.6.7/src/unpacked/release_docs/INSTALL_Windows_From_Command_Line.txt
2.
http://pytables.github.com/usersguide/libref/helper_classes.html#the-filters-class


On Thu, Nov 8, 2012 at 9:52 PM, Jim Knoll jim.kn...@spottradingllc.comwrote:

   I would like to squeeze out as much compression as I can get.  I do not
 mind spending time on the front end as long as I do not kill my read
 performance.  Seems like 7Zip is well suited to my data.  Is it possible to
 have 7Zip used as the native internal compression for a pytable?

 ** **

 If not, how hard would it be to add this option?


 --

 *Jim Knoll* *
  **Data Developer*

  Spot Trading L.L.C
  440 South LaSalle St., Suite 2800
  Chicago, IL 60605
  Office: 312.362.4550
  Direct: 312-362-4798
  Fax: 312.362.4551
  jim.kn...@spottradingllc.com
  www.spottradingllc.com
 --







Re: [Pytables-users] pyTable index from c++

2012-11-08 Thread Anthony Scopatz
On Thu, Nov 8, 2012 at 10:19 PM, Jim Knoll jim.kn...@spottradingllc.comwrote:

   I love the index function and promote the internal use of PyTables at
 my company.  The availability of an indexed method to speed the search is
 the main reason why.

 ** **

 We are a mixed shop using C++ to create H5 files (just for the raw speed … we need
 to keep up with streaming data).  End users start with Python PyTables to
 consume the data.  (Often after we have created indexes from Python with
 pytables.col.col1.createIndex().)

 ** **

 Sometimes the users come up with something we want to do thousands of
 times and performance is critical.  But then we are falling back to C++.  We
 can use our own index method but would like to make double use of the PyTables
 index.

 ** **

 I know the python table.where() is implemented in C.


Hi Jim,

This is only kind of true.  Querying (i.e. all of the where*() methods) is
actually mostly written in Python, in the table.py and expressions.py
files.  However, they make use of numexpr [1].


  

 Is there a way to access that from C or C++?  I don't mind if I need
 to do work to get the result; I think in my case the work may be worth it.


*PLAN 1:* One possibility is to take the parts of PyTables that are written in
Python and compile them (without making any edits to these files) with Cython.
For Cython files, if you write the appropriate C++ header file and link against
the shared library correctly, it is possible to access certain functions from
C/C++.  BUT, I am not sure how much of a speed boost you would get out of this,
since you would still be calling out to the Python interpreter to get these
results.  You are just calling Python's virtual machine from C++ rather than
calling it from Python (like normal).  The advantage is that you would
basically get access to these functions acting on tables from C++.

*PLAN 2:* Alternatively, numexpr itself is mostly written in C++ already.
 You should be able to call core numexpr functions directly.  However, you
would have to feed it data that you read from the tables yourself.  These
could even be table indexes.  On a personal note, if you get code working
that does this, I would be interested in seeing your implementation.  (I
have another project where I have tables that I want to query from C++)

Let us know what route you ultimately end up taking or if you have any
further questions!

Be Well
Anthony

1. http://code.google.com/p/numexpr/source/browse/#hg%2Fnumexpr




 --

 *Jim Knoll* *
  **Data Developer*

  Spot Trading L.L.C
  440 South LaSalle St., Suite 2800
  Chicago, IL 60605
  Office: 312.362.4550
  Direct: 312-362-4798
  Fax: 312.362.4551
  jim.kn...@spottradingllc.com
  www.spottradingllc.com
 --







Re: [Pytables-users] Is it possible to manipulate a VariableNode in a query?

2012-10-30 Thread Anthony Scopatz
Hello Aquil,

Unfortunately, you currently cannot use indexing in queries (i.e. symbol[:3]
== x) and may only use the whole variable (symbol == x).  This is a
limitation of numexpr.  Please file a ticket with them if you would like
to see this changed.  Sorry!
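A common workaround is to read the rows (or a coarse pre-filtered subset) and do
the substring test in NumPy afterwards; a sketch, assuming the open table handle
and the fixed-width 'symbol' column from your example:

import numpy as np

rows = table.read()                               # or table.readWhere(...) for a coarse first cut
mask = np.char.startswith(rows['symbol'], 'CLZ')  # the substring test happens outside the kernel
matches = rows[mask]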

Be Well
Anthony

On Tue, Oct 30, 2012 at 10:44 AM, Aquil H. Abdullah 
aquil.abdul...@gmail.com wrote:

  Hello All,

 I am querying a table that has a field with a string value. I would like
 to determine if the string matches a pattern. Is there a simple way to do
 that through readWhere and the condition syntax?  None of the following
 work, but I was wondering if it were possible to do something similar:

 table.readWhere('CLZ' in field') or table.readWhere('symbol[:3] == CLZ')

 Thanks!

 --
 Aquil H. Abdullah
 I never think of the future. It comes soon enough - Albert Einstein





Re: [Pytables-users] Large (to very large) datasets...

2012-10-30 Thread Anthony Scopatz
Hi Andrea,

Your problem is twofold.

1. Your timing wasn't reporting the time per data set, but rather the total
time since writing all data sets.  You need to put the start time in the
loop to get the time per data set.

2. Your larger problem was that you were writing too many times.  Generally
it is faster to write fewer, bigger sets of data than performing a lot of
small write operations.  Since you had data set opening and writing in a
doubly nested loop, it is not surprising that you were getting
terrible performance.   You were basically maximizing HDF5 overhead ;).
 Using slicing I removed the outermost loop and saw timings like the
following:

H5 file creation time: 7.406

Saving results for table: 0.0105440616608
Saving results for table: 0.0158948898315
Saving results for table: 0.0164661407471
Saving results for table: 0.00654292106628
Saving results for table: 0.00676298141479
Saving results for table: 0.00664114952087
Saving results for table: 0.0066990852356
Saving results for table: 0.00687289237976
Saving results for table: 0.00664210319519
Saving results for table: 0.0157809257507
Saving results for table: 0.0141618251801
Saving results for table: 0.00796294212341

Please see the attached version, at around line 82.  Additionally, if you
need to focus on performance I would recommend reading the following (
http://pytables.github.com/usersguide/optimization.html).  PyTables can
be blazingly fast when implemented correctly.  I would highly recommend
looking into compression.
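The gist of the change, as an illustrative sketch rather than the attached script
(results_table stands for whatever handle your script uses for the table, and the
random block stands in for real simulation output):

import numpy as np

# Build each object's full (NUM_SIM, len(ALL_DATES), 7) block in memory and
# write it with a single row update, instead of touching the row once per simulation.
for row in results_table.iterrows():
    block = np.random.rand(NUM_SIM, len(ALL_DATES), 7).astype('float32')
    row['results'] = block   # one big write per object
    row.update()
results_table.flush()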

I hope this helps!
Be Well
Anthony

On Tue, Oct 30, 2012 at 4:55 PM, Andrea Gavana andrea.gav...@gmail.comwrote:

 Hi All,

 I am pretty new to pytables and I am facing a problem of actually
 storing and retrieving data to/from a large dataset. My situation is
 the following:

 1. I am running stochastic simulations of a number of objects
 (typically between 100-1,000 simulations);
 2. For every simulation, I have around 1,200 objects, and for each
 of them I have 7 timeseries of 600 time-steps each.

 I thought of using pytables to try and get some sense out of my
 simulations, but I am failing to implement something intelligent (or
 fast, which is important as well...).

 The attached script (modified from the pytables tutorial) does the
 following:

 1. Create a table containing these objects;
 2. Adds 1,200 rows, one per object: for each object, I assign a 3D
 array defined as:

 results = Float32Col(shape=(NUM_SIM, len(ALL_DATES), 7))

 Where NUM_SIM is the number of simulations and ALL_DATES are the timesteps.

 3. For every simulation, I update the object results (using random
 numbers in the script).

 The timings on my computer are as follows (in seconds):

 H5 file creation time: 22.510

 Saving results for simulation 1   : 3.3356567
 Saving results for simulation 2   : 6.2429997921
 Saving results for simulation 3   : 9.1515041
 Saving results for simulation 4   : 12.075752
 Saving results for simulation 5   : 15.217902
 Saving results for simulation 6   : 17.9159998894
 Saving results for simulation 7   : 21.065847
 Saving results for simulation 8   : 23.645084
 Saving results for simulation 9   : 26.5359997749
 Saving results for simulation 10  : 29.5579998493

 As you can see, at every simulation the processing time increases by 3
 seconds, so by the time I get to 100 or 1,000 I will have more than
 enough time for 15 coffees in the morning :-D
 Also, the file creation time is somewhat on the slow side...

 I am sure I am missing a lot of things here, so I would appreciate any
 suggestion to implement my code in a better/more intelligent way (and
 also suggestions on other approaches in order to do what I am trying
 to do).

 Thank you in advance for your suggestions.

 Andrea.

 Imagination Is The Only Weapon In The War Against Reality.
 http://www.infinity77.net






pytables_test.py
Description: Binary data


Re: [Pytables-users] Large (to very large) datasets...

2012-10-30 Thread Anthony Scopatz
On Tue, Oct 30, 2012 at 6:20 PM, Andrea Gavana andrea.gav...@gmail.comwrote:

 Hi Anthony,

 On 30 October 2012 22:52, Anthony Scopatz wrote:
  Hi Andrea,
 
  Your problem is two fold.
 
  1. Your timing wasn't reporting the time per data set, but rather the
 total
  time since writing all data sets.  You need to put the start time in the
  loop to get the time per data set.
 
  2. Your larger problem was that you were writing too many times.
  Generally
  it is faster to write fewer, bigger sets of data than performing a lot of
  small write operations.  Since you had data set opening and writing in a
  doubly nested loop, it is not surprising that you were getting terrible
  performance.   You were basically maximizing HDF5 overhead ;).  Using
  slicing I removed the outermost loop and saw timings like the following:
 
  H5 file creation time: 7.406
 
  Saving results for table: 0.0105440616608
  Saving results for table: 0.0158948898315
  Saving results for table: 0.0164661407471
  Saving results for table: 0.00654292106628
  Saving results for table: 0.00676298141479
  Saving results for table: 0.00664114952087
  Saving results for table: 0.0066990852356
  Saving results for table: 0.00687289237976
  Saving results for table: 0.00664210319519
  Saving results for table: 0.0157809257507
  Saving results for table: 0.0141618251801
  Saving results for table: 0.00796294212341
 
  Please see the attached version, at around line 82.  Additionally, if you
  need to focus on performance I would recommend reading the following
  (http://pytables.github.com/usersguide/optimization.html).  PyTables
 can be
  blazingly fast when implemented correctly.  I would highly recommend
 looking
  into compression.
 
  I hope this helps!

 Thank you for your answer; indeed, I was timing it wrongly (I really
 need to go to sleep...). However, although I understand the need to
 write fewer, bigger chunks, I am not sure I can actually do it in my situation.
 Let me explain:

 1. I have a GUI which starts a number of parallel processes (up to 16,
 depending on a user selection);
 2. These processes actually do the computation/simulations - so, if I
 have 1,000 simulations to run and 8 parallel processes, each process
 gets 125 simulations (each of which holds 1,200 objects with a 600x7
 timeseries matrix per object).


Well, you can at least change the order of the loops and see if that helps.
That is, rather than doing:

for i in xrange():
    for p in table:

Do the following instead:

for p in table:
    for i in xrange():

I don't believe that this will help too much since you are still writing
every element individually.



 If I had to write out the results only at the end, it would mean finding
 a way to share the 1,200 object matrices across all the
 parallel processes (and I am not sure if pytables is going to complain
 when multiple concurrent processes try to access the same underlying
 HDF5 file).


Reading in parallel works pretty well.  Writing causes more headaches
but can be done.


 Or I could create one HDF file per process, but given the nature of
 the simulation I am running, every object in the 1,200 objects
 pool would need to keep a reference to a 125x600x7 matrix (assuming
 1,000 simulations and 8 processes) around in memory *OR* I will need
 to write the results to the HDF5 file for every simulation. Although
 we have extremely powerful PCs at work, I am not sure it is the right
 way to go...

 As always, I am open to all suggestions on how to improve my approach.


My basic suggestion is to have all of your processes produce results which
are then aggregated by a single master process.  This master is the only one
which has write access to the hdf5 file and will allow you to create larger
arrays and minimize the number of writes that you do.

You'll probably want to take a look at this example:
https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py

I think that there might be a page in the docs about it now too...

But I think that this is the strategy that you want to pursue.  Multiple
compute processes, one write process.
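A bare-bones sketch of that layout; the shapes, names, and random stand-in data are
all illustrative, and the linked example above is the more complete reference:

import multiprocessing as mp

import numpy as np
import tables as tb

N_OBJECTS, N_DATES, N_SERIES = 1200, 600, 7

def worker(sim_ids, queue):
    # Compute processes never touch the HDF5 file; they only ship arrays.
    for sim in sim_ids:
        results = np.random.rand(N_OBJECTS, N_DATES, N_SERIES).astype('float32')
        queue.put((sim, results))
    queue.put(None)  # tell the writer this worker is done

def writer(queue, n_workers):
    f = tb.openFile('simulations.h5', mode='w')
    arr = f.createEArray(f.root, 'results', tb.Float32Atom(),
                         shape=(0, N_OBJECTS, N_DATES, N_SERIES))
    finished = 0
    while finished < n_workers:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        sim, results = item
        arr.append(results[np.newaxis])  # one big write per simulation
    f.close()

if __name__ == '__main__':
    n_workers, sims_per_worker = 4, 25
    queue = mp.Queue()
    jobs = [mp.Process(target=worker,
                       args=(range(i * sims_per_worker, (i + 1) * sims_per_worker), queue))
            for i in xrange(n_workers)]
    for job in jobs:
        job.start()
    writer(queue, n_workers)
    for job in jobs:
        job.join()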



 Thank you again for your quick and enlightening answer.


No problem!
Be Well
Anthony




 Andrea.

 Imagination Is The Only Weapon In The War Against Reality.
 http://www.infinity77.net

 # - #
 def ask_mailing_list_support(email):

     if mention_platform_and_version() and include_sample_app():
         send_message(email)
     else:
         install_malware()
         erase_hard_drives()
 # - #



Re: [Pytables-users] Numpy data and Pytable array (error: IndexError: tuple index out of range)

2012-10-29 Thread Anthony Scopatz
Hello Jack,

I am not really sure what is going wrong because you did not post the
full code where the exception is happening.  However, this error seems
to be because the pnts array is one dimensional. (Which is why pnts.shape
has a length of 1.)  You could verify this by printing out pnts right
before the
line that fails.
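For example, a quick check (and only a guess at a workaround; the real fix depends
on how the clusters array was built):

import numpy as np

print pnts.shape            # a shape like (10,) instead of (10, 128) would explain the IndexError
pnts = np.atleast_2d(pnts)  # guess: promote a single row to 2-D before calling knn()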

Also, why are you using ctypes?  This seems wrong...

Be Well
Anthony

On Sun, Oct 28, 2012 at 9:25 PM, JACK young.2...@yahoo.com wrote:



 Hi all,

 I am new to python and pytables. Currently I am writing a project about
 clustering and the KNN algorithm. That is what I have got.

 **   code  ***

 import numpy.random as npr
 import numpy as np

 #step0: obtain the cluster
 dtype = np.dtype('f4')

 pnts_inds = np.arange(100)
 npr.shuffle(pnts_inds)
 pnts_inds = pnts_inds[:10]
 pnts_inds = np.sort(pnts_inds)
 for i,ind in enumerate(pnts_inds):
 clusters[i] = pnts_obj[ind]

 #step1: save the result to a HDF5 file called clst_fn.h5

 filters = tables.Filters(complevel = 1, complib = 'zlib')
 clst_fobj = tables.openFile('clst_fn.h5', 'w')
 clst_obj = clst_fobj.createCArray(clst_fobj.root, 'clusters',
tables.Atom.from_dtype(dtype), clusters.shape,
filters = filters)
 clst_obj[:] = clusters
 clst_fobj.close()

 #step2: other function
 blabla

 #step3: load the cluster from clst_fn

 pnts_fobj= tables.openFile('clst_fn.h5','r')
 for pnts in pnts_fobj.walkNodes('/', classname = 'Array'):
 break

 #
 #step4: evoke another function (called knn). The function input argument
 is the
 #data from pnts. I have checked the knn function individually. This
 function
 #works well if the input is pnts  = npr.rand(100,128)

 def knn(pnts):
 pnts = numpy.ascontiguousarray(pnts)
 N = ctypes.c_uint(pnts.shape[0])
 D = ctypes.c_uint(pnts.shape[1])
 #

 # evoke knn using the cluster from clst_fn (see step 3)
 knn(pnts)


 **   end of code   ***

 My problem now is that python is giving me a hard time by showing:
 error: IndexError: tuple index out of range
 This error comes from this line:
 D = ctypes.c_uint(pnts.shape[1])

 Obviously, there must be something wrong with the input argument. Any
 thoughts about fixing the problem? Thank you in advance.








Re: [Pytables-users] Tutorial at PyData Conference New York

2012-10-27 Thread Anthony Scopatz
Great!  Thanks Francesc!

On Sat, Oct 27, 2012 at 6:16 AM, Francesc Alted fal...@gmail.com wrote:

 Hi,

 You may be interested in my IPython notebooks and slides for the
 conference:

 http://pytables.org/download/PyData2012-NYC.tar.gz
 PyData-NYC-2012-v3.pptx http://www.pytables.org/docs/PyData2012-NYC.pdf

 [BTW this time I fell in love with the IPython notebook: it is great!]

 Unfortunately, I had only 45 minutes for the presentation, so I have not
 been able to show the sample PyTables files that some of you kindly
 sent to me (but I'll keep them for the future, one never knows!).

 --
 Francesc Alted





Re: [Pytables-users] Tutorial at PyData Conference New York

2012-10-27 Thread Anthony Scopatz
On Sat, Oct 27, 2012 at 11:21 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 Hi Francesc,
 congratulations!

 Il 27/10/2012 13:16, Francesc Alted ha scritto:
  Hi,
 
  You may be interested on my IPython notebooks and slides for the
 conference:
 
  http://pytables.org/download/PyData2012-NYC.tar.gz
  PyData-NYC-2012-v3.pptx http://www.pytables.org/docs/PyData2012-NYC.pdf
 
  [BTW this time I felt in love with IPython notebook: it is great!]

 yes, the IPython notebook is fantastic!

 ... and the idea of saving tutorials into notebook files is very very
 nice :))

 Maybe we could provide notebook files for all tutorials in the official
 doc.


+1




 ciao

 --
 Antonio Valentino




Re: [Pytables-users] PyTables data files for a tutorial

2012-10-21 Thread Anthony Scopatz
Hello Francesc,

I look forward to hearing how your PyData tutorial goes!

Here [1] is a file that stores some basic nuclear data that is freely
redistributable.  It stores atomic weights, bound neutron scattering
lengths, and pre-compiled neutron cross sections (xs) for 5 different
 energy regimes.  Everything in here is a table.  The file is rather small
 (at about 165 kB).  There are integer, float, and complex columns.

I hope that this helps!

Be Well
Anthony

1. https://s3.amazonaws.com/pyne/prebuilt_nuc_data.h5

On Sun, Oct 21, 2012 at 10:41 AM, Francesc Alted fal...@pytables.orgwrote:

 Hi,

 I'm going to give a tutorial on PyTables next Thursday during the PyData
 conference in New York (http://nyc2012.pydata.org/) and I'd like to use
 some real life data files.  So, if you have some public repository with
 data generated with PyTables, please tell me.  I'm looking for files
  that are not very large (< 1 GB), and that use the Table object
 significantly.  A small description of the data included will be more
 that welcome too!

 Thanks!

 --
 Francesc Alted





Re: [Pytables-users] 10 years of PyTables

2012-10-21 Thread Anthony Scopatz
Congrats Francesc!

It is a testament to how useful PyTables is that it is still around and
going strong!

Personally, I know that your fast, polite, and in-depth responses on the
mailing list have made PyTables the great resource that it is.  Additionally,
it has served as a model to me for how open source projects *should* be run!

I'd also really like to thank Antonio for driving new features into the
code base!

If only we were all on the same continent, we could have a PyTables birthday
party or something...

Be Well
Anthony

On Sun, Oct 21, 2012 at 10:26 AM, Francesc Alted fal...@pytables.orgwrote:

 Hi!,

 This month PyTables celebrates the 10th anniversary of its first public
 release:

 http://osdir.com/ml/python.scientific.user/2002-10/msg00043.html

 There one can read that very new features of Python like generators and
 metaclasses were leveraged.  Even that a nascent Pyrex (the predecessor
 of Cython) was used for the extensions.  Oh, what memories!

 The original text below:

 -
 Hi!,

 PyTables is a Python package which allows dealing with HDF5 tables. Such
 a table is defined as a collection of records whose values are stored in
 fixed-length fields. PyTables is intended to be easy-to-use, and tries to
 be a high-performance interface to HDF5. To achieve this, the newest
 improvements introduced in Python 2.2 (like generators or slots and
 metaclasses in new-brand classes) has been used. Pyrex creation extension
 tool has been chosen to access the HDF5 library.

 This package should be platform independent, but until now I've tested it
 only with Linux. It's the first public release (v 0.1), and it is in
 alpha state.

 You can get it from:

 http://sourceforge.net/projects/pytables/

 There is still not a project home page. Perhaps in next days.

 Feedback welcome.!

 --
 Francesc Alted PGP KeyID: 0x61C8C11F
 Scientific aplications developer
 Public PGP key available:http://www.openlc.org/falted_at_openlc.asc
 Key fingerprint = 1518 38FE 3A3D 8BE8 24A0 3E5B 1328 32CC 61C8 C11F


 --
 Francesc Alted





Re: [Pytables-users] multiprocessing and pytables

2012-10-15 Thread Anthony Scopatz
Hello Ernesto,

So you are actually asking two different questions, one on reading and the
other on writing.  In general reading, or querying,
with multiprocessing works very well.  Writing to a single file with
multiple processes is destined to fail though.  So the strategy that
many people have adopted is to have multiple processes create the data and
then have a master process which acts as a queue for writing out the data.
 Please see the example here for more inspiration [1].  Note that we have
been having problems recently with multiprocess writing out to multiple
files, but that is not what you want to do.

Be Well
Anthony

1.
https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
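
(For the reading side of the question, a rough, untested sketch of parallel
querying is below.  The file name, node path, and column names are made up
for illustration; the point is just that each worker opens the file
read-only for itself.)

import multiprocessing

import tables


def query_range(bounds):
    # Each worker opens the file read-only and runs its own query.
    lo, hi = bounds
    with tables.openFile('data.h5', mode='r') as f:        # hypothetical file
        tbl = f.root.mygroup.mytable                        # hypothetical table
        # 'x' and 'value' are hypothetical columns of that table.
        return [row['value'] for row in
                tbl.where('(x >= lo) & (x < hi)',
                          condvars={'lo': lo, 'hi': hi})]


if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(query_range, [(0, 10), (10, 20), (20, 30), (30, 40)])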

On Mon, Oct 15, 2012 at 11:45 AM, Ernesto Picardi e.pica...@unical.it wrote:

 Dear all,

 I have an hdf5 file including several tables. To speed up the creation of
 all tables, could I create each individual table with independent processes
 launched by the multiprocessing module? Could I employ independent processes to
 query diverse tables of the same hdf5 file?

 Thank you very much in advance for whatever answer.

 Regards,

 Ernesto







   Riservatezza / Confidentiality 
  In compliance with Italian Legislative Decree no. 196 of 30/6/2003 on the
  protection of personal data, the information contained in this message is
  strictly confidential and addressed exclusively to the indicated recipient
  (or to the person responsible for delivering it to the recipient).
  Please keep in mind that any use, reproduction or disclosure of this
  message is forbidden. If you have received this message in error, please
  kindly notify the sender and destroy this message.





Re: [Pytables-users] Closing Read-Only Files

2012-10-12 Thread Anthony Scopatz
On Fri, Oct 12, 2012 at 8:47 AM, Aquil H. Abdullah aquil.abdul...@gmail.com
 wrote:

  I have a process that uses PyTables and opens a bunch of HDF5 files in
 read-only mode. I know that if I don't close these files, the AtExit hook
 will close the open files and display the message:

 Closing remaining open files:

 My question is simple is it possible for me to run into any corruption
 issues by not explicitly closing files that have been opened in read-only
 mode?


Hello Aquil,

I don't think that you will have any issues with doing this.  However, I
would just go ahead and close all of the files anyway.  The 'with'
statement is great for that.  Also, recall line 2 of the Zen of Python:
Explicit is better than implicit.
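
For example (a small sketch; the file and node names are made up):

import tables

# The file is closed automatically when the block exits,
# even if an exception is raised inside it.
with tables.openFile('readonly_data.h5', mode='r') as f:
    data = f.root.mytable.read()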

Be Well
Anthony



 --
 Aquil H. Abdullah
 I never think of the future. It comes soon enough - Albert Einstein





Re: [Pytables-users] PyTables hangs while opening file in worker process

2012-10-11 Thread Anthony Scopatz
Hmm, sorry to hear that, Owen.  Let me know how it goes.

On Thu, Oct 11, 2012 at 11:07 AM, Owen Mackwood 
owen.mackw...@bccn-berlin.de wrote:

 Hi Anthony,

 I tried your suggestion and it has not solved the problem. It could be
 that it makes the problem go away in the test code because it changes the
 timing of the processes. I'll see if I can modify the test code to
 reproduce the hang even with reloading the tables module.

 Regards,
 Owen


 On 10 October 2012 22:00, Anthony Scopatz scop...@gmail.com wrote:

 So Owen,

 I am still not sure what the underlying problem is, but I altered your
 parallel function to forcibly reload pytables each time it is called.
  This seemed to work perfectly on my larger system but not at all on my
 smaller one.  If there is a way that you can isolate pytables and not
 import it globally at all, it might work even better.  Below is the code
 snippet.  I hope this helps.

 Be Well
 Anthony

 def run_simulation_single((paramspace_pt, params)):
     import sys
     rmkeys = [key for key in sys.modules if key.startswith('tables')]
     for key in rmkeys:
         del sys.modules[key]
     import traceback
     import tables
     try:
         filename = params['results_file']






Re: [Pytables-users] PyTables hangs while opening file in worker process

2012-10-10 Thread Anthony Scopatz
Hi Owen,

So just to confirm this behavior, having run your sample on a couple of my
machines, what you see is that the code looks like it gets all the way to
the end, and then it stalls right before it is about to exit, leaving some
small number of processes (here names python tables_test.py) in the OS.  Is
this correct?

It seems to be the case that these failures do not happen when I set the
processor pool size to be less than or equal to the number of processors
(physical or hyperthreaded) that I have on the machine.  I was testing this
both on a 32 proc cluster and my dual core laptop.  Is this also
the behavior you have seen?

Be Well
Anthony

On Tue, Oct 9, 2012 at 8:08 AM, Owen Mackwood
owen.mackw...@bccn-berlin.de wrote:

 Hi Anthony,

 I've created a reduced example which reproduces the error. I suppose the
 more processes you can run in parallel the more likely it is you'll see the
 hang. On a machine with 8 cores, I see 5-6 processes hang out of 2000.

 All of the hung tasks had a call stack that looked like this:

 #0  0x7fc8ecfd01fc in pthread_cond_wait@@GLIBC_2.3.2 () from
 /lib/libpthread.so.0
 #1  0x7fc8ebd9d215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
 #2  0x7fc8ebaacff0 in H5open () from /usr/lib/libhdf5.so.6
 #3  0x7fc8e224c6a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new
 (__pyx_v_self=0x28b35a0, __pyx_args=<value optimized out>,
 __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
 #4  0x004abf62 in ext_do_call (f=0x271f4c0, throwflag=<value
 optimized out>) at Python/ceval.c:4331
 #5  PyEval_EvalFrameEx (f=0x271f4c0, throwflag=<value optimized out>) at
 Python/ceval.c:2705
 #6  0x004ada51 in PyEval_EvalCodeEx (co=0x247aeb0, globals=<value
 optimized out>, locals=<value optimized out>, args=0x288cea0, argcount=0,
 kws=<value optimized out>, kwcount=0,
 defs=0x25ffd78, defcount=4, closure=0x0) at Python/ceval.c:3253

 I've attached the code to reproduce this. It probably isn't quite minimal,
 but it is reasonably simple (and stereotypical of the kind of operations I
 use). Let me know if you need anything else, or have questions about my
 code.

 Regards,
 Owen



 On 8 October 2012 17:37, Anthony Scopatz scop...@gmail.com wrote:

 Hello Owen,

 So __getitem__() calls read() on the items it needs.  Both should return
 a copy in-memory of the data that is on disk.

 Frankly, I am not really sure what is going on, given what you have said.
  A minimal example which reproduces the error would be really helpful.
  From the error that you have provided, though, the only thing that I can
 think of is that it is related to file opening on the worker processes.

 Be Well
 Anthony






Re: [Pytables-users] PyTables hangs while opening file in worker process

2012-10-10 Thread Anthony Scopatz
So Owen,

I am still not sure what the underlying problem is, but I altered your
parallel function to forcibly reload pytables each time it is called.
 This seemed to work perfectly on my larger system but not at all on my
smaller one.  If there is a way that you can isolate pytables and not
import it globally at all, it might work even better.  Below is the code
snippet.  I hope this helps.

Be Well
Anthony

def run_simulation_single((paramspace_pt, params)):
    # Purge any already-imported 'tables' modules so that this worker
    # re-imports PyTables (and re-initializes the HDF5 library) from scratch.
    import sys
    rmkeys = [key for key in sys.modules if key.startswith('tables')]
    for key in rmkeys:
        del sys.modules[key]
    import traceback
    import tables
    try:
        filename = params['results_file']


On Wed, Oct 10, 2012 at 2:06 PM, Owen Mackwood owen.mackw...@bccn-berlin.de
 wrote:

 On 10 October 2012 20:08, Anthony Scopatz scop...@gmail.com wrote:

 So just to confirm this behavior, having run your sample on a couple of
 my machines, what you see is that the code looks like it gets all the way
 to the end, and then it stalls right before it is about to exit, leaving
 some small number of processes (here names python tables_test.py) in the
 OS.  Is this correct?


 More or less. What's really happening is that if your processor pool has N
 processes, then each time one of the workers hangs the pool will have N-1
 processes running thereafter. Eventually when all the tasks have completed
 (or all workers are hung, something that has happened to me when processing
 many tasks), the main process will just block waiting for the hung
 processes.

 If you're running Linux, when the test is finished and the main process is
 still waiting on the hung processes, you can just kill the main process.
 The orphaned processes that are still there afterward are the ones of
 interest.


 It seems to be the case that these failures do not happen when I set the
 processor pool size to be less than or equal to the number of processors
 (physical or hyperthreaded) that I have on the machine.  I was testing this
  both on a 32 proc cluster and my dual core laptop.  Is this also
 the behavior you have seen?


 No, I've never noticed that to be the case. It appears that the greater
 the true parallelism (ie - physical cores on which there are workers
 executing in parallel) the greater the odds of there being a hang. I don't
 have any real proof of this though; as with most concurrency bugs, it's
 tough to be certain of anything.

 Regards,
 Owen




Re: [Pytables-users] PyTables hangs while opening file in worker process

2012-10-08 Thread Anthony Scopatz
On Mon, Oct 8, 2012 at 5:13 AM, Owen Mackwood
owen.mackw...@bccn-berlin.de wrote:

 Hi Anthony,

 There is a single multiprocessing.Pool which usually has 6-8 processes,
 each of which is used to run a single task, after which a new process is
 created for the next task (maxtasksperchild=1 for the Pool constructor).
 There is a master process that regularly opens an HDF5 file to read out
 information for the worker processes (data that gets copied into a
 dictionary and passed as args to the worker's target function). There are
 no problems with the master process, it never hangs.


Hello Owen,

Hmmm, are you actually copying the data (f.root.data[:]) or are you simply
passing a reference as arguments (f.root.data)?


 The failure appears to be random, affecting less than 2% of my tasks (all
 tasks are highly similar and should call the same tables functions in the
 same order). This is running on Debian Squeeze, Python 2.7.3, PyTables
 2.4.0. As far as the particular function that hangs... tough to say since I
 haven't yet been able to properly debug the issue. The interpreter hangs
 which limits my ability to diagnose the source of the problem. I call a
 number of functions in the tables module from the worker process, including
 openFile, createVLArray, createCArray, createGroup, flush, and of course
 close.


So if you are opening a file in the master process and then
writing/creating/flushing from the workers, this may cause a problem.
 Multiprocessing creates a fork of the original process, so you are relying on
the file handle from the master process not accidentally changing somehow.
 Can you try to open the files in the workers rather than the master?  I
hope that this clears up the issue.

Basically, I am advocating a more conservative approach where all data that
is read or written to in a worker must come from that worker, rather than
being generated by the master.  If you are *still* experiencing these
problems, then we know we have a real problem.

Also if this doesn't fix it, if you could send us a small sample module
which reproduces this issue, that would be great too!

Be Well
Anthony



 I'll continue to try and find out more about when and how the hang occurs.
 I have to rebuild Python to allow the gdb pystack macro to work. If you
 have any suggestions for me, I'd love to hear them.

 Regards,
 Owen


 On 7 October 2012 00:28, Anthony Scopatz scop...@gmail.com wrote:

 Hi Owen,

 How many pools do you have?  Is this a random runtime failure?  What kind
 of system is this one?  Is there some particular fucntion in Python that
 you are running?  (It seems to be openFile(), but I can't be sure...)  The
 error is definitely happening down in the H5open() routine.  Now whether
 this is HDF5's fault or ours, I am not yet sure.

 Be Well
 Anthony






Re: [Pytables-users] Installation test failed: ImportError

2012-10-08 Thread Anthony Scopatz
Hello John,

You probably installed globally and are trying to test locally.  Either
leave off the PYTHONPATH or try testing from a location other than the root
pytables dir.

Be Well
Anthony

On Mon, Oct 8, 2012 at 4:23 PM, Dickson, John Robert 
john_dick...@hms.harvard.edu wrote:

 Hello,

 I am trying to install PyTables, but when testing it with the command:

 env PYTHONPATH=. python -c "import tables; tables.test()"

 It returned the following:

 Traceback (most recent call last):
   File string, line 1, in module
   File tables/__init__.py, line 30, in module
 from tables.utilsExtension import getPyTablesVersion, getHDF5Version
 ImportError: No module named utilsExtension

 I am using Mac OS X 10.8.2.

 Please let me know if you need any additional information.

 I would appreciate any suggestions on what the problem may be and how to
 correct it.

 Thanks,
 John



Re: [Pytables-users] EArray

2012-10-06 Thread Anthony Scopatz
Hi Andre,

You can use tuple addition to accomplish what you want:

(0,) + data.shape == (0,256,1,2)
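
(A rough sketch of how that plugs into createEArray; the file and node names
are made up, and the (256, 1, 2) shape comes from the question:)

import numpy as np
import tables

data = np.zeros((256, 1, 2))                  # stand-in for one file's array

f = tables.openFile('combined.h5', mode='w')
earr = f.createEArray('/', 'alldata', tables.Float64Atom(),
                      (0,) + data.shape,      # -> (0, 256, 1, 2), extendable first axis
                      expectedrows=100)
earr.append(data.reshape((1,) + data.shape))  # append one (1, 256, 1, 2) slab
f.close()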

Be Well
Anthony

On Sat, Oct 6, 2012 at 12:42 PM, Andre' Walker-Loud walksl...@gmail.com wrote:

 Hi All,

 I have a bunch of hdf5 files I am using to create one hdf5 file.
 Each individual file has many different pieces of data, and they are all
 the same shape in each file.

 I am using createEArray to make the large array in the final file.

 if the data files in the individual h5 files are of shape (256,1,2), then
 I have to use


 createEArray('/path/', 'name', tables.Float64Atom(), (0,256,1,2), expectedrows=len(data_files))

 if the np array I have grabbed from an individual file to append to my
 EArray is defined as data, is there a way to use data.shape to create the
 shape of my EArray?

 In spirit, I want to do something like (0,data.shape)  but this does not
 work.
 I have been scouring the numpy manual to see how to convert

 data.shape
  (256,1,2)

 to (0,256,1,2)

 but failed to figure this out (if I don't know ahead of time the shape of
 data - in which case I can manually reshape).


 Thanks,

 Andre





Re: [Pytables-users] PyTables hangs while opening file in worker process

2012-10-05 Thread Anthony Scopatz
Hello Owen,

While you can use process pools to read from a file in parallel just fine,
writing is another story completely.  While HDF5 itself supports parallel
writing though MPI, this comes at the high cost of compression no longer
being available and a much more complicated code base.  So for the time
being, PyTables only supports the serial HDF5 library.

Therefore if you want to write to a file in parallel, you adopt a strategy
where you have one process which is responsible for all of the writing and
all other processes send their data to this process instead of writing to
file directly.  This is a very effective way of accomplishing basically
what you need.  In fact, we have an example to do just that [1].  (As a
side note: HDF5 may soon be adding an API for exactly this pattern because
it comes up so often.)

So if I were you, I would look at [1] and adapt it to my use case.

Be Well
Anthony

1.
https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
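
(The linked example is the authoritative version; the core of the pattern is
roughly the untested sketch below.  File and node names are made up, and only
the writer ever touches the HDF5 file.)

import multiprocessing

import numpy as np
import tables


def worker(task_id, queue):
    # Do the computation in the worker, but never open the HDF5 file here.
    result = np.random.random(1000)
    queue.put((task_id, result))


def write_results(queue, n_results):
    # A single process owns the file and does all of the writing.
    with tables.openFile('results.h5', mode='w') as f:
        for _ in range(n_results):
            task_id, result = queue.get()
            f.createArray('/', 'result_%03d' % task_id, result)


if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(i, q))
             for i in range(4)]
    for p in procs:
        p.start()
    write_results(q, len(procs))
    for p in procs:
        p.join()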

On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood
owen.mackw...@bccn-berlin.de wrote:

 Hello,

 I'm using a multiprocessing.Pool to parallelize a set of tasks which
 record their results into separate hdf5 files. Occasionally (less than 2%
 of the time) the worker process will hang. According to gdb, the problem
 occurs while opening the hdf5 file, when it attempts to obtain the
 associated mutex. Here's part of the backtrace:

 #0  0x7fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from
 /lib/libpthread.so.0
 #1  0x7fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
 #2  0x7fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
 #3  0x7fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new
  (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>,
  __pyx_kwds=<value optimized out>)
  at tables/hdf5Extension.c:2820
  #4  0x004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value
  optimized out>) at Python/ceval.c:4331

 Nothing else is trying to open this file, so can someone suggest why this
 is occurring? This is a very annoying problem as there is no way to recover
 from this error, and consequently the worker process is permanently
 occupied, which effectively removes one of my processors from the pool.

 Regards,
 Owen Mackwood




Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-28 Thread Anthony Scopatz
On Fri, Sep 28, 2012 at 2:46 AM, Francesc Alted fal...@pytables.org wrote:

 On 9/27/12 8:10 PM, Anthony Scopatz wrote:
 
  I think I remember seeing there was a performance limit with tables
  > 255 columns.  I can't find a reference to that so it's possible I made
  it up.  However, I was wondering if carrays had some limitation like
  that.
 
  Tables are a different data set.  The issue with tables is that column
  metadata (names, etc.) needs to fit in the attribute space.  The size
  of this space is statically limited to 64 kb.  In my experience, this
  number is in the thousands of columns (not hundreds).

 For the record, the PerformanceWarning issued by PyTables has nothing to
 do with the attribute space, but rather with the fact that putting too
 many columns in the same table means that you have to retrieve much more
 data even if you are retrieving only a single column.  Also, internal
 I/O buffers have to be much larger, and compressors tend to work
 much less efficiently too.

  On the other hand CArrays don't have much of any column metadata.
   CArrays should scale to an infinite number of columns without any issue.

 Yeah, they should scale better, although saying they can reach infinite
 scalability is a bit audacious :)  All the CArrays are datasets that
 have to be saved internally by HDF5, and that requires quite a few
 resources to keep track of them.


True, but I would argue that this is effectively infinite if you set your
chunksize appropriately large.  I have never run into an issue with HDF5 where
the number of rows or columns on its own becomes too large for arrays.
However, it is relatively easy to reach this limit with tables (both in
PyTables and the HL interface).  So maybe I should have said effectively
infinite ;)



 --
 Francesc Alted





Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-27 Thread Anthony Scopatz
On Thu, Sep 27, 2012 at 11:02 AM, Luke Lee durdenm...@gmail.com wrote:

 Are there any performance issues with relatively large carrays?  For
 example, say I have a carray with 300,000 float64s in it.  Is there some
 threshold where I could expect performance to degrade or anything?


Hello Luke,

The breakdowns happen when you have too many chunks.  However, you are well
away from this threshold (which is ~20,000).  I believe that PyTables
will issue a warning or error when you reach this point anyway.
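
(If you want to check where a given array stands, a rough sketch with a
made-up file and node name is:)

import numpy as np
import tables

with tables.openFile('data.h5', mode='r') as f:            # hypothetical file
    carr = f.root.my_carray                                 # hypothetical CArray
    chunks_per_dim = [int(np.ceil(float(s) / c))
                      for s, c in zip(carr.shape, carr.chunkshape)]
    print 'chunkshape:', carr.chunkshape, 'approx. chunks:', int(np.prod(chunks_per_dim))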


 I think I remember seeing there was a performance limit with tables > 255
 columns.  I can't find a reference to that so it's possible I made it up.
  However, I was wondering if carrays had some limitation like that.


Tables are a different data set.  The issue with tables is that column
metadata (names, etc.) needs to fit in the attribute space.  The size of
this space is statically limited to 64 kb.  In my experience, this number
is in the thousands of columns (not hundreds). On the other hand CArrays
don't have much of any column metadata.  CArrays should scale to an
infinite number of columns without any issue.

Be Well
Anthony






Re: [Pytables-users] where() with start/stop args returning incorrect result set

2012-09-25 Thread Anthony Scopatz
Hi Derek,

Ok That is very strange.  I cannot reproduce this on any of my data.  A
quick couple of extra questions:

1) Does this still happen when you set start=0?
2) What is the chunksize of this data set (are you at a boundary)?
3) Could you send us the full table information, ie repr(table).

Be Well
Anthony

On Tue, Sep 25, 2012 at 12:42 AM, Derek Shockey derek.shoc...@gmail.com wrote:

 I ran the tests. All 4988 passed. The information it output is:

 PyTables version:  2.4.0
 HDF5 version:  1.8.9
 NumPy version: 1.6.2
 Numexpr version:   2.0.1 (not using Intel's VML/MKL)
 Zlib version:  1.2.5 (in Python interpreter)
 LZO version:   2.06 (Aug 12 2011)
 BZIP2 version: 1.0.6 (6-Sept-2010)
 Blosc version: 1.1.3 (2010-11-16)
 Cython version:0.16
 Python version:2.7.3 (default, Jul  6 2012, 00:17:51)
 [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)]
 Platform:  darwin-x86_64
 Byte-ordering: little
 Detected cores:4

 -Derek

 On Mon, Sep 24, 2012 at 9:09 PM, Anthony Scopatz scop...@gmail.com
 wrote:
  Hi Derek,
 
  Can you please run the following command and report back what you see?
 
   python -c "import tables; tables.test()"
 
  Be Well
  Anthony
 
  On Mon, Sep 24, 2012 at 10:56 PM, Derek Shockey derek.shoc...@gmail.com
 
  wrote:
 
  Hello,
 
  I'm hoping someone can help me. When I specify start and stop values
  for calls to where() and readWhere(), it is returning blatantly
  incorrect results:
 
   table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
   start=3257, stop=table.nrows)[0]['id']
  '7f589d3e-a0e1-4882-b69b-0223a7de3801'
 
   table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
   start=3257, stop=table.nrows).next()['id']
  '7f589d3e-a0e1-4882-b69b-0223a7de3801'
 
  This happens with a sequential block of about 150 rows of data, and
  each time it seems to be 8 rows off (i.e. the row it returns is 8 rows
  ahead of the row it should be returning). If I remove the start and
  stop args, it behaves correctly. This seems to be a bug, unless I am
  misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0,
  and hdf5 1.8.9 on OS X 10.8.2.
 
  Any ideas?
 
  Thanks,
  Derek
 
 
 


Re: [Pytables-users] where() with start/stop args returning incorrect result set

2012-09-25 Thread Anthony Scopatz
Hello Derek, and devs,

After playing around with your data,  I am able to reproduce this error on
my system.
I am not sure exactly where the problem is but I do know how to fix it!

It turns out that this is an issue with the indexes not being properly in
sync with the original table, OR the start and stop values are not being
propagated properly down to the indexes.  When I tried to reindex by calling
table.reIndex(), this did not fix the issue.  This makes me think that the
problem is in propagating start, stop, and step all the way through correctly.
I'll go ahead and make a ticket reflecting this.

That said, the way to fix this in the short term is to do one of the
following

1)  Only use start=0, and step=1 (I bet that other stop values work)
2) Don't use indexes.  When I removed the indexes from the file using
ptrepack analysis.h5 analysis2.h5, everything worked fine.

Thanks a ton for reporting this!
Be Well
Anthony

On Tue, Sep 25, 2012 at 12:30 PM, Derek Shockey derek.shoc...@gmail.com wrote:

 Hi Anthony,

 It doesn't happen if I set start=0 or seemingly any number below 3257
 (though I didn't try them *all*). I am new to PyTables and hdf5, so
 I'm not sure about the chunksize or if I'm at a boundary. I did
 however notice that the table's chunkshape is 203, and this happens
 for exactly 203 sequential records, so I doubt that's a coincidence.
 The table description is below.

 Thanks,
 Derek

 /events (Table(5988,)) ''
   description := {
   client_id: StringCol(itemsize=24, shape=(), dflt='', pos=0),
   data_01: StringCol(itemsize=36, shape=(), dflt='', pos=1),
   data_02: StringCol(itemsize=36, shape=(), dflt='', pos=2),
   data_03: StringCol(itemsize=36, shape=(), dflt='', pos=3),
   data_04: StringCol(itemsize=36, shape=(), dflt='', pos=4),
   data_05: StringCol(itemsize=36, shape=(), dflt='', pos=5),
   device_id: StringCol(itemsize=36, shape=(), dflt='', pos=6),
   id: StringCol(itemsize=36, shape=(), dflt='', pos=7),
   timestamp: Time64Col(shape=(), dflt=0.0, pos=8),
   type: UInt16Col(shape=(), dflt=0, pos=9),
   user_id: StringCol(itemsize=36, shape=(), dflt='', pos=10)}
   byteorder := 'little'
   chunkshape := (203,)
   autoIndex := True
   colindexes := {
 timestamp: Index(9, full, shuffle, zlib(1)).is_CSI=True,
 type: Index(9, full, shuffle, zlib(1)).is_CSI=True,
 id: Index(9, full, shuffle, zlib(1)).is_CSI=True,
 user_id: Index(9, full, shuffle, zlib(1)).is_CSI=True}

 On Tue, Sep 25, 2012 at 9:32 AM, Anthony Scopatz scop...@gmail.com
 wrote:
  Hi Derek,
 
  Ok That is very strange.  I cannot reproduce this on any of my data.  A
  quick couple of extra questions:
 
  1) Does this still happen when you set start=0?
  2) What is the chunksize of this data set (are you at a boundary)?
  3) Could you send us the full table information, ie repr(table).
 
  Be Well
  Anthony
 
 
  On Tue, Sep 25, 2012 at 12:42 AM, Derek Shockey derek.shoc...@gmail.com
 
  wrote:
 
  I ran the tests. All 4988 passed. The information it output is:
 
  PyTables version:  2.4.0
  HDF5 version:  1.8.9
  NumPy version: 1.6.2
  Numexpr version:   2.0.1 (not using Intel's VML/MKL)
  Zlib version:  1.2.5 (in Python interpreter)
  LZO version:   2.06 (Aug 12 2011)
  BZIP2 version: 1.0.6 (6-Sept-2010)
  Blosc version: 1.1.3 (2010-11-16)
  Cython version:0.16
  Python version:2.7.3 (default, Jul  6 2012, 00:17:51)
  [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)]
  Platform:  darwin-x86_64
  Byte-ordering: little
  Detected cores:4
 
  -Derek
 
  On Mon, Sep 24, 2012 at 9:09 PM, Anthony Scopatz scop...@gmail.com
  wrote:
   Hi Derek,
  
   Can you please run the following command and report back what you see?
  
    python -c "import tables; tables.test()"
  
   Be Well
   Anthony
  
   On Mon, Sep 24, 2012 at 10:56 PM, Derek Shockey
   derek.shoc...@gmail.com
   wrote:
  
   Hello,
  
   I'm hoping someone can help me. When I specify start and stop values
   for calls to where() and readWhere(), it is returning blatantly
   incorrect results:
  
table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
start=3257, stop=table.nrows)[0]['id']
   '7f589d3e-a0e1-4882-b69b-0223a7de3801'
  
table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
start=3257, stop=table.nrows).next()['id']
   '7f589d3e-a0e1-4882-b69b-0223a7de3801'
  
   This happens with a sequential block of about 150 rows of data, and
   each time it seems to be 8 rows off (i.e. the row it returns is 8
 rows
   ahead of the row it should be returning). If I remove the start and
   stop args, it behaves correctly. This seems to be a bug, unless I am
   misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0,
   and hdf5 1.8.9 on OS X 10.8.2.
  
   Any ideas?
  
   Thanks,
   Derek
  
  
  
  

Re: [Pytables-users] where() with start/stop args returning incorrect result set

2012-09-24 Thread Anthony Scopatz
Hi Derek,

Can you please run the following command and report back what you see?

python -c "import tables; tables.test()"

Be Well
Anthony

On Mon, Sep 24, 2012 at 10:56 PM, Derek Shockey derek.shoc...@gmail.com wrote:

 Hello,

 I'm hoping someone can help me. When I specify start and stop values
 for calls to where() and readWhere(), it is returning blatantly
 incorrect results:

  table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
 start=3257, stop=table.nrows)[0]['id']
 '7f589d3e-a0e1-4882-b69b-0223a7de3801'

  table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
 start=3257, stop=table.nrows).next()['id']
 '7f589d3e-a0e1-4882-b69b-0223a7de3801'

 This happens with a sequential block of about 150 rows of data, and
 each time it seems to be 8 rows off (i.e. the row it returns is 8 rows
 ahead of the row it should be returning). If I remove the start and
 stop args, it behaves correctly. This seems to be a bug, unless I am
 misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0,
 and hdf5 1.8.9 on OS X 10.8.2.

 Any ideas?

 Thanks,
 Derek




Re: [Pytables-users] where() with start/stop args returning incorrect result set

2012-09-24 Thread Anthony Scopatz
PS When I do this on linux all 5077 tests pass for me.

On Mon, Sep 24, 2012 at 11:09 PM, Anthony Scopatz scop...@gmail.com wrote:

 Hi Derek,

 Can you please run the following command and report back what you see?

  python -c "import tables; tables.test()"

 Be Well
 Anthony


 On Mon, Sep 24, 2012 at 10:56 PM, Derek Shockey 
  derek.shoc...@gmail.com wrote:

 Hello,

 I'm hoping someone can help me. When I specify start and stop values
 for calls to where() and readWhere(), it is returning blatantly
 incorrect results:

  table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
 start=3257, stop=table.nrows)[0]['id']
 '7f589d3e-a0e1-4882-b69b-0223a7de3801'

  table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
 start=3257, stop=table.nrows).next()['id']
 '7f589d3e-a0e1-4882-b69b-0223a7de3801'

 This happens with a sequential block of about 150 rows of data, and
 each time it seems to be 8 rows off (i.e. the row it returns is 8 rows
 ahead of the row it should be returning). If I remove the start and
 stop args, it behaves correctly. This seems to be a bug, unless I am
 misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0,
 and hdf5 1.8.9 on OS X 10.8.2.

 Any ideas?

 Thanks,
 Derek




Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-21 Thread Anthony Scopatz
On Fri, Sep 21, 2012 at 10:49 AM, Luke Lee durdenm...@gmail.com wrote:

 Hi again,

 I haven't been getting the updates via email so I'm attempting to post
 again to respond.

 Thanks everyone for the suggestions.  I have a few questions:

 1.  What is the benefit of using the stand-alone carray project (
 https://github.com/FrancescAlted/carray) vs Pytables.carray?


Hello Luke,

carrays are in-memory, not on disk.


 2.  I realized my code base never uses the query functionality of a Table.
  So, I changed all my columns to be just Pytables.carray objects instead.
  They are all sitting at the top of the hierarchy, just below root.  Is
 this a good idea?

 I see a big speed increase from this obviously because now everything is
 stored contiguously.  However, are there any downsides to doing this?  I
 suppose I could also use EArray, but we are never actually changing the
 data once it is stored in HDF5.


If it works for you, then great!


 3.  Is compression automatically happening with the Carray?  I know the
 documentation says that compression is supported, but what do I need to do
 to enable it?  Maybe it's already happening and this is contributing to my
 big speed improvement.


For compression to be enabled, you need to define the appropriate filter
[1] on either the node or the file.
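
(A minimal sketch; the compression settings, file name, and node name are
just examples:)

import numpy as np
import tables

filters = tables.Filters(complevel=5, complib='blosc')   # zlib, lzo, bzip2 also work

f = tables.openFile('compressed.h5', mode='w')
carr = f.createCArray('/', 'data', tables.Float64Atom(),
                      shape=(300000,), filters=filters)
carr[:] = np.random.random(300000)
f.close()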

4.  I would certainly love to take a look at contributing something like
 this in my free time.  I don't have a whole lot at this time so the changes
 could take a while.  I'm sure I need to learn a lot more about the codebase
 before really giving it a try.  I'm going to take a look at this though,
 thanks for the suggestion!


No problem ;)


 5.  How do I subscribe to the dev mailing list?  I only see announcements
 and users.


Here is the dev list site:
https://groups.google.com/forum/?fromgroups#!forum/pytables-dev


 6.  Any idea why I'm not getting the emails from the list?  I signed up 2
 days ago and didn't get any of your replies via email.


We have been having problems with this list.  I think it might be time to
transition...

Be Well
Anthony

1.
http://pytables.github.com/usersguide/libref/helper_classes.html?highlight=filter#tables.Filters






Re: [Pytables-users] Optimizing pytables for reading entire columns at a time

2012-09-21 Thread Anthony Scopatz
On Fri, Sep 21, 2012 at 4:55 PM, Francesc Alted fal...@gmail.com wrote:

 On 9/21/12 10:07 PM, Anthony Scopatz wrote:
  On Fri, Sep 21, 2012 at 10:49 AM, Luke Lee durdenm...@gmail.com wrote:
 
  Hi again,
 
  I haven't been getting the updates via email so I'm attempting to
  post again to respond.
 
  Thanks everyone for the suggestions.  I have a few questions:
 
  1.  What is the benefit of using the stand-alone carray project
  (https://github.com/FrancescAlted/carray) vs Pytables.carray?
 
 
  Hello Luke,
 
  carrays are in-memory, not on disk.

 Well, that was true until version 0.5, where disk persistence was
 introduced.  Now, carray supports both in-memory and on-disk objects,
 and they work in exactly the same way.


Sorry for not being exactly up to date ;)



 --
 Francesc Alted





Re: [Pytables-users] [ANN] Blosc 1.1.4 released

2012-09-16 Thread Anthony Scopatz
Great!  Thanks to you both.

On Sun, Sep 16, 2012 at 11:42 AM, Antonio Valentino 
antonio.valent...@tiscali.it wrote:

 Hi Francesc,
 thank you.
 Just pushed updates into pytables.

 ciao


  On 16/09/2012 12:07, Francesc Alted wrote:
  ===
 Announcing Blosc 1.1.4
 A blocking, shuffling and lossless compression library
  ===
 
  What is new?
  
 
  - Redefinition of the BLOSC_MAX_BUFFERSIZE constant as (INT_MAX -
   BLOSC_MAX_OVERHEAD) instead of just INT_MAX.  This prevents producing
   outputs larger than INT_MAX, which is not supported.
 
  - `exit()` call has been replaced by a ``return -1`` in blosc_compress()
  when checking for buffer sizes.  Now programs will not just exit when
  the buffer is too large, but return a negative code.
 
  - Improvements in explicit casts.  Blosc compiles without warnings
  (with GCC) now.
 
  - Lots of improvements in docs, in particular a nice ascii-art diagram
  of the Blosc format (Valentin Haenel).
 
  - [HDF5 filter] Adapted HDF5 filter to use HDF5 1.8 by default
  (Antonio Valentino).
 
  For more info, please see the release notes in:
 
  https://github.com/FrancescAlted/blosc/wiki/Release-notes
 

 --
 Antonio Valentino




Re: [Pytables-users] a question about data corruption.

2012-09-16 Thread Anthony Scopatz
Hello Gelin,

Unless you were using the undo/redo mechanism, I don't think that there is.
You'll probably have to fix the file manually, using PyTables normally and
the provided tools like ptrepack.

Be Well
Anthony

On Sun, Sep 16, 2012 at 12:22 PM, gelin yan dynami...@gmail.com wrote:

 Hi All

 I have a question about data corruption. Is it possible to repair a data
 file when there is a situation like a power outage or a process crash? I have
 poked around the manual; however, I failed to find anything about how to
 repair corrupted data if it happened.


 Thanks

 Regards

 gelin yan




[Pytables-users] Fwd: A sad day for our community. John Hunter: 1968-2012.

2012-08-30 Thread Anthony Scopatz
Passing the bad news along, in case you hadn't heard.

-- Forwarded message --
From: Fernando Perez fperez@gmail.com
Date: Wed, Aug 29, 2012 at 9:32 PM
Subject: A sad day for our community. John Hunter: 1968-2012.
To: matplotlib development list matplotlib-de...@lists.sourceforge.net,
Matplotlib Users matplotlib-us...@lists.sourceforge.net, IPython
Development list ipython-...@scipy.org, IPython User list 
ipython-u...@scipy.org, Discussion of Numerical Python 
numpy-discuss...@scipy.org, SciPy Developers List scipy-...@scipy.org,
SciPy Users List scipy-u...@scipy.org, numfo...@googlegroups.com,
pyd...@googlegroups.com, scikit-learn-general 
scikit-learn-gene...@lists.sourceforge.net, networkx-discuss 
networkx-disc...@googlegroups.com, sage-devel sage-de...@googlegroups.com,
pystatsmod...@googlegroups.com, enthought-dev 
enthought-...@mail.enthought.com, yt-...@lists.spacepope.org


Dear friends and colleagues,

I am terribly saddened to report that yesterday, August 28 2012 at
10am,  John D. Hunter died from complications arising from cancer
treatment at the University of Chicago hospital, after a brief but
intense battle with this terrible illness.  John is survived by his
wife Miriam, his three daughters Rahel, Ava and Clara, his sisters
Layne and Mary, and his mother Sarah.

Note: If you decide not to read any further (I know this is a long
message), please go to this page for some important information about
how you can thank John for everything he gave in a decade of generous
contributions to the Python and scientific communities:
http://numfocus.org/johnhunter.

Just a few weeks ago, John delivered his keynote address at the SciPy
2012 conference in Austin centered around the evolution of matplotlib:

http://www.youtube.com/watch?v=e3lTby5RI54

but tragically, shortly after his return home he was diagnosed with
advanced colon cancer.  This diagnosis was a terrible discovery to us
all, but John took it with his usual combination of calm and resolve,
and initiated treatment procedures.  Unfortunately, the first round of
chemotherapy treatments led to severe complications that sent him to
the intensive care unit, and despite the best efforts of the
University of Chicago medical center staff, he never fully recovered
from these.  Yesterday morning, he died peacefully at the hospital
with his loved ones at his bedside.  John fought with grace and
courage, enduring every necessary procedure with a smile on his face
and a kind word for all of his caretakers and becoming a loved patient
of the many teams that ended up involved with his case.  This was no
surprise for those of us who knew him, but he clearly left a deep and
lasting mark even amongst staff hardened by the rigors of oncology
floors and intensive care units.

I don't need to explain to this community the impact of John's work,
but allow me to briefly recap, in case this is read by some who don't
know the whole story.  In 2002, John was a postdoc at the University
of Chicago hospital working on the analysis of epilepsy seizure data
in children.  Frustrated with the state of the existing proprietary
solutions for this class of problems, he started using Python for his
work, back when the scientific Python ecosystem was much, much smaller
than it is today and this could have been seen as a crazy risk.
Furthermore, he found that there were many half-baked solutions for
data visualization in Python at the time, but none that truly met his
needs.  Undeterred, he went on to create matplotlib
(http://matplotlib.org) and thus overcome one of the key obstacles for
Python to become the best solution for open source scientific and
technical computing.  Matplotlib is both an amazing technical
achievement and a shining example of open source community building,
as John not only created its backbone but also fostered the
development of a very strong development team, ensuring that the
talent of many others could also contribute to this project.  The
value and importance of this are now painfully clear: despite having
lost John, matplotlib continues to thrive thanks to the leadership of
Michael Droetboom, the support of Perry Greenfield at the Hubble
Telescope Science Institute, and the daily work of the rest of the
team.  I want to thank Perry and Michael for putting their resources
and talent once more behind matplotlib, securing the future of the
project.

It is difficult to overstate the value and importance of matplotlib,
and therefore of John's contributions (which do not end in matplotlib,
by the way; but a biography will have to wait for another day...).
Python has become a major force in the technical and scientific
computing world, leading the open source offers and challenging
expensive proprietary platforms with large teams and millions of
dollars of resources behind them. But this would be impossible without
a solid data visualization tool that would allow both ad-hoc data
exploration and the production of complex, fine-tuned 

Re: [Pytables-users] import error

2012-08-19 Thread Anthony Scopatz
Hi John,

This is probably a path issue.  You likely have both pytables installed and
a 'tables' source sub-directory wherever you are running this from.  For
whatever reason, it is picking up the source version rather than the
installed version.  It is either that, or you simply don't have it
installed correctly.  The file it is missing is one which gets compiled
when you run 'python setup.py install'.
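
A quick way to confirm which copy of 'tables' Python would actually pick up
(a minimal sketch; nothing here is assumed beyond the module name):

import imp
# Prints the directory that 'import tables' resolves to.  If it points at a
# source checkout rather than site-packages, move out of that directory or
# reinstall PyTables.
print(imp.find_module('tables')[1])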

Be Well
Anthony

On Sat, Aug 18, 2012 at 2:28 AM, John cloverev...@yahoo.com wrote:

 I have the following import error when I try to import PyTables:

 ImportError                               Traceback (most recent call last)
 <ipython-input-2-389ecae14f10> in <module>()
 ----> 1 import tables

 /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/__init__.py in <module>()
      28
      29 # Necessary imports to get versions stored on the Pyrex extension
 ---> 30 from tables.utilsExtension import getPyTablesVersion, getHDF5Version
      31
      32

 ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so, 2): Symbol not found: _H5E_CALLBACK_g
   Referenced from: /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so
   Expected in: flat namespace
  in /Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/tables/utilsExtension.so



 Anyone know what's wrong with it?




Re: [Pytables-users] Searching for nan values in a table...

2012-08-16 Thread Anthony Scopatz
So this is probably a numexpr issue.  There doesn't seem to be an isnan()
implementation [1].  I would bring it up with them. Sorry we can't do more.

Be Well
Anthony

1. http://code.google.com/p/numexpr/wiki/UsersGuide
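
In the meantime, one possible workaround is to exploit the fact that NaN is
the only value that compares unequal to itself (a sketch only; the file and
table names below are made up, and it assumes numexpr keeps IEEE semantics
for float comparisons):

import numpy as np
import tables

with tables.openFile("data.h5", mode="r") as h5:   # hypothetical file
    tbl = h5.root.mytable                          # hypothetical table

    # In-kernel query: rows whose 'volume' is NaN, no isnan() needed.
    nan_rows = tbl.getWhereList('volume != volume')

    # Alternative: pull the column into NumPy and test it there.
    nan_rows_np = np.nonzero(np.isnan(tbl.cols.volume[:]))[0]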

On Thu, Aug 16, 2012 at 12:57 PM, Aquil H. Abdullah 
aquil.abdul...@gmail.com wrote:

 I get the same error if I use:

 bad_vols = tbl.getWhereList('volume == nan')
 bad_vols = tbl.getWhereList('volume == NaN')

 --
 Aquil H. Abdullah
 I never think of the future. It comes soon enough - Albert Einstein

 On Thursday, August 16, 2012 at 1:52 PM, Anthony Scopatz wrote:

 Have you tried simply doing:

 'volume == nan' or
 'volume == NaN'

 On Thu, Aug 16, 2012 at 12:49 PM, Aquil H. Abdullah 
 aquil.abdul...@gmail.com wrote:

  Hello All,

 I am trying to determine if there are any NaN values in one of my tables,
 but when I queried for numpy.nan I received a NameError. Can anyone tell me
 the best way to search for a NaN value? Thanks!

 In [7]: type(np.nan)
 Out[7]: float

 In [8]: bad_vols = tbl.getWhereList('volume == %f' % np.nan)
 ---------------------------------------------------------------------------
 NameError                                 Traceback (most recent call last)
 /Users/aquilabdullah/<ipython-input-8-2c1b183b0581> in <module>()
 ----> 1 bad_vols = tbl.getWhereList('volume == %f' % np.nan)

 /Library/Python/2.7/site-packages/tables/table.pyc in getWhereList(self, condition, condvars, sort, start, stop, step)
    1540
    1541         coords = [ p.nrow for p in
 -> 1542                    self._where(condition, condvars, start, stop, step) ]
    1543         coords = numpy.array(coords, dtype=SizeType)
    1544         # Reset the conditions

 /Library/Python/2.7/site-packages/tables/table.pyc in _where(self, condition, condvars, start, stop, step)
    1434
    1435         # Compile the condition and extract usable index conditions.
 -> 1436         condvars = self._requiredExprVars(condition, condvars, depth=3)
    1437         compiled = self._compileCondition(condition, condvars)
    1438

 /Library/Python/2.7/site-packages/tables/table.pyc in _requiredExprVars(self, expression, uservars, depth)
    1207                 val = user_globals[var]
    1208             else:
 -> 1209                 raise NameError("name ``%s`` is not defined" % var)
    1210
    1211         # Check the value.

 NameError: name ``nan`` is not defined

 --
 Aquil H. Abdullah
 I never think of the future. It comes soon enough - Albert Einstein





Re: [Pytables-users] Numpy views stored as attributes

2012-08-15 Thread Anthony Scopatz
Hello Ask,

I bet this is because you are storing these as attrs...which will default
back to some pickled Python representation.

Can you check whether this works as expected when saving as actual arrays?
Something like:

import numpy as np
import tables

with tables.openFile("test.h5", "w") as f:

    A = np.array([[0, 1], [2, 3]])

    a = f.createArray("/", "a", A)
    b = f.createArray("/", "b", A.T.copy())
    c = f.createArray("/", "c", A.T)

    assert np.all(a[:] == A)
    assert np.all(b[:] == A.T)
    assert np.all(c[:] == A)    # AssertionError!
    assert np.all(c[:] == A.T)
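
If the value really has to live in an attribute, a minimal sketch of the
copy() workaround implied above (the file and node names are made up):

import numpy as np
import tables

with tables.openFile("attrs_demo.h5", "w") as f:
    node = f.createArray("/", "payload", [0])
    A = np.array([[0, 1], [2, 3]])
    # Store a contiguous copy of the view, not the view itself.
    node.attrs['c'] = np.ascontiguousarray(A.T)    # same effect as A.T.copy()
    assert np.all(node.attrs['c'] == A.T)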

Be Well
Anthony

On Wed, Aug 15, 2012 at 4:13 AM, Ask F. Jakobsen a...@linet.dk wrote:

 Hey all,

 When I store a view of a numpy array as an attribute it appears to be
 stored as the array that owns the data. Is this a bug? I find it confusing
 that the user has to check if the numpy array owns the data or always
 remember to do a copy() before storing a numpy array as an attribute.

 Below is some sample code that highlights the problem.

 Best regards, Ask

 import numpy as np
 import tables

 with tables.openFile("test.h5", "w") as f:

     x = f.createArray("/", "test", [0])

     A = np.array([[0, 1], [2, 3]])

     x.attrs['a'] = A
     x.attrs['b'] = A.T.copy()
     x.attrs['c'] = A.T

     assert np.all(x.attrs['a'] == A)
     assert np.all(x.attrs['b'] == A.T)
     assert np.all(x.attrs['c'] == A)
     assert np.all(x.attrs['c'] == A.T)  # AssertionError!




Re: [Pytables-users] In-kernal for subset?

2012-08-15 Thread Anthony Scopatz
On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz
adershow...@exponent.comwrote:

  I am trying to find all cases where a value transitions above a
 threshold.  So, my code first does a getwherelist to find values that are
 above the threshold, then it uses that list to find immediately prior
 values that are below.  The code is working, but the second part, searching
 through just a smaller subset is much slower (First search is on the order
 of 1 second, while the second is a minute).
 Is there any way to get this second part of the search in-kernal?  Or any
 more general way to do a search for values above a threshold, where the
 prior value is below?
 Essentially, what I am looking for is a way to speed up that second search
 for all rows in a prior defined list, where a condition is applied to the
 table

  My table is just seconds and values, in chronological order.

  Here is the code that I am using now:

 h5data = tb.openFile("AllData.h5", "r")
 table1 = h5data.root.table1

 # Find all values above threshold:
 thelist = table1.getWhereList("Value > 150")

 # From the above list find all values where the immediately prior value is below:
 transition = []
 for i in thelist:
     if (table1[i-1]['Value'] < 150) and (i != 0):
         transition.append(i)


Hey Adam,

Sorry for taking a while to respond.  Assuming you don't mind one of these
being >= or <=, you don't really need the second loop; a little index
arithmetic is enough:

import numpy as np
inds = np.array(thelist)
dinds = inds[1:] - inds[:-1]
# keep the rows whose gap to the previous above-threshold row is larger than
# one, i.e. the rows where the value has just crossed back above the threshold
# (inds[0] is also a transition whenever it is not row 0)
transition = inds[1:][1 < dinds]

This should get you an array of all of the transition indices since
wherever the difference in indices is greater than 1 the Value must have
dropped below the threshold and then returned back up.
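
For reference, a fully vectorized variant of the same idea (a sketch only; it
assumes the column name 'Value' and threshold 150 from the snippet above, and
it reads the whole column into memory):

import numpy as np

vals = table1.cols.Value[:]
above = vals > 150
# row i is a transition if it is above the threshold and row i-1 is not
transition = np.nonzero(above[1:] & ~above[:-1])[0] + 1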

Be Well
Anthony



  Thanks,





Re: [Pytables-users] openFile strategy question

2012-08-15 Thread Anthony Scopatz
Hi Andre,

I am a little confused.  Let me verify.  You have 400 hdf5 files (re and
im) buried in a unix directory tree.  You want to make a single file
which concatenates this data.  Is this right?

Be Well
Anthony

On Wed, Aug 15, 2012 at 6:52 PM, Andre' Walker-Loud walksl...@gmail.comwrote:

 Hi All,

 Just a strategy question.
 I have many hdf5 files containing data for different measurements of the
 same quantities.

 My directory tree looks like

 top description [ group ]
   sub description [ group ]
 avg [ group ]
   re [ numpy array shape = (96,1,2) ]
    im [ numpy array shape = (96,1,2) ] - only exists for a known subset of
 data files

 I have ~400 of these files.  What I want to do is create a single file,
 which collects all of these files with exactly the same directory
 structure, except at the very bottom

   re [ numpy array shape = (400,96,1,2) ]


 The simplest thing I came up with to do this is loop over the two levels
 of descriptive group structures, and build the numpy array for the final
 set this way.

 basic loop structure:

 final_file = tables.openFile('all_data.h5', 'a')

 for d1 in top_description:
     final_file.createGroup(final_file.root, d1)
     for d2 in sub_description:
         final_file.createGroup('/' + d1, d2)
         data_re = np.zeros([400, 96, 1, 2])
         for i, fname in enumerate(hdf5_files):
             tmp = tables.openFile(fname)
             data_re[i] = tmp.getNode('/%s/%s/avg/re' % (d1, d2))[:]
             tmp.close()
         final_file.createArray('/' + d1 + '/' + d2, 're', data_re)


 But this involves opening and closing the individual 400 hdf5 files many
 times.
 There must be a smarter algorithmic way to do this - or perhaps built-in
 pytables tools.
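
 One possible way to avoid reopening each file many times (a sketch only,
 reusing the hypothetical names above): invert the loops so every source file
 is opened exactly once and its arrays are scattered into preallocated
 buffers keyed by (d1, d2).

 import numpy as np
 import tables

 buffers = {}
 for d1 in top_description:
     for d2 in sub_description:
         buffers[(d1, d2)] = np.zeros([len(hdf5_files), 96, 1, 2])

 for i, fname in enumerate(hdf5_files):
     src = tables.openFile(fname, 'r')
     for (d1, d2), buf in buffers.items():
         buf[i] = src.getNode('/%s/%s/avg/re' % (d1, d2))[:]
     src.close()

 final_file = tables.openFile('all_data.h5', 'a')
 for (d1, d2), buf in buffers.items():
     final_file.createArray('/%s/%s' % (d1, d2), 're', buf, createparents=True)
 final_file.close()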

 Any advice is appreciated.


 Andre



Re: [Pytables-users] openFile seems to hang

2012-08-07 Thread Anthony Scopatz
On Tue, Aug 7, 2012 at 11:50 AM, Daniel Wheeler
daniel.wheel...@gmail.comwrote:



 On Tue, Aug 7, 2012 at 12:46 PM, Anthony Scopatz scop...@gmail.comwrote:


 On Tue, Aug 7, 2012 at 11:43 AM, Daniel Wheeler 
 daniel.wheel...@gmail.com wrote:



  They should know what to do and how to fix it.


 Maybe mpi init issues with either pytrilinos or mpi4py as a wild guess.
 Both are imported by fipy.


 Your guess is as good, or much better, than mine.


 Thanks for your questions and answers.


BTW If it turns out that you need us to change something in PyTables to
play nicely with fipy, please let us know!



 --
 Daniel Wheeler




Re: [Pytables-users] openFile seems to hang

2012-08-06 Thread Anthony Scopatz
Hi Daniel,

Does this always happen when opening files? or just occasionally?

Be Well
Anthony

On Mon, Aug 6, 2012 at 11:08 AM, Daniel Wheeler
daniel.wheel...@gmail.comwrote:

 The following just seems to hang indefinitely.

 In [1]: import tables


 In [2]: f = tables.openFile('tmp.h5', mode='a')

 The tests hang as well.

 In [3]: tables.test()

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 PyTables version:  2.4.0
 HDF5 version:  1.8.4-patch1
 NumPy version: 1.6.1
 Numexpr version:   2.0.1 (not using Intel's VML/MKL)
 Zlib version:  1.2.3.4 (in Python interpreter)
 BZIP2 version: 1.0.5 (10-Dec-2007)
 Blosc version: 1.1.3 (2010-11-16)
 Cython version:0.15.1
 Python version:2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
 [GCC 4.4.5]
 Platform:  linux2-x86_64
 Byte-ordering: little
 Detected cores:4

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Performing only a light (yet comprehensive) subset of the test suite.
 If you want a more complete test, try passing the --heavy flag to this
 script
 (or set the 'heavy' parameter in case you are using tables.test() call).
 The whole suite will take more than 4 hours to complete on a relatively
 modern CPU and around 512 MB of main memory.

 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 /users/wd15/.virtualenvs/trunk/lib/python2.6/site-packages/tables/filters.py:253:
 FiltersWarning: compression library ``lzo`` is not available; using
 ``zlib`` instead
   % (complib, default_complib), FiltersWarning )

 Any ideas are greatly appreciated.

 Thanks.

 --
 Daniel Wheeler




Re: [Pytables-users] advice on data representation

2012-08-01 Thread Anthony Scopatz
 = I030_070_DESC()
 
I030_170 = I030_170_DESC()
 
I030_100 = I030_100_DESC()
 
I030_180 = I030_180_DESC()
 
I030_181 = I030_181_DESC()
 
I030_060 = I030_060_DESC()
 
I030_150 = I030_150_DESC()
 
I030_140 = I030_140_DESC()
 
I030_340 = I030_340_DESC()
 
I030_400 = I030_400_DESC()
 
...
 
I030_210 = I030_210_DESC()
 
I030_120 = I030_120_DESC()
 
I030_050 = I030_050_DESC()
 
I030_270 = I030_270_DESC()
 
I030_370 = I030_370_DESC()
 
 
 
 
 
From: Anthony Scopatz [mailto:scop...@gmail.com]
Sent: 12 July 2012 00:02
To: Discussion list for PyTables
Subject: Re: [Pytables-users] advice on using PyTables
 
 
 
Hello Benjamin,
 
 
 
Not knowing too much about the ASTERIX format, other than
  what you said and what is in the links, I would say that this is a good
  fit for HDF5 and PyTables.  PyTables will certainly help you read in
  the data and manipulate it.
 
 
 
However, before you abandon hachoir completely, I will say
  it is a lot easier to write hdf5 files in PyTables than to use the HDF5
  C API.   If hachoir is too slow, have you tried profiling the code to
  see what is taking up the most time?  Maybe you could just rewrite
  these parts in C?  Have you looked into Cythonizing it?  Also, you
  don't seem to be using numpy to read in the data... (there are some
  tricks given ASTERIX here, but not insurmountable).
 
 
 
I ask the above, just so you don't have to completely
  rewrite everything.  You are correct though that pure python is
  probably not sufficient.  Feel free to ask more questions here.
 
 
 
Be Well
 
Anthony
 
 
 
On Wed, Jul 11, 2012 at 6:52 AM, benjamin.bertr...@lfv.se
  wrote:
 
Hi,
 
I'm working with Air Traffic Management and would like to
  perform checks / compute statistics on ASTERIX data.
ASTERIX is an ATM Surveillance Data Binary Messaging Format
  (http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
 
The data consist of a concatenation of consecutive data
  blocks.
Each data block consists of data category + length +
  records.
Each record is of variable length and consists of several
  data items (that are well defined for each category).
Some data items might be present or not depending on a field
  specification (bitfield).
 
I started to write a parser using hachoir
  (https://bitbucket.org/haypo/hachoir/overview) a pure python library.
But the parsing was really too slow and taking a lot of
  memory.
That's not really useable.
 
From what I read, PyTables could really help to manipulate
  and analyze the data.
So I've been thinking about writing a tool (probably in C)
  to convert my ASTERIX format to HDF5.
 
Before I start, I'd like confirmation that this seems like a
  suitable application for PyTables.
Is there another approach than writing a conversion tool to
  HDF5?
 
Thanks in advance
 
Benjamin
 
 
 
 


Re: [Pytables-users] advice on data representation

2012-07-31 Thread Anthony Scopatz
 = I030_181_DESC()

 I030_060 = I030_060_DESC()

 I030_150 = I030_150_DESC()

 I030_140 = I030_140_DESC()

 I030_340 = I030_340_DESC()

 I030_400 = I030_400_DESC()

 ...

 I030_210 = I030_210_DESC()

 I030_120 = I030_120_DESC()

 I030_050 = I030_050_DESC()

 I030_270 = I030_270_DESC()

 I030_370 = I030_370_DESC()


 From: Anthony Scopatz [mailto:scop...@gmail.com]
 Sent: 12 July 2012 00:02
 To: Discussion list for PyTables
 Subject: Re: [Pytables-users] advice on using PyTables


 Hello Benjamin, 


 Not knowing too much about the ASTERIX format, other than what you said and
 what is in the links, I would say that this is a good fit for HDF5 and
 PyTables.  PyTables will certainly help you read in the data and manipulate
 it.  


 However, before you abandon hachoir completely, I will say it is a lot
 easier to write hdf5 files in PyTables than to use the HDF5 C API.   If
 hachoir is too slow, have you tried profiling the code to see what is
 taking up the most time?  Maybe you could just rewrite these parts in C?
  Have you looked into Cythonizing it?  Also, you don't seem to be using
 numpy to read in the data... (there are some tricks given ASTERIX here, but
 not insurmountable).


 I ask the above, just so you don't have to completely rewrite everything.
  You are correct though that pure python is probably not sufficient.  Feel
 free to ask more questions here.


 Be Well

 Anthony


 On Wed, Jul 11, 2012 at 6:52 AM, benjamin.bertr...@lfv.se wrote:

 Hi,

 I'm working with Air Traffic Management and would like to perform checks /
 compute statistics on ASTERIX data.
 ASTERIX is an ATM Surveillance Data Binary Messaging Format (
 http://www.eurocontrol.int/asterix/public/standard_page/overview.html)

 The data consist of a concatenation of consecutive data blocks.
 Each data block consists of data category + length + records.
 Each record is of variable length and consists of several data items (that
 are well defined for each category).
 Some data items might be present or not depending on a field specification
 (bitfield).

 I started to write a parser using hachoir (
 https://bitbucket.org/haypo/hachoir/overview) a pure python library.
 But the parsing was really too slow and taking a lot of memory.
 That's not really useable.

 From what I read, PyTables could really help to manipulate and analyze
 the data.
 So I've been thinking about writing a tool (probably in C) to convert my
 ASTERIX format to HDF5.

 Before I start, I'd like confirmation that this seems like a suitable
 application for PyTables.
 Is there another approach than writing a conversion tool to HDF5?

 Thanks in advance

 Benjamin





  1   2   3   >