Re: [Pytables-users] openFile strategy question

2012-08-16 Thread Andre' Walker-Loud
Hi Anthony,

 Oh OK, I think I understand a little better.  What I would do is make 
 for i,file in enumerate(hdf5_files) the outermost loop and then use the 
 File.walkNodes() method [1] to walk each file and pick out only the data sets 
 that you want to copy, skipping over all others.  This should let you open 
 each of the 400 files only once.  Hope this helps.
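
In code, the single-pass structure Anthony describes comes out looking roughly 
like the sketch below.  The Float64 atom, the '/avg/re' test, and the use of 
walkNodes with classname='Array' are my guesses at the details; hdf5_files is 
the same placeholder list of file names as in the original post, and the 
destination EArrays are assumed to have been created in the merged file 
beforehand.

import numpy as np
import tables

merged = tables.openFile('all_data.h5', 'a')

for fname in hdf5_files:                          # open each source file once
    src = tables.openFile(fname)
    for leaf in src.walkNodes('/', classname='Array'):
        if leaf._v_pathname.endswith('/avg/re'):  # keep only the averaged data
            d1, d2 = leaf._v_pathname.split('/')[1:3]
            dest = merged.getNode('/' + d1 + '/' + d2 + '/re')
            dest.append(leaf[:][np.newaxis])      # add the leading (file) axis
    src.close()

merged.close()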

Thanks.  This is the idea I had, but was failing to implement (although I 
didn't use walkNodes).  To get it to work, I had to figure out how to use 
createEArray properly.  In the end, it was a silly fix.

I created an EArray with shape (0,96,1,2), and was trying to append numpy 
arrays of shape (96,1,2) to this, which was failing.  In the end, all I had to 
do was

arr.append(np.array([my_array]))

whereas before, I was simply missing the [ ] brackets, so the shapes did not 
line up.
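
Schematically, the difference is just the extra leading axis (the Float64 atom 
and the node location here are only for illustration):

import numpy as np
import tables

f = tables.openFile('all_data.h5', 'a')
# First axis has length 0 and is the extendable one; each file contributes
# one (96, 1, 2) block along it.
arr = f.createEArray(f.root, 're', tables.Float64Atom(), (0, 96, 1, 2))

my_array = np.zeros((96, 1, 2))      # stand-in for one file's data set
# arr.append(my_array)               # fails: 3-d block vs 4-d EArray
arr.append(np.array([my_array]))     # works: appends a (1, 96, 1, 2) block
# equivalently: arr.append(my_array[np.newaxis])
f.close()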


Cheers,

Andre


Re: [Pytables-users] openFile strategy question

2012-08-15 Thread Anthony Scopatz
Hi Andre,

I am a little confused.  Let me verify: you have 400 hdf5 files (re and
im) buried in a unix directory tree, and you want to make a single file
which concatenates this data.  Is this right?

Be Well
Anthony

On Wed, Aug 15, 2012 at 6:52 PM, Andre' Walker-Loud walksl...@gmail.com wrote:

 Hi All,

 Just a strategy question.
 I have many hdf5 files containing data for different measurements of the
 same quantities.

 My directory tree looks like

 top description [ group ]
   sub description [ group ]
     avg [ group ]
       re [ numpy array shape = (96,1,2) ]
       im [ numpy array shape = (96,1,2) ] - only exists for a known subset of data files

 I have ~400 of these files.  What I want to do is create a single file,
 which collects all of these files with exactly the same directory
 structure, except at the very bottom

   re [ numpy array shape = (400,96,1,2) ]


 The simplest thing I came up with to do this is to loop over the two levels
 of descriptive group structure and build the numpy array for the final data
 set this way.

 basic loop structure:

 import numpy as np
 import tables

 final_file = tables.openFile('all_data.h5', 'a')

 for d1 in top_description:
     final_file.createGroup(final_file.root, d1)
     for d2 in sub_description:
         final_file.createGroup('/' + d1, d2)
         data_re = np.zeros([400, 96, 1, 2])
         # gather this group's 're' data set from every file
         for i, file in enumerate(hdf5_files):
             tmp = tables.openFile(file)
             data_re[i] = np.array(tmp.getNode('/' + d1 + '/' + d2 + '/avg/re'))
             tmp.close()
         final_file.createArray('/' + d1 + '/' + d2, 're', data_re)


 But this involves opening and closing each of the 400 individual hdf5 files
 many times.
 There must be a smarter algorithmic way to do this - or perhaps built-in
 pytables tools for it.

 Any advice is appreciated.


 Andre



Re: [Pytables-users] openFile strategy question

2012-08-15 Thread Andre' Walker-Loud
Hi Anthony,

 I am a little confused.  Let me verify: you have 400 hdf5 files (re and im)
 buried in a unix directory tree, and you want to make a single file which
 concatenates this data.  Is this right?

Sorry for my description - that is not quite right.
The unix directory tree is the group tree I have made in each individual hdf5 
file.  So I have 400 hdf5 files, each with the given directory tree.  And I 
basically want to copy that directory tree, but merge all of them together.
However, there are bits in each of the small files that I do not want to merge 
- I only want to grab the averaged data sets, while the little files contain 
many different samples (which I have already averaged into the avg group).

Is this clear?


Thanks,

Andre



 

