Re: [Pytables-users] openFile strategy question
Hi Anthony,

> Oh OK, I think I understand a little better. What I would do would be to make the 'for i, file in enumerate(hdf5_files)' loop the outermost loop and then use the File.walkNodes() method [1] to walk each file and pick out only the data sets that you want to copy, skipping over all others. This should allow you to open each of the 400 files only once. Hope this helps.

Thanks. This is the idea I had, but I was failing to implement it (although I didn't use walkNodes). To get it to work, I had to figure out how to use createEArray properly. In the end, it was a silly fix. I created an EArray with shape (0,96,1,2) and was trying to append numpy arrays of shape (96,1,2) to it, which was failing. All I had to do was

    arr.append(np.array([my_array]))

Before, I was simply missing the [ ] brackets, so the shapes did not line up: wrapping my_array in a list adds the leading axis and gives shape (1,96,1,2).

Cheers,
Andre

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats.
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
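The shape problem above can be reproduced with NumPy alone. An EArray created with shape (0,96,1,2) grows along its first axis, so every append must be 4-D with trailing dimensions (96,1,2); a single (96,1,2) average is one dimension short. A minimal sketch, assuming my_array is one file's averaged data set (zeros are placeholder data) and using a hypothetical helper, as_rows, that adds the missing leading axis the same way np.array([my_array]) does:

```python
import numpy as np


def as_rows(block, row_shape):
    """Return `block` shaped for appending to an EArray whose rows have `row_shape`.

    Hypothetical helper: a single row (shape == row_shape) gets a leading
    length-1 axis, as np.array([block]) does; an (n, *row_shape) stack is
    returned unchanged; any other shape is rejected.
    """
    block = np.asarray(block)
    row_shape = tuple(row_shape)
    if block.shape == row_shape:
        return block[np.newaxis]          # (96, 1, 2) -> (1, 96, 1, 2)
    if block.shape[1:] == row_shape:
        return block                      # already (n, 96, 1, 2)
    raise ValueError("shape %s does not fit rows of shape %s"
                     % (block.shape, row_shape))


my_array = np.zeros((96, 1, 2))            # placeholder for one avg/re data set
print(my_array.ndim)                        # 3, but the EArray is 4-D
print(np.array([my_array]).shape)           # (1, 96, 1, 2)
print(as_rows(my_array, (96, 1, 2)).shape)  # (1, 96, 1, 2)
```

In PyTables the result would then be passed to arr.append(...). np.expand_dims(my_array, 0) is an equivalent way to add the axis without building a list.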
Re: [Pytables-users] openFile strategy question
Hi Andre,

I am a little confused. Let me verify. You have 400 HDF5 files (re and im) buried in a Unix directory tree, and you want to make a single file which concatenates this data. Is this right?

Be Well
Anthony

On Wed, Aug 15, 2012 at 6:52 PM, Andre' Walker-Loud walksl...@gmail.com wrote:

Hi All,

Just a strategy question. I have many HDF5 files containing data for different measurements of the same quantities. My directory tree looks like

    top description [group]
        sub description [group]
            avg [group]
                re [numpy array, shape = (96,1,2)]
                im [numpy array, shape = (96,1,2)]  (only exists for a known subset of data files)

I have ~400 of these files. What I want to do is create a single file which collects all of these files with exactly the same directory structure, except at the very bottom:

    re [numpy array, shape = (400,96,1,2)]

The simplest thing I came up with is to loop over the two levels of descriptive group structure and build the numpy array for the final data set this way. Basic loop structure:

    import numpy as np
    import tables

    # top_description, sub_description and hdf5_files are defined elsewhere
    final_file = tables.openFile('all_data.h5', 'a')
    for d1 in top_description:
        final_file.createGroup(final_file.root, d1)
        for d2 in sub_description:
            group_path = '/' + d1 + '/' + d2
            final_file.createGroup('/' + d1, d2)
            data_re = np.zeros([len(hdf5_files), 96, 1, 2])
            for i, file in enumerate(hdf5_files):
                tmp = tables.openFile(file)   # reopened for every (d1, d2) pair
                data_re[i] = tmp.getNode(group_path + '/avg/re').read()
                tmp.close()
            final_file.createArray(group_path, 're', data_re)
    final_file.close()

But this involves opening and closing each of the 400 HDF5 files many times (once for every (d1, d2) pair). There must be a smarter algorithmic way to do this, or perhaps built-in PyTables tools. Any advice is appreciated.

Andre
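One way to avoid reopening files is Anthony's suggestion of making the file loop the outermost one: visit each file once, append what is needed to per-destination lists, and stack at the end. The sketch below shows only that loop structure. To keep it runnable without PyTables or real files, it models each opened file as a dict mapping node paths to arrays (a hypothetical stand-in for tmp.getNode(path).read()); merge_avg_re is likewise a hypothetical name.

```python
import numpy as np


def merge_avg_re(files, top_description, sub_description):
    """Collect /d1/d2/avg/re from every file, visiting each file exactly once.

    `files` is an iterable of mappings {node_path: ndarray}, standing in for
    opened HDF5 files. Returns {'/d1/d2/re': array of shape (n_files, 96, 1, 2)}.
    """
    collected = {}
    for f in files:                                   # outermost loop: one "open" per file
        for d1 in top_description:
            for d2 in sub_description:
                src = "/%s/%s/avg/re" % (d1, d2)      # path built from the loop variables
                dst = "/%s/%s/re" % (d1, d2)          # destination in the merged file
                collected.setdefault(dst, []).append(np.asarray(f[src]))
    # np.stack adds the leading file axis: n_files x (96, 1, 2) -> (n_files, 96, 1, 2)
    return {dst: np.stack(arrays) for dst, arrays in collected.items()}


# Three fake "files", each holding one averaged data set filled with its index.
files = [{"/a/x/avg/re": np.full((96, 1, 2), i, dtype=float)} for i in range(3)]
merged = merge_avg_re(files, ["a"], ["x"])
print(merged["/a/x/re"].shape)   # (3, 96, 1, 2)
```

With real files, the body of the outer loop would open the file once, read every needed node, and close it. Each stacked array could then be written with createArray, or rows could instead be appended to an EArray as each file is visited, so the full stack never has to be held in memory.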
Re: [Pytables-users] openFile strategy question
Hi Anthony,

Sorry, my description was not quite right. The "directory tree" I described is not a Unix directory tree; it is the group tree I have made inside each individual HDF5 file. So I have 400 HDF5 files, each with the given group tree, and I basically want to copy that tree while merging all of the files together. However, there are parts of each small file that I do not want to merge: the small files contain many different samples (which I have already averaged into the avg group), and I only want to grab the averaged data sets.

Is this clear?

Thanks,
Andre
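When the group names are not fixed in advance, the walking approach has to decide, for each node it visits, whether it is an averaged data set and where it belongs in the merged file. A minimal sketch of that selection step, assuming node paths like those PyTables reports for each leaf (for example while iterating with walkNodes); avg_targets and the sample_0 group name are hypothetical. It keeps only leaves whose parent group is avg, skips the per-sample data, and drops the avg level in the destination, matching the /d1/d2/re layout of the original loop:

```python
import posixpath


def avg_targets(node_paths):
    """Map source paths of averaged data sets to their merged-file paths.

    Keeps only nodes whose parent group is named 'avg' (e.g. /d1/d2/avg/re)
    and maps each to the same path without the 'avg' level (/d1/d2/re).
    """
    targets = {}
    for path in node_paths:
        parent, leaf = posixpath.split(path)          # '/d1/d2/avg', 're'
        grandparent, group = posixpath.split(parent)  # '/d1/d2', 'avg'
        if group == "avg":
            targets[path] = posixpath.join(grandparent, leaf)
    return targets


paths = ["/d1/d2/avg/re", "/d1/d2/avg/im", "/d1/d2/sample_0/re"]
print(avg_targets(paths))
# {'/d1/d2/avg/re': '/d1/d2/re', '/d1/d2/avg/im': '/d1/d2/im'}
```

Because im exists only for a subset of the files, stacking the im arrays collected this way gives a first axis shorter than the one for re, so a record of which files contributed would be needed to line the two up.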