Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

Anthony Scopatz Thu, 03 Jan 2013 09:12:37 -0800

Hi David,

Table and table column iteration were overhauled fairly recently [1], so
you might try creating two iterators, offset by one, and then doing the
comparison.  I am hacking this out super quick, so please forgive me:


import tables as tb
from itertools import izip

with tb.openFile(...) as f:
    data = f.root.data
    # two separate iterators over the same table
    data_i = iter(data)
    data_j = iter(data)
    next(data_i)  # throw the first value away so data_i runs one row ahead
    for i, j in izip(data_i, data_j):
        compare(i, j)

You get the idea ;)
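
To make the offset concrete without needing an HDF5 file, here is a minimal
sketch of the same pattern on a plain Python list standing in for the table.
The helper name adjacent_pairs and the sample values are made up for
illustration, and zip is used in place of izip so that it runs on Python 2 or 3:

```python
def adjacent_pairs(rows):
    """Yield (next_row, current_row) for each neighbouring pair in rows.

    `rows` must be re-iterable (e.g. a list), because two separate
    iterators are created over it.
    """
    ahead = iter(rows)
    behind = iter(rows)
    next(ahead, None)  # throw the first value away; tolerate empty input
    for i, j in zip(ahead, behind):
        yield i, j


# Hypothetical stand-in data; a real table would be iterated the same way.
rows = [10, 20, 30, 40]
pairs = list(adjacent_pairs(rows))
print(pairs)
```

Note that this only pairs each row with its immediate neighbour (N-1 pairs),
so the full N(N-1)/2 comparison would still need an inner loop over the
remaining rows.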

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27


On Thu, Jan 3, 2013 at 9:25 AM, David Reed <david.ree...@gmail.com> wrote:

> I was hoping someone could help me out here.
>
> This is from a post I put up on StackOverflow,
>
> I have a fairly large dataset that I store in HDF5 and access using
> PyTables. One operation I need to perform on this dataset is a pairwise
> comparison between each of the elements. This requires two loops: an outer
> loop to iterate over each element, and an inner loop to iterate over every
> other element. This operation thus performs N(N-1)/2 comparisons.
>
> For fairly small sets I found it to be faster to dump the contents into a
> multidimensional numpy array and then do my iteration. I run into problems
> with large sets because of memory issues, so I need to access each element
> of the dataset at run time.
>
> Putting the elements into an array gives me about 600 comparisons per
> second, while operating on hdf5 data itself gives me about 300 comparisons
> per second.
>
> Is there a way to speed this process up?
>
> Example follows (this is not my real code, just an example):
>
> *Small Set*:
>
>
> import numpy as np
> import tables as tb
>
> with tb.openFile(h5_file, 'r') as f:
>     data = f.root.data
>
>     N_elements = len(data)
>     elements = np.empty((N_elements, int(1e5)))
>
>     # copy each row's 'element' field into memory
>     for ii, d in enumerate(data):
>         elements[ii] = d['element']
>
> D = np.empty((N_elements, N_elements))
> for ii in xrange(N_elements):
>     for jj in xrange(ii+1, N_elements):
>         D[ii, jj] = compare(elements[ii], elements[jj])
>
>  *Large Set*:
>
>
> with tb.openFile(h5_file, 'r') as f:
>     data = f.root.data
>
>     N_elements = len(data)
>
>     D = np.empty((N_elements, N_elements))
>     for ii in xrange(N_elements):
>         for jj in xrange(ii+1, N_elements):
>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>
>
>
> ------------------------------------------------------------------------------
> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> MVPs and experts. ON SALE this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122712
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>

