Thanks a lot for the help so far, guys!

Looking at itertools, I found what I believe is the perfect function for
what I need: itertools.combinations. It appears to be a valid replacement
for the method proposed.

One small problem I didn't mention: my compare function actually takes
values from 2 columns of the table as inputs, like so:

D = np.empty((N_elements, N_elements))
for ii in xrange(N_elements):
    for jj in xrange(ii+1, N_elements):
        D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
                            data['element2'][ii], data['element2'][jj])

Is there an efficient way of using itertools with this structure?
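
For reference, here is a minimal sketch of how itertools.combinations
could drive the two-column loop above. It is an illustration under
assumptions, not a tested PyTables recipe: the function name
pairwise_scores, the toy compare, and the plain lists standing in for
the two columns are all hypothetical, and it is written for Python 3
(use xrange instead of range on Python 2). combinations copies its
input into memory first, so the sketch feeds it row indices rather than
the rows themselves.

```python
from itertools import combinations


def compare(a_i, a_j, b_i, b_j):
    # Hypothetical stand-in for the real four-argument compare function.
    return abs(a_i - a_j) + abs(b_i - b_j)


def pairwise_scores(col1, col2, compare_func):
    """Yield (ii, jj, score) for every index pair with ii < jj.

    col1 and col2 can be any objects that support len() and integer
    indexing; plain lists are used below in place of table columns.
    """
    n = len(col1)
    # combinations(range(n), 2) yields (0, 1), (0, 2), ..., (n-2, n-1),
    # the same pairs as the nested ii/jj loops.
    for ii, jj in combinations(range(n), 2):
        yield ii, jj, compare_func(col1[ii], col1[jj], col2[ii], col2[jj])


# Toy data standing in for data['element1'] and data['element2'].
element1 = [1.0, 4.0, 6.0]
element2 = [0.0, 2.0, 3.0]

results = list(pairwise_scores(element1, element2, compare))
for ii, jj, score in results:
    print(ii, jj, score)
```

Each yielded triple can be written straight into D[ii, jj]. This only
replaces the loop bookkeeping; every comparison still reads both
columns by index, so I would not expect it to be faster on its own.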


On Thu, Jan 3, 2013 at 1:29 PM, <
pytables-users-requ...@lists.sourceforge.net> wrote:

> Message: 1
> Date: Thu, 3 Jan 2013 10:29:33 -0800
> From: Josh Ayers <josh.ay...@gmail.com>
> Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
> To: Discussion list for PyTables
>         <pytables-users@lists.sourceforge.net>
>
> David,
>
> The change in issue 27 was only for iteration over a tables.Column
> instance.  To use it, tweak Anthony's code as follows.  This will iterate
> over the "element" column, as in your original example.
>
> Note also that this will only work with the development version of PyTables
> available on github.  It will be very slow using the released v2.4.0.
>
>
> from itertools import izip
>
> with tb.openFile(...) as f:
>     data = f.root.data.cols.element
>     data_i = iter(data)
>     data_j = iter(data)
>     data_i.next() # throw the first value away
>     for i, j in izip(data_i, data_j):
>         compare(i, j)
>
>
> Hope that helps,
> Josh
>
>
>
> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <scop...@gmail.com> wrote:
>
> > Hi David,
> >
> > Tables and table column iteration have been overhauled fairly recently
> > [1].  So you might try creating two iterators, offset by one, and then
> > doing the comparison.  I am hacking this out super quick so please
> > forgive me:
> >
> > from itertools import izip
> >
> > with tb.openFile(...) as f:
> >     data = f.root.data
> >     data_i = iter(data)
> >     data_j = iter(data)
> >     data_i.next() # throw the first value away
> >     for i, j in izip(data_i, data_j):
> >         compare(i, j)
> >
> > You get the idea ;)
> >
> > Be Well
> > Anthony
> >
> > 1. https://github.com/PyTables/PyTables/issues/27
> >
> >
> > On Thu, Jan 3, 2013 at 9:25 AM, David Reed <david.ree...@gmail.com> wrote:
> >
> >> I was hoping someone could help me out here.
> >>
> >> This is from a post I put up on StackOverflow.
> >>
> >> I have a fairly large dataset that I store in HDF5 and access using
> >> PyTables. One operation I need to do on this dataset is a pairwise
> >> comparison between each of the elements. This requires 2 loops: one to
> >> iterate over each element, and an inner loop to iterate over every other
> >> element. This operation thus performs N(N-1)/2 comparisons.
> >>
> >> For fairly small sets I found it to be faster to dump the contents into
> >> a multidimensional numpy array and then do my iteration. I run into
> >> problems with large sets because of memory issues and need to access
> >> each element of the dataset at run time.
> >>
> >> Putting the elements into an array gives me about 600 comparisons per
> >> second, while operating on the hdf5 data itself gives me about 300
> >> comparisons per second.
> >>
> >> Is there a way to speed this process up?
> >>
> >> Example follows (this is not my real code, just an example):
> >>
> >> *Small Set*:
> >>
> >>
> >> import numpy as np
> >> import tables as tb
> >>
> >> with tb.openFile(h5_file, 'r') as f:
> >>     data = f.root.data
> >>
> >>     N_elements = len(data)
> >>     elements = np.empty((N_elements, 100000))
> >>
> >>     # copy each row's 'element' field into memory
> >>     for ii, d in enumerate(data):
> >>         elements[ii] = d['element']
> >>
> >> D = np.empty((N_elements, N_elements))
> >> for ii in xrange(N_elements):
> >>     for jj in xrange(ii+1, N_elements):
> >>         D[ii, jj] = compare(elements[ii], elements[jj])
> >>
> >>  *Large Set*:
> >>
> >>
> >> with tb.openFile(h5_file, 'r') as f:
> >>     data = f.root.data
> >>
> >>     N_elements = len(data)
> >>
> >>     D = np.empty((N_elements, N_elements))
> >>     for ii in xrange(N_elements):
> >>         for jj in xrange(ii+1, N_elements):
> >>             D[ii, jj] = compare(data['element'][ii],
> >>                                 data['element'][jj])
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Pytables-users mailing list
> >> Pytables-users@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/pytables-users