[Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread David Reed
I was hoping someone could help me out here.

This is from a post I put up on StackOverflow,

I have a fairly large dataset that I store in HDF5 and access using
PyTables. One operation I need to do on this dataset is a pairwise
comparison between each of the elements. This requires 2 loops, one to
iterate over each element, and an inner loop to iterate over every other
element. This operation thus performs N(N-1)/2 comparisons.
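
As a quick sanity check of the N(N-1)/2 figure, here is a minimal sketch in Python 3 (the thread itself uses Python 2). The helper name count_pairs is made up for illustration; it only counts the iterations of the two nested loops described above and compares the result with itertools.combinations.

```python
from itertools import combinations


def count_pairs(n):
    """Count the (ii, jj) pairs with ii < jj visited by two nested loops."""
    count = 0
    for ii in range(n):
        for jj in range(ii + 1, n):
            count += 1
    return count


n = 6
# The nested loops, the closed form and itertools.combinations all agree.
assert count_pairs(n) == n * (n - 1) // 2
assert count_pairs(n) == len(list(combinations(range(n), 2)))
print(count_pairs(n))  # prints 15
```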

For fairly small sets I found it to be faster to dump the contents into a
multidimensional numpy array and then do my iteration. With large sets this
runs into memory problems, so I need to read each element of the dataset
from the file at run time.

Putting the elements into an array gives me about 600 comparisons per
second, while operating on hdf5 data itself gives me about 300 comparisons
per second.

Is there a way to speed this process up?

Example follows (this is not my real code, just an example):

*Small Set*:

import numpy as np
import tables as tb

with tb.openFile(h5_file, 'r') as f:
    data = f.root.data

    N_elements = len(data)
    elements = np.empty((N_elements, int(1e5)))

    # copy the 'element' field of every row into memory
    for ii, d in enumerate(data):
        elements[ii] = d['element']

# only the upper triangle (jj > ii) of D is filled
D = np.empty((N_elements, N_elements))
for ii in xrange(N_elements):
    for jj in xrange(ii+1, N_elements):
        D[ii, jj] = compare(elements[ii], elements[jj])

*Large Set*:

with tb.openFile(h5_file, 'r') as f:
    data = f.root.data

    N_elements = len(data)

    # every comparison reads both elements from the file
    D = np.empty((N_elements, N_elements))
    for ii in xrange(N_elements):
        for jj in xrange(ii+1, N_elements):
            D[ii, jj] = compare(data['element'][ii], data['element'][jj])
--
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread Anthony Scopatz
Hi David,

Tables and table column iteration have been overhauled fairly recently [1].
So you might try creating two iterators, offset by one, and then doing the
comparison.  I am hacking this out super quick so please forgive me:

from itertools import izip

with tb.openFile(...) as f:
    data = f.root.data
    data_i = iter(data)
    data_j = iter(data)
    data_i.next()  # throw the first value away
    for i, j in izip(data_i, data_j):
        compare(i, j)

You get the idea ;)

Be Well
Anthony

1. https://github.com/PyTables/PyTables/issues/27
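
To see which pairs the two offset iterators visit, here is a minimal sketch in Python 3, where zip replaces itertools.izip and next(it) replaces it.next(). A plain list stands in for the table, and the helper name adjacent_pairs is made up for illustration. It suggests that the pattern yields only the N-1 neighbouring pairs, not all N(N-1)/2 combinations.

```python
def adjacent_pairs(seq):
    """Pair each item with its predecessor using two offset iterators."""
    it_i = iter(seq)
    it_j = iter(seq)
    next(it_i, None)  # throw the first value away (no error on empty input)
    # zip stops as soon as the shorter iterator (it_i) is exhausted
    return list(zip(it_i, it_j))


pairs = adjacent_pairs(["a", "b", "c", "d"])
print(pairs)  # prints [('b', 'a'), ('c', 'b'), ('d', 'c')]
```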




Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread Josh Ayers
David,

The change in issue 27 was only for iteration over a tables.Column
instance.  To use it, tweak Anthony's code as follows.  This will iterate
over the element column, as in your original example.

Note also that this will only work with the development version of PyTables
available on GitHub.  It will be very slow using the released v2.4.0.


from itertools import izip

with tb.openFile(...) as f:
    data = f.root.data.cols.element
    data_i = iter(data)
    data_j = iter(data)
    data_i.next()  # throw the first value away
    for i, j in izip(data_i, data_j):
        compare(i, j)


Hope that helps,
Josh





Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2

2013-01-03 Thread David Reed
Thanks Anthony, but unless I'm missing something I don't think that method
will work, since it only compares the ith element with the (i+1)th
element.  I still need 2 for loops, right?

Using itertools might speed things up though. I've never used it, so I
will give it a shot and let you know how it goes.  Looks like I need to
download the latest release before I do that too.  Thanks for the help.

-Dave




Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3

2013-01-03 Thread David Reed
Thanks a lot for the help so far guys!

Looking at itertools, I found what I believe to be the perfect function for
what I need, itertools.combinations. This appears to be a valid replacement
for the method proposed.

One small problem that I didn't mention is that my compare function
actually takes as inputs 2 columns from the table. Like so:

D = np.empty((N_elements, N_elements))
for ii in xrange(N_elements):
    for jj in xrange(ii+1, N_elements):
        D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
                            data['element2'][ii], data['element2'][jj])

Is there an efficient way of using itertools with this structure?
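
One possible shape for this, as a minimal sketch in Python 3 rather than a tested PyTables recipe: zip the two columns into (element1, element2) rows and feed the enumerated rows to itertools.combinations. Plain lists stand in for the table columns, and pairwise_distances and the toy compare are made-up names for illustration. Note that itertools.combinations copies its input into an in-memory tuple, so this form only suits data that fits in memory.

```python
from itertools import combinations


def pairwise_distances(col1, col2, compare):
    """Fill the upper triangle of D by comparing every pair of rows once."""
    rows = list(zip(col1, col2))  # one (element1, element2) tuple per row
    n = len(rows)
    D = [[0.0] * n for _ in range(n)]
    # enumerate() keeps the row indices so the result lands in D[ii][jj]
    for (ii, (a1, a2)), (jj, (b1, b2)) in combinations(enumerate(rows), 2):
        D[ii][jj] = compare(a1, b1, a2, b2)
    return D


def compare(a1, b1, a2, b2):
    # Hypothetical stand-in for the real comparison function
    return abs(a1 - b1) + abs(a2 - b2)


D = pairwise_distances([1, 4, 6], [10, 20, 40], compare)
print(D[0][1], D[0][2], D[1][2])  # prints 13 35 22
```

The argument order passed to compare matches the loop above: both element1 values first, then both element2 values.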



Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

2013-01-03 Thread David Reed
I apologize if I'm starting to sound helpless, but I'm forced to work on
Windows 7 at work and have never had luck compiling Python source
successfully.  I have had to rely on precompiled binaries and now it's
biting me in the butt.

Is there any quick fix I can do to improve this iteration using v2.4.0?
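
One version-independent idea, which nobody in this thread proposed or tested, is to read the data in blocks and compare block against block, so that only two blocks are in memory at a time and each row is read far fewer times than in the element-by-element loop. The following is a minimal sketch in Python 3. The name blockwise_pairs is made up, and read(start, stop) is a hypothetical stand-in for whatever slice read the table offers; a plain list is used here.

```python
def blockwise_pairs(read, nrows, blocksize):
    """Yield (ii, jj, row_ii, row_jj) for all ii < jj, two blocks at a time."""
    for i0 in range(0, nrows, blocksize):
        block_i = read(i0, min(i0 + blocksize, nrows))
        for j0 in range(i0, nrows, blocksize):
            # reuse the outer block on the diagonal instead of reading it again
            block_j = block_i if j0 == i0 else read(j0, min(j0 + blocksize, nrows))
            for di, row_i in enumerate(block_i):
                ii = i0 + di
                for dj, row_j in enumerate(block_j):
                    jj = j0 + dj
                    if jj > ii:  # only filters inside the diagonal block
                        yield ii, jj, row_i, row_j


values = [10, 20, 30, 40, 50, 60, 70]


def read(start, stop):
    # Hypothetical stand-in for a slice read from the table
    return values[start:stop]


pairs = [(ii, jj) for ii, jj, _, _ in blockwise_pairs(read, len(values), 3)]
print(len(pairs))  # prints 21
```

Whether this actually beats 300 comparisons per second depends on the cost of compare and on the block size, so it would need to be timed on the real data.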



Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

2013-01-03 Thread Josh Ayers
The change was in pure Python code, so you should be able to just paste in
the changes to your local copy.  Start with the table.Column.__iter__
method (lines 3296-3310) here:

https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

It needs to be modified slightly because it uses some additional features
that aren't available in the released version (the out=buf_slice argument
to table.read).  The following should work.

def __iter__(self):
    table = self.table
    itemsize = self.dtype.itemsize
    # number of rows that fit in one I/O buffer
    nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
    max_row = len(self)
    for start_row in xrange(0, max_row, nrowsinbuf):
        end_row = min([start_row + nrowsinbuf, max_row])
        # read one block of the column, then yield its rows one by one
        buf = table.read(start_row, end_row, 1, field=self.pathname)
        for row in buf:
            yield row


I haven't tested this, but I think it will work.

Josh
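
The buffering pattern in the method above can be tried without PyTables. Here is a minimal sketch in Python 3: iter_buffered is a made-up name, and read(start, stop) is a hypothetical stand-in for the table.read call, backed by a plain list that records every call. It shows the rows coming out one at a time while the underlying reads happen in blocks.

```python
def iter_buffered(read, nrows, nrowsinbuf):
    """Yield rows one at a time while reading them in blocks of nrowsinbuf."""
    for start_row in range(0, nrows, nrowsinbuf):
        end_row = min(start_row + nrowsinbuf, nrows)
        for row in read(start_row, end_row):
            yield row


values = list(range(10))
calls = []


def read(start, stop):
    # Hypothetical stand-in for a block read; records each call
    calls.append((start, stop))
    return values[start:stop]


rows = list(iter_buffered(read, len(values), 4))
print(rows == values)  # prints True
print(calls)           # prints [(0, 4), (4, 8), (8, 10)]
```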




Re: [Pytables-users] Nested Iteration of HDF5 using PyTables

2013-01-03 Thread Anthony Scopatz
Yup, that is right, thanks Josh!



Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 3

2013-01-03 Thread Anthony Scopatz
On Thu, Jan 3, 2013 at 2:17 PM, David Reed david.ree...@gmail.com wrote:

 Thanks a lot for the help so far guys!

 Looking at itertools, I found what I believe to be the perfect function
 for what I need, itertools.combinations. This appears to be a valid
 replacement for the method proposed.


Yes, combinations is awesome!



 One small problem I didn't mention: my compare function actually takes
 two columns from the table as inputs, like so:

 D = np.empty((N_elements, N_elements))
 for ii in xrange(N_elements):
     for jj in xrange(ii+1, N_elements):
         D[ii, jj] = compare(data['element1'][ii], data['element1'][jj],
                             data['element2'][ii], data['element2'][jj])

 Is there an efficient way of using itertools with this structure?


You can always make two iterators for each column.  With two columns, that
gives you four iterators.  I am not sure how fast this will be, but I am
confident there is a way to do this in a single for-loop, which should be
much faster than nested loops.
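
A minimal sketch of the single-loop idea, assuming Python 3 (where zip
replaces izip) and using plain in-memory sequences as stand-ins for the two
table columns.  The name pairwise_distances and the toy compare function are
illustrative only and are not part of PyTables.  Note that
itertools.combinations copies its input into a tuple internally, so this form
keeps every row in memory and only suits sets that fit in RAM.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(col1, col2, compare):
    """Fill the upper triangle of D with compare() over all row pairs.

    col1 and col2 are hypothetical stand-ins for the 'element1' and
    'element2' columns; any equal-length sequences will do.
    """
    n = len(col1)
    D = np.zeros((n, n))
    # Bundle each row index with its values from both columns, then let
    # combinations() produce every (ii, jj) pair with ii < jj in one loop.
    rows = zip(range(n), col1, col2)
    for (ii, a1, a2), (jj, b1, b2) in combinations(rows, 2):
        D[ii, jj] = compare(a1, b1, a2, b2)
    return D


# Toy data and a toy comparison, purely for illustration.
col1 = [1.0, 2.0, 3.0]
col2 = [10.0, 20.0, 30.0]
D = pairwise_distances(col1, col2,
                       lambda a1, b1, a2, b2: (b1 - a1) + (b2 - a2))
```

The argument order passed to compare mirrors the nested-loop version quoted
above: both values from the first column, then both from the second.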

Be Well
Anthony




 Message: 1
 Date: Thu, 3 Jan 2013 10:29:33 -0800
 From: Josh Ayers josh.ay...@gmail.com
 Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
 To: Discussion list for PyTables
 pytables-users@lists.sourceforge.net
 Message-ID:
 
 cacob4anozyd7dafos7sxs07mchzb8zbripbbrvbazrv4weq...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 David,

 The change in issue 27 was only for iteration over a tables.Column
 instance.  To use it, tweak Anthony's code as follows.  This will iterate
 over the element column, as in your original example.

 Note also that this will only work with the development version of
 PyTables available on github.  It will be very slow using the released
 v2.4.0.


 from itertools import izip

 with tb.openFile(...) as f:
     data = f.root.data.cols.element
     data_i = iter(data)
     data_j = iter(data)
     data_i.next() # throw the first value away
     for i, j in izip(data_i, data_j):
         compare(i, j)


 Hope that helps,
 Josh



 On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz scop...@gmail.com wrote:

  Hi David,
 
  Tables and table column iteration have been overhauled fairly recently
  [1].  So you might try creating two iterators, offset by one, and then
  doing the comparison.  I am hacking this out super quick so please
  forgive me:
 
  from itertools import izip
 
  with tb.openFile(...) as f:
      data = f.root.data
      data_i = iter(data)
      data_j = iter(data)
      data_i.next() # throw the first value away
      for i, j in izip(data_i, data_j):
          compare(i, j)
 
  You get the idea ;)
 
  Be Well
  Anthony
 
  1. https://github.com/PyTables/PyTables/issues/27
 
 

Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 4

2013-01-03 Thread Anthony Scopatz
Josh is right that you can just edit the code by hand (which works but
sucks).

However, on Windows -- on the rare occasion when I also have to develop on
it -- I typically use a distribution that includes a compiler, cython,
hdf5, and pytables already and then I install my development version from
github OVER this.  I recommend either EPD or Anaconda, though other
distributions listed here [1] might also work.

Be well
Anthony

1. http://numfocus.org/projects-2/software-distributions/


On Thu, Jan 3, 2013 at 3:46 PM, Josh Ayers josh.ay...@gmail.com wrote:

 The change was in pure Python code, so you should be able to just paste in
 the changes to your local copy.  Start with the table.Column.__iter__
 method (lines 3296-3310) here.


 https://github.com/PyTables/PyTables/blob/b479ed025f4636f7f4744ac83a89bc947808907c/tables/table.py

 It needs to be modified slightly because it uses some additional features
 that aren't available in the released version (the out=buf_slice argument
 to table.read).  The following should work.

 def __iter__(self):
     table = self.table
     itemsize = self.dtype.itemsize
     nrowsinbuf = table._v_file.params['IO_BUFFER_SIZE'] // itemsize
     max_row = len(self)
     for start_row in xrange(0, len(self), nrowsinbuf):
         end_row = min([start_row + nrowsinbuf, max_row])
         buf = table.read(start_row, end_row, 1, field=self.pathname)
         for row in buf:
             yield row


 I haven't tested this, but I think it will work.
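
 The buffering pattern in the method above can be sketched outside PyTables
 as a standalone generator.  This is only an illustration, assuming Python 3
 and any sequence that supports len() and slicing; iter_in_chunks and
 chunk_rows are hypothetical names, and a plain list stands in for the
 on-disk column.

```python
def iter_in_chunks(seq, chunk_rows):
    """Yield the items of seq one at a time, reading chunk_rows per slice.

    Each slice plays the role of one bulk read, so a slow per-item lookup
    is replaced by roughly len(seq) / chunk_rows bulk reads.
    """
    n = len(seq)
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        buf = seq[start:stop]  # one bulk read per chunk
        for item in buf:
            yield item


# A plain list stands in for the column here.
items = list(iter_in_chunks(list(range(10)), 4))
```

 The chunk size trades memory for fewer reads; the method above derives it
 from the file's IO_BUFFER_SIZE parameter divided by the item size.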

 Josh



 On Thu, Jan 3, 2013 at 1:25 PM, David Reed david.ree...@gmail.com wrote:

 I apologize if I'm starting to sound helpless, but I'm forced to work on
 Windows 7 at work and have never had luck compiling Python source
 successfully.  I have had to rely on precompiled binaries and now it's
 biting me in the butt.

 Is there any quick fix I can do to improve this iteration using v2.4.0?


 Message: 1
 Date: Thu, 3 Jan 2013 13:44:29 -0500
 From: David Reed david.ree...@gmail.com
 Subject: Re: [Pytables-users] Pytables-users Digest, Vol 80, Issue 2
 To: pytables-users@lists.sourceforge.net
 Message-ID:
 CAM6XA7=8ocg5WPD4KLSvLhSw-3BCvq5u7MRxq3Ajd6ha=
 ev...@mail.gmail.com
 Content-Type: text/plain; charset=iso-8859-1

 Thanks Anthony, but unless I'm missing something I don't think that method
 will work, since it will only compare the ith element with the (i+1)th
 element.  I still need 2 for loops, right?
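
 The difference can be seen on a toy list.  This is an illustrative sketch
 only, assuming Python 3 (zip and next() in place of izip and .next()) and a
 made-up four-element list: two iterators offset by one produce N-1 adjacent
 pairs, while itertools.combinations produces all N(N-1)/2 pairs.

```python
from itertools import combinations

values = ["a", "b", "c", "d"]  # toy stand-in for the table rows

# Two iterators over the same data, offset by one: adjacent pairs only.
it_i = iter(values)
it_j = iter(values)
next(it_i)  # throw the first value away
adjacent = list(zip(it_i, it_j))

# Every unordered pair, generated by a single iterator.
all_pairs = list(combinations(values, 2))
```

 Here adjacent holds 3 pairs and all_pairs holds 6, matching N-1 and
 N(N-1)/2 for N = 4.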

 Using itertools might speed things up though, I've never used them so I
 will give it a shot and let you know how it goes.  Looks like I need to
 download the latest release before I do that too.  Thanks for the help.

 -Dave



  Message: 1
  Date: Thu, 3 Jan 2013 11:11:47 -0600
  From: Anthony Scopatz scop...@gmail.com
  Subject: Re: [Pytables-users] Nested Iteration of HDF5 using PyTables
  To: Discussion list for PyTables
  pytables-users@lists.sourceforge.net
  Message-ID:
  CAPk-6T5b=
  1egagp4+jhjcd3_4fnvbxrob2jbhay45rwdqzy...@mail.gmail.com
  Content-Type: text/plain; charset=iso-8859-1
 
  Hi David,
 
  Tables and table column iteration have been overhauled fairly recently
  [1].  So you might try creating two iterators, offset by one, and then
  doing the comparison.  I am hacking this out