Yup, that is right, thanks Josh!
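
For anyone finding this thread later, here is a minimal sketch of the
offset-iterator pattern from Josh's message quoted below. It is written for
Python 3 (zip and next() in place of izip and .next()) and runs on a plain
list instead of a PyTables column, so it needs no HDF5 file. The helper
names offset_pairs and all_pairs are hypothetical and not part of PyTables.
Note that two iterators offset by one only ever visit neighbouring
elements. Covering all N(N-1)/2 pairs from the original question still
needs a nested loop or something like itertools.combinations.

```python
from itertools import combinations


def offset_pairs(values):
    """Yield (values[k+1], values[k]) using two iterators offset by one.

    Assumes `values` is re-iterable (a list, or a PyTables column), so that
    each iter() call returns an independent iterator.
    """
    data_i = iter(values)
    data_j = iter(values)
    next(data_i, None)  # throw the first value away
    return zip(data_i, data_j)


def all_pairs(values):
    """Yield every unordered pair exactly once: N(N-1)/2 pairs for N values."""
    return combinations(values, 2)


sample = [10, 20, 30, 40]
adjacent = list(offset_pairs(sample))   # neighbouring elements only
everything = list(all_pairs(sample))    # every pair, as in the nested loops
```

With four values, the first helper produces three neighbouring pairs while
the second produces all six, which is the count the nested xrange loops in
the original question cover.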
On Thu, Jan 3, 2013 at 12:29 PM, Josh Ayers <josh.ay...@gmail.com> wrote:
> David,
>
> The change in issue 27 was only for iteration over a tables.Column
> instance. To use it, tweak Anthony's code as follows. This will iterate
> over the "element" column, as in your original example.
>
> Note also that this will only work with the development version of
> PyTables available on GitHub. It will be very slow using the released
> v2.4.0.
>
>
> from itertools import izip
>
> import tables as tb
>
> with tb.openFile(...) as f:
>     data = f.root.data.cols.element  # iterate over the column, not the table
>     data_i = iter(data)
>     data_j = iter(data)
>     data_i.next() # throw the first value away
>     for i, j in izip(data_i, data_j):
>         compare(i, j)
>
>
> Hope that helps,
> Josh
>
>
>
> On Thu, Jan 3, 2013 at 9:11 AM, Anthony Scopatz <scop...@gmail.com> wrote:
>
>> Hi David,
>>
>> Tables and table column iteration have been overhauled fairly recently
>> [1]. So you might try creating two iterators, offset by one, and then
>> doing the comparison. I am hacking this out super quick so please forgive
>> me:
>>
>> from itertools import izip
>>
>> import tables as tb
>>
>> with tb.openFile(...) as f:
>>     data = f.root.data
>>     data_i = iter(data)
>>     data_j = iter(data)
>>     data_i.next() # throw the first value away
>>     for i, j in izip(data_i, data_j):
>>         compare(i, j)
>>
>> You get the idea ;)
>>
>> Be Well
>> Anthony
>>
>> 1. https://github.com/PyTables/PyTables/issues/27
>>
>>
>> On Thu, Jan 3, 2013 at 9:25 AM, David Reed <david.ree...@gmail.com> wrote:
>>
>>> I was hoping someone could help me out here.
>>>
>>> This is from a post I put up on StackOverflow,
>>>
>>> I have a fairly large dataset that I store in HDF5 and access using
>>> PyTables. One operation I need to perform on this dataset is a pairwise
>>> comparison between each of the elements. This requires two loops: an
>>> outer loop over each element, and an inner loop over every other
>>> element. The operation thus performs N(N-1)/2 comparisons.
>>>
>>> For fairly small sets I found it faster to dump the contents into a
>>> multidimensional numpy array and then do my iteration. With large sets
>>> I run into memory problems, so I need to access each element of the
>>> dataset at run time.
>>>
>>> Putting the elements into an array gives me about 600 comparisons per
>>> second, while operating on the HDF5 data itself gives me about 300
>>> comparisons per second.
>>>
>>> Is there a way to speed this process up?
>>>
>>> Example follows (this is not my real code, just an example):
>>>
>>> *Small Set*:
>>>
>>>
>>>
>>> import numpy as np
>>> import tables as tb
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>     elements = np.empty((N_elements, int(1e5)))
>>>
>>>     # copy every row's 'element' field into memory
>>>     for ii, d in enumerate(data):
>>>         elements[ii] = d['element']
>>>
>>> # compare each pair once (upper triangle only)
>>> D = np.empty((N_elements, N_elements))
>>> for ii in xrange(N_elements):
>>>     for jj in xrange(ii+1, N_elements):
>>>         D[ii, jj] = compare(elements[ii], elements[jj])
>>>
>>> *Large Set*:
>>>
>>>
>>>
>>> with tb.openFile(h5_file, 'r') as f:
>>>     data = f.root.data
>>>
>>>     N_elements = len(data)
>>>
>>>     # read both elements from the file for every comparison
>>>     D = np.empty((N_elements, N_elements))
>>>     for ii in xrange(N_elements):
>>>         for jj in xrange(ii+1, N_elements):
>>>             D[ii, jj] = compare(data['element'][ii], data['element'][jj])
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>>> MVPs and experts. ON SALE this month only -- learn more at:
>>> http://p.sf.net/sfu/learnmore_122712
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pytables-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>>
>>>