Hey Bartosz,

On Wednesday 02 February 2011 09:37:14, Bartosz Telenczuk wrote:
> Hi,
> 
> I am trying to implement efficient out-of-memory computations on
> large arrays. I have two questions:
> 
> 1) My data is stored in binary files, which I read using
> numpy.memmap. Is there a way to efficiently copy from memmap to
> CArray without reading all data into memory first? I suppose I could
> iterate over chunks, but then I would need to optimize the
> chunksizes.

Yes, just load the data in chunks.  For example, let's say that your 
array is two-dimensional; I think something like this should work:

import tables

h5file = tables.openFile(...)  # createCArray is a method of the File object
carray = h5file.createCArray(...)
for i, row in enumerate(your_memmap_array):
    carray[i] = row

Maybe using EArrays would be slightly simpler:

earray = h5file.createEArray(...)
for row in your_memmap_array:
    earray.append(row[None])  # append() wants the enlargeable dimension first

For other dimensionalities you will have to choose an appropriate chunk 
of rows to copy on each iteration, but the examples above illustrate 
the idea.
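If copying row by row turns out to be slow, you can also copy several 
rows at a time, since slicing works on both the memmap and the CArray.  
A minimal sketch, assuming a 2-D float64 array (the file names, shape 
and block size here are made up for illustration):

import numpy as np
import tables

# Hypothetical file names, dtype and shape, just for illustration.
mm = np.memmap('data.bin', dtype='float64', mode='r',
               shape=(1000000, 64))
h5file = tables.openFile('data.h5', 'w')
carray = h5file.createCArray(h5file.root, 'data',
                             tables.Float64Atom(), shape=mm.shape)

blocksize = 1000  # ideally a multiple of carray.chunkshape[0]
for start in range(0, mm.shape[0], blocksize):
    stop = min(start + blocksize, mm.shape[0])
    carray[start:stop] = mm[start:stop]  # copies one block, not the whole array

h5file.close()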

> 2) In the data I want to find threshold crossings. In numpy I usually
> do it using nonzero function:
> 
> import numpy as np
> a = np.random.randn(100)
> T = 0
> i, = np.nonzero((a[:-1]<T) & (a[1:]>T))
> 
> How can I implement it with tables.Expr?

You can't.  tables.Expr only supports the same expressions as numexpr, 
and unfortunately, that does not include indexing variables in the 
middle of expressions (as in your example).
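However, you can still get the same result out of core by reading the 
array in blocks with plain NumPy and applying your nonzero() expression 
to each block, reading one extra sample per block so that crossings 
sitting right at a block boundary are not missed.  A minimal sketch 
(the node name 'data' and the block size are hypothetical):

import numpy as np
import tables

h5file = tables.openFile('data.h5', 'r')
a = h5file.root.data  # a 1-D CArray/EArray; node name is made up

T = 0
blocksize = 100000
crossings = []
for start in range(0, a.shape[0] - 1, blocksize):
    # read blocksize + 1 samples so boundary pairs are covered exactly once
    block = a[start:min(start + blocksize + 1, a.shape[0])]
    idx, = np.nonzero((block[:-1] < T) & (block[1:] > T))
    crossings.extend(start + idx)  # shift block-local indices to global ones

h5file.close()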

Hope this helps!

-- 
Francesc Alted
