Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-25 Thread Daπid
On 24 January 2014 23:09, Dinesh Vadhia dineshbvad...@hotmail.com wrote:

  Francesc: Thanks. I looked at numexpr a few years back but it didn't
 support array slicing/indexing.  Has that changed?


No, but you can do it yourself.

import numpy as np
import numexpr as ne
big_array = np.empty(2**20)   # any reasonably large array; the exact size is illustrative
piece = big_array[30:-50]     # a view onto big_array, not a copy
ne.evaluate('sqrt(piece)')

Here, creating piece does not increase memory use, as slicing shares the
original data (well, actually, it adds a mere 80 bytes, the overhead of an
array).
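If I remember correctly, evaluate can also write its result into an existing array
via the out= argument, so no new output array is allocated at all (worth
double-checking against your numexpr version):

# Write the result back into the view; big_array is modified in place.
ne.evaluate('sqrt(piece)', out=piece)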


[Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Dinesh Vadhia
I want to write a general exception handler that warns if too much data is being
loaded for the RAM size of a machine for a numpy array operation to complete
successfully.  For example, the program multiplies two floating-point arrays A and B
which are populated with loadtxt.  While the data is being loaded, I want to
continuously check that the data volume doesn't pass a threshold that would cause an
out-of-memory error during the A*B operation.  The known variables are the amount of
memory available, the data type (floats in this case) and the numpy array operation
to be performed.  It seems this requires knowledge of the internal memory
requirements of each numpy operation.  For the sake of simplicity, other memory needs
of the program can be ignored.  Is this possible?
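A rough pre-check along these lines (just a sketch: psutil is a third-party package,
not part of numpy, and the helper name and safety factor are made up for
illustration) could compare an estimate of the bytes required against the RAM
currently available:

import numpy as np
import psutil   # third-party; used here only to ask the OS how much RAM is free

def enough_memory_for_product(a, b, safety=1.5):
    # Elementwise A*B needs roughly the bytes of the two inputs plus the result;
    # 'safety' pads for whatever temporaries the operation might need internally.
    result_bytes = np.broadcast(a, b).size * np.result_type(a, b).itemsize
    needed = (a.nbytes + b.nbytes + result_bytes) * safety
    return needed < psutil.virtual_memory().available

This only checks at a single point in time, and the right safety factor depends on
which operations need how much temporary storage -- which is exactly the question.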


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Nathaniel Smith
There is no reliable way to predict how much memory an arbitrary numpy
operation will need, no. However, in most cases the main memory cost will
be simply the need to store the input and output arrays; for large arrays,
all other allocations should be negligible.

The most effective way to avoid running out of memory, therefore, is to
avoid creating temporary arrays, by using only in-place operations.

E.g., if a and b each require N bytes of RAM, then the memory requirements
are (roughly):

c = a + b: 3N
c = a + 2*b: 4N
a += b: 2N
np.add(a, b, out=a): 2N
b *= 2; a += b: 2N

Note that simply loading a and b requires 2N memory, so the latter code
samples are near-optimal.
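As a concrete sketch of the same idea (array sizes here are only illustrative):

import numpy as np

n = 10**7            # illustrative element count
a = np.ones(n)
b = np.ones(n)

# c = a + 2*b allocates a temporary for 2*b plus a new array for c:
# roughly four array-sized buffers alive at once (a, b, 2*b, c).
# The in-place version below peaks at two (just a and b):
b *= 2               # reuse b's buffer to hold 2*b
a += b               # accumulate into a; the final result lives in a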

Of course some calculations do require the use of temporary storage space...

-n
On 24 Jan 2014 15:19, Dinesh Vadhia dineshbvad...@hotmail.com wrote:

 [snip: the original question, quoted in full above]


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Francesc Alted

Yeah, numexpr is pretty cool for avoiding temporaries in an easy way:

https://github.com/pydata/numexpr
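For example (a quick sketch with made-up arrays), numexpr evaluates the whole
expression chunk-wise in compiled code, so no full-size temporary for 2*b is
ever materialized:

import numpy as np
import numexpr as ne

a = np.ones(10**7)
b = np.ones(10**7)

# Peak memory is roughly a, b and the result c; the 2*b intermediate only
# ever exists one small chunk at a time.
c = ne.evaluate('a + 2*b')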

Francesc

On 24/01/14 16:30, Nathaniel Smith wrote:


 [snip: Nathaniel's reply, quoted in full above]



--
Francesc Alted



Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Chris Barker - NOAA Federal
c = a + b: 3N
c = a + 2*b: 4N

Does Python garbage collect mid-expression? I.e.:

C = (a + 2*b) + b

4N or 5N?

Also note that when memory gets tight, fragmentation can be a problem. I.e.
if two size-n arrays were just freed, you still may not be able to
allocate a size-2n array. This seems to be worse on Windows, not sure why.

a += b: 2N
np.add(a, b, out=a): 2N
b *= 2; a += b: 2N

Note that simply loading a and b requires 2N memory, so the latter code
samples are near-optimal.

And will run quite a bit faster for large arrays--pushing that memory
around takes time.
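A quick and unscientific way to see that (sizes illustrative; timings will vary
by machine):

import time
import numpy as np

a = np.ones(10**7)
b = np.ones(10**7)

t0 = time.perf_counter()
c = a + 2*b                  # allocates a temporary for 2*b and a new array c
t1 = time.perf_counter()
b *= 2; a += b               # same arithmetic, done in place
t2 = time.perf_counter()
print('temporaries: %.3f s   in-place: %.3f s' % (t1 - t0, t2 - t1))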

-Chris


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Nathaniel Smith
On 24 Jan 2014 15:57, Chris Barker - NOAA Federal chris.bar...@noaa.gov
wrote:


 c = a + b: 3N
 c = a + 2*b: 4N

 Does Python garbage collect mid-expression? I.e.:

 C = (a + 2*b) + b

 4N or 5N?

It should be collected as soon as the reference gets dropped, so 4N. (This
is the advantage of a greedy refcounting collector.)
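One way to check this empirically (a sketch; it assumes a numpy recent enough to
report its buffer allocations to tracemalloc):

import numpy as np
import tracemalloc

n = 10**7
a = np.ones(n)
b = np.ones(n)

tracemalloc.start()
C = (a + 2*b) + b            # the 2*b temporary is freed mid-expression
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
# The peak only counts allocations made after start(), i.e. the temporaries
# plus C -- about two array-sized buffers, so ~4N once a and b are included.
print('peak / size of one array: %.1f' % (peak / a.nbytes))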

 Also note that when memory gets tight, fragmentation can be a problem.
I.e. if two size-n arrays were just freed, you still may not be able to
allocate a size-2n array. This seems to be worse on Windows, not sure why.

If your arrays are big enough that you're worried that making a stray copy
will ENOMEM, then you *shouldn't* have to worry about fragmentation -
malloc will give each array its own virtual mapping, which can be backed by
discontinuous physical memory. (I guess it's possible windows has a somehow
shoddy VM system and this isn't true, but that seems unlikely these days?)

Memory fragmentation is more a problem if you're allocating lots of small
objects of varying sizes.

On 32 bit, virtual address fragmentation could also be a problem, but if
you're working with giant data sets then you need 64 bits anyway :-).

-n


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Dinesh Vadhia
So, with the example case, the approximate memory cost for an in-place 
operation would be:

A *= B : 2N

But if the original A or B is to remain unchanged, then it will be:

C = A * B : 3N ?



Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Nathaniel Smith
Yes.
On 24 Jan 2014 17:19, Dinesh Vadhia dineshbvad...@hotmail.com wrote:

  So, with the example case, the approximate memory cost for an in-place
 operation would be:

 A *= B : 2N

 But if the original A or B is to remain unchanged, then it will be:

 C = A * B : 3N ?







Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Dinesh Vadhia
Francesc: Thanks. I looked at numexpr a few years back but it didn't support 
array slicing/indexing.  Has that changed?




Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Chris Barker
On Fri, Jan 24, 2014 at 8:25 AM, Nathaniel Smith n...@pobox.com wrote:

 If your arrays are big enough that you're worried that making a stray copy
 will ENOMEM, then you *shouldn't* have to worry about fragmentation -
 malloc will give each array its own virtual mapping, which can be backed by
 discontinuous physical memory. (I guess it's possible windows has a somehow
 shoddy VM system and this isn't true, but that seems unlikely these days?)

All I know is that when I push the limits with memory on a 32-bit Windows
system, it often crashes out when I've never seen more than about 1GB
of memory use by the application -- I would have thought that would
be plenty of overhead.

I also know that I've reached limits on 32-bit Windows well before 32-bit OS X,
but that may be because, IIUC, 32-bit Windows only allows 2GB per process,
whereas 32-bit OS X allows 4GB per process.

Memory fragmentation is more a problem if you're allocating lots of small
 objects of varying sizes.

It could be that's what I've been doing.

On 32 bit, virtual address fragmentation could also be a problem, but if
 you're working with giant data sets then you need 64 bits anyway :-).

Well, "giant" is defined relative to the system capabilities... but yes, if
you're pushing the limits of a 32-bit system, the easiest thing to do is
go to 64 bits and add some more memory!

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Nathaniel Smith
On Fri, Jan 24, 2014 at 10:29 PM, Chris Barker chris.bar...@noaa.gov wrote:
 [snip]

 Well, "giant" is defined relative to the system capabilities... but yes, if
 you're pushing the limits of a 32-bit system, the easiest thing to do is
 go to 64 bits and add some more memory!

Oh, yeah, common confusion. Allowing 2 GiB of address space per
process doesn't mean you can actually practically use 2 GiB of
*memory* per process, esp. if you're allocating/deallocating a mix of
large and small objects, because address space fragmentation will kill
you way before that. The memory is there; there just isn't anywhere to slot
it into the process's address space. So you don't need to add more
memory, just switch to a 64-bit OS.

On 64-bit you have oodles of address space, so the memory manager can
easily slot in large objects far away from small objects, and it's
only fragmentation within each small-object arena that hurts. A good
malloc will keep this overhead down pretty low though -- certainly
less than the factor of two you're thinking about.

-n