Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Francesc Alted
On Sunday 07 March 2010 20:03:21, Gael Varoquaux wrote:
 On Sun, Mar 07, 2010 at 07:00:03PM +, René Dudfield wrote:
  1. Mmap'd files are useful since you can reuse disk cache as program
  memory.  So large files don't waste ram on the disk cache.
 
 I second that. mmapping has worked very well for me for large datasets,
 especially in the context of reducing memory pressure.

As far as I know, memmap files (or better, the underlying OS) *use* all 
available RAM for loading data until RAM is exhausted and then start to use 
SWAP, so the memory pressure is still there.  But I may be wrong...

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Gael Varoquaux
On Thu, Mar 11, 2010 at 10:04:36AM +0100, Francesc Alted wrote:
 As far as I know, memmap files (or better, the underlying OS) *use* all 
 available RAM for loading data until RAM is exhausted and then start to use 
 SWAP, so the memory pressure is still there.  But I may be wrong...

I believe that your above assertion is 'half' right. First, I think that
it is not SWAP that the memmapped file uses, but the original disk space,
thus you avoid running out of SWAP. Second, if you open the same data
several times without memmapping, I believe that it will be duplicated in
memory. On the other hand, when you memmap it, it is not duplicated, thus
if you are running several processing jobs on the same data, you save
memory. I am very much in this case.
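
A minimal sketch of this pattern (illustrative only; the file name, dtype,
shape and chunking are made up): each worker re-opens the same file
read-only, so the OS backs every mapping with the same physical pages.

import numpy as np
from multiprocessing import Pool

FNAME = 'shared_data.dat'          # hypothetical data file
SHAPE = (1000, 1000)

def block_mean(rows):
    # Each worker maps the same file read-only; nothing is duplicated
    # per process because all mappings share the OS page cache.
    start, stop = rows
    data = np.memmap(FNAME, dtype=np.float64, mode='r', shape=SHAPE)
    return data[start:stop].mean()

if __name__ == '__main__':
    np.random.rand(*SHAPE).tofile(FNAME)   # create the file once
    pool = Pool(processes=4)
    chunks = [(i, i + 250) for i in range(0, SHAPE[0], 250)]
    results = pool.map(block_mean, chunks)
    pool.close()
    print(results)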

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling routines from a Fortran library using python

2010-03-11 Thread Nils Wagner
On Mon, 22 Feb 2010 22:18:23 +0900
  David Cournapeau courn...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:01 PM, Nils Wagner
 nwag...@iam.uni-stuttgart.de wrote:
 

 ar x test.a
 gfortran -shared *.o -o libtest.so -lg2c

 to build a shared library. The additional option -lg2c 
was
 necessary due to an undefined symbol: s_cmp
 
 You should avoid the -lg2c option at any cost if 
compiling with
 gfortran. I am afraid that you got a library compiled 
with g77. If
 that's the case, you should use g77 and not gfortran. 
You cannot mix
 libraries built with one with libraries with another.
 

 Now I am able to load the shared library

 from ctypes import *
 my_lib = CDLL('test.so')

 What are the next steps to use the library functions
 within python ?
 
 You use it as you would use a C library:
 
 http://python.net/crew/theller/ctypes/tutorial.html
 
 But the fortran ABI, at least for code built with g77 
and gfortran,
 pass everything by reference. To make sure to pass the 
right
 arguments, I strongly suggest to double check with the 
.h you
 received.
 
 cheers,
 
 David
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi all,

I tried to run the following script.
The result is a segmentation fault.
Did I use byref correctly ?

from ctypes import *
my_dsio = CDLL('libdsio20_gnu4.so')  # loading dynamic 
link libraries
#
# FORTRAN : CALL DSIO(JUNCAT,FDSCAT,IERR)
# 
# int 
I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
# WordBUF[100],HEAD[30];
# char*PATH,*STRING;
# char*PGNAME,*DATE,*TIME,*TEXT;
# int LHEAD=30;
#
# C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
#


IERR= c_int()
FDSCAT  = c_char_p('dscat.ds')
JUNCAT  = c_int()
LDSNCAT = c_int(len(FDSCAT.value))
print
print 'LDSNCAT', LDSNCAT.value
print 'FDSCAT' , FDSCAT.value  , len(FDSCAT.value)

my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT)) 
# segmentation fault
print IERR.value


Any idea ?

  Nils
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling routines from a Fortran library using python

2010-03-11 Thread Dag Sverre Seljebotn
Nils Wagner wrote:
 On Mon, 22 Feb 2010 22:18:23 +0900
   David Cournapeau courn...@gmail.com wrote:
   
 On Mon, Feb 22, 2010 at 10:01 PM, Nils Wagner
 nwag...@iam.uni-stuttgart.de wrote:

 
 ar x test.a
 gfortran -shared *.o -o libtest.so -lg2c

 to build a shared library. The additional option -lg2c 
 was
 necessary due to an undefined symbol: s_cmp
   
 You should avoid the -lg2c option at any cost if 
 compiling with
 gfortran. I am afraid that you got a library compiled 
 with g77. If
 that's the case, you should use g77 and not gfortran. 
 You cannot mix
 libraries built with one with libraries with another.

 
 Now I am able to load the shared library

 from ctypes import *
 my_lib = CDLL('test.so')

 What are the next steps to use the library functions
 within python ?
   
 You use it as you would use a C library:

 http://python.net/crew/theller/ctypes/tutorial.html

 But the fortran ABI, at least for code built with g77 
 and gfortran,
 pass everything by reference. To make sure to pass the 
 right
 arguments, I strongly suggest to double check with the 
 .h you
 received.

 cheers,

 David
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 

 Hi all,

 I tried to run the following script.
 The result is a segmentation fault.
 Did I use byref correctly ?

 from ctypes import *
 my_dsio = CDLL('libdsio20_gnu4.so')  # loading dynamic 
 link libraries
 #
 # FORTRAN : CALL DSIO(JUNCAT,FDSCAT,IERR)
 # 
 # int 
 I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
 # WordBUF[100],HEAD[30];
 # char*PATH,*STRING;
 # char*PGNAME,*DATE,*TIME,*TEXT;
 # int LHEAD=30;
 #
 # C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
 #


 IERR= c_int()
 FDSCAT  = c_char_p('dscat.ds')
 JUNCAT  = c_int()
 LDSNCAT = c_int(len(FDSCAT.value))
 print
 print 'LDSNCAT', LDSNCAT.value
 print 'FDSCAT' , FDSCAT.value  , len(FDSCAT.value)

 my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT)) 
 # segmentation fault
 print IERR.value


 Any idea ?
   
You shouldn't have byref on FDSCAT nor LDSNCAT, as explained by this line:

# C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));

Dag Sverre
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Nadav Horesh
Here is a strange thing I am getting with multiprocessing and memory mapped 
array:

The script below generates the error message 30 times (once for every slice access):

Exception AttributeError: AttributeError("'NoneType' object has no attribute
'tell'",) in <bound method memmap.__del__ of memmap(2949995000.0)> ignored


Although I get the correct answer eventually.
--
import numpy as N
import multiprocessing as MP

def average(cube):
return [plane.mean() for plane in cube]

N.arange(30*100*100, dtype=N.int32).tofile(open('30x100x100_int32.dat','w'))

data = N.memmap('30x100x100_int32.dat', dtype=N.int32, shape=(30,100,100))

pool = MP.Pool(processes=1)

job = pool.apply_async(average, [data,])
print job.get()

--
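
A workaround sketch, assuming the warning comes from the memmap object being
pickled into the worker process: pass the file name and layout instead, and
re-open the memmap in the child:

import numpy as N
import multiprocessing as MP

def average_from_file(args):
    fname, dtype, shape = args
    cube = N.memmap(fname, dtype=dtype, shape=shape, mode='r')
    return [plane.mean() for plane in cube]

if __name__ == '__main__':
    pool = MP.Pool(processes=1)
    job = pool.apply_async(average_from_file,
                           [('30x100x100_int32.dat', N.int32, (30, 100, 100))])
    print(job.get())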

I use python 2.6.4 and numpy 1.4.0 on 64 bit linux (amd64)

  Nadav


-Original Message-
From: numpy-discussion-boun...@scipy.org on behalf of Gael Varoquaux
Sent: Thu 11-Mar-10 11:36
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] multiprocessing shared arrays and numpy
 
On Thu, Mar 11, 2010 at 10:04:36AM +0100, Francesc Alted wrote:
 As far as I know, memmap files (or better, the underlying OS) *use* all 
 available RAM for loading data until RAM is exhausted and then start to use 
 SWAP, so the memory pressure is still there.  But I may be wrong...

I believe that your above assertion is 'half' right. First, I think that
it is not SWAP that the memmapped file uses, but the original disk space,
thus you avoid running out of SWAP. Second, if you open the same data
several times without memmapping, I believe that it will be duplicated in
memory. On the other hand, when you memmap it, it is not duplicated, thus
if you are running several processing jobs on the same data, you save
memory. I am very much in this case.

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling routines from a Fortran library using python

2010-03-11 Thread Dag Sverre Seljebotn
Nils Wagner wrote:
 On Thu, 11 Mar 2010 13:01:33 +0100
   Dag Sverre Seljebotn da...@student.matnat.uio.no 
 wrote:
   
 Nils Wagner wrote:
 
 On Mon, 22 Feb 2010 22:18:23 +0900
   David Cournapeau courn...@gmail.com wrote:
   
   
 On Mon, Feb 22, 2010 at 10:01 PM, Nils Wagner
 nwag...@iam.uni-stuttgart.de wrote:

 
 
 ar x test.a
 gfortran -shared *.o -o libtest.so -lg2c

 to build a shared library. The additional option -lg2c 
 was
 necessary due to an undefined symbol: s_cmp
   
   
 You should avoid the -lg2c option at any cost if 
 compiling with
 gfortran. I am afraid that you got a library compiled 
 with g77. If
 that's the case, you should use g77 and not gfortran. 
 You cannot mix
 libraries built with one with libraries with another.

 
 
 Now I am able to load the shared library

 from ctypes import *
 my_lib = CDLL('test.so')

 What are the next steps to use the library functions
 within python ?
   
   
 You use it as you would use a C library:

 http://python.net/crew/theller/ctypes/tutorial.html

 But the fortran ABI, at least for code built with g77 
 and gfortran,
 pass everything by reference. To make sure to pass the 
 right
 arguments, I strongly suggest to double check with the 
 .h you
 received.

 cheers,

 David
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 Hi all,

 I tried to run the following script.
 The result is a segmentation fault.
 Did I use byref correctly ?

 from ctypes import *
 my_dsio = CDLL('libdsio20_gnu4.so')  # loading 
 dynamic 
 link libraries
 #
 # FORTRAN : CALL DSIO(JUNCAT,FDSCAT,IERR)
 # 
 # int 
 I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
 # WordBUF[100],HEAD[30];
 # char*PATH,*STRING;
 # char*PGNAME,*DATE,*TIME,*TEXT;
 # int LHEAD=30;
 #
 # C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
 #


 IERR= c_int()
 FDSCAT  = c_char_p('dscat.ds')
 JUNCAT  = c_int()
 LDSNCAT = c_int(len(FDSCAT.value))
 print
 print 'LDSNCAT', LDSNCAT.value
 print 'FDSCAT' , FDSCAT.value  , len(FDSCAT.value)

 my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT)) 
 # segmentation fault
 print IERR.value


 Any idea ?
   
   
 You shouldn't have byref on FDSCAT nor LDSNCAT, as 
 explained by this line:

 # C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));

 Dag Sverre
 
   

 Sorry, I am a newbie to C. What is the correct way?

   

my_dsio.dsio(byref(JUNCAT),FDSCAT,byref(IERR),LDSNCAT) 
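
For completeness, a sketch of the same call with explicit argtypes; the dsio
signature below is inferred from the C prototype quoted above, so treat it as
an assumption rather than the library's documented interface (gfortran passes
numeric arguments by reference and appends each CHARACTER argument's length
as an extra integer passed by value):

from ctypes import CDLL, POINTER, byref, c_char_p, c_int

my_dsio = CDLL('libdsio20_gnu4.so')
my_dsio.dsio.argtypes = [POINTER(c_int),   # JUNCAT, by reference
                         c_char_p,         # FDSCAT, character data
                         POINTER(c_int),   # IERR, by reference
                         c_int]            # hidden length of FDSCAT, by value
my_dsio.dsio.restype = None

JUNCAT, IERR = c_int(), c_int()
FDSCAT = 'dscat.ds'
my_dsio.dsio(byref(JUNCAT), FDSCAT, byref(IERR), len(FDSCAT))
print(IERR.value)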

Dag

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling routines from a Fortran library using python

2010-03-11 Thread Nils Wagner
On Thu, 11 Mar 2010 13:42:43 +0100
  Dag Sverre Seljebotn da...@student.matnat.uio.no 
wrote:
 Nils Wagner wrote:
 On Thu, 11 Mar 2010 13:01:33 +0100
   Dag Sverre Seljebotn da...@student.matnat.uio.no 
 wrote:
   
 Nils Wagner wrote:
 
 On Mon, 22 Feb 2010 22:18:23 +0900
   David Cournapeau courn...@gmail.com wrote:
   
   
 On Mon, Feb 22, 2010 at 10:01 PM, Nils Wagner
 nwag...@iam.uni-stuttgart.de wrote:

 
 
 ar x test.a
 gfortran -shared *.o -o libtest.so -lg2c

 to build a shared library. The additional option -lg2c 
 was
 necessary due to an undefined symbol: s_cmp
   
   
 You should avoid the -lg2c option at any cost if 
 compiling with
 gfortran. I am afraid that you got a library compiled 
 with g77. If
 that's the case, you should use g77 and not gfortran. 
 You cannot mix
 libraries built with one with libraries with another.

 
 
 Now I am able to load the shared library

 from ctypes import *
 my_lib = CDLL('test.so')

 What are the next steps to use the library functions
 within python ?
   
   
 You use it as you would use a C library:

 http://python.net/crew/theller/ctypes/tutorial.html

 But the fortran ABI, at least for code built with g77 
 and gfortran,
 pass everything by reference. To make sure to pass the 
 right
 arguments, I strongly suggest to double check with the 
 .h you
 received.

 cheers,

 David
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 Hi all,

 I tried to run the following script.
 The result is a segmentation fault.
 Did I use byref correctly ?

 from ctypes import *
 my_dsio = CDLL('libdsio20_gnu4.so')  # loading 
 dynamic 
 link libraries
 #
 # FORTRAN : CALL DSIO(JUNCAT,FDSCAT,IERR)
 # 
 # int 
 I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
 # WordBUF[100],HEAD[30];
 # char*PATH,*STRING;
 # char*PGNAME,*DATE,*TIME,*TEXT;
 # int LHEAD=30;
 #
 # C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));
 #


 IERR= c_int()
 FDSCAT  = c_char_p('dscat.ds')
 JUNCAT  = c_int()
 LDSNCAT = c_int(len(FDSCAT.value))
 print
 print 'LDSNCAT', LDSNCAT.value
 print 'FDSCAT' , FDSCAT.value  , len(FDSCAT.value)

 my_dsio.dsio(byref(JUNCAT),byref(FDSCAT),byref(IERR),byref(LDSNCAT)) 
 # segmentation fault
 print IERR.value


 Any idea ?
   
   
 You shouldn't have byref on FDSCAT nor LDSNCAT, as 
 explained by this line:

 # C   : DSIO(JUNCAT,FDSCAT,IERR,strlen(FDSCAT));

 Dag Sverre
 
   

 Sorry, I am a newbie to C. What is the correct way?

   
 
 my_dsio.dsio(byref(JUNCAT),FDSCAT,byref(IERR),LDSNCAT) 
 
 Dag


Great. It works like a charm.
How can I translate the following C code into Python?
I don't know how to handle HEAD and memcpy.
Any pointer would be appreciated.

Thanks in advance.


   typedef union {
       int     i;
       float   f;
       char    c[4];
   } Word;

   int    I,J,K,N,IDE,IA,IE,IERR,JUNIT,JUNCAT,NDATA,NREC,LREADY,ONE=1;
   Word   BUF[100],HEAD[30];

   for (I=5;I<LHEAD;I++)
       HEAD[I].i = 0;
   HEAD[ 0].i =   1;
   HEAD[ 1].i =  LHEAD + NDATA*7;
   HEAD[ 2].i =  LHEAD;
   HEAD[ 3].i =  NDATA;
   HEAD[ 4].i =   7;
   memcpy (HEAD[ 7].c,"DSIO",4);
   memcpy (HEAD[ 8].c,"TEST",4);
   memcpy (HEAD[ 9].c,"NPCO",4);
   memcpy (HEAD[10].c,,4);
   memcpy (HEAD[11].c,"DSIO",4);
   HEAD[20].i =   1;
   HEAD[21].i =  NDATA;
   STRING = "MM  RAD";
   DSEINH(STRING,&HEAD[24].i,&ONE,strlen(STRING));
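
One possible way to express the union and the memcpy calls with ctypes, as an
illustrative sketch only (LHEAD comes from the comments above; NDATA is a
placeholder value):

from ctypes import Union, c_int, c_float, c_char

class Word(Union):                  # mirrors the C "Word" union above
    _fields_ = [('i', c_int), ('f', c_float), ('c', c_char * 4)]

LHEAD = 30                          # from "int LHEAD=30;" in the comments
NDATA = 1                           # placeholder; set to the real record count
HEAD = (Word * LHEAD)()             # the equivalent of "Word HEAD[30];"

for I in range(5, LHEAD):
    HEAD[I].i = 0
HEAD[0].i = 1
HEAD[1].i = LHEAD + NDATA * 7
HEAD[2].i = LHEAD
HEAD[3].i = NDATA
HEAD[4].i = 7
HEAD[7].c = 'DSIO'                  # assigning to a c_char*4 field copies the
HEAD[8].c = 'TEST'                  # bytes, playing the role of memcpy()
HEAD[9].c = 'NPCO'
HEAD[20].i = 1
HEAD[21].i = NDATA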
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Francesc Alted
On Thursday 11 March 2010 10:36:42, Gael Varoquaux wrote:
 On Thu, Mar 11, 2010 at 10:04:36AM +0100, Francesc Alted wrote:
  As far as I know, memmap files (or better, the underlying OS) *use* all
  available RAM for loading data until RAM is exhausted and then start to
  use SWAP, so the memory pressure is still there.  But I may be wrong...
 
 I believe that your above assertion is 'half' right. First, I think that
 it is not SWAP that the memmapped file uses, but the original disk space,
 thus you avoid running out of SWAP. Second, if you open the same data
 several times without memmapping, I believe that it will be duplicated in
 memory. On the other hand, when you memmap it, it is not duplicated, thus
 if you are running several processing jobs on the same data, you save
 memory. I am very much in this case.

Mmh, this is not my experience.  During the past month, in a course I was 
giving, I asked the students to compare the memory consumption of numpy.memmap 
and tables.Expr (a module for performing out-of-memory computations in PyTables).  
The idea was precisely to show that, contrary to tables.Expr, numpy.memmap 
computations do take a lot of memory when the data is being accessed.

I'm attaching a slightly modified version of that exercise.  In it, one has 
to compute a polynomial over a certain range.  Here is the output of the 
script for the numpy.memmap case on a machine with 8 GB RAM and 6 GB of swap:

Total size for datasets: 7629.4 MB  

Populating x using numpy.memmap with 500000000 points...

Total file sizes: 4000000000 -- (3814.7 MB) 

*** Time elapsed populating: 70.982 

Computing: '((.25*x + .75)*x - 1.5)*x - 2' using numpy.memmap
Total file sizes: 8000000000 -- (7629.4 MB)
 Time elapsed computing: 81.727
10.08user 13.37system 2:33.26elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k
7808inputs+15625008outputs (39major+5750196minor)pagefaults 0swaps

While the computation was going on, I spied on the process with the top 
utility, which told me that the total virtual size consumed by the Python 
process was 7.9 GB, with a total *resident* memory of 6.7 GB (!).  And this 
is not just a top malfunction, because I've checked that, by the end of 
the computation, my machine started to swap some processes out (i.e. the 
working set above was too large to allow the OS to keep everything in memory).

Now, just for the sake of comparison, I've tried running the same script but 
using tables.Expr.  Here is the output:

Total size for datasets: 7629.4 MB
Populating x using tables.Expr with 500000000 points...
Total file sizes: 4000631280 -- (3815.3 MB)
*** Time elapsed populating: 78.817
Computing: '((.25*x + .75)*x - 1.5)*x - 2' using tables.Expr
Total file sizes: 8001261168 -- (7630.6 MB)
 Time elapsed computing: 155.836
13.11user 18.59system 3:58.61elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
7842784inputs+15632208outputs (28major+940347minor)pagefaults 0swaps

and top was telling me that memory consumption was 148 MB of total virtual 
size and just 44 MB resident (as expected, because the computation was really 
made using an out-of-core algorithm).

Interestingly, when using compression (Blosc level 4, in this case), the time 
to do the computation with tables.Expr is reduced a lot:

Total size for datasets: 7629.4 MB
Populating x using tables.Expr with 500000000 points...
Total file sizes: 1080130765 -- (1030.1 MB)
*** Time elapsed populating: 30.005
Computing: '((.25*x + .75)*x - 1.5)*x - 2' using tables.Expr
Total file sizes: 2415761895 -- (2303.9 MB)
 Time elapsed computing: 40.048
37.11user 6.98system 1:12.88elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
45312inputs+4720568outputs (4major+989323minor)pagefaults 0swaps

while memory consumption is about the same as above: 148 MB / 45 MB.

So, in my experience, numpy.memmap is really using that large chunk of memory 
(unless my testbed is badly programmed, in which case I'd be grateful if you 
can point out what's wrong).
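
The core of the memmap variant looks roughly like this (a simplified sketch
consistent with the parameters of the attached script, not the attachment
itself):

import numpy as np

N = 500*1000*1000          # number of points, as in the attached script
step = 100*1000            # evaluate the polynomial in blocks of this size

x = np.memmap('x.dat', dtype=np.float64, mode='w+', shape=(N,))
# ... x would be populated block by block in the same way ...
r = np.memmap('r.dat', dtype=np.float64, mode='w+', shape=(N,))
for i in range(0, N, step):
    chunk = x[i:i+step]    # only this block needs to be touched at a time
    r[i:i+step] = ((.25*chunk + .75)*chunk - 1.5)*chunk - 2
r.flush()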

-- 
Francesc Alted
###
# This script compares the speed of the computation of a polynomial
# for different (numpy.memmap and tables.Expr) out-of-memory paradigms.
#
# Author: Francesc Alted
# Date: 2010-02-03
###

import os
import sys
from time import time
import numpy as np
import tables as tb


expr = "((.25*x + .75)*x - 1.5)*x - 2"  # a computer-friendly polynomial
N = 500*1000*1000  # the number of points to compute expression
step = 100*1000   # perform calculation 

Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Gael Varoquaux
On Thu, Mar 11, 2010 at 02:26:49PM +0100, Francesc Alted wrote:
  I believe that your above assertion is 'half' right. First, I think that
  it is not SWAP that the memmapped file uses, but the original disk space,
  thus you avoid running out of SWAP. Second, if you open the same data
  several times without memmapping, I believe that it will be duplicated in
  memory. On the other hand, when you memmap it, it is not duplicated, thus
  if you are running several processing jobs on the same data, you save
  memory. I am very much in this case.

 Mmh, this is not my experience.  During the past month, in a course I was 
 giving, I asked the students to compare the memory consumption of numpy.memmap 
 and tables.Expr (a module for performing out-of-memory computations in PyTables). 

 [snip]

 So, in my experience, numpy.memmap is really using that large chunk of memory 
 (unless my testbed is badly programmed, in which case I'd be grateful if you 
 can point out what's wrong).

OK, so what you are saying is that my assertion #1 was wrong. Fair
enough, as I was writing it I was thinking that I had no hard facts to
back it up. How about assertion #2? I can think only of this 'story' to
explain why I can run parallel computations when I use memmap that blow up
if I don't use memmap.

Also, could it be that the memmap mode changes things? I use only the 'r'
mode, which is read-only.

This is all very interesting, and you have much more insight into these
problems than me. Would you be interested in coming to EuroSciPy in Paris
to give a 1- or 2-hour tutorial on memory and IO problems and how
you address them with PyTables? It would be absolutely thrilling. I must
warn you that I am afraid we won't be able to pay for your trip, though,
as I want to keep the price of the conference low.

Best,

Gaël
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multiprocessing shared arrays and numpy

2010-03-11 Thread Francesc Alted
On Thursday 11 March 2010 14:35:49, Gael Varoquaux wrote:
  So, in my experience, numpy.memmap is really using that large chunk of
  memory (unless my testbed is badly programmed, in which case I'd be
  grateful if you can point out what's wrong).
 
 OK, so what you are saying is that my assertion #1 was wrong. Fair
 enough, as I was writing it I was thinking that I had no hard facts to
 back it up. How about assertion #2? I can think only of this 'story' to
 explain why I can run parallel computations when I use memmap that blow up
 if I don't use memmap.

Well, I must say that I have no experience running memmapped arrays in 
parallel computations, but it sounds like they can actually behave as shared-
memory arrays, so yes, you may definitely be right about #2, i.e. memmapped data 
is not duplicated when accessed in parallel by different processes (in read-
only mode, of course), which is certainly a very interesting technique for 
sharing data among parallel processes.  Thanks for pointing this out!

 Also, could it be that the memmap mode changes things? I use only the 'r'
 mode, which is read-only.

I don't think so.  When doing the computation, I open the x values in read-
only mode, and memory consumption is still there.

 This is all very interesting, and you have much more insight into these
 problems than me. Would you be interested in coming to EuroSciPy in Paris
 to give a 1- or 2-hour tutorial on memory and IO problems and how
 you address them with PyTables? It would be absolutely thrilling. I must
 warn you that I am afraid we won't be able to pay for your trip, though,
 as I want to keep the price of the conference low.

Yes, no problem.  I was already thinking about presenting something at 
EuroSciPy.  A tutorial about PyTables/memory IO would be really great for me.  
We can nail down the details off-list.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] subclassing ndarray in python3

2010-03-11 Thread Darren Dale
Now that the trunk has some support for python3, I am working on
making Quantities work with python3 as well. I'm running into some
problems related to subclassing ndarray that can be illustrated with a
simple script, reproduced below. It looks like there is a problem with
the reflected operations, I see problems with __rmul__ and __radd__,
but not with __mul__ and __add__:

import numpy as np


class A(np.ndarray):
def __new__(cls, *args, **kwargs):
return np.ndarray.__new__(cls, *args, **kwargs)

class B(A):
def __mul__(self, other):
return self.view(A).__mul__(other)
def __rmul__(self, other):
return self.view(A).__rmul__(other)
def __add__(self, other):
return self.view(A).__add__(other)
def __radd__(self, other):
return self.view(A).__radd__(other)

a = A((10,))
b = B((10,))

print('A __mul__:')
print(a.__mul__(2))
# ok
print(a.view(np.ndarray).__mul__(2))
# ok
print(a*2)
# ok

print('A __rmul__:')
print(a.__rmul__(2))
# yields NotImplemented
print(a.view(np.ndarray).__rmul__(2))
# yields NotImplemented
print(2*a)
# ok !!??

print('B __mul__:')
print(b.__mul__(2))
# ok
print(b.view(A).__mul__(2))
# ok
print(b.view(np.ndarray).__mul__(2))
# ok
print(b*2)
# ok

print('B __add__:')
print(b.__add__(2))
# ok
print(b.view(A).__add__(2))
# ok
print(b.view(np.ndarray).__add__(2))
# ok
print(b+2)
# ok

print('B __rmul__:')
print(b.__rmul__(2))
# yields NotImplemented
print(b.view(A).__rmul__(2))
# yields NotImplemented
print(b.view(np.ndarray).__rmul__(2))
# yields NotImplemented
print(2*b)
# yields: TypeError: unsupported operand type(s) for *: 'int' and 'B'

print('B __radd__:')
print(b.__radd__(2))
# yields NotImplemented
print(b.view(A).__radd__(2))
# yields NotImplemented
print(b.view(np.ndarray).__radd__(2))
# yields NotImplemented
print(2+b)
# yields: TypeError: unsupported operand type(s) for +: 'int' and 'B'
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-11 Thread Johann Cohen-Tanugi
Hi there, I am adding this to this thread and not to the trac, because I 
am not sure whether it adds noise or a piece of info. I just downloaded 
the scipy trunk and built it, and ran nosetests on it, which bombed 
instantly.
So I tried to get into subdirs to check test scripts separately, and 
here is one:

[co...@jarrett tests]$ ~/.local/bin/ipython test_integrate.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)

/home/cohen/sources/python/scipy/scipy/integrate/tests/test_integrate.py
in <module>()

208
209 if __name__ == "__main__":
--> 210 run_module_suite()
211
212

/home/cohen/.local/lib/python2.6/site-packages/numpy/testing/nosetester.pyc
in run_module_suite(file_to_run)

 75 f = sys._getframe(1)
 76 file_to_run = f.f_locals.get('__file__', None)
---> 77 assert file_to_run is not None
 78
 79 import_nose().run(argv=['',file_to_run])

AssertionError:
python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.

Aborted (core dumped)
[co...@jarrett tests]$ pwd
/home/cohen/sources/python/scipy/scipy/integrate/tests

the bomb is the same, but the context seems different... I leave that to 
the experts :)

Johann

On 03/10/2010 06:06 PM, Charles R Harris wrote:



On Wed, Mar 10, 2010 at 10:39 AM, Bruce Southey bsout...@gmail.com wrote:


On 03/10/2010 08:59 AM, Pauli Virtanen wrote:
 Wed, 10 Mar 2010 15:40:04 +0100, Johann Cohen-Tanugi wrote:

 Pauli, isn't it hopeless to follow the execution of the source
code when
 the crash actually occurs when I exit, and not when I execute.
I would
 have to understand enough of this umath_tests.c.src to spot a
refcount
 error or things like that

 Yeah, it's not easy, and requires knowing how to track this type of
 errors. I didn't actually mean that you should try do it, just
posed it
 as a general challenge to all interested parties :)

 On a more serious note, maybe there's a compilation flag or
something in
 Python that warns when refcounts go negative (or something).

 Cheers,
 Pauli

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi,
I think I managed to find this. I reverted back my svn versions ($svn
update -r 8262) and cleaned both the build and installation
directories.

It occurred with changeset 8262 (earlier changesets appear okay but
later ones do not)
http://projects.scipy.org/numpy/changeset/8262

Specifically in the file:
numpy/core/code_generators/generate_ufunc_api.py

There is an extra call that should have been deleted on line 54(?):
Py_DECREF(numpy);

Attached a patch to ticket 1425
http://projects.scipy.org/numpy/ticket/1425


Looks like my bad. I'm out of town at the moment, so someone else needs 
to apply the patch. That whole bit of code could probably use a 
daylight audit.


Chuck




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subclassing ndarray in python3

2010-03-11 Thread Pauli Virtanen
Hi Darren,

On Thu, 2010-03-11 at 11:11 -0500, Darren Dale wrote:
 Now that the trunk has some support for python3, I am working on
 making Quantities work with python3 as well. I'm running into some
 problems related to subclassing ndarray that can be illustrated with a
 simple script, reproduced below. It looks like there is a problem with
 the reflected operations, I see problems with __rmul__ and __radd__,
 but not with __mul__ and __add__:

Thanks for testing. I wish the test suite was more complete (hint!
hint! :)

Yes, Python 3 introduced some semantic changes in how subclasses of
builtin classes (= written in C) inherit the __r*__ operations.

Below I'll try to explain what is going on. We probably need to change
some things to make things work better on Py3, within the bounds we are
able to.

Suggestions are welcome. The most obvious one could be to explicitly
implement __rmul__ etc. on Python 3.

[clip]
 class A(np.ndarray):
 def __new__(cls, *args, **kwargs):
 return np.ndarray.__new__(cls, *args, **kwargs)
 
 class B(A):
 def __mul__(self, other):
 return self.view(A).__mul__(other)
 def __rmul__(self, other):
 return self.view(A).__rmul__(other)
 def __add__(self, other):
 return self.view(A).__add__(other)
 def __radd__(self, other):
 return self.view(A).__radd__(other)
[clip]
 print('A __rmul__:')
 print(a.__rmul__(2))
 # yields NotImplemented
 print(a.view(np.ndarray).__rmul__(2))
 # yields NotImplemented

Correct. ndarray does not implement __rmul__, but relies on an automatic
wrapper generated by Python.

The automatic wrapper (wrap_binaryfunc_r) does the following:

1. Is `type(other)` a subclass of `type(self)`?
   If yes, call __mul__ with swapped arguments.
2. If not, bail out with NotImplemented.

So it bails out.
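
A rough Python rendering of that wrapper logic (an illustration of the
description above, not the actual CPython code):

def wrapped_rmul(self, other):
    # 1. only defer to __mul__ if `other` is an instance of a subclass
    #    of type(self)
    if isinstance(other, type(self)):
        return type(self).__mul__(other, self)   # swapped arguments
    # 2. otherwise bail out
    return NotImplemented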

Previously, the ndarray type had a flag that made Python skip the
subclass check. That does not exist any more on Python 3, and is the
root of this issue.

 print(2*a)
 # ok !!??

Here, Python checks

1. Does nb_multiply from the left op succeed? Nope, since floats don't
   know how to multiply ndarrays.

2. Does nb_multiply from the right op succeed? Here the execution
   passes *directly* to array_multiply, completely skipping the __rmul__
   wrapper.

   Note also that in the C-level number protocol there is only a single
   multiplication function for both left and right multiplication.

[clip]
 print('B __rmul__:')
 print(b.__rmul__(2))
 # yields NotImplemented
 print(b.view(A).__rmul__(2))
 # yields NotImplemented
 print(b.view(np.ndarray).__rmul__(2))
 # yields NotImplemented
 print(2*b)
 # yields: TypeError: unsupported operand type(s) for *: 'int' and 'B'

But here, the subclass calls the wrapper ndarray.__rmul__, which wants
to be careful with types, and hence fails.

Yes, probably explicitly defining __rmul__ for ndarray could be the
right solution. Please file a bug report on this.
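
In the meantime, a user-level workaround sketch for a subclass like B from
the earlier script (reusing np and A from that script; the reflected
operations are simply spelled out so they no longer depend on the skipped
subclass check):

class B(A):
    def __mul__(self, other):
        return np.multiply(self.view(A), other)
    def __rmul__(self, other):
        # explicit reflected op: 2*b no longer relies on the inherited
        # wrapper and its subclass check
        return np.multiply(other, self.view(A))
    def __add__(self, other):
        return np.add(self.view(A), other)
    def __radd__(self, other):
        return np.add(other, self.view(A))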

Cheers,
Pauli



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-11 Thread Bruce Southey

On 03/11/2010 02:01 PM, Johann Cohen-Tanugi wrote:
Hi there, I am adding this to this thread and not to the trac, because 
I am not sure whether it adds noise or a piece of info. I just 
downloaded the scipy trunk and built it, and ran nosetests on it, 
which bombed instantly.
So I tried to get into subdirs to check test scripts separately, 
and here is one:

[co...@jarrett tests]$ ~/.local/bin/ipython test_integrate.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)

/home/cohen/sources/python/scipy/scipy/integrate/tests/test_integrate.py
in <module>()

208
209 if __name__ == "__main__":
--> 210 run_module_suite()
211
212

/home/cohen/.local/lib/python2.6/site-packages/numpy/testing/nosetester.pyc
in run_module_suite(file_to_run)

 75 f = sys._getframe(1)
 76 file_to_run = f.f_locals.get('__file__', None)
---> 77 assert file_to_run is not None
 78
 79 import_nose().run(argv=['',file_to_run])

AssertionError:
python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.

Aborted (core dumped)
[co...@jarrett tests]$ pwd
/home/cohen/sources/python/scipy/scipy/integrate/tests

the bomb is the same, but the context seems different... I leave that 
to the experts :)

Johann


Yes,
I think it is the same issue as I do not have the problem after fixing 
the following file and rebuilding numpy and scipy:

numpy/core/code_generators/generate_ufunc_api.py

Bruce

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-11 Thread Johann Cohen-Tanugi

is your fix committed?


On 03/11/2010 09:47 PM, Bruce Southey wrote:

On 03/11/2010 02:01 PM, Johann Cohen-Tanugi wrote:
Hi there, I am adding this to this thread and not to the trac, 
because I am not sure whether it adds noise or a piece of info. I 
just downloaded the scipy trunk and built it, and ran nosetests on 
it, which bombed instantly.
So I tried to get into subdirs to check test scripts separately, 
and here is one:

[co...@jarrett tests]$ ~/.local/bin/ipython test_integrate.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)

/home/cohen/sources/python/scipy/scipy/integrate/tests/test_integrate.py
in <module>()

208
209 if __name__ == "__main__":
--> 210 run_module_suite()
211
212

/home/cohen/.local/lib/python2.6/site-packages/numpy/testing/nosetester.pyc
in run_module_suite(file_to_run)

 75 f = sys._getframe(1)
 76 file_to_run = f.f_locals.get('__file__', None)
---> 77 assert file_to_run is not None
 78
 79 import_nose().run(argv=['',file_to_run])

AssertionError:
python: Modules/gcmodule.c:277: visit_decref: Assertion `gc->gc.gc_refs != 0' failed.

Aborted (core dumped)
[co...@jarrett tests]$ pwd
/home/cohen/sources/python/scipy/scipy/integrate/tests

the bomb is the same, but the context seems different... I leave that 
to the experts :)

Johann


Yes,
I think it is the same issue as I do not have the problem after fixing 
the following file and rebuilding numpy and scipy:

numpy/core/code_generators/generate_ufunc_api.py

Bruce




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] subclassing ndarray in python3

2010-03-11 Thread Darren Dale
Hi Pauli,

On Thu, Mar 11, 2010 at 3:38 PM, Pauli Virtanen p...@iki.fi wrote:
 Thanks for testing. I wish the test suite was more complete (hint!
 hint! :)

I'll be happy to contribute, but lately I get a few 15-30 minute
blocks a week for this kind of work (hence the short attempt to work
on Quantities this morning), and it's not likely to let up for about 3
weeks.

 Yes, probably explicitly defining __rmul__ for ndarray could be the
 right solution. Please file a bug report on this.

Done: http://projects.scipy.org/numpy/ticket/1426

Cheers, and *thank you* for all you have already done to support python-3,
Darren
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] arange including stop value?

2010-03-11 Thread Tom K.



davefallest wrote:
 
 ...
 In [3]: np.arange(1.01, 1.1, 0.01)
 Out[3]: array([ 1.01,  1.02,  1.03,  1.04,  1.05,  1.06,  1.07,  1.08, 
 1.09,  1.1 ])
 
 Why does the ... np.arange command end up including my stop value?
 
From the help for arange:

For floating point arguments, the length of the result is
``ceil((stop - start)/step)``.  Because of floating point overflow,
this rule may result in the last element of `out` being greater
than `stop`.

-- 
View this message in context: 
http://old.nabble.com/arange-including-stop-value--tp27866607p27872069.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] arange including stop value?

2010-03-11 Thread Anne Archibald
On 11 March 2010 19:30, Tom K. t...@kraussfamily.org wrote:



 davefallest wrote:

 ...
 In [3]: np.arange(1.01, 1.1, 0.01)
 Out[3]: array([ 1.01,  1.02,  1.03,  1.04,  1.05,  1.06,  1.07,  1.08,
 1.09,  1.1 ])

 Why does the ... np.arange command end up including my stop value?

Don't use arange for floating-point values. Use linspace instead.

Anne
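
For example (illustrative):

import numpy as np

np.arange(1.01, 1.1, 0.01)     # float step: may overshoot and include 1.1
np.linspace(1.01, 1.09, 9)     # endpoints and count are stated explicitly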

 From the help for arange:

        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.

 --
 View this message in context: 
 http://old.nabble.com/arange-including-stop-value--tp27866607p27872069.html
 Sent from the Numpy-discussion mailing list archive at Nabble.com.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] crash at prompt exit after running test

2010-03-11 Thread Charles R Harris
On Thu, Mar 11, 2010 at 3:57 PM, Johann Cohen-Tanugi co...@lpta.in2p3.fr wrote:

  is your fix committed?


No. Pauli thinks the problem may lie elsewhere. I haven't had time to look
things over, but it is possible that the changes in the generated api
exposed a bug elsewhere.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion