Re: [Numpy-discussion] Help to process a large data file

2008-10-02 Thread orionbelt2
Frank,

I would imagine that you cannot get much better performance in Python 
than this, which avoids string conversions:

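c = []        # number of lines containing a 1 between consecutive '1 1' lines
count = None  # None until the first '1 1' line has been seen
with open('foo') as f:
    for line in f:
        if line == '1 1\n':
            if count is not None:   # skip lines before the first marker
                c.append(count)
            count = 0
        elif count is not None and '1' in line:
            count += 1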

One could do some numpy trick like:

import numpy as np

a = np.loadtxt('foo', dtype=int)
a = np.sum(a, axis=1)            # add the two columns horizontally
b = np.where(a == 2)[0]          # indices of rows whose sum is 2 (1 + 1)
count = []
for i, j in zip(b[:-1], b[1:]):
    count.append(a[i+1:j].sum()) # number of lines with a 1 between markers

but on my machine the numpy version takes about 20 sec for a 'foo' file 
of 2,500,000 lines versus 1.2 sec for the pure Python version...
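The Python-level loop over marker pairs can also be replaced with a cumulative sum. A minimal sketch, assuming the data are already in an (N, 2) array of 0/1 integers (for example from np.loadtxt as above); counts_between_markers is a hypothetical helper name. It does not address the cost of parsing the text file, which may well account for much of the 20 sec.

```python
import numpy as np

def counts_between_markers(rows):
    """Hypothetical helper. rows: (N, 2) array of 0/1 values.
    For each pair of consecutive '1 1' rows, return the number of rows
    strictly between them that contain a 1."""
    s = np.asarray(rows).sum(axis=1)      # 2 marks a '1 1' row
    has_one = (s == 1).astype(np.int64)   # rows with exactly one 1
    markers = np.flatnonzero(s == 2)      # indices of the '1 1' rows
    csum = np.cumsum(has_one)             # running count of one-rows
    # csum[j] - csum[i] counts one-rows in (i, j]; the marker j itself
    # contributes 0, so this is exactly the rows strictly between i and j.
    return csum[markers[1:]] - csum[markers[:-1]]

# Frank's sample data
data = np.array([[1, 0], [0, 0], [1, 1], [0, 0], [0, 1], [0, 1], [0, 0],
                 [0, 1], [1, 1], [0, 0], [0, 1], [0, 1], [1, 1]])
print(counts_between_markers(data))  # [3 2]
```

Like the loop version, rows before the first '1 1' and after the last one are ignored, and fewer than two markers yields an empty array.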

As a side note, if I replace line == '1 1\n' with line.startswith('1 1'), 
the pure Python version goes up to 1.8 sec... Isn't this a bit weird? 
I'd think startswith() should be faster...

Chris

On Wed, Oct 01, 2008 at 07:27:27PM -0600, frank wang wrote:

Hi,
 
I have a large data file which contains 2 columns of data. The two 
columns contain only zeros and ones. Now I want to count how many rows 
with a one lie between rows where both columns are one. For example, if my data is:
 
1 0
0 0
1 1
0 0
0 1x
0 1x
0 0
0 1x
1 1
0 0
0 1x
0 1x
1 1
 
Then my counts will be 3 and 2 (the rows marked with x).
 
Is there an efficient way to do this? My data file is pretty big.
 
Thanks
 
Frank
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fromfunction() bug?

2008-03-13 Thread orionbelt2

On Thu, Mar 13, 2008 at 06:18:30PM -0400, Alan G Isaac wrote:

 This is how I would hope ``fromfunction`` would work
 and it matches the docs. (See below.)  You can fix
 the example ...

Interesting, I thought the output on the Example List page was 
auto-generated...
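The exchange above does not show the disputed example itself. As a hedged illustration of fromfunction's documented behavior in current NumPy (the 2008 release may have differed): the function is called once with one array of indices per axis, and its return value is passed back unchanged.

```python
import numpy as np

# The lambda receives whole index arrays i and j, not individual elements.
a = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(a)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# A function that ignores its arguments returns a scalar, so the result
# does not have the requested (3, 3) shape.
c = np.fromfunction(lambda i, j: 1, (3, 3), dtype=int)
print(c)  # 1
```

This is why a function that does not use its index arguments can look like a fromfunction bug.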