Re: file seek is slow

2010-03-12 Thread Stefan Behnel

CHEN Guang, 12.03.2010 08:51:

> Metalone wrote:
>> I just tried the seek test with Cython.
>> Cython fseek() : 1.059 seconds.  30% slower than 'C'
>> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>>
>> It is amazing to me that Cython generates a 'C' file that is 1478
>> lines.
>
> PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
> PythoidC is the C language like the Python, by the Python and for the Python.


Except that it's not a language but rather a syntax converter, i.e. it 
doesn't really add any features to the C language but rather restricts 
Python syntax to C language features (plus a bit of header file 
introspection, it seems, but C's preprocessor has a bit of that, too).


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: file seek is slow

2010-03-12 Thread Stefan Behnel

Metalone, 11.03.2010 23:57:

> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds.  30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>
> It is amazing to me that Cython generates a 'C' file that is 1478
> lines.


Well, it generated an optimised Python interface for your module and made 
it compilable in CPython 2.3 through 3.2. It doesn't look like your C 
module features that. ;)




> #Cython code
>
> import time
>
> cdef int SEEK_SET = 0
>
> cdef extern from "stdio.h":
>     void* fopen(char* filename, char* mode)
>     int fseek(void*, long, int)

Cython ships with a stdio.pxd that you can cimport. It looks like it 
doesn't currently define fseek(), but it defines at least fopen() and FILE. 
Patches are always welcome.




> def main():
>     cdef void* f1 = fopen('video.txt', 'rb')
>     cdef int i=100
>     t0 = time.clock()
>     while i > 0:
>         fseek(f1, 0, SEEK_SET)
>         i -= 1
>     delta = time.clock() - t0


Note that the call to time.clock() takes some time, too, so it's not 
surprising that this is slower than hand-written C code. Did you test how 
it scales?


Also, did you look at the generated C code or the annotated Cython code 
(cython -a)? Did you make sure both were compiled with the same CFLAGS?


Also, any reason you're not using a for-in-xrange loop? It shouldn't make a 
difference in speed, it's just more common. You even used a for loop in 
your C code.


Finally, I'm not sure why you think that these 30% matter at all. In your 
original post, you even state that seek-time isn't the deal breaker, so 
maybe you should concentrate on the real issues?


Stefan
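
A rough sketch of the scaling test asked about above, written for Python 3 (the thread itself used Python 2 and time.clock()). Timing the same loop at several iteration counts shows whether the per-seek cost stays flat or whether fixed overhead such as the timer calls dominates at small counts. The scratch file, its size and the iteration counts are assumptions chosen for illustration, not values from the thread.

```python
import os
import tempfile
import time


def time_seeks(f, n):
    """Return elapsed seconds for n calls to f.seek(0), timer overhead included."""
    t0 = time.perf_counter()
    for _ in range(n):
        f.seek(0)
    return time.perf_counter() - t0


# Hypothetical scratch file standing in for the 'video.txt' used in the thread.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 4096)
    path = tmp.name

try:
    with open(path, "rb") as f:
        # Assumed iteration counts; each step is 10x the previous one.
        results = {n: time_seeks(f, n) for n in (10_000, 100_000, 1_000_000)}
finally:
    os.remove(path)

for n, elapsed in results.items():
    print("%9d seeks: %.3f s total, %.1f ns per seek" % (n, elapsed, elapsed / n * 1e9))
```

If the per-seek figure is roughly constant across the three rows, the cost is per call rather than setup overhead.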



Re: file seek is slow

2010-03-12 Thread Antoine Pitrou
On Tue, 09 Mar 2010 15:56:47 -0800, Metalone wrote:
> for i in xrange(100):
>    f1.seek(0)

This is quite a stupid benchmark to write, since repeatedly seeking to 0 
is a no-op. I haven't re-read the file object code recently, but chances 
are that the Python file object has its own bookkeeping which adds a bit 
of execution time.

But I would suggest measuring the performance of *actual* seeks to 
different file offsets, before handwaving about the supposed slowness 
of file seeks in Python.

Regards

Antoine.
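
One possible way to measure seeks to different offsets, as suggested above, is sketched here for Python 3. It compares repeated seek(0) with seeks to pseudo-random offsets in the same file. The file size, iteration count and scratch file are assumptions for illustration. The sketch only seeks; following each seek with a small read would be needed to force actual I/O.

```python
import os
import random
import tempfile
import time

FILE_SIZE = 8 * 1024 * 1024   # assumed size, large enough to span many buffers
N_SEEKS = 200_000             # assumed iteration count


def bench(f, offsets):
    """Time one seek per offset and return elapsed seconds."""
    t0 = time.perf_counter()
    for off in offsets:
        f.seek(off)
    return time.perf_counter() - t0


# Hypothetical scratch file filled with arbitrary bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1024) * (FILE_SIZE // 1024))
    path = tmp.name

rng = random.Random(0)  # fixed seed so every run visits the same offsets
same = [0] * N_SEEKS
scattered = [rng.randrange(FILE_SIZE) for _ in range(N_SEEKS)]

try:
    with open(path, "rb") as f:
        t_same = bench(f, same)
        t_scattered = bench(f, scattered)
finally:
    os.remove(path)

print("seek(0) repeated : %.3f s" % t_same)
print("scattered offsets: %.3f s" % t_scattered)
```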



Re: file seek is slow

2010-03-12 Thread Metalone
I almost wrote a long reply to all this.
In the end it boils down to being concerned about how much overhead
there is to calling a 'C' function.
I assumed that file.seek() simply delegated to fseek() and thus was
one way to test this overhead.
However, I now think that it must be doing more and may not be a
reasonable comparison.

I have used the profiler about as much as I can to find where my
program is slow, and it appears to me that
the overhead to calling 'C' functions is now my biggest problem.
I have been using Ctypes, which has been a great tool so far.
I just discovered Cython and this looks like it may help me.
I had not heard of PythoidC, so I will check it out.

I did not mean to offend anybody in Cython community.
It just seemed funny to me that 21 lines of Python became 1478 lines
of 'C'.
I wasn't really expecting any response to this.
I don't know enough about this to really assume anything.

Stefan,
I just tested 1e7 loops.
'C': 8.133 seconds
Cython: 10.812 seconds

I can't figure out what Cython is using for CFLAGS, so this could be
important.

I used while instead of xrange, because I thought it would be faster
in Cython.
They had roughly the same execution speed.

Thanks all for the suggestions.
I think I will just consider this thread closed.




Re: file seek is slow

2010-03-12 Thread CHEN Guang
 Metalone wrote:
 I just tried the seek test with Cython.
 Cython fseek() : 1.059 seconds.  30% slower than 'C'
 Python f.seek  : 1.458 seconds. 80% slower than 'C'.

 It is amazing to me that Cython generates a 'C' file that is 1478
 lines.

 PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
 PythoidC is the C language like the Python, by the Python and for the Python.

Except that it's not a language but rather a syntax converter, i.e. it 
doesn't really add any features to the C language but rather restricts 
Python syntax to C language features (plus a bit of header file 
introspection, it seems, but C's preprocessor has a bit of that, too).
Stefan

PythoidC is a familiar language to Python and C programmers; I do not like
wasting my time creating unfamiliar things that waste users' time studying.
In fact, PythoidC removes some boring features from the C language:
1. no semicolon ; at line ends
2. no braces {}; Pythonic indentation expresses code blocks
PythoidC restricts C syntax to Python language features, so that
the C language becomes friendly to Python programmers and Python IDEs.
PythoidC implements introspection not only on header files but also on any C files.
PythoidC introspection will be as good as Python introspection,
if only C header file writers add more detailed annotations.
PythoidC is a familiar and convenient C language tool for Python programmers and
mixed programming.
Plus, PythoidC is implementable only with Python; it is far beyond C's
preprocessor. Believe it, or show us.
CHEN Guang
Convenient C Python mixed programming --- PythoidC ( 
http://pythoidc.googlecode.com )


Re: file seek is slow

2010-03-11 Thread Metalone
I am assuming that Python delegates the f.seek call to the seek call
in the MS C runtime library msvcrt.dll.
Does anybody know a nice link to the Python source like was posted
above for the BSD 'C' library?

Ok, I ran some more tests.
C, seek: 0.812 seconds   // test from original post
Python, f.seek : 1.458 seconds.  // test from original post

C, time(tm)   : 0.671 seconds
Python, time.time(): 0.513 seconds.
Python, ctypes.msvcrt.time(ctypes.byref(tm)): 0.971 seconds.   #
factored the overhead to be outside the loop, so really this was
func_ptr(ptr).

Perhaps I am just comparing apples to oranges.
I never tested the overhead of ctypes like this before.
Most of my problem timings involve calls through ctypes.
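
A minimal sketch of how the per-call overhead of ctypes could be isolated, written for Python 3. It calls the C library's abs() through ctypes and compares that with the built-in abs(), so the C-side work is negligible and the difference is mostly call overhead. Choosing abs() instead of the time() call used above, the library lookup and the iteration count are all assumptions made for illustration. Both timings include the cost of the lambda wrapper.

```python
import ctypes
import ctypes.util
import sys
import timeit

# Load the C runtime. The lookup is platform-dependent; this is a guess that
# covers common setups (msvcrt on Windows, the default C library elsewhere).
if sys.platform == "win32":
    libc = ctypes.cdll.msvcrt
else:
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature of int abs(int) so ctypes converts arguments correctly.
c_abs = libc.abs
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

N = 200_000  # assumed iteration count
t_builtin = timeit.timeit(lambda: abs(-5), number=N)
t_ctypes = timeit.timeit(lambda: c_abs(-5), number=N)

print("builtin abs: %.1f ns per call" % (t_builtin / N * 1e9))
print("ctypes abs : %.1f ns per call" % (t_ctypes / N * 1e9))
```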


Re: file seek is slow

2010-03-11 Thread Metalone
I just tried the seek test with Cython.
Cython fseek() : 1.059 seconds.  30% slower than 'C'
Python f.seek  : 1.458 seconds. 80% slower than 'C'.

It is amazing to me that Cython generates a 'C' file that is 1478
lines.


#Cython code

import time

cdef int SEEK_SET = 0

cdef extern from "stdio.h":
    void* fopen(char* filename, char* mode)
    int fseek(void*, long, int)

def main():
    cdef void* f1 = fopen('video.txt', 'rb')
    cdef int i=100
    t0 = time.clock()
    while i > 0:
        fseek(f1, 0, SEEK_SET)
        i -= 1
    delta = time.clock() - t0
    print "%.3f" % delta

if __name__ == '__main__':
    main()


Re: file seek is slow

2010-03-11 Thread Steve Holden
Metalone wrote:
> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds.  30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>
> It is amazing to me that Cython generates a 'C' file that is 1478
> lines.
 
And what response are you seeking to your amazement?

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See PyCon Talks from Atlanta 2010  http://pycon.blip.tv/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/



Re: file seek is slow

2010-03-11 Thread CHEN Guang
 
Metalone wrote:
> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds.  30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>
> It is amazing to me that Cython generates a 'C' file that is 1478
> lines.
 
 
PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
PythoidC is the C language like the Python, by the Python and for the Python.
CHEN Guang


Re: file seek is slow

2010-03-10 Thread Metalone
f1_seek = f1.seek did not change the performance at all.
As it turns out each call is only
646 nanoseconds slower than 'C'.
However, that is still 80% of the time to perform a file seek,
which I would think is a relatively slow operation compared to just
making a system call.


Re: file seek is slow

2010-03-10 Thread Metalone
Thanks, Tim.
Good to know.


Re: file seek is slow

2010-03-10 Thread Neil Hodgson
Metalone:

> As it turns out each call is only
> 646 nanoseconds slower than 'C'.
> However, that is still 80% of the time to perform a file seek,
> which I would think is a relatively slow operation compared to just
> making a system call.

   A seek may not be doing much beyond setting a current offset value.
It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

   An implementation of fseek will often return relatively quickly when
the position is within the current buffer -- from line 192 in
http://www.google.com/codesearch/p?hl=en#XAzRy8oK4zA/libc/stdio/fseek.c&q=fseek&sa=N&cd=1&ct=rc

   Neil
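
The same question can be asked of Python's own buffered file objects. The sketch below, for Python 3, counts how many seek() calls reach the raw file while a buffered reader repeatedly seeks to an offset that should already lie inside its buffer. CountingFileIO is a hypothetical helper written for this illustration, and the file size and loop count are assumptions. The result depends on the interpreter's buffering implementation, so no particular count is claimed here.

```python
import io
import os
import tempfile


class CountingFileIO(io.FileIO):
    """Hypothetical raw file that counts seek() calls made by the buffered layer."""

    seek_calls = 0  # per-instance counter, starts from this class default

    def seek(self, offset, whence=os.SEEK_SET):
        self.seek_calls += 1
        return super().seek(offset, whence)


# Hypothetical scratch file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 65536)
    path = tmp.name

try:
    raw = CountingFileIO(path, "r")
    with io.BufferedReader(raw) as f:
        first = f.read(16)            # fills the read buffer
        before = raw.seek_calls
        for _ in range(1000):
            f.seek(0)                 # offset 0 should lie inside the buffer
        in_buffer_seeks = raw.seek_calls - before
        again = f.read(16)            # position really is back at the start
finally:
    os.remove(path)

print("raw seek() calls during 1000 buffered seek(0) calls:", in_buffer_seeks)
```

A count well below 1000 would suggest that the buffered layer answers such seeks from its own bookkeeping without touching the raw file.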


Re: file seek is slow

2010-03-10 Thread sjdevn...@yahoo.com
On Mar 10, 6:01 pm, Neil Hodgson nyamatongwe+thun...@gmail.com
wrote:
> Metalone:
>
> > As it turns out each call is only
> > 646 nanoseconds slower than 'C'.
> > However, that is still 80% of the time to perform a file seek,
> > which I would think is a relatively slow operation compared to just
> > making a system call.
>
>    A seek may not be doing much beyond setting a current offset value.
> It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

Exactly.  If I replace both calls to fseek with gettimeofday (aka
time.time() on my platform in python) I get fairly close results:
$ ./testseek
4.120
$ python2.5 testseek.py
4.170
$ ./testseek
4.080
$ python2.5 testseek.py
4.130


FWIW, my results with fseek aren't as bad as those of the OP.  This is
python2.5 on a 2.6.9 Linux OS, with psyco:
$ ./testseek
0.560
$ python2.5 testseek.py
0.750
$ ./testseek
0.570
$ python2.5 testseek.py
0.760


Re: file seek is slow

2010-03-09 Thread Paul McGuire
This is a pretty tight loop:

for i in xrange(100):
   f1.seek(0)

But there is still a lot going on, some of which you can lift out of
the loop.  The easiest I can think of is the lookup of the 'seek'
attribute on the f1 object.  Try this:

f1_seek = f1.seek
for i in xrange(100):
   f1_seek(0)

How does that help your timing?

-- Paul
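
To answer the timing question on a given machine, the two loops can be compared directly. This is a sketch for Python 3 that uses an in-memory io.BytesIO object as a stand-in for the real file, so that the measurement isolates the attribute lookup and call overhead from any file system work. The stand-in, the function names and the iteration count are assumptions for illustration.

```python
import io
import timeit

N = 1_000_000  # assumed iteration count


def lookup_each_time(f, n=N):
    """Look up the 'seek' attribute on every iteration."""
    for _ in range(n):
        f.seek(0)


def lookup_hoisted(f, n=N):
    """Look up the bound method once, outside the loop."""
    f_seek = f.seek
    for _ in range(n):
        f_seek(0)


f = io.BytesIO(b"\0" * 1024)  # in-memory stand-in for the file object

t_each = timeit.timeit(lambda: lookup_each_time(f), number=1)
t_hoisted = timeit.timeit(lambda: lookup_hoisted(f), number=1)

print("lookup in loop : %.3f s" % t_each)
print("lookup hoisted : %.3f s" % t_hoisted)
```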


Re: file seek is slow

2010-03-09 Thread Tim Roberts
Metalone j...@iteris.com wrote:

> static void main(int argc, char *argv[])

As a side note, do you realize that this definition is invalid, in two
ways?  main cannot be declared static.  The whole reason we use the
special name main is so that the startup code in the C run-time can link
to it.  If main is static, it won't be exposed in the object file, and
the linker couldn't find it.  It happens to work here because your C
compiler knows about main and discards the static, but that's not a
good practice.

Further, it's not valid to have main return void.  The standards
require that it be declared as returning int.  Again, void happens to
work in VC++, but there are architectures where it does not.
-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.