Re: file seek is slow
CHEN Guang, 12.03.2010 08:51:
>> Metalone wrote:
>> I just tried the seek test with Cython.
>> Cython fseek() : 1.059 seconds. 30% slower than 'C'
>> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>> It is amazing to me that Cython generates a 'C' file that is 1478 lines.
>
> PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
> PythoidC is the C language like the Python, by the Python and for the Python.

Except that it's not a language but rather a syntax converter, i.e. it doesn't really add any features to the C language but rather restricts Python syntax to C language features (plus a bit of header file introspection, it seems, but C's preprocessor has a bit of that, too).

Stefan
-- http://mail.python.org/mailman/listinfo/python-list
Re: file seek is slow
Metalone, 11.03.2010 23:57:
> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds. 30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
> It is amazing to me that Cython generates a 'C' file that is 1478 lines.

Well, it generated an optimised Python interface for your module and made it compilable in CPython 2.3 through 3.2. It doesn't look like your C module features that. ;)

> #Cython code
> import time
> cdef int SEEK_SET = 0
> cdef extern from "stdio.h":
>     void* fopen(char* filename, char* mode)
>     int fseek(void*, long, int)

Cython ships with a stdio.pxd that you can cimport. It looks like it doesn't currently define fseek(), but it defines at least fopen() and FILE. Patches are always welcome.

> def main():
>     cdef void* f1 = fopen('video.txt', 'rb')
>     cdef int i=100
>     t0 = time.clock()
>     while i > 0:
>         fseek(f1, 0, SEEK_SET)
>         i -= 1
>     delta = time.clock() - t0

Note that the call to time.clock() takes some time, too, so it's not surprising that this is slower than hand-written C code. Did you test how it scales?

Also, did you look at the generated C code or the annotated Cython code (cython -a)? Did you make sure both were compiled with the same CFLAGS?

Also, any reason you're not using a for-in-xrange loop? It shouldn't make a difference in speed, it's just more common. You even used a for loop in your C code.

Finally, I'm not sure why you think that these 30% matter at all. In your original post, you even state that seek time isn't the deal breaker, so maybe you should concentrate on the real issues?

Stefan
Re: file seek is slow
On Tue, 09 Mar 2010 15:56:47 -0800, Metalone wrote:
> for i in xrange(100):
>     f1.seek(0)

This is quite a stupid benchmark to write, since repeatedly seeking to 0 is a no-op. I haven't re-read the file object code recently, but chances are that the Python file object has its own bookkeeping which adds a bit of execution time.

But I would suggest measuring the performance of *actual* seeks to different file offsets, before handwaving about the supposed slowness of file seeks in Python.

Regards

Antoine.
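A minimal sketch of the measurement suggested above: time repeated seeks to offset 0 against seeks to scattered offsets in the same file. This is written for Python 3 rather than the Python 2 used in the thread, and the helper names (`make_test_file`, `time_seeks`), the 16 MiB file size, and the iteration count are illustrative assumptions. Each seek is followed by a one-byte read so the file object has to act on the new position, which means the numbers include read cost as well as seek cost.

```python
import os
import random
import tempfile
import time


def make_test_file(size):
    """Create a temporary file holding `size` random bytes; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def time_seeks(path, offsets):
    """Return elapsed seconds for seeking to each offset and reading one byte."""
    with open(path, "rb") as f:
        start = time.perf_counter()
        for offset in offsets:
            f.seek(offset)
            f.read(1)  # make the file object act on the new position
        return time.perf_counter() - start


if __name__ == "__main__":
    size = 16 * 1024 * 1024   # hypothetical test size, not from the thread
    n = 100_000               # hypothetical iteration count
    path = make_test_file(size)
    try:
        rng = random.Random(0)  # fixed seed so runs are comparable
        same = [0] * n
        scattered = [rng.randrange(size) for _ in range(n)]
        print("seek(0) repeated : %.3f s" % time_seeks(path, same))
        print("scattered offsets: %.3f s" % time_seeks(path, scattered))
    finally:
        os.remove(path)
```

On a file that fits in the operating system's cache both figures mostly reflect Python and C library overhead, so a much larger file would be needed to see real disk positioning costs.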
Re: file seek is slow
I almost wrote a long reply to all this. In the end it boils down to being concerned about how much overhead there is to calling a 'C' function. I assumed that file.seek() simply delegated to fseek() and thus was one way to test this overhead. However, I now think that it must be doing more and may not be a reasonable comparison.

I have used the profiler about as much as I can to find where my program is slow, and it appears to me that the overhead of calling 'C' functions is now my biggest problem. I have been using ctypes, which has been a great tool so far. I just discovered Cython and this looks like it may help me. I had not heard of PythoidC, so I will check it out.

I did not mean to offend anybody in the Cython community. It just seemed funny to me that 21 lines of Python became 1478 lines of 'C'. I wasn't really expecting any response to this. I don't know enough about this to really assume anything.

Stefan, I just tested 1e7 loops.

    'C':    8.133 seconds
    Cython: 10.812 seconds

I can't figure out what Cython is using for CFLAGS, so this could be important.

I used a while loop instead of xrange because I thought it would be faster in Cython. They had roughly the same execution speed.

Thanks all for the suggestions. I think I will just consider this thread closed.
Re: file seek is slow
>>> Metalone wrote:
>>> I just tried the seek test with Cython.
>>> Cython fseek() : 1.059 seconds. 30% slower than 'C'
>>> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
>>> It is amazing to me that Cython generates a 'C' file that is 1478 lines.
>>
>> PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
>> PythoidC is the C language like the Python, by the Python and for the Python.
>
> Except that it's not a language but rather a syntax converter, i.e. it doesn't really add any features to the C language but rather restricts Python syntax to C language features (plus a bit of header file introspection, it seems, but C's preprocessor has a bit of that, too).
>
> Stefan

PythoidC is a familiar language to Python and C programmers. I do not like to waste my time creating unfamiliar things that waste users' time studying them. In fact, PythoidC removes some boring features from the C language:

1. no semicolon ; at line ends
2. no braces {}; Pythonic indentation expresses code blocks

PythoidC restricts C syntax to Python language features, so that the C language becomes friendly to Python programmers and Python IDEs.

PythoidC implements introspection not only on header files but also on any C files. The PythoidC introspection will be as good as Python introspection, if only the C header file writers add more detailed annotations.

PythoidC is a familiar and convenient C language tool for Python programmers and mixed programming.

Plus, PythoidC is realizable only with Python; it is far beyond C's preprocessor. Believe it, or show us.

CHEN Guang
Convenient C Python mixed programming --- PythoidC ( http://pythoidc.googlecode.com )
Re: file seek is slow
I am assuming that Python delegates the f.seek call to the seek call in the MS C runtime library msvcrt.dll. Does anybody know a nice link to the Python source like was posted above for the BSD 'C' library?

Ok, I ran some more tests.

    C, seek:                                      0.812 seconds  // test from original post
    Python, f.seek:                               1.458 seconds  // test from original post
    C, time(tm):                                  0.671 seconds
    Python, time.time():                          0.513 seconds
    Python, ctypes.msvcrt.time(ctypes.byref(tm)): 0.971 seconds  # factored the overhead to be outside the loop, so really this was func_ptr(ptr)

Perhaps I am just comparing apples to oranges. I never tested the overhead of ctypes like this before. Most of my problem timings involve calls through ctypes.
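For anyone wanting to repeat the comparison above between time.time() and the C runtime's time() called through ctypes, here is a rough Python 3 sketch. The helper names (`get_c_time`, `time_calls`) and the iteration count are illustrative assumptions. It also assumes that `ctypes.CDLL(None)` exposes the C library on POSIX systems and that `ctypes.cdll.msvcrt` does so on Windows, and that a C long is wide enough for the returned timestamp, so treat it as a starting point rather than a portable benchmark.

```python
import ctypes
import sys
import time


def get_c_time():
    """Return the C runtime's time() as a ctypes function (platform guess)."""
    if sys.platform == "win32":
        crt = ctypes.cdll.msvcrt      # assumption: MS C runtime, as in the thread
    else:
        crt = ctypes.CDLL(None)       # assumption: libc symbols are in-process
    c_time = crt.time
    c_time.argtypes = [ctypes.c_void_p]   # time_t *, passed as NULL below
    c_time.restype = ctypes.c_long        # assumption: time_t fits in a long
    return c_time


def time_calls(func, n, *args):
    """Return elapsed seconds for calling func(*args) n times."""
    start = time.perf_counter()
    for _ in range(n):
        func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000  # hypothetical iteration count
    c_time = get_c_time()
    print("time.time()       : %.3f s" % time_calls(time.time, n))
    print("ctypes time(NULL) : %.3f s" % time_calls(c_time, n, None))
```

The function pointer is looked up and configured once, outside the timed loop, which mirrors the "func_ptr(ptr)" note in the timings above.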
Re: file seek is slow
I just tried the seek test with Cython.

    Cython fseek() : 1.059 seconds. 30% slower than 'C'
    Python f.seek  : 1.458 seconds. 80% slower than 'C'.

It is amazing to me that Cython generates a 'C' file that is 1478 lines.

    #Cython code

    import time

    cdef int SEEK_SET = 0

    cdef extern from "stdio.h":
        void* fopen(char* filename, char* mode)
        int fseek(void*, long, int)

    def main():
        cdef void* f1 = fopen('video.txt', 'rb')
        cdef int i=100
        t0 = time.clock()
        while i > 0:
            fseek(f1, 0, SEEK_SET)
            i -= 1
        delta = time.clock() - t0
        print "%.3f" % delta

    if __name__ == '__main__':
        main()
Re: file seek is slow
Metalone wrote:
> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds. 30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
> It is amazing to me that Cython generates a 'C' file that is 1478 lines.

And what response are you seeking to your amazement?

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See PyCon Talks from Atlanta 2010 http://pycon.blip.tv/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
Re: file seek is slow
Metalone wrote:
> I just tried the seek test with Cython.
> Cython fseek() : 1.059 seconds. 30% slower than 'C'
> Python f.seek  : 1.458 seconds. 80% slower than 'C'.
> It is amazing to me that Cython generates a 'C' file that is 1478 lines.

PythoidC ( http://pythoidc.googlecode.com ) generates the shortest 'C' file.
PythoidC is the C language like the Python, by the Python and for the Python.

CHEN Guang
Re: file seek is slow
f1_seek = f1.seek did not change the performance at all. As it turns out each call is only 646 nanoseconds slower than 'C'. However, that is still 80% of the time to perform a file seek, which I would think is a relatively slow operation compared to just making a system call.
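The two figures above can be reproduced from the timings posted elsewhere in this thread (0.812 seconds for the C loop, 1.458 seconds for the Python loop). The loop count is an assumption: one million iterations is not stated anywhere in the thread, but it is the value that makes the quoted 646 nanoseconds come out.

```python
C_SECONDS = 0.812        # C fseek loop, as posted in the thread
PYTHON_SECONDS = 1.458   # Python f.seek loop, as posted in the thread
ITERATIONS = 1_000_000   # ASSUMPTION: inferred, not stated in the thread

# Extra wall-clock time the Python loop needs over the C loop.
extra_seconds = PYTHON_SECONDS - C_SECONDS

# Spread that extra time over the assumed number of calls.
per_call_ns = extra_seconds / ITERATIONS * 1e9

# Express the extra time as a fraction of the C loop's time.
relative_overhead = extra_seconds / C_SECONDS

print("extra time per call: %.0f ns" % per_call_ns)               # 646 ns
print("overhead relative to C: %.0f%%" % (relative_overhead * 100))  # 80%
```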
Re: file seek is slow
Thanks, Tim. Good to know.
Re: file seek is slow
Metalone:
> As it turns out each call is only 646 nanoseconds slower than 'C'. However, that is still 80% of the time to perform a file seek, which I would think is a relatively slow operation compared to just making a system call.

A seek may not be doing much beyond setting a current offset value. It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

An implementation of fseek will often return relatively quickly when the position is within the current buffer -- from line 192 in http://www.google.com/codesearch/p?hl=en#XAzRy8oK4zA/libc/stdio/fseek.c&q=fseek&sa=N&cd=1&ct=rc

Neil
Re: file seek is slow
On Mar 10, 6:01 pm, Neil Hodgson <nyamatongwe+thun...@gmail.com> wrote:
> Metalone:
>> As it turns out each call is only 646 nanoseconds slower than 'C'. However, that is still 80% of the time to perform a file seek, which I would think is a relatively slow operation compared to just making a system call.
>
> A seek may not be doing much beyond setting a current offset value. It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

Exactly. If I replace both calls to fseek with gettimeofday (aka time.time() on my platform in python) I get fairly close results:

    $ ./testseek
    4.120
    $ python2.5 testseek.py
    4.170
    $ ./testseek
    4.080
    $ python2.5 testseek.py
    4.130

FWIW, my results with fseek aren't as bad as those of the OP. This is python2.5 on a 2.6.9 Linux OS, with psyco:

    $ ./testseek
    0.560
    $ python2.5 testseek.py
    0.750
    $ ./testseek
    0.570
    $ python2.5 testseek.py
    0.760
Re: file seek is slow
This is a pretty tight loop:

    for i in xrange(100):
        f1.seek(0)

But there is still a lot going on, some of which you can lift out of the loop. The easiest I can think of is the lookup of the 'seek' attribute on the f1 object. Try this:

    f1_seek = f1.seek
    for i in xrange(100):
        f1_seek(0)

How does that help your timing?

-- Paul
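To check the effect of hoisting the attribute lookup on a given machine, the two loops above can be timed side by side. This is a Python 3 sketch (range instead of xrange); the function names, the temporary file, and the iteration count are illustrative assumptions rather than anything from the original test.

```python
import tempfile
import time


def loop_with_lookup(f, n):
    """Time n seeks, looking up the 'seek' attribute on every iteration."""
    start = time.perf_counter()
    for _ in range(n):
        f.seek(0)
    return time.perf_counter() - start


def loop_with_bound_method(f, n):
    """Time n seeks, looking up the 'seek' attribute once before the loop."""
    f_seek = f.seek
    start = time.perf_counter()
    for _ in range(n):
        f_seek(0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000  # hypothetical iteration count
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        print("f.seek(0) each time : %.3f s" % loop_with_lookup(f, n))
        print("hoisted f_seek(0)   : %.3f s" % loop_with_bound_method(f, n))
```

If the two numbers are close, the attribute lookup is a small part of the per-call cost and the remaining time is spent inside the seek call itself.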
Re: file seek is slow
Metalone <j...@iteris.com> wrote:
> static void main(int argc, char *argv[])

As a side note, do you realize that this definition is invalid, in two ways?

main cannot be declared static. The whole reason we use the special name main is so that the startup code in the C run-time can link to it. If main is static, it won't be exposed in the object file, and the linker won't be able to find it. It happens to work here because your C compiler knows about main and discards the static, but that's not a good practice.

Further, it's not valid to have main return void. The standards require that it be declared as returning int. Again, void happens to work in VC++, but there are architectures where it does not.

--
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.