[issue21638] Seeking to EOF is too inefficient!

2014-06-03 Thread Charles-François Natali

Charles-François Natali added the comment:

 I agree that Python 2 should use fopen / fread rather than directly read().
 But you may misunderstand this. The 'strace' tool reports Linux system
 calls, including read() rather than fread(), and I guess that read() should
 be finally called in fread() implementation.

 What I mean is that Python 2's seek(0, 2) does not use fseek(0, SEEK_END),
 but fseek(somewhere, SEEK_SET) and fread(rest-bytes) instead, which is too
 inefficient in some kind of storage.

Actually, Python does use fopen(), and fseek(): the culprit is the libc:
$ cat /tmp/test.c; gcc -o /tmp/test /tmp/test.c -Wall; strace /tmp/test
open(/etc/fstab, O_RDONLY)= 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=809, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0xb77ae000
fstat64(3, {st_mode=S_IFREG|0644, st_size=809, ...}) = 0
_llseek(3, 0, [0], SEEK_SET)= 0
read(3, # /etc/fstab: static file system..., 809) = 809
close(3)= 0

 By the way, Python 3 does not behavior like this.

That's because in Python 3, the IO stack is implemented directly on top of 
open()/read()/lseek().

It's not the first time we stumble upon glibc stdio bugs.

I'd suggest closing this.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21638] Seeking to EOF is too inefficient!

2014-06-03 Thread Linlin Yan

Linlin Yan added the comment:

Thanks! I agree with that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21638] Seeking to EOF is too inefficient!

2014-06-03 Thread Linlin Yan

Linlin Yan added the comment:

I ensured that the problem is in libc. I will try to figure out it by updating 
libc or optimizing some related parameters.

--
resolution:  - third party
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21638] Seeking to EOF is too inefficient!

2014-06-02 Thread Linlin Yan

New submission from Linlin Yan:

I noticed this problem when I run a Python2 program (MACS: 
http://liulab.dfci.harvard.edu/MACS/) very inefficiently on a large storage on 
a high performace server (64-bit Linux). It was much slower (more than two 
days) than running it on a normal PC (less than two hours).

After ruling out many optimizing conditions, I finally located the problem on 
the seek() function of Python2. Now I can reproduce the problem in a very 
simple example:

#!/usr/bin/python2
f = open(Input.sort.bam, rb)
f.seek(0, 2)
f.close()

Here, the size of file 'Input.sort.bam' is 4,110,535,920 bytes. When I run the 
program with 'strace' to see the system calls on Linux:

$ strace python2 foo.py
...
open(Input.sort.bam, O_RDONLY)= 3
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f23d4492000
fstat(3, {st_mode=S_IFREG|0644, st_size=4110535920, ...}) = 0
lseek(3, 4110532608, SEEK_SET)  = 4110532608
read(3, 
f\203\337\334\350\313\315\345T\227\211\fC\212a\260\204P\235\366\326\353\230\327\373\361\221\357\373...,
 3312) = 3312
close(3)= 0
...

It seems that python2 just move file cursor to a specific position (4110532608 
in this case) and read ahead the rest bytes, rather than seek to the file end 
directly. I tried to run the exact the same program on the large storage, the 
position changed to 1073741824, left 889310448 bytes to read to reach the file 
end, which reduced the performance a lot!

--
components: IO
messages: 219586
nosy: yanlinlin82
priority: normal
severity: normal
status: open
title: Seeking to EOF is too inefficient!
type: performance
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21638] Seeking to EOF is too inefficient!

2014-06-02 Thread STINNER Victor

STINNER Victor added the comment:

I don't think that Python calls directly read(). Python 2 uses fopen / fread.

Python 3 doesn't use buffered files, but call open / read directly.

--
nosy: +haypo, neologix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21638] Seeking to EOF is too inefficient!

2014-06-02 Thread Linlin Yan

Linlin Yan added the comment:

I agree that Python 2 should use fopen / fread rather than directly read().
But you may misunderstand this. The 'strace' tool reports Linux system
calls, including read() rather than fread(), and I guess that read() should
be finally called in fread() implementation.

What I mean is that Python 2's seek(0, 2) does not use fseek(0, SEEK_END),
but fseek(somewhere, SEEK_SET) and fread(rest-bytes) instead, which is too
inefficient in some kind of storage.

By the way, Python 3 does not behavior like this.

On Tue, Jun 3, 2014 at 4:35 AM, STINNER Victor rep...@bugs.python.org
wrote:


 STINNER Victor added the comment:

 I don't think that Python calls directly read(). Python 2 uses fopen /
 fread.

 Python 3 doesn't use buffered files, but call open / read directly.

 --
 nosy: +haypo, neologix

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue21638
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21638
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com