[issue1142] code sample showing errors reading large files with py 2.5/3.0

2010-09-17 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

issue1744752 describes why this is probably a bug in the C library.
Possible workarounds are to open the files in universal newline mode, to use io.open(),
or to switch to Python 3!
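Here is a minimal sketch of the io.open() workaround. It assumes Python 3, where io.open is the same function as the built-in open() and text mode applies universal-newline handling by default (newline=None). The helper name count_lines is hypothetical. The 'U' mode flag suggested for Python 2 is deprecated in Python 3 and was removed in 3.11, so it is not used here.

```python
import io
import os
import tempfile


def count_lines(path):
    """Count the lines of a text file opened with io.open().

    With newline=None (the default), LF, CR LF and a lone CR each end a
    line. The io module does this translation itself instead of relying
    on the C library's text mode.
    """
    count = 0
    with io.open(path, "r", newline=None) as f:
        for _ in f:
            count += 1
    return count


if __name__ == "__main__":
    # Write mixed line endings in binary mode so nothing is translated on write.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"0\r\n1\n2\r")
    print(count_lines(path))  # -> 3
    os.remove(path)
```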

--
nosy: +amaury.forgeotdarc
resolution:  -> wont fix
status: open -> closed
superseder:  -> Newline skipped in for line in file for huge file

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue1142
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1142] code sample showing errors reading large files with py 2.5/3.0

2010-08-06 Thread Tim Golden

Changes by Tim Golden m...@timgolden.me.uk:


--
nosy: +tim.golden




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2010-08-06 Thread Guido van Rossum

Changes by Guido van Rossum gu...@python.org:


--
nosy:  -gvanrossum




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2009-05-12 Thread Daniel Diniz

Changes by Daniel Diniz aja...@gmail.com:


--
components: +IO
nosy: +benjamin.peterson, pitrou
stage:  -> test needed
versions: +Python 2.6, Python 3.1 -Python 2.5




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2008-03-17 Thread Sean Reifschneider

Sean Reifschneider [EMAIL PROTECTED] added the comment:

I have run this under the current py3k SVN version on a 64-bit Linux
(Fedora 8), and it runs fine, FYI.  I seem to recall that I had a patch which
fixed something that sounds very much like this, but I can't find that other
issue.

--
nosy: +jafo
priority:  -> normal




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-12 Thread christen

christen added the comment:

The bug is still there, but the problem is solved: simply use open('file', 'U').
See the outputs:

fichin=open('test.txt','U')
===
(2, 5, 0, 'final', 0)
2007-09-12 08:00:43
(500, 9.31236239624)
(1000, 22.31236239624)
(1500, 35.094000101089478)
(2000, 47.81236239624)
(2500, 60.56236239624)
(3000, 73.265000104904175)
(3500, 85.95368664551)
(4000, 98.672000169754028)
(4500, 111.35900020599365)
(5000, 123.98400020599365)
(5500, 136.625)
(6000, 149.26500010490417)
(6500, 161.9060001373291)
(7000, 174.625)
(7500, 187.29700016975403)
(8000, 199.8910490417)
(8500, 212.5310001373291)
('total lines read ', 85014960)
212.56236

Now with
fichin=open('test.txt')
or
fichin=open('test.txt','r')
===

(2, 5, 0, 'final', 0)
2007-09-12 08:04:48
(500, 3.18763760376)
(1000, 6.3440001010894775)
(1500, 9.4690001010894775)
(2000, 12.594000101089478)
(2500, 15.719000101089478)
(3000, 18.844000101089478)
(3500, 21.969000101089478)
(4000, 25.094000101089478)
(4500, 28.219000101089478)
(5000, 31.344000101089478)
(5500, 34.469000101089478)
(6000, 37.594000101089478)
* 62410138   
62410139 *
* 62414887   
62414888 *
* 62415540   
62415541 *
* 62420289   
62420290 *
* 62420942   
62420943 *
* 62421595   
62421596 *
* 62422248   
62422249 *
* 62422901   
62422902 *
* 62427650   
62427651 *
* 62428303   
62428304 *
(6500, 40.75)
(7000, 43.95368664551)
(7500, 47.125)
(8000, 50.32868664551)
(8500, 53.51632424927)
('total lines read ', 85014950)
53.516324
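Each line i of test.txt is written as str(i) followed by padding, so any line that comes back holding two numbers is a merged line. That is what the * ... * markers above show. The text-mode run has ten such pairs, which matches its ten-line shortfall (85014950 read versus 85014960). Below is a minimal Python 3 sketch of that check, assuming the thread's file layout. find_merged_lines and lines_hidden are hypothetical helper names.

```python
def find_merged_lines(lines):
    """Return (position, numbers) for each line that holds more than one number.

    Assumes every well-formed line is str(i) plus padding, so it splits into
    exactly one field. str.split() treats a stray carriage return as
    whitespace, so CR-joined lines are caught too.
    """
    merged = []
    for pos, line in enumerate(lines):
        numbers = line.split()
        if len(numbers) > 1:
            merged.append((pos, numbers))
    return merged


def lines_hidden(merged):
    """Number of lines swallowed by merging: a line with k numbers hides k - 1."""
    return sum(len(numbers) - 1 for _, numbers in merged)


if __name__ == "__main__":
    sample = ["0   \n", "1   \r2   \n", "3   \n"]
    merged = find_merged_lines(sample)
    print(merged)                # -> [(1, ['1', '2'])]
    print(lines_hidden(merged))  # -> 1
```

To run it against the real file, pass the open file object as `lines`, e.g. inside `with open('test.txt') as f:`.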

best
Richard

begin:vcard
fn:Richard Christen
n:Christen;Richard
org:CNRS UMR 6543   Université de Nice;Laboratoire de Biologie Virtuelle
adr:Parc Valrose;;Centre de Biochimie;Nice;;06108;France
email;internet:[EMAIL PROTECTED]
title:Champion de saut en épaisseur
tel;work:33- 492 076 947
url:http://bioinfo.unice.fr
version:2.1
end:vcard




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-12 Thread Guido van Rossum

Guido van Rossum added the comment:

Cool. This helps track down the bug a bit more; it's either in (our
routine) getline_via_fgets or it's in Microsoft's text mode line end
translation (which universal newlines bypasses).

I'm assigning this to Tim Peters, who probably still has a Windows box
and once optimized the snot out of this code.
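One way to rule out the text-mode line-end translation when checking such a file is to open it in binary mode, where no translation happens, and count newline bytes directly. For the test file in this thread, the result should equal the number of lines written. This is a minimal Python 3 sketch. count_newlines and the 1 MiB chunk size are illustrative choices, not code from the thread.

```python
import os
import tempfile


def count_newlines(path, chunk_size=1 << 20):
    """Count LF bytes in a file opened in binary mode.

    Binary mode performs no line-ending translation, so each CR LF or LF
    line ending contributes exactly one LF byte. Reading fixed-size chunks
    keeps memory use flat even for files larger than 4 GB. A final line
    with no line ending is not counted.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of file
                break
            count += chunk.count(b"\n")
    return count


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"0   \r\n1   \r\n2   \r\n")  # three lines with Windows line endings
    print(count_newlines(path))  # -> 3
    os.remove(path)
```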

--
assignee:  -> tim_one
nosy: +tim_one




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-11 Thread christen

christen added the comment:

Hi Guido

It is not the end of the file that goes unread (see also below).

I found out about this about a year ago, when I was parsing very large
files produced by BLAST runs on the human genome.
My parser choked after 4 GB, well before the end of the file: one line
was missing, and my acc=li[x:y] ended up raising an error because acc was
never filled...
This was rather strange, because it had never happened before on my
Linux box.

I opened the file (which I had created myself) with an editor that could
show hex codes: the line in question was there and intact.
If I remember correctly, I modified my code to see better what was going on:
in fact the missing line had been concatenated to the previous line,
even though the end-of-line was properly there (the hex codes were OK). See
also below.

I forgot about it because nobody replied to my emails, and I thought it
was possibly related to 32-bit Windows. I recently moved to 64-bit Windows
(Windows has the best drivers for SQL databases) and forgot about the bug
until I ran into it again. I then decided to try Python 3k: it reads a
>4 GB file with no trouble, but it is very, very slow at both reading and
writing files.
The following code produces either a <4 GB or a >4 GB file, depending on
which fichout.write line is commented out.
Both files have the same number of lines, but the >4 GB one is not read
completely under Windows (32- or 64-bit).
I have no such problem on Linux or BSD (Mac).
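The two variants of the script differ only in line length, and that is what puts one file over the 4 GB mark and keeps the other under it. Here is a rough size calculation. It assumes Windows text mode stores each newline as two bytes (CR LF, newline_bytes=2). total_digits and file_size are hypothetical helpers written for this illustration, not code from the thread.

```python
FOUR_GIB = 2 ** 32
N_LINES = 85014961  # xrange(85014961) in the script writes lines 0 .. 85014960


def total_digits(n):
    """Total number of characters in str(0), str(1), ..., str(n - 1)."""
    total, low, width = 0, 0, 1
    while low < n:
        high = min(n, 10 ** width)  # every number in [low, high) has `width` digits
        total += (high - low) * width
        low, width = high, width + 1
    return total


def file_size(n_lines, padding, newline_bytes=2):
    """Bytes taken by lines 0 .. n_lines - 1, each str(i) + padding spaces + newline.

    This is also the byte offset at which line number n_lines starts.
    newline_bytes=2 assumes Windows text mode, which writes each LF as CR LF.
    """
    return total_digits(n_lines) + n_lines * (padding + newline_bytes)


if __name__ == "__main__":
    big = file_size(N_LINES, padding=59)   # fichout.write(str(i)+' '*59+'\n')
    small = file_size(N_LINES, padding=0)  # fichout.write(str(i)+'\n')
    print(big, big > FOUR_GIB)      # -> 5854921199 True
    print(small, small > FOUR_GIB)  # -> 839038500 False
```

Because file_size(k, 59) is also the offset where line k starts, it can be compared with the first merged pair in the text-mode run posted on 2007-09-12, assuming that run used the big-file variant. file_size(62410138, 59) is 4295188412, a little past 2**32 (4294967296).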

Python 3k on Windows reads both files OK, but is very, very slow (change
xrange to range; I guess it is preposterous to advise you about that :-).

best
Richard

import sys
print(sys.version_info)
import time
print (time.strftime('%Y-%m-%d %H:%M:%S'))
liste=[]
start = time.time()
# Write phase: 85014961 numbered lines (Python 2; use range on Python 3)
fichout=open('test.txt','w')
for i in xrange(85014961):
    if i%500==0 and i>0:
        print (i,time.time()-start)
    fichout.write(str(i)+' '*59+'\n')  #big file
    #fichout.write(str(i)+'\n')#small file, same number of lines

fichout.flush()
fichout.close()
print ('total lines written ',i)
print (i,time.time()-start)
print ('*'*50)
# Read phase: iterate over the file in text mode
fichin=open('test.txt')
start3 = time.time()
for i,li in enumerate(fichin):
    if i%500==0 and i>0:
        print (i,time.time()-start3)
fichin.close()
print ('total lines read ',i)
print(time.time()-start)

> Richard, can you somehow view the end of the file to see what its last
> lines actually are?  It should end like this:
>
> 85014951
> 85014952
> 85014953
> 85014954
> 85014955
> 85014956
> 85014957
> 85014958
> 85014959
> 85014960

Viewed with a text editor, the file ends with:
85014944  
85014945  
85014946  
85014947  
85014948  
85014949  
85014950  
85014951  
85014952  
85014953  
85014954  
85014955  
85014956  
85014957  
85014958  
85014959  
85014960  

On Windows with Python 2.5, adding this to the read loop:
    if i>85014940:
        print i, li.strip()

prints:
(2, 5, 0, 'final', 0)
2007-09-11 07:58:47
(500, 2.6720001697540283)
(1000, 5.375)
(1500, 8.032648498535)
(2000, 10.70368664551)
(2500, 13.375)
(3000, 16.047000169754028)
(3500, 18.70368664551)
(4000, 21.36133514404)
(4500, 24.03264849854)
(5000, 26.68763760376)
(5500, 29.36133514404)
(6000, 32.03264849854)
(6500, 34.70368664551)
(7000, 37.40764849854)
(7500, 40.094000101089478)
(8000, 42.797000169754028)
(8500, 45.485000133514404)
85014941 85014951  
85014942 85014952  
85014943 85014953  
85014944 85014954  
85014945 85014955  
85014946 85014956  
85014947 85014957  
85014948 85014958  
85014949 85014959  
85014950 
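Printing only the lines past a hard-coded index, as above, assumes the line count is already known. A collections.deque with maxlen keeps only the last few (index, line) pairs while iterating. That shows the end of the file, with each line's position next to its content, without needing the count. This is a minimal Python 3 sketch, and tail is a hypothetical helper name.

```python
import collections


def tail(lines, n=10):
    """Return the last n (index, line) pairs of an iterable, indices 0-based."""
    return list(collections.deque(enumerate(lines), maxlen=n))


if __name__ == "__main__":
    sample = ["%d   \n" % i for i in range(25)]
    for i, li in tail(sample, n=3):
        print(i, li.strip())
    # prints:
    # 22 22
    # 23 23
    # 24 24
```

On a correctly read file, each index matches the number on its line. In the output above they differ by 10 (85014941 next to 85014951), so ten lines were merged earlier in the file.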

[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-10 Thread Stefan Sonnenberg-Carstens

Changes by Stefan Sonnenberg-Carstens:


--
components: +Interpreter Core
title: code sample showing errors reading large files with py 2.5 -> code
sample showing errors reading large files with py 2.5/3.0
versions: +Python 3.0




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-10 Thread Guido van Rossum

Guido van Rossum added the comment:

PythonMeister, what do you mean, confirmed? Your read loop ends by printing

('total lines read ', 85014960)

which is the expected output.  (It's one less than the number of lines
written due to a bug in the program -- it prints the 0-based ordinal of
the last line written rather than the total number of lines written,
which is one more. But the bug is the same in the input and output loops.)
Richard's output from the read loop was

('total lines read ', 85014950)

i.e. 10 less than written.
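The off-by-one described here comes from enumerate() being 0-based. After the loop, i holds the index of the last line, which is the count minus one, and for an empty file i is never bound at all. Below is a small Python sketch of a count that avoids both problems by starting enumerate() at 1. count_items is a hypothetical name.

```python
def count_items(iterable):
    """Count items; returns 0 for an empty iterable instead of leaving a name unbound."""
    count = 0
    for count, _ in enumerate(iterable, start=1):
        pass
    return count


if __name__ == "__main__":
    lines = ["0\n", "1\n", "2\n"]
    for i, li in enumerate(lines):
        pass
    print(i)                   # -> 2, the 0-based index of the last line
    print(count_items(lines))  # -> 3, the number of lines
```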

I wonder if the bug is simply a matter of a failure to flush on Windows?
I can't reproduce it on Linux (Ubuntu Dapper).

Richard, can you somehow view the end of the file to see what its last
lines actually are?  It should end like this:

85014951
85014952
85014953
85014954
85014955
85014956
85014957
85014958
85014959
85014960

--
nosy: +gvanrossum




[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-10 Thread Stefan Sonnenberg-Carstens

Stefan Sonnenberg-Carstens added the comment:

I can confirm that under Linux (Linux nx6310 2.6.22-1-mepis-smp #1 SMP
PREEMPT Wed Sep 5 22:23:08 EDT 2007 i686 GNU/Linux, SimplyMepis 7.0b3)
1. Python 3.0a1 is _very_ slow
2. it eats all your CPU (see my post)
I did not wait for the program to finish with 3.0a1,
as my patience is limited. I don't think it would silently drop lines
the way the Windows version does.

To see if flushing matters, I'll try this later:

import sys
print(sys.version_info)
import time
print (time.strftime('%Y-%m-%d %H:%M:%S'))
liste=[]
start = time.time()
# Write phase (Python 2; use range on Python 3)
fichout=open('test.txt','w')
for i in xrange(85014961):
    if i%500==0 and i>0:
        print (i,time.time()-start)
    fichout.write(str(i)+' '*59+'\n')
fichout.flush()  # explicit flush before closing
fichout.close()
print ('total lines written ',i)
print (i,time.time()-start)
print ('*'*50)
# Read phase: iterate over the file in text mode
fichin=open('test.txt')
start3 = time.time()
for i,li in enumerate(fichin):
    if i%500==0 and i>0:
        print (i,time.time()-start3)
fichin.close()
print ('total lines read ',i)
print(time.time()-start)


I recently saw a case on Windows XP SP2 with Python 2.3, where a
colleague of mine wrote files he had read from a zip file to disk.
Before the close() he also had to flush() the written files
explicitly; otherwise he was not able to rename them afterwards.
His first approach was time.sleep(30), which was not an option.
I'll report back once I have run the code under Windows.
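For the flush-then-rename situation described here, a common pattern is to flush Python's buffer, ask the OS to commit the data with os.fsync(), close the file, and only then rename it. Closing first matters on Windows, where a file that is still open usually cannot be renamed. This is a minimal Python 3 sketch that assumes os.replace() (available since Python 3.3). write_then_rename and the file names are hypothetical, and this is not the code the colleague used.

```python
import os
import tempfile


def write_then_rename(data, tmp_path, final_path):
    """Write bytes to tmp_path, push them to disk, close, then rename to final_path."""
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # hand Python's buffered data to the OS
        os.fsync(f.fileno())  # ask the OS to commit its cached data to disk
    # Leaving the with-block closed the file, so no open handle remains
    # when the rename happens.
    os.replace(tmp_path, final_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    tmp = os.path.join(workdir, "member.tmp")
    final = os.path.join(workdir, "member.bin")
    write_then_rename(b"payload", tmp, final)
    print(os.path.exists(tmp), os.path.exists(final))  # -> False True
```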
