Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-12 Thread John J. Lee
"Chris Mellon" <[EMAIL PROTECTED]> writes: [...] > The minimum bounds for a line is at least one byte (the newline) and > maybe more, depending on your data. You can seek() forward the minimum > amount of bytes that (1 billion -1) lines will consume and save > yourself some wasted IO. But how do y

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Terry Reedy
"Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] | On Wed, 08 Aug 2007 09:54:26 +0200, Méta-MCI \(MVP\) wrote: | | > Create a "index" (a file with 3,453,299,000 tuples : | > line_number + start_byte) ; this file has fix-length lines. | > slow, OK, but once. |

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Chris Mellon
On 8/8/07, Steve Holden <[EMAIL PROTECTED]> wrote: > Chris Mellon wrote: > > On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: > >> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > >> > >>> On Aug 8, 2:35 am, Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTEC

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Steve Holden
Chris Mellon wrote: > On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: >> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: >> >>> On Aug 8, 2:35 am, Paul Rubin wrote: Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > This program: > for i in range(1000

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Chris Mellon
On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > On Aug 8, 2:35 am, Paul Rubin wrote: > > > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > > This program: > > > > for i in range(1000000000): > > > >

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bruno Desthuilliers
Ant a écrit : > On Aug 8, 11:10 am, Bruno Desthuilliers [EMAIL PROTECTED]> wrote: >> Jay Loden a écrit : >> (snip) >> >>> If we just want to iterate through the file one line at a time, why not >>> just: >>> count = 0 >>> handle = open('hugelogfile.txt') >>> for line in handle.xreadlines(): >>>

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bjoern Schliessmann
Peter Otten wrote: > n = 10**9 - 1 > assert n < sys.maxint > f = open(filename) > wanted_line = itertools.islice(f, n, None).next() > > should do slightly better than your implementation. It will do vastly better, at least in memory usage terms, because there is no memory eating range call. Reg
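
A minimal Python 3 sketch of the islice approach quoted above, for readers on a current interpreter. The function name `nth_line` and its parameters are hypothetical; the quoted code is Python 2 (`.next()`, `sys.maxint`), and this version assumes the file is opened in binary mode so that line splitting happens on `\n` only. It still reads every byte before the wanted line; it only avoids building a list of indices.

```python
import itertools


def nth_line(path, n):
    """Return line number n (1-based) as bytes, or None if the file is shorter.

    Hypothetical helper: iterates the file lazily and discards the first
    n - 1 lines without keeping them in memory.
    """
    with open(path, "rb") as f:
        # islice(f, n - 1, None) skips n - 1 lines; next() takes the following one.
        return next(itertools.islice(f, n - 1, None), None)
```

For the one-billionth line this would be called as `nth_line("hugelogfile.txt", 10**9)`.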

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Ant
On Aug 8, 11:10 am, Bruno Desthuilliers wrote: > Jay Loden a écrit : > (snip) > > > If we just want to iterate through the file one line at a time, why not > > just: > > > count = 0 > > handle = open('hugelogfile.txt') > > for line in handle.xreadlines(): > > count = count + 1 > > if coun

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Ben Finney
Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > On Aug 8, 2:35 am, Paul Rubin wrote: > > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > This program: > > > for i in range(1000000000): > > > f.readline() > > > is absolutely very slow > > > > There

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Marc 'BlackJack' Rintsch
On Wed, 08 Aug 2007 09:54:26 +0200, Méta-MCI \(MVP\) wrote: > Create a "index" (a file with 3,453,299,000 tuples : > line_number + start_byte) ; this file has fix-length lines. > slow, OK, but once. Why storing the line number? The first start offset is for the first line, the second start offs
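
The offsets-only index described here can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the names `OFFSET`, `build_index` and `read_line` are hypothetical, and each record is assumed to be one unsigned 64-bit little-endian start offset. Because every record has the same size, the record for line n sits at byte (n - 1) * 8 of the index, so the line number never needs to be stored.

```python
import struct

# Assumed record layout: one unsigned 64-bit little-endian offset per line.
OFFSET = struct.Struct("<Q")


def build_index(data_path, index_path):
    """Scan data_path once and write the start offset of every line."""
    offset = 0
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        for line in data:
            index.write(OFFSET.pack(offset))
            offset += len(line)  # the next line starts right after this one


def read_line(data_path, index_path, n):
    """Return line n (1-based) as bytes using two seeks instead of a scan."""
    if n < 1:
        raise IndexError("line numbers start at 1")
    with open(index_path, "rb") as index:
        index.seek((n - 1) * OFFSET.size)
        record = index.read(OFFSET.size)
    if len(record) < OFFSET.size:
        raise IndexError("line number out of range")
    (start,) = OFFSET.unpack(record)
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.readline()
```

With 8-byte records, an index for 3,453,299,000 lines takes about 27.6 GB, so the one-time cost is in disk space as well as in the initial scan.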

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bruno Desthuilliers
Jay Loden a écrit : (snip) > If we just want to iterate through the file one line at a time, why not just: > > count = 0 > handle = open('hugelogfile.txt') > for line in handle.xreadlines(): > count = count + 1 > if count == '10': > #do something for count, line in enumera
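
The reply above is cut off at what appears to be an `enumerate` loop. A minimal Python 3 sketch of that idea follows; `find_line` and its parameters are hypothetical names, and the file is assumed to be read in binary mode. Note that the quoted loop compares `count` with the string `'10'`, which is never equal to an integer counter, so the sketch compares integers instead.

```python
def find_line(path, target):
    """Return line number target (1-based) as bytes, or None if not reached.

    Hypothetical helper: enumerate() keeps the counter, so no manual
    `count = count + 1` is needed.
    """
    with open(path, "rb") as handle:
        for count, line in enumerate(handle, start=1):
            if count == target:  # integer comparison, not count == '10'
                return line
    return None
```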

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Méta-MCI (MVP)
Hi! Create an "index" (a file with 3,453,299,000 tuples: line_number + start_byte); this file has fixed-length lines. Slow, OK, but only once. Then, for every lookup/read of a specific line: - direct-access read on the index - seek to the first byte of the desired line @+ Michel Claveau -- http://

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Jay Loden
Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: >> This program: >> for i in range(1000000000): >> f.readline() >> is absolutely very slow > > There are two problems: > > 1) range(1000000000) builds a list of a billion elements in memory, > which is many gig

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Peter Otten
Sullivan WxPyQtKinter wrote: > I have a huge log file which contains 3,453,299,000 lines with > different lengths. It is not possible to calculate the absolute > position of the beginning of the one billionth line. Are there > efficient way to seek to the beginning of that line in python? > > Thi

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Sullivan WxPyQtKinter
On Aug 8, 2:35 am, Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > This program: > > for i in range(1000000000): > > f.readline() > > is absolutely very slow > > There are two problems: > > 1) range(1000000000) builds a list of a bill

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Paul Rubin
Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > This program: > for i in range(1000000000): > f.readline() > is absolutely very slow There are two problems: 1) range(1000000000) builds a list of a billion elements in memory, which is many gigabytes and probably thrashing your

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Evan Klitzke
On 8/7/07, Sullivan WxPyQtKinter <[EMAIL PROTECTED]> wrote: > I have a huge log file which contains 3,453,299,000 lines with > different lengths. It is not possible to calculate the absolute > position of the beginning of the one billionth line. Are there > efficient way to seek to the beginning of

Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Sullivan WxPyQtKinter
I have a huge log file that contains 3,453,299,000 lines of different lengths. It is not possible to calculate the absolute position of the beginning of the one-billionth line. Is there an efficient way to seek to the beginning of that line in Python? This program: for i in range(1000000000):