In article [EMAIL PROTECTED],
[EMAIL PROTECTED] wrote:
Thanks to all who replied. It's very appreciated.
Yes, I had to double-check line counts and the number of lines is ~16
million (instead of stated 1.6B).
Also:
What is a Unicode text file? How is it encoded: utf8, utf16, utf16le,
utf16be,
In article [EMAIL PROTECTED],
John Nagle [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
Thanks to all who replied. It's very appreciated.
Yes, I had to double check line counts and the number of lines is ~16
million (instead of stated 1.6B).
OK, that's not bad at all.
You have a
Gabriel Genellina wrote:
use the Windows sort command. It has been
there since MS-DOS ages, there is no need to download and install other
packages, and the documentation at
http://technet.microsoft.com/en-us/library/bb491004.aspx says:
Limits on file size:
The sort command has no limit
On 2008-01-27, Stefan Behnel [EMAIL PROTECTED] wrote:
Gabriel Genellina wrote:
use the Windows sort command. It has been
there since MS-DOS ages, there is no need to download and install other
packages, and the documentation at
http://technet.microsoft.com/en-us/library/bb491004.aspx says:
On Sun, 27 Jan 2008 10:00:45 +, Grant Edwards wrote:
On 2008-01-27, Stefan Behnel [EMAIL PROTECTED] wrote:
Gabriel Genellina wrote:
use the Windows sort command. It has been
there since MS-DOS ages, there is no need to download and install other
packages, and the documentation at
On Fri, 25 Jan 2008 17:50:17 -0200, Paul Rubin
http://phr.cx@NOSPAM.invalid wrote:
Nicko [EMAIL PROTECTED] writes:
# The next line is order O(n) in the number of chunks
(line, fileindex) = min(mergechunks)
You should use the heapq module to make this operation O(log n) instead.
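A minimal sketch of the heapq suggestion, assuming each chunk is an iterable of already-sorted lines. The function name merge_sorted_chunks is hypothetical and Nicko's surrounding code is not shown in this thread, so this is not a drop-in replacement for it. Keeping the current head of each chunk in a heap makes each step O(log n) in the number of chunks, instead of the O(n) scan done by min():

```python
import heapq


def merge_sorted_chunks(chunks):
    """Yield lines from several already-sorted iterables in sorted order."""
    iterators = [iter(chunk) for chunk in chunks]
    heap = []
    # Seed the heap with the first line of every non-empty chunk.
    for index, iterator in enumerate(iterators):
        first = next(iterator, None)
        if first is not None:
            heap.append((first, index))
    heapq.heapify(heap)

    while heap:
        # The smallest pending line is always at heap[0].
        line, index = heap[0]
        yield line
        following = next(iterators[index], None)
        if following is None:
            heapq.heappop(heap)  # this chunk is exhausted
        else:
            # Replace the head with the next line of the same chunk: O(log n).
            heapq.heapreplace(heap, (following, index))
```

The standard library's heapq.merge does the same job and is usually the simpler choice when the chunks are plain iterables.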
On Jan 24, 9:26 pm, [EMAIL PROTECTED] wrote:
If you really have a 2GB file and only 2GB of RAM, I suggest that you don't
hold your breath.
I am limited with resources. Unfortunately.
As long as you have at least as much disc space spare as you need to
hold a copy of the file then this is
On Jan 24, 4:26 pm, [EMAIL PROTECTED] wrote:
Thanks to all who replied. It's very appreciated.
Yes, I had to double-check line counts and the number of lines is ~16
million (instead of stated 1.6B).
Also:
What is a Unicode text file? How is it encoded: utf8, utf16, utf16le,
utf16be, ???
On Jan 25, 9:23 am, Asim [EMAIL PROTECTED] wrote:
On Jan 24, 4:26 pm, [EMAIL PROTECTED] wrote:
Thanks to all who replied. It's very appreciated.
Yes, I had to double-check line counts and the number of lines is ~16
million (instead of stated 1.6B).
Also:
What is a Unicode text
Nicko [EMAIL PROTECTED] writes:
# The next line is order O(n) in the number of chunks
(line, fileindex) = min(mergechunks)
You should use the heapq module to make this operation O(log n) instead.
--
http://mail.python.org/mailman/listinfo/python-list
Hello all,
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on first two characters.
I'd greatly appreciate if someone can post sample code that can help
me do this.
Also, any ideas on approximately how long the sort process is going to
take (XP, Dual Core
[EMAIL PROTECTED] writes:
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on first two characters.
I'd greatly appreciate if someone can post sample code that can help
me do this.
Use the unix sort command:
sort inputfile -o outputfile
I think
[EMAIL PROTECTED] wrote:
Hello all,
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on first two characters.
Given those numbers, the average number of characters per line is
less than 2. Please check.
John
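The sanity check above can be reproduced in a few lines. This sketch assumes "2GB" means 2 * 1024 ** 3 bytes and that "billion" means 1000 ** 3; the variable names are illustrative only. It also shows the figure for the corrected count of about 16 million lines given elsewhere in the thread:

```python
size_bytes = 2 * 1024 ** 3          # assumed meaning of "~2GB"
claimed_lines = 1.6 * 1000 ** 3     # "1.6 billion" as first stated
corrected_lines = 16 * 1000 ** 2    # "~16 million" after the recount

bytes_per_line_claimed = size_bytes / claimed_lines
bytes_per_line_corrected = size_bytes / corrected_lines

# Under 2 bytes per line is implausible for real text lines.
print(f"{bytes_per_line_claimed:.2f} bytes per line")    # about 1.34
# Roughly 134 bytes per line is a believable average.
print(f"{bytes_per_line_corrected:.1f} bytes per line")  # about 134.2
```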
On Jan 25, 6:18 am, [EMAIL PROTECTED] wrote:
Hello all,
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on first two characters.
If you mean 1.6 American billion i.e. 1.6 * 1000 ** 3 lines, and 2 *
1024 ** 3 bytes of data, that's 1.34 bytes per line. If
Thanks to all who replied. It's very appreciated.
Yes, I had to double-check line counts and the number of lines is ~16
million (instead of stated 1.6B).
Also:
What is a Unicode text file? How is it encoded: utf8, utf16, utf16le,
utf16be, ??? If you don't know, do this:
The file is UTF-8
Do
[EMAIL PROTECTED] wrote:
What are you going to do with it after it's sorted?
I need to isolate all lines that start with two characters (zz to be
particular)
Isolate as in extract? Remove the rest?
Then why don't you extract the lines first, without sorting the file? (or sort
it afterwards if
Stefan Behnel wrote:
[EMAIL PROTECTED] wrote:
What are you going to do with it after it's sorted?
I need to isolate all lines that start with two characters (zz to be
particular)
Isolate as in extract? Remove the rest?
Then why don't you extract the lines first, without sorting the file?
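A minimal sketch of extracting without sorting, assuming the file is UTF-8 as stated elsewhere in the thread. The function name extract_prefixed_lines and the path arguments are hypothetical. The file is read one line at a time, so memory use stays small regardless of file size:

```python
def extract_prefixed_lines(src_path, dst_path, prefix="zz"):
    """Copy lines that start with `prefix` to dst_path; return how many matched."""
    matched = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # streams the file line by line
            if line.startswith(prefix):
                dst.write(line)
                matched += 1
    return matched
```

If the file starts with a byte order mark, opening the source with encoding="utf-8-sig" keeps the mark from hiding a match on the first line.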
On Jan 25, 8:26 am, [EMAIL PROTECTED] wrote:
I need to isolate all lines that start with two characters (zz to be
particular)
What does isolate mean to you? What does this have to do with
sorting? What do you actually want to do with (a) the lines starting
with zz (b) the other lines? What
On Thursday 24 January 2008 20:56 John Nagle wrote:
[EMAIL PROTECTED] wrote:
Hello all,
I have a Unicode text file with 1.6 billion lines (~2GB) that I'd like
to sort based on first two characters.
Given those numbers, the average number of characters per line is
less than 2.
John Nagle [EMAIL PROTECTED] writes:
- Get enough memory to do the sort with an in-memory sort, like
UNIX sort or Python's sort function.
Unix sort does external sorting when needed.
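For readers unfamiliar with the term, external sorting can be sketched in Python as follows. This is only an illustration of the general idea, not how Unix sort is implemented. The function name external_sort, the chunk_lines default, and the choice of the first two characters as the key are assumptions made for this example:

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, key=lambda line: line[:2],
                  chunk_lines=1_000_000):
    """Sort a UTF-8 text file that may not fit in memory."""
    chunk_paths = []
    try:
        # Phase 1: sort fixed-size chunks in memory, write each to a temp file.
        with open(src_path, encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure a final line without a newline cannot fuse
                # with its neighbour after merging.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: lazily merge the sorted chunk files into the output.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

Only one chunk is held in memory at a time, so chunk_lines can be tuned to the available RAM. The temporary files need roughly as much spare disk space as the input file.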
[EMAIL PROTECTED] wrote:
Thanks to all who replied. It's very appreciated.
Yes, I had to double check line counts and the number of lines is ~16
million (instead of stated 1.6B).
OK, that's not bad at all.
You have a few options:
- Get enough memory to do the sort with an
Paul Rubin wrote:
John Nagle [EMAIL PROTECTED] writes:
- Get enough memory to do the sort with an in-memory sort, like
UNIX sort or Python's sort function.
Unix sort does external sorting when needed.
Ah, someone finally put that in. Good. I hadn't looked at sort's manual
page
John Nagle [EMAIL PROTECTED] writes:
Unix sort does external sorting when needed.
Ah, someone finally put that in. Good. I hadn't looked at
sort's manual page in many years.
Huh? It has been like that from the beginning. It HAD to be. Unix
was originally written on a PDP-11. The GNU