Thanks a lot for your valuable suggestions.

On Sun, Jun 15, 2008 at 4:04 AM, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sat, 14 Jun 2008 12:45:47 +0530, "Beema shafreen"
> <[EMAIL PROTECTED]> declaimed the following in
> gmane.comp.python.general:
>
> Strange: I don't recall seeing this on comp.lang.py, just the first
> responder; and a search on message ID only found it on gmane...
>
> > Hi all,
> >
> > I have a file with three columns. I need to sort the file with respect to
> > the third column. How do I do it using Python? I used the Linux sort
> > command to do this, but I am not able to do it.
> > Can anybody suggest how?
>
> Question 1: Will the file fit completely within the memory of a running
> Python program?
>
> Question 2: How are the columns defined? Fixed width, known in advance;
> tab separated; comma separated.
>
> If #1 is true, I'd read the file into a list of tuples/sublists (if the
> line is fixed width columns, read the line and manually split on column
> widths; if TSV or CSV, use the proper options with the CSV module to read
> the file). Define a sort key function to extract the key column and use
> the built-in list sort method:
>
>     data.sort(key=lambda x: x[2])  # warning, I'm not skilled at lambda
>
> Actually, if text sort order (not numeric value order) is okay, and the
> lines are fixed width columns, there is no need to manually split the
> columns into tuples; just read all lines into a list and define a key
> function that picks out the columns needed:
>
>     data.sort(key=lambda x: x[colstart:colend])
>
> If #1 is FALSE (too big for memory) you will need to create a sort-merge
> procedure in which you read n lines of the file; sort them, write to a
> temporary file; alternating among 2+ temporary files keeping the same
> n lines (except for the last packet). Then merge the 2+ temporaries over
> the n lines in the batch to a new temporary file; after the first n
> lines have been merged (giving n*2+ lines in the batch) switch to
> another temporary file for the next batch.... When all original batches
> are merged, repeat the merge using batches of size n*2+... Repeat until
> only one temporary file is left (i.e., only one long merge batch is
> written).
>
> Or figure out how to call whatever system sort command is available
> with whatever parameters are needed -- after all, why reinvent the wheel
> if you can reach outside the snake and grab what is already in the snake
> pit ("outside the snake" => os.system(...); "snake pit" => the OS
> environment). Even WinXP has a command line sort command; as long as you
> don't need a multikey sort it can handle the simple text record sorting
> with limitations on memory size to use.
>
> --
> Wulfraed Dennis Lee Bieber KD6MOG
> [EMAIL PROTECTED] [EMAIL PROTECTED]
> HTTP://wlfraed.home.netcom.com/
> (Bestiaria Support Staff: [EMAIL PROTECTED])
> HTTP://www.bestiaria.com/
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Beema Shafreen
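
For the case where the file fits in memory, the suggestion quoted above (read with the csv module, then sort on a key function) might look roughly like the sketch below. It assumes tab-separated columns and Python 3, neither of which the thread confirms; the function name `sort_by_third_column` and the `numeric` switch are made up for illustration. A row with fewer than three columns would raise an IndexError.

```python
import csv
import io


def sort_by_third_column(lines, delimiter="\t", numeric=False):
    """Return the rows (lists of strings) sorted on the third column.

    `lines` is any iterable of text lines, e.g. an open file.
    The tab delimiter is an assumption; pass "," for CSV input.
    """
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    if numeric:
        # Compare by numeric value: 9 < 10 < 100.
        return sorted(rows, key=lambda row: float(row[2]))
    # Compare as text: "10" < "100" < "9".
    return sorted(rows, key=lambda row: row[2])


# Small in-memory sample standing in for a real file.
sample = "a\tx\t10\nb\ty\t9\nc\tz\t100\n"
for row in sort_by_third_column(io.StringIO(sample), numeric=True):
    print("\t".join(row))
```

With a real file, `open(path, newline="")` would take the place of the `io.StringIO` object. The `numeric` switch matters because text order and numeric order differ, as the quoted reply points out.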
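
For a file too big for memory, here is a minimal sketch of a sort-merge. Note that it is not the alternating-temporary-file scheme described in the quoted reply: it is a simpler variant that sorts fixed-size chunks into temporary files and then merges all of them in one pass with `heapq.merge`. It assumes Python 3.5+, tab-separated columns, and text (not numeric) ordering; `external_sort`, `third_column`, and `chunk_lines` are hypothetical names chosen for illustration.

```python
import heapq
import itertools
import os
import tempfile


def third_column(line, delimiter="\t"):
    """Sort key: the third field of a line (tab-separated is an assumption)."""
    return line.rstrip("\n").split(delimiter)[2]


def external_sort(in_path, out_path, chunk_lines=100000, key=third_column):
    """Sort in_path into out_path while holding only chunk_lines lines at once."""
    chunk_paths = []
    try:
        # Pass 1: read n lines at a time, sort them, write each batch out.
        with open(in_path) as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure a final line without a newline cannot fuse
                # with its neighbour after the merge.
                chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(text=True)
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Pass 2: merge all sorted batches lazily, line by line.
        files = [open(p) for p in chunk_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*files, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary batch files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Because every batch file is open at the same time during the merge, a very large input with a small `chunk_lines` could run into the operating system's limit on open files; the multi-pass merge from the quoted reply avoids that at the cost of more passes over the data.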