Re: fixing an horrific formatted csv file.
flebber wrote:
> so in my file I had on line 44 this trainer name: "Michael, Wayne John Hawkes",
> and on line 95 this horse name: Inz'n'out. This throws off my capturing the
> correct item 9. How do I protect against this?

Use Python's csv module to read the file. Don't try to do it yourself; the rules for handling embedded commas and quotes in CSV are quite complicated. As long as the file is a well-formed CSV file, the csv module should parse fields like that correctly.

-- 
Greg
--
https://mail.python.org/mailman/listinfo/python-list
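To illustrate Greg's point, here is a minimal Python 3 sketch contrasting a naive `split(',')` with `csv.reader`. It assumes the source file wraps any field that contains a comma in double quotes, as well-formed CSV does; the two sample rows are made up from names in this thread, not copied from the real file. An apostrophe, as in Inz'n'out, needs no special handling, because by default csv only treats the double quote as a quoting character.

```python
import csv
import io

# Hypothetical two-row sample: a trainer name containing a comma is wrapped
# in double quotes (well-formed CSV); an apostrophe needs no quoting.
SAMPLE = (
    'Horse,1,Bennetta,0,"Michael, Wayne John Hawkes",Randwick,,0,0,16-3-1-3 $390450.00\n'
    "Horse,2,Inz'n'out,0,Chris Waller,Rosehill,,0,0,20-6-1-5 $201250.00\n"
)

# Naive splitting cuts the quoted field in two, shifting every later index.
naive = [line.split(",") for line in SAMPLE.splitlines()]
print(naive[0][9])  # prints 0, not the results field

# csv.reader honours the quotes, so index 9 is the results field on both rows.
rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in rows:
    print(row[2], "|", row[4], "|", row[9])
```

If the real file does not quote such fields, no parser can reliably tell a separator comma from one inside a name, which is the "well-formed" caveat above.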
Re: fixing an horrific formatted csv file.
On Friday, 4 July 2014 14:12:15 UTC+10, flebber wrote:
> I have taken the code and gone a little further, but I need to be able to
> protect myself against commas and single quotes in names. What is the best
> way to do this?
> [...]

So I found this on Stack Overflow:

    In [2]: import string
    In [3]: identity = string.maketrans("", "")
    In [4]: x = ['+5556', '-1539', '-99', '+1500']
    In [5]: x = [s.translate(identity, "+-") for s in x]
    In [6]: x
    Out[6]: ['5556', '1539', '99', '1500']

but it fails on my file, I believe because mine is a list of lists. Is there an easy way to iterate over the sublists without flattening them?

Current code:

    input_table = [[item.strip(' ') for item in record.split(',')]
                   for record in text_file.splitlines()]
    # At this point look at input_table to find the record indices
    identity = string.maketrans("", "")
    print(input_table)
    input_table = [s.translate(identity, ",'") for s in input_table]

Sayth
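A minimal Python 3 sketch of iterating the sublists without flattening: the outer comprehension walks the rows and the inner one walks the cells of each row. `string.maketrans` exists only in Python 2; in Python 3 the third argument of `str.maketrans` lists characters to delete. Note that by the time the cells exist, each line has already been split on commas, so deleting characters afterwards cannot rejoin a name that was cut in two, and it also alters the names themselves. That is why the csv module is the better fix.

```python
# Python 3: the third argument of str.maketrans lists characters to delete.
DELETE_COMMAS_AND_QUOTES = str.maketrans("", "", ",'")


def clean_table(table):
    """Return a new list of lists with commas and apostrophes removed from every cell."""
    return [[cell.translate(DELETE_COMMAS_AND_QUOTES) for cell in row]
            for row in table]


table = [["Horse", "5", "Generalife", "John O'Shea"],
         ["Horse", "6", "He's Your Man (FR)", "Chris Waller"]]
print(clean_table(table))
```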
Re: fixing an horrific formatted csv file.
On Friday, 4 July 2014 16:19:09 UTC+10, Gregory Ewing wrote:
> Use python's csv module to read the file. Don't try to do it yourself; the
> rules for handling embedded commas and quotes in csv are quite complicated.
> [...]

True, Greg, that was easier:

    def race_table(text_file):
        """utility to reorganise poorly made csv entry"""
        # input_table = [[item.strip(' ') for item in record.split(',')]
        #                for record in text_file.splitlines()]
        # At this point look at input_table to find the record indices
        # identity = string.maketrans("", "")
        # print(input_table)
        # input_table = [s.translate(identity, ",'") for s
        #                in input_table]
        output_table = []
        for record in text_file:
            if record[0] == 'Meeting':
                meeting = record[3]
            elif record[0] == 'Race':
                date = record[13]
                race = record[1]
            elif record[0] == 'Horse':
                number = record[1]
                name = record[2]
                results = record[9]
                res_split = re.split('[- ]', results)
                starts = res_split[0]
                wins = res_split[1]
                seconds = res_split[2]
                thirds = res_split[3]
                try:
                    prizemoney = res_split[4]
                except IndexError:  # this record has no prize-money part
                    prizemoney = 0
                trainer = record[4]
                location = record[5]
                print(name, wins, seconds)
                output_table.append((meeting, date, race, number, name,
                                     starts, wins, seconds, thirds,
                                     prizemoney, trainer, location))
        return output_table

    MY_FILE = out_file_name(FILENAME)

    # with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
    #     for line in race_table(f_in.readline()):
    #         new_row = line

    with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
        CONTENT = csv.reader(f_in)
        # print(content)
        FILE_CONTENTS = race_table(CONTENT)
        # print new_name
        f_out.write(str(FILE_CONTENTS))

    if __name__ == '__main__':
        pass

Sayth
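The results field (index 9) can lack its prize-money part. Keep in mind that a `finally:` clause runs whether or not an exception was raised, so only an `except IndexError:` clause supplies a default for a missing field. A hypothetical helper that avoids the try block altogether could look like this, assuming the field looks like "16-3-1-3 $390450.00":

```python
import re

FIELDS = ("starts", "wins", "seconds", "thirds", "prizemoney")


def parse_results(results):
    """Split a results field such as '16-3-1-3 $390450.00' into named parts.

    Hypothetical helper: any part missing from the end of the field
    (e.g. no prize money) is left as None.
    """
    parts = re.split(r"[- ]", results.strip())
    parsed = dict.fromkeys(FIELDS)          # every key present, defaulting to None
    for key, value in zip(FIELDS, parts):   # zip stops at the shorter sequence
        parsed[key] = value
    return parsed


print(parse_results("16-3-1-3 $390450.00"))
print(parse_results("16-3-1-3"))
```

Inside `race_table`, something like `stats = parse_results(record[9])` would then replace the five separate assignments.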
Re: fixing an horrific formatted csv file.
On 07/04/2014 12:28 PM, flebber wrote:
> [...] but it fails on my file, I believe because mine is a list of lists.
> Is there an easy way to iterate over the sublists without flattening them?
> [...]

Take Gregory's advice and use the csv module. Don't reinvent a CSV parser. My CSV splitter was the simplest approach possible, which I tend to use with undocumented formats, tweaking it for unexpected features as they come along.

Frederic
Re: fixing an horrific formatted csv file.
I have taken the code and gone a little further, but I need to be able to protect myself against commas and single quotes in names. What is the best way to do this?

So in my file I had on line 44 this trainer name: "Michael, Wayne John Hawkes", and on line 95 this horse name: Inz'n'out. This throws off my capturing the correct item 9. How do I protect against this?

Here is the current code:

    import re
    from sys import argv

    SCRIPT, FILENAME = argv


    def out_file_name(file_name):
        """take an input file and keep the name with appended _clean"""
        file_parts = file_name.split(".")
        output_file = file_parts[0] + '_clean.' + file_parts[1]
        return output_file


    def race_table(text_file):
        """utility to reorganise poorly made csv entry"""
        input_table = [[item.strip(' ') for item in record.split(',')]
                       for record in text_file.splitlines()]
        # At this point look at input_table to find the record indices
        output_table = []
        for record in input_table:
            if record[0] == 'Meeting':
                meeting = record[3]
            elif record[0] == 'Race':
                date = record[13]
                race = record[1]
            elif record[0] == 'Horse':
                number = record[1]
                name = record[2]
                results = record[9]
                res_split = re.split('[- ]', results)
                starts = res_split[0]
                wins = res_split[1]
                seconds = res_split[2]
                thirds = res_split[3]
                prizemoney = res_split[4]
                trainer = record[4]
                location = record[5]
                print(name, wins, seconds)
                output_table.append((meeting, date, race, number, name,
                                     starts, wins, seconds, thirds,
                                     prizemoney, trainer, location))
        return output_table

    MY_FILE = out_file_name(FILENAME)

    # with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
    #     for line in race_table(f_in.readline()):
    #         new_row = line

    with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
        CONTENT = f_in.read()
        # print(content)
        FILE_CONTENTS = race_table(CONTENT)
        # print new_name
        f_out.write(str(FILE_CONTENTS))

    if __name__ == '__main__':
        pass
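Separately, `out_file_name` assumes exactly one dot in the whole path: "race.data.csv" becomes "race_clean.data" (the extension is lost), a dotted directory name is cut in the wrong place, and a name with no dot raises IndexError. A minimal sketch of an alternative using `os.path.splitext`, which splits only at the last dot of the final path component:

```python
import os.path


def out_file_name(file_name):
    """Insert '_clean' before the extension: 'test.csv' -> 'test_clean.csv'."""
    root, ext = os.path.splitext(file_name)  # splits at the last dot of the file name only
    return root + "_clean" + ext


print(out_file_name("/home/sayth/Scripts/test.csv"))  # /home/sayth/Scripts/test_clean.csv
print(out_file_name("race.data.csv"))                 # race.data_clean.csv
```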
Re: fixing an horrific formatted csv file.
> TM = TX.Table_Maker (headings = ('Meeting','Date','Race','Number','Name','Trainer','Location'))
> TM (race_table (your_csv_text)).write ()

Where do I find TX? I found this mention on the list; is it available from pip under any name?

https://mail.python.org/pipermail/python-list/2014-February/667464.html

Sayth
Re: fixing an horrific formatted csv file.
On 07/02/2014 11:13 AM, flebber wrote:
> Where do I find TX? Found this mention in the list, was it available in pip by any name?
> https://mail.python.org/pipermail/python-list/2014-February/667464.html

I'd have to make it available. I proposed it some time ago and received a couple of suggestions in return.

It is a modular transformation framework written entirely in Python (2.7). It consists essentially of a base class Transformer that handles input and output in such a way that Transformer objects can be chained. It saved me from drowning in a horrible and growing tangle of hacks. Finding something usable I had previously done took time, understanding how it worked took more time, and adapting it took still more time, so writing yet another hack from scratch was faster. A number of hacks I could quickly wrap into Transformer objects, and so I could start building a library of standard Transformers. The Table_Maker is one of them.

The table-making code is quite bad. It suffers from feature overload. I would clean it up for distribution.

I'd be happy to distribute the base class and a few standard Transformers, such as I use every day (File Reader, File Writer, DB Run Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines To Text, Sort, Sort And Unique, etc.).

Writing one's own Transformers is a breeze. Testing them is too, because a Transformer keeps its input and output and, in line with the system's design philosophy, does only its own single thing.

A Chain is a list of Transformers that run in sequence. It is itself derived from Transformer and is a functional equivalent, so Chains nest. Fixing a Chain that nothing comes out of is a straightforward matter too: it will still have run up to the failing element, and Chain.show () reveals the culprit as the first one to have no output.

I am not up to date on distributing and would depend on qualified help with that.

Frederic


A brief overview

The TX solution to your race table would be (TX is the name of the module):

    class Race_Table (TX.Transformer):
        '''
        In: CSV text
        Out: Tabular data (2-dimensional list)
        '''
        name = 'Race_Table'

        @TX.setup  # Checks timestamps to prevent needless reruns in the absence of new input
        def transform (self):
            for line in self.Input.data:
                ...  # See my post
            self.Output.take (output_table)

Example, file to file:

    Race_Schedule_F2F = TX.Chain (
        TX.File_Reader (),
        Race_Table (),
        TX.List_To_CSV (delimiter = ';'),
        TX.File_Writer (terminal = out_file_name))
    Race_Schedule_F2F (input_file_name)  # Does it all!

Example, web to database:

    Race_Schedule_WWW2DB = TX.Chain (
        TX.WWW_Reader (),
        Race_Schedule_HTML_Reader (),
        Race_Table (),
        TX.DB_Writer (table_name = 'horses'))
    Race_Schedule_WWW2DB (url)  # Does it all!

You'd have to write the Race_Schedule_HTML_Reader.

Verify your table:

    Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
    Race_Schedule_WWW2DB.show_tree ()  # See which one should display
    Chain
        Chain[0] - WWW Reader
        Chain[1] - Race_Schedule_HTML_Reader
        Chain[2] - Race_Table
        Chain[3] - DB Writer
    print Table_Viewer (Race_Schedule_WWW2DB[2]())  # All Transformers keep their data
    (Display of table)

Verify database:

    print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
    (Display of database table)
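TX itself was not published in this thread, so the following is only a hypothetical Python 3 sketch of the design Frederic describes: a Transformer keeps its last input and output, a Chain is itself a Transformer (so Chains nest), a Chain stops at the first step that produces nothing, and `show()` then points at that step. None of these class or method names are TX's actual API.

```python
import csv


class Transformer:
    """Hypothetical base class: remembers its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self, data):
        raise NotImplementedError

    def __call__(self, data=None):
        self.input = data
        self.output = self.transform(data)
        return self.output


class Chain(Transformer):
    """Runs Transformers in sequence; being a Transformer, a Chain can nest."""

    def __init__(self, *steps):
        super().__init__()
        self.steps = list(steps)

    def transform(self, data):
        for step in self.steps:
            data = step(data)
            if not data:  # stop at the first step that produced nothing
                break
        return data

    def show(self):
        """Print each step and whether it produced output."""
        for i, step in enumerate(self.steps):
            status = "ok" if step.output else "NO OUTPUT"
            print(f"Chain[{i}] - {type(step).__name__}: {status}")


class TextToLines(Transformer):
    def transform(self, text):
        return text.splitlines()


class HorseRows(Transformer):
    def transform(self, lines):
        # csv.reader accepts any iterable of strings, such as a list of lines.
        return [row for row in csv.reader(lines) if row and row[0] == "Horse"]


pipeline = Chain(Chain(TextToLines()), HorseRows())  # a nested Chain
rows = pipeline("Meeting,05/07/14\nHorse,1,Bennetta\nHorse,2,Breakfast in Bed\n")
print(rows)
pipeline.show()
```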
fixing an horrific formatted csv file.
What I am trying to do is to reformat a CSV file into something more usable. Currently the file has no headers and has multiple lines with varying columns that are not related.

This is a sample:

    Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,
    Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT ,54,0,0,5/07/2014,, , , , ,No class restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, (Listed),Of $10. First $6, second $2, third $1, fourth $5000, fifth $2000, sixth $1000, seventh $1000, eighth $1000
    Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,
    Horse,2,Breakfast in Bed,0,David Vandyke,Warwick Farm,,0,0,20-6-1-5 $201250.00,,0,0,0,,81.00,M,
    Horse,3,Capital Commander,0,Gerald Ryan,Rosehill,,0,0,43-9-9-3 $438625.00,,0,0,0,,85.00,M,
    Horse,4,Coup Ay Tee (NZ),0,Chris Waller,Rosehill,,0,0,35-9-6-5 $519811.00,,0,0,0,,101.00,G,
    Horse,5,Generalife,0,John O'Shea,Warwick Farm,,0,0,19-6-1-3 $235045.00,,0,0,0,,87.00,G,
    Horse,6,He's Your Man (FR),0,Chris Waller,Rosehill,,0,0,13-2-3-1 $108110.00,,0,0,0,,93.00,G,
    Horse,7,Hidden Kisses,0,Chris Waller,Rosehill,,0,0,40-8-8-5 $565750.00,,0,0,0,,96.00,M,
    Horse,8,Oakfield Commands,0,Gerald Ryan,Rosehill,,0,0,22-7-4-6 $269530.00,,0,0,0,,94.00,G,
    Horse,9,Taxmeifyoucan,0,Gregory Hickman,Warwick Farm,,0,0,18-2-4-4 $539730.00,,0,0,0,,91.00,G,
    Horse,10,The Peak,0,Bart James Cummings,Randwick,,0,0,15-6-1-0 $426732.00,,0,0,0,,95.00,G,
    Horse,11,Tougher Than Ever (NZ),0,Chris Waller,Rosehill,,0,0,17-3-2-3 $321613.00,,0,0,0,,97.00,H,
    Horse,12,TROMSO,0,Chris Waller,Rosehill,,0,0,47-8-11-2 $622300.00,,0,0,0,,103.00,G,
    Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95 ,3U,~ ,HCP ,54,0,0,5/07/2014,, , , , ,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of $85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
    Horse,1,Big Bonanza,0,Don Robb,Wyong,,0,57.5,31-9-4-3 $366860.00,,0,0,0,,92.00,G,
    Horse,2,Casual Choice,0,Joseph Pride,Warwick Farm,,0,54,8-2-3-0 $105930.00,,0,0,0,

So what I am trying to do is end up with an output like this:

    Meeting, Date, Race, Number, Name, Trainer, Location
    Rosehill, 05/07/14, 1, 1,Bennetta,Grahame Begg,Randwick,
    Rosehill, 05/07/14, 1, 2,Breakfast in Bed,David Vandyke,Warwick Farm,

So as a start I thought I would try inserting the Meeting and Race number; however, I am just not getting it right.

    import csv

    outfile = open("/home/sayth/Scripts/cleancsv.csv", "w")
    with open('/home/sayth/Scripts/test.csv') as f:
        f_csv = csv.reader(f)
        headers = next(f_csv)
        for row in f_csv:
            meeting = row[3] in row[0] == 'Meeting'
            new = row.insert(0, meeting)
            while row[1] in row[0] == 'Race' 9:
                # pref less than next found row[0]
                # grab row[1] as id number
                id = row[1]
                # from row[0] and insert it in first position
                new_lines = new.insert(1, id)
                outfile.write(new_lines)
            outfile.close()

How should I go about this?

Thanks

Sayth
Re: fixing an horrific formatted csv file.
On 2014-07-01 15:04, flebber wrote:
> What I am trying to do is to reformat a csv file into something more usable.
> [...]
> How should I go about this?

There's no point in reading the first row as the headers, because it clearly doesn't contain just the headings.

First write a row for the header.

Then, for each row:

If the first field is 'Meeting', then remember the meeting, etc.

If the first field is 'Race', then remember the race, etc.

If the first field is 'Horse', then write the row with the additional fields for race, etc.

And so on.

BTW, the indentation for the 'outfile.close()' line is wrong. It would, of course, be better to use the 'with' statement for that file too.
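A minimal Python 3 sketch of the steps above, using `csv.reader` for input and `csv.writer` for output. The column positions are assumptions read off the sample: the date is field 1 and the meeting name field 3 of a Meeting row, the race number field 1 of a Race row, and number, name, trainer and location fields 1, 2, 4 and 5 of a Horse row. The desired output shows "Rosehill" while field 3 holds "Rosehill Gardens"; mapping venue names is left out. Both files are passed in already open, so the caller can use one `with` statement for both, e.g. `with open(src, newline='') as f_in, open(dst, 'w', newline='') as f_out: reformat(f_in, f_out)`.

```python
import csv
import io

HEADER = ["Meeting", "Date", "Race", "Number", "Name", "Trainer", "Location"]


def reformat(in_file, out_file):
    """Copy Meeting/Race context onto every Horse row (a sketch)."""
    writer = csv.writer(out_file)
    writer.writerow(HEADER)                  # first write a row for the header
    meeting = date = race = None
    for row in csv.reader(in_file):
        if not row:                          # skip blank lines
            continue
        kind = row[0]
        if kind == "Meeting":                # remember the meeting
            date, meeting = row[1], row[3]
        elif kind == "Race":                 # remember the race
            race = row[1]
        elif kind == "Horse":                # write the row with the extra fields
            writer.writerow([meeting, date, race, row[1], row[2], row[4], row[5]])


# Trimmed sample; the Race row is cut short, which csv.reader does not mind.
sample = io.StringIO(
    "Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit, ,\n"
    "Race,1,CIVIC STAKES,CIVIC,CIVIC,1350\n"
    "Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 $390450.00,,0,0,0,,98.00,M,\n"
    "Horse,2,Breakfast in Bed,0,David Vandyke,Warwick Farm,,0,0,20-6-1-5 $201250.00,,0,0,0,,81.00,M,\n"
)
out = io.StringIO()
reformat(sample, out)
print(out.getvalue())
```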
Re: fixing an horrific formatted csv file.
On 07/01/2014 04:04 PM, flebber wrote:
> What I am trying to do is to reformat a csv file into something more usable.
> [...]
> How should I go about this?
>
> Thanks
>
> Sayth

Reformatting is what I do most, and over time I have acquired some practice. Complete solutions are not often proposed, possibly being sneered at for their officiousness. In that case I apologize. I couldn't resist. It is such a nice example. Having solved it, I figured: why not share it . . .

Frederic

    def race_table (csv_text):
        input_table = [[item.strip(' ') for item in record.split (',')]
                       for record in csv_text.splitlines ()]
        # At this point look at input_table to find the record indices
        output_table = []
        for record in input_table:
            if record [0] == 'Meeting':
                meeting = record [3]
            elif record [0] == 'Race':
                date = record [13]
                race = record [1]
            elif record [0] == 'Horse':
                number = record [1]
                name = record [2]
                trainer = record [4]
                location = record [5]
                output_table.append ((meeting, date, race, number, name, trainer, location))
        return output_table

    for record in race_table (your_csv_text):
        print(record)

    ('Rosehill Gardens', '5/07/2014', '1', '1', 'Bennetta', 'Grahame Begg', 'Randwick')
    ('Rosehill Gardens', '5/07/2014', '1', '2', 'Breakfast in Bed', 'David Vandyke', 'Warwick Farm')
    ('Rosehill Gardens', '5/07/2014', '1', '3', 'Capital Commander', 'Gerald Ryan', 'Rosehill')
    ('Rosehill Gardens', '5/07/2014', '1', '4', 'Coup Ay Tee (NZ)', 'Chris Waller', 'Rosehill')
    ('Rosehill Gardens', '5/07/2014', '1', '5', 'Generalife', "John O'Shea", 'Warwick Farm')
    ('Rosehill Gardens', '5/07/2014', '1',
Re: fixing an horrific formatted csv file.
That's a really cool solution.

I understand why providing full solutions is frowned upon, because it doesn't assist in learning. That is true, but it's incredibly helpful in this case.

The Python Cookbook is really good, and it is what I was using as a start for dealing with CSV. But it doesn't even go anywhere near this; it has lots of examples with simple inputs.

Anyway, thanks again.

Sayth
Re: fixing an horrific formatted csv file.
On Wed, Jul 2, 2014 at 7:41 AM, flebber flebber.c...@gmail.com wrote:
> I understand why providing full solutions is frowned upon, because it
> doesn't assist in learning. That is true, but it's incredibly helpful in this case.

In this case, my main reason for not providing a full solution is that the work tends to be iterative. When I have a huge and messy file, what I usually do is grab the first half-dozen lines and work out how I'd go about fixing them manually, then write a script that does that. Then I run the script on the whole file and see where it either chokes or produces wrong data. Pick up the first few lines of wrong data and figure out how to tweak the program to handle those. Rinse and repeat.

Often, what that results in is a file that gets progressively tidier. When the scope of the mess is infinite (like with human-entered data; believe you me, you haven't seen messy until you've seen what a committee can do to a simple job), this means you stop working on the script at exactly the point where it stops being worth the effort, which is something that only you can decide.

ChrisA
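One pass of that loop can be supported by a small checker that lists the rows the current assumptions don't fit, so you know which lines to look at next. This is a hypothetical Python 3 sketch; the pattern for the results field (index 9 of a Horse row) is an assumption based on the sample data, and the second sample row, with an unquoted comma in the trainer name, is invented to show a failure.

```python
import csv
import io
import re

# Assumed shape of the results field, e.g. "16-3-1-3 $390450.00".
RESULTS = re.compile(r"\d+-\d+-\d+-\d+ \$\d+(?:\.\d+)?$")


def suspicious_rows(in_file):
    """Yield (line_number, row) for Horse rows that don't match the assumptions."""
    reader = csv.reader(in_file)
    for row in reader:
        if row and row[0] == "Horse":
            if len(row) < 10 or not RESULTS.match(row[9]):
                yield reader.line_num, row  # line_num: source lines read so far


sample = io.StringIO(
    "Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 $390450.00\n"
    "Horse,2,Example,0,Michael, Wayne John Hawkes,Rosehill,,0,0,20-6-1-5 $201250.00\n"
)
for line_no, row in suspicious_rows(sample):
    print(line_no, row[:3])  # 2 ['Horse', '2', 'Example']
```

Each round, you fix whatever the first few reported lines reveal and rerun until the list is empty or no longer worth chasing.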