subject:"fixing an horrific formatted csv file."

Re: fixing an horrific formatted csv file.

2014-07-04 Thread Gregory Ewing


flebber wrote:

so in my file I had on line 44 this trainer name.

Michael, Wayne  John Hawkes

and in line 95 this horse name. Inz'n'out

this throws of my capturing correct item 9. How do I protect against this?


Use python's csv module to read the file. Don't try to
do it yourself; the rules for handling embedded commas
and quotes in csv are quite complicated. As long as
the file is a well-formed csv file, the csv module
should parse fields like that correctly.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-04 Thread flebber

On Friday, 4 July 2014 14:12:15 UTC+10, flebber  wrote:
 I have taken the code and gone a little further, but I need to be able to 
 protect myself against commas and single quotes in names.
 
 
 
 How is it the best to do this?
 
 
 
 so in my file I had on line 44 this trainer name.
 
 
 
 Michael, Wayne  John Hawkes 
 
 
 
 and in line 95 this horse name.
 
 Inz'n'out
 
 
 
 this throws of my capturing correct item 9. How do I protect against this?
 
 
 
 Here is current code.
 
 
 
 import re
 
 from sys import argv
 
 SCRIPT, FILENAME = argv
 
 
 
 
 
 def out_file_name(file_name):
 
 take an input file and keep the name with appended _clean
 
 file_parts = file_name.split(.,)
 
 output_file = file_parts[0] + '_clean.' + file_parts[1]
 
 return output_file
 
 
 
 
 
 def race_table(text_file):
 
 utility to reorganise poorly made csv entry
 
 input_table = [[item.strip(' ') for item in record.split(',')]
 
for record in text_file.splitlines()]
 
 # At this point look at input_table to find the record indices
 
 output_table = []
 
 for record in input_table:
 
 if record[0] == 'Meeting':
 
 meeting = record[3]
 
 elif record[0] == 'Race':
 
 date = record[13]
 
 race = record[1]
 
 elif record[0] == 'Horse':
 
 number = record[1]
 
 name = record[2]
 
 results = record[9]
 
 res_split = re.split('[- ]', results)
 
 starts = res_split[0]
 
 wins = res_split[1]
 
 seconds = res_split[2]
 
 thirds = res_split[3]
 
 prizemoney = res_split[4]
 
 trainer = record[4]
 
 location = record[5]
 
 print(name, wins, seconds)
 
 output_table.append((meeting, date, race, number, name,
 
  starts, wins, seconds, thirds, prizemoney,
 
  trainer, location))
 
 return output_table
 
 
 
 MY_FILE = out_file_name(FILENAME)
 
 
 
 # with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
 
 # for line in race_table(f_in.readline()):
 
 # new_row = line
 
 with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
 
 CONTENT = f_in.read()
 
 # print(content)
 
 FILE_CONTENTS = race_table(CONTENT)
 
 # print new_name
 
 f_out.write(str(FILE_CONTENTS))
 
 
 
 
 
 if __name__ == '__main__':
 
 pass

So I found this on stack overflow

In [2]: import string

In [3]: identity = string.maketrans(, )

In [4]: x = ['+5556', '-1539', '-99', '+1500']

In [5]: x = [s.translate(identity, +-) for s in x]

In [6]: x
Out[6]: ['5556', '1539', '99', '1500']

but it fails in my file, due to I believe mine being a list of list. Is there 
an easy way to iterate the sublists without flattening?

Current code.

input_table = [[item.strip(' ') for item in record.split(',')]
   for record in text_file.splitlines()]
# At this point look at input_table to find the record indices
identity = string.maketrans(, )
print(input_table)
input_table = [s.translate(identity, ,') for s
   in input_table]

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-04 Thread flebber

On Friday, 4 July 2014 16:19:09 UTC+10, Gregory Ewing  wrote:
 flebber wrote:
 
  so in my file I had on line 44 this trainer name.
 
  
 
  Michael, Wayne  John Hawkes
 
  
 
  and in line 95 this horse name. Inz'n'out
 
  
 
  this throws of my capturing correct item 9. How do I protect against this?
 
 
 
 Use python's csv module to read the file. Don't try to
 
 do it yourself; the rules for handling embedded commas
 
 and quotes in csv are quite complicated. As long as
 
 the file is a well-formed csv file, the csv module
 
 should parse fields like that correctly.
 
 
 
 -- 
 
 Greg

True Greg worked easier

def race_table(text_file):
utility to reorganise poorly made csv entry
# input_table = [[item.strip(' ') for item in record.split(',')]
#for record in text_file.splitlines()]
# At this point look at input_table to find the record indices
# identity = string.maketrans(, )
# print(input_table)
# input_table = [s.translate(identity, ,') for s
#in input_table]
output_table = []
for record in text_file:
if record[0] == 'Meeting':
meeting = record[3]
elif record[0] == 'Race':
date = record[13]
race = record[1]
elif record[0] == 'Horse':
number = record[1]
name = record[2]
results = record[9]
res_split = re.split('[- ]', results)
starts = res_split[0]
wins = res_split[1]
seconds = res_split[2]
thirds = res_split[3]
try:
prizemoney = res_split[4]
finally:
prizemoney = 0
trainer = record[4]
location = record[5]
print(name, wins, seconds)
output_table.append((meeting, date, race, number, name,
 starts, wins, seconds, thirds, prizemoney,
 trainer, location))
return output_table

MY_FILE = out_file_name(FILENAME)

# with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
# for line in race_table(f_in.readline()):
# new_row = line
with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
CONTENT = csv.reader(f_in)
# print(content)
FILE_CONTENTS = race_table(CONTENT)
# print new_name
f_out.write(str(FILE_CONTENTS))


if __name__ == '__main__':
pass

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-04 Thread F.R.


On 07/04/2014 12:28 PM, flebber wrote:

On Friday, 4 July 2014 14:12:15 UTC+10, flebber  wrote:

I have taken the code and gone a little further, but I need to be able to 
protect myself against commas and single quotes in names.



How is it the best to do this?



so in my file I had on line 44 this trainer name.



Michael, Wayne  John Hawkes



and in line 95 this horse name.

Inz'n'out



this throws of my capturing correct item 9. How do I protect against this?



Here is current code.



import re

from sys import argv

SCRIPT, FILENAME = argv





def out_file_name(file_name):

 take an input file and keep the name with appended _clean

 file_parts = file_name.split(.,)

 output_file = file_parts[0] + '_clean.' + file_parts[1]

 return output_file





def race_table(text_file):

 utility to reorganise poorly made csv entry

 input_table = [[item.strip(' ') for item in record.split(',')]

for record in text_file.splitlines()]

 # At this point look at input_table to find the record indices

 output_table = []

 for record in input_table:

 if record[0] == 'Meeting':

 meeting = record[3]

 elif record[0] == 'Race':

 date = record[13]

 race = record[1]

 elif record[0] == 'Horse':

 number = record[1]

 name = record[2]

 results = record[9]

 res_split = re.split('[- ]', results)

 starts = res_split[0]

 wins = res_split[1]

 seconds = res_split[2]

 thirds = res_split[3]

 prizemoney = res_split[4]

 trainer = record[4]

 location = record[5]

 print(name, wins, seconds)

 output_table.append((meeting, date, race, number, name,

  starts, wins, seconds, thirds, prizemoney,

  trainer, location))

 return output_table



MY_FILE = out_file_name(FILENAME)



# with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:

# for line in race_table(f_in.readline()):

# new_row = line

with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:

 CONTENT = f_in.read()

 # print(content)

 FILE_CONTENTS = race_table(CONTENT)

 # print new_name

 f_out.write(str(FILE_CONTENTS))





if __name__ == '__main__':

 pass

So I found this on stack overflow

In [2]: import string

In [3]: identity = string.maketrans(, )

In [4]: x = ['+5556', '-1539', '-99', '+1500']

In [5]: x = [s.translate(identity, +-) for s in x]

In [6]: x
Out[6]: ['5556', '1539', '99', '1500']

but it fails in my file, due to I believe mine being a list of list. Is there 
an easy way to iterate the sublists without flattening?

Current code.

 input_table = [[item.strip(' ') for item in record.split(',')]
for record in text_file.splitlines()]
 # At this point look at input_table to find the record indices
 identity = string.maketrans(, )
 print(input_table)
 input_table = [s.translate(identity, ,') for s
in input_table]

Sayth


Take Gregory's advice and use the csv module. Don't reinvent a csv 
parser. My csv splitter was the simplest approach possible, which I 
tend to use with undocumented formats, tweaking for unexpected features 
as they come along.


Frederic


--
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-03 Thread flebber

I have taken the code and gone a little further, but I need to be able to 
protect myself against commas and single quotes in names.

How is it the best to do this?

so in my file I had on line 44 this trainer name.

Michael, Wayne  John Hawkes 

and in line 95 this horse name.
Inz'n'out

this throws of my capturing correct item 9. How do I protect against this?

Here is current code.

import re
from sys import argv
SCRIPT, FILENAME = argv


def out_file_name(file_name):
take an input file and keep the name with appended _clean
file_parts = file_name.split(.,)
output_file = file_parts[0] + '_clean.' + file_parts[1]
return output_file


def race_table(text_file):
utility to reorganise poorly made csv entry
input_table = [[item.strip(' ') for item in record.split(',')]
   for record in text_file.splitlines()]
# At this point look at input_table to find the record indices
output_table = []
for record in input_table:
if record[0] == 'Meeting':
meeting = record[3]
elif record[0] == 'Race':
date = record[13]
race = record[1]
elif record[0] == 'Horse':
number = record[1]
name = record[2]
results = record[9]
res_split = re.split('[- ]', results)
starts = res_split[0]
wins = res_split[1]
seconds = res_split[2]
thirds = res_split[3]
prizemoney = res_split[4]
trainer = record[4]
location = record[5]
print(name, wins, seconds)
output_table.append((meeting, date, race, number, name,
 starts, wins, seconds, thirds, prizemoney,
 trainer, location))
return output_table

MY_FILE = out_file_name(FILENAME)

# with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
# for line in race_table(f_in.readline()):
# new_row = line
with open(FILENAME, 'r') as f_in, open(MY_FILE, 'w') as f_out:
CONTENT = f_in.read()
# print(content)
FILE_CONTENTS = race_table(CONTENT)
# print new_name
f_out.write(str(FILE_CONTENTS))


if __name__ == '__main__':
pass
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-02 Thread flebber

  TM = TX.Table_Maker (headings = 
 ('Meeting','Date','Race','Number','Name','Trainer','Location')) 
  TM (race_table (your_csv_text)).write () 

Where do I find TX? Found this mention in the list, was it available in pip by 
any name?
https://mail.python.org/pipermail/python-list/2014-February/667464.html

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-02 Thread F.R.


On 07/02/2014 11:13 AM, flebber wrote:

TM = TX.Table_Maker (headings =

('Meeting','Date','Race','Number','Name','Trainer','Location'))

TM (race_table (your_csv_text)).write ()

Where do I find TX? Found this mention in the list, was it available in pip by 
any name?
https://mail.python.org/pipermail/python-list/2014-February/667464.html

Sayth


I'd have to make it available. I proposed it some time ago and received 
a couple of suggestions in return. It is a modular transformation 
framework written entirely in python (2.7). It consists essentially of a 
base class Transformer that handles input and output in such a way 
that Transformer objects can be chained. It saved me from drowning an a 
horrible and growing tangle of hacks. Finding something usable I had 
previously done took time. Understanding how it worked took more time 
and adapting it took still more time, so that writing yet another hack 
from scratch was faster.
A number of hacks I could quickly wrap into a Transformer object 
and so could start building a library of standard Transformers. The 
Table_Maker is one of them. The table making code is quite bad. It 
suffers from feature overload. I would clean it up for distribution.
I'd be happy to distribute the base class and a few standard 
Translators, such as I use every day. (File Reader, File Writer, DB Run 
Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines To 
Text, Sort, Sort And Unique, etc.) Writing one's own Transformers is a 
breeze. Testing too, because a Transformer keeps its input and output 
and, in line with the system's design philosophy, does only its own 
single thing.
A Chain is a list of Transformers that run in sequence. It is 
itself derived from Transformer and is a functional equivalent. So 
Chains nest. Fixing a Chain that nothing comes out of is a 
straightforward matter too. It will still have run up to the failing 
element. Chain.show () reveals the culprit as the first one to have no 
output.
I am not up to date on distributing and would depend on qualified 
help on that.


Frederic





A brief overview


The TX solution to your race table would be (TX is the name of the module):

class Race_Table (TX.Transformer):
'''
In: CSV text
Out: Tabular data (2-dimensional list)
'''
name = 'Race_Table'
@TX.setup   # Checks timestamps to prevent needless reruns in 
the absence of new input

def transform (self):
for line in self.Input.data:
# See my post
self.Output.take (output_table)

Example file to file:
 Race_Schedule_F2F = TX.Chain (TX.File_Reader (), Race_Table (), 
TX.List_To_CSV (delimiter = ';'), TX.File_Writer (terminal = out_file_name)

 Race_Schedule_F2F (input_file_name)   # Does it all!

Example web to database:
 Race_Schedule_WWW2DB = TX.Chain (TX.WWW_Reader (), 
Race_Schedule_HTML_Reader (), Race_Table (), TX.DB_Writer (table_name = 
'horses'))
 Race_Schedule_WWW2DB (url)   # Does is all! You'd have to write 
the Race_Schedule_HTML_Reader


Verify your table:
 Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
 Race_Schedule_WWW2DB.show_tree () # See which one should display
Chain
Chain[0] - WWW Reader
Chain[1] - Race_Schedule_HTML_Reader
Chain[2] - Race_Table
Chain[3] - DB Writer
 print Table_Viewer (Race_Schedule_WWW2DB[2]()) # All 
Transformers keep their data

(Display of table)

Verify database:
 print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
(Display of database table)

--
https://mail.python.org/mailman/listinfo/python-list

fixing an horrific formatted csv file.

2014-07-01 Thread flebber

What I am trying to do is to reformat a csv file into something more usable.
currently the file has no headers, multiple lines with varying columns that are 
not related.

This is a sample

Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit,  
,
Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT   
,54,0,0,5/07/2014,,  ,  ,  ,  ,No class 
restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, 
(Listed),Of $10. First $6, second $2, third $1, fourth $5000, 
fifth $2000, sixth $1000, seventh $1000, eighth $1000
Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 
$390450.00,,0,0,0,,98.00,M,
Horse,2,Breakfast in Bed,0,David Vandyke,Warwick Farm,,0,0,20-6-1-5 
$201250.00,,0,0,0,,81.00,M,
Horse,3,Capital Commander,0,Gerald Ryan,Rosehill,,0,0,43-9-9-3 
$438625.00,,0,0,0,,85.00,M,
Horse,4,Coup Ay Tee (NZ),0,Chris Waller,Rosehill,,0,0,35-9-6-5 
$519811.00,,0,0,0,,101.00,G,
Horse,5,Generalife,0,John O'Shea,Warwick Farm,,0,0,19-6-1-3 
$235045.00,,0,0,0,,87.00,G,
Horse,6,He's Your Man (FR),0,Chris Waller,Rosehill,,0,0,13-2-3-1 
$108110.00,,0,0,0,,93.00,G,
Horse,7,Hidden Kisses,0,Chris Waller,Rosehill,,0,0,40-8-8-5 
$565750.00,,0,0,0,,96.00,M,
Horse,8,Oakfield Commands,0,Gerald Ryan,Rosehill,,0,0,22-7-4-6 
$269530.00,,0,0,0,,94.00,G,
Horse,9,Taxmeifyoucan,0,Gregory Hickman,Warwick Farm,,0,0,18-2-4-4 
$539730.00,,0,0,0,,91.00,G,
Horse,10,The Peak,0,Bart  James Cummings,Randwick,,0,0,15-6-1-0 
$426732.00,,0,0,0,,95.00,G,
Horse,11,Tougher Than Ever (NZ),0,Chris Waller,Rosehill,,0,0,17-3-2-3 
$321613.00,,0,0,0,,97.00,H,
Horse,12,TROMSO,0,Chris Waller,Rosehill,,0,0,47-8-11-2 
$622300.00,,0,0,0,,103.00,G,
Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95  ,3U,~  
   ,HCP   ,54,0,0,5/07/2014,,  ,  ,  ,  
,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of 
$85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, 
sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
Horse,1,Big Bonanza,0,Don Robb,Wyong,,0,57.5,31-9-4-3 
$366860.00,,0,0,0,,92.00,G,
Horse,2,Casual Choice,0,Joseph Pride,Warwick Farm,,0,54,8-2-3-0 
$105930.00,,0,0,0,

So what I am trying to so is end up with an output like this.

Meeting, Date, Race, Number, Name, Trainer, Location
Rosehill, 05/07/14, 1, 1,Bennetta,Grahame Begg,Randwick,
Rosehill, 05/07/14, 1, 2,Breakfast in Bed,David Vandyke,Warwick Farm,

So as a start i thought i would try inserting the Meeting and Race number 
however I am just not getting it right.

import csv

outfile = open(/home/sayth/Scripts/cleancsv.csv, w)
with open('/home/sayth/Scripts/test.csv') as f:
f_csv = csv.reader(f)
headers = next(f_csv)
for row in f_csv:
meeting = row[3] in row[0] == 'Meeting'
new = row.insert(0, meeting)
while row[1] in row[0] == 'Race'  9:  # pref less than next found 
row[0]

# grab row[1] as id number
id = row[1]
# from row[0] and insert it in first position
new_lines = new.insert(1, id)
outfile.write(new_lines)
outfile.close()

How should I go about this?

Thanks

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-01 Thread MRAB


On 2014-07-01 15:04, flebber wrote:

What I am trying to do is to reformat a csv file into something more usable.
currently the file has no headers, multiple lines with varying columns that are 
not related.

This is a sample

Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit,  
,
Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT   
,54,0,0,5/07/2014,,  ,  ,  ,  ,No class 
restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, 
(Listed),Of $10. First $6, second $2, third $1, fourth $5000, 
fifth $2000, sixth $1000, seventh $1000, eighth $1000
Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 
$390450.00,,0,0,0,,98.00,M,
Horse,2,Breakfast in Bed,0,David Vandyke,Warwick Farm,,0,0,20-6-1-5 
$201250.00,,0,0,0,,81.00,M,
Horse,3,Capital Commander,0,Gerald Ryan,Rosehill,,0,0,43-9-9-3 
$438625.00,,0,0,0,,85.00,M,
Horse,4,Coup Ay Tee (NZ),0,Chris Waller,Rosehill,,0,0,35-9-6-5 
$519811.00,,0,0,0,,101.00,G,
Horse,5,Generalife,0,John O'Shea,Warwick Farm,,0,0,19-6-1-3 
$235045.00,,0,0,0,,87.00,G,
Horse,6,He's Your Man (FR),0,Chris Waller,Rosehill,,0,0,13-2-3-1 
$108110.00,,0,0,0,,93.00,G,
Horse,7,Hidden Kisses,0,Chris Waller,Rosehill,,0,0,40-8-8-5 
$565750.00,,0,0,0,,96.00,M,
Horse,8,Oakfield Commands,0,Gerald Ryan,Rosehill,,0,0,22-7-4-6 
$269530.00,,0,0,0,,94.00,G,
Horse,9,Taxmeifyoucan,0,Gregory Hickman,Warwick Farm,,0,0,18-2-4-4 
$539730.00,,0,0,0,,91.00,G,
Horse,10,The Peak,0,Bart  James Cummings,Randwick,,0,0,15-6-1-0 
$426732.00,,0,0,0,,95.00,G,
Horse,11,Tougher Than Ever (NZ),0,Chris Waller,Rosehill,,0,0,17-3-2-3 
$321613.00,,0,0,0,,97.00,H,
Horse,12,TROMSO,0,Chris Waller,Rosehill,,0,0,47-8-11-2 
$622300.00,,0,0,0,,103.00,G,
Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95  ,3U,~  
   ,HCP   ,54,0,0,5/07/2014,,  ,  ,  ,  
,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of 
$85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, 
sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
Horse,1,Big Bonanza,0,Don Robb,Wyong,,0,57.5,31-9-4-3 
$366860.00,,0,0,0,,92.00,G,
Horse,2,Casual Choice,0,Joseph Pride,Warwick Farm,,0,54,8-2-3-0 
$105930.00,,0,0,0,

So what I am trying to so is end up with an output like this.

Meeting, Date, Race, Number, Name, Trainer, Location
Rosehill, 05/07/14, 1, 1,Bennetta,Grahame Begg,Randwick,
Rosehill, 05/07/14, 1, 2,Breakfast in Bed,David Vandyke,Warwick Farm,

So as a start i thought i would try inserting the Meeting and Race number 
however I am just not getting it right.

import csv

outfile = open(/home/sayth/Scripts/cleancsv.csv, w)
with open('/home/sayth/Scripts/test.csv') as f:
 f_csv = csv.reader(f)
 headers = next(f_csv)
 for row in f_csv:
 meeting = row[3] in row[0] == 'Meeting'
 new = row.insert(0, meeting)
 while row[1] in row[0] == 'Race'  9:  # pref less than next found 
row[0]

 # grab row[1] as id number
 id = row[1]
 # from row[0] and insert it in first position
 new_lines = new.insert(1, id)
 outfile.write(new_lines)
 outfile.close()

How should I go about this?


There's no point in reading the first row as the headers because it
clearly doesn't contain just the headings.

First write a row for the header.

Then, for each row:

If the first field is 'Meeting', then remember the meeting, etc.

If the first field is 'Race', then remember the race, etc.

If the first field is 'Horse', then write the row with the additional
fields for race, etc.

And so on.

BTW, the indentation for the 'outfile.close()' line is wrong. It would,
of course, be better to use the 'with' statement for that file too.

--
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-01 Thread F.R.


On 07/01/2014 04:04 PM, flebber wrote:

What I am trying to do is to reformat a csv file into something more usable.
currently the file has no headers, multiple lines with varying columns that are 
not related.

This is a sample

Meeting,05/07/14,RHIL,Rosehill Gardens,Weights,TAB,+3m Entire Circuit,  
,
Race,1,CIVIC STAKES,CIVIC,CIVIC,1350,~ ,3U,~ ,QLT   
,54,0,0,5/07/2014,,  ,  ,  ,  ,No class 
restriction, Quality, For Three-Years-Old and Upwards, No sex restriction, 
(Listed),Of $10. First $6, second $2, third $1, fourth $5000, 
fifth $2000, sixth $1000, seventh $1000, eighth $1000
Horse,1,Bennetta,0,Grahame Begg,Randwick,,0,0,16-3-1-3 
$390450.00,,0,0,0,,98.00,M,
Horse,2,Breakfast in Bed,0,David Vandyke,Warwick Farm,,0,0,20-6-1-5 
$201250.00,,0,0,0,,81.00,M,
Horse,3,Capital Commander,0,Gerald Ryan,Rosehill,,0,0,43-9-9-3 
$438625.00,,0,0,0,,85.00,M,
Horse,4,Coup Ay Tee (NZ),0,Chris Waller,Rosehill,,0,0,35-9-6-5 
$519811.00,,0,0,0,,101.00,G,
Horse,5,Generalife,0,John O'Shea,Warwick Farm,,0,0,19-6-1-3 
$235045.00,,0,0,0,,87.00,G,
Horse,6,He's Your Man (FR),0,Chris Waller,Rosehill,,0,0,13-2-3-1 
$108110.00,,0,0,0,,93.00,G,
Horse,7,Hidden Kisses,0,Chris Waller,Rosehill,,0,0,40-8-8-5 
$565750.00,,0,0,0,,96.00,M,
Horse,8,Oakfield Commands,0,Gerald Ryan,Rosehill,,0,0,22-7-4-6 
$269530.00,,0,0,0,,94.00,G,
Horse,9,Taxmeifyoucan,0,Gregory Hickman,Warwick Farm,,0,0,18-2-4-4 
$539730.00,,0,0,0,,91.00,G,
Horse,10,The Peak,0,Bart  James Cummings,Randwick,,0,0,15-6-1-0 
$426732.00,,0,0,0,,95.00,G,
Horse,11,Tougher Than Ever (NZ),0,Chris Waller,Rosehill,,0,0,17-3-2-3 
$321613.00,,0,0,0,,97.00,H,
Horse,12,TROMSO,0,Chris Waller,Rosehill,,0,0,47-8-11-2 
$622300.00,,0,0,0,,103.00,G,
Race,2,FLYING WELTER - BENCHMARK 95 HCP,BM95,BM95,1100,BM95  ,3U,~  
   ,HCP   ,54,0,0,5/07/2014,,  ,  ,  ,  
,BenchMark 95, Handicap, For Three-Years-Old and Upwards, No sex restriction,Of 
$85000. First $48750, second $16750, third $8350, fourth $4150, fifth $2000, 
sixth $1000, seventh $1000, eighth $1000, ninth $1000, tenth $1000
Horse,1,Big Bonanza,0,Don Robb,Wyong,,0,57.5,31-9-4-3 
$366860.00,,0,0,0,,92.00,G,
Horse,2,Casual Choice,0,Joseph Pride,Warwick Farm,,0,54,8-2-3-0 
$105930.00,,0,0,0,

So what I am trying to so is end up with an output like this.

Meeting, Date, Race, Number, Name, Trainer, Location
Rosehill, 05/07/14, 1, 1,Bennetta,Grahame Begg,Randwick,
Rosehill, 05/07/14, 1, 2,Breakfast in Bed,David Vandyke,Warwick Farm,

So as a start i thought i would try inserting the Meeting and Race number 
however I am just not getting it right.

import csv

outfile = open(/home/sayth/Scripts/cleancsv.csv, w)
with open('/home/sayth/Scripts/test.csv') as f:
 f_csv = csv.reader(f)
 headers = next(f_csv)
 for row in f_csv:
 meeting = row[3] in row[0] == 'Meeting'
 new = row.insert(0, meeting)
 while row[1] in row[0] == 'Race'  9:  # pref less than next found 
row[0]

 # grab row[1] as id number
 id = row[1]
 # from row[0] and insert it in first position
 new_lines = new.insert(1, id)
 outfile.write(new_lines)
 outfile.close()

How should I go about this?

Thanks

Sayth


Reformatting is what I do most and over time I have acquired some 
practice. Complete solutions are not
often proposed, possibly sneered on for their officiousness. In that 
case I apologize. I couldn't resist. It is such a nice example. Having 
solved it, I figure why not share it . . .


Frederic



def race_table (csv_text):
input_table = [[item.strip(' ') for item in record.split (',')] 
for record in csv_text.splitlines ()]

# At this point look at input_table to find the record indices
output_table = []
for record in input_table:
if record [0] == 'Meeting':
meeting = record [3]
elif record [0] == 'Race':
date = record [13]
race = record [1]
elif record [0] == 'Horse':
number = record [1]
name = record [2]
trainer = record [4]
location = record [5]
output_table.append ((meeting, date, race, number, name, 
trainer, location))

return output_table


 for record in race_table (your_csv_text): print record

('Rosehill Gardens', '5/07/2014', '1', '1', 'Bennetta', 'Grahame Begg', 
'Randwick')
('Rosehill Gardens', '5/07/2014', '1', '2', 'Breakfast in Bed', 'David 
Vandyke', 'Warwick Farm')
('Rosehill Gardens', '5/07/2014', '1', '3', 'Capital Commander', 'Gerald 
Ryan', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '4', 'Coup Ay Tee (NZ)', 'Chris 
Waller', 'Rosehill')
('Rosehill Gardens', '5/07/2014', '1', '5', 'Generalife', John O'Shea, 
'Warwick Farm')
('Rosehill Gardens', '5/07/2014', '1',

Re: fixing an horrific formatted csv file.

2014-07-01 Thread flebber

That's a really cool solution.

I understand why providing full solutions is frowned upon, because it doesn't 
assist in learning. Which is true,  it's incredibly helpful in this case.

The python cookbook is really good and what I was using as a start for dealing 
with csv. But it doesn't even go anywhere near this. Lots of examples with 
simple inputs.

Anyway Thanks again

 Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

2014-07-01 Thread Chris Angelico

On Wed, Jul 2, 2014 at 7:41 AM, flebber flebber.c...@gmail.com wrote:
 I understand why providing full solutions is frowned upon, because it doesn't 
 assist in learning. Which is true,  it's incredibly helpful in this case.

In this case, my main reason for not providing a full solution is that
the work tends to be iterative. When I have a huge and messy file,
what I usually do is grab the first half-dozen lines and work out how
I'd go about fixing them manually, then write a script that does that.
Then run the script on the whole file, and see where it either chokes
or produces wrong data. Pick up the first few lines of wrong data,
figure out how to tweak the program to handle those. Rinse and repeat.

Often, what that results in is a file that gets progressively tidier.
When the scope of the mess is infinite (like with human-entered data -
believe you me, you haven't seen messy until you've seen what a
committee can do to a simple job), this means you stop working on the
script at exactly the point where it stops being worth the effort -
which is something that only you can decide.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

Re: fixing an horrific formatted csv file.

12 matches

Site Navigation

Mail list logo

Footer information