That looks promising. The field numbers are pre-defined at the mainframe level. This may help me get to my ultimate goal which is to pump these into a DB on a row-by-row basis ( :-P ) I'll have to do some playing around with this. I knew that it looked like a dictionary, but wasn't sure how best to handle this.
One follow-up question: I'll end up getting multiple records for each "type". Would I be referencing these by row[#][field#]? Minor revision to the format is that starts like: ###,1,1,val_1,.... I think right now the plan is to parse through the file and insert the pairs directly into a DB table. Something like RowID, LetterType, date, Field#, Value. I can get RowID and LetterType overall, date is a constant, the rest would involve reading each pair and inserting both values into the table. Time to hit the books a little more to get up to speed on all of this. I really appreciate the help. I knew there had to be a better way to do this, just wasn't sure what it was. -Pete John Machin <[EMAIL PROTECTED]> wrote: > On Thu, 28 Apr 2005 23:34:31 GMT, Peter A. Schott > <[EMAIL PROTECTED]> wrote: > > >I've got a file that seems to come across more like a dictionary from what I > >can > >tell. Something like the following format: > > > >###,1,val_1,2,val_2,3,val_3,5,val_5,10,val_10 > >###,1,val_1,2,val_2,3,val_3,5,val_5,11,val_11,25,val_25,967,val_967 > > > >In other words, different layouts (defined mostly by what is in val_1, val_2, > >val_3). > > > >The ,#, fields indicate what "field" from our mainframe the corresponding > >value > >represents. > > > >Is there a good way to parse this into a DB-type format where I only pull out > >the values corresponding to the appropriate field numbers? Seems like > >converting to a dictionary of some sort would be best, but I don't quite know > >how I would go about that. > > > >In this case, the first field is a value unto itself - represents a "letter > >type" that would be associated with the rest of the record. The fields are > >either present or not, no placeholder if there's no value for e.g. Field #4. > > Here's a sketch, tested as you'll see, but totally devoid of the > error-checking that would be necessary for any data coming from an MF. > > C:\junk>type schott.txt > pers,1,xxx,2,yyy,3,zzz,100,SMITH,101,JOHN,102,ALOYSIUS,103,1969-12-31 > addr,1,qqq,2,www,3,eee,200,"""THE LODGE"", 123 MAIN ST",205,WALLA > WALLA,206,WA > > C:\junk>type schott.py > import csv > for row in csv.reader(open('schott.txt', 'rb')): > rectype = row[0] > recdict = {} > for k in range(1, len(row), 2): > recdict[int(row[k])] = row[k+1] > print rectype, recdict > > C:\junk>python schott.py > pers {1: 'xxx', 2: 'yyy', 3: 'zzz', 100: 'SMITH', 101: 'JOHN', 102: > 'ALOYSIUS', 103: '1969-12-31'} > addr {1: 'qqq', 2: 'www', 3: 'eee', 200: '"THE LODGE", 123 MAIN ST', > 205: 'WALLA WALLA', 206: 'WA'} > > Hint: you'll probably go nuts if you don't implement some sort of > naming convention instead of those numbers. > > One way would be like this: > > mf_surname = 100 > mf_given_1 = 101 > ... > mf_state = 206 > > then you can refer to recdict[mf_state] instead of recdict[206]. > > Going upmarket a bit: > > Have a mf_map = {100: 'surname', 206: 'state', } # etc etc > > then you do > > class Record(object): > pass > > # for each row: > rec = Record() > rec.rectype = row[0] > for k in range(1, len(row), 2): > setattr(rec, mf_map[int(row[k])], row[k+1]) > > Then you can refer to rec.state instead of recdict[mf_state] or > recdict[206]. > > Further upmarket would involve gathering basic "type" information > about the MF fields (free text, alpha code, identifier (e.g. SSN), > money, quantity, date, etc etc) so that you can do validations and > format conversions as appropriate. > > HTH, > John > > -- http://mail.python.org/mailman/listinfo/python-list