On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:
> On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
> wrote:
> > On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
> > <python.l...@tim.thechases.com> wrote:
> >
> > > Unfortunately, a raw rstrip() eats other whitespace that may be
> > > important.  I frequently get tab-delimited files, using the following
> > > pseudo-code:
> > >
> > >    def clean_line(line):
> > >        return line.rstrip('\r\n').split('\t')
> > >
> > >    f = file('customer_x.txt')
> > >    headers = clean_line(f.next())
> > >    for line in f:
> > >        field1, field2, field3 = clean_line(line)
> > >        do_stuff()
> > >
> > > If field3 is empty in the source file, using rstrip(None) as you suggest
> > > triggers errors on the tuple assignment because it eats the tab that
> > > defined it.
> > >
> > > I suppose if I were really smart, I'd dig a little deeper in the CSV
> > > module to sniff out the "right" way to parse tab-delimited files.
> >
> > It's so easy that not doing that is just inexcusable laziness :)
> > Your own example, written using the csv module:
> >
> > import csv
> >
> > f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
> > headers = f.next()
> > for line in f:
> >      field1, field2, field3 = line
> >      do_stuff()
>
> And where in all of that do you recommend that .decode(some_encoding)
> be inserted?
>
If encoding is an issue for your application, then I'd recommend you use
codecs.open('customer_x.txt', 'rb', encoding='ebcdic') instead of open().

-- 
http://mail.python.org/mailman/listinfo/python-list
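
For anyone reading this later on Python 3, here is a rough sketch of the same idea, not the only way to do it. There the built-in open() takes an encoding argument directly and the csv module reads text, so the decode step happens at open time. The function name read_tab_delimited is made up for illustration, and 'cp500' is only my assumption for an EBCDIC code page: as far as I know the standard library has no codec literally named 'ebcdic', so substitute whatever encoding the file really uses.

```python
import csv


def read_tab_delimited(path, encoding):
    """Return (headers, rows) from a tab-delimited text file.

    Hypothetical helper for illustration; `encoding` is any codec name
    Python knows, e.g. 'utf-8', 'latin-1', or 'cp500' (an EBCDIC page).
    """
    # newline='' leaves line endings alone so the csv module can handle them.
    with open(path, newline='', encoding=encoding) as f:
        reader = csv.reader(f, delimiter='\t')
        headers = next(reader, [])   # empty list if the file has no lines
        rows = list(reader)          # trailing empty fields are preserved
    return headers, rows
```

A row whose last field is empty still comes back with three items, so the `field1, field2, field3 = line` unpacking from the quoted example keeps working.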