Thanks again for the comments. I'm not sure I will implement all of it, but I will separate out the "if not row" check. The files have some extraneous blank rows in the middle, and I need to be sure not to import them as blank rows. I am actually having trouble with this filling my system memory; I posted a separate question with the subject "Why is this filling my sys memory", or something like that. It might be that my 1-year-old son has been trying to help for the last hour. It is very distracting.
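A minimal sketch of pulling the blank-row check out on its own, so that the section logic never sees blank rows. The helper name `non_blank_rows` and the sample text are illustrative only, and treating a row made up of empty cells as blank is an assumption about the files, not something established in this thread.

```python
import csv
import io

def non_blank_rows(reader):
    """Yield only rows that have at least one non-empty cell.

    Assumption: a row such as ['', ''] (only tabs on the line) counts
    as blank, in addition to the empty list csv.reader gives for a
    completely empty line.
    """
    for row in reader:
        if any(cell.strip() for cell in row):
            yield row

# Small in-memory stand-in for one of the tab-delimited files.
sample = io.StringIO("a\tb\n\n\t\nc\td\n")
rows = list(non_blank_rows(csv.reader(sample, delimiter="\t")))
```

Because this is a generator, it passes rows through one at a time and does not build an extra list of its own.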
*Vincent Davis 720-301-3003*
vinc...@vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>

On Sat, Feb 20, 2010 at 6:18 PM, Jonathan Gardner <jgard...@jonathangardner.net> wrote:

> On Sat, Feb 20, 2010 at 4:21 PM, Vincent Davis <vinc...@vincentdavis.net> wrote:
>
>> Thanks for the help, this is considerably faster and easier to read (see
>> below). I changed it to avoid the "break" and I think it makes it easy to
>> understand. I am checking the conditions each time slows it but it is worth
>> it to me at this time.
>>
>
> It seems you are beginning to understand that programmer time is more
> valuable than machine time. Congratulations.
>
>> def read_data_file(filename):
>>     reader = csv.reader(open(filename, "U"),delimiter='\t')
>>
>>     data = []
>>     mask = []
>>     outliers = []
>>     modified = []
>>
>>     data_append = data.append
>>     mask_append = mask.append
>>     outliers_append = outliers.append
>>     modified_append = modified.append
>>
>
> I know some people do this to speed things up. Really, I don't think it's
> necessary or wise to do so.
>
>>     maskcount = 0
>>     outliercount = 0
>>     modifiedcount = 0
>>
>>     for row in reader:
>>         if '[MASKS]' in row:
>>             maskcount += 1
>>         if '[OUTLIERS]' in row:
>>             outliercount += 1
>>         if '[MODIFIED]' in row:
>>             modifiedcount += 1
>>         if not any((maskcount, outliercount, modifiedcount, not row)):
>>             data_append(row)
>>         elif not any((outliercount, modifiedcount, not row)):
>>             mask_append(row)
>>         elif not any((modifiedcount, not row)):
>>             outliers_append(row)
>>         else:
>>             if row: modified_append(row)
>>
>
> Just playing with the logic here:
>
> 1. Notice that if "not row" is True, nothing happens? Pull it out
> explicitly.
>
> 2. Notice how it switches from mode to mode? Program it more explicitly.
>
> Here's my suggestion:
>
> def parse_masks(reader):
>     for row in reader:
>         if not row: continue
>         elif '[OUTLIERS]' in row: parse_outliers(reader)
>         elif '[MODIFIED]' in row: parse_modified(reader)
>         masks.append(row)
>
> def parse_outliers(reader):
>     for row in reader:
>         if not row: continue
>         elif '[MODIFIED]' in row: parse_modified(reader)
>         outliers.append(row)
>
> def parse_modified(reader):
>     for row in reader:
>         if not row: continue
>         modified.append(row)
>
> for row in reader:
>     if not row: continue
>     elif '[MASKS]' in row: parse_masks(reader)
>     elif '[OUTLIERS]' in row: parse_outliers(reader)
>     elif '[MODIFIED]' in row: parse_modified(reader)
>     else: data.append(row)
>
> Since there is global state involved, you may want to save yourself some
> trouble in the future and put the above in a class where separate parsers
> can be kept separate.
>
> It looks like your program is turning into a regular old parser. Any format
> that is a little more than trivial to parse will need a real parser like the
> above.
>
> --
> Jonathan Gardner
> jgard...@jonathangardner.net
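For comparison, the mode-switching idea quoted above can also be written as a single loop that tracks the current section. This is only a sketch under assumptions: the name `read_sections`, the dictionary layout, and the in-memory sample are illustrative and not from the thread. It also assumes each marker such as [MASKS] appears as a cell in a row of its own, and that the marker row itself should not be stored.

```python
import csv
import io

# Section markers named in the thread, mapped to illustrative section names.
MARKERS = {
    "[MASKS]": "masks",
    "[OUTLIERS]": "outliers",
    "[MODIFIED]": "modified",
}

def read_sections(lines):
    """Split tab-delimited rows into data/masks/outliers/modified lists."""
    sections = {"data": [], "masks": [], "outliers": [], "modified": []}
    current = "data"  # rows before the first marker are plain data
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip extraneous blank rows
        marker = next((cell for cell in row if cell in MARKERS), None)
        if marker is not None:
            # Switch mode; the marker row itself is not stored (assumption).
            current = MARKERS[marker]
            continue
        sections[current].append(row)
    return sections

# Small in-memory stand-in for one of the files.
sample = io.StringIO(
    "1\t2\n\n3\t4\n[MASKS]\n5\t6\n\n[OUTLIERS]\n7\t8\n[MODIFIED]\n9\t10\n"
)
result = read_sections(sample)
```

Keeping the state in a local variable and returning the lists avoids the global state mentioned in the quoted message, so several files can be parsed independently.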
-- http://mail.python.org/mailman/listinfo/python-list