I have some (~50) text files with about 250,000 rows each. I am reading them
in using the function below, which gets me what I want, but it is not fast.
Is there something I am missing that would help? This is mostly a question to
help me learn more about Python. It takes about 4 minutes right now.

import csv
from itertools import takewhile, dropwhile

def read_data_file(filename):
    with open(filename, "U") as f:
        read = list(csv.reader(f, delimiter='\t'))

    # Data rows: everything before the '[MASKS]' marker, minus the header row.
    data_rows = takewhile(lambda trow: '[MASKS]' not in trow, read)
    data = [x for x in data_rows][1:]

    # Mask rows: between '[MASKS]' and '[OUTLIERS]', dropping blanks and the
    # first three rows of that section.
    mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
                          list(dropwhile(lambda drow: '[MASKS]' not in drow, read)))
    mask = [row for row in mask_rows if row][3:]

    # Outlier rows: everything after '[OUTLIERS]', same trimming as the masks.
    outlier_rows = dropwhile(lambda drows: '[OUTLIERS]' not in drows, read)
    outlier = [row for row in outlier_rows if row][3:]

    return data, mask, outlier
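
For comparison, here is a single-pass sketch of the same parse that switches
sections when it hits a marker row, instead of walking the full list three
times. It assumes the layout is a header row, data rows, a '[MASKS]' section,
then an '[OUTLIERS]' section; the [1:] and [2:] offsets are my guess at
matching the [1:]/[3:] slices above once the marker rows are handled
separately, so they may need adjusting.

import csv

def read_data_file_single_pass(filename):
    # Sketch only: collect rows into the current section, switching on markers.
    sections = {'data': [], 'mask': [], 'outlier': []}
    current = 'data'
    with open(filename, "U") as f:
        for row in csv.reader(f, delimiter='\t'):
            if '[MASKS]' in row:
                current = 'mask'
                continue
            if '[OUTLIERS]' in row:
                current = 'outlier'
                continue
            if row:  # skip blank rows, as the original 'if row' filters do
                sections[current].append(row)
    # Assumed trimming: drop the data header and the leading rows of the
    # mask/outlier sections (offsets taken from the original slicing).
    return sections['data'][1:], sections['mask'][2:], sections['outlier'][2:]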


Vincent Davis
720-301-3003
vinc...@vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>