On Fri, Sep 4, 2015 at 1:57 AM, kbtyo <ahlusar.ahluwa...@gmail.com> wrote:
> I have used CSV and collections. For some reason when I apply this algorithm,
> all of my files are not added (the output is ridiculously small considering
> how much goes in - think KB output vs MB input):
>
> from glob import iglob
> import csv
> from collections import OrderedDict
>
> files = sorted(iglob('*.csv'))
> header = OrderedDict()
> data = []
>
> for filename in files:
>     with open(filename, 'r') as fin:
>         csvin = csv.DictReader(fin)
>         header.update(OrderedDict.fromkeys(csvin.fieldnames))
>         data.append(next(csvin))
>
> with open('output_filename_version2.csv', 'w') as fout:
>     csvout = csv.DictWriter(fout, fieldnames=list(header))
>     csvout.writeheader()
>     csvout.writerows(data)
You're collecting up just one row from each file: next(csvin) reads a
single row, then the loop moves on to the next file.

Since you say your input is measured in MB (not GB or anything bigger),
the simplest approach is probably fine: instead of
"data.append(next(csvin))", just use "data.extend(csvin)", which should
grab them all. That'll store all your input data in memory, which
should be fine if it's only a few meg, and probably not a problem for
anything under a few hundred meg.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
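For reference, here's a sketch of the script with that one-line fix applied.
The function name merge_csvs, the pattern, and the output filename are
placeholders of mine, and the newline='' arguments assume Python 3's csv
module; otherwise the logic is the original poster's.

```python
from glob import iglob
import csv
from collections import OrderedDict

def merge_csvs(pattern='*.csv', out='merged.csv'):
    # Union of all column names, preserving first-seen order.
    header = OrderedDict()
    data = []

    for filename in sorted(iglob(pattern)):
        with open(filename, newline='') as fin:
            csvin = csv.DictReader(fin)
            header.update(OrderedDict.fromkeys(csvin.fieldnames))
            data.extend(csvin)  # every remaining row, not just next(csvin)

    with open(out, 'w', newline='') as fout:
        csvout = csv.DictWriter(fout, fieldnames=list(header))
        csvout.writeheader()
        csvout.writerows(data)
```

Rows from files that lack some of the merged columns are no problem:
DictWriter fills missing keys with its restval default, an empty string.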