Hi all, I need to read in ~25000 files whose lines are in a fixed file format:
field 1: line 1, chars 1-4
field 2: line 1, chars 5-6
field 3: line 1, chars 7-11
field 4: line 1, chars 12 to EOL
field 5: line 2, chars 1-30
field 6: line 3, chars 1-10
field 7: line 4, chars 1-2
...

The naive method is to loop over each file, incrementally reading 4 chars, 2 chars, 5 chars, etc. However, the slowest part of all this (I would think) would be the constant disk access for each field of each file.

There are ~25000 files, totalling 1.7GB. The machine I'm on has 32GB of memory, and it only took 27s to read all 25k files into a Perl array on a typically loaded machine. I can imagine some fancy scenario where I slurp all the files into an array and then process them (rough, untested sketch below), but there are many different ways of doing this.

Am I correct that reading everything into memory in one fell swoop will make a noticeable difference in run time? And once everything is in an array, what's the most efficient way to take a string representing one line of a file and break it down into fields? Is there anything faster than substr()?

Thanks!
Pete
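
For concreteness, here's a rough, untested sketch of what I mean by slurping
everything first and then carving the fields out of the in-memory strings with
substr(). The directory name and field names are made up, and only the first
few fields from the layout above are shown:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory holding the ~25000 files.
my $dir = 'data';

opendir my $dh, $dir or die "can't open $dir: $!";
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

# Pass 1: slurp each file into memory in a single read,
# so the parsing pass never has to wait on the disk.
my %contents;
for my $file (@files) {
    open my $fh, '<', "$dir/$file" or die "can't open $dir/$file: $!";
    local $/;                        # slurp mode: read the whole file at once
    $contents{$file} = <$fh>;
    close $fh;
}

# Pass 2: pull the fixed-width fields out of each in-memory string.
for my $file (@files) {
    my @lines = split /\n/, $contents{$file};

    my %rec;
    $rec{f1} = substr $lines[0], 0, 4;     # line 1, chars 1-4
    $rec{f2} = substr $lines[0], 4, 2;     # line 1, chars 5-6
    $rec{f3} = substr $lines[0], 6, 5;     # line 1, chars 7-11
    $rec{f4} = substr $lines[0], 11;       # line 1, chars 12 to EOL
    $rec{f5} = substr $lines[1], 0, 30;    # line 2, chars 1-30
    $rec{f6} = substr $lines[2], 0, 10;    # line 3, chars 1-10
    $rec{f7} = substr $lines[3], 0, 2;     # line 4, chars 1-2
    # ... and so on for the rest of the layout

    # The other candidate I'd benchmark for line 1:
    # my ($f1, $f2, $f3, $f4) = unpack 'A4 A2 A5 A*', $lines[0];

    # ... process %rec here
}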
