> I normally need to convert csv and text files to a NumPy array. I tried to
> do the same thing using (1) reader=DictReader(MyFile), (2)
> reader=csv.reader(MyFile), or (3) genfromtxt(MyFile, ……). The first two
> are after I open the file. They produce a list of lists, list of tuples, or
> list of dictionaries which are later converted to an array.
If we're touching the hard drive as part of input/output, you probably won't need to worry about the efficiency of the parsing code, especially for a program dedicated to reading files. Disk operations are *several orders of magnitude* more expensive than most of the non-I/O operations your program will perform. As long as we read and process the input in a non-crazy way, we should be OK. ("Non-crazy": a small, constant number of passes over the input file, and, if the file is very large, not trying to read the whole thing into memory at once.) All three of the approaches you describe sound non-crazy to me. The time a file-parsing program takes will almost certainly be dominated by I/O, and you probably can't do anything to change the physics of how disk platters spin.

This rough rule is sensitive to context, and several of my assumptions may be wrong. I'm assuming a standard desktop environment on a single machine with a physical hard drive. If instead you have SSDs or some unusual storage that's very fast or parallel, then yes, you may need to be concerned with shaving off every last millisecond.

How can we know for sure? We can measure: profiling a program shows where its time is actually being spent, and whether the non-I/O computation matters at all. See https://docs.python.org/3.5/library/profile.html for details. If you have versions of your reader for those three strategies, try profiling each of them.
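If it helps, here is a minimal sketch of what that comparison might look like. The file name "MyFile.csv", the comma delimiter, the one-line header, and the all-numeric columns are assumptions about your data, so adjust them to match your actual files. Each strategy is wrapped in a function and run under cProfile so you can see where the cumulative time lands.

    import cProfile
    import csv
    import numpy as np

    def read_with_dictreader(path):
        # One row per dict; values() follows the header order on Python 3.6+.
        with open(path, newline='') as f:
            rows = [list(row.values()) for row in csv.DictReader(f)]
        return np.array(rows, dtype=float)

    def read_with_reader(path):
        # Collect plain rows; next() skips the assumed one-line header.
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader)
            rows = list(reader)
        return np.array(rows, dtype=float)

    def read_with_genfromtxt(path):
        # Let NumPy parse and convert in a single call.
        return np.genfromtxt(path, delimiter=',', skip_header=1)

    if __name__ == '__main__':
        for name in ('read_with_dictreader',
                     'read_with_reader',
                     'read_with_genfromtxt'):
            print('---', name)
            cProfile.run(name + '("MyFile.csv")', sort='cumulative')

When you read the cProfile output, look at where the cumulative time goes: if the top entries are the read/parse calls inside the csv module or genfromtxt itself rather than your own post-processing, the program is I/O- and parsing-bound, and micro-optimising the conversion to an array won't buy you much.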