poostenr added the comment:
Thank you for your feedback, Victor and Steven.
I just copied my scripts and 360MB of CSV files over to Linux.
The entire process finished in exactly 4 minutes, using the original Python
scripts.
So there is something different between my environments.
If it was a
poostenr added the comment:
Eric, Steven,
During further testing I was not able to find any real evidence that the
statement I was focused on had a real performance issue.
As I did more testing I noticed that appending data to the file slowed down.
The file grew initially with ~30-50KB
poostenr added the comment:
Eric,
I just tried your examples.
The loop count is 100x more, but the results are about a factor of 10 off.
Test1:
My results:
C:\Data>python -m timeit -s 'x=4' '",{0}".format(x)'
1 loops, best of 3: 0.0116 usec per loop
E
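
Shell quoting rules differ between Windows cmd.exe and POSIX shells, so one way to rule out quoting as a factor in the numbers above is to run the same measurement from inside Python. The following is a minimal sketch, assuming the same setup (`x=4`) and statement as the command line shown; the loop count of 100,000 is an arbitrary choice for illustration, not a value from this thread.

```python
import timeit

# Same setup and statement as the command line above, passed as Python
# strings so that no shell quoting is involved.
timer = timeit.Timer(stmt='",{0}".format(x)', setup="x=4")

# Assumed loop count, chosen only for illustration.
number = 100_000

# Repeat the measurement three times and keep the best (lowest) total.
best = min(timer.repeat(repeat=3, number=number))
usec = best / number * 1e6

print(f"{number} loops, best of 3: {usec:.3g} usec per loop")
```

Running this script on both machines should give figures that can be compared directly, since the timed statement is identical on each platform.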
poostenr added the comment:
Eric, Steven, thank you for your feedback so far.
I am using Windows 7 on an Intel i7.
That one particular file of 6.5MB took ~1 minute on my machine.
When I ran that same test on Linux with Python 3.5.1, it took about 3 seconds.
I was amazed to see a 20x difference
poostenr added the comment:
My initial observations with my Python script using:
s = "{0},".format(columnvalue) # fast
Processed ~360MB of data from 2:16PM - 2:51PM (35 minutes, ~10MB/min)
One particular 6.5MB file took ~1 minute.
When I changed this line of code to:
s = "
New submission from poostenr:
There appears to be a significant performance difference between the following two
statements. I am unable to explain the performance impact.
s = "{0},".format(columnvalue) # fast
s = "'{0}',".format(columnvalue) # ~30x slower
So far, no luck t
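
The two statements can be timed in isolation to check whether the format call itself accounts for the difference. This is a minimal sketch, not the script from the report: the sample value of `columnvalue`, the helper name `bench`, and the loop count are all assumptions made for illustration.

```python
import timeit

# Hypothetical sample value; the report does not say what columnvalue held.
columnvalue = "example"


def bench(template, number=200_000):
    """Return the average seconds per call of template.format(columnvalue)."""
    total = timeit.timeit(lambda: template.format(columnvalue), number=number)
    return total / number


plain = bench("{0},")      # template without the single quotes
quoted = bench("'{0}',")   # template with the single quotes

print(f"plain : {plain * 1e6:.3f} usec per call")
print(f"quoted: {quoted * 1e6:.3f} usec per call")
```

If the two figures are close, the slowdown more likely comes from something else in the surrounding script or environment than from the format string.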