poostenr added the comment:
Thank you for your feedback, Victor and Steven.
I just copied my scripts and 360MB of CSV files over to Linux.
The entire process finished in exactly 4 minutes, using the original Python
scripts.
So there is something different between my environments.
If it were a fragmentation issue, I would expect consistently slow
performance on the Windows system. But I can change the performance by
switching between the two original statements:
s = "{0},".format(columnvalue) # fast
s = "'{0}',".format(columnvalue) # ~30x slower
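A minimal sketch of how the two statements could be timed in isolation, away from the file I/O and the escaping library. The helper names (format_plain, format_quoted, time_formatting) and the sample value are assumptions made for illustration; they are not part of the original scripts.

```python
import timeit


def format_plain(columnvalue):
    # Variant reported as fast: no single quotes around the value.
    return "{0},".format(columnvalue)


def format_quoted(columnvalue):
    # Variant reported as slow: value wrapped in single quotes.
    return "'{0}',".format(columnvalue)


def time_formatting(value="example", number=100000):
    # Time each variant on its own; returns (plain_seconds, quoted_seconds).
    plain = timeit.timeit(lambda: format_plain(value), number=number)
    quoted = timeit.timeit(lambda: format_quoted(value), number=number)
    return plain, quoted


if __name__ == "__main__":
    plain, quoted = time_formatting()
    print("plain:  {0:.4f}s".format(plain))
    print("quoted: {0:.4f}s".format(quoted))
```

If the two timings come out close to each other on the Windows machine, that would suggest the slowdown is somewhere other than str.format itself, for example in the writes to the file.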
I apologize for not being able to provide the entire code.
There is too much code to post at this time.
I am opening a file like this:
# logger = open(filename, mode, buffering, encoding)
logger = open('output.sql', 'a', 1, 'iso-8859-1')
I write to the file like this:
logger.write(text+'\n')
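For reference, a small self-contained sketch of the same open/write pattern with the positional arguments spelled out as keywords. In text mode, buffering=1 requests line buffering. The function name append_lines and the temporary path are assumptions for illustration only.

```python
import os
import tempfile


def append_lines(filename, lines, encoding="iso-8859-1"):
    # mode="a" appends; buffering=1 selects line buffering (text mode only).
    with open(filename, mode="a", buffering=1, encoding=encoding) as logger:
        for text in lines:
            logger.write(text + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "output.sql")
    append_lines(path, ["'a',", "'b',"])
    with open(path, encoding="iso-8859-1") as f:
        print(f.read())
```

Changing the buffering argument and rerunning the conversion would be one way to check whether line buffering contributes to the difference between the two systems.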
I'm using a library to escape the string before saving it to the file:
import pymysql.converters as conv
<...>
for key in listkeys:
    keyvalue = self.recordstats[key]
    fieldtype = keyvalue[0]
    columnvalue = record[key]
    columnvalue = conv.escape_string(columnvalue)
    if count > 1:
        s = "{0},".format(columnvalue)  # No single quotes
    else:
        s = "{0},".format(columnvalue)  # No single quotes
    count -= 1
    logger.write(s+'\n')
I appreciate the feedback and ideas so far.
Trying the profiler is on my list to see if it provides more insight.
I am not using Anaconda3 on Linux. Perhaps that has an impact somehow?
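As a rough sketch of the profiling step mentioned above, the standard cProfile and pstats modules could be wrapped around the conversion routine. The functions build_lines and profile_call are stand-ins invented for this example; the real script's entry point would take the place of build_lines.

```python
import cProfile
import io
import pstats


def build_lines(values):
    # Stand-in workload: format each value with single quotes.
    return ["'{0}',".format(v) for v in values]


def profile_call(func, *args):
    # Run func under cProfile and return (result, text report).
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(build_lines, [str(i) for i in range(100000)])
    print(report)
```

Comparing the reports from the Windows and Linux runs should show whether the extra time is spent in formatting, escaping, or writing.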
I never suspected that inserting the two single quotes would cause such a
performance problem. I noticed it when I parsed ~40GB of data and it took
almost a week to complete instead of the 6-7 hours I expected.
Just the other day I decided to remove the single quotes because it was the
only thing left that I'd changed. I had dismissed that change for the past two
weeks because I assumed it couldn't be causing the performance problem.
Today, I wasn't expecting such a big difference between running my script on
Linux and on Windows.
If I discover anything else, I will post an update.
When I get the chance I can remove redundant code and post the source.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26118>
_______________________________________