[issue35195] [Windows] Python 3.7 initializes LC_CTYPE locale at startup, causing performance issue on msvcrt isdigit()

2018-11-16 Thread Dragoljub
Dragoljub added the comment: Do we know if it's possible to prevent the initialization of LC_CTYPE on startup? Is there some combination of environment variables or command-line arguments that can avoid this slowdown on Windows? What are the next steps to get the issue assigned
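The thread does not confirm any startup switch. One runtime experiment is to set LC_CTYPE back to "C" around the hot parsing call and restore it afterwards. Below is a minimal sketch using the standard locale module. The helper name ctype_locale is hypothetical, and whether this removes the isdigit() slowdown is an untested assumption.

```python
import contextlib
import locale


@contextlib.contextmanager
def ctype_locale(name="C"):
    """Hypothetical helper: temporarily set LC_CTYPE to `name`, then restore it."""
    # Passing None only queries the current LC_CTYPE setting.
    previous = locale.setlocale(locale.LC_CTYPE, None)
    try:
        # setlocale returns the name of the locale that was applied.
        yield locale.setlocale(locale.LC_CTYPE, name)
    finally:
        # Always put the original setting back, even if the body raised.
        locale.setlocale(locale.LC_CTYPE, previous)


with ctype_locale("C") as current:
    print("LC_CTYPE inside block:", current)
print("LC_CTYPE after block:", locale.setlocale(locale.LC_CTYPE, None))
```

setlocale() changes process-wide state and is not thread-safe, so this only makes sense around single-threaded sections such as a one-off read_csv() call.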

2018-11-13 Thread Dragoljub
Dragoljub added the comment: This is the default LC_CTYPE locale I see on Windows 10 with Python 3.7.1 vs. 3.5.2. It is the same, 'C', on both:

    '3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]'

    import _locale
    _locale.setlocale(_locale.LC_CTYPE, None)  # returns 'C'

2018-11-13 Thread Dragoljub
Dragoljub added the comment: On Python 3.7.1 and Windows 10, I attempted locale.setlocale(locale.LC_ALL, "POSIX"), but it errors out:

    --- Error Traceback (most recent call last)
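Since "POSIX" is rejected on Windows, a script that wants a portable "plain" locale can try names in order and fall back to "C", which the C standard guarantees. A minimal sketch; the helper set_first_available is hypothetical, and it targets LC_CTYPE (the category this issue is about) rather than the LC_ALL used in the comment.

```python
import locale


def set_first_available(category, candidates):
    """Hypothetical helper: apply the first locale name setlocale() accepts.

    On Windows, "POSIX" raises locale.Error, so "C" is listed as a
    portable fallback.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # name not recognized by this platform's C runtime
        return name
    raise locale.Error(f"none of {candidates!r} is supported")


chosen = set_first_available(locale.LC_CTYPE, ["POSIX", "C"])
print("Using LC_CTYPE locale:", chosen)
```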

[issue35195] Pandas read_csv() is 3.5X Slower on Python 3.7.1 vs Python 3.6.7 & 3.5.2 On Windows 10

2018-11-12 Thread Dragoljub
Dragoljub added the comment: Here is a simple pure-Python example:

    digits = ''.join([str(i) for i in range(10)] * 1000)
    %timeit digits.isdigit()  # --> 2X+ slower on Python 3.7.1

Basically, in the Pandas C-code parser we call the isdigit() function for each number that is to
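%timeit is an IPython magic. To compare interpreters from a plain script, the same measurement can be taken with the standard timeit module. A minimal sketch; the repeat and number counts are arbitrary choices, not values from the thread.

```python
import timeit

# Same test string as in the comment: "0123456789" repeated 1000 times.
digits = ''.join([str(i) for i in range(10)] * 1000)

NUMBER = 1000  # calls per timing run (arbitrary)
# timeit.repeat accepts a callable and returns one total time per run.
runs = timeit.repeat(digits.isdigit, number=NUMBER, repeat=5)
best_per_call = min(runs) / NUMBER
print(f"{len(digits)} chars: best {best_per_call * 1e6:.2f} us per isdigit() call")
```

Running this under each Python version and comparing the best-per-call figure avoids depending on IPython being installed.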

2018-11-12 Thread Dragoljub
Dragoljub added the comment: @Vstinner, is there any way you could help test a config setting that avoids the locale changes on Python 3.7.0a4+? They currently cause the low-level isdigit() function on Windows to call the locale-specific function and update locale state on each call, slowing down CSV parsing.
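One way to sidestep locale-specific behavior entirely is a digit test that accepts only the ASCII bytes '0' through '9' and never consults the C runtime. Below is a minimal Python sketch of that idea applied to raw bytes, as a C tokenizer would see them. The function names are hypothetical, and this is not the code pandas or CPython actually uses.

```python
def is_ascii_digit(byte: int) -> bool:
    """True only for bytes b'0'..b'9'; no locale lookup is involved."""
    # Python version of the C idiom ((unsigned)(c - '0') < 10).
    return 0 <= byte - ord("0") < 10


def all_ascii_digits(data: bytes) -> bool:
    """Check every byte of a non-empty field, like a CSV tokenizer would."""
    return len(data) > 0 and all(is_ascii_digit(b) for b in data)


print(all_ascii_digits(b"0123456789"))  # True
print(all_ascii_digits(b"12.5"))        # False
```

Because the check is a pure range comparison, its cost cannot depend on which LC_CTYPE locale the interpreter set at startup.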

2018-11-10 Thread Dragoljub
Dragoljub added the comment: @cgohlke compared the statement df2 = pd.read_csv(csv) on Python 3.7.0a3 and a4 in the Visual Studio profiler. The culprit is the isdigit function called in the parsers extension module. On 3.7.0a3 the function is fast at ~8% of samples. On 3.7.0a4 the function

2018-11-09 Thread Dragoljub
Dragoljub added the comment: I tested this at runtime with sys._enablelegacywindowsfsencoding(). Also, this function was new in 3.6, and Python 3.6 does not have the slowdown issue: "New in version 3.6: See PEP 529 for more details."
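For reference, this is roughly how such a runtime test can be guarded so the same script also runs off Windows. sys._enablelegacywindowsfsencoding() exists only on Windows and switches the filesystem encoding for the whole process; PEP 529 also describes the PYTHONLEGACYWINDOWSFSENCODING environment variable as the startup equivalent. The wrapper function name is hypothetical.

```python
import sys


def filesystem_encoding_with_legacy_mode():
    """Hypothetical wrapper: return (before, after) filesystem encodings.

    The private sys._enablelegacywindowsfsencoding() only exists on
    Windows (3.6+), so the call is guarded; on other platforms both
    values are simply the current encoding.
    """
    before = sys.getfilesystemencoding()
    if sys.platform == "win32" and hasattr(sys, "_enablelegacywindowsfsencoding"):
        # Process-wide switch to the legacy 'mbcs' filesystem encoding.
        sys._enablelegacywindowsfsencoding()
    after = sys.getfilesystemencoding()
    return before, after


print(filesystem_encoding_with_legacy_mode())
```

Note that this setting concerns filesystem paths, not LC_CTYPE, which may explain why it did not change the isdigit() timings.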

2018-11-09 Thread Dragoljub
Dragoljub added the comment: I tried playing around with the UTF-8 mode settings but did not get a speed improvement. After reading through the PEP, it appears that on Windows: "To allow for better cross-platform binary portability and to adjust automatically to future changes in l
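UTF-8 mode (PEP 540, Python 3.7+) is fixed at interpreter startup via `-X utf8` or `PYTHONUTF8=1`, so the reliable way to check that a setting took effect is to start a fresh interpreter and read sys.flags.utf8_mode. A minimal sketch; the helper name is hypothetical, and it only confirms the mode, it does not measure speed.

```python
import subprocess
import sys


def utf8_mode_of_child(extra_args=()):
    """Hypothetical helper: report sys.flags.utf8_mode of a new interpreter."""
    cmd = [sys.executable, *extra_args, "-c",
           "import sys; print(sys.flags.utf8_mode)"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


print("default:   ", utf8_mode_of_child())
print("-X utf8:   ", utf8_mode_of_child(["-X", "utf8"]))
print("-X utf8=0: ", utf8_mode_of_child(["-X", "utf8=0"]))
```

The same child-process pattern can wrap a timing script, so each UTF-8 mode setting is benchmarked in its own interpreter.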

2018-11-09 Thread Dragoljub
Dragoljub added the comment: After some more digging, it appears that the 3.5x slowdown first manifests in Python 3.7.0a4 and is not present in Python 3.7.0a3. One guess is this entry from https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-0-alpha-4: bpo-29240: Add a new UTF-8 mode

2018-11-09 Thread Dragoljub
Dragoljub added the comment: After some more benchmarks, I'm seeing this call in Python 3.7 but not in Python 3.5: {built-in method _thread.allocate_lock}
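Entries like {built-in method _thread.allocate_lock} are how cProfile/pstats label C-level calls. The comment does not say which profiler was used, so this is an assumption. A minimal sketch of producing such a listing with the standard library; running it under both interpreters and diffing the output shows calls present in one version only. The workload function is a hypothetical stand-in for pd.read_csv().

```python
import cProfile
import io
import pstats


def profile_report(func, *args, limit=15):
    """Run func(*args) under cProfile and return the printed stats as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def workload(text):
    # Hypothetical stand-in for the real workload (e.g. pd.read_csv(csv)).
    return sum(text.isdigit() for _ in range(1000))


digits = "0123456789" * 1000
report = profile_report(workload, digits)
print(report)
```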

2018-11-08 Thread Dragoljub
New submission from Dragoljub: xref: https://github.com/pandas-dev/pandas/issues/23516 Example:

    import io
    import pandas as pd
    import numpy as np
    df = pd.DataFrame(np.random.randn(100, 10), columns=('COL{}'.format(i) for i in range(10)))
    csv = io.StringIO(df.to_csv(index=F