Dragoljub added the comment:
Do we know if it's possible to prevent LC_CTYPE from being initialized on
startup? Is there some combination of environment variables or command-line
arguments that can avoid this slowdown on Windows?
What are the next steps to get the issue assigned
Dragoljub added the comment:
This is the default LC_CTYPE locale I see on Windows 10 with Python 3.7.1 vs
3.5.2. It is the same in both: 'C'
'3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]'
import _locale
_locale.setlocale(_locale.LC_CTYPE, None)
Dragoljub added the comment:
On Python 3.7.1 and Windows 10:
I attempted locale.setlocale(locale.LC_ALL, "POSIX") --> Errors Out
---
Error Traceback (most recent call last)
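A minimal sketch of how one might probe which locale names the platform accepts without leaving the process in a changed state. The helper name try_setlocale is hypothetical, and the category defaults to LC_CTYPE here (an assumption, since that is the category discussed in this thread; the attempt above used LC_ALL). Results will differ between Windows and POSIX systems.

```python
import locale

def try_setlocale(name, category=locale.LC_CTYPE):
    """Try to switch to `name`; return (succeeded, detail) and restore the old locale.

    Hypothetical helper for illustration only.
    """
    previous = locale.setlocale(category)  # query the current setting, no change
    try:
        applied = locale.setlocale(category, name)
        return True, applied
    except locale.Error as exc:
        # Raised when the C runtime does not recognise the locale name.
        return False, str(exc)
    finally:
        locale.setlocale(category, previous)

for candidate in ("C", "POSIX"):
    print(candidate, try_setlocale(candidate))
```

Passing category=locale.LC_ALL repeats the attempt exactly as described in the message.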
Dragoljub added the comment:
Here is a simple pure Python example:
digits = ''.join([str(i) for i in range(10)]*1000)
%timeit digits.isdigit() # --> 2X+ slower on python 3.7.1
Basically, in the pandas C parser we call the isdigit() function for each
number that is to
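The %timeit line above is an IPython magic. Outside IPython, roughly the same measurement could be taken with the standard timeit module, as sketched below. The function name time_isdigit and the repeat/number values are assumptions chosen for illustration. Note that this times Python's str.isdigit(), not the C-level isdigit() called by the parser.

```python
import sys
import timeit

# Same 10,000-character digit string as in the message above.
digits = ''.join([str(i) for i in range(10)] * 1000)

def time_isdigit(repeat=5, number=1000):
    """Return the best per-call time in seconds for digits.isdigit()."""
    timer = timeit.Timer("digits.isdigit()", globals={"digits": digits})
    # Take the minimum over several runs to reduce noise from other processes.
    return min(timer.repeat(repeat=repeat, number=number)) / number

if __name__ == "__main__":
    print(sys.version)
    print("isdigit: {:.2f} us per call".format(time_isdigit() * 1e6))
```

Running the same script under each interpreter version gives numbers that can be compared directly.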
Dragoljub added the comment:
@Vstinner,
Any way you can help test out a config setting to avoid the locale changes on
Python 3.7.0a4+? It is currently causing the low-level isdigit() function to
call the locale-specific function on Windows and update the locale on each
call, slowing down CSV parsing
Dragoljub added the comment:
@cgohlke compared the statement df2 = pd.read_csv(csv) on Python 3.7.0a3 and a4
in the Visual Studio profiler. The culprit is the isdigit function called in
the parsers extension module. On 3.7.0a3 the function is fast at ~8% of
samples. On 3.7.0a4 the function
Dragoljub added the comment:
I tested this at runtime with sys._enablelegacywindowsfsencoding()
Also, this function was new in 3.6, and Python 3.6 does not have the slowdown issue.
New in version 3.6: See PEP 529 for more details.
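A small sketch of how the settings mentioned here could be recorded alongside a benchmark, so that runs on different versions can be compared. The helper names encoding_report and enable_legacy_fs_encoding are hypothetical. sys._enablelegacywindowsfsencoding() only exists on Windows, so the call is guarded.

```python
import sys

def encoding_report():
    """Collect the interpreter settings discussed in this thread."""
    return {
        "version": sys.version.split()[0],
        "platform": sys.platform,
        "filesystem_encoding": sys.getfilesystemencoding(),
        # sys.flags.utf8_mode exists on Python 3.7+; None on older versions.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }

def enable_legacy_fs_encoding():
    """Call sys._enablelegacywindowsfsencoding() if present; return True if called."""
    func = getattr(sys, "_enablelegacywindowsfsencoding", None)
    if func is None:  # attribute is absent on non-Windows platforms
        return False
    func()
    return True

print(encoding_report())
print("legacy fs encoding enabled:", enable_legacy_fs_encoding())
print(encoding_report())
```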
--
___
Python tracker
<ht
Dragoljub added the comment:
I tried playing around with the UTF-8 mode settings but did not get a speed
improvement.
After reading through the PEP it appears that on Windows:
"To allow for better cross-platform binary portability and to adjust
automatically to future changes in l
Dragoljub added the comment:
After some more digging it appears that we see the 3.5x slowdown manifest in
Python 3.7.0a4 and is not present in Python 3.7.0a3.
One guess is that
https://docs.python.org/3.7/whatsnew/changelog.html#python-3-7-0-alpha-4
bpo-29240: Add a new UTF-8 mode
Dragoljub added the comment:
After some more benchmarks I'm seeing this line of code called in Python 3.7
but not in Python 3.5:
{built-in method _thread.allocate_lock}
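Entries such as the one above appear in cProfile output. A minimal sketch of how such a listing could be produced and compared between versions follows. The workload function is a stand-in (an assumption) based on the isdigit() example from this thread, not the pandas call that was actually profiled.

```python
import cProfile
import io
import pstats

def profile_to_text(func, *args, **kwargs):
    """Run func under cProfile and return the stats listing as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
    return stream.getvalue()

def workload():
    # Stand-in workload; replace with the call under investigation.
    digits = ''.join([str(i) for i in range(10)] * 1000)
    return digits.isdigit()

report = profile_to_text(workload)
print(report)
```

Saving the report from each interpreter and diffing the two files shows which functions appear in only one of them.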
New submission from Dragoljub:
xref: https://github.com/pandas-dev/pandas/issues/23516
Example:
import io
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 10), columns=('COL{}'.format(i) for
i in range(10)))
csv = io.StringIO(df.to_csv(index=F