Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Inada Naoki
On Wed, Apr 24, 2019 at 6:17 AM Mark Shannon  wrote:
>
> Hi,
>
> On 12/04/2019 2:44 pm, Inada Naoki wrote:
> > Hi, all.
> >
> > I propose adding new method: dict.with_values(iterable)
>
> You can already do something like this, if memory saving is the main
> concern. This should work on all versions from 3.3.
>

Of course, performance is a main concern too.

-- 
Inada Naoki  


Re: [Python-Dev] Use C extensions compiled in release mode on a Python compiled in debug mode

2019-04-23 Thread Victor Stinner
Le mer. 24 avr. 2019 à 01:44, Victor Stinner  a écrit :
> The current blocker issue is that
> the Py_DEBUG define imply the Py_TRACE_REFS define (...):
>
> https://bugs.python.org/issue36465
> https://github.com/python/cpython/pull/12615

I updated my PR:

"""
Release builds and debug builds are now ABI compatible: the Py_DEBUG
define no longer implies the Py_TRACE_REFS define, which introduced the
only ABI incompatibility.

A new "./configure --with-trace-refs" build option is now required to
get the Py_TRACE_REFS define, which adds the sys.getobjects() function
and the PYTHONDUMPREFS environment variable.

Changes:

* Add ./configure --with-trace-refs
* Py_DEBUG no longer implies Py_TRACE_REFS
* The "d" flag of SOABI (sys.implementation.cache_tag) is now
  only added by --with-trace-refs. It is no longer added by
  --with-pydebug.
"""


> Maybe pip could be enhanced to support installing C extensions
> compiled in release mode when using a debug mode.

In fact, pip doesn't have to be modified. I "fixed"
sys.implementation.cache_tag by removing "d" in debug mode instead ;-)


By the way, the "m" ABI flag for pymalloc is outdated. I proposed the
following change to simply remove it:

https://bugs.python.org/issue36707
https://github.com/python/cpython/pull/12931/files

With my PR 12931 and my PR 12615, the only remaining ABI flag would be
"d", which would only be enabled by ./configure --with-trace-refs,
whereas ./configure --with-pydebug would no longer have any effect on SOABI
(sys.implementation.cache_tag).

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.


Re: [Python-Dev] Use C extensions compiled in release mode on a Python compiled in debug mode

2019-04-23 Thread Ivan Pozdeev via Python-Dev

On 24.04.2019 2:44, Victor Stinner wrote:

Hi,

Two weeks ago, I started a thread "No longer enable Py_TRACE_REFS by
default in debug build", but I lost myself in details, I forgot the
main purpose of my proposal...

Let me retry from scratch with a more explicit title: I would like to
be able to run C extensions compiled in release mode on a Python
compiled in debug mode ("pydebug").


This is going to be impossible because a debug Python links against the debug C runtime, which is binary-incompatible with the release one (at
least on Windows).



The use case is to debug bugs in C
extensions thanks to the additional runtime checks of a Python debug
build, and more generally to get a better debugging experience on
Python. Even for pure Python, a debug build is useful (to get the
Python traceback in gdb using the "py-bt" command).
That said, debug vs release extension compilation is currently broken. It's impossible to make a debug build of an extension against a
release Python (linked against the release runtime, so not fully debug, just without optimizations) and vice versa. pip fails to build
extensions for a debug Python for the same reason. I have no idea how (or whether) people manage to diagnose problems in extensions.

https://bugs.python.org/issue33637


Currently, using a Python compiled in debug mode means having to
recompile C extensions in debug mode. Compiling a C extension requires a
C compiler, header files, pulling in dependencies, etc. It can be very
complicated in practice (and pollutes your system with all these
additional dependencies). On Linux it's already hard, but on Windows
it can be even harder.

Just one concrete example: no debug build of numpy is provided at
https://pypi.org/project/numpy/
Good luck building numpy in debug mode manually (installing OpenBLAS,
ATLAS, a Fortran compiler, Cython, etc.) :-)

The above paragraph is probably the reason ;-)


--

The first requirement for the use case is that a Python debug build
supports the ABI of a release build. The current blocker issue is that
the Py_DEBUG define implies the Py_TRACE_REFS define: PyObject gets 2
extra fields (_ob_prev and _ob_next) which change the offset of all
attributes of all objects and make the ABI completely incompatible. I
propose to no longer imply Py_TRACE_REFS *by default* (but keep the
code):

https://bugs.python.org/issue36465
https://github.com/python/cpython/pull/12615

(Py_TRACE_REFS would be a different ABI.)

The second issue is that library filenames are different for a debug
build: SOABI gets an additional "d" flag for Py_DEBUG. A debug build
should first look for "NAME.cpython-38dm.so" (flags: "dm"), but then
also look for "NAME.cpython-38m.so" (flags: "m"). The opposite is not
possible: a debug build contains many additional functions missing
from a release build.

For Windows, maybe we should provide a Python compiled in debug mode
with the same C runtime as a Python compiled in release mode.
Otherwise, the debug C runtime causes another ABI issue.

Maybe pip could be enhanced to support installing C extensions
compiled in release mode when using a debug build. But that's more for
convenience; it's not really required, since it is easy to switch the
Python runtime between release and debug builds.

Apart from Py_TRACE_REFS, I'm not aware of other ABI differences in
structures. I know that the COUNT_ALLOCS define changes the ABI, but
it's not implied by Py_DEBUG: you have to opt in to COUNT_ALLOCS. (I
propose to do the same for Py_TRACE_REFS ;-))

Note: Refleaks buildbots don't use Py_TRACE_REFS to track memory
leaks, only sys.gettotalrefcount().

--

A Python debug build has many benefits. If you ignore C extensions, the
debug build is usually compiled with compiler optimizations disabled,
which makes debugging in gdb a much better experience. If you have never
tried: on a release build, most (if not all) variables are "<optimized
out>" and it's really painful to use basic debug functions like displaying
the current Python frame.

Assertions are removed in release mode, whereas they can detect a
wide range of bugs much earlier: integer overflow, buffer under- and
overflow, exceptions ignored silently, etc. Nobody likes to see a bug
for the first time in production. For example, I modified Python 3.8
to log I/O errors when a file is closed implicitly, but only in
debug or development mode. A release Python silently ignores the EBADF
error in such a case, whereas it can lead to very nasty bugs causing
Python to call abort() (which creates a core dump on Linux): see
https://bugs.python.org/issue18748 ...

DeprecationWarning and ResourceWarning are shown by default in debug mode :-)

There are many more additional checks done at runtime: I
cannot list them all here.

--

Being able to switch between Python in release mode and Python in
debug mode is a first step. My long-term plan would be to better
separate "Python" from its "runtime". CPython in release mode would be
one runtime, CPython in debug mode would be another runtime, PyPy can
be seen as another runtime, etc.

Re: [Python-Dev] Use C extensions compiled in release mode on a Python compiled in debug mode

2019-04-23 Thread Victor Stinner
Le mer. 24 avr. 2019 à 01:44, Victor Stinner  a écrit :
> The first requirement for the use case is that a Python debug build
> supports the ABI of a release build. (...) I
> propose to no longer imply Py_TRACE_REFS (...)
>
> Apart of Py_TRACE_REFS, I'm not aware of other ABI differences in
> structures. (...)

I tested manually: just by disabling Py_TRACE_REFS, the release ABI
looks *fully* compatible with a debug build!


I modified Python 3.7 to disable Py_TRACE_REFS and to omit "d" from
SOABI when built in debug mode. I built Python in debug mode.

I ran tests on numpy and lxml.etree: I can use the .so libraries from
/usr/lib64/python3.7/site-packages (compiled in release mode), and it just
works! I was very surprised not to get any crash on such non-trivial
C extensions, so I checked manually that I was running a debug
build: yes, sys.gettotalrefcount is present :-)
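
For illustration, a hedged way to do the same check programmatically
(standard APIs only; the sysconfig values below reflect a Unix build):

import sys
import sysconfig

print(hasattr(sys, "gettotalrefcount"))      # True only on a debug build
print(sysconfig.get_config_var("Py_DEBUG"))  # 1 when built with --with-pydebug
print(sysconfig.get_config_var("ABIFLAGS"))  # e.g. "dm" on a stock 3.7 debug build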

I also wanted to test an even more complex application: I installed
Gajim, a Jabber client written in Python 3 with PyGTK. It uses many C
extensions. Running Gajim with my debug build is slower (that's
not a surprise), but it works well! (no crash)


About the SOABI, maybe we should only keep "d" when Py_TRACE_REFS is
used, since technically the ABI is the same between release and debug
mode without Py_TRACE_REFS. In that case, pip doesn't need to be
modified ;-)


If you also want to try, use:

PYTHONPATH=/usr/lib64/python3.7/site-packages:/usr/lib/python3.7/site-packages
./python /usr/bin/gajim

On a Python compiled with "./configure --with-pydebug && make" and the
following patch:

diff --git a/Include/object.h b/Include/object.h
index bcf78afe6b..4c807981c4 100644
--- a/Include/object.h
+++ b/Include/object.h
@@ -51,13 +51,8 @@ A standard interface exists for objects that
contain an array of items
 whose size is determined when the object is allocated.
 */

-/* Py_DEBUG implies Py_TRACE_REFS. */
-#if defined(Py_DEBUG) && !defined(Py_TRACE_REFS)
-#define Py_TRACE_REFS
-#endif
-
-/* Py_TRACE_REFS implies Py_REF_DEBUG. */
-#if defined(Py_TRACE_REFS) && !defined(Py_REF_DEBUG)
+/* Py_DEBUG implies Py_REF_DEBUG. */
+#if defined(Py_DEBUG) && !defined(Py_REF_DEBUG)
 #define Py_REF_DEBUG
 #endif

diff --git a/configure b/configure
index 2db11e6e86..7271e9de40 100755
--- a/configure
+++ b/configure
@@ -6365,7 +6365,6 @@ $as_echo "#define Py_DEBUG 1" >>confdefs.h
   { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
 $as_echo "yes" >&6; };
   Py_DEBUG='true'
-  ABIFLAGS="${ABIFLAGS}d"
 else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
 $as_echo "no" >&6; }; Py_DEBUG='false'
 fi
diff --git a/configure.ac b/configure.ac
index e5fb7e7b0b..fa4bb1944f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1246,7 +1246,6 @@ then
   [Define if you want to build an interpreter with many run-time checks.])
   AC_MSG_RESULT(yes);
   Py_DEBUG='true'
-  ABIFLAGS="${ABIFLAGS}d"
 else AC_MSG_RESULT(no); Py_DEBUG='false'
 fi],
 [AC_MSG_RESULT(no)])

Victor


[Python-Dev] Use C extensions compiled in release mode on a Python compiled in debug mode

2019-04-23 Thread Victor Stinner
Hi,

Two weeks ago, I started a thread "No longer enable Py_TRACE_REFS by
default in debug build", but I lost myself in details, I forgot the
main purpose of my proposal...

Let me retry from scratch with a more explicit title: I would like to
be able to run C extensions compiled in release mode on a Python
compiled in debug mode ("pydebug"). The use case is to debug bugs in C
extensions thanks to the additional runtime checks of a Python debug
build, and more generally to get a better debugging experience on
Python. Even for pure Python, a debug build is useful (to get the
Python traceback in gdb using the "py-bt" command).

Currently, using a Python compiled in debug mode means having to
recompile C extensions in debug mode. Compiling a C extension requires a
C compiler, header files, pulling in dependencies, etc. It can be very
complicated in practice (and pollutes your system with all these
additional dependencies). On Linux it's already hard, but on Windows
it can be even harder.

Just one concrete example: no debug build of numpy is provided at
https://pypi.org/project/numpy/
Good luck building numpy in debug mode manually (installing OpenBLAS,
ATLAS, a Fortran compiler, Cython, etc.) :-)

--

The first requirement for the use case is that a Python debug build
supports the ABI of a release build. The current blocker issue is that
the Py_DEBUG define implies the Py_TRACE_REFS define: PyObject gets 2
extra fields (_ob_prev and _ob_next) which change the offset of all
attributes of all objects and make the ABI completely incompatible. I
propose to no longer imply Py_TRACE_REFS *by default* (but keep the
code):

https://bugs.python.org/issue36465
https://github.com/python/cpython/pull/12615

(Py_TRACE_REFS would be a different ABI.)

The second issue is that library filenames are different for a debug
build: SOABI gets an additional "d" flag for Py_DEBUG. A debug build
should first look for "NAME.cpython-38dm.so" (flags: "dm"), but then
also look for "NAME.cpython-38m.so" (flags: "m"). The opposite is not
possible: a debug build contains many additional functions missing
from a release build.
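
For illustration, a hedged snippet (standard APIs only) showing where these
suffixes come from at runtime; the values in the comments are examples and
vary by platform and build:

import importlib.machinery
import sysconfig

print(sysconfig.get_config_var("SOABI"))       # e.g. "cpython-37m" or "cpython-37dm"
print(importlib.machinery.EXTENSION_SUFFIXES)  # suffixes tried when importing C extensions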

For Windows, maybe we should provide a Python compiled in debug mode
with the same C runtime as a Python compiled in release mode.
Otherwise, the debug C runtime causes another ABI issue.

Maybe pip could be enhanced to support installing C extensions
compiled in release mode when using a debug build. But that's more for
convenience; it's not really required, since it is easy to switch the
Python runtime between release and debug builds.

Apart from Py_TRACE_REFS, I'm not aware of other ABI differences in
structures. I know that the COUNT_ALLOCS define changes the ABI, but
it's not implied by Py_DEBUG: you have to opt in to COUNT_ALLOCS. (I
propose to do the same for Py_TRACE_REFS ;-))

Note: Refleaks buildbots don't use Py_TRACE_REFS to track memory
leaks, only sys.gettotalrefcount().
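
For illustration, a hedged sketch of the kind of check this boils down to:
sample sys.gettotalrefcount() (available on debug builds only) around repeated
runs of a test. run_once below is a hypothetical callable named only for the
example:

import sys

def ref_delta(run_once, repeat=5):
    run_once()                        # warm-up run, lets caches fill up
    before = sys.gettotalrefcount()
    for _ in range(repeat):
        run_once()
    return sys.gettotalrefcount() - before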

--

A Python debug build has many benefits. If you ignore C extensions, the
debug build is usually compiled with compiler optimizations disabled,
which makes debugging in gdb a much better experience. If you have never
tried: on a release build, most (if not all) variables are "<optimized
out>" and it's really painful to use basic debug functions like displaying
the current Python frame.

Assertions are removed in release mode, whereas they can detect a
wide range of bugs much earlier: integer overflow, buffer under- and
overflow, exceptions ignored silently, etc. Nobody likes to see a bug
for the first time in production. For example, I modified Python 3.8
to log I/O errors when a file is closed implicitly, but only in
debug or development mode. A release Python silently ignores the EBADF
error in such a case, whereas it can lead to very nasty bugs causing
Python to call abort() (which creates a core dump on Linux): see
https://bugs.python.org/issue18748 ...

DeprecationWarning and ResourceWarning are shown by default in debug mode :-)

There are many more additional checks done at runtime: I
cannot list them all here.

--

Being able to switch between Python in release mode and Python in
debug mode is a first step. My long-term plan would be to better
separate "Python" from its "runtime". CPython in release mode would be
one runtime, CPython in debug mode would be another runtime, PyPy can
be seen as another runtime, etc. The more general idea is: "compile your
C extension once and use any Python runtime".

https://pythoncapi.readthedocs.io/runtimes.html#runtimes

If you opt in to the stable ABI, you can already switch between
runtimes of different Python versions (e.g. Python 3.6 or Python 3.8).

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Mark Shannon

Hi,

On 12/04/2019 2:44 pm, Inada Naoki wrote:

Hi, all.

I propose adding new method: dict.with_values(iterable)


You can already do something like this, if memory saving is the main 
concern. This should work on all versions from 3.3.



def shared_keys_dict_maker(keys):
    class C: pass
    instance = C()
    for key in keys:
        setattr(instance, key, None)
    prototype = instance.__dict__
    def maker(values):
        result = prototype.copy()
        result.update(zip(keys, values))
        return result
    return maker

m = shared_keys_dict_maker(('a', 'b'))

>>> import sys
>>> d1 = {'a': 1, 'b': 2}
>>> print(sys.getsizeof(d1))
248

>>> d2 = m((1, 2))
>>> print(sys.getsizeof(d2))
120

>>> d3 = m((None, "Hi"))
>>> print(sys.getsizeof(d3))
120





# Motivation

Python is used to handle data.
While a dict is not an efficient way to handle many records, it is still
a convenient way.

When creating many dicts with the same keys, the dict needs to
look up its internal hash table while inserting each key.

That is a costly operation.  If we can reuse the existing keys of a dict,
we can skip this insertion cost.

Additionally, we have the "Key-Sharing Dictionary" (PEP 412).
When all keys are strings, many dicts can share one key table.
This reduces memory consumption.

This might be usable for:

* csv.DictReader
* namedtuple._asdict()
* DB-API 2.0 implementations:  (e.g. DictCursor of mysqlclient-python)

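For illustration, a hedged sketch of the intended use (with_values() is the
proposed method and does not exist in any released Python; the field names
below are made up for the example): a DictReader-like loop that reuses one
key table for every row.

header = ["id", "name", "email"]
rows = [(1, "alice", "a@example.com"), (2, "bob", "b@example.com")]

keys = dict.fromkeys(header)   # under the proposal, a key-sharing dict holding only the keys
records = [keys.with_values(row) for row in rows]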

# Draft implementation

pull request: https://github.com/python/cpython/pull/12802

with_values(self, iterable, /)
 Create a new dictionary with keys from this dict and values from iterable.

 When length of iterable is different from len(self), ValueError is raised.
 This method does not support dict subclass.


## Memory usage (Key-Sharing dict)


>>> import sys
>>> keys = tuple("abcdefg")
>>> keys
('a', 'b', 'c', 'd', 'e', 'f', 'g')
>>> d = dict(zip(keys, range(7)))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)
360

>>> keys = dict.fromkeys("abcdefg")
>>> d = keys.with_values(range(7))
>>> d
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6}
>>> sys.getsizeof(d)
144

## Speed

$ ./python -m perf timeit -o zip_dict.json -s 'keys =
tuple("abcdefg"); values=[*range(7)]' 'dict(zip(keys, values))'

$ ./python -m perf timeit -o with_values.json -s 'keys =
dict.fromkeys("abcdefg"); values=[*range(7)]'
'keys.with_values(values)'

$ ./python -m perf compare_to zip_dict.json with_values.json
Mean +- std dev: [zip_dict] 935 ns +- 9 ns -> [with_values] 109 ns +-
2 ns: 8.59x faster (-88%)


What do you think?
Any comments are appreciated.

Regards,




Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Ilya Kamenshchikov
Ok thanks for explaining. I will proceed by trying it with typeshed.

Best Regards,
--
Ilya Kamenshchikov


On Tue, Apr 23, 2019 at 9:44 PM Ivan Levkivskyi 
wrote:

> Mypy doesn't use source code of stlib for analysis and instead uses stub
> files from typeshed. IIUC PyCharm can also do that (i.e. use the typeshed
> stubs).
> The whole idea of typeshed is to avoid changing stlib solely for the sake
> of static analysis. Please open an issue on typeshed an/or PyCharm tracker.
>
> --
> Ivan
>
>
>
> On Tue, 23 Apr 2019 at 20:38, Ilya Kamenshchikov 
> wrote:
>
>> How would we answer the same question if it was not a part of stdlib?
>> I am not sure it is fair to expect of Pycharm to parse / execute the
>> __getattr__ on modules, as more elaborate implementation could even contain
>> different types per some condition at the runtime or anything at all.
>> The code:
>>
>> TYPE_CHECKING = False
>> if TYPE_CHECKING:
>> from .process import ProcessPoolExecutor
>> from .thread import ThreadPoolExecutor
>>
>> works for type checking in PyCharm and is fast.
>>
>> This is how stdlib can be an example to how side libraries can be 
>> implemented. If we can agree that this is the only clear, performant and 
>> sufficient code - then perhaps modifying mypy is a reasonable price to pay.
>>
>> Perhaps this particular case can be just patched locally by PyCharm
>> /JetBrains, but what is a general solution to this class of problems?
>>
>> Best Regards,
>> --
>> Ilya Kamenshchikov
>>
>>
>> On Tue, Apr 23, 2019 at 7:05 PM Guido van Rossum 
>> wrote:
>>
>>> In any case I think this should be filed (by the OP) as an issue against
>>> JetBrains' PyCharm issue tracker. Who knows they may be able to
>>> special-case this in a jiffy. I don't think we should add any clever hacks
>>> to the stdlib for this.
>>>
>>> On Tue, Apr 23, 2019 at 9:59 AM Nathaniel Smith  wrote:
>>>
 On Tue, Apr 23, 2019, 05:09 Andrew Svetlov 
 wrote:

> I agree that `from typing import TYPE_CHECKING` is not desirable from
> the import time reduction perspective.
>
> From my understanding code completion *can* be based on type hinting
> to avoid actual code execution.
> That's why I've mentioned that typeshed already has the correct type
> information.
>
> if TYPE_CHECKING:
> import ...
>
> requires mypy modification.
>
> if False:
> import ...
>
> Works right now for stdlib (mypy ignores stdlib code but uses typeshed
> anyway) but looks a little cryptic.
> Requires a comprehensive comment at least.
>

 Last time I looked at this, I'm pretty sure `if False` broke at least
 one popular static analysis tool (ie it was clever enough to ignore
 everything inside `if False`) – I think either pylint or jedi?

 I'd suggest checking any clever hacks against at least: mypy,
 pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
 analysis engines, and each one has its own idiosyncratic quirks.

 We've struggled with this a *lot* in trio, and eventually ended up
 giving up on all forms of dynamic export cleverness; we've even banned the
 use of __all__ entirely. Static analysis has gotten good enough that users
 won't accept it not working, but it hasn't gotten good enough to handle
 anything but the simplest static exports in a reliable way:
 https://github.com/python-trio/trio/pull/316
 https://github.com/python-trio/trio/issues/542

 The stdlib has more leeway because when tools don't work on the stdlib
 then they tend to eventually add workarounds. I'm just saying, think twice
 before diving into clever hacks to workaround static analysis limits, and
 if you're going to do it then be careful to be thorough. You're basically
 relying on undocumented bugs, and it gets really messy really quickly.

 -n

>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>> *Pronouns: he/him/his **(why is my pronoun here?)*
>>> 
>>>


Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Ivan Levkivskyi
Mypy doesn't use the source code of the stdlib for analysis and instead uses stub
files from typeshed. IIUC PyCharm can also do that (i.e. use the typeshed
stubs).
The whole idea of typeshed is to avoid changing the stdlib solely for the sake
of static analysis. Please open an issue on the typeshed and/or PyCharm tracker.

--
Ivan



On Tue, 23 Apr 2019 at 20:38, Ilya Kamenshchikov 
wrote:

> How would we answer the same question if it was not a part of stdlib?
> I am not sure it is fair to expect of Pycharm to parse / execute the
> __getattr__ on modules, as more elaborate implementation could even contain
> different types per some condition at the runtime or anything at all.
> The code:
>
> TYPE_CHECKING = False
> if TYPE_CHECKING:
> from .process import ProcessPoolExecutor
> from .thread import ThreadPoolExecutor
>
> works for type checking in PyCharm and is fast.
>
> This is how stdlib can be an example to how side libraries can be 
> implemented. If we can agree that this is the only clear, performant and 
> sufficient code - then perhaps modifying mypy is a reasonable price to pay.
>
> Perhaps this particular case can be just patched locally by PyCharm
> /JetBrains, but what is a general solution to this class of problems?
>
> Best Regards,
> --
> Ilya Kamenshchikov
>
>
> On Tue, Apr 23, 2019 at 7:05 PM Guido van Rossum  wrote:
>
>> In any case I think this should be filed (by the OP) as an issue against
>> JetBrains' PyCharm issue tracker. Who knows they may be able to
>> special-case this in a jiffy. I don't think we should add any clever hacks
>> to the stdlib for this.
>>
>> On Tue, Apr 23, 2019 at 9:59 AM Nathaniel Smith  wrote:
>>
>>> On Tue, Apr 23, 2019, 05:09 Andrew Svetlov 
>>> wrote:
>>>
 I agree that `from typing import TYPE_CHECKING` is not desirable from
 the import time reduction perspective.

 From my understanding code completion *can* be based on type hinting
 to avoid actual code execution.
 That's why I've mentioned that typeshed already has the correct type
 information.

 if TYPE_CHECKING:
 import ...

 requires mypy modification.

 if False:
 import ...

 Works right now for stdlib (mypy ignores stdlib code but uses typeshed
 anyway) but looks a little cryptic.
 Requires a comprehensive comment at least.

>>>
>>> Last time I looked at this, I'm pretty sure `if False` broke at least
>>> one popular static analysis tool (ie it was clever enough to ignore
>>> everything inside `if False`) – I think either pylint or jedi?
>>>
>>> I'd suggest checking any clever hacks against at least: mypy,
>>> pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
>>> analysis engines, and each one has its own idiosyncratic quirks.
>>>
>>> We've struggled with this a *lot* in trio, and eventually ended up
>>> giving up on all forms of dynamic export cleverness; we've even banned the
>>> use of __all__ entirely. Static analysis has gotten good enough that users
>>> won't accept it not working, but it hasn't gotten good enough to handle
>>> anything but the simplest static exports in a reliable way:
>>> https://github.com/python-trio/trio/pull/316
>>> https://github.com/python-trio/trio/issues/542
>>>
>>> The stdlib has more leeway because when tools don't work on the stdlib
>>> then they tend to eventually add workarounds. I'm just saying, think twice
>>> before diving into clever hacks to workaround static analysis limits, and
>>> if you're going to do it then be careful to be thorough. You're basically
>>> relying on undocumented bugs, and it gets really messy really quickly.
>>>
>>> -n
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>> *Pronouns: he/him/his **(why is my pronoun here?)*
>> 
>>


Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Guido van Rossum
The general solution is

import typing
if typing.TYPE_CHECKING:
    ...  # imports needed only for type checking


The hack of starting with

TYPE_CHECKING = False

happens to work but is not endorsed by PEP 484, so it is not guaranteed to
keep working in the future.

Note that 3rd party code is rarely in such a critical part of script
startup that the cost of `import typing` is too much. But the stdlib often
*is* in the critical path for script startup, and some consider the time
spent in that import too much (startup time should be on the order of tens
of msec, so every msec counts -- but once you start importing 3rd party code
you basically can't make it that fast regardless).

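For illustration, a hedged way to see what that import actually costs on a
given machine (numbers vary widely by machine and Python version): run it in
a fresh interpreter with -X importtime and read the per-module report.

import subprocess
import sys

cmd = [sys.executable, "-X", "importtime", "-c", "import typing"]
print(subprocess.run(cmd, capture_output=True, text=True).stderr)
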
Anyway, the stdlib should almost never be used as an example for non-stdlib
code -- there are many reasons for this that I don't want to have to repeat
here.

On Tue, Apr 23, 2019 at 12:33 PM Ilya Kamenshchikov <
ikamenshchi...@gmail.com> wrote:

> How would we answer the same question if it was not a part of stdlib?
> I am not sure it is fair to expect of Pycharm to parse / execute the
> __getattr__ on modules, as more elaborate implementation could even contain
> different types per some condition at the runtime or anything at all.
> The code:
>
> TYPE_CHECKING = False
> if TYPE_CHECKING:
> from .process import ProcessPoolExecutor
> from .thread import ThreadPoolExecutor
>
> works for type checking in PyCharm and is fast.
>
> This is how stdlib can be an example to how side libraries can be 
> implemented. If we can agree that this is the only clear, performant and 
> sufficient code - then perhaps modifying mypy is a reasonable price to pay.
>
> Perhaps this particular case can be just patched locally by PyCharm
> /JetBrains, but what is a general solution to this class of problems?
>
> Best Regards,
> --
> Ilya Kamenshchikov
>
>
> On Tue, Apr 23, 2019 at 7:05 PM Guido van Rossum  wrote:
>
>> In any case I think this should be filed (by the OP) as an issue against
>> JetBrains' PyCharm issue tracker. Who knows they may be able to
>> special-case this in a jiffy. I don't think we should add any clever hacks
>> to the stdlib for this.
>>
>> On Tue, Apr 23, 2019 at 9:59 AM Nathaniel Smith  wrote:
>>
>>> On Tue, Apr 23, 2019, 05:09 Andrew Svetlov 
>>> wrote:
>>>
 I agree that `from typing import TYPE_CHECKING` is not desirable from
 the import time reduction perspective.

 From my understanding code completion *can* be based on type hinting
 to avoid actual code execution.
 That's why I've mentioned that typeshed already has the correct type
 information.

 if TYPE_CHECKING:
 import ...

 requires mypy modification.

 if False:
 import ...

 Works right now for stdlib (mypy ignores stdlib code but uses typeshed
 anyway) but looks a little cryptic.
 Requires a comprehensive comment at least.

>>>
>>> Last time I looked at this, I'm pretty sure `if False` broke at least
>>> one popular static analysis tool (ie it was clever enough to ignore
>>> everything inside `if False`) – I think either pylint or jedi?
>>>
>>> I'd suggest checking any clever hacks against at least: mypy,
>>> pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
>>> analysis engines, and each one has its own idiosyncratic quirks.
>>>
>>> We've struggled with this a *lot* in trio, and eventually ended up
>>> giving up on all forms of dynamic export cleverness; we've even banned the
>>> use of __all__ entirely. Static analysis has gotten good enough that users
>>> won't accept it not working, but it hasn't gotten good enough to handle
>>> anything but the simplest static exports in a reliable way:
>>> https://github.com/python-trio/trio/pull/316
>>> https://github.com/python-trio/trio/issues/542
>>>
>>> The stdlib has more leeway because when tools don't work on the stdlib
>>> then they tend to eventually add workarounds. I'm just saying, think twice
>>> before diving into clever hacks to workaround static analysis limits, and
>>> if you're going to do it then be careful to be thorough. You're basically
>>> relying on undocumented bugs, and it gets really messy really quickly.
>>>
>>> -n
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>> *Pronouns: he/him/his **(why is my pronoun here?)*
>> 
>>
>

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*


Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Ilya Kamenshchikov
How would we answer the same question if it were not part of the stdlib?
I am not sure it is fair to expect PyCharm to parse / execute
__getattr__ on modules, as a more elaborate implementation could even contain
different types depending on some runtime condition, or anything at all.
The code:

TYPE_CHECKING = False
if TYPE_CHECKING:
    from .process import ProcessPoolExecutor
    from .thread import ThreadPoolExecutor

works for type checking in PyCharm and is fast.

This is how the stdlib can be an example of how third-party libraries can be
implemented. If we can agree that this is the only clear, performant
and sufficient code, then perhaps modifying mypy is a reasonable
price to pay.

Perhaps this particular case can just be patched locally by PyCharm
/JetBrains, but what is a general solution to this class of problems?

Best Regards,
--
Ilya Kamenshchikov


On Tue, Apr 23, 2019 at 7:05 PM Guido van Rossum  wrote:

> In any case I think this should be filed (by the OP) as an issue against
> JetBrains' PyCharm issue tracker. Who knows they may be able to
> special-case this in a jiffy. I don't think we should add any clever hacks
> to the stdlib for this.
>
> On Tue, Apr 23, 2019 at 9:59 AM Nathaniel Smith  wrote:
>
>> On Tue, Apr 23, 2019, 05:09 Andrew Svetlov 
>> wrote:
>>
>>> I agree that `from typing import TYPE_CHECKING` is not desirable from
>>> the import time reduction perspective.
>>>
>>> From my understanding code completion *can* be based on type hinting
>>> to avoid actual code execution.
>>> That's why I've mentioned that typeshed already has the correct type
>>> information.
>>>
>>> if TYPE_CHECKING:
>>> import ...
>>>
>>> requires mypy modification.
>>>
>>> if False:
>>> import ...
>>>
>>> Works right now for stdlib (mypy ignores stdlib code but uses typeshed
>>> anyway) but looks a little cryptic.
>>> Requires a comprehensive comment at least.
>>>
>>
>> Last time I looked at this, I'm pretty sure `if False` broke at least one
>> popular static analysis tool (ie it was clever enough to ignore everything
>> inside `if False`) – I think either pylint or jedi?
>>
>> I'd suggest checking any clever hacks against at least: mypy,
>> pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
>> analysis engines, and each one has its own idiosyncratic quirks.
>>
>> We've struggled with this a *lot* in trio, and eventually ended up giving
>> up on all forms of dynamic export cleverness; we've even banned the use of
>> __all__ entirely. Static analysis has gotten good enough that users won't
>> accept it not working, but it hasn't gotten good enough to handle anything
>> but the simplest static exports in a reliable way:
>> https://github.com/python-trio/trio/pull/316
>> https://github.com/python-trio/trio/issues/542
>>
>> The stdlib has more leeway because when tools don't work on the stdlib
>> then they tend to eventually add workarounds. I'm just saying, think twice
>> before diving into clever hacks to workaround static analysis limits, and
>> if you're going to do it then be careful to be thorough. You're basically
>> relying on undocumented bugs, and it gets really messy really quickly.
>>
>> -n
>
>
> --
> --Guido van Rossum (python.org/~guido)
> *Pronouns: he/him/his **(why is my pronoun here?)*
> 
>


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Serhiy Storchaka

12.04.19 19:17, Inada Naoki wrote:

Maybe, collections.DictBuilder may be another option.  e.g.


from collections import DictBuilder
builder = DictBuilder(tuple("abc"))
builder.build(range(3))

{"a": 0, "b": 1, "c": 2}


Nitpicking: this is a factory rather than a builder. The difference
between the patterns is that you create a new builder object for every dict:


builder = DictBuilder()
builder['a'] = 0
builder['b'] = 1
builder['c'] = 2
result = builder.build()

but create a factory only once for the whole class of dicts:

factory = DictFactory(tuple("abc"))  # only once
...
result = factory(range(3))

I like the idea of a factory object more than the idea of the dict method.
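
For illustration, a minimal pure-Python sketch of the factory idea (a
hypothetical class written only for this example, not the C implementation
under discussion): the keys are captured once, and each call builds a new
dict from a sequence of values.

class DictFactory:
    def __init__(self, keys):
        self._keys = tuple(keys)

    def __call__(self, values):
        values = tuple(values)
        if len(values) != len(self._keys):
            raise ValueError("values length does not match keys length")
        return dict(zip(self._keys, values))

factory = DictFactory(tuple("abc"))   # only once
print(factory(range(3)))              # {'a': 0, 'b': 1, 'c': 2}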



Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Guido van Rossum
In any case I think this should be filed (by the OP) as an issue against
JetBrains' PyCharm issue tracker. Who knows they may be able to
special-case this in a jiffy. I don't think we should add any clever hacks
to the stdlib for this.

On Tue, Apr 23, 2019 at 9:59 AM Nathaniel Smith  wrote:

> On Tue, Apr 23, 2019, 05:09 Andrew Svetlov 
> wrote:
>
>> I agree that `from typing import TYPE_CHECKING` is not desirable from
>> the import time reduction perspective.
>>
>> From my understanding code completion *can* be based on type hinting
>> to avoid actual code execution.
>> That's why I've mentioned that typeshed already has the correct type
>> information.
>>
>> if TYPE_CHECKING:
>> import ...
>>
>> requires mypy modification.
>>
>> if False:
>> import ...
>>
>> Works right now for stdlib (mypy ignores stdlib code but uses typeshed
>> anyway) but looks a little cryptic.
>> Requires a comprehensive comment at least.
>>
>
> Last time I looked at this, I'm pretty sure `if False` broke at least one
> popular static analysis tool (ie it was clever enough to ignore everything
> inside `if False`) – I think either pylint or jedi?
>
> I'd suggest checking any clever hacks against at least: mypy,
> pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
> analysis engines, and each one has its own idiosyncratic quirks.
>
> We've struggled with this a *lot* in trio, and eventually ended up giving
> up on all forms of dynamic export cleverness; we've even banned the use of
> __all__ entirely. Static analysis has gotten good enough that users won't
> accept it not working, but it hasn't gotten good enough to handle anything
> but the simplest static exports in a reliable way:
> https://github.com/python-trio/trio/pull/316
> https://github.com/python-trio/trio/issues/542
>
> The stdlib has more leeway because when tools don't work on the stdlib
> then they tend to eventually add workarounds. I'm just saying, think twice
> before diving into clever hacks to workaround static analysis limits, and
> if you're going to do it then be careful to be thorough. You're basically
> relying on undocumented bugs, and it gets really messy really quickly.
>
> -n


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*



Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Nathaniel Smith
On Tue, Apr 23, 2019, 05:09 Andrew Svetlov  wrote:

> I agree that `from typing import TYPE_CHECKING` is not desirable from
> the import time reduction perspective.
>
> From my understanding code completion *can* be based on type hinting
> to avoid actual code execution.
> That's why I've mentioned that typeshed already has the correct type
> information.
>
> if TYPE_CHECKING:
> import ...
>
> requires mypy modification.
>
> if False:
> import ...
>
> Works right now for stdlib (mypy ignores stdlib code but uses typeshed
> anyway) but looks a little cryptic.
> Requires a comprehensive comment at least.
>

Last time I looked at this, I'm pretty sure `if False` broke at least one
popular static analysis tool (ie it was clever enough to ignore everything
inside `if False`) – I think either pylint or jedi?

I'd suggest checking any clever hacks against at least: mypy,
pylint/astroid, jedi, pyflakes, and pycharm. They all have their own static
analysis engines, and each one has its own idiosyncratic quirks.

We've struggled with this a *lot* in trio, and eventually ended up giving
up on all forms of dynamic export cleverness; we've even banned the use of
__all__ entirely. Static analysis has gotten good enough that users won't
accept it not working, but it hasn't gotten good enough to handle anything
but the simplest static exports in a reliable way:
https://github.com/python-trio/trio/pull/316
https://github.com/python-trio/trio/issues/542

The stdlib has more leeway because when tools don't work on the stdlib then
they tend to eventually add workarounds. I'm just saying, think twice
before diving into clever hacks to work around static analysis limits, and
if you're going to do it then be careful to be thorough. You're basically
relying on undocumented bugs, and it gets really messy really quickly.

-n


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Inada Naoki
On Wed, Apr 24, 2019 at 12:34 AM Steve Dower  wrote:
>
> >> If it's a key-sharing dict, then all the keys are strings. We know that
> >> when we go to do the update, so we can intern all the strings (going to
> >> do that anyway) and then it's a quick check if it already exists. If
> >> it's a regular dict, then we calculate hashes as normal. Updating the
> >> value is just a decref, incref and assignment.
> >
> > There are some problem.
> >
> > 1. Searching hash table is not zero-cost, comparing to appending to 
> > sequence.
> > This cost is very near to building new hash tables.
>
> If we know that you're sharing keys with the new items then we can skip
> the search. This was my point about the d2 = copy(d1); d2.update(zip(d2,
> values)) idea:
>

OK, I got it.
But note that a zip object doesn't expose its items, at either the Python level or the C level.


> > 2. In my proposal, main user is csv.DictReader or sql.DictCursor.
> > They parse only values on each rows.   So they need to use map.
>
> In that case, use a private helper. _csv already has a native module. We
> don't need to add new public APIs for internal optimisations, provided
> there is a semantically equivalent way to do it without the internal API.

csv is in the stdlib, but there are some third-party extensions similar to csv.

>
> > 3. (CPython only) dict.copy(), dict(dict), and dict.update() are general 
> > purpose
> > methods.  There is no obvious place to start using key-sharing dict.
>
> See my reply to Glenn, but potentially fromkeys() could start with the
> key-sharing dict and then copy()/dict() could continue sharing it
> (hopefully they already do?).

Key-sharing dicts are used only for instance dicts at the moment.

The 2nd argument of dict.fromkeys() is a single value, not values.
How about adding dict.fromkeyvalues(keys, values)?
When keys is a dict, its behavior is the same as my first proposal
(`dict.with_values(d1, values)`).

> >
> > If *CPython* specialized dict(zip(dict, values)),  it still be CPython
> > implementation detail.
> > Do you want recommend using such CPython hacky optimization?
> > Should we use such optimization in stdlib, even if it will be slower
> > than dict(zip(keys_tuple, values)) on some other Python implementations?
>
> We do "hacky" optimisations everywhere :) The point of the runtime is to
> let users write code that works and we do the effort behind the scenes
> to make it efficient. We're not C - we're here to help our users.

But we avoid, as much as possible, CPython-only hacks which would make the
stdlib slower on other Python implementations.
For example, CPython optimizes the `s1 += s` loop, but we still use
`''.join(list_of_str)` instead of it.

>
> The point is that it will work on other implementations - including
> previous versions of CPython - and those are free to optimise it however
> they like.
>
> > Or do you propose making dict(zip(dict, values)) optimization as
> > language specification?
>
> Definitely not! It's just a pattern that we have the ability to
> recognize and optimize at runtime, so why not do it?

Why do we need to recommend patterns that are fast only in CPython?

  d2 = dict.fromkeys(keys_dict)   # make key sharing dict, only in CPython 3.8+
  d2.update(zip(d2, row))  # update values without key lookup, only in
CPython 3.8+

Obviously, this may be much slower than `d2 = dict(zip(keys_tuple, row))` on
current CPython and other Python implementations.

Note that this pattern will be used when dict creation is the bottleneck.

If we had a specialized API, libraries could use it when the API is available,
and use dict(zip(keys, row)) otherwise, as in the sketch below.

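For illustration, a hedged sketch of that fallback (with_values() is the
proposed, hypothetical method; the helper name is made up for the example):

def make_row_dict(keys_dict, keys_tuple, row):
    if hasattr(dict, "with_values"):          # proposed API, not in any released Python
        return keys_dict.with_values(row)
    return dict(zip(keys_tuple, row))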

>
> > One obvious advantage of having DictBuilder is it is for specific
> > purpose.  It has at least same performance to dict(zip(keys, values))
> > on all Python implementations.
> > Libraries like csv parser can use it without worrying about its performance
> > on Python other than CPython.
>
> A singular purpose isn't necessarily an obvious advantage. We're better
> off with generic building blocks that our users can compose in ways that
> were originally non-obvious (and then as patterns emerge we can look at
> ways to simplify or formalise them).

With generic building blocks, we cannot know whether the user will create
massive numbers of dicts with the same keys or just one copy.  We need to
guess, and the guess may be wrong.

Regards,

-- 
Inada Naoki  


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Inada Naoki
On Wed, Apr 24, 2019 at 12:28 AM Steve Dower  wrote:
>
> >
> > But if the original dictionary wasn't created with shared keys... the
> > copy can't share them either.  Or are you suggesting adding new code to
> > create a shared key dictionary from one that isn't?
>
> This is a good point. Maybe dict.fromkeys() could do it? Or a
> sys.intern-like function (which is why I brought up that precedent). The
> point is to make it an optional benefit rather than strict
> language/library semantics.
>

Then why not support values when creating a key-sharing dict?
That's one form of my proposal :)

-- 
Inada Naoki  


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Steve Dower

On 23Apr2019 0008, Inada Naoki wrote:

On Tue, Apr 23, 2019 at 2:54 PM Steve Dower  wrote:


On 22Apr2019 2143, Inada Naoki wrote:

On Tue, Apr 23, 2019 at 11:30 AM Steve Dower  wrote:


Or possibly just "dict(existing_dict).update(new_items)".



Do you mean .update accepts values tuple?
I can't think it's


Not sure what you were going to go on to say here, but why not?


Sorry, I sent mail without finishing.

dict.update() has too many overloading.
Adding values_tuple is impossible without breaking backward compatibility.

But I think you're saying about items_sequence, not values.


Right. I'm specifically trying to avoid changing public APIs at all 
(including adding anything new, if possible) by identifying suitable 
patterns that we can handle specially to provide a transparent speed 
improvement.



If it's a key-sharing dict, then all the keys are strings. We know that
when we go to do the update, so we can intern all the strings (going to
do that anyway) and then it's a quick check if it already exists. If
it's a regular dict, then we calculate hashes as normal. Updating the
value is just a decref, incref and assignment.


There are some problem.

1. Searching hash table is not zero-cost, comparing to appending to sequence.
This cost is very near to building new hash tables.


If we know that you're sharing keys with the new items then we can skip 
the search. This was my point about the d2 = copy(d1); d2.update(zip(d2, 
values)) idea:


def update(self, items):
    if isinstance(items, ZipObject):  # whatever the type is called
        if are_sharing_keys(self, items.sequence_1):
            # fast update from iter(items.sequence_2)
            return
    # regular update from iter(items)

Totally transparent and encourages composition of existing builtins. 
It's a bit of a trick and may not be as obvious as a new method, but 
it's backwards compatible at least as far as ordered dicts (which is a 
requirement of any of these approaches anyway, yes?)



2. In my proposal, main user is csv.DictReader or sql.DictCursor.
They parse only values on each rows.   So they need to use map.


In that case, use a private helper. _csv already has a native module. We 
don't need to add new public APIs for internal optimisations, provided 
there is a semantically equivalent way to do it without the internal API.



3. (CPython only) dict.copy(), dict(dict), and dict.update() are general purpose
methods.  There is no obvious place to start using key-sharing dict.


See my reply to Glenn, but potentially fromkeys() could start with the 
key-sharing dict and then copy()/dict() could continue sharing it 
(hopefully they already do?).



That's why I proposed specific method / function for specific purpose.



Note that .update() would probably require a dict or key/value tuples
here - but if you have the keys in a tuple already then zip() is going
to be good enough for setting it (in fact, zip(existing_dict,
new_values) should be fine, and we can internally special-case that
scenario, too).


If *CPython* specialized dict(zip(dict, values)),  it still be CPython
implementation detail.
Do you want recommend using such CPython hacky optimization?
Should we use such optimization in stdlib, even if it will be slower
than dict(zip(keys_tuple, values)) on some other Python implementations?


We do "hacky" optimisations everywhere :) The point of the runtime is to 
let users write code that works and we do the effort behind the scenes 
to make it efficient. We're not C - we're here to help our users.


The point is that it will work on other implementations - including 
previous versions of CPython - and those are free to optimise it however 
they like.



Or do you propose making dict(zip(dict, values)) optimization as
language specification?


Definitely not! It's just a pattern that we have the ability to 
recognize and optimize at runtime, so why not do it?



One obvious advantage of having DictBuilder is it is for specific
purpose.  It has at least same performance to dict(zip(keys, values))
on all Python implementations.
Libraries like csv parser can use it without worrying about its performance
on Python other than CPython.


A singular purpose isn't necessarily an obvious advantage. We're better 
off with generic building blocks that our users can compose in ways that 
were originally non-obvious (and then as patterns emerge we can look at 
ways to simplify or formalise them).



(Randomizing side note: is this scenario enough to make a case for a
built-in data frame type?)


https://xkcd.com/927/


Yep. The difference is that as the language team, our standard wins by 
default ;)


(For those who don't click links, it's pointing at the "let's make a new 
standard" XKCD comic)



* when you only d2.update existing keys, no need to rebuild the table
* a duplicated key overwrites multiple times - what else are you going
to do?


But all keys should be looked up.  It is very similar overhead to 

Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Steve Dower

On 23Apr2019 0034, Glenn Linderman wrote:

On 4/22/2019 10:59 PM, Steve Dower wrote:

On 22Apr2019 2119, Glenn Linderman wrote:
While Inada's suggested DictBuilder interface was immediately 
obvious, I don't get how either copy or update would achieve the 
goal. Perhaps you could explain? Particularly, what would be the 
trigger that would make dict() choose to create a shared key 
dictionary from the start? Unless it is known that there will be lots 
of (mostly static) dictionaries with the same set of keys at the time 
of creation of the first one, creating a shared key dictionary in 
every case would cause later inefficiencies in converting them, when 
additional items are added? (I'm assuming without knowledge that a 
single shared key dictionary is less efficient than a single regular 
dictionary.)


Passing a dictionary to the dict() constructor creates a copy of that 
dictionary (as does copy.copy()). What other trigger do you need to 
decide "it contains the same keys"? It's a copy of the original dict, 
so by definition at that point it may as well share its entire 
contents with the original.


But if the original dictionary wasn't created with shared keys... the 
copy can't share them either.  Or are you suggesting adding new code to 
create a shared key dictionary from one that isn't?


This is a good point. Maybe dict.fromkeys() could do it? Or a 
sys.intern-like function (which is why I brought up that precedent). The 
point is to make it an optional benefit rather than strict 
language/library semantics.


Is there a cost to using a key sharing dict that is prohibitive when the 
keys aren't actually being shared?


Cheers,
Steve


Re: [Python-Dev] Concurrent.futures: no type discovery for PyCharm

2019-04-23 Thread Andrew Svetlov
I agree that `from typing import TYPE_CHECKING` is not desirable from
the import time reduction perspective.

From my understanding, code completion *can* be based on type hinting
to avoid actual code execution.
That's why I've mentioned that typeshed already has the correct type
information.

if TYPE_CHECKING:
    import ...

requires mypy modification.

if False:
    import ...

Works right now for stdlib (mypy ignores stdlib code but uses typeshed
anyway) but looks a little cryptic.
Requires a comprehensive comment at least.

On Tue, Apr 23, 2019 at 1:59 AM Inada Naoki  wrote:
>
> On Tue, Apr 23, 2019 at 4:40 AM Brett Cannon  wrote:
> >
> > On Sat, Apr 20, 2019 at 2:10 PM Inada Naoki  wrote:
> >>
> >> "import typing" is slow too.
> >
> > But is it so slow as to not do the right thing here and use the 'typing' 
> > module as expected?
>
> I don't know whether it is the "right thing" yet.  It feels like just a
> workaround for PyCharm at the moment.
>
> __dir__ and __all__ has ProcessPoolExecutor and ThreadPoolExecutor for
> interactive shell.  So Python REPL can complete them.  But we didn't discussed
> about "static hinting" version of __all__ in PEP 562.
>
> If we decide it's a "right way", we can update example code in PEP 562.
>
> But when we use lazy import, we want to make import faster.
> Adding more 3~5ms import time seems not so happy solution.
>
> Maybe, can we add TYPE_CHECKING=False in builtins?
>
>
> > If you have so much work you need to launch some threads or processes to 
> > deal with it then a single import isn't going to be your biggest bottleneck.
>
> Importing futures module doesn't mean the app really need
> thread or processes.  That's why we defer importing ThreadPoolExecutor
> and ProcessPoolExecutor.
>
> And for people who want apps like vim to start quickly (~200ms), we want to avoid
> every "significant overhead" as much as possible.  Not only "the biggest
> bottleneck" is a problem.
>
> --
> Inada Naoki  



-- 
Thanks,
Andrew Svetlov


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Glenn Linderman

On 4/22/2019 10:59 PM, Steve Dower wrote:

On 22Apr2019 2119, Glenn Linderman wrote:
While Inada's suggested DictBuilder interface was immediately 
obvious, I don't get how either copy or update would achieve the 
goal. Perhaps you could explain? Particularly, what would be the 
trigger that would make dict() choose to create a shared key 
dictionary from the start? Unless it is known that there will be lots 
of (mostly static) dictionaries with the same set of keys at the time 
of creation of the first one, creating a shared key dictionary in 
every case would cause later inefficiencies in converting them, when 
additional items are added? (I'm assuming without knowledge that a 
single shared key dictionary is less efficient than a single regular 
dictionary.)


Passing a dictionary to the dict() constructor creates a copy of that 
dictionary (as does copy.copy()). What other trigger do you need to 
decide "it contains the same keys"? It's a copy of the original dict, 
so by definition at that point it may as well share its entire 
contents with the original.


But if the original dictionary wasn't created with shared keys... the 
copy can't share them either.  Or are you suggesting adding new code to 
create a shared key dictionary from one that isn't?


Basically this is just a partial copy-on-write, where we copy values 
eagerly - since they're almost certainly going to change - and keys 
lazily - since there are known scenarios where they are not going to 
be changed, but we'll pay the cost later if it turns out they are.


Cheers,
Steve





Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Inada Naoki
On Tue, Apr 23, 2019 at 2:54 PM Steve Dower  wrote:
>
> On 22Apr2019 2143, Inada Naoki wrote:
> > On Tue, Apr 23, 2019 at 11:30 AM Steve Dower  wrote:
> >>
> >> Or possibly just "dict(existing_dict).update(new_items)".
> >>
> >
> > Do you mean .update accepts values tuple?
> > I can't think it's
>
> Not sure what you were going to go on to say here, but why not?

Sorry, I sent the mail without finishing it.

dict.update() already has too many overloads.
Adding a values_tuple form is impossible without breaking backward compatibility.

But I think you're talking about a sequence of items, not just values.

>
> If it's a key-sharing dict, then all the keys are strings. We know that
> when we go to do the update, so we can intern all the strings (going to
> do that anyway) and then it's a quick check if it already exists. If
> it's a regular dict, then we calculate hashes as normal. Updating the
> value is just a decref, incref and assignment.

There are some problems.

1. Searching a hash table is not zero-cost, compared to appending to a sequence.
   This cost is very close to that of building a new hash table.

2. In my proposal, the main users are csv.DictReader and sql.DictCursor.
   They parse only the values on each row, so they need a way to map those
   values onto the keys.

3. (CPython only) dict.copy(), dict(dict), and dict.update() are general-purpose
   methods.  There is no obvious place to start using a key-sharing dict.

That's why I proposed a specific method / function for a specific purpose.
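
For concreteness, a DictReader-style consumer of the proposed API would look
roughly like this (a sketch: with_values() is the proposal under discussion,
not an existing method, and the reader details are simplified):

    # Sketch only: dict.with_values() is the *proposed* method.
    import csv

    def dict_rows(csvfile):
        reader = csv.reader(csvfile)
        template = dict.fromkeys(next(reader))   # header row becomes the key set
        for row in reader:
            # Reuse the template's key table and attach this row's values,
            # instead of hashing and inserting every key again per row.
            yield template.with_values(row)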

>
> Note that .update() would probably require a dict or key/value tuples
> here - but if you have the keys in a tuple already then zip() is going
> to be good enough for setting it (in fact, zip(existing_dict,
> new_values) should be fine, and we can internally special-case that
> scenario, too).

If *CPython* specialized dict(zip(dict, values)), it would still be a CPython
implementation detail.
Do you want to recommend relying on such a hacky CPython-only optimization?
Should we use such an optimization in the stdlib, even if it will be slower
than dict(zip(keys_tuple, values)) on some other Python implementations?

Or do you propose making the dict(zip(dict, values)) optimization part of the
language specification?

One obvious advantage of having DictBuilder is that it exists for a specific
purpose.  It performs at least as well as dict(zip(keys, values)) on all
Python implementations.
Libraries like the csv parser can use it without worrying about its performance
on Pythons other than CPython.


> I'd assumed the benefit was in memory usage after
> construction, rather than speed-to-construct, since everyone keeps
> talking about "key-sharing dictionaries" and not "arrays" ;)

Both are important.  I was talking about the non-key-sharing dict as well.

> (Randomizing side note: is this scenario enough to make a case for a
> built-in data frame type?)

https://xkcd.com/927/


> >> My primary concern is still to avoid making CPython performance
> >> characteristics part of the Python language definition. That only makes
> >> it harder for alternate implementations.
> >
> > Note that this proposal is not only for key sharing dict:
> >
> > * We can avoid rebuilding hash table again and again.
> > * We can avoid checking duplicated keys again and again.
> >
> > These characteristics are not only for Python, but for all mapping
> > implementations using hash table.
>
> I believe all of these are met by making d2=dict(d1) construct a dict d2
> that shares keys with d1 by default. Can you show how they are not?

If you only want a copy, it's the same.

>
> * when you only d2.update existing keys, no need to rebuild the table
> * a duplicated key overwrites multiple times - what else are you going
> to do?

But all the keys still have to be looked up.  That overhead is very close to
rebuilding the hash table.

> This is already easiest, fastest, uses the least memory and is
> most consistent with every other form of setting dict items. Why
> complicate things by checking them? Let the caller do it

As I wrote above:

* it is slower than my proposal.
* there is no obvious place to start using a key-sharing dict.


-- 
Inada Naoki  


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Steve Dower

On 22Apr2019 2119, Glenn Linderman wrote:
While Inada's suggested DictBuilder interface was immediately obvious, I 
don't get how either copy or update would achieve the goal. Perhaps you 
could explain? Particularly, what would be the trigger that would make 
dict() choose to create a shared key dictionary from the start? Unless 
it is known that there will be lots of (mostly static) dictionaries with 
the same set of keys at the time of creation of the first one, creating 
a shared key dictionary in every case would cause later inefficiencies 
in converting them, when additional items are added? (I'm assuming 
without knowledge that a single shared key dictionary is less efficient 
than a single regular dictionary.)


Passing a dictionary to the dict() constructor creates a copy of that 
dictionary (as does copy.copy()). What other trigger do you need to 
decide "it contains the same keys"? It's a copy of the original dict, so 
by definition at that point it may as well share its entire contents 
with the original.


Basically this is just a partial copy-on-write, where we copy values 
eagerly - since they're almost certainly going to change - and keys 
lazily - since there are known scenarios where they are not going to be 
changed, but we'll pay the cost later if it turns out they are.


Cheers,
Steve


Re: [Python-Dev] Proposal: dict.with_values(iterable)

2019-04-23 Thread Steve Dower

On 22Apr2019 2143, Inada Naoki wrote:

On Tue, Apr 23, 2019 at 11:30 AM Steve Dower  wrote:


Or possibly just "dict(existing_dict).update(new_items)".



Do you mean .update accepts values tuple?
I can't think it's


Not sure what you were going to go on to say here, but why not?

If it's a key-sharing dict, then all the keys are strings. We know that 
when we go to do the update, so we can intern all the strings (going to 
do that anyway) and then it's a quick check if it already exists. If 
it's a regular dict, then we calculate hashes as normal. Updating the 
value is just a decref, incref and assignment.


If not all these conditions are met, we convert to a regular dict. The 
proposed function was going to raise an error in this case, so all we've 
done is make it transparent. The biggest downside is now you don't get a 
warning that your preferred optimization isn't actually working when you 
pass in new_items with different keys from what were in existing_dict.


Note that .update() would probably require a dict or key/value tuples 
here - but if you have the keys in a tuple already then zip() is going 
to be good enough for setting it (in fact, zip(existing_dict, 
new_values) should be fine, and we can internally special-case that 
scenario, too). I'd assumed the benefit was in memory usage after 
construction, rather than speed-to-construct, since everyone keeps 
talking about "key-sharing dictionaries" and not "arrays" ;)
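
(Spelled out with throwaway example values, the construction meant here is:)

    existing_dict = {"id": 1, "name": "alice"}
    new_values = (2, "bob")
    # Keys come from the existing dict (its iteration order), values from
    # the new row; CPython could special-case this construction to reuse
    # the key table, though it does not do so today.
    new_row = dict(zip(existing_dict, new_values))   # {'id': 2, 'name': 'bob'}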


(Randomizing side note: is this scenario enough to make a case for a 
built-in data frame type?)



My primary concern is still to avoid making CPython performance
characteristics part of the Python language definition. That only makes
it harder for alternate implementations.


Note that this proposal is not only for key sharing dict:

* We can avoid rebuilding hash table again and again.
* We can avoid checking duplicated keys again and again.

These characteristics are not only for Python, but for all mapping
implementations using hash table.


I believe all of these are met by making d2=dict(d1) construct a dict d2 
that shares keys with d1 by default. Can you show how they are not?


* when you only d2.update existing keys, no need to rebuild the table
* a duplicated key overwrites multiple times - what else are you going 
to do? This is already easiest, fastest, uses the least memory and is 
most consistent with every other form of setting dict items. Why 
complicate things by checking them? Let the caller do it
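
(For concreteness, a sketch of the behaviour being argued for -- the sharing
itself would stay an invisible implementation detail:)

    d1 = {"id": 1, "name": "alice"}
    d2 = dict(d1)                  # proposed: d2 silently reuses d1's key table
    d2.update(id=2, name="bob")    # updating existing keys keeps the shared table
    d2["email"] = "b@example.com"  # a brand-new key falls back to a private table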


Cheers,
Steve