[Python-Dev] PEP 411: Provisional packages in the Python standard library

2012-02-10 Thread Jim J. Jewett

Eli Bendersky wrote (in
http://mail.python.org/pipermail/python-dev/2012-February/116393.html ):

 A package will be marked provisional by including the 
 following paragraph as a note at the top of its
 documentation page:

I really would like some marker available from within Python 
itself.  

Use cases:

(1)  During development, the documentation I normally read 
first is whatever results from import module; help(module),
or possibly dir(module).

(2)  At BigCorp, there were scheduled times to move as much
as possible to the current (or current-1) version.  
Regardless of policy, full regression test suites don't 
generally exist.  If Python were viewed as part of the 
infrastructure (rather than as part of a specific 
application), or if I were responsible for maintaining an
internal application built on python, that would be the time 
to upgrade python -- and I would want an easy way to figure 
out which applications and libraries I should concentrate on 
for testing.
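
For instance (purely hypothetical -- no such attribute exists today), if
provisional modules set a module-level flag such as __provisional__ = True,
both use cases could be served from within Python itself:

    import sys

    def provisional_modules():
        """List currently imported modules that declare themselves provisional."""
        # __provisional__ is a made-up marker name for this sketch.
        return sorted(name for name, mod in sys.modules.items()
                      if getattr(mod, '__provisional__', False))

    print(provisional_modules())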

 * Encapsulation of the import state (PEP 368)

Wrong PEP number.  I'm guessing that you meant 406.

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review

2012-02-15 Thread Jim J. Jewett


PEP author Victor asked
(in http://mail.python.org/pipermail/python-dev/2012-February/116499.html):

 Maybe I missed the answer, but how do you handle timestamp with an
 unspecified starting point like os.times() or time.clock()? Should we
 leave these function unchanged?

If *all* you know is that it is monotonic, then you can't -- but then
you don't really have resolution either, as the clock may well speed up
or slow down.

If you do have resolution, and the only problem is that you don't know
what the epoch was, then you can figure that out well enough by (once
per type per process) comparing it to something that does have an epoch,
like time.gmtime().
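
Roughly like this sketch (using time.monotonic(), where available, as the
example epoch-less clock, and time.time() rather than time.gmtime() for
brevity; it only makes sense for clocks that advance in real time, not for
CPU-time clocks):

    import time

    def epoch_offset(clock):
        """Estimate the fixed offset between an epoch-less clock and the Unix epoch."""
        # The error is bounded by the gap between the two calls, which is
        # normally much smaller than the precision being argued about.
        return time.time() - clock()

    # Once per clock type per process:
    offset = epoch_offset(time.monotonic)
    print(time.monotonic() + offset)   # now roughly comparable to time.time()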


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP for new dictionary implementation

2012-02-16 Thread Jim J. Jewett


PEP author Mark Shannon wrote
(in 
http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):

 ... allows ... (the ``__dict__`` attribute of an object) to share
 keys with other attribute dictionaries of instances of the same class.

Is "the same class" a deliberate restriction, or just a convenience
of implementation?  I have often created subclasses (or even families
of subclasses) where instances (as opposed to the type) aren't likely
to have additional attributes.  These would benefit from key-sharing
across classes, but I grant that it is a minority use case that isn't
worth optimizing if it complicates the implementation.

 By separating the keys (and hashes) from the values it is possible
 to share the keys between multiple dictionaries and improve memory use.

Have you timed not storing the hash (in the dict) at all, at least for
(unicode) str-only dicts?  Going to the string for its own cached hash
breaks locality a bit more, but saves 1/3 of the memory for combined
tables, and may make a big difference for classes that have relatively
few instances.

 Reduction in memory use is directly related to the number of dictionaries
 with shared keys in existence at any time. These dictionaries are typically
 half the size of the current dictionary implementation.

How do you measure that?  The limit for huge N across huge numbers
of dicts should be 1/3 (because both hashes and keys are shared); I
assume that gets swamped by object overhead in typical small dicts.

 If a table is split the values in the keys table are ignored,
 instead the values are held in a separate array.

If they're just dead weight, then why not use them to hold indices
into the array, so that values arrays only have to be as long as
the number of keys, rather than rounding them up to a large-enough
power-of-two?  (On average, this should save half the slots.)
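
As a toy illustration of that layout (a model only, not the proposed C
implementation): a shared keys table maps each name to a slot index, and
each instance carries a values array only as long as the keys it actually
uses:

    class SharedKeys:
        def __init__(self):
            self.index_of = {}              # name -> slot index, shared by all instances

        def slot(self, name):
            return self.index_of.setdefault(name, len(self.index_of))

    class SplitDict:
        def __init__(self, shared):
            self.shared = shared
            self.values = []                # grows only as far as the slots this instance uses

        def __setitem__(self, name, value):
            i = self.shared.slot(name)
            if i >= len(self.values):
                self.values.extend([None] * (i + 1 - len(self.values)))
            self.values[i] = value

        def __getitem__(self, name):
            return self.values[self.shared.index_of[name]]

    keys = SharedKeys()
    a, b = SplitDict(keys), SplitDict(keys)
    a['x'] = 1; b['x'] = 2                  # one keys table, two small values arrays
    print(a['x'], b['x'])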

 A combined-table dictionary never becomes a split-table dictionary.

I thought it did (at least temporarily) as part of resizing; are you
saying that it will be re-split by the time another thread is
allowed to see it, so that it is never observed as combined?



Given that this optimization is limited to class instances, I think
there should be some explanation of why you didn't just automatically
add slots for each variable assigned (by hard-coded name) within a
method; the keys would still be stored on the type, and array storage
could still be used for the values; the __dict__ slot could initially
be a NULL pointer, and instance dicts could be added exactly when they
were needed, covering only the oddball keys.


I would reword (or at least reformat) the Cons section; at the
moment, it looks like there are four separate objections, and seems
to be a bit dismissive towards backwards compatibility.  Perhaps
something like:

While this PEP does not change any documented APIs or invariants,
it does break some de facto invariants.

C extension modules may be relying on the current physical layout
of a dictionary.  That said, extensions which rely on internals may
already need to be recompiled with each feature release; there are
already changes planned for both Unicode (for efficiency) and dicts
(for security) that would require authors of these extensions to
at least review their code.

Because iteration (and repr) order can depend on the order in which
keys are inserted, it will be possible to construct instances that
iterate in a different order than they would under the current
implementation.  Note, however, that this will happen very rarely
in code which does not deliberately trigger the differences, and
that test cases which rely on a particular iteration order will
already need to be corrected in order to take advantage of the
security enhancements being discussed under hash randomization, or
for use with Jython and PyPy.



-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Store timestamps as decimal.Decimal objects

2012-02-16 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-February/116073.html
Nick Coghlan wrote:

 Besides, float128 is a bad example - such a type could just be
 returned directly where we return float64 now. (The only reason we
 can't do that with Decimal is because we deliberately don't allow
 implicit conversion of float values to Decimal values in binary
 operations).

If we could really replace float with another type, then there is
no reason that type couldn't be a nearly trivial Decimal subclass
which simply flips the default value of the (never used by any
caller) allow_float parameter to internal function _convert_other.

Since decimal inherits straight from object, this subtype could
even be made to inherit from float as well, and to store the lower-
precision value there.  It could even produce the decimal version
lazily, so as to minimize slowdown on cases that do not need the
greater precision.
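
A sketch of that last idea (the class and attribute names are made up; this
is an illustration, not a proposal for the actual type):

    from decimal import Decimal

    class HiResTimestamp(float):
        """A float that can lazily expose a higher-precision Decimal view."""

        def __new__(cls, float_value, decimal_text=None):
            self = super().__new__(cls, float_value)
            self._decimal_text = decimal_text   # full-precision value, as text, if known
            self._decimal = None
            return self

        @property
        def decimal(self):
            # Build the Decimal form only on first use.
            if self._decimal is None:
                self._decimal = Decimal(self._decimal_text or repr(float(self)))
            return self._decimal

    ts = HiResTimestamp(1329393600.123456, "1329393600.123456789")
    print(float(ts))    # ordinary low-precision float view
    print(ts.decimal)   # full-precision Decimal, produced lazily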

Of course, that still doesn't answer questions on whether the higher
precision is a good idea ...

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] plugging the hash attack

2012-02-16 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-January/116003.html


  Benjamin Peterson wrote:
  2. It will be off by default in stable releases ... This will
  prevent code breakage ...

 2012/1/27 Steven D'Aprano steve at pearwood.info:
  ... it will become on by default in some future release?

 On Fri, Jan 27, 2012, Benjamin Peterson benjamin at python.org wrote:
 Yes, 3.3. The solution in 3.3 could even be one of the more
 sophisticated proposals we have today.

Brett Cannon (Mon Jan 30) wrote:

 I think that would be good. And I would  even argue we remove support for
 turning it off to force people to no longer lean on dict ordering as a
 crutch (in 3.3 obviously).

Turning it on by default is fine.

Removing the ability to turn it off is bad.

If regression tests fail with python 3, the easiest thing to do is just
not to migrate to python 3.  Some decisions (certainly around unittest,
but I think even around hash codes) were settled precisely because tests
shouldn't break unless the functionality has really changed.  Python 3
isn't yet so dominant as to change that tradeoff.

I would go so far as to add an extra step in the porting recommendations;
before porting to python 3.x, run your test suite several times with
hash randomization turned on; any failures at this point are relying on
formally undefined behavior and should be fixed, but can *probably* be
fixed just by wrapping the results in sorted.
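
For example, a test like this is fragile under hash randomization, and the
one-line fix usually suffices:

    d = {'spam': 1, 'eggs': 2, 'ham': 3}

    # Fragile: depends on an arbitrary, now randomized, iteration order.
    # assert list(d) == ['spam', 'eggs', 'ham']

    # Robust: impose an order before comparing.
    assert sorted(d) == ['eggs', 'ham', 'spam']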

(I would offer a patch to the porting-to-py3 recommendation, except that
I couldn't find any not associated specifically with 3.0)

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Counting collisions for the win

2012-02-16 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-January/115715.html
Frank Sievertsen wrote:

On 20.01.2012 13:08, Victor Stinner wrote:
 I'm surprised we haven't seen bug reports about it from users
 of 64-bit Pythons long ago
 A Python dictionary only uses the lower bits of a hash value. If your
 dictionary has less than 2**32 items, the dictionary order is exactly
 the same on 32 and 64 bits system: hash32(str) & mask == hash64(str)
 & mask for mask = 2**32-1.

 No, that's not true.
 Whenever a collision happens, other bits are mixed in very fast.

 Frank

Bits are mixed in quickly from a denial-of-service standpoint, but
Victor is correct from a "Why don't the tests already fail?" standpoint.

A dict with 2**12 slots, holding over 2700 entries, will be far larger
than most test cases -- particularly those with visible output.  In a
dict that size, 32-bit and 64-bit machines will still probe the same
first, second, third, fourth, fifth, and sixth slots.  Even on the
rare cases when there are at least 6 collisions, the next slots may
well be either the same, or close enough that it doesn't show up in a
changed iteration order.
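
A simplified model of the probe order (not the exact CPython loop) shows
why: only the low hash bits choose the first several slots, and higher bits
are only mixed in, via the perturb shifts, after repeated collisions:

    def probe_sequence(h, table_size, count=7):
        """Simplified model of dict probing; not the exact CPython code."""
        mask = table_size - 1
        perturb = h
        i = h & mask
        slots = [i]
        for _ in range(count - 1):
            i = (5 * i + perturb + 1) & mask
            perturb >>= 5
            slots.append(i)
        return slots

    # Two hashes that agree in their low 32 bits, as a 32- vs 64-bit build might:
    print(probe_sequence(0x7A36B2D8, 2**12))
    print(probe_sequence(0x9E3779B97A36B2D8, 2**12))
    # The first several slots match; the sequences only diverge once the
    # perturb shifts have consumed the shared low-order bits.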

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 414 - Unicode Literals for Python 3

2012-02-27 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-February/116953.html
Terry J. Reedy wrote:

 I presume that most 2.6 code has problems other than u'' when
 attempting to run under 3.x.

Why?

If you're talking about generic code that has seen minimal changes
since 2.0, sure.  But I think this request is specifically for
projects that are thinking about python 3, but are trying to use
a single source base regardless of version.  

Using an automatic translation step means that python (or at least
python 3) would no longer be the actual source code.  I've worked
with enough generated source code in other languages that it is
worth some pain to avoid even a slippery slope.

By the time you drop 2.5, the subset language is already pretty
good; if I have to write something version-specific, I prefer to
treat that as a sign that I am using the wrong approach.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Add a frozendict builtin type

2012-02-27 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-February/116955.html
Victor Stinner proposed:

 The blacklist implementation has a major issue: it is still possible
 to call write methods of the dict class (e.g. dict.set(my_frozendict,
 key, value)).

It is also possible to use ctypes and violate even more invariants.
For most purposes, this falls under consenting adults.

 The whitelist implementation has an issue: frozendict and dict are not
 compatible, dict is not a subclass of frozendict (and frozendict is
 not a subclass of dict).

And because of Liskov substitutability, they shouldn't be; they should
be sibling children of a basedict that doesn't have the mutating
methods, but also doesn't *promise* not to mutate.

  * frozendict values must be immutable, as dict keys

Why?  That may be useful, but an immutable dict whose values
might mutate is also useful; by forcing that choice, it starts
to feel too specialized for a builtin.

 * Add an hash field to the PyDictObject structure

That is another indication that it should really be a sibling class;
most of the uses I have had for immutable dicts still didn't need
hashing.  It might be worth adding anyhow, but only to immutable
dicts -- not to every instance dict or keywords parameter.

  * frozendict.__hash__ computes hash(frozenset(self.items())) and
 caches the result is its private hash attribute

Why?  hash(frozenset(self.keys())) would still meet the hash contract,
but it would be approximately twice as fast, and I can think of only
one case where it wouldn't work just as well.  (That case is wanting
to store a dict of alternative configuration dicts (with no defaulting
of values), but ALSO wanting to use the configurations themselves
(as opposed to their names) as the dict keys.)
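
To make the comparison concrete (using a plain dict subclass as a stand-in,
since frozendict doesn't exist):

    class FauxFrozenDict(dict):
        def hash_by_items(self):
            # The PEP's proposal: hash both keys and values.
            return hash(frozenset(self.items()))

        def hash_by_keys(self):
            # Cheaper: hash only the keys.  Equal dicts have equal keys, so
            # the hash contract still holds; dicts differing only in values
            # simply collide and fall back to __eq__.
            return hash(frozenset(self.keys()))

    a = FauxFrozenDict(host='example.org', port=80)
    b = FauxFrozenDict(host='example.org', port=8080)
    print(a.hash_by_keys() == b.hash_by_keys())    # True: a deliberate collision
    print(a.hash_by_items() == b.hash_by_items())  # almost certainly False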

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 414 - Unicode Literals for Python 3

2012-02-28 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-February/117070.html
Vinay Sajip wrote:

 It's moot, but as I see it: the purpose of PEP 414 is to facilitate a
 single codebase across 2.x and 3.x. However, it only does this if your
 3.x interest is 3.3+

For many people -- particularly those who haven't ported yet -- 3.x
will mean 3.3+.  There are some who will support 3.2 because it is a
LTS release on some distribution, just as there were some who supported
Python 1.5 (but not 1.6) long into the 2.x cycle, but I expect them to
be the minority.

I certainly don't expect 3.2 to remain a primary development target,
the way that 2.7 is.  IIRC, the only ways to use 3.2 even today are:

  (a)  Make an explicit choice to use something other than the default
  (b)  Download directly and choose 3.x without OS support
  (c)  Use Arch Linux

These are the sort of people who can be expected to upgrade.

Now also remember that we're talking specifically about projects that
have *not* been ported to 3.x (== no existing users to support), and
that won't be ported until 3.2 is already in maintenance mode.

 If you also want to or need to support 3.0 - 3.2, it makes your
 workflow more painful,

Compared to dropping 3.2, yes.  Compared to supporting 3.2 today?
I don't see how.

 because you can't run tests on 2.x or 3.3 and then run them on 3.2
 without an intermediate source conversion step - just like the 2to3
 step that people find painful when it's part of maintenance workflow,
 and which in part prompted the PEP in the first place.

So the only differences compared to today are that:

(a)  Fewer branches are after the auto-conversion.
(b)  No current branches are after the auto-conversion.
(c)  The auto-conversion is much more limited in scope.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 416: Add a frozendict builtin type

2012-02-29 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-February/117113.html
Victor Stinner posted:

 An immutable mapping can be implemented using frozendict::

 class immutabledict(frozendict):
     def __new__(cls, *args, **kw):
         # ensure that all values are immutable
         for key, value in itertools.chain(args, kw.items()):
             if not isinstance(value, (int, float, complex, str, bytes)):
                 hash(value)
         # frozendict ensures that all keys are immutable
         return frozendict.__new__(cls, *args, **kw)

What is the purpose of this?  Is it just a hashable frozendict?

If it is for security (as some other messages suggest), then I don't
think it really helps.

class Proxy:
    def __init__(self, value):
        self.value = value              # can be rebound later: the object is mutable
    def __eq__(self, other): return self.value == other
    def __hash__(self): return hash(self.value)

An instance of Proxy is hashable, and the hash is not object.hash,
but it is still mutable.  You're welcome to call that buggy, but a
secure sandbox will have to deal with much worse.

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] [RELEASED] Python 3.3.0 alpha 1

2012-03-06 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-March/117348.html
Georg Brandl ge...@python.org  posted:

 Python 3.3 includes a range of improvements of the 3.x series, as well as 
 easier
 porting between 2.x and 3.x.  Major new features in the 3.3 release series 
 are:

As much as it is nice to just celebrate improvements, I think
readers (particularly on the download page
http://www.python.org/download/releases/3.3.0/  ) would be better
served if there were an additional point about porting and the
hash changes.

http://docs.python.org/dev/whatsnew/3.3.html#porting-to-python-3-3
also failed to mention this, and even the changelog didn't seem to
warn people about failing tests or tell them how to work around it.

Perhaps something like:

Hash Randomization (issue 13703) is now on by default.  Unfortunately,
this does break some tests; it can be temporarily turned off by setting
the environment variable PYTHONHASHSEED to 0 before launching python.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] problem with recursive yield from delegation

2012-03-07 Thread Jim J. Jewett


http://mail.python.org/pipermail/python-dev/2012-March/117396.html
Stefan Behnel posted:

 I found a problem in the current yield from implementation ...

[paraphrasing]

g1 yields from g2
g2 yields from g1
XXX python follows the existing delegation without checking re-entrancy
g2 (2nd call) checks re-entrancy, and raises an exception
g1 (2nd call) gets to handle the exception, and doesn't
g2 (1st call) gets to handle the exception, and does


How is this a problem?

Re-entering a generator is a bug.  Python caught it and raised an
appropriate exception.
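
A minimal sketch of such a delegation cycle (simpler than the reported
case, but it shows the check firing):

    def a():
        yield from b_gen

    def b():
        yield from a_gen

    a_gen = a()
    b_gen = b()

    try:
        next(a_gen)     # a delegates to b, which delegates back into the running a
    except ValueError as exc:
        print(exc)      # "generator already executing"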

It would be nice if python caught the generator cycle as soon as it was
created, just as it would be nice if reference cycles were collected as
soon as they became garbage.  But python doesn't promise to catch cycles
immediately, and the checks required to do so would slow down all code,
so in practice the checks are delayed.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Adding a builtins parameter to eval(), exec() and __import__().

2012-03-07 Thread Jim J. Jewett


http://mail.python.org/pipermail/python-dev/2012-March/117395.html
Brett Cannon posted:

[in reply to Mark Shannon's suggestion of adding a builtins parameter
to match locals and globals]

 It's a mess right now to try to grab the __import__()
 implementation and this would actually help clarify import semantics by
 saying that __import__() for any chained imports comes from __import__()s
 locals, globals, or builtins arguments (in that order) or from the builtins
 module itself (i.e. tstate-builtins).

How does that differ from today?

If you're saying that the locals and (module-level) globals aren't
always checked in order, then that is a semantic change.  Probably
a good change, but still a change -- and it can be made independently
of Mark's suggestion.

Also note that I would assume this was for sandboxing, and that
missing names should *not* fall back to the real globals, although
I would understand if bootstrapping required the import statement to
get special treatment.
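
For reference, this is where builtins already come from today: exec()
consults the '__builtins__' key of the globals mapping it is given.  (Not a
real sandbox, as noted above, but it shows the current lookup chain.)

    safe_globals = {'__builtins__': {'len': len, 'print': print}}

    exec("print(len('spam'))", safe_globals)        # allowed: len and print are whitelisted
    try:
        exec("open('secrets.txt')", safe_globals)   # blocked: open is not in the whitelist
    except NameError as exc:
        print('blocked:', exc)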


(Note that I like Mark's proposed change; I just don't see how it
cleans up import.)


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Python install layout and the PATH on win32

2012-03-14 Thread Jim J. Jewett


In 
http://mail.python.org/pipermail/python-dev/2012-March/117586.html
van.lindberg at gmail.com posted:

 1) The layout for the python root directory for all platforms should be
 as follows:
 
 stdlib = {base/userbase}/lib/python{py_version_short}
 platstdlib = {base/userbase}/lib/python{py_version_short}
 purelib = {base/userbase}/lib/python{py_version_short}/site-packages
 platlib = {base/userbase}/lib/python{py_version_short}/site-packages
 include = {base/userbase}/include/python{py_version_short}
 scripts = {base/userbase}/bin
 data = {base/userbase}

Why?

Pure python vs compiled C doesn't need to be separated at the directory
level, except for cleanliness.

Some (generally unix) systems prefer to split the libraries into several
additional pieces depending on CPU architecture.

The structure listed above doesn't have a location for docs.

Some packages (such as tcl) may be better off in their own area.

What is "data"?  Is this an extra split compared to today, or does it
refer to things like LICENSE.txt, README.txt, and NEWS.txt?

And even once I figure out where files have moved, and assume that
the split is perfect -- what does this buy me over the current
situation?  I was under the impression that programs like distutils
already handled finding the appropriate directories for a program;
if you're rewriting that logic, you're just asking for bugs on a
strange platform that you don't use.

If you're looking for things interactively, then platform conventions
are probably more important than consistency across platforms.  If you
disagree, you are welcome to reorganize your personal linux installation
so that it matches windows, and see whether it causes you any problems.
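
(For what it's worth, the layout Python itself reports on any given
platform is easy to inspect, rather than re-deriving it by hand:)

    import sysconfig

    for name, path in sorted(sysconfig.get_paths().items()):
        print('{0:12} {1}'.format(name, path))
    # typical keys: stdlib, platstdlib, purelib, platlib, include, scripts, data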

 ... We *already* have this. The only difference in this proposal is
 that we go from py_version_nodot to py_version_short, i.e. from

 c:\python33\lib\python33 

 to

 c:\python33\lib\python3.3

I have not seen that redundancy before on windows.

I'm pretty sure that it is a relic of your Linux provider wanting
to support multiple python versions using shared filesystems.  The
Windows standard is to use a local disk, and to bundle it all up
into its own directory, similar to the way that java apps sometimes
ship with their own JVM.

Also note that using the dot in a directory name is incautious.
I haven't personally had trouble in several years, but doing so is
odd enough that some should be expected.  Python already causes
some grief by not installing in Program Files, but that is at
least justified by the spaces in filenames problem; what is the
advantage of 3.3?


I'm using windows, and I just followed the defaults at installation.
It is possible that the installer continued to do something based
on an earlier installation, but I don't think this machine has ever
had a customized installation of any python version.

C:\python32\*
Everything is under here; I assume {base/userbase} would be
set to C:\python32

As is customary for windows, the base directory contains the
license/readme/news and all executables that the user is
expected to use directly (python.exe, pythonw.exe).  It also
contains w9xpopen.exe, which users do not run directly, but that too
is fairly common.

There is no data directory.

Subdirectories are:

C:\python32\DLLs
In addition to regular DLL files, it contains .pyd files
and icons.  It looks like modules from the stdlib that happen
to be written in C.  Most users will never bother to look here.

C:\python32\Doc
A .chm file; full html would be fine too, but removing it
would be a bad idea.

C:\python32\include
These are the header files, though most users will never have
any use for them, as there isn't generally a compiler.

C:\python32\Lib
The standard library -- or at least the portion implemented
in python.

Note that site-packages is a subdirectory here.  It doesn't
happen to have an __init__.py, but to an ordinary user it
looks just like any other stdlib package, such as xml or
multiprocessing.

I personally happen to keep things in subdirectories of
site-packages, but I can't say what is standard.

Moving site-packages out of the Lib directory might make
sense, but probably isn't worth the backward compatibility hit.

C:\python32\libs
.lib files.  I'm not entirely sure what these (as opposed to
the DLLs) are for; lib files aren't that common on windows.
My machine does not appear to have any that aren't associated
with cross-platform tools or unix emulation.

C:\python32\tcl
Note that this is in addition to associated files under DLLs
and libs.  I would prefer to see them in one place, but
moving it in with non-tcl stuff would not be an improvement.
Most users will never look (or care); those that do usually
appreciate knowing that, for example, the dde subdirectory
is for tcl.

C:\python32\Tools
This has three subdirectories (i18n, 

[Python-Dev] Python install layout and the PATH on win32

2012-03-14 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-March/117617.html
van.lindberg at gmail.com posted:

 As noted earlier in the thread, I also change my proposal to maintain 
 the existing differences between system installs and user installs.

[Wanted lower case, which should be irrelevant; sysconfig.get_python_inc
already assumes lower case despite the configuration file.]

[Wanted bin instead of Scripts, even though they aren't binaries.]

If there are to be any changes, I *am* tempted to at least harmonize
the two install types, but to use the less redundant system form.  If
the user is deliberately trying to hide that it is version 33 (or even
that it is python), then so be it; defaulting to redundant information
is not an improvement.

Set the base/userbase at install time, with defaults of

base = %SystemDrive%\{py_version_nodot}
userbase = %USERPROFILE%\Application Data\{py_version_nodot}

usedbase = base for system installs; userbase for per-user installs.

Then let the rest default to subdirectories; sysconfig.get_config_vars
on windows explicitly doesn't provide as many variables as unix, just
INCLUDEPY (which should default to {usedbase}/include) and
LIBDEST and BINLIBDEST (both of which should default to {usedbase}/lib).
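
(Those three are easy to check on any install:)

    import sysconfig
    print(sysconfig.get_config_vars('INCLUDEPY', 'LIBDEST', 'BINLIBDEST'))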

And no, I'm not forgetting data or scripts.  As best I can tell,
sysconfig doesn't actually expose them, and there is no Scripts
directory on my machine (except inside Tools).  Perhaps some
installers create it when they install their own extensions?

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Docs of weak stdlib modules should encourage exploration of 3rd-party alternatives

2012-03-19 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-March/117570.html
Steven D'Aprano posted:

 Need is awfully strong. I don't believe it is the responsibility
 of the standard library to be judge and reviewer of third party
 packages that it doesn't control.

It is, however, user-friendly to indicate when the stdlib selections
are particularly likely to be for reasons other than "a bunch of
experts believe this is the best way to do this".  CPython's
documentation is (de facto) the documentation for Python in
general, and pointing people towards other resources (particularly
PyPI itself) is quite reasonable.

Many modules are in the stdlib in part because they are an *acceptable*
way of doing something, and the best ways are either changing too
quickly or are so complicated that it doesn't make sense to burden
the *standard* libary for specialist needs.  In those cases, I do
think the documentation should say so.  

Specific examples:

http://docs.python.org/library/numeric.html quite reasonably has
subsections only for what ships with Python.  But I think the
introductory paragraph could stand to have an extra sentence
explaining why and when people should look beyond the standard
library, such as:

Applications centered around mathematics may benefit from
specialist 3rd-party libraries, such as
numpy <http://pypi.python.org/pypi/numpy/>,
gmpy <http://pypi.python.org/pypi/gmpy>, and
scipy <http://pypi.python.org/pypi/scipy>.


I would add a similar sentence to the web section, or the
internet protocols section if web is still not broken out
separately.  http://docs.python.org/dev/library/internet.html

Note that some web conventions are still evolving too quickly
for convenient encapsulation in a stable library.  Many
applications will therefore prefer functional replacements
from third parties, such as requests or httplib2, or
frameworks such as Django and Zope.  www-related products
can be found by browsing PyPI for top internet subtopic www/http.
 <http://pypi.python.org/pypi?:action=browse&c=319&c=326>

[I think that searching by classifier -- which first requires browse,
and can't be reached from the list of classifiers -- could be improved.]

  
 Should we recommend wxPython over Pyjamas or PyGUI or PyGtk?

Actually, I think the existing http://docs.python.org/library/othergui.html
does a pretty good job; I would not object to adding mentions of
other tools as well, but a wiki reference is probably sufficient.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Issue #10278 -- why not just an attribute?

2012-03-19 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-March/117762.html
Georg Brandl posted:

 +   If available, a monotonic clock is used. By default, if *strict* is 
 False,
 +   the function falls back to another clock if the monotonic clock failed 
 or is
 +   not available. If *strict* is True, raise an :exc:`OSError` on error or
 +   :exc:`NotImplementedError` if no monotonic clock is available.

 This is not clear to me.  Why wouldn't it raise OSError on error even with
 strict=False?  Please clarify which exception is raised in which case.

Passing strict as an argument seems like overkill since it will always
be meaningless on some (most?) platforms.  Why not just use a function
attribute?  Those few users who do care can check the value of
time.steady.monotonic before calling time.steady(); exceptions raised
will always be whatever the clock actually raises.
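
A sketch of the attribute idea (time.steady never shipped under that name,
so this builds it by hand; the .monotonic attribute is the hypothetical
part):

    import time

    def steady():
        """Best-effort timer: monotonic where the platform provides it."""
        return _impl()

    try:
        _impl = time.monotonic         # the monotonic clock, where available
        steady.monotonic = True
    except AttributeError:
        _impl = time.time
        steady.monotonic = False

    # Callers who care can check the attribute instead of passing strict=True:
    if not steady.monotonic:
        print('warning: falling back to the adjustable system clock')
    print(steady())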

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Rename time.steady(strict=True) to time.monotonic()?

2012-03-23 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-March/118024.html
Steven D'Aprano wrote:

 What makes this steady, given that it can be adjusted
 and it can go backwards?

It is best-effort for "steady", but putting "best" in the name would
be an attractive nuisance.

 Is steady() merely a convenience function to avoid the user
 having to write something like this?

  try:
      mytimer = time.monotonic
  except AttributeError:
      mytimer = time.time

That would still be worth doing.  But I think the main point is
that the clock *should* be monotonic, and *should* be as precise
as possible.

Given that it returns seconds elapsed (since an undefined start),
perhaps it should be

time.seconds()

or even

time.counter()

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] time.clock_info() field names

2012-04-29 Thread Jim J. Jewett


In http://mail.python.org/pipermail/python-dev/2012-April/119134.html
Benjamin Peterson wrote:

 I see PEP 418 gives time.clock_info() two boolean fields named
 is_monotonic and is_adjusted. I think the is_ is unnecessary and
 a bit ugly, and they could just be renamed monotonic and adjusted.

I agree with monotonic, but I think it should be adjustable.

To me, adjusted and is_adjusted both imply that an adjustment
has already been made; adjustable only implies that it is possible.

I do remember concerns (including Stephen J. Turnbull's
CAL_0O19nmi0+zB+tV8poZDAffNdTnohxo9y5dbw+E2q=9rx...@mail.gmail.com )
that adjustable should imply at least a list of past adjustments,
and preferably a way to make an adjustment.

I just think that stating it is adjustable (without saying how, or
whether and when it already happened) is less wrong than claiming it
is already adjusted just in case it might have been.

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 362: 4th edition

2012-06-15 Thread Jim J. Jewett

Summary:

*Every* Parameter attribute is optional, even name.  (Think of
builtins, even if they aren't automatically supported yet.)
So go ahead and define some others that are sometimes useful.

Instead of defining a BoundArguments class, just return a copy
of the Signature, with value attributes added to the Parameters.

Use subclasses to distinguish the parameter kind.  (Replacing
most of the is_ methods from the 3rd version.)

[is_]implemented is important information, but the API isn't
quite right; even with tweaks, maybe we should wait a version
before freezing it on the base class.  But I would be happy
to have Larry create a Signature for the os.* functions,
whether that means a subclass or just an extra instance
attribute.

I favor passing a class to Signature.format, because so many of
the formatting arguments would normally change in parallel.
But my tolerance for nested structures may be unusually high.

I make some more specific suggestions below.


In http://mail.python.org/pipermail/python-dev/2012-June/120305.html
Yury Selivanov wrote:

 A Signature object has the following public attributes and methods:

 * return_annotation : object
The annotation for the return type of the function if specified.
If the function has no annotation for its return type, this
attribute is not set.

This means users must already be prepared to use hasattr with the
Signature as well as the Parameters -- in which case, I don't see any
harm in a few extra optional properties.

I would personally prefer to see the name (and qualname) and docstring,
but it would make perfect sense to implement these by keeping a weakref
to the original callable, and just delegating there unless/until the
properties are explicitly changed.  I suspect others will have a use
for additional delegated attributes, such as the self of bound methods.

I do agree that __eq__ and __hash__ should depend at most on the
parameters (including their order) and the annotation.

 * parameters : OrderedDict
 An ordered mapping of parameters' names to the corresponding
 Parameter objects (keyword-only arguments are in the same order
 as listed in ``code.co_varnames``).

For a specification, that feels a little too tied to the specific
implementation.  How about:

 Parameters will be ordered as they are in the function declaration.

or even just:

 Positional parameters will be ordered as they are in the function
 declaration.

because:
def f(*, a=4, b=5): pass

and:
def f(*, b=5, a=4): pass

should probably have equal signatures.


Wild thought:  Instead of just *having* an OrderedDict of Parameters,
should a Signature *be* that OrderedDict (with other attributes)?
That is, should signature(testfn)["foo"] get the "foo" parameter?


 * bind(\*args, \*\*kwargs) - BoundArguments
 Creates a mapping from positional and keyword arguments to
 parameters.  Raises a ``BindError`` (subclass of ``TypeError``)
 if the passed arguments do not match the signature.
 * bind_partial(\*args, \*\*kwargs) - BoundArguments
 Works the same way as ``bind()``, but allows the omission
 of some required arguments (mimics ``functools.partial``
 behavior.)

Are those descriptions actually correct?

I would expect the mapping to be from parameters (or parameter names)
to values extracted from *args and **kwargs.

And I'm not sure the current patch does even that, since it seems to
instead return a non-Mapping object (but with a mapping attribute)
that could be used to re-create *args, **kwargs in canonical form.
(Though that canonicalization is valuable for calls; it might even
be worth an as_call method.)


I think it should be explicit that this mapping does not include
parameters which would be filled by default arguments.  In fact, if
you stick with this interface, I would like a 3rd method that does
fill out everything.
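
For reference, a quick check against the inspect.signature API as it
eventually shipped with PEP 362: the bound mapping is name -> value, and
defaults are indeed left out.

    import inspect

    def f(a, b=10, *, c=20):
        return a + b + c

    sig = inspect.signature(f)
    print(list(sig.parameters))    # ['a', 'b', 'c'] -- declaration order

    bound = sig.bind(1, c=3)
    print(bound.arguments)         # OrderedDict([('a', 1), ('c', 3)]); 'b' is absent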


But I think it would be simpler to just add an optional attribute
to each Parameter instance, and let bind fill that in on the copies,
so that the return value is also a Signature.  (No need for the
BoundArguments class.)  Then the user can decide whether or not to
plug in the defaults for missing values.


 * format(...) - str
 Formats the Signature object to a string.  Optional arguments allow
 for custom render functions for parameter names,
 annotations and default values, along with custom separators.

I think it should state explicitly that by default, the return value
will be a string that could be used to declare an equivalent function,
if Signature were replaced with def funcname.

There are enough customization parameters that would often be changed
together (e.g., to produce HTML output) that it might make sense to use
overridable class defaults -- or even to make format a class itself.

I also think it would make sense to delegate formatting the individual
parameters to the parameter objects.  

[Python-Dev] backported Enum

2013-06-28 Thread Jim J. Jewett

 
(On June 19, 2013) Barry Warsaw wrote about porting mailman from
flufl.enum to the stdlib.enum:


 Switching from call syntax to getitem syntax for looking up an
 enum member by name, e.g.

-delivery_mode = DeliveryMode(data['delivery_mode'])
+delivery_mode = DeliveryMode[data['delivery_mode']]

 Switching from getitem syntax to call syntax for looking up an
 enum member by value, e.g.

-return self._enum[value]
+return self._enum(value)

 Interesting that these two were exactly opposite from flufl.enum.

Is there a reason why these were reversed?

I can sort of convince myself that it makes sense because dicts
work better with strings than with ints, but ... it seems like
such a minor win that I'm not sure it is worth backwards
incompatibility.  (Of course, I also don't know how much use
stdlib.enum has already gotten with the current syntax.)



 Switching from int() to .value to get the integer value of an
 enum member, e.g.

-return (member.list_id, member.address.email, int(member.role))
+return (member.list_id, member.address.email, member.role.value)

Is just this a style preference?

Using a .value attribute certainly makes sense, but I don't see it
mentioned in the PEP as even optional, let alone recommended.  If
you care that the value be specifically an int (as opposed to any
object), then an int constructor may be better.
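
For reference, the stdlib enum behaviour being discussed (the member names
here are just illustrative):

    from enum import Enum

    class DeliveryMode(Enum):
        regular = 1
        digest = 2

    DeliveryMode['regular']        # lookup by name (getitem syntax)
    DeliveryMode(1)                # lookup by value (call syntax)
    DeliveryMode.regular.value     # the underlying value: 1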

 [Some additional changes that mean there will be *some* changes,
 which does reduce the pressure for backwards compatibility.] ...


 An unexpected difference is that failing name lookups raise a
 KeyError instead of a ValueError.

I could understand either, as well as AttributeError, since the
instance that would represent that value isn't a class attribute.

Looking at Enum creation, I think ValueError would be better than
TypeError for complaints about duplicate names.  Was TypeError
chosen because it should only happen during setup?

I would also not be shocked if some people expect failed value
lookups to raise an IndexError, though I expect they would
adapt if they get something else that makes sense.

Would it be wrong to create an EnumError that subclasses
(ValueError, KeyError, AttributeError) and to raise that
subclass from everything but _StealthProperty and _get_mixins?


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 454 (tracemalloc) disable == clear?

2013-10-29 Thread Jim J. Jewett

 
(Tue Oct 29 12:37:52 CET 2013) Victor Stinner wrote:

 For consistency, you cannot keep traces when tracing is disabled.
 The free() must be enabled to remove allocated memory blocks, or
 next malloc() may get the same address which would raise an assertion
 error (you cannot have two memory blocks at the same address).

That seems like a quirk of the implementation, particularly since
the actual address is not returned to the user.  Nor do I see any way
of knowing when that allocation is freed.

Well, unless I missed it... I don't see how to get anything beyond
the return value of get_traces, which is a (time-ordered?) list
of allocation sizes with the then-current call stack.  It doesn't mention
any attribute for indicating that some entries are de-allocations,
let alone the actual address of each allocation.

 For the reason explained above, it's not possible to disable the whole
 module temporarly.

 Internally, tracemalloc uses a thread-local variable (called the
 reentrant flag) to disable temporarly tracing allocations in the
 current thread. It only disables tracing new allocations,
 deallocations are still proceed.

Even assuming the restriction is needed, this just seems to mean that
disabling (or filtering) should not affect de-allocation events, for
fear of corrupting tracemalloc's internal structures.

In that case, I would expect disabling (and filtering) to stop
capturing new allocation events for me, but I would still expect
tracemalloc to do proper internal maintenance.

It would at least explain why you need both disable *and* reset;
reset would empty those internal structures, so that tracemalloc
could shortcut that maintenance.  I would NOT assume that I needed
to call reset when changing the filters, nor would I assume that
changing them threw out existing traces.
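
(For comparison, the API as it eventually shipped keeps exactly this
coupling: stop() both disables tracing and drops the collected traces,
while clear_traces() drops them without disabling.)

    import tracemalloc

    tracemalloc.start()                    # enable tracing of allocations
    data = [bytes(1000) for _ in range(100)]
    snapshot = tracemalloc.take_snapshot() # capture traces while enabled
    tracemalloc.clear_traces()             # drop traces, keep tracing enabled
    tracemalloc.stop()                     # disable tracing; remaining traces are discarded
    print(snapshot.statistics('lineno')[0])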

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Which direction is UnTransform? / Unicode is different

2013-11-19 Thread Jim J. Jewett

 
(Fri Nov 15 16:57:00 CET 2013) Stephen J. Turnbull wrote:

  Serhiy Storchaka wrote:

   If the transform() method will be added, I prefer to have only
   one transformation method and specify a direction by the
   transformation name (bzip2/unbzip2).

Me too.  Until I consider special cases like "compress" or "lower",
and realize that there are enough special cases to become a major wart
if generic transforms ever became popular.  

 People think about these transformations as en- or de-coding, not
 transforming, most of the time.  Even for a transformation that is
 an involution (eg, rot13), people have an very clear idea of what's
 encoded and what's not, and they are going to prefer the names
 encode and decode for these (generic) operations in many cases.

I think this is one of the major stumbling blocks with unicode.

I originally disagreed strongly with what Stephen wrote -- but then
I realized that all my counterexamples involved unicode text.

I can tell whether something is tarred or untarred, zipped or unzipped.

But an 8-bit (even Latin-1, let alone ASCII) bytestring really doesn't
seem encoded, and it doesn't make sense to decode a perfectly
readable (ASCII) string into a sequence of code units.

Nor does it help that http://www.unicode.org/glossary/#code_unit
defines "code unit" as "The minimal bit combination that can represent
a unit of encoded text for processing or interchange. The Unicode
Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code
units in the UTF-16 encoding form, and 32-bit code units in the UTF-32
encoding form. (See definition D77 in Section 3.9, Unicode Encoding
Forms.)"

I have to read that very carefully to avoid mentally translating it
into "Code units are *en*coded, and there are lots of different
complicated encodings that I wouldn't use unless I were doing special
processing or interchange."  If I'm not using the network, or if my
interchange format already looks like readable ASCII, then unicode
sure sounds like a complication.  I *will* get confused over which
direction is encoding and which is decoding. (Removing .decode()
from the (unicode) str type in 3 does help a lot, if I have a Python 3
interpreter running to check against.)


I'm not sure exactly what implications the above has, but it certainly
supports separating the Text Processing from the generic codecs, both
in the documentation and in any potential new methods.

Instead of relying on introspection of .decodes_to and .encodes_to, it
would be useful to have charsetcodecs and transformcodecs as entirely
different modules, with their own separate registries.  I will even
note that the existing help(codecs) seems more appropriate for
charsetcodecs than it does for the current conjoined module.
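
A concrete illustration of the two kinds (this works on Python 3.2+, where
the binary and rot_13 transform codecs were restored):

    import codecs

    # Charset codec: str <-> bytes, and the direction is unambiguous.
    codecs.encode('caf\u00e9', 'utf-8')       # b'caf\xc3\xa9'
    codecs.decode(b'caf\xc3\xa9', 'utf-8')    # 'café'

    # Transform codecs: same-type transforms, where "encode" vs "decode"
    # is purely a convention you have to remember.
    codecs.encode('spam', 'rot_13')           # 'fcnz'
    codecs.decode(codecs.encode(b'data', 'zlib_codec'), 'zlib_codec')  # b'data'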


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Python3 complexity - 2 use cases

2014-01-10 Thread Jim J. Jewett

 
 Steven D'Aprano wrote:
 I think that heuristics to guess the encoding have their role to play,
 if the caller understands the risks.

Ben Finney wrote:
 In my opinion, content-type guessing heuristics certainly don't belong
 in the standard library.

It would be great if there were never any need to guess.  But in the
real world, there is -- and often the user won't know any more than
python does.  So when it is time to guess, a source of good guesses
is an important battery to include.

The HTML5 specifications go through some fairly extreme contortions
to document what browsers actually do, as opposed to what previous
standards have mandated.  They don't currently specify how to guess
(though I think a draft once tried, since the major browsers all do
it, and at the time did it similarly), but the specs do explicitly
support such a step, and do provide an implementation note
encouraging user-agents to do at least minimal auto-detection.  

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding

My own opinion is therefore that Python SHOULD provide better support
for both of the following use cases:

(1)  Treat this file like it came from the web -- including
 autodetection and even overriding explicit charset
 declarations for certain charsets.

We should explicitly treat autodetection like time zone data --
there is no promise that the right answer (or at least the
best guess) won't change, even within a release.

I offer no opinion on whether chardet in particular is still
too volatile, but the docs should warn that the API is driven
by possibly changing external data.

(2)  Treat this file as ASCII+, where anything non-ASCII
 will (at most) be written back out unchanged; it doesn't
 even need to be converted to text.

At this time, I don't know whether the right answer is making it
easy to default to surrogate-escape for all error-handling, 
adding more bytes methods, encouraging use of python's latin-1
variant, offering a dedicated (new?) codec, or some new suggestion.

I do know that this use case is important, and that python 3
currently looks clumsy compared to python 2.
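
For use case (2), the pieces do exist today, even if they aren't obvious;
something like this round-trips unknown bytes untouched (the file names are
just for illustration):

    # Read "ASCII+" data: undecodable bytes are smuggled through as surrogates.
    with open('input.log', encoding='ascii', errors='surrogateescape') as f:
        text = f.read()

    # ... work with the ASCII parts of `text` ...

    with open('output.log', 'w', encoding='ascii', errors='surrogateescape') as f:
        f.write(text)    # the non-ASCII bytes come back out exactly as they went in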


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] PEP 460 -- adding explicit assumptions

2014-01-13 Thread Jim J. Jewett


As best I can tell, some people (apparently including Guido
and PEP author Antoine) are taking some assumptions almost
for granted, while other people (including me, before Nick's
messages) were not assuming them at all.

Since these assumptions (or, possibly, rejections of them?)
are likely to decide the outcome, the assumptions should be
explicit in the PEP.

(1)  The bytes-related classes do include methods that
 are only useful when the already-contained data
 is encoded ASCII.

 They do not (and will not) include any operations
 that *require* an encoding assumption.  This
 implies that no non-bytes data can be added without
 an explicit encoding.

(1a) Not even by assuming ASCII with strict error handling.

(1b) Not even for numbers, where ASCII/strict really is
 sufficient.

Note that this doesn't rule out a solution where objects
(or maybe just numbers and ASCII-kind text) provide their own
encoding to bytes -- but that has to be done by the objects
themselves, not by the bytes container or  by the interpreter.

(2)  Most python programmers are still in the future.

 So an API that confuses people who are still learning
 about Unicode and the text model is bad -- even if it
 would work fine for those who do already understand it.
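
Concretely, assumption (1) means the encoding step always has to be
spelled out, even for numbers:

    n = 42
    msg = b'Content-Length: ' + str(n).encode('ascii')   # explicit encoding: allowed
    # msg = b'Content-Length: ' + n    # rejected: no implicit conversion to bytes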

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ



[Python-Dev] Automatic encoding detection [was: Re: Python3 complexity - 2 use cases]

2014-01-13 Thread Jim J. Jewett


 So when it is time to guess [at the character encoding of a file],
 a source of good guesses is an important battery to include.

 The barrier for entry to the standard library is higher than mere
 usefulness.

Agreed.  But "most programs will need it, and people will either
include (the same) 3rd-party library themselves, or write their
own workaround, or have buggy code" *is* sufficient.

The points of contention are

(1)  How many programs have to deal with documents written
 outside their control -- and probably originating on
 another system.

I'm not ready to say most programs in general, but I think that
barrier is met for both web clients (for which we already supply
several batteries) and quick-and-dirty utilities.

(2)  How serious are the bugs / How annoying are the workarounds?

As someone who mostly sticks to English, and who tends to manually
ignore stray bytes when dealing with a semi-binary file format,
the bugs aren't that serious for me personally.  So I may well
choose to write buggy programs, and the bug may well never get
triggered on my own machine.

But having a batch process crash one run in ten (where it didn't
crash at all under Python 2) is a bad thing.  There are environments
where (once I knew about it) I would add chardet (if I could get
approval for the 3rd-party component).

(3)  How clearcut is the *right* answer?

As I said, at one point (several years ago), the w3c and whatwg
started to standardize the right answer.  They backed that out,
because vendors wanted the option to improve their detection in
the future without violating standards.

There are certainly situations where local knowledge can do
better than a global solution like chardet,  but ... the
right answer is clear most of the time.

Just ignoring the problem is still a 99% answer, because most text
in ASCII-mostly environments really is close enough.  But that
is harder (and the One Obvious Way is less reliable) under Python 3
than it was under Python 2.

An alias for open that defaulted to surrogate-escape (or returned
the new ASCIIstr bytes hybrid) would probably be sufficient to get
back (almost) to Python 2 levels of ease and reliability.  But it
would tend to encourage ASCII/English-only assumptions.

You could fix most of the remaining problems by scripting a web
browser, except that scripting the browser in a cross-platform
manner is slow and problematic, even with webbrowser.py.

Whatever a recent Firefox does is (almost by definition) good
enough, and is available ... but maybe not in a convenient form,
which is one reason that chardet was created as a port thereof.
Also note that firefox assumes you will update more often than
Python does.

Whatever chardet said at the time the Python release was cut
is almost certainly good enough too.

The browser makers go to great lengths to match each other even 
in bizarre corner cases.  (Which is one reason there aren't more
competing solutions.)  But that doesn't mean it is *impossible*
to construct a test case where they disagree -- or even one where
a recent improvement in the algorithms led to regressions for one
particular document.

That said, such regressions should be limited to documents that
were not properly labeled in the first place, and should be rare
even there.  Think of the changes as obscure bugfixes, akin to
a program starting to handle NaN properly, in a place where it
should not ever see one.

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Jim J. Jewett


Nick Coghlan wrote:
 Arbitrary binary data and ASCII  compatible binary data are *different 
 things* and the only argument in favour of modelling them with a single 
 type is because Python 2 did it that way.

Greg Ewing replied:

 I would say that ASCII compatible binary data is a
 *subset* of arbitrary binary data. As such, a type
 designed for arbitrary binary data is a perfectly good
 way of representing ASCII compatible binary data.

But not when you care about the ASCII-compatible part;
then you should use a subclass.

Obviously, it is too late for separating bytes from
AsciiStructuredBytes.  PBP *may* even mean that just
using the subclass for everything (and just ignoring
the ASCII-specific methods when they aren't appropriate)
was always the right implementation choice.

But in terms of explaining the text model, that
separation is important enough that

(1)  We should be reluctant to strengthen the
     "it's really just ASCII" messages.
(2)  It *may* be worth creating a virtual
 split in the documentation.

I'm willing to work on (2) if there is general consensus
that it would be a good idea.  As a rough sketch, I
would change places like

http://docs.python.org/3/library/stdtypes.html#typebytes

from:

Bytes objects are immutable sequences of single bytes.
Since many major binary protocols are based on the ASCII
text encoding, bytes objects offer several methods that
are only valid when working with ASCII compatible data
and are closely related to string objects in a variety
of other ways.

to something more like:

Bytes objects are immutable sequences of single bytes.

A Bytes object could represent anything, and is
appropriate as the underlying storage for a sound sample
or image file.

Virtual subclass ASCIIStructuredBytes


One particularly common use of bytes is to represent
the contents of a file, or of a network message.  In
these cases, the bytes will often represent Text
*in a specific encoding* and that encoding will usually
be a superset of ASCII.  Rather than create and support
an ASCIIStructuredBytes subclass, Python simply added
support for these use cases straight to Bytes objects,
and assumes that this support simply won't be used
when it does not make sense. For example, bytes literals
*could* be used to construct a sound sample, but the
literals will be far easier to read when they are used
to represent (encoded) ASCII text, such as OPEN. 

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 460 reboot

2014-01-14 Thread Jim J. Jewett



Greg Ewing replied:

 ... ASCII compatible binary data is a
 *subset* of arbitrary binary data.

I wrote:

 But in terms of explaining the text model, that
 separation is important enough that

(2)  It *may* be worth creating a virtual
 split in the documentation.

(rough sketch below)

Ethan likes the idea, but points out that the term
Virtual is confusing here.

Alas, I'm not sure what the correct term is.  In
addition to Go for it! / Don't waste your time,
I'm looking for advice on:

(A)  What word should I use instead of Virtual?
Imaginary?  Pretend?

(B)  Would it be good/bad/at least make the docs
easier to create an actual class (or alias)?

(C)  Same question for a pair of classes provided
only in the documentation, like example code.

(D)  What about an abstract class, or several?

e.g., replacing the XXX TODO of collections.abc.ByteString
with separate abstract classes for ByteSequence, String,
ByteString, and ASCIIByteString?

(ByteString already includes any bytes or bytearray instance,
so backward compatibility means the String suffix isn't
sufficient for an opt-in-by-instances class.)


 I'm willing to work on (2) if there is general consensus
 that it would be a good idea.  As a rough sketch, I
 would change places like

  http://docs.python.org/3/library/stdtypes.html#typebytes

 from:

  Bytes objects are immutable sequences of single bytes.
  Since many major binary protocols are based on the ASCII
  text encoding, bytes objects offer several methods that
  are only valid when working with ASCII compatible data
  and are closely related to string objects in a variety
  of other ways.

 to something more like:

  Bytes objects are immutable sequences of single bytes.

  A Bytes object could represent anything, and is
  appropriate as the underlying storage for a sound sample
  or image file.

  Virtual subclass ASCIIStructuredBytes
  

  One particularly common use of bytes is to represent
  the contents of a file, or of a network message.  In
  these cases, the bytes will often represent Text
  *in a specific encoding* and that encoding will usually
  be a superset of ASCII.  Rather than create and support
  an ASCIIStructuredBytes subclass, Python simply added
  support for these use cases straight to Bytes objects,
  and assumes that this support simply won't be used
  when it does not make sense. For example, bytes literals
  *could* be used to construct a sound sample, but the
  literals will be far easier to read when they are used
  to represent (encoded) ASCII text, such as OPEN.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

2014-02-24 Thread Jim J. Jewett


Victor Stinner wrote:

 Will ascii() ever emit an antislash representation?

 Try ascii(chr(0x1f)).

In which version?  I get:

ValueError: chr() arg not in range(0x110000)

 How do you plan to use this output? Write it into a socket or a file?

 When I debug, I use print  logging which both expect text string. So I
 think that b'%a' is useless.

Sad Use Case 1:
There is not yet a working implementation of the file
or wire format.  Either I am still writing it, or the
file I need to parse is coming from a partner who
configured rather than wrote the original program.

I write (or request that they write) something
recognizable to the actual stream, as a landmark.

Case 1a:  I want a repr of the same object that is
supposedly being represented in the official format,
so I can see whether the problem is bad data or
bad serialization.  

Use Case 2:
Fallback for some sort of serialization format;
I expect not to ever use the fallback in production,
but better something ugly than a failure, let alone
a crash.

Use Case 3:
Shortcut for serialization of objects whose repr is
good enough.  (My first instinct would probably be
to implement the __bytes__ special method, but if I
thought that was supposed to expose the real data,
as opposed to a serialized copy, then I would go
for %a.)
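
A minimal sketch of Use Case 1/1a, assuming b"%a" lands as proposed
(the record and the file name are invented):

    record = {"id": 17, "name": "spam"}

    landmark = b"--DEBUG--%a--" % (record,)
    # roughly b"--DEBUG--{'id': 17, 'name': 'spam'}--"

    with open("partial_wire_format.bin", "wb") as stream:
        stream.write(landmark)    # recognizable marker in the raw byte stream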


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 463: Exception-catching expressions

2014-02-24 Thread Jim J. Jewett



Yury Selivanov wrote:

 I think the Motivation section is pretty weak.

I have normally wished for this when I was (semi-
interactively) exploring a weakly structured dataset.

Often, I start with a string, split it into something
hopefully like records, and then start applying filters
and transforms.  I would prefer to write a comprehension
instead of a for loop.  Alas, without pre-editing, I
can be fairly confident that the data is dirty.


Sometimes I can solve it with a filter (assuming
that I remember and don't mind the out-of-order
evaluation):

# The if value happens first,
# so the 1/value turns out to be safe.
[1/value for value in working_list if value]

Note that this means dropping the bad data, so that
items in this list will have different indices than
those in the parent working_list.

I would rather have written:

[1/value except (TypeError, ZeroDivisionError): None
 for value in working_list]

which would keep the matching indices, and clearly
indicate where I now had missing/invalid data.



Sometimes I solve it with a clumsy workaround:

sum((e.weight if hasattr(e, 'weight') else 1.0)
for e in working_list)

But the hasattr implies that I am doing some sort of
classification based on whether or not the element has
a weight.

The true intent was to recognize that while every element
does have a weight, the representation that I'm starting
from didn't always bother to store it -- so I am repairing
that before processing.

sum(e.weight except AttributeError: 1)


Often I give up, and create a junky helper function, or several.
But to avoid polluting the namespace, I may leave it outside
the class, or give it a truly bad name:

def __only_n2(worklist):
    results = []
    for line in worklist:
        line = line.strip()
        if not line:  # or maybe just edit the input file...
            continue
        split1 = line.split(", ")
        if 7 != len(split1):
            continue
        if "n2" == split1[3]:
            results.append(split1)
    return results

worklist_n2 = __only_n2(worklist7)


In real life code, even after hand-editing the input data
to fix a few cases, I recently ended up with:

class VoteMark:
    ...
    @classmethod
    def from_property(cls, voteline):
        # print (voteline)
        count, _junk, prefs = voteline.partition(": ")
        return cls(count, prefs)

... # module level scope

def make_votes(vs=votestring):
    return [VoteMark.from_property(e) for e in vs.splitlines()]

vs=make_votes()

You can correctly point out that I was being sloppy, and that I
*should* have gone back to clean it up.  But I wouldn't have had
to clean up either the code or the data (well, not as much), if
I had been able to just keep the step-at-a-time transformations
I was building up during development:

vs = [(VoteMark(*e.strip().split(": "))
        except (TypeError, ValueError): None)
      for e in votestring.splitlines()]


Yes, the first line is still doing too much, and might be
worth a helper function during cleanup.

But it is already better than an alternate constructor that
exists only to simplify a single (outside the class) function
that is only called once.

Which in turn is better than the first draft that was so
ugly that I actually did fix it during that same work session.



 Inconvenience of dict[] raising KeyError was solved by
 introducing the dict.get() method. And I think that

 dct.get('a', 'b')

 is 1000 times better than

 dct['a'] except KeyError: 'b'

I don't.

dct.get('a', default='b')

would be considerably better, but it would still imply
that missing values are normal.  So even after argclinic
is fully integrated, there will still be times when I
prefer to make it explicit that I consider this an
abnormal case.  (And, as others have pointed out, .get
isn't a good solution when the default is expensive to
compute.)


 Consider this example of a two-level cache::
  for key in sequence:
  x = (lvl1[key] except KeyError: (lvl2[key] except KeyError: f(key)))

 I'm sorry, it took me a minute to understand what your
 example is doing.  I would rather see two try..except blocks
 than this.

Agreed -- like my semi-interactive code above, it does too much
on one line.  I don't object as much to:

for key in sequence:
    x = (lvl1[key]
         except KeyError: (lvl2[key]
                           except KeyError: f(key)))
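
For comparison, the statement form that either spelling would be
replacing is:

    for key in sequence:
        try:
            x = lvl1[key]
        except KeyError:
            try:
                x = lvl2[key]
            except KeyError:
                x = f(key)

which is clear enough, but spreads a simple chain of defaults over
eight lines.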


 Retrieve an argument, defaulting to None::
  cond = args[1] except IndexError: None

  # Lib/pdb.py:803:
  try:
  cond = args[1]
  except IndexError:
  cond = None

 cond = None if (len(args) < 2) else args[1]

This is an area where tastes will differ.

I view the first as saying that not having a cond
would be unusual, or at least a different kind of
call.

I view your version as a warning that argument
parsing will be complex, and that 

Re: [Python-Dev] PEP 463: Exception-catching expressions

2014-02-24 Thread Jim J. Jewett



Greg Ewing suggested:

 This version might be more readable:

 value = lst[2] except "No value" if IndexError


Ethan Furman asked:

 It does read nicely, and is fine for the single, non-nested, case
 (which is probably the vast majority), but how would
 it handle nested exceptions?

With parentheses.

Sometimes, the parentheses will make a complex expression ugly.
Sometimes, a complex expression should really be factored into pieces anyway.

Hopefully, these times are highly correlated.



The above syntax does lend itself somewhat naturally
to multiple *short* except clauses:

value = (lst[2]
         except "No value" if IndexError
         except "Bad Input" if TypeError)

and nested exception expressions are at least possible, but deservedly ugly:

value = (lvl1[name]
         except (lvl2[name]
                 except (compute_new_answer(name)
                         except None if AppValueError)
                 if KeyError)
         if KeyError)
  

This also makes me wonder whether the cost of a subscope 
(for exception capture) could be limited to when an
exception actually occurs, and whether that might lower
the cost enough to make the it a good tradeoff.

def myfunc1(a, b, e):
    assert "main scope e value" == e

e = "main scope e value"
value = (myfunc1(val1, val2, e)
         except e.reason if AppError as e)
assert "main scope e value" == e


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Alternative forms [was: PEP 463: Exception-catching expressions]

2014-03-06 Thread Jim J. Jewett



The PEP currently says:

 Alternative Proposals
 =

 Discussion on python-ideas brought up the following syntax suggestions::

value = expr except default if Exception [as e]

This one was rejected because of the out-of-order evaluation.

Note, however, that the (farthest left) expr is always evaluated first;
the only out-of-order evaluation is "default if Exception".

"default if Exception" is precisely the same evaluation order
(clause after the "if" skips ahead of the clause before the "if")
as in the existing if-expression, and the existing if-filters in
comprehensions.

The same justifications for that order violation generally apply here too.
You can argue that they weren't sufficient justification in the first
place, but that is water under the bridge; *re*-using out-of-order-if
shouldn't add any additional costs.

[Err... unless people take the if too literally, and treat the
Exception clause as a boolean value, instead of as an argument to the
except keyword.]

The advantages of this form get much stronger with [as e] or multiple
different except clauses, but some of them do apply to even the simplest
form.  Notably, the "say it like you would in English" that convinced
Perl still applies: "if" *without* a "then" is normally an extra condition
added after the main point:

Normally ham, but fish if it's a Friday.

(Admittedly, the word then *can* be elided (and represented by a slight
pause), and python programmers are already used to seeing it represented
only by :\n)


I also give a fair amount of weight to the fact that this form starts to
look awkward at pretty much the same time the logic gets too complicated
for an expression -- that should discourage abuse.

[The analogies to if-expressions and if-filters and to spoken English,
along with discouragement for abuse, make this my preferred form.]



...
value = expr except (Exception [as e]: default)

(and the similar but unmentioned)

value = expr except (Exception [as e] -> default)

The mapping analogy for : is good -- and is the reason to place
parentheses there, as opposed to around the whole expression.  Your
preferred form -- without the internal parentheses -- looks very
much like a suite-introduction, and not at all like the uses
where an inline colon is acceptable.

I do understand your concern that the parentheses make except (...)
look too much like a function call -- but I'm not sure that is all bad,
let alone as bad as looking like a suite introduction.

Both ":" and "->" are defined for signatures; the signature meaning
of ":" is tolerable, and the signature meaning of "->" is good.



...
value = expr except Exception [as e] continue with default

This one works for me, but only because I read continue with as a
compound keyword.  I assume the parser would too.  :D  But I do
recognize that it is a poor choice for those who see the space as a
more solid barrier.


...
value = expr except(Exception) default # Catches only the named type(s)

This looks too much like the pre-as way of capturing an exception.



value = default if expr raise Exception

(Without new keyword raises,) I would have to work really hard not to
read that as:

__temp = default
if expr:
raise Exception
value = __temp



value = expr or else default if Exception

To me, this just seems like a wordier and more awkward version of

expr except (default if Exception [as e])

including the implicit parentheses around default if Exception.


value = expr except Exception [as e] -> default

Without parens to group Exception and default, this looks too much like
an annotation describing what the expr should return.


value = expr except Exception [as e] pass default

I would assume that this skipped the statement, like an if-filter
in a comprehension.



 All forms involving the 'as' capturing clause have been deferred from
 this proposal in the interests of simplicity, but are preserved in the
 table above as an accurate record of suggestions.

Nick is right that you should specify whether it is deferred or
rejected, because the simplest implementation may lock you into
too broad a scope if it is added later.



 The four forms most supported by this proposal are, in order::

value = (expr except Exception: default)
    value = (expr except Exception -> default)

...

If there are not parentheses after except, it will be very tempting
(and arguably correct) to (at least mentally) insert them around the
first two clauses -- which are evaluated first.  But that leaks into

value = (expr except Exception): default

which strongly resembles the suite-starter :, but has very little
in common with the mapping : or the signature :.

value = (expr except Exception) -> default

which looks like an annotation, rather than a part of the value-determination.



-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ


Re: [Python-Dev] Alternative forms [was: PEP 463: Exception-catching expressions]

2014-03-07 Thread Jim J. Jewett




(Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded:

 On Fri, Mar 7, 2014 at 7:29 AM, Jim J. Jewett jimjjewett at gmail.com wrote:

 [ note that x if y already occurs in multiple contexts, and
   always evaluates y before x. ]

 Yes, but that's still out of order.

Yeah, but local consistency is more important than global guidelines.  :D

 ... *re*-using out-of-order-if shouldn't add any additional costs.


 The other thing to note is that it's somewhat ambiguous. Until you
 find that there isn't an else clause, it could be the equally valid
 expr except (default if cond else other_default), with the actual
 if Exception part still to come. 

True -- and I'm not a parser expert.  But my belief is that the
current parser allows lookahead for exactly one token, and that
the else would fit within that limit.

 ... humans reading the code have to assume style guides mightn't
 be followed.

True ... but I hope any non-trivial use of this (including use
with a non-trivial ternary if) will look bad enough to serve as
its own warning.


 The advantages of this form get much stronger with [as e] or
 multiple different except clauses, but some of them do apply
 to even the simplest form.

 Multiple different except clauses would make for an even
 messier evaluation order:

 expr1 except expr3 if expr2 except expr5 if expr4

 If you consider the exception type to be the condition, then
 this makes sense (that is, if you read it as
 if isinstance(thrown_exception, Exception));
 [but the most obvious reading is boolean; as always True]

I phrased that badly.  I agree that without parentheses for
good spacing, the above is at least ambiguous -- that is what
you get for stringing multiple clauses together without
internal grouping.

I do think parentheses help, (but are less important when there
is only a single if) and I strongly prefer that they be internal
(which you fear looks too much like calling a function named except).
In that case, it is: 

expr1 except (expr3 if expr2)

and the extension to multiple except clauses would be:

    expr1 except (expr3 if expr2,
                  expr5 if expr4)

though as I discuss later, placing parentheses there also makes a
colon or arrow more tolerable.  It does this because the nearby
parens make it look more like the existing (non-lambda) uses of
inline-colon to associate the two things on either side.  (Without
nearby brackets, the scope of the colon or arrow is more easily
taken to be the whole line.)

    expr1 except (expr2: expr3,
                  expr4: expr5)

    expr1 except (expr2 -> expr3,
                  expr4 -> expr5)

 Notably, the say it like you would in English that convinced
 Perl still applies: if *without* a then is normally an extra
 condition added after the main point:

 Normally ham, but fish if it's a Friday.

 That's not how Python words ternary if, though.

Agreed ... the "say it like you would in English" applies only
to the "expr if expr" form (proposed here and) used by comprehensions:

[1/x for x in data
 if x]


value = expr except (Exception [as e]: default)

 (and the similar but unmentioned)

 value = expr except (Exception [as e] -> default)

The parenthesizing question and the choice of tokens are considered
independent, so not all the cross-multiplications are listed.

 The mapping analogy for : is good -- and is the reason to place
 parentheses there, as opposed to around the whole expression.  Your
 preferred form -- without the internal parentheses -- looks very
 much like a suite-introduction, and not at all like the uses
 where an inline colon is acceptable.

 I have some notes on that down the bottom:

 http://www.python.org/dev/peps/pep-0463/#colons-always-introduce-suites

I know that they don't always introduce suites.

I can't speak to the lambda precedent, but I do know that I personally
often stumble when trying to parse it, so I don't consider it a good model.

The other three inline uses (dict display, slide notation, and
function parameter annotation) are effectively conjunction operators,
saying that expr1 and expr2 are bound more tightly than you would
assume if they were separated by commas.  They only occur inside
a fairly close bracket (of some sort), and if the bracket isn't
*very* close, then there are usually multiple associates-colons
inside the same bracket.

data[3:5]
data[-1:-3:-1]
def myfunc(a:int=5,
           b:str="Jim",
           c:float=3.14)
{'b': 2, 'c': 3, 'a': 1}

With parentheses after the except, the except expression will match
this pattern too -- particularly if there are multiple types of
exception treated differently.

expr1 except (expr2: expr3)

Without (preferably internal) parentheses, it will instead look like
a long line with a colon near the end, and a short continuation suite
that got moved up a line because it was only one statement long.

def nullfunc(self, a): pass

expr1 except expr3: expr2


value = expr

[Python-Dev] What is the precise problem? [was: Reference cycles in Exception.__traceback__]

2014-03-07 Thread Jim J. Jewett




On Wed Mar 5 17:37:12 CET 2014, Victor Stinner wrote:

 Python 3 now stores the traceback object in Exception.__traceback__
 and exceptions can be chained through Exception.__context__. It's
 convenient but it introduced tricky reference cycles if the exception
 object is used out of the except block.

 ... see Future.set_exception() of the ayncio module.

 ... frame.clear() raises an RuntimeError if the frame is still
 running. And it doesn't break all reference cycles.

 An obvious workaround is to store the traceback as text, but this
 operation is expensive especially if the traceback is only needed in
 rare cases.

 I tried to write views of the traceback (and frames), but
 Exception.__traceback__ rejects types other than traceback and
 traceback instances cannot be created. It's possible to store the
 traceback somewhere else and set Exception.__traceback__ to None, but
 there is still the problem with chained exceptions.

 Any idea for a generic fix to such problem?

Could you clarify what the problem actually is?  I can imagine
any of the following:


(A)  Exceptions take a lot of memory, because of all the related
details.

+ But sometimes the details are needed, so there is no good solution.


(B)  Exceptions take a lot of memory, because of all the related
details.  There is a common use case that knows it will never
need certain types of details, and releasing just those details
would save a lot of memory.  But frame.clear() picks the wrong
details to release, at least for this case.

+ So write another function (or even a method) that does work, and
  have your framework call it.  (Also see (F))

+ Instead of saving the original exception, could you instead
  create and store a new (copied?) one, which obviously won't
  (yet) be referenced by the traceback you assign to it?


(C)  Exceptions take a lot of memory, because of all the related
details.  There is a common use case that knows it can make do
with a summary of certain types of details, and releasing just
those details would save a lot of memory.  But generating the
summary is expensive.

+ It would help to have the summarize method available.
+ It would help to have feedback from gc saying when there is enough
  memory pressure to make this call worthwhile.


(D)  Exceptions are not released until cyclic gc, and so they
eat a lot of memory for a long time prior to that.

+ This may be like case B
+ Are there references that can be replaced by weak references?
+ Are there references that you can replace with weak references
  when your framework stores the exception?  (Also see (F))

(E)  Exceptions are not released even during cyclic gc, because
of ambiguity over which __del__ to run first.

+ This may be like case B or case E
+ This may be a concrete use case for the __close__ protocol.

__close__ is similar to __del__, except that it promises not
to care about order of finalization, and it is run eagerly. 
As soon as an instance is known to be in a garbage cycle,
__close__ should be run without worrying about whether other
objects also have __close__ or __del__ methods.  Hopefully,
this will break the cycle, or at least reduce the number of
objects with __del__ methods.  (Whether to require that
__close__ be idempotent, or to guarantee that it is run
only once/instance -- that would be part of the bikeshedding.)
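
A sketch of what opting in might look like -- purely hypothetical,
since nothing calls __close__ today:

    class PendingResult:
        def __init__(self):
            self.exception = None    # may point back into frames -> a cycle

        def __close__(self):
            # Eagerly drop the references that tend to form cycles; the
            # order relative to other finalizers deliberately does not
            # matter, and this should stay safe if it runs more than once.
            self.exception = None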


(F) You know what to delete (or turn into weakrefs), but can't
actually do it without changing a type.

  (F1)  Why does Exception.__traceback__ reject other objects
which are neither tracebacks nor None?

  + Can that restriction be relaxed?
  + Can you create a mixin subtype of Exception, which relaxes
the constraint, and gets used by your framework?
  + Can the restriction on creating tracebacks be relaxed?
  + Can traceback's restriction on frames' types be relaxed?
 
  (F2)  Do you need the original Exception?  (see (B))

  (F3)  Do you care about frame.clear() raising a runtime
  exception?  Could you suppress it (or, better, get clear()
  to raise something more specific, and suppress that)?
  It would still have released what memory it reasonably
  could. 

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Why not make frames? [was: Reference cycles in Exception.__traceback__]

2014-03-07 Thread Jim J. Jewett



On Thu Mar 6 16:52:56 CET 2014, Antoine Pitrou wrote:

 IMO it is absolutely out of question to allow creation of arbitrary 
 frames from Python code, because the structure and initialization of 
 frames embody too many low-level implementation details.

So?

Does any of that matter until the frame is used to actually
evaluate something?  

So what is the harm in creating a (likely partially invalid) frame
for inspection purposes?

For that matter, what is the point in tracebacks requiring frames,
as opposed to any object, with the caveat that not having the
expected attributes may cause grief -- as happens with any duck
typing?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Why not make frames? [was: Alternative forms [was: PEP 463: Exception-catching expressions]]

2014-03-09 Thread Jim J. Jewett


TL;DR:

expr except (default if exc_expr)
expr (except default if exc_expr)
expr except (exc_expr: default)
expr (except exc_expr: default)

(1)  Group the exceptions with the default they imply.
(2)  inline-: still needs () or [] or {}.
(3)  Consider the expression inside a longer line.
(3a)  Does the except expression need to be general, or would it work
  if it were limited to a subclause of variable assignments?
(3b)  What about comprehensions?




On Fri Mar 7 20:54:31 CET 2014, Chris Angelico wrote:
On Sat, Mar 8, 2014 at 5:58 AM, Jim J. Jewett jimjjewett at gmail.com wrote:
 (Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded:
 On Fri, Mar 7, 2014 at 7:29 AM, Jim J. Jewett jimjjewett at gmail.com 
 wrote:

 [ note that x if y already occurs in multiple contexts, and
   always evaluates y before x. ]

...

 I don't see except expressions as fundamentally more associated with
 if/else than with, say, an or chain, which works left to right.

I do, because of the skipping portion.

Short-circuiting operators, such as an or chain, never skip a clause
unless they are skipping *every* subsequent clause.

An if statement sometimes skips the (unlabeled in python) then
clause, but still processes the even-later else clause.

A try statement sometimes skips the remainder of the try suite but
still executes the later subordinate except and finally clauses.

Note that this only explains why I see except as more closely related
to if than to or; it isn't sufficient to justify going back to
execute the skipped clause later.  That said, going back to a previous
location is a lot easier to excuse after an error handler than in
regular code.



 Analysis of the Python standard library suggests that the single-if
 situation is *by far* the most common, to the extent that it'd hardly
 impact the stdlib at all to add multiple except clauses to the
 proposal. Do you have a strong use-case for the more full syntax?

I do not.

I dislike the arbitrary restriction, and I worry that lifting it later
(while maintaining backwards compatibility) will result in a syntax
wart, but I do not have a compelling use case for that later relaxation.



 and I strongly prefer that they [the parentheses] be internal
 (which you fear looks too much like calling a function named except).
 In that case, it is:

 expr1 except (expr3 if expr2)

 I'm still not really seeing how this is better.

For one thing, it makes it clear that the if keyword may be messing
with the order of evaluation.

I don't claim that syntax is perfect.  I do think it is less flawed
than the no-parentheses (or external parentheses) versions:

(expr1 except expr3 if expr2)
expr1 except expr3 if expr2

because the tigher parentheses correctly indicate that expr2 and expr3
should be considered as a (what-to-do-in-case-of-error) group, which
interacts (as a single unit) with the main expression.

I also think it is (very slighly) better than the
colon+internal-parentheses version:

expr1 except (expr2: expr3)

which in turn is far, far better than the colon versions with external
or missing parentheses:

(expr1 except expr2: expr3)
expr1 except expr2: expr3

because I cannot imagine reading an embedded version of either of those
without having to mentally re-parse at the colon.  An example assuming
a precedence level that may not be what the PEP proposes:

if myfunc(5, expr1 except expr2: expr3, label):
for i in range(3, 3*max(data) except TypeError: 9, 3):
   ...

if myfunc(5, (expr1 except expr2: expr3), label):
for i in range(3, (3*max(data) except TypeError: 9), 3):
   ...

if myfunc(5, expr1 except (expr2: expr3), label):
for i in range(3, 3*max(data) except (TypeError: 9), 3):
   ...

if myfunc(5, expr1 except (expr2: expr3), label):
for i in range(3, 3*max(data) (except TypeError: 9), 3):
   ...

if myfunc(5, expr1 except (expr3 if expr2), label):
for i in range(3, 3*max(data) (except 9 if TypeError), 3):
   ...

if myfunc(5, expr1 except (expr3 if expr2), label):
for i in range(3, 3*max(data) except (9 if TypeError), 3):

myarg = expr1 except (expr3 if expr2)
if myfunc(5, myarg, label):
limit = 3*max(data) except (9 if TypeError)
for i in range(3, limit, 3):

Yes, I would prefer to create a variable naming those expressions,
but these are all still simple enough that I would expect to have
to read them.  (I like constructions that get ugly just a bit faster
than they get hard to understand.)  If I have to parse any of them,
the ones at the bottom are less difficult than the ones at the top.



 With the colon version, it looks very much like dict display,

which is good, since that is one of the acceptable uses of inline-colon.

 only with different brackets around it; in some fonts, that'll be
 very easily confused.

I've had more trouble with comma vs period than

[Python-Dev] Scope issues [was: Alternative forms [was: PEP 463: Exception-catching expressions]]

2014-03-09 Thread Jim J. Jewett



On Fri Mar 7 20:54:31 CET 2014, Chris Angelico wrote:
 On Sat, Mar 8, 2014 at 5:58 AM, Jim J. Jewett jimjjewett at gmail.com wrote:
 (Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded:

 ...[as-capturing is] deferred until there's a non-closure means of
 creating a sub-scope.

 The problem is that once it is deployed as leaking into the parent
 scope, backwards compatibility may force it to always leak into
 the parent scope.  (You could document the leakage as a bug or as
 implementation-defined, but ... those choices are also sub-optimal.)

 It'll never be deployed as leaking, for the same reason that the
 current 'except' statement doesn't leak:

I don't think that is the full extent of the problem.  From Nick's
description, this is a nasty enough corner case that there may
be glitches no one notices in time.

The PEP should therefore explicitly state that implementation details
may force the deferral to be permanent, and that this is considered an
acceptable trade-off.


-jJ

--

Sorry for the botched subject line on the last previous message.
If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] What is the precise problem? [was: Reference cycles in Exception.__traceback__]

2014-03-10 Thread Jim J. Jewett


On Mon Mar 10 18:56:17 CET 2014 (and earlier quotes), Maciej Fijalkowski wrote:

Maciej: You should not rely on __del__s being called timely one way or
Maciej: another. Why would you require this for the program to work
Maciej: correctly in the particular example of __traceback__?

To the extent that I understand, he isn't requiring it for correctness;
he is instead saying that without timely __del__, the Quality of
Implementation suffers.

I suspect there are aspects of tulip (or event processing in general)
that make it more common for the frame graph to be painfully cyclic,
so that live frames keep dead ones from being collected.

It may also be more common to have multiple __del__ methods in the
same cycle, if cycles are created by a framework.  So the problems
aren't new, but they may have become considerably more painful.

Victor: For asyncio, it's very useful to see unhandled exceptions as early as
Victor: possible. Otherwise, your program is blocked and you don't know why.

...

Maciej: twisted goes around it by attaching errback by hand. Would that work
Maciej: for tulip?

Maciej: deferred.addErrback(callback_that_writes_to_log)

What do you mean by hand?  Does the framework automatically add a
log the exception errback to every task, or every task that doesn't
have its own errback of some sort?  Or do you mean that users should
do so by hand, but that it is a well-known recipe?


Maciej: I'm very skeptical about changing details of __traceback__ and
Maciej: frames, just in order to make refcounting work (since it would
Maciej: create something that would not work on pypy for example).

How about just loosening some constraints on exceptions, in order to
permit more efficient operation, but in a way that may be particularly
useful to a refcounting scheme?  Can I assume that you don't object to
frame.clear()?  How about a hypothetical traceback.pack() that
made it easier to reclaim memory held by frame/traceback cycles?
If standard traceback printing were the only likely future use, each
frame/traceback pair could be replaced by 4 pointers, and allocating
space/copying those 4 would be the only work that wasn't already needed
for the eventual deallocation.

Today, the setters for __cause__, __context__, and __traceback__ do
typechecks to ensure that those properties are (None or) the
expected type; __traceback__ doesn't even allow subclasses.  The
constructors for frame and traceback are similarly strict.

What would be the harm in allowing arbitrary objects, let alone
a few specific alternative implementations?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Keyword meanings [was: Accept just PEP-0426]

2012-11-20 Thread Jim J. Jewett

 
Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:

 The use of multiple names in this field *must not* be used for
 bundling distributions together. It is intended for use when
 projects are forked and merged over time ...

(1)  Then how *should* the bundle-of-several-components case be
represented?

(2)  How is 'Provides-Dist' different from 'Obsoletes-Dist'?
The only difference I can see is that it may be a bit more polite
to people who do want to install multiple versions of a (possibly
abstract) package.


-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] dict and required hashing

2014-04-18 Thread Jim J. Jewett
(1)  I believe the recent consensus was that the number of comparisons
made in a dict lookup is an implementation detail.  (Please correct me
if I am wrong.)

(2)  Is "the item will be hashed at least once" a language guarantee?

For small mappings, it might well be more efficient to just store the
2-3 key/value pairs and skip the bucket calculation.
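
As an illustration of the sort of small-mapping strategy I mean (a
sketch only; this is not how CPython's dict works):

    class SmallMap:
        """Keys live in a flat list and are compared with ==, so a lookup
        never calls hash().  An unhashable key would be accepted here,
        which is exactly the delayed-error concern below."""

        def __init__(self):
            self._items = []    # list of (key, value) pairs

        def __setitem__(self, key, value):
            for i, (k, _v) in enumerate(self._items):
                if k == key:    # equality only, no hash()
                    self._items[i] = (key, value)
                    return
            self._items.append((key, value))

        def __getitem__(self, key):
            for k, v in self._items:
                if k == key:
                    return v
            raise KeyError(key)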

On the other hand, if a key is not hashable, discovering that long
after it has already been added to the dict is suboptimal.

Of course, that sort of delayed exception can already happen if it is
the __eq__ method that is messed up ...

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] API and process questions (sparked by Claudiu Popa on 16104

2014-04-28 Thread Jim J. Jewett
(1)  Should fixes to a docstring go in with a patch, even if they
aren't related to the changing functionality?

Byte-code compilation has several orthogonal parameters.  Most -- but
not all -- are documented in the docstring.  (Specifically, there is
no explanation of the rx parameter which acts as a filter, and no
mention that symbolic links to directories are ignored.)

It is best if a commit changes one small thing at a time.
On the other hand, Nick recently posted that the minimal overhead of a
patch commit is about half an hour.

Is that overhead enough to override the one-issue-per-patch guideline?

(2)  The patch adds new functionality to use multiple processes in
parallel.  The normal parameter values are integers indicating how
many processes to use.  The parameter also needs two special values --
one to indicate use os.cpu_count, and the other to indicate don't
use multiprocessing at all.

(A)  Is there a Best Practices for this situation, with two odd cases?

(B)  Claudiu originally copied the API from a similar APU for
regrtest.  What is the barrier for do it sensibly vs stick with
precedent elsewhere?  (Apparently regrtest treats any negative number
as a request for the cpu_count calculation;  I suspect that -5 is
more likely to be an escaping error for 5 than it is to be a real
request for auto-calculation that just happened to choose -5 instead
of -1.)

(C)  How important is it to keep the API consistent between a
top-level CLI command and the internal implementation?  At the the
moment, the request for cpu_count is handled in the CLI wrapper, and
not available to interactive callers.  On the other hand, interactive
callers could just call cpu_count themselves...

(D)  How important is it to maintain consistency with other uses of
the same tool -- multiprocessing has its own was of requesting
auto-calculation.  (So someone used to multiprocessing might assume
that None meant to auto-calculate, as opposed to don't use
multiprocessing at all.)
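
To make the questions above concrete, here is one hypothetical way the
special values could be normalized (the name and the exact conventions
are invented for illustration; this is not the patch's API):

    import os

    def normalize_workers(processes):
        # None   -> don't use multiprocessing at all
        # 0      -> ask os.cpu_count()
        # n >= 1 -> use exactly n worker processes
        if processes is None:
            return None                   # serial execution
        if processes == 0:
            return os.cpu_count() or 1    # auto-detect, fall back to 1
        if isinstance(processes, int) and processes > 0:
            return processes
        raise ValueError("processes must be None, 0, or a positive integer")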

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104

2014-04-28 Thread Jim J. Jewett
On Mon, Apr 28, 2014 at 12:56 PM, Charles-François Natali
cf.nat...@gmail.com wrote:
 Why would the user care if multiprocessing is used behind the scene?

Err ... that was another set of questions that I forgot to ask.

(A)  Why bother raising an error if multiprocessing is unavailable?
After all, there is a perfectly fine fallback...

On the other hand, errors should not pass silently.  If a user has
explicitly asked for multiprocessing, there should be some notice that
it didn't happen.  And builds are presumably something that a
developer will monitor to respond to the Exception.

(A1)  What sort of Error?  I'm inclined to raise the original
ImportError, but the patch prefers a ValueError.

(B)  Assuming the exception, I suppose your question adds a 3rd
special case of Whatever the system suggests, and I don't care
whether or not it involves multiprocessing.

 It would be strange for processes=1 to fail if multiprocessing is not
 available.

As Claudiu pointed out, processes=1 should really mean 1 worker
process, which is still different from do everything in the main
process.  I'm not sure that level of control is really worth the
complexity, but I'm not certain it isn't.

 processes <= 0: use os.cpu_count()

I could understand doing that for 0 or -1; what is the purpose of
doing it for both, let alone for -4?

Are we at the point where the parameter should just take positive
integers or one of a set of specified string values?

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Internal representation of strings and Micropython (Steven D'Aprano's summary)

2014-06-05 Thread Jim J. Jewett


Steven D'Aprano wrote:

 (1) I asked if it would be okay for MicroPython to *optionally* use 
 nominally Unicode strings limited to ASCII. Pretty much the only 
 response to this as been Guido saying That would be a pretty lousy 
 option, and since nobody has really defended the suggestion, I think we 
 can assume that it's off the table.

Lousy is not quite the same as forbidden.

Doing it in good faith would require making the limit prominent
in the documentation, and raising some sort of CharacterNotSupported
exception (or at least a warning) whenever there is an attempt to
create a non-ASCII string, even via the C API.

 (2) I asked if it would be okay ... to use an UTF-8 implementation 
 even though it would lead to O(N) indexing operations instead of O(1). 
 There's been some opposition to this, including Guido's:

[Non-ASCII character removed.]

It is bad when quirks -- even good quirks -- of one implementation lead
people to write code that will perform badly on a different Python
implementation.  CPython has at least delayed obvious optimizations for
this reason.  Changing idiomatic operations from O(1) to O(N) is big
enough to cause a concern.

That said, the target environment itself apparently limits N to small
enough that the problem should be mostly theoretical.  If you want to
be good citizens, then do put a note in the documentation warning that
particularly long strings are likely to cause performance issues unique
to the MicroPython implementation.

(Frankly, my personal opinion is that if you're really optimizing for
space, then long strings will start getting awkward long before N is
big enough for algorithmic complexity to overcome constant factors.)

 ... those strings will need to be transcoded to UTF-8 before they
 can be written or printed, so keeping them as UTF-8 ...

That all assumes that the external world is using UTF-8 anyhow.

Which is more likely to be true if you document it as a limitation
of MicroPython.

 ... but many strings may never be written out:

print(prefix + s[1:].strip().lower().center(80) + suffix)

 creates five strings that are never written out and one that is.

But looking at the actual strings -- UTF-8 doesn't really hurt
much.  Only the slice and center() are more complex, and for a
string less than 80 characters long, O(N) is irrelevant.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7

2014-06-24 Thread Jim J. Jewett



On 6/24/2014 4:22 AM, Serhiy Storchaka wrote:
 I submitted a number of patches which fixes currently broken
 Unicode-disabled build of Python 2.7 (built with --disable-unicode
 configure option). I suppose this was broken in 2.7 when C
 implementation of the io module was introduced.

It has frequently been broken.  Without a buildbot, it will continue
to break.  I have given at least a quick look at all your proposed
changes; most are fixes to test code, such as skip decorators.

People checked in tests without the right guards because it did work
on their own builds, and on all stable buildbots.  That will probably
continue to happen unless/until a --disable-unicode buildbot is added.

It would be good to fix the tests (and actual library issues).
Unfortunately, some of the specifically proposed changes (such as
defining and using _unicode instead of unicode within python code)
look to me as though they would trigger problems in the normal build
(where the unicode object *does* exist, but would no longer be used).
Other changes, such as the use of \x escapes, appear correct, but make
the tests harder to read -- and might end up removing a test for
correct unicode functionality across different spellings.

Even if we assume that the tests are fine, and I'm just an idiot who
misread them, the fact that there is any confusion means that these
particular changes may be tricky enough to be a bad tradeoff for 2.7.

It *might* work if you could make a more focused change.  For example,
instead of leaving the 'unicode' name unbound, provide an object that
simply returns false for isinstance and raises a UnicodeError for any
other method call.  Even *this* might be too aggressive for 2.7, but the
fact that it would only appear in the --disable-unicode builds, and
would make them more similar to the regular build are points in its
favor.
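
A sketch of what I mean by such a placeholder (Python 2 flavor; the
name is invented and this is not from any posted patch):

    class _DisabledUnicodeType(object):
        # isinstance() consults __instancecheck__ on the type of its second
        # argument, so isinstance(x, unicode) would quietly return False;
        # any other use fails loudly with UnicodeError instead of NameError.
        def __instancecheck__(self, instance):
            return False

        def __call__(self, *args, **kwargs):
            raise UnicodeError("interpreter was built with --disable-unicode")

        def __getattr__(self, name):
            raise UnicodeError("interpreter was built with --disable-unicode")

    # unicode = _DisabledUnicodeType()   # what such a build might bind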

Before doing that, though, please document what the --disable-unicode
mode is actually *supposed* to do when interacting with byte-streams
that a standard defines as UTF-8.  (For example, are the changes to
_xml_dumps and _xml_loads at
http://bugs.python.org/file35758/multiprocessing.patch
correct, or do those functions assume they get bytes as input, or
should the functions raise an exception any time they are called?)


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] sum(...) limitation

2014-08-04 Thread Jim J. Jewett



Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in
https://mail.python.org/pipermail/python-dev/2014-August/135623.html ) wrote:


 Andrea Griffini agriff at tin.it wrote:

However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.

 hm could this be a pure python case that would profit from temporary
 elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html 
 ]?

 lists could declare the tp_can_elide slot and call list.extend on the
 temporary during its tp_add slot instead of creating a new temporary.
 extend/realloc can avoid the copy if there is free memory available
 after the block.

Yes, with all the same problems.

When dealing with a complex object, how can you be sure that __add__
won't need access to the original values during the entire computation?
It works with matrix addition, but not with matrix multiplication.
Depending on the details of the implementation, it could even fail for
a sort of sliding-neighbor addition similar to the original justification.

Of course, then those tricky implementations should not define an
_eliding_add_, but maybe the builtin objects still should?  After all,
a plain old list is OK to re-use.  Unless the first evaluation to create
it ends up evaluating an item that has side effects...
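
For a plain list, the Python-level picture of what elision would buy is
roughly this (a sketch of the effect only, not of the interpreter
machinery):

    from functools import reduce

    lists = [[1, 2, 3], [4], [], [5, 6]]

    # Today: each + inside sum() builds a brand-new list, so the total
    # work is quadratic in the combined length.
    quadratic = sum(lists, [])

    # With elision, the intermediate would be reused and extended in
    # place -- safe only because a fresh plain list has no other owners.
    def _extend(acc, item):
        acc.extend(item)
        return acc

    linear = reduce(_extend, lists, [])

    assert quadratic == linear == [1, 2, 3, 4, 5, 6]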

In the end, it looks like a lot of machinery (and extra checks that may
slow down the normal small-object case) for something that won't be used
all that often.

Though it is really tempting to consider a compilation mode that assumes
objects and builtins will be normal, and lets you replace the entire
above expression with compile-time [1, 2, 3, 4, 5, 6].  Would writing
objects to that stricter standard and encouraging its use (and maybe
offering a few AST transforms to auto-generate the out-parameters?) work
as well for those who do need the speed?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Backwards compatibility after certificate autovalidation

2014-09-08 Thread Jim J. Jewett



Summary:  There needs to be a simple way to opt out at install time.
It would be far better to offer more fine-grained control, but leaving
that better solution to downstream is acceptable.


On 3 September 2014 01:19, Antoine Pitrou solipsis at pitrou.net wrote:

 RFC 2818 (HTTP over TLS) has the following language in section 3.1:

 If the hostname is available, the client MUST check it against the
 server's identity as presented in the server's Certificate message,
 in order to prevent man-in-the-middle attacks.

 If the client has external information as to the expected identity of
 the server, the hostname check MAY be omitted.

This second case is pretty common, in my experience.  I still see it on
the public internet, but mismatches are almost the expected case on the
intranet, and many installation guides begin by saying to ignore the
security warnings.

I think it best not to name my employer in this context, but I work for
an IT firm large enough that you've heard of it.  As bad as our internal
situation is, it is still better than a typical client's infrastructure,
except that clients often have fewer surfaces to expose in the first place.

Internal websites and applications tend to have information that needs
protection only because saying otherwise requires a long bureaucratic
process with little payoff.  (Also true at many clients.)  Nick has
already posted a subset of the reasons why a site may be signed with
a certificate that is self-signed, expired, and/or limited to the wrong
hostname/subdomain.  

In the long run, I agree that it is better to default to secure.  But
in the short and medium term, there has to be a workaround, and I would
prefer that the simplest workaround not be retire the application, and
don't use python again.

I believe that the minimal acceptable workaround is that the Release
Notes have an URL pointing to an install-time recipe telling the admin
how to change the default back globally.

Examples of good enough:
Add this file to site-packages 
Add this environment variable with this setting 
Add this command line switch to your launch script 
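
For concreteness, the "add this file to site-packages" flavor could be
as small as the following, assuming the patchable default-context hook
that has been discussed stays available:

    # sitecustomize.py -- restore the old no-verification default
    # process-wide; dropped in once at install time.
    import ssl

    try:
        ssl._create_default_https_context = ssl._create_unverified_context
    except AttributeError:
        # Older interpreter: certificates were not being verified anyway.
        pass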

Examples of not good enough:
Edit your application to change ...
Edit your system store ... (affecting more than python)

Obviously, it would be great to offer finer control, so that the
stricter default can be used when it is OK.  (Per installation?  Per
application?  Per run?  Per domain?  Per protocol?  Per certificate?
Per rejection reason?  Treat anything in subdomain1.example.com as
valid for hostname.example.com?  Self-signing is OK for this IP range?)
I would be pleasantly surprised if this level of API can even be
standardized in time, and I agree that it is reasonable to leave it
to 3rd party modules and downstream distributions.

But I think Python itself should provide at least the single big
hammer -- and that hammer should be something that can be used once
at installation time (perhaps by changing the launch script), instead
of requiring user interaction.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Backwards compatibility after certificate autovalidation

2014-09-08 Thread Jim J. Jewett
On Mon, Sep 8, 2014 at 3:44 PM, Cory Benfield c...@lukasa.co.uk wrote:
 On 8 September 2014 18:23, Jim J. Jewett jimjjew...@gmail.com wrote:
 Summary:  There needs to be a simple way to opt out at install time.
 It would be far better to offer more fine-grained control, but leaving
 that better solution to downstream is acceptable.

 Does this argument apply to a hypothetical 2.7 backport of this
 change, or does it apply to making the change in 3.5? (Or of course
 both.)

I believe the argument applies even to 3.5, given that there was no
deprecation period.  The concern is obviously stronger for maintenance
releases.

I am not saying that secure-by-default should wait until 3.6; I
am saying that the rush requires even more attention than usual to
backwards compatibility.

This actually argues *for* backporting the fix as at least opt-in, so
that 2.7/3.4 can serve as the "make your changes now, test them
without all the other new features" releases.


Nick's suggestion of a monkey-patching .pth file would be sufficient
backwards compatibility support, if the recipe were referenced from
the release notes (not just the python lib documentation).
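
For concreteness, a minimal sketch of the sort of recipe I have in mind,
assuming private hooks along the lines of ssl._create_default_https_context
and ssl._create_unverified_context (the exact names are an implementation
detail, and version-dependent):

    # sitecustomize.py -- restore the old (unverified) default, process-wide.
    import ssl

    try:
        _unverified = ssl._create_unverified_context
    except AttributeError:
        pass   # older Python: certificates were never verified by default anyway
    else:
        ssl._create_default_https_context = _unverified

A .pth file whose single line starts with "import" and performs the same
assignment would satisfy the "Add this file to site-packages" example from
my earlier message.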



Support for partial opt-in -- whether per-process, per call, per
address, etc -- would be nice, but it isn't required for backwards
compatibility.

I think that means an -X option for noverifyhttps should NOT be
added.  It doesn't get users closer to the final solution; it just
adds the noise of a different workaround.


I assume that adding _unverified_urlopen or urlopen(context=...) do
provide incremental improvements compatible with the eventual full
opt-in.  If so, adding them is probably reasonable, but I think the
PEP should explicitly list all such approved half-measures as a guard
against API feature creep.
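
For example, the context= form would let a Developer fix one known-bad call
site without touching the global default.  A sketch, assuming the proposed
parameter and the private unverified-context helper (the URL is a placeholder):

    import ssl
    import urllib.request

    insecure = ssl._create_unverified_context()   # opt out for this call only
    with urllib.request.urlopen("https://intranet.example/status",
                                context=insecure) as resp:
        data = resp.read()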

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Backwards compatibility after certificate autovalidation

2014-09-09 Thread Jim J. Jewett
On Tue, Sep 9, 2014 at 12:11 PM, Christian Heimes christ...@python.org wrote:
 On 09.09.2014 05:03, Nick Coghlan wrote:

 On 9 Sep 2014 10:48, Jim J. Jewett jimjjew...@gmail.com
 mailto:jimjjew...@gmail.com wrote:
 From Guido's and your feedback, I think we may need two things to
 approve this for 3.4.2 (putting 2.7 aside for now):

 1. context parameter support in urllib.request (to opt out on a
 per-call basis)
 2. a documented way to restore the old behaviour via sitecustomize
 (which may involve monkeypatching)

 What's with our plan to introduce sslcustomize? Is the idea for a
 configuration module and named contexts off the table?

In a perfect world, half-measures would not be needed, and so neither
would sslcustomize.

In the real world, half-measures are needed, but offering too many of
them adds so much confusion that things can actually get worse in
practice.

In other words, sslcustomize could be great, but getting it wrong
would be a step backwards -- so start it as a 3rd party module.  Since
the biggest users are likely supported customers of downstream
distributions, it makes sense to let them take the lead, though I'm
sure they would appreciate a proof of concept.

 I still prefer the general idea over the monkey patching idea because it
 provides a clean but simple interface for structured configuration.
 Monkey patching of stdlib modules is ugly and error-prone.

The primary use case for monkey patching is to support Separation of
Roles.  (Exact titles will of course differ by business.)

If you need structured configuration, then you are already treating
some calls differently from others, which means that you are already
doing partial remediation.  I agree that monkey patching is the wrong
choice if you are doing partial remediation.

But this partial remediation also implies that a Developer and
Business Owner are involved to decide which calls need to be
changed, and whether to change the call vs dropping the functionality
vs convincing the owner of the other end of the connection to do
things right in the first place.

A Developer in charge of her own environment doesn't need to monkey
patch -- but she could just do the right thing today, or switch to a
framework that does. sslcustomize may be a really good way for her to
document "these are the strange exceptions in our existing
environment", if it is done right.

A Deployment Engineer may not even know python, and is certainly not
authorized to make changes beyond configuration.  Convincing someone
that a .py file is a configuration knob probably requires an exception
that is painful to get.  (And saying "oh, this is just where we list
security stuff that we're ignoring" won't make it easier.)  Changing
the urlopen calls would therefore be unacceptable even if source
code were available -- and sometimes it isn't.

The Deployment Engineer is often responsible for upgrading the
infrastructure components (possibly including python) for security
patches, so he has to be able to deploy 3.4.x or 2.7.y (though
*probably* not 3.5.0) without any changes to the application itself --
and usually without access to whatever regression tests the
application itself uses.  (Ideally, someone else who does have that
access is involved, but ... not always.)

What the Deployment Engineer *can* do is modify the environment around
the application.  He can write a shell script that sets environment
variables and or command line options.  He can probably even add a
required component -- which might in practice just be a pre-written
module like sslcustomize, or a .pth file that does the monkey patch on
each launch.  But *adding* such a component is orders of magnitude
simpler (from a bureaucratic perspective) than *modifying* one that
already exists.

The end user often can't do anything outside the application's own UI,
which is why the change has to be handled once at deployment, instead
of as break-fix per call site or per bad certificate.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-12 Thread Jim J. Jewett



On September 11, 2014, Jeff Allen wrote:

 ... the area of code point
 space used for the smuggling of bytes under PEP-383 is not a 
 Unicode Private Use Area, but a portion of the trailing surrogate 
 range. This is a code violation, which I imagine is why 
 surrogateescape is an error handler, not a codec.

True, but I believe that is a CPython implementation detail.

Other implementations (including Jython) should implement the
surrogateescape API, but I don't think it is important to use the
same internal representation for the invalid bytes.

(Well, unless you want to communicate with external tools (GUIs?)
that are trying to work directly with that particular internal
encoding (effectively bytes rather than strings) when communicating
with Python.)
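
To make the contract concrete, this is the observable behaviour that needs
to be preserved (the values are fixed by PEP 383; how an implementation
stores them internally is its own business):

    raw = b"abc\xff"                        # 0xff is not valid UTF-8
    text = raw.decode("utf-8", "surrogateescape")
    assert text == "abc\udcff"              # PEP 383 maps byte 0xFF to U+DCFF
    assert text.encode("utf-8", "surrogateescape") == raw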

 lone surrogates preclude a naive use of the platform string library

Invalid input often causes problems.  Are you saying that there are
situations where the platform string library could easily handle
invalid characters in general, but has a problem with the specific
case of lone surrogates?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog

2014-09-15 Thread Jim J. Jewett



On Sat Sep 13 00:16:30 CEST 2014, Jeff Allen wrote:

 1. Java does not really have a Unicode type, therefore not one that 
 validates. It has a String type that is a sequence of UTF-16 code units. 
 There are some String methods and Character methods that deal with code 
 points represented as int. I can put any 16-bit values I like in a String.

Including lone surrogates, and invalid characters in general?

 2. With proper accounting for indices, and as long as surrogates appear 
 in pairs, I believe operations like find or endswith give correct 
 answers about the unicode, when applied to the UTF-16. This is an 
 attractive implementation option, and mostly what we do.

So use it.  The fact that you're having to smuggle bytes already
guarantees that your data is either invalid or misinterpreted, and
bug-free isn't possible.

In terms of best-effort, it is reasonable to treat the smuggled bytes
as representing a character outside of your unicode repertoire -- so
it won't ever match entirely valid strings, except perhaps via a
wildcard.  And it should still work for
   .endswith(the same invalid characters).

 3. I'm fixing some bugs where we get it wrong beyond the BMP, and the 
 fix involves banning lone surrogates (completely).  At present you can't 
 type them in literals but you can sneak them in from Java.

So how will you ban them, and what will you do when some java class
sends you an invalid sequence anyhow?  That is exactly the use case
for these smuggled bytes... 

If you distinguish between a fully constructed PyString and a 
code-unit-sequence-that-could-be-made-into-a-PyString-later,
then you could always have your constructor return an InvalidPyString
subclass on the rare occasions when one is needed.

If you want to avoid invalid surrogates even then, just use the
replacement character and keep a separate list of original
characters that got replaced in this string -- a hassle, but no
worse than tracking indices for surrogates.

 4. I think (with Antoine) if Jython supported PEP-383 byte smuggling, it 
 would have to do it the same way as CPython, as it is visible. It's not 
 impossible (I think), but is messy. Some are strongly against.

If you allow direct write access to the underlying charsequence
(as CPython does to C extensions), then you can't really ban
invalid sequences.  If callers have to go through an API -- even
something as minimal as  getBytes or getChars -- then you can use
whatever internal representation you prefer.  Hopefully, the vast
majority of strings won't actually have smuggled bytes.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 479

2014-11-29 Thread Jim J. Jewett
I have a strong suspicion that I'm missing something; I have been
persuaded both directions too often to believe I have a grip on the
real issue.

So I'm putting out some assumptions; please tell me if I'm wrong, and
maybe make them more explicit in the PEP.

(1)  The change will only affect situations where StopIteration is
currently raised as an Exception -- i.e., it leaks past the bounds of
a loop.

(2)  This can happen because of an explicit raise StopIteration.  This
is currently a supported idiom, and that is changing with PEP 479.

(2a)  Generators in the unwind path will now need to catch and reraise.

(3)  It can also happen because of an explicit next() call (as
opposed to the implicit next of a loop).
This is currently supported; after PEP 479, that explicit next() call should
be wrapped in a try statement, so that the intent will be explicit.

(4)  It can happen because of "yield from" yielding from an iterator,
rather than a generator?

(5)  There is no other case where this can happen?  (So the generator
comprehension case won't matter unless it also includes one of the
earlier cases.)
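
For what it's worth, here is how I read (2a)/(3) in code, using the
__future__ opt-in that the PEP proposes (my sketch, not an example from
the PEP itself):

    from __future__ import generator_stop   # PEP 479 semantics

    def first_items(iterators):
        for it in iterators:
            try:
                yield next(it)               # explicit next() call
            except StopIteration:
                return                       # previously implicit; now spelled out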

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] hg vs Github [was: PEP 481 - Migrate Some Supporting Repositories to Git and Github]

2014-12-01 Thread Jim J. Jewett



M. Cepl asked:

 What I really don't understand is why this discussion is hg v.  
 GitHub, when it should be hg v. git. Particular hosting is 
 a secondary issue

I think even the proponents concede that git isn't better enough
to justify a switch in repositories.

They do claim that GitHub (the whole environment; not just the
hosting) is so much better that a switch to GitHub is justified.

Github + hg offers far fewer benefits than Github + git, so also
switching to git is part of the price.  Whether that is an
intolerable markup or a discount is disputed, as are the value
of several other costs and benefits.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] My thinking about the development process

2014-12-08 Thread Jim J. Jewett


Brett Cannon wrote:
 4. Contributor creates account on bugs.python.org and signs the
   [contributor agreement](https://www.python.org/psf/contrib/contrib-form/)

Is there an expiration on such forms?  If there doesn't need to be
(and one form is good for multiple tickets), is there an objection
(besides "not done yet") to making "signed the form" part of the bug
reporter account, and required in order to submit to the CI process?  (An "I
can't sign yet, bug me later" option would allow the current workflow
without the "this isn't technically a patch" workaround for small enough
patches from those with slow-moving employers.)


 There's the simple spelling mistake patches and then there's the
 code change patches.

There are a fair number of one-liner code patches; ideally, they
could also be handled quickly.

 For the code change patches, contributors need an easy way to get a hold of
 the code and get their changes to the core developers.

For a fair number of patches, the same workflow as spelling errors is
appropriate, except that it would be useful to have an automated state
saying "yes, this currently merges fine", so that committers can focus
only on patches that are (still) at least that ready.

 At best core developers tell a contributor please send your PR
 against 3.4, push-button merge it, update a local clone, merge from
 3.4 to default, do the usual stuff, commit, and then push;

Is it common for a patch that should apply to multiple branches to fail
on some but not all of them?

In other words, is there any reason beyond "not done yet" that submitting
a patch (or pull request) shouldn't automatically create a patch per
branch, with pushbuttons to test/reject/commit?

 Our code review tool is a fork that probably should be
 replaced as only Martin von Loewis can maintain it.

Only he knows the innards, or only he is authorized, or only he knows
where the code currently is/how to deploy an update?

I know that there were times in the (not-so-recent) past when I had
time and willingness to help with some part of the infrastructure, but
didn't know where the code was, and didn't feel right making a blind
offer.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] libffi embedded in CPython

2014-12-18 Thread Jim J. Jewett


On Thu, Dec 18, 2014, at 14:13, Maciej Fijalkowski wrote:
 ... http://bugs.python.org/issue23085 ...
 is there any reason any more for libffi being included in CPython?

[And why a fork, instead of just treating it as an external dependency]

Benjamin Peterson responded:

 It has some sort of Windows related patches. No one seems to know
 whether they're still needed for newer libffi. Unfortunately, ctypes
 doesn't currently have a maintainer.

Are any of the following false?

(1)  Ideally, we would treat it as an external dependency.

(2)  At one point, it was intentionally forked to get in needed
patches, including at least some for 64 bit windows with MSVC.

(3)  Upstream libffi maintenance has picked back up.

(4)  Alas, that means the switch merge would not be trivial.

(5)  In theory, we could now switch to the external version.
[In particular, does libffi have a release policy such that we
could assume the newest released version is safe, so long as
our integration doesn't break?]

(6)  By its very nature, libffi changes are risky and undertested.
At the moment, that is also true of its primary user, ctypes.

(7)  So a switch is OK in theory, but someone has to do the
non-trivial testing and merging, and agree to support both libffi
and ctypes in the future.  Otherwise, stable wins.

(8)  The need for future support makes this a bad candidate for
patches wanted/bug bounty/GSoC.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] libffi embedded in CPython

2014-12-22 Thread Jim J. Jewett


On Thu, Dec 18, 2014, at 14:13, Maciej Fijalkowski wrote:
 ... http://bugs.python.org/issue23085 ...
 is there any reason any more for libffi being included in CPython?


Paul Moore wrote:
 Probably the easiest way of moving this forward would be for someone
 to identify the CPython-specific patches in the current version ...

Christian Heimes wrote:
 That's easy. All patches are tracked in the diff file
 https://hg.python.org/cpython/file/3de678cd184d/Modules/_ctypes/libffi.diff

That (200+ lines) doesn't seem to have all the C changes, such as the
win64 sizeof changes from issue 11835.

Besides http://bugs.python.org/issue23085, there is at least
http://bugs.python.org/issue22733
http://bugs.python.org/issue20160
http://bugs.python.org/issue11835

which sort of drives home the point that making sure we have a
good merge isn't trivial, and this isn't an area where we should
just assume that tests will catch everything.  I don't think it
is just a quicky waiting on permission.

I've no doubt that upstream libffi is better in many ways, but
those are ways people have already learned to live with.  

That said, I haven't seen any objections in principle, except
perhaps from Steve Dower in the issues.  (I *think* he was
just saying not worth the time to me, but it was ambiguous.)

I do believe that Christian or Maciej *could* sort things out well
enough; I have no insight into whether they have (or someone else
has) the time to actually do so.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 441 - Improving Python ZIP Application Support

2015-02-18 Thread Jim J. Jewett


Barry Warsaw wrote:
 I don't know exactly what the procedure would be to claim .pyz for *nix,
 e.g. updating /etc/mime.types, but I think the PEP should at least mention
 this.  I think we want to get as official support for .pyz files on *nix as
 possible.

Paul Moore wrote:
 I'll add a note to the PEP, but I have no idea how we would even go
 about that, so that's all I can do, unfortunately.

Are you just looking for 

http://www.iana.org/assignments/media-types/media-types.xhtml

and its references, including the registration procedures

http://tools.ietf.org/html/rfc6838#section-4.2.5

and the application form at

http://www.iana.org/form/media-types

? 

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 441 - Improving Python ZIP Application Support

2015-02-19 Thread Jim J. Jewett
On Wed, Feb 18, 2015 at 4:16 PM, Paul Moore p.f.mo...@gmail.com wrote:
 On 18 February 2015 at 20:48, Jim J. Jewett jimjjew...@gmail.com wrote:
 Barry Warsaw wrote:
 I don't know exactly what the procedure would be to claim .pyz for *nix,
 e.g. updating /etc/mime.types,

...
 Are you just looking for

 http://www.iana.org/assignments/media-types/media-types.xhtml and ...

 That covers mime types, but not file extensions, so it's not really
 what *I* thought Barry was talking about.

Question 13 at http://www.iana.org/form/media-types asks for
additional information, and specifically calls out Magic Number and
File Extension, among others.  I doubt there is any more official
repository for file extension meaning within MIME or unix.

 Also, I don't think reserving anything is something I, as an
 individual (and specifically a non-Unix user) should do. It probably
 should be handled by the PSF, as the process seems to need a contact
 email address...

Ideally, it would be a long-lasting organizational address, such as
pep-edi...@python.org.  But often, it isn't.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 489: Redesigning extension module loading

2015-03-16 Thread Jim J. Jewett

On 16 March 2015 Petr Viktorin wrote:

 If PyModuleCreate is not defined, PyModuleExec is expected to operate
 on any Python object for which attributes can be added by PyObject_GetAttr*
 and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get),
rather than a description of the required form of magic.


 PyObject *PyModule_AddCapsule(
 PyObject *module,
 const char *module_name,
 const char *attribute_name,
 void *pointer,
 PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__?
Does it become a hidden attribute?  A dotted attribute?  Is the
result undefined?

Later, there is

 void *PyModule_GetCapsulePointer(
 PyObject *module,
 const char *module_name,
 const char *attribute_name)

with the same apparently redundant arguments, but not a
PyModule_SetCapsulePointer.  Are capsule pointers read-only, or can
they be replaced with another call to PyModule_AddCapsule, or by a
simple PyObject_SetAttr?

 Subinterpreters and Interpreter Reloading
...
 No user-defined functions, methods, or instances may leak to different
 interpreters.

By "user-defined" do you mean defined in python, as opposed to in
the extension itself?

If so, what is the recommendation for modules that do want to support,
say, callbacks?  A dual-layer mapping that uses the interpreter as the
first key?  Naming it _module and only using it indirectly through
module.py, which is not shared across interpreters?  Not using this
API at all?

 To achieve this, all module-level state should be kept in either the module
 dict, or in the module object.

I don't see how that is related to leakage.

 A simple rule of thumb is: Do not define any static data, except 
 built-in types
 with no mutable or user-settable class attributes.

What about singleton instances?  Should they be per-interpreter?
What about constants, such as PI?
Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be
kept?


What happens if this no-leakage rule is violated?  Does the module
not load, or does it just maybe lead to a crash down the road?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-10 Thread Jim J. Jewett


On 10 March 2015 at slightly after midnight. Paul Moore wrote:


 Personally I doubt it would make much difference. If the docs say
 pygmentize I'm unlikely to dig around to find that the incantation
 python -m pygments.somemodule:main does the same thing using 3 times
 as many characters. I'd just add Python to my PATH and say stuff it.

There is value in getting the incantation down to a single (preferably
short) line, because then it can be used as a shortcut.  That means it
can be created as a shortcut at installation time, and that someone
writing their own batch file can just cut and paste from the shortcut
properties' target.  Not as simple as just adding to the path, but
simpler than adding several directories to the path, or modifying other
environment variables, or fighting an existing but conflicting python
installation already on the path.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support

2015-02-25 Thread Jim J. Jewett


On 24 February 2015 at 18:58, Guido van Rossum guido at python.org wrote:
 The naming of the functions feels inconsistent -- maybe pack(directory,
 target) - create_archive(directory, archive), and set_interpreter() -
 copy_archive(archive, new_archive)?


Paul Moore wrote:
 One possible source of confusion with copy_archive (and its command
 line equivalent python -m zipapp old.pyz -o new.pyz) is that it
 isn't technically a copy, as it changes the shebang line (if you omit
 the interpreter argument it removes the existing shebang).

Is the difference between create and copy important?  e.g., is there
anything wrong with

create_archive(old_archive, output=new_archive) working as well as
create_archive(directory, archive)?
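
In other words, something like this, using the PEP's zipapp module and the
draft create_archive signature (a sketch; the argument names may still change):

    import zipapp

    # Build a fresh archive from a source directory, with a shebang line.
    zipapp.create_archive("myapp_src", target="myapp.pyz",
                          interpreter="/usr/bin/env python3")

    # "Copy" an existing archive, dropping (or replacing) its shebang line.
    zipapp.create_archive("myapp.pyz", target="myapp_plain.pyz")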

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support

2015-02-25 Thread Jim J. Jewett
On Wed, Feb 25, 2015 at 2:33 PM, Paul Moore p.f.mo...@gmail.com wrote:
 On 25 February 2015 at 17:06, Paul Moore p.f.mo...@gmail.com wrote:

 I've included the resulting API
 documentation below. It looks pretty good to me.

Me too.  I have a few nits anyhow.

 .. function:: create_archive(directory, target=None, interpreter=None,
 main=None)

Create an application archive from *source*.  The source can be any
of the following:

(1)  *source* makes me think of source code, as opposed to binary.
This is only a small objection, in part because I can't think of
anything better.

(2)  If you do keep *source*, I think that the the directory
parameter should be renamed to source.

(3)
* The name of an existing application archive file, in which case the
  file is copied to the target.

==

* The name of an existing application archive file, in which case the
  file is copied (possibly with changes) to the target.

My concern is that someone who does want just another copy will use
this, see "copied", not read the other options, and be surprised when
the shebang is dropped.


* A file object open for reading in bytes mode.  The content of the
  file should be an application archive, and the file object is
  assumed to be positioned at the start of the archive.

I like this way of ducking the does it need to be seekable question.

The *target* argument determines where the resulting archive will be
written:

* If it is the name of a file, the archive will be written to that
  file.

(4)  Note that the filename is not required to end with .pyz, although
that is good practice.  Or maybe just be explicit that the function
itself does not add a .pyz, and assumes that the caller will do so
when appropriate.

The *interpreter* argument specifies the name of the Python
interpreter with which the archive will be executed.  ...
   ...  Omitting the *interpreter* results in no shebang line being
written.

(5)  even if there was an explicit shebang line in the source archive.

  If an interpreter is specified, and the target is a
filename, the executable bit of the target file will be set.

(6) (target is a filename, or None)  Or does that clarification just
confuse the issue, and only benefit people so careful they'll verify
it themselves anyway?

(7)  That is a good idea, but not quite as clear cut as it sounds.  On
unix, there are generally 3 different executable bits specifying *who*
can run it.  Setting the executable bit only for the owner is probably
a conservative but sensible default.
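
In other words, something like the following, rather than an unconditional
chmod +x for everyone (a sketch; the module may reasonably pick a different
policy):

    import os
    import stat

    def mark_owner_executable(path):
        # Add the execute bit for the owner only; leave group/other untouched.
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR)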

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 492: What is the real goal?

2015-04-29 Thread Jim J. Jewett

On Tue Apr 28 23:49:56 CEST 2015, Guido van Rossum quoted PEP 492:

 Rationale and Goals
 ===

 Current Python supports implementing coroutines via generators (PEP
 342), further enhanced by the ``yield from`` syntax introduced in PEP
 380. This approach has a number of shortcomings:

 * it is easy to confuse coroutines with regular generators, since they
   share the same syntax; async libraries often attempt to alleviate
   this by using decorators (e.g. ``@asyncio.coroutine`` [1]_);

So?  PEP 492 never says what coroutines *are* in a way that explains
why it matters that they are different from generators.

Do you really mean coroutines that can be suspended while they wait
for something slow?

As best I can guess, the difference seems to be that a normal
generator is using yield primarily to say:

I'm not done; I have more values when you want them,

but an asynchronous (PEP492) coroutine is primarily saying:

This might take a while, go ahead and do something else meanwhile.

 As shown later in this proposal, the new ``async
 with`` statement lets Python programs perform asynchronous calls when
 entering and exiting a runtime context, and the new ``async for``
 statement makes it possible to perform asynchronous calls in iterators.

Does it really permit *making* them, or does it just signal that you
will be waiting for them to finish processing anyhow, and it doesn't
need to be a busy-wait?

As nearly as I can tell, async with doesn't start processing the
managed block until the asynchronous call finishes its work -- the
only point of the async is to signal a scheduler that the task is
blocked.

Similarly, async for is still linearized, with each step waiting
until the previous asynchronous step was not merely launched, but
fully processed.  If anything, it *prevents* within-task parallelism.

 It uses the ``yield from`` implementation with an extra step of
 validating its argument.  ``await`` only accepts an *awaitable*, which
 can be one of:

What justifies this limitation?

Is there anything wrong awaiting something that eventually uses
return instead of yield, if the this might take a while signal
is still true?  Is the problem just that the current implementation
might not take proper advantage of task-switching?

   Objects with ``__await__`` method are called *Future-like* objects in
   the rest of this PEP.

   Also, please note that ``__aiter__`` method (see its definition
   below) cannot be used for this purpose.  It is a different protocol,
   and would be like using ``__iter__`` instead of ``__call__`` for
   regular callables.

   It is a ``TypeError`` if ``__await__`` returns anything but an
   iterator.

What would be wrong if a class just did __await__ = __anext__  ?
If the problem is that the result of __await__ should be iterable,
then why isn't __await__ = __aiter__ OK?

 ``await`` keyword is defined differently from ``yield`` and ``yield
 from``.  The main difference is that *await expressions* do not require
 parentheses around them most of the times.

Does that mean

The ``await`` keyword has slightly higher precedence than ``yield``,
so that fewer expressions require parentheses?

 class AsyncContextManager:
     async def __aenter__(self):
         await log('entering context')

Other than the arbitrary "the keyword must be there" limitations imposed
by this PEP, how is that different from:

 class AsyncContextManager:
     async def __aenter__(self):
         log('entering context')

or even:

 class AsyncContextManager:
     def __aenter__(self):
         log('entering context')

Will anything different happen when calling __aenter__ or log?
Is it that log itself now has more freedom to let other tasks run
in the middle?


 It is an error to pass a regular context manager without ``__aenter__``
 and ``__aexit__`` methods to ``async with``.  It is a ``SyntaxError``
 to use ``async with`` outside of a coroutine.

Why?  Does that just mean they won't take advantage of the freedom
you offered them?  Or are you concerned that they are more likely to
cooperate badly with the scheduler in practice?

 It is a ``TypeError`` to pass a regular iterable without ``__aiter__``
 method to ``async for``.  It is a ``SyntaxError`` to use ``async for``
 outside of a coroutine.

The same questions about why -- what is the harm?

 The following code illustrates new asynchronous iteration protocol::

 class Cursor:
     def __init__(self):
         self.buffer = collections.deque()

     def _prefetch(self):
         ...

     async def __aiter__(self):
         return self

     async def __anext__(self):
         if not self.buffer:
             self.buffer = await self._prefetch()
             if not self.buffer:
                 raise StopAsyncIteration
         return self.buffer.popleft()

 then the ``Cursor`` class can be used as follows::

 async for row in Cursor():
     print(row)

Re: [Python-Dev] PEP 492: What is the real goal?

2015-04-30 Thread Jim J. Jewett
On Wed, Apr 29, 2015 at 2:26 PM, Paul Moore p.f.mo...@gmail.com wrote:
 On 29 April 2015 at 18:43, Jim J. Jewett jimjjew...@gmail.com wrote:

 So?  PEP 492 never says what coroutines *are* in a way that explains
 why it matters that they are different from generators.

...

 Looking at the Wikipedia article on coroutines, I see an example of
 how a producer/consumer process might be written with coroutines:

 var q := new queue

 coroutine produce
     loop
         while q is not full
             create some new items
             add the items to q
         yield to consume

 coroutine consume
     loop
         while q is not empty
             remove some items from q
             use the items
         yield to produce

 (To start everything off, you'd just run produce).

 I can't even see how to relate that to PEP 429 syntax. I'm not allowed
 to use yield, so should I use await consume in produce (and vice
 versa)?

I think so ... but the fact that nothing is actually coming via the
await channel makes it awkward.

I also worry that it would end up with an infinite stack depth, unless
the await were actually replaced with some sort of framework-specific
scheduling primitive, or one of them were rewritten differently to
ensure it returned to the other instead of calling it anew.

I suspect the real problem is that the PEP is really only concerned
with a very specific subtype of coroutine, and these don't quite fit.
(Though it could be done by somehow making them both await on the
queue status, instead of on each other.)
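
In PEP 492 terms, both awaiting on the queue status would look roughly like
this (a sketch; make_item and use_item are hypothetical helpers, and an
asyncio-style event loop is assumed to be driving both tasks):

    import asyncio

    q = asyncio.Queue(maxsize=10)

    async def produce():
        while True:
            item = make_item()      # hypothetical
            await q.put(item)       # suspends here while the queue is full

    async def consume():
        while True:
            item = await q.get()    # suspends here while the queue is empty
            use_item(item)          # hypothetical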

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-04-30 Thread Jim J. Jewett

On Wed Apr 29 20:06:23 CEST 2015, Yury Selivanov replied:

 As best I can guess, the difference seems to be that a normal
 generator is using yield primarily to say:

  I'm not done; I have more values when you want them,

 but an asynchronous (PEP492) coroutine is primarily saying:

  This might take a while, go ahead and do something else meanwhile.

 Correct.

Then I strongly request a more specific name than coroutine.


I would prefer something that refers to cooperative pre-emption,
but I haven't thought of anything that is short without leading to
other types of confusion.

My least bad idea at the moment would be self-suspending coroutine
to emphasize that suspending themselves is a crucial feature.

Even PEP492-coroutine would be an improvement.


 Does it really permit *making* them [asynchronous calls], or does
 it just signal that you will be waiting for them to finish processing
 anyhow, and it doesn't need to be a busy-wait?

 I does.

Bad phrasing on my part.  Is there anything that prevents an
asynchronous call (or waiting for one) without the async with?

If so, I'm missing something important.  Either way, I would
prefer different wording in the PEP.



 It uses the ``yield from`` implementation with an extra step of
 validating its argument.  ``await`` only accepts an *awaitable*, 
 which can be one of:

 What justifies this limitation?

 We want to avoid people passing regular generators and random
 objects to 'await', because it is a bug.

Why?

Is it a bug just because you defined it that way?

Is it a bug because the await makes timing claims that an
object not making such a promise probably won't meet?  (In
other words, a marker interface.)

Is it likely to be a symptom of something that wasn't converted
correctly, *and* there are likely to be other bugs caused by
that same lack of conversion?

 For coroutines in PEP 492:

 __await__ = __anext__ is the same as __call__ = __next__
 __await__ = __aiter__ is the same as __call__ = __iter__

That tells me that it will be OK sometimes, but will usually
be either a mistake or an API problem -- and it explains why.

Please put those 3 lines in the PEP.


 This is OK. The point is that you can use 'await log' in
 __aenter__.  If you don't need awaits in __aenter__ you can
 use them in __aexit__. If you don't need them there too,
 then just define a regular context manager.

Is it an error to use async with on a regular context manager?
If so, why?  If it is just that doing so could be misleading,
then what about async with mgr1, mgr2, mgr3 -- is it enough
that one of the three might suspend itself?

 class AsyncContextManager:
     def __aenter__(self):
         log('entering context')


 __aenter__ must return an awaitable

Why?  Is there a fundamental reason, or it is just to avoid the
hassle of figuring out whether or not the returned object is a
future that might still need awaiting?

Is there an assumption that the scheduler will let the thing-being
awaited run immediately, but look for other tasks when it returns,
and a further assumption that something which finishes the whole
task would be too slow to run right away?

 It doesn't make any sense in using 'async with' outside of a
 coroutine.  The interpeter won't know what to do with them:
 you need an event loop for that.

So does the PEP also provide some way of ensuring that there is
an event loop?  Does it assume that self-suspending coroutines
will only ever be called by an already-running event loop
compatible with asyncio.get_event_loop()?  If so, please make
these contextual assumptions explicit near the beginning of the PEP.


 It is a ``TypeError`` to pass a regular iterable without ``__aiter__``
 method to ``async for``.  It is a ``SyntaxError`` to use ``async for``
 outside of a coroutine.

 The same questions about why -- what is the harm?

I can imagine that as an implementation detail, the async for wouldn't
be taken advantage of unless it was running under an event loop that
knew to look for async for as suspension points.

I'm not seeing what the actual harm is in either not happening to
suspend (less efficient, but still correct), or in suspending between
every step of a regular iterator (because, why not?)
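
(In code: a trivial adapter that wraps a regular iterator and gives the
scheduler a chance between steps is perfectly expressible under the draft
protocol, so the restriction looks more like policy than necessity.  A
sketch, following the PEP's own use of an async __aiter__:)

    import asyncio

    class AsyncAdapter:
        def __init__(self, iterable):
            self._it = iter(iterable)

        async def __aiter__(self):
            return self

        async def __anext__(self):
            await asyncio.sleep(0)          # let the scheduler run something else
            try:
                return next(self._it)
            except StopIteration:
                raise StopAsyncIteration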


 For debugging this kind of mistakes there is a special debug mode in
 asyncio, in which ``@coroutine`` 
...
 decorator makes the decision of whether to wrap or not to wrap based on
 an OS environment variable ``PYTHONASYNCIODEBUG``.

(1)  How does this differ from the existing asyncio.coroutine?
(2)  Why does it need to have an environment variable?  (Sadly,
 the answer may be backwards compatibility, if you're really
 just specifying the existing asyncio interface better.)
(3)  Why does it need [set]get_coroutine_wrapper, instead of just
 setting the asyncio.coroutines.coroutine attribute?
(4)  Why do the get/set need to be in sys?

Is the intent to do anything more than preface execution with:

import asyncio.coroutines
asyncio.coroutines._DEBUG = True

Re: [Python-Dev] ABCs - Re: PEP 492: async/await in Python; version 4

2015-05-04 Thread Jim J. Jewett

On Sun May 3 08:32:02 CEST 2015, Stefan Behnel wrote:

 Ok, fair enough. So, how would you use this new protocol manually then?
 Say, I already know that I won't need to await the next item that the
 iterator will return. For normal iterators, I could just call next() on it
 and continue the for-loop. How would I do it for AIterators?

Call next, then stick it somewhere it can be waited on.

Or is that syntactically illegal, because of the separation between
sync and async?

The async for seems to assume that you want to do the waiting right
now, at each step.  (At least as far as this thread of the logic goes;
something else might be happening in parallel via other threads of
control.)
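
Concretely, I think the manual version under the current draft has to look
something like this (a sketch; cursor stands in for something like the
PEP's Cursor example):

    async def first_item(cursor):
        it = await cursor.__aiter__()       # awaitable per the draft protocol
        try:
            return await it.__anext__()     # one explicit step, awaited right away
        except StopAsyncIteration:
            return None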

 BTW, I guess that this AIterator, or rather AsyncIterator, needs to be
 a separate protocol (and ABC) then. Implementing __aiter__() and
 __anext__() seems perfectly reasonable without implementing (or using) a
 Coroutine.

 That means we also need an AsyncIterable as a base class for it.

Agreed.

 That might even help us to decide if we need new builtins (or helpers)
 aiter() and anext() in order to deal with these protocols.

I hope not; they seem more like specialized versions of functions,
such as are found in math or cmath.  Ideally, as much as possible of
this PEP should live in asyncio, rather than appearing globally.

Which reminds me ... *should* the await keyword work with any future,
or is it really intentionally restricted to use with a single library
module and 3rd party replacements?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-05-01 Thread Jim J. Jewett
On Fri, May 1, 2015 at 2:59 PM, Guido van Rossum gu...@python.org wrote:
 On Fri, May 1, 2015 at 11:26 AM, Jim J. Jewett jimjjew...@gmail.com wrote:

 On Thu, Apr 30, 2015 at 3:32 PM, Guido van Rossum gu...@python.org
 wrote:


 (Guido:) Actually that's not even wrong. When using generators as
 coroutines, PEP 342
  style, yield means I am blocked waiting for a result that the I/O
  multiplexer is eventually going to produce.

 So does this mean that yield should NOT be used just to yield control
 if a task isn't blocked?  (e.g., if its next step is likely to be
 long, or low priority.)  Or even that it wouldn't be considered a
 co-routine in the python sense?

 I'm not sure what you're talking about. Does next step refer to something
 in the current stack frame or something that you're calling?

The next piece of your algorithm.

 None of the
 current uses of yield (the keyword) in Python are good for lowering
 priority of something.

If there are more tasks than executors, yield is a way to release your
current executor and go to the back of the line.  I'm pretty sure I
saw several examples of that style back when coroutines were first
discussed.
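
In the new syntax, I assume the same pattern would be spelled with an
explicit no-op wait, roughly (a sketch; work_chunks and process are
hypothetical):

    import asyncio

    async def busy_task():
        for chunk in work_chunks():     # hypothetical long, CPU-bound loop
            process(chunk)              # hypothetical
            await asyncio.sleep(0)      # release the executor; back of the line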

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-05-01 Thread Jim J. Jewett
On Thu, Apr 30, 2015 at 3:32 PM, Guido van Rossum gu...@python.org wrote:

(me:)
 A badly worded attempt to say
 Normal generator:  yield (as opposed to return) means
 that the function isn't done, and there may be more
 things to return later.

 but an asynchronous (PEP492) coroutine is primarily saying:

  This might take a while, go ahead and do something else
 meanwhile.

(Yury:) Correct.

(Guido:) Actually that's not even wrong. When using generators as
coroutines, PEP 342
 style, yield means I am blocked waiting for a result that the I/O
 multiplexer is eventually going to produce.

So does this mean that yield should NOT be used just to yield control
if a task isn't blocked?  (e.g., if its next step is likely to be
long, or low priority.)  Or even that it wouldn't be considered a
co-routine in the python sense?

If this is really just about avoiding busy-wait on network IO, then
coroutine is way too broad a term, and I'm uncomfortable restricting a
new keyword (async or await) to what is essentially a Domain Specific
Language.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-05-01 Thread Jim J. Jewett

On Thu Apr 30 21:27:09 CEST 2015, Yury Selivanov replied:


On 2015-04-30 2:41 PM, Jim J. Jewett wrote:

 Bad phrasing on my part.  Is there anything that prevents an
 asynchronous call (or waiting for one) without the async with?

 If so, I'm missing something important.  Either way, I would
 prefer different wording in the PEP.

 Yes, you can't use 'yield from' in __exit__/__enter__
 in current Python.

I tried it in 3.4, and it worked.

I'm not sure it would ever be sensible, but it didn't raise any
errors, and it did run.

What do you mean by can't use?


 For coroutines in PEP 492:
 __await__ = __anext__ is the same as __call__ = __next__
 __await__ = __aiter__ is the same as __call__ = __iter__

 That tells me that it will be OK sometimes, but will usually
 be either a mistake or an API problem -- and it explains why.

 Please put those 3 lines in the PEP.

 There is a line like that:
 https://www.python.org/dev/peps/pep-0492/#await-expression
 Look for Also, please note... line.

It was from reading the PEP that the question came up, and I
just reread that section.

Having those 3 explicit lines goes a long way towards explaining
how an asyncio coroutine differs from a regular callable, in a
way that the existing PEP doesn't, at least for me.


 This is OK. The point is that you can use 'await log' in
 __aenter__.  If you don't need awaits in __aenter__ you can
 use them in __aexit__. If you don't need them there too,
 then just define a regular context manager.

 Is it an error to use async with on a regular context manager?
 If so, why?  If it is just that doing so could be misleading,
 then what about async with mgr1, mgr2, mgr3 -- is it enough
 that one of the three might suspend itself?

 'with' requires an object with __enter__ and __exit__

 'async with' requires an object with __aenter__ and __aexit__

 You can have an object that implements both interfaces.

I'm still not seeing why with (let alone async with) can't
just run whichever one it finds.  async with won't actually let
the BLOCK run until the future is resolved.  So if a context
manager only supplies __enter__ instead of __aenter__, then at most
you've lost a chance to switch tasks while waiting -- and that is no
worse than if the context manager just happened to be really slow.
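
(What I am suggesting the language could effectively do for you is no more
than this trivial wrapper -- a sketch:)

    class AsyncCMAdapter:
        def __init__(self, cm):
            self._cm = cm

        async def __aenter__(self):
            return self._cm.__enter__()     # no suspension point, but awaitable

        async def __aexit__(self, *exc_info):
            return self._cm.__exit__(*exc_info)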


 For debugging this kind of mistakes there is a special debug mode in

 Is the intent to do anything more than preface execution with:

 import asyncio.coroutines
 asyncio.coroutines._DEBUG = True

 This won't work, unfortunately.  You need to set the
 debug flag *before* you import asyncio package (otherwise
 we would have an unavoidable performance cost for debug
 features).  If you enable it after you import asyncio,
 then asyncio itself won't be instrumented.  Please
 see the implementation of asyncio.coroutine for details.

Why does asyncio itself have to be wrapped?  Is that really something
normal developers need to debug, or is it only for developing the
stdlib itself?  If it is only for developing the stdlib, then I
would rather see workarounds like shoving _DEBUG into builtins
when needed, as opposed to adding multiple attributes to sys.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-05-01 Thread Jim J. Jewett
On Fri, May 1, 2015 at 4:10 PM, Guido van Rossum gu...@python.org wrote:
 On Fri, May 1, 2015 at 12:48 PM, Jim J. Jewett jimjjew...@gmail.com wrote:

 If there are more tasks than executors, yield is a way to release your
 current executor and go to the back of the line.  I'm pretty sure I
 saw several examples of that style back when coroutines were first
 discussed.

 Could you dig up the actual references? It seems rather odd to me to mix
 coroutines and threads this way.

I can try in a few days, but the primary case (and perhaps the only
one with running code) was for n_executors=1.  They assumed there
would only be a single thread, or at least only one that was really
important to the event loop -- the pattern was often described as an
alternative to relying on threads.

FWIW, Ron Adam's "yielding" in
https://mail.python.org/pipermail/python-dev/2015-May/139762.html is
in the same spirit.

You replied it would be better if that were done by calling some
method on the scheduling loop, but that isn't any more standard, and
the yielding function is simple enough that it will be reinvented.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: What is the real goal?

2015-05-05 Thread Jim J. Jewett

On Fri May 1 23:58:26 CEST 2015, Yury Selivanov wrote:


 Yes, you can't use 'yield from' in __exit__/__enter__
 in current Python.

 What do you mean by can't use?

 It probably executed without errors, but it didn't run the
 generators.

True.  But it did return the one created by __enter__, so it
could be bound to a variable and iterated within the block.

There isn't an easy way to run the generator created by __exit__,
and I'm not coming up with any obvious scenarios where it would
be a sensible thing to do (other than using with on a context
manager that *does* return a future instead of finishing).

That said, I'm still not seeing why the distinction is so important
that we have to enforce it at a language level, as opposed to letting
the framework do its own enforcement.  (And if the reason is
performance, then make the checks something that can be turned off,
or offer a fully instrumented loop as an alternative for debugging.)

 Is the intent to do anything more than preface execution with:
 import asyncio.coroutines
 asyncio.coroutines._DEBUG = True

 If you enable it after you import asyncio,
 then asyncio itself won't be instrumented.

 Why does asyncio itself have to be wrapped?  Is that really something
 normal developers need to debug, or is it only for developing the
 stdlib itself?  

 Yes, normal developers need asyncio to be instrumented,
 otherwise you won't know what you did wrong when you
 called some asyncio code without 'await' for example.

I'll trust you that it *does* work that way, but this sure sounds to
me as though the framework isn't ready to be frozen with syntax, and
maybe not even ready for non-provisional stdlib inclusion.

I understand that the disconnected nature of asynchronous tasks makes
them harder to debug.  I heartily agree that the event loop should
offer some sort of debug facility to track this.

But the event loop is supposed to be pluggable.  Saying that this
requires not merely a replacement, or even a replacement before events
are added, but a replacement made before python ever even loads the
default version ...

That seems to be much stronger than sys.settrace -- more like
instrumenting the ceval loop itself.  And that is something that
ordinary developers shouldn't have to do.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: async/await in Python; version 5

2015-05-05 Thread Jim J. Jewett

On Tue May 5 18:29:44 CEST 2015, Yury Selivanov posted an updated PEP492.

Where are the following over-simplifications wrong?

(1)  The PEP is intended for use (almost exclusively) with
asychronous IO and a scheduler such as the asynchio event loop.

(2)  The new syntax is intended to make it easier to recognize when
a task's execution may be interrupted by arbitrary other tasks, and
the interrupted task therefore has to revalidate assumptions about
shared data.

With threads, CPython can always suspend a task between op-codes,
but with a sufficiently comprehensive loop (and sufficiently
cooperative tasks), tasks *should* only be suspended when they
make an explicit request to *wait* for an answer, and these points
*should* be marked syntactically.

(3)  The new constructs explicitly do NOT support any sort of
concurrent execution within a task; they are for use precisely
when otherwise parallel subtasks are being linearized by pausing
and waiting for the results.


Over-simplifications 4-6 assume a world with standardized futures
based on concurrent.futures, where .result either returns the
result or raises the exception (or raises another exception about
timeout or cancellation).

[Note that the actual PEP uses iteration over the results of a new
__await__ magic method, rather than .result on the object itself.
I couldn't tell whether this was for explicit marking, or just for
efficiency in avoiding future creation.]

(4)  await EXPR is just syntactic sugar for EXPR.result

except that, by being syntax, it better marks locations where
unrelated tasks might have a chance to change shared data.

[And that, as currently planned, the result of an await isn't
actually the result; it is an iterator of results.]

(5)  async def is just syntactic sugar for def, 

except that, by being syntax, it better marks the signatures of
functions and methods where unrelated tasks might have a chance
to change shared data after execution has already begun.

(5A) As the PEP currently stands, it is also a promise that the
function will NOT produce a generator used as an iterator; if a
generator-iterator needs to wait for something else at some point,
that will need to be done differently.

I derive this limitation from
   It is a ``SyntaxError`` to have ``yield`` or ``yield from``
   expressions in an ``async`` function.

but I don't understand how this limitation works with things like a
per-line file iterator that might need to wait for the file to
be initially opened.

(6)  async with EXPR as VAR:

would be equivalent to:

with EXPR as VAR:

except that
  __enter__() would be replaced by next(await __enter__()) # __enter__().result
  __exit__() would be replaced by  next(await __exit__())  # __exit__().result


(7)  async for elem in iter:

would be shorthand for:

for elem in iter:
elem = next(await elem) # elem.result
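
To make (2) and (4) concrete, here is a minimal sketch, assuming the
proposed async/await syntax and the default asyncio loop (the shared
dict and the sleep(0) wait are just stand-ins for real shared state and
real IO):

    import asyncio

    shared = {"count": 0}

    async def worker():
        before = shared["count"]
        await asyncio.sleep(0)      # suspension point: other tasks may run here
        # after the await, 'before' may be stale and must be revalidated
        if shared["count"] != before:
            print("someone else changed shared data while we waited")
        shared["count"] += 1

    async def main():
        await asyncio.gather(worker(), worker())

    asyncio.get_event_loop().run_until_complete(main())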



-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 492: Please mention the Event Loop

2015-05-05 Thread Jim J. Jewett

On Tue May 5 21:44:26 CEST 2015,Brett Cannon wrote:

 It's not as
 complicated as it seems when you realize there is an event loop driving
 everything (which people have been leaving out of the conversation since it
 doesn't tie into the syntax directly).

Another reason people don't realize it is that the PEP goes out
of its way to avoid saying so.

I understand that you (and Yury) don't want to tie the PEP too
tightly to the specific event loop implementation in
asyncio.events.AbstractEventLoop, but ... that particular
conflation isn't really what people are confused about.

The word "coroutines" often brings up thoughts of independent tasks.  Yury may
well know that (Python has asymmetric coroutines, that's it), but
others have posted that this was a surprise -- and the people posting
here have far more python experience than most readers will.

Anyone deeply involved enough to recognize that this PEP is only about

  (1) a particular type of co-routine -- a subset even of prior python usage

  (2) used for a particular purpose

  (3) coordinated via an external scheduler

will already know that they can substitute other event loops.

Proposed second paragraph of the abstract:

This PEP assumes that the asynchronous tasks are scheduled and
coordinated by an Event Loop similar to that of stdlib module
asyncio.events.AbstractEventLoop.  While the PEP is not tied to
any specific Event Loop implementation, it is relevant only to
the kind of coroutine that uses yield as a signal to the scheduler,
indicating that the coroutine will be waiting until an event (such
as IO) is completed.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 492: async/await in Python; version 4

2015-05-05 Thread Jim J. Jewett

Tue May 5 21:48:36 CEST 2015, Yury Selivanov wrote:

 As for terminology, I view this discussion differently.  It's
 not about the technical details (Python has asymmetric
 coroutines, that's it), but rather on how to disambiguate
 coroutines implemented with generators and yield-from, from
 new 'async def' coroutines.

Not just How?, but Why?.

Why do they *need* to be disambiguated?

With the benefit of having recently read all that discussion
(as opposed to just the PEP), my answer is ... uh ... that
generators vs async def is NOT an important distinction.
What matters (as best I can tell) is:

something using yield (or yield from) to mark execution context switches

  vs

other kinds of callables, including those using yield to make an iterator
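
A minimal illustration of the two uses of yield I mean (WaitForIO is a
made-up marker; a real event loop would define its own):

    class WaitForIO:
        # made-up marker: "wake me when this IO completes"
        def __init__(self, url):
            self.url = url

    def countdown(n):
        # yield used to *make an iterator*: it produces values for a consumer
        while n:
            yield n
            n -= 1

    def fetch(url):
        # yield used to *mark an execution context switch*: the yielded value
        # tells the scheduler what this task is waiting on
        response = yield WaitForIO(url)
        return response

    print(list(countdown(3)))          # [3, 2, 1] -- plain iteration
    gen = fetch("http://example.com")
    request = next(gen)                # a scheduler would now wait on request.url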


I'm not quite sure that the actual proposal even really separates them
effectively, in part because the terminology keeps suggesting other
distinctions instead.  (The glossary does help; just not enough.)


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Unicode 8.0 and 3.5

2015-06-22 Thread Jim J. Jewett


On Thu Jun 18 20:33:13 CEST 2015, Larry Hastings asked:

 On 06/18/2015 11:27 AM, Terry Reedy wrote:
 Unicode 8.0 was just released.  Can we have unicodedata updated to 
 match in 3.5?

 What does this entail?  Data changes, code changes, both?

Note that the unicode 7 changes also need to be considered, because
python 3.4 used unicode 6.3.

There are some changes to the recommendations on what to use in
identifiers.  Python doesn't follow precisely the previous rules,
but it would be good to ensure that any newly allowed characters
are intentional -- particularly for the newly defined characters.

My gut feel is that it would have been fine during beta, but for 
the 3rd RC I am not so sure.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Importance of async keyword

2015-06-26 Thread Jim J. Jewett


On Fri Jun 26 16:51:13 CEST 2015, Paul Sokolovsky wrote:

 So, currently in Python you know if you do:

socket.write(buf)

 Then you know it will finish without interruptions for entire buffer.

How do you know that?

Are you assuming that socket.write is a builtin, rather than a
python method?  (Not even a python wrapper around a builtin?)

Even if that were true, it would only mean that the call itself
is processed within a single bytecode ... there is no guarantee
that the write method won't release the GIL or call back into
python (and thereby allow a thread switch) as part of its own
logic.
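
For example (names made up): if socket.write is really a Python-level
wrapper like the one below, the interpreter is free to switch threads
between any two of its bytecodes, and the underlying send() may release
the GIL as well:

    class LoggingSocket:
        """Made-up wrapper: .write() is ordinary Python code, not one bytecode."""
        def __init__(self, sock, log):
            self.sock = sock
            self.log = log

        def write(self, buf):
            self.log.append(len(buf))    # pure-Python step: a thread switch can happen here
            return self.sock.send(buf)   # the C call may release the GIL while blocking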

 And if you write:

await socket.write(buf)

 then you know there may be interruption points inside socket.write(),
 in particular something else may mutate it while it's being written.

I would consider that external mutation to be bad form ... at least
as bad as violating the expectation of an atomic socket.write() up
above.

So either way, nothing bad SHOULD happen, but it might anyhow.  I'm
not seeing what the async-coloring actually bought you...

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Status of PEP 484 and the typing module

2015-05-22 Thread Jim J. Jewett


Mark Shannon wrote:

 PY2, etc. really need to go.
 Assuming that this code type checks OK:

  if typing.PY2:
  type_safe_under_py2_only()
  else:
  type_safe_under_py3_only()

 Is the checker supposed to pass this:

  if sys.hexversion < 0x0300:
  type_safe_under_py2_only()
  else:
  type_safe_under_py3_only()

 If it should pass, then why have PY2, etc. at all.

My immediate response was that there really is a difference,
when doing the equivalent of cross-compilation.  It would
help to make this explicit in the PEP.

But ...
 If it should fail, well that is just stupid and annoying.

so I'm not sure regular authors (as opposed to typing tools)
would ever have reason to use it, and making stub files more
different from regular python creates an attractive nuisance
bigger than the clarification.

So in the end, I believe PY2 should merely be part of the calling
convention for type tools, and that may not be worth standardizing
yet.  It *is* worth explaining why they were taken out, though.

And it is worth saying explicitly that typing tools should override
the sys module when checking for non-native environments.


-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Status of PEP 484 and the typing module

2015-05-22 Thread Jim J. Jewett


At Thu May 21 22:27:50 CEST 2015, Guido wrote:

 I want to encourage users to think about annotations as types,
 and for most users the distinction between type and class is
 too subtle,

So what is the distinction that you are trying to make?

That a type refers to a variable (name), and a class refers to a
piece of data (object) that might be bound to that name?

Whatever the intended distinction is, please be explicit in the
PEP, even if you decide to paper it over in normal code.  For
example, the above distinction would help to explain why the
typing types can't be directly instantiated, since they aren't
meant to refer to specific data. (They can still be used as
superclasses because practicality beats purity, and using them
as a marker base class is practical.)

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Preserving the definition order of class namespaces.

2015-05-26 Thread Jim J. Jewett


On Sun May 24 12:06:40 CEST 2015, Nick Coghlan wrote:
 On 24 May 2015 at 19:44, Mark Shannon mark at hotpy.org wrote:
 On 24/05/15 10:35, Nick Coghlan wrote:
 If we leave __definition_order__ out for the time being then, for the
 vast majority of code, the fact that the ephemeral namespace used to
 evaluate the class body switched from being a basic dictionary to an
 ordered one would be a hidden implementation detail, rather than
 making all type objects a little bigger.

 and a little slower.

 The runtime namespace used to store the class attributes is remaining
 a plain dict object regardless,

Lookup isn't any slower in the ordereddict.

Inserts are slower -- and those would happen in the ordereddict, as
the type object is being defined.

Note that since we're talking about the type objects, rather than the
instances, most* long-running code won't care, but it will hurt startup
time.

*code which creates lots of throwaway classes is obviously an exception.

FWIW, much of the extra per-insert cost is driven by either the need to
keep deletion O(1) or the desire to keep the C layout binary compatible.

A different layout (with its own lookdict) could optimize for the
insert-each-value-once case, or even for small dicts (e.g., keyword
dicts).  I could imagine this producing a speedup, with the ordering
being just a side benefit.  It is too late to use such a new layout by
default in 3.5, but we should be careful not to close it off.  (That
said, I don't think __definition_order__ would actually close it off,
though it might start to look like a wart.)
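
A crude way to see the insert cost I'm worried about (this times the
pure-Python-facing collections.OrderedDict, not the internal C
ordereddict used for class bodies, so treat the numbers as directional
only):

    import timeit

    plain = timeit.timeit("d = {}\nfor i in range(20): d[i] = i",
                          number=100000)
    ordered = timeit.timeit("d = OrderedDict()\nfor i in range(20): d[i] = i",
                            setup="from collections import OrderedDict",
                            number=100000)
    print(plain, ordered)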

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEPs and PEP 8 changes

2015-09-05 Thread Jim J. Jewett
PEP 498 is only the latest PEP where part of the concern is fear that
it may encourage certain types of bad code.

Would it be reasonable to ask PEPs to start including a section on any
recommended changes to PEP8?  (e.g., "If an embedded expression
doesn't fit on a single line, factor it out to a named variable.")

I realize that there will be times when best practices (or common
mistakes) aren't obvious in advance, but I'm a bit uncomfortable with
"PEP 8 will probably grow advice"... if we expect to need such advice,
then we should probably include it from the start.  (PEP 8 is, after
all, only advice.)
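
For PEP 498 specifically, the kind of advice I have in mind might look
like this (runnable once 3.6's f-strings land; the cart data is purely
for illustration):

    from collections import namedtuple
    Item = namedtuple("Item", "price qty deleted")
    cart = [Item(2.5, 4, False), Item(9.99, 1, True)]

    # discouraged: the embedded expression is too heavy for the literal
    msg = f"total: {sum(i.price * i.qty for i in cart if not i.deleted):.2f}"

    # preferred (per the suggested PEP 8 advice): factor it out first
    total = sum(i.price * i.qty for i in cart if not i.deleted)
    msg = f"total: {total:.2f}"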

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 509

2016-01-12 Thread Jim J. Jewett
(1)  Please make it clear within the abstract what counts as a change.

(1a)  E.g., a second paragraph such as "Adding or removing a key, or
replacing a value, counts as a change.  Modifying an object in place,
or replacing it with itself may not be picked up."

(1b)  Is there a way to force a version update?

d[k]=d[k] seems like it should do that (absent the optimization to
prevent it), but I confess that I can't come up with a good use case
that doesn't start seeming internal to a specific optimizer.

(1c)  Section "Guard against changing dict during iteration" says
"Sadly, the dictionary version proposed in this PEP doesn't help to
detect dictionary mutation."  Why not?  Wouldn't that mutation involve
replacing a value, which ought to trigger a version change?


(2)  I would like to see a .get on the guard object, so that it could
be used in place of the dict lookup even from python.  If this doesn't
make sense (e.g., doesn't really save time since the guard has to be
used from python), please mention that in the Guard Example text.


(3)  It would be possible to define the field as reserved in the main
header, and require another header to use it even from C.

(3a)  This level of privacy might be overkill, but I would prefer that
the decision be explicit.

(3b)  The change should almost certainly be hidden from the ABI / Py_LIMITED_API

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Update PEP 7 to require curly braces in C

2016-01-19 Thread Jim J. Jewett


> On Jan 17, 2016, at 11:10, Brett Cannon  wrote:

>> While doing a review of http://bugs.python.org/review/26129/ 
>> ... update PEP 7 to remove the optionality of curly braces

On Mon Jan 18 03:39:42 EST 2016, Andrew Barnert pointed out:

> There are two ways you could do that.

[The one most people are talking about, which often makes an if-clause
visually too heavy ... though Alexander Walters pointed out that
"Any excuse to break code out into more functions... is usually the
right idea."]

if (!obj) {
return -1;
}


> Alternatively, it could say something like "braces must not be omitted;
> when other C styles would use a braceless one-liner, a one-liner with
> braces should be used instead; otherwise, they should be formatted as follows"

That "otherwise" gets a bit awkward, but I like the idea.  Perhaps
"braces must not be omitted, and should normally be formatted as
follows. ... Where other C styles would permit a braceless one-liner,
the expression and braces may be moved to a single line, as follows: "

if (x > 5) { y++; }

I think that is clearly better, but it may be *too* lightweight for
flow control.

if (!obj)
{ return -1; }

does work for me, and I think the \n{} may actually be useful for
warning that flow control takes a jump.


One reason I posted was to point to a specific example already in PEP 7
itself:

if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
type->tp_dictoffset == b_size &&
(size_t)t_size == b_size + sizeof(PyObject *))
return 0; /* "Forgive" adding a __dict__ only */

For me, that return is already visually lost, simply because it shares
an indentation with the much larger test expression.  Would that be
better as either:

/* "Forgive" adding a __dict__ only */
if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
type->tp_dictoffset == b_size &&
(size_t)t_size == b_size + sizeof(PyObject *)) { return 0; }

or:

/* "Forgive" adding a __dict__ only */
if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 &&
type->tp_dictoffset == b_size &&
(size_t)t_size == b_size + sizeof(PyObject *)) {
return 0;
}

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] pathlib (was: Defining a path protocol)

2016-04-07 Thread Jim J. Jewett
(1)  I think the "built-in" should instead be a module-level function
in the pathlib module.  If you aren't already expecting pathlib paths, then
you're just expecting strings to work anyhow, and a builtin isn't
likely to be helpful.

(2)  I prefer that the function be explicit about the fact that it is
downcasting the representation to a string.  e.g.,
pathlib.path_as_string(my_path)

But if the final result is ospath or fspath or ... I won't fight too
hard, particularly since the output may be a bytestring rather than a
str.
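
A rough sketch of the kind of module-level function I mean (the name and
the exact policy are of course up for debate):

    import pathlib

    def path_as_string(path):
        # explicit, lossy-on-purpose downcast from a path object to str/bytes
        if isinstance(path, pathlib.PurePath):
            return str(path)
        if isinstance(path, (str, bytes)):
            return path
        raise TypeError("expected a path or string, got %r" % type(path))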

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-ideas] Add citation() to site.py

2016-03-23 Thread Jim J. Jewett


On Sun Mar 20 16:26:03 EDT 2016, Nikolaus Rath wrote:

> Which I believe makes it completely pointless to cite Python at all. As
> far as I can see, nowadays citations are given for two reasons:

> 1. To give the reader a starting point to get more information on a
>topic.

I don't often see references to good "starting points", but I'll grant
the "get more information".

> 2. To formally acknowledge the work done by someone else (who ends up
>with an increased number of citations for the cited publication,
>which is unfortunately a crucial metric in most academic hiring and
>evaluation processes).

There is a third category, of reader service.  

When I as a reader have wanted to follow a citation, it was because I
wanted to know more about the specific claim it supposedly supported.
In a few cases -- and these were probably the cases most valuable to
the authors -- I wanted to build on the work, or test it out under
new conditions.  Ideally, my first step was to replicate the original
result, to ensure that anything new I found was really caused by the
intentional changes.  If I was looking at a computational model, I
really didn't even have the excuse of "too expensive to run that
many subjects."

For papers more than a few years old, even if the code was available,
it generally didn't run -- and often didn't even compile.  Were there
a few missing utility files, or had they been using a language variant
different from what had eventually become the standard?

Obviously, it would have been better to just get a copy of the
original environment, PDP and all.  In real life, it was very
helpful to know which version of which compiler the authors had
been using.  Even the authors who had managed to save their code
didn't generally remember that level of detail about the original
environment.

Python today has much better backwards compatibility, but ...
if some junior grad student (maybe not in CS) today came across code
raising strings instead of Exceptions, how confident would she be that
she had the real code, as opposed to a mangled transcription?  Would it
help if the paper had a citation that specified CPython 2.1 and she
could still download a version of that ... where it worked?

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-15 Thread Jim J. Jewett

On Thu Apr 14 11:19:42 EDT 2016, Victor Stinner posted the latest
draft of PEP 509; dict version_tag

(1)  Meta Question:  If this is really only for CPython, then is
"Standards Track" the right classification?

(2)  Why *promise* not to update the version_tag when replacing a
value with itself?  Isn't that the sort of quality-of-implementation
issue that got pushed to a note for objects that happen to be
represented as singletons, such as small integers or ASCII chars?

I think it is a helpful optimization, and worth documenting ... I
just think it should be at the layer of "this particular patch",
rather than something that sounds like part of the contract.

e.g.,

... The global version is also incremented and copied to the
dictionary version at each dictionary change.  The following
dict methods can trigger changes:

* ``clear()`` 
* ``pop(key)``
* ``popitem()`` 
* ``setdefault(key, value)`` 
* ``__delitem__(key)`` 
* ``__setitem__(key, value)`` 
* ``update(...)``

.. note::  As a quality of implementation issue, the actual patch
does not increment the version_tag when it can prove that there
was no actual change.  For example, clear() on an already-empty
dict will not trigger a version_tag change, nor will updating a
dict with itself, since the values will be unchanged.  For efficiency,
the analysis considers only object identity (not equality) when
deciding whether to increment the version_tag.

[2A] Do you want to promise that replacing a value with a
non-identical object *will* trigger a version_tag update *even*
if the objects are equal?

I would vote no, but I realize backwards-compatibility may create
such a promise implicitly.

(3)  It is worth being explicit on whether empty dicts can share
a version_tag of 0.  If this PEP is about dict content, then that
seems fine, and it may well be worth optimizing dict creation.

There are times when it is important to keep the same empty dict;
I can't think of any use cases where it is important to verify
that some *other* code has done so, *and* I can't get a reference
to the correct dict for an identity check.

(4)  Please be explicit about the locking around version++; it
is enough to say that the relevant methods already need to hold
the GIL (assuming that is true).

(5)  I'm not sure I understand the arguments around a per-entry
version.

On the one hand, you never need a strong reference to the value;
if it has been collected, then it has obviously been removed from
the dict and should trigger a change even with per-dict.

On the other hand, I'm not sure per-entry would really allow
finer-grained guards to avoid lookups; just because an entry hasn't
been modified doesn't prove it hasn't been moved to another location,
perhaps by replacing a dummy in a slot it would have preferred.

(6)  I'm also not sure why version_tag *doesn't* solve the problem
of dicts that fool the iteration guards by mutating without changing
size ( https://bugs.python.org/issue19332 ) ... are you just saying
that the iterator views aren't allowed to rely on the version-tag
remaining stable, because replacing a value (as opposed to a
key-value pair) is allowed?

I had always viewed the failing iterators as a supporting-this-case-
makes-the-code-too-slow-and-ugly limitation, rather than a data
integrity check.  When I do care about the data not changing,
(an exposed variant of) version_tag is as likely to be what I want as
a hypothetical keys_version_tag would be. 
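
To make the semantics I'm assuming explicit, here is a rough pure-Python
model of the proposal (NOT the actual C implementation, which keeps the
version in the dict struct itself):

    class VersionedDict(dict):
        _global_version = 0

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.version = self._bump()

        @classmethod
        def _bump(cls):
            cls._global_version += 1
            return cls._global_version

        def __setitem__(self, key, value):
            # the quality-of-implementation point: identity (not equality)
            # is enough to skip the version bump
            if key in self and self[key] is value:
                return
            self.version = self._bump()
            super().__setitem__(key, value)

        def __delitem__(self, key):
            self.version = self._bump()
            super().__delitem__(key)

        # clear(), pop(), popitem(), setdefault() and update() would need
        # the same treatment; omitted here for brevity.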

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-15 Thread Jim J. Jewett
On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner
<victor.stin...@gmail.com> wrote:
> 2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjew...@gmail.com>:

>> (2)  Why *promise* not to update the version_tag when replacing a
>> value with itself?

> It's an useful property. For example, let's say that you have a guard
> on globals()['value']. The guard is created with value=3. An unit test
> replaces the value with 50, but then restore the value to its previous
> value (3). Later, the guard is checked to decide if an optimization
> can be used.

> If the dict version is increased, you need a lookup. If the dict
> version is not increased, the guard is cheap.

I would expect the version to be increased twice, and therefore to
require a lookup.  Are you suggesting that unittest should provide an
example of resetting the version back to the original value when it
cleans up after itself?

> In C, it's very cheap to implement the test "new_value == old_value",
> it just compares two pointers.

Yeah, I understand that it is likely a win in terms of performance,
and a good way to start off (given that you're willing to do the
work).

I just worry that you may end up closing off even better optimizations
later, if you make too many promises about exactly how you will do
which ones.

Today, dict only cares about ==, and you (reasonably) think that full
== isn't always worth running ... but when it comes to which tests
*are* worth running, I'm not confident that the answers won't change
over the years.

>> [2A] Do you want to promise that replacing a value with a
>> non-identical object *will* trigger a version_tag update *even*
>> if the objects are equal?

> It's already written in the PEP:

I read that as a description of what the code does, rather than a spec
for what it should do... so it isn't clear whether I could count on
that remaining true.

For example, if I know that my dict values are all 4-digit integers,
can I write:

d[k]  = d[k] + 0

and be assured that the version_tag will bump?  Or is that something
that a future optimizer might optimize out?

>> (3)  It is worth being explicit on whether empty dicts can share
>> a version_tag of 0.  If this PEP is about dict content, then that
>> seems fine, and it may well be worth optimizing dict creation.

> This is not part of the PEP yet. I'm not sure that I will modify the
> PEP to use the version 0 for empty dictionaries. Antoine doesn't seem
> to be convinced :-)

True.  But do note that "not hitting the global counter an extra time
for every dict creation" is a more compelling reason than "we could
speed up dict.clear(), sometimes".


>> (4)  Please be explicit about the locking around version++; it
>> is enough to say that the relevant methods already need to hold
>> the GIL (assuming that is true).

> I don't think that it's important to mention it in the PEP. It's more
> an implementation detail. The version can be protected by atomic
> operations.

Now I'm the one arguing from a specific implementation.  :D

My thought was that any sort of locking (including atomic operations)
is slow, but if the GIL is already held, then there is no *extra*
locking cost. (Well, a slightly longer hold on the lock, but...)

>> (5)  I'm not sure I understand the arguments around a per-entry
>> version.

>> On the one hand, you never need a strong reference to the value;
>> if it has been collected, then it has obviously been removed from
>> the dict and should trigger a change even with per-dict.
>
> Let's say that you watch the key1 of a dict. The key2 is modified, it
> increases the version. Later, you test the guard: to check if the key1
> was modified, you need to lookup the key and compare the value. You
> need the value to compare it.

And the value for key1 is still there, so you can.

The only reason you would notice that the key2 value had gone away is
if you also care about key2 -- in which case the cached value is out
of date, regardless of what specific value it used to hold.

>> (6)  I'm also not sure why version_tag *doesn't* solve the problem
>> of dicts that fool the iteration guards by mutating without changing
>> size ( https://bugs.python.org/issue19332 ) ... are you just saying
>> that the iterator views aren't allowed to rely on the version-tag
>> remaining stable, because replacing a value (as opposed to a
>> key-value pair) is allowed?

> If the dictionary values are modified during the loop, the dict
> version is increased. But it's allowed to modify values when you
> iterate on *keys*.

Sure.  So?

I see three cases:

(A)  I don't care that the collection changed.  The python
implementation might, but I don't.  (So no bug even today.)

(B)  I want to process exactly the collection that I started with

Re: [Python-Dev] Updated PEP 509

2016-04-18 Thread Jim J. Jewett
On Sat, Apr 16, 2016 at 5:01 PM, Victor Stinner
 wrote:
> * I mentionned that version++ must be atomic, and that in the case of
> CPython, it's done by the GIL

Better; if those methods *already* hold the GIL, it is worth saying
"already", to indicate that the change is not expensive.

> * I removed the dict[key]=value; dict[key]=value. It's really a
> micro-optimization. I also fear that Raymond will complain because it
> adds an if in the hot code of dict, and the dict type is very
> important for Python performance.

That is an acceptable answer.  Though I really do prefer explicitly
*refusing to promise* either way when the replacement/replaced objects
are ==.

dicts (and other collections) already assume sensible ==, even
explicitly allowing self-matches of objects that are not equal to
themselves.  I don't like the idea of making new promises that violate
(or rely on violations of) that sensible == assumption.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] RFC: PEP 509: Add a private version to dict

2016-04-18 Thread Jim J. Jewett
On Fri, Apr 15, 2016 at 7:31 PM, Victor Stinner
<victor.stin...@gmail.com> wrote:
> 2016-04-15 23:45 GMT+02:00 Jim J. Jewett <jimjjew...@gmail.com>:
...
>> I just worry that you may end up closing off even better optimizations
>> later, if you make too many promises about exactly how you will do
>> which ones.

>> Today, dict only cares about ==, and you (reasonably) think that full
>> == isn't always worth running ... but when it comes to which tests
>> *are* worth running, I'm not confident that the answers won't change
>> over the years.

> I checked, currently there is no unit test for a==b, only for a is b.
> I will add add a test for a==b but a is not b, and ensure that the
> version is increased.

Again, why?  Why not just say "If an object is replaced by something
equal to itself, the version_tag may not be changed.  While the
initial heuristics are simply to check for identity but not full
equality, this may change in future releases."

>> For example, if I know that my dict values are all 4-digit integers,
>> can I write:
>>
>> d[k]  = d[k] + 0
>>
>> and be assured that the version_tag will bump?  Or is that something
>> that a future optimizer might optimize out?

> Hum, I will try to clarify that.

I would prefer that you clarify it to say that while the initial patch
doesn't optimize that out, a future optimizer might.

> The problem with storing an identifier (a pointer in C) with no strong
> reference is when the object is destroyed, a new object can likely get
> the same identifier. So it's likely that "dict[key] is old_value_id"
> can be true even if dict[key] is now a new object.

Yes, but it shouldn't actually be destroyed until it is removed from
the dict, which should change version_tag, so that there will be no
need to compare it.

> Do you want to modify the PEP 509 to fix this issue? Or you don't
> understand why the PEP 509 cannot be used to fix the issue? I'm
> lost...

I believe it *does* fix the issue in some (but not all) cases.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?

2016-10-13 Thread Jim J. Jewett


On Mon, Oct 10, 2016, at 14:04, MRAB wrote:
> Instead of locking the object, could we keep the GIL, but have it
> normally released?

> A thread could then still call a function such as PyWeakref_GetObject()
> that returns a borrowed reference, but only if it's holding the GIL. It
> would be able to INCREF the reference before releasing the GIL again.

So you need to get/release the GIL just to run a slightly faster function
that doesn't bother with an extra incref/decref pair?  I think anyone
willing to make those changes would be willing to switch to a
non-borrowing version of that same function, and do an explicit
DECREF if that is really what they wanted.

On Tue, Oct 11, 2016 at 5:24 AM, Random832  wrote:
> So, what stops the other thread which never asks for the GIL from
> blowing away the reference? Or is this a special kind of lock that you
> can "assert isn't locked" without locking it for yourself, and
> INCREF/DECREF does so?

On Mon Oct 10 15:36:59 EDT 2016, Chris Angelico wrote:
> "assert isn't locked" is pretty cheap

Yeah, but so is INCREF/DECREF on memory that is almost certainly
in cache anyhow, because you're using the object right next to it.

The write part hurts, particularly when trying to use multiple cores
with shared memory, but any sort of indirection (even separating the
refcount from the object, to allow per-core counters) ... well, it
doesn't take much at all to be worse than INCREF/DECREF in even
the normal case, let alone amortized across the the "drat, this
object now has to be handled specially" cases.

Imagine two memory pools, one for "immortal" objects (such as
None) that won't be collected, and so don't need their memory
dirtied when you INCREF/DECREF.  Alas, now *every* INCREF and DECREF
has to branch on the address to tell whether or not it should be a
no-op.

-jJ

--

If there are still threading problems with my replies, please
email me with details, so that I can try to resolve them.  -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] startup time repeated? why not daemon

2017-07-20 Thread Jim J. Jewett
I agree that startup time is a problem, but I wonder if some of the pain
could be mitigated by using a persistent process.

For example, in
https://mail.python.org/pipermail/python-dev/2017-July/148664.html Ben Hoyt
mentions that the Google Cloud SDK (CLI) team has found it "especially
problematic for shell tab completion helpers, because every time you press
tab the shell has to load your Python program"

Decades ago, I learned to set my editor to vi instead of emacs for similar
reasons -- but there was also an emacsclient option that simply opened a
new window from an already running emacs process.  tab completion seems
like the exactly the sort of thing that should be sent to an existing
process instead of creating a new one.

Is it too hard to create a daemon server?
Is the communication and context switch slower than a new startup?
Is the pattern just not well-enough advertised?
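
A very rough sketch of the daemon pattern I mean, stdlib only (the socket
path, the protocol, and the completion data are all made up; a real helper
would also need start-on-demand, locking, and cleanup):

    import os, socket, socketserver

    SOCK = "/tmp/completer.sock"

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            prefix = self.rfile.readline().decode().strip()
            # expensive imports and startup happened once, in the daemon
            matches = [c for c in ("checkout", "clone", "commit")
                       if c.startswith(prefix)]
            self.wfile.write(" ".join(matches).encode() + b"\n")

    def serve():
        # run once, in the background, for the life of the shell session
        if os.path.exists(SOCK):
            os.unlink(SOCK)
        with socketserver.UnixStreamServer(SOCK, Handler) as srv:
            srv.serve_forever()

    def complete(prefix):
        # what the shell helper runs on each TAB press: connect, ask, print
        with socket.socket(socket.AF_UNIX) as s:
            s.connect(SOCK)
            s.sendall(prefix.encode() + b"\n")
            print(s.recv(4096).decode().strip())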

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 550 v3 naming

2017-08-20 Thread Jim J. Jewett
Building on Brett's suggestion:

FrameContext: used in/writable by one frame
ContextStack: a FrameContext and its various fallbacks

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

2017-08-24 Thread Jim J. Jewett
On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjew...@gmail.com> wrote:

> The key requirement for using immutable datastructures is to make
> "get_execution_context" operation fast.

Do you really need the whole execution context, or do you just need
the current value of a specific key?  (Or, sometimes, the writable
portion of the context.)


>  Currently, the PEP doesn't do
> a good job at explaining why we need that operation and why it will be
> used by asyncio.Task and call_soon, so I understand the confusion.

OK, the schedulers need the whole context, but if implemented as a
ChainMap (instead of per-key), isn't that just a single constant?  As
in, don't they always schedule to the same thread?  And when they need
another map, isn't that because the required context is already
available from whichever code requested the scheduling?

>> (A)  How many values do you expect a typical generator to use?  The
>> django survey suggested mostly 0, sometimes 1, occasionally 2.  So
>> caching the values of all possible keys probably won't pay off.

> Not many, but caching is still as important, because some API users
> want the "get()" operation to be as fast as possible under all
> conditions.

Sure, but only because they view it as a hot path; if the cost of that
speedup is slowing down another hot path, like scheduling the
generator in the first place, it may not be worth it.

According to the PEP timings, HAMT doesn't beat a copy-on-write dict
until over 100 items, and never beats a regular dict.  That suggests
to me that it won't actually help the overall speed for a typical (as
opposed to worst-case) process.

>> And, of course, using a ChainMap means that the keys do NOT have to be
>> predefined ... so the Key class really can be skipped.

> The first version of the PEP had no ContextKey object and the most
> popular complaint about it was that the key names will clash.

That is true of any global registry.  Thus the use of keys with
prefixes like com.sun.

The only thing pre-declaring a ContextKey buys in terms of clashes is
that a sophisticated scheduler would have less difficulty figuring out
which clashes will cause thrashing in the cache.

Or are you suggesting that the key can only be declared once (as
opposed to once per piece of code), so that the second framework to
use the same name will see a RuntimeError?

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

2017-08-25 Thread Jim J. Jewett
On Aug 24, 2017 11:02 AM, "Yury Selivanov" <yselivanov...@gmail.com> wrote:

On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjew...@gmail.com>
wrote:
> On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
>> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjew...@gmail.com> wrote:


If you look at this small example:

foo = new_context_key()

async def nested():
await asyncio.sleep(1)
print(foo.get())

async def outer():
foo.set(1)
await nested()
foo.set(1000)

l = asyncio.get_event_loop()
l.create_task(outer())
l.run_forever()

It will print "1", as the "nested()" coroutine will see the "foo" key when
it's awaited.

Now let's say we want to refactor this snippet and run the "nested()"
coroutine with a timeout:

foo = new_context_key()

async def nested():
await asyncio.sleep(1)
print(foo.get())

async def outer():
foo.set(1)
await asyncio.wait_for(nested(), 10)  # !!!
foo.set(1000)

l = asyncio.get_event_loop()
l.create_task(outer())
l.run_forever()

So we wrap our `nested()` in a `wait_for()`, which creates a new
asynchronous tasks to run `nested()`.  That task will now execute on
its own, separately from the task that runs `outer()`.  So we need to
somehow capture the full EC at the moment `wait_for()` was called, and
use that EC to run `nested()` within it.  If we don't do this, the
refactored code would print "1000", instead of "1".


I would expect 1000 to be the right answer!  By the time it runs, 1000 (or
mask_errors=false, to use a less toy example) is what its own controlling
scope requested.

If you are sure that you want the value frozen earlier, please make this
desire very explicit  ... this example is the first I noticed it.  And
please explain what this means for things like signal or warning masking.


ContextKey is declared once for the code that uses it. Nobody else
will use that key. Keys have names only for introspection purposes,
the implementation doesn't use it, iow:

var = new_context_key('aa')
var.set(1)
# EC = [..., {var: 1}]

# Note the that EC has a "var" object itself as the key in the
mapping, not "a".


This I had also not realized.  So effectively, the keys are based on
object identity, with some safeguards to ensure that even starting with the
same (interned) name will *not* produce the same object unless you passed
it around explicitly, or are in the same same code unit (file, typically).

This strikes me as reasonable, but still surprising.  (I think of variables
as typically named, rather than identified by address.)  So please make
this more explicit as well.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 550 and other python implementations

2017-08-25 Thread Jim J. Jewett
Should PEP 550 discuss other implementations?  E.g., the object space used
in pypy?

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Pep 550 and None/masking

2017-08-27 Thread Jim J. Jewett
Does setting an ImplicitScopeVar to None set the value to None, or just
remove it?

If it removes it, does that effectively unmask a previously masked value?

If it really sets to None, then is there a way to explicitly unmask
previously masked values?

Perhaps the initial constructor should require an initial value
 (defaulting to None) and the docs should give examples both for using a
sensible default value and for using a special "unset" marker.
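
For example (ImplicitScopeVar here is only a toy stand-in for whatever the
PEP's variable type ends up being; the point is the two documentation
examples):

    class ImplicitScopeVar:                    # toy stand-in, not the PEP's API
        def __init__(self, initial=None):
            self._value = initial
        def get(self):
            return self._value

    timeout = ImplicitScopeVar(initial=30)     # a sensible default value

    UNSET = object()                           # an explicit "unset" marker
    current_user = ImplicitScopeVar(initial=UNSET)
    if current_user.get() is UNSET:
        print("no user bound in this context")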

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Pep 550 module

2017-08-27 Thread Jim J. Jewett
I think there is general consensus that this should go in a module other
than sys. (At least a submodule.)

The specific names are still To Be Determined, but I suspect seeing the
functions and objects as part of a named module will affect what works.

So I am requesting that the next iteration just pick a module name, and let
us see how that looks.  E.g

import dynscopevars

user=dynscopevars.Var ("username")

myscope=dynscopevars.get_current_scope()

childscope=dynscopevars.Scope (parent=myscope,user="bob")


-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap

2017-08-23 Thread Jim J. Jewett
In https://mail.python.org/pipermail/python-dev/2017-August/148869.html
Nick Coghlan wrote:

> * what we want to capture at generator creation time is
>   the context where writes will happen, and we also
>   want that to be the innermost context used for lookups

So when creating a generator, we want a new (empty) ImplicitContext
map to be the head of the ChainMap.  Each generator should have one of
its own, just as each generator has its own frame.  And the ChainMap
delegation goes up the call stack, just as an exception would.
Eventually, it hits the event loop (or other Executor) which is
responsible for ensuring that the ChainMap eventually defers to the
proper (Chain)Map for this thread or Task.

> While the context is defined conceptually as a nested chain of
> key:value mappings, we avoid using the mapping syntax because of the
> way the values can shift dynamically out from under you based on who
> called you
...
> instead of having the problem of changes inside the
> generator leaking out, we instead had the problem of
> changes outside the generator *not* making their way in

I still don't see how this is different from a ChainMap.

If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C,
d_mine]  maps as your implicit context, then a change to d_thread map
(that some other code could make) will be visible unless it is masked.

Similarly, if the fallback for d_C changes from d_B to d_B1 (which
points directly to d_thread), that will be visible for any keys that
were previously resolved in d_A or d_B, or are now resolved in dB1.

Those seem like exactly the cases that would (and should) cause
"shifting values".

This does mean that you can't cache everything in the localmost map,
but that is a problem with the optimization regardless of how the
implementation is done.
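
Concretely, the model I have in mind is just the stdlib ChainMap (the
d_* names and the username key are only illustrative):

    from collections import ChainMap

    d_global = {}
    d_thread = {"username": "alice"}
    thread_context = ChainMap(d_thread, d_global)

    # each generator gets its own (initially empty) localmost map,
    # chained to whatever context scheduled it
    gen_context = thread_context.new_child()

    gen_context["username"] = "bob"      # set is O(1): only the local map is touched
    print(gen_context["username"])       # -> bob   (local value masks the outer one)
    print(thread_context["username"])    # -> alice (nothing leaked out)
    del gen_context["username"]
    print(gen_context["username"])       # -> alice (falls back up the chain again)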

In https://mail.python.org/pipermail/python-dev/2017-August/148873.html
Yury Selivanov wrote:

> Any code that uses EC will not see any difference
[between mutable vs immutable but replaced LC maps],
> because it can only work with the top LC.

> Back to generators. Generators have their own empty LCs when created
> to store their *local* EC modifications.

OK, so just as they have their own frame, they have their own
ChainMap, and the event loop is responsible for resetting the fallback
when it schedules them.

> When a generator is *being* iterated, it pushes its LC to the EC. When
> the iteration step is finished, it pops its LC from the EC.

I'm not sure it helps to think of a single stack.  When the generator
is active, it starts with its own map.  When it is in the call chain
of the active generator, its map will be in the chain of delegations.
When neither it nor a descendant are active, no code will end up
delegating to it.

If the delegation graph has 543 generators delegating directly to the
thread-wide map, there is no reason to pop/push an execution stack
every time a different generator is scheduled, since only that
generator itself (and code it calls) will even care.

> HAMT is a way to efficiently implement immutable mappings ...
> using regular dicts and copy, set() would be O(log N)

Using a ChainMap, set affects only the localmost map and is therefore
O(1).  get could require stacksize lookups, but ...

(A)  How many values do you expect a typical generator to use?  The
django survey suggested mostly 0, sometimes 1, occasionally 2.  So
caching the values of all possible keys probably won't pay off.

(B)  Other than the truly global context and thread-level context, how
many of these maps do you expect to be non-empty?

(C)  How deep do you expect the stack to get?  Are we talking about
100 layers of mappings to check between the current generator and the
thread-wide defaults?  Even if we are, verifying that there hasn't
been a change in some mid-level layer requires tracking the versions
of each mid-level layer.  (If version is globally unique, that would
also ensure that the stack hasn't changed.)  Is that really faster
than checking that the map is empty?

And, of course, using a ChainMap means that the keys do NOT have to be
predefined ... so the Key class really can be skipped.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 560: bases classes / confusion

2017-11-15 Thread Jim J. Jewett
(1)  I found the following (particularly "bases classes") very confusing:

"""
If an object that is not a class object appears in the bases of a class

definition, then ``__mro_entries__`` is searched on it. If found,
it is called with the original tuple of bases as an argument. The result
of the call must be a tuple, that is unpacked in the bases classes in place
of this object. (If the tuple is empty, this means that the original bases
is
simply discarded.)
"""

Based on the following GenericAlias/NewList/Tokens example, I think I
now I understand what you mean, and would have had somewhat less
difficulty if it were expressed as:

"""
When an object that is not a class object appears in the (tuple of)
bases of a class
definition, then attribute ``__mro_entries__`` is searched on that
non-class object.  If ``__mro_entries__`` found,
it is called with the entire original tuple of bases as an argument. The result
of the call must be a tuple, which is unpacked and replaces only the
non-class object in the tuple of bases.  (If the tuple is empty, this
means that the original bases
is
simply discarded.)
"""

Note that this makes some assumptions about the __mro_entries__
signature that I wasn't quite sure about from the example.  So
building on that:

class ABList(A, NewList[int], B):

I *think* the following will happen:

"NewList[int]" will be evaluated, and __class_getitem__ called, so
that the bases tuple will be (A, GenericAlias(NewList, int), B)

# (A)  I *think* __mro_entries__ gets called with the full tuple,
# instead of just the object it is found on.
# (B) I *think* it is called on the results of evaluating
# the terms within the tuple, instead of the original
# string representation.
_tmp = __mro_entries__(A, GenericAlias(NewList, int), B)

# (C)  I *think* __mro_entries__ returns a replacement for
# just the single object, even though it was called on
# the whole tuple, without knowing which object it
# represents.
bases = (A, _tmp, B)

# (D) If there are two non-class objects, I *think* the
# second one gets the same arguments as the first,
# rather than an intermediate tuple with the first such
# object already substituted out.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] unfrozen dataclasses and __hash__ (subsets are OK)

2018-02-04 Thread Jim J. Jewett
I understand auto-generating the __hash__ (and __eq__) for a frozen
container; that is just convenient.

But why is there any desire to autogenerate a __hash__ for something
that isn't frozen?  Like a list or dict, the normal case would be for
it not to have a hash at all, and the author *should* write out any
explicit exceptions.

The objection to that seems to be that someone might forget to add
another field to the hash during later maintenance -- but so what?

__hash__ should reference a subset of the fields used for equality,
and strict subsets are OK.  It *should* ignore some fields if that
will provide the right balance between quick calculation and
sufficient dispersion.  If the record is complicated enough that
forgetting a field is a likely problem, then the hash is probably
already sufficiently complex without those new fields.
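
For example, assuming the decorator leaves an explicitly defined __hash__
alone (which is the behavior I would expect for an unfrozen dataclass),
the explicit exception looks like this:

    from dataclasses import dataclass, field

    @dataclass
    class Record:
        key: str                                     # stable, used for hashing
        payload: list = field(default_factory=list)  # mutable, equality only

        def __hash__(self):
            # hash on a subset of the fields used for equality; that subset
            # must stay stable while the object sits in a dict or set
            return hash(self.key)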

 -jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Dataclasses, frozen and __post_init__

2018-02-22 Thread Jim J. Jewett
On Mon, Feb 19, 2018 at 5:06 PM, Chris Barker - NOAA Federal <
chris.barker at noaa.gov> wrote:

> If I have this right, on the discussion about frozen and hash, a use
> case was brought up for taking a few steps to create an instance (and
> thus wanting it not frozen) and then wanting it hashable.

> Which pointed to the idea of a "freeze this from now on" method.

> This seems another use case β€” maybe it would be helpful to be able to
> freeze an instance after creation for multiple use-cases?

Yes, it would be helpful.  But in practice, I've just limited the hash
function to only the attributes that are available before I need to
stick the object in a dict.  In practice, that has always been more
than sufficient.

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Nickname Binding (PEP 572)

2018-04-26 Thread Jim J. Jewett
I think part of the disconnect is that this enhancement could very
easily be abused, and it seems likely that it will be, because the
problems aren't visible while writing the code -- only when reading it
later.

I therefore suggest making it very clear in the PEP -- and probably in
PEP 8 --  how these expressions should be limited.  Simply renaming
them to "nickname binding" would be start, but here is a rough draft
for wording.



When scanning code by eye, it is helpful that assignments are (almost)
always at the start of a line.  Even def and class statements can
cause confusion if the reader didn't realize that the name referred to
a class, rather than an instance.  Moving assignments to the middle of
a line will make it harder for someone else to read your code -- so
don't do that.

A nickname is just a regular name, except that it also suggests an
intimate environment.  If the name is purely for documentation, or
will be used only later in the same expression (or, possibly, the same
block or just after), then a nickname may be appropriate.  But

*  If you are wondering what to do about type hints, then the
expression is probably too complex to leave inline.  Separate it out
into a regular assignment statement; nicknames do not support type
hints.

*  If you will be saving the value -- even as an attribute on self --
there is a chance it will be retrieved in a very different context.
Use a regular assignment statement; nicknames are just simple names,
not attributes or keys.

*  If you will be using the value somewhere else in your code, use a
regular assignment statement.  This makes it easier to find, and warns
people that the value may be used again later.
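
For instance, using the := spelling from the current draft (substitute
whatever spelling is finally chosen; the regex example is only
illustrative):

    import re

    line = "error: disk full"
    pattern = re.compile(r"error: (.*)")

    # fine as a nickname: bound and used only within the same statement/block
    if (m := pattern.match(line)) is not None:
        print(m.group(1))

    # not a nickname any more: the value is saved for use elsewhere,
    # so a regular assignment statement is clearer
    m = pattern.match(line)
    report = {"detail": m.group(1) if m else None}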

-jJ
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


  1   2   3   >