[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-31 Thread Erik Demaine

On Mon, 1 Nov 2021, Chris Angelico wrote:


This is incompatible with the existing __get__ method, so it should
get a different name. Also, functions have a __get__ method, so you
definitely don't want to have everything that takes a callback run
into this. Let's say it's __delayed__ instead.


Right, good point.  I'm clearly still learning about descriptors. :-)


I'm having a LOT of trouble seeing this as an improvement.


It's not meant to be an improvement exactly, more of a compatible explanation
of how PEP 671 works -- in the same way that `instance.method` doesn't
"magically" make a bound method, but rather Python checks whether the
looked-up attribute has a `__get__` method, and if so, calls it with
`instance` as an argument, instead of returning the raw attribute directly.
This mechanism makes the whole `instance.method` behavior less magic, more
introspectable, more overridable, etc., e.g. making classmethod and similar
decorators possible.  I'm trying to do the same thing with PEP 671 (though
possibly failing :-)).


At least it's still executing the function in its natural scope; it's 
"just" the locals() dict that gets exposed, as an argument.


Yes, which means you can't access nonlocals or globals, only locals.
So it has a subset of functionality in an awkward way.


My actual intent was to just be able to access the arguments, which are all 
locals to the function.  [Conceptually, I was thinking of the arguments being 
in their own object, and then getting accessed once like attributes, which 
triggered __get__ if defined -- but this view isn't very good, in particular 
because we don't want to redefine what it means to pass functions as 
arguments!]


But the __delayed__ method is already a function, so it has its own locals, 
nonlocals, and globals.  The difference is that those are in the frame of 
__delayed__, which is outside the function with the defaults, and I wanted to 
access that function's arguments -- hence passing in the function's locals().
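
Concretely, with the rename, the LateLength example from my earlier message
would become something like this sketch, where `len` resolves in the method's
own scope (here, builtins) rather than in the scope of the function whose
default it is:

```
class LateLength:
    '''Sketch: __delayed__ (renamed from __get__) gets the locals().'''
    def __init__(self, name):
        self.name = name
    def __delayed__(self, locals):
        # len is looked up in this method's scope, not the caller's
        return len(locals[self.name])
```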



Alternatively, we could forbid this (at least for now): perhaps a __get__
method only gets checked and called on a parameter when that parameter has its
default value (e.g. `end is bisect.__defaults__[1]`).


That part's not a problem; if this has language support, it could be
much more explicit: "if the end parameter was not set".


True.  I was trying to preserve the "skip this argument" property, but it 
might make more sense to call __delayed__ only when the argument is omitted. 
This might make it possible for defaults with __delayed__ methods to actually 
be evaluated in the function's scope, which would make it more compatible with 
the current PEP 671.



AND it becomes impossible to have an object with this
method as an early default - that's the sentinel problem.


That's true.  I guess my point is that these *are* early defaults, but act 
very much like late defaults.  Functions or function calls just treat these 
early defaults specially because they have a __delayed__ method.


I agree it's not perfect, but is there a context where you'd actually want to
have an early default that is one of these objects?  The point of adding a
method to an early default is to make the early default behave like a late
default, so this feels like expected behavior...?



The use of locals() (as an argument to __get__) is rather ugly, and probably
prevents name lookup optimization.


Yes. It also prevents use of anything other than locals. For instance,
you can't have global helper functions, or anything like that; you
could use something like len() from the builtins, but you couldn't use
a function defined in the same module. Passing both globals and locals
would be better, but still imperfect; and it incurs double lookups
every time.


That wasn't my intent.  The __delayed__ method is still a function, and has 
its own locals, nonlocals, and globals.  It can still call len (as my example 
code did) -- it's just the len visible from the __delayed__ function, not the 
len visible from the function with the default parameter.


It's true that this approach would prevent implementing something like this:

```
def foo(a => (b := 5)):
    nonlocal b
```

I'm not sure that that is particularly important: I just wanted the default 
expression to be able to access the arguments and the surrounding scopes.


Sure. Explore anything you like! But I don't think that this is any less 
ugly than either the status quo or PEP 671, both of which involve actual 
real code being parsed by the compiler.


This proposal was meant to help define what the compiler would parse PEP 671
code *into*.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/

[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-31 Thread Erik Demaine

On Sat, 30 Oct 2021, Erik Demaine wrote:

Functions are already a form of deferred evaluation.  PEP 671 is an 
embellishment to this mechanism for some of the code in the function 
signature to actually get executed within the body scope, *just like the body 
of the function*.


I was thinking about what other forms of deferred evaluation Python has, and 
ran into descriptors [https://docs.python.org/3/howto/descriptor.html]. 
Classes support this mechanism for calling arbitrary code when accessing the 
attribute, instead of when calling the class:


```
class CallMeLater:
    '''Descriptor for calling a specified function with no arguments.'''
    def __init__(self, func):
        self.func = func
    def __get__(self, obj, objtype=None):
        return self.func()

class Foo:
    early_list = []
    late_list = CallMeLater(lambda: [])

foo1 = Foo()
foo2 = Foo()
foo1.early_list == foo2.early_list == foo1.late_list == foo2.late_list
foo1.early_list is foo2.early_list    # the same []
foo1.late_list is not foo2.late_list  # two different []s
```

Written this way, it feels quite a bit like early and late arguments to me. 
So this got me thinking:


What if parameter defaults supported descriptors?  Specifically, something 
like the following:


If a parameter (passed or defaulted) has a __get__ method, call it with
one argument (beyond self), namely, the function scope's locals().
Parameters are processed this way in order from left to right.

(PEPs 549 and 649 are somewhat related in that they also propose extending 
descriptors.)


This would enable the following hand-rolled late-bound defaults (using two 
early-bound defaults):


```
def foo(early_list = [], late_list = CallMeLater(lambda: [])):
    ...
```

Or we could write a decorator to make this somewhat cleaner:

```
def late_defaults(func):
    '''Convert callable defaults into late-bound defaults'''
    func.__defaults__ = tuple(
        CallMeLater(default) if callable(default) else default
        for default in func.__defaults__
    )
    return func

@late_defaults
def foo(early_list = [], late_list = lambda: []):
    ...
```

It's also possible, but difficult, to write `end := len(a)` defaults:

```
class LateLength:
    '''Descriptor for calling len(specified name)'''
    def __init__(self, name):
        self.name = name
    def __get__(self, locals):
        return len(locals[self.name])
    def __repr__(self):
        # This is bad form for repr, but it makes help(bisect)
        # output the "right" thing: end=len(a)
        return f'len({self.name})'

def bisect(a, start=0, end=LateLength('a')):
    ...
```
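
Since the protocol above needs language support, here is a rough
decorator-based emulation (my own sketch, not the proposed mechanism): it
resolves marked defaults left to right, passing the dict of arguments bound
so far in place of true locals().  `Late` and `resolve_late` are made-up
names:

```
import inspect

class Late:
    '''Marker: an early-bound default that should be computed per call.'''
    def __init__(self, func):
        self.func = func  # takes the dict of arguments bound so far

def resolve_late(func):
    '''Fill in Late defaults, left to right, before calling func.'''
    sig = inspect.signature(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, param in sig.parameters.items():
            if name not in bound.arguments and isinstance(param.default, Late):
                bound.arguments[name] = param.default.func(bound.arguments)
        bound.apply_defaults()
        return func(*bound.args, **bound.kwargs)
    return wrapper

@resolve_late
def bisect(a, start=0, end=Late(lambda args: len(args['a']))):
    return start, end

assert bisect([1, 2, 3]) == (0, 3)
```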

One feature/bug of this approach is that someone calling the function could 
pass in a descriptor, and its __get__ method will get called by the function 
(immediately at the start of the call).  Personally I find this dangerous, but 
those excited about general deferreds might like it?  At least it's still 
executing the function in its natural scope; it's "just" the locals() dict 
that gets exposed, as an argument.


Alternatively, we could forbid this (at least for now): perhaps a __get__ 
method only gets checked and called on a parameter when that parameter has its 
default value (e.g. `end is bisect.__defaults__[1]`).  In addition to 
feeling safer (to me), this would enable a lot of optimization:


* Parameters without defaults don't need any __get__ checking.

* Default values could be checked for the presence of a __get__ method at
function definition time (or when setting func.__defaults__), and that flag
could get checked at function call time, with __get__ semantics occurring only
when the flag is set.  (I'm not sure whether this would actually save time,
though.  Maybe if it were a single flag for the whole function -- "any
late-bound arguments here?" -- then, if not set, we keep the old behavior and
performance.)



This proposal could be compatible with PEP 671.  What I find nice about this 
proposal is that it's valid Python syntax today, just an extension of the data 
model.  But I wouldn't necessarily want to write the ugly incantations above,
and would rather use some syntactic sugar on top of them -- and that's where
PEP 671 could come in.  What this proposal might offer is a *meaning* for that
syntactic sugar, which is more general and perhaps more Pythonic (building on 
the existing Python data model).  It provides another way to think about what 
the notation in PEP 671 means, and suggests a (different) mechanism to 
implement it.


Some nice features:

* __defaults__ naturally generalizes here; no need for auxiliary structures 
or different signatures for __defaults__.  A tool looking at __defaults__ 
could either be aware of descriptors in this context or not.  All other 
introspection should be the same.


* It becomes possible to skip a positional argument again: pass in the value 
in __defaults__ and it will behave as if that argument wasn't passed.


* The syntactic sugar could build a __repr__ (or 

[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-30 Thread Erik Demaine

On Sat, 30 Oct 2021, Brendan Barnwell wrote:

	I agree it seems totally absurd to add a type of deferred expression 
but restrict it to only work inside function definitions.


Functions are already a form of deferred evaluation.  PEP 671 is an 
embellishment to this mechanism for some of the code in the function signature 
to actually get executed within the body scope, *just like the body of the 
function*.  This doesn't seem weird to me.


If we have a way to create deferred expressions we should try to make them 
more generally usable.


Does anyone have a proposal for deferred expressions that could match the ease 
of use of PEP 671 in assigning a default argument of, say, `[]`?  The 
proposals I've seen so far in this thread involve checking `isdeferred` and 
then resolving that deferred.  This doesn't seem any easier than the existing
sentinel approach for default arguments, whereas PEP 671 significantly
simplifies this use-case.
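
For comparison, the existing sentinel idiom that PEP 671 would replace:

```
_MISSING = object()  # module-private sentinel

def foo(eggs=_MISSING):
    if eggs is _MISSING:
        eggs = []  # a fresh list on every call
    ...
```

versus the proposed one-liner `def foo(eggs => []): ...`.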


I also don't see how a function could distinguish a deferred default argument 
and a deferred argument passed in from another function.  In my opinion, the 
latter would be really messy/dangerous to work with, because it could 
arbitrarily pollute your scope.  Whereas late-bound default arguments make a
lot of sense: they're written in the function itself (just in the signature 
instead of the body), so we can see by looking at the code what happens.


I've written code in dynamically scoped languages before.  I don't recall 
enjoying it.  But maybe I missed a proposal, or someone has an idea for how to 
fix these issues.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7ZJPAUJVUXJNI2SPAXK54CL3FGR22SCW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-27 Thread Erik Demaine

On Tue, 26 Oct 2021, Christopher Barker wrote:


It's not actually documented that None indicates "use the default".
Which, it turns out is because it doesn't :-)
In [24]: bisect.bisect([1,3,4,6,8,9], 5, hi=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 bisect.bisect([1,3,4,6,8,9], 5, hi=None)

TypeError: 'NoneType' object cannot be interpreted as an integer

I guess that's because in C there is a way to define optional arguments other
than using a sentinel? or it's using an undocumented sentinel?

Note: that's Python 3.8 -- I can't imagine anything's changed, but ...


It seems to have changed.  I can reproduce the error in CPython 3.8, but the
same code works in CPython 3.9 and 3.10 (all using the C version of the
module, though there's also a Python version of the module that probably
always supported hi=None).  I think it's the result of this commit:


https://github.com/python/cpython/commit/3a855b26aed02abf87fc1163ad0d564dc3da1ea3#diff-02d3dd896d6d030e5c6c3e0961f9a4760a37b50bb05a2d89e4ab627a8f1a7b9f

On the plus side, this probably means that there aren't many people using the 
hi=None API. :-)  So it might be safe to change to a late-bound default.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PXBXUWQPX4ZGOVGMPCV2ITNAPG5KEUTW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-26 Thread Erik Demaine

On Tue, 26 Oct 2021, Ricky Teachey wrote:


At bottom I guess I'd describe the problem this way: with most APIs, there is
a way to PASS SOMETHING that says "give me the default". With this proposed
API, we don't have that; the only way to say "give me the default" is to NOT
pass something.

I don't KNOW if that's a problem, it just feels like one.


I agree that it's annoying, but it's actually an existing problem with 
early-bound defaults too.  Consider:


```
def f(eggs = [], spam = {}): ...
```

There isn't an easy way to get the defaults for the arguments, because they're 
not just *any* `[]` or `{}`, they're a specific list and dict.  So if you want 
to specify a value for the second argument but not the first, you'd need to do 
one of the following:


```
f(spam = {'more'})

f(f.__defaults__[0], {'more'})
```

The former would work just as well with PEP 671.

The latter depends on introspection, which we're still working out. 
Unfortunately, even if we can get access to the code that produces the 
default, we won't be able to actually call it, because it needs to be called 
from the function's scope.  For example, consider:


```
def g(eggs := [], spam := {}): ...
```

In this simple case, there are no dependencies, so we could do something like 
this:


```
g(g.__defaults__[0](), {'more'})
```

But in general we won't be able to make this call, because we don't have the 
scope until `g` gets called and its scope created...


So there is a bit of functionality loss with PEP 671, though I'm not sure it's 
that big a deal.



I wonder if it would make sense to offer a "missing argument" object (builtin? 
attribute of inspect.Parameter? attribute of types.FunctionType?) that 
actually simulates the behavior of that argument not being passed.  Let me 
call it `_missing` for now.  This would actually make it far easier to 
accomplish "pass in the second argument but not the first", both with early- 
and late-binding defaults:


```
f(_missing, {'more'})
g(_missing, {'more'})
```

I started thinking about `_missing` when thinking about how to implement 
late-binding defaults.  It's at least one way to do it (then the function 
itself could even do the argument checks), though perhaps there are simpler 
ways that avoid the ref count increments.
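
To make this concrete, here is a rough sketch of how `_missing` could be
emulated today with a wrapper (the real thing would presumably live in the
interpreter; `allow_missing` is a made-up name):

```
import inspect

_missing = object()  # hypothetical "argument not passed" marker

def allow_missing(func):
    '''Drop _missing arguments so func sees them as not passed at all.'''
    sig = inspect.signature(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.arguments = {name: value
                           for name, value in bound.arguments.items()
                           if value is not _missing}
        return func(*bound.args, **bound.kwargs)
    return wrapper

@allow_missing
def f(eggs=[], spam={}):
    return eggs, spam

assert f(_missing, {'more'}) == ([], {'more'})
```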


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/3DLKREVEG62RHDHY4KP2R6IX2PPA633F/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Parameter tuple unpacking in the age of positional-only arguments

2021-10-26 Thread Erik Demaine

On Tue, 26 Oct 2021, Eric V. Smith wrote:

You may or may not recall that a big reason for the removal of "tuple 
parameter unpacking" in PEP 3113 was that they couldn't be supported by the 
inspect module. Quoting that PEP: "Python has very powerful introspection 
capabilities. These extend to function signatures. There are no hidden 
details as to what a function's call signature is."


(Aside: I loved tuple parameter unpacking, and used it all the time! I was 
sad to see them go, but I agreed with PEP 3113.)


Having recently heard a friend say "the removal of tuple parameter unpacking 
was one thing that Python 3 got wrong", I read this and PEP 3113 with 
interest.


It seems like another approach would be to treat tuple-unpacking parameters as
positional-only, now that this is a thing, or perhaps require that they be
explicitly marked positional-only via PEP 570 syntax:


```
def move((x, y), /): ...  # could be valid?
def move((x, y)): ...     # could remain invalid?
```
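
For reference, here's the removed Python 2 feature next to the usual Python 3
workaround (manually unpacking a single positional argument):

```
# Python 2 (removed by PEP 3113):
#   def move((x, y)): ...

# Python 3 workaround:
def move(xy, /):
    x, y = xy
    ...
```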

Is it worth revisiting parameter tuple-unpacking in the age of positional-only 
arguments?  Or is this still a no-go from the perspective of introspection, 
because it violates "There are no hidden details as to what a function's call 
signature is."?  (This may be a very short-lived thread.)


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4DEUPSGXRJMB4TWGVLEZUEOCZUX3TNMS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-26 Thread Erik Demaine

On Tue, 26 Oct 2021, Steven D'Aprano wrote:


   def func(x=x, y=>x)  # or func(x=x, @y=x)


This makes me think of a "real" use-case for assigning all early-bound 
defaults before late-bound defaults: consider using closure hacks (my main use 
of early-bound defaults) together with late-bound defaults, as in


```
for i in range(n):
    def func(arg := expensive(i), i = i):
        ...
```

I think it's pretty common to put closure hacks at the end, so they don't get 
in the way of the caller.  (The intent is that the caller never specifies 
those arguments.)  But then it'd be nice to be able to use those variables in 
the late-bound defaults.


I can't say this is beautiful code, but it is an application and would 
probably be convenient.


On Tue, 26 Oct 2021, Eric V. Smith wrote:


Among my objections to this proposal is introspection: how would that work?
The PEP mentions that the text of the expression would be available for
introspection, but that doesn't seem very useful.


I think what would make sense is for code objects to be visible, in the same 
way as `func.__code__`.  But it's definitely worth fleshing out whether:


1. Late-bound defaults are in `func.__defaults__` and `func.__kwdefaults__` --
where code objects are treated as a special kind of default value.  This seems
problematic because we can't distinguish between a late-bound default and an
early-bound default that is a code object.


or

2. There are new attributes like `func.__late_defaults__` and
`func.__late_kwdefaults__`.  The issue here is that it's not clear in what
order to mix `func.__defaults__` and `func.__late_defaults__` (each a tuple).


Perhaps most natural is to add a new introspection object, say LateDefault,
that can appear as a default value (but can't be used as an early-bound
default?), and has a __code__ attribute.
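
Roughly, something with this hypothetical shape:

```
class LateDefault:
    '''Hypothetical introspection object for one late-bound default.'''
    __slots__ = ('__code__',)
    def __init__(self, code):
        self.__code__ = code
    def __repr__(self):
        return f'<late default {self.__code__.co_name}>'
```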


---

By the way, another thing missing from the PEP: presumably lambda expressions 
can also have late-bound defaults?



On Tue, 26 Oct 2021, Marc-Andre Lemburg wrote:


Now, it may not be obvious, but the key advantage of such
deferred objects is that you can pass them around, i.e. the
"defer os.listdir(DEFAULT_DIR)" could also be passed in via
another function.


Are deferred code pieces dynamically scoped, i.e., are they evaluated in
whatever scope they end up getting evaluated in?  That would certainly be
interesting, but also kind of dangerous (about as dangerous as eval), and I
imagine fairly prone to error if they get passed around a lot.  If they're
*not* dynamically scoped, then I think they're equivalent to lambda, and then
they don't solve the default parameter problem, because they'll be evaluated
in the function's enclosing scope instead of the function's scope.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UPC3AX7ESRJ57IJS4DWEV4MS3N4SIISO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Unpacking in tuple/list/set/dict comprehensions

2021-10-25 Thread Erik Demaine

On Sat, 16 Oct 2021, Erik Demaine wrote:

Assuming the support remains relatively unanimous for [*...], {*...}, and 
{**...} (thanks for all the quick replies!), I'll put together a PEP.


As promised, I put together a pre-PEP (together with my friend and coteacher
Adam Hartz, not currently subscribed, but I'll keep him apprised):


https://github.com/edemaine/peps/blob/unpacking-comprehensions/pep-.rst

For this to become an actual PEP, it needs a sponsor.  If a core developer 
would be willing to be the sponsor for this, please let me know.  (This is my 
first PEP, so if I'm going about this the wrong way, also let me know.)


Meanwhile, I'd welcome any comments!  In writing things up, I became convinced 
that generators should be supported, but arguments should not be supported; 
see the document for details why.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/L6NZLEWOXM2KTGOIX7AHP5L76TLNKDPW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-25 Thread Erik Demaine
g `spam` from being defined in the function's scope, it seems more 
reasonable for your example to work, just like the following should:


```
spam = 5
def f(x := spam):
    print(x, spam)  # 5 5
f()
```


Here's another example where it matters whether the default expressions are 
computed within their own scope:


```
def f(x := (y := 5)):
    print(x)  # 5
    print(y)  # 5???
f()
```

I feel like we don't want to allow accessing `y` in the body of `f` here, 
because whether `y` is bound depends on whether `x` was passed.  (If `x` is 
passed, `y` won't get assigned.)  This would suggest evaluating default 
expressions in their own scope would be beneficial.  Intuitively, the parens 
are indicating a separate scope, in the same way that `(x for x in it)` 
creates its own scope and thus doesn't leak `x`.  On the other hand,
`((y := x) for x in it)` does seem to leak `y`, so I'm not really sure what
would be best / most consistent here.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/EWYHQLZOXLYH5DCZJIW3KQSSO3BV37TD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-25 Thread Erik Demaine

On Mon, 25 Oct 2021, Chris Angelico wrote:


On Mon, Oct 25, 2021 at 6:13 PM Steven D'Aprano  wrote:


The rules for applying parameter defaults are well-defined. I would have
to look it up to be sure...


And that right there is all the evidence I need. If you, an
experienced Python programmer, can be unsure, then there's a strong
indication that novice programmers will have far more trouble. Why
permit bad code at the price of hard-to-explain complexity?


I'm not sure how this helps; the rules are already a bit complicated. 
Steven's proposed rules are a natural way to extend the existing rules; I 
don't see the new rules as (much) more complicated.



Offer me a real use-case where this would matter. So far, we had
better use-cases for arbitrary assignment expression targets than for
back-to-front argument default references, and those were excluded.


I can think of a few examples, though they are a bit artificial:

```
def search_listdir(path = None, files := os.listdir(path),
                   start = 0, end = len(files)):
    '''specify path or files'''

# variation of the LocaleTextCalendar from stdlib (in a message of Steven's)
class Calendar:
    default_firstweekday = 0
    def __init__(self, firstweekday := Calendar.default_firstweekday,
                 locale := find_default_locale(),
                 firstweekdayname := locale.lookup_day_name(firstweekday)):
        ...

Calendar.default_firstweekday = 1
```

But I think the main advantage of the left-to-right semantics is simplicity 
and predictability.  I don't think the following functions should evaluate 
the default values in different orders.


```
def f(a := side_effect1(), b := side_effect2()): ...
def g(a := side_effect1(), b := side_effect2() + a): ...
def h(a := side_effect1() + b, b := side_effect2()): ...
```

I expect left-to-right semantics of the side effects (so function h will 
probably raise an error), just like I get from the corresponding tuple 
expressions:


```
(a := side_effect1(), b := side_effect2())
(a := side_effect1(), b := side_effect2() + a)
(a := side_effect1() + b, b := side_effect2())
```

As Jonathan Fine mentioned, if you defined the order to be a linearization of
the partial order on arguments, (a) this would be complicated and (b) it would
be ambiguous.  I think, if you're going to forbid `def f(a := b, b := a)` at
the compiler level, you would need to forbid late-bound default expressions
from referring to late-bound arguments defined to their right (at least).  But
I don't see a reason to forbid this.  It's rare that order would matter, and
if it did, a quick experiment or learning "left to right" is really easy.


The tuple expression equivalence leads me to think that `:=` is decent 
notation.  As a result, I would expect:


```
def f(a := expr1, b := expr2, c := expr3): pass
```

to behave the same as:

```
_no_a = object()
_no_b = object()
_no_c = object()
def f(a = _no_a, b = _no_b, c = _no_c):
    (a := expr1 if a is _no_a else a,
     b := expr2 if b is _no_b else b,
     c := expr3 if c is _no_c else c)
```

Given that `=` assignments within a function's parameter spec already only
mean "assign when another value isn't specified", this is pretty similar.


On Mon, 25 Oct 2021, Chris Angelico wrote:


On Sun, 24 Oct 2021, Erik Demaine wrote:

> I think the semantics are easy to specify: the argument defaults get 
> evaluated for unspecified ARGUMENT(s), in left to right order as specified 
> in the def. Those may trigger exceptions as usual.


Ah, but is it ALL argument defaults, or only those that are
late-evaluated? Either way, it's going to be inconsistent with itself
and harder to explain. That's what led me to change my mind.


I admit I missed this subtlety, though again I don't think it would often make 
a difference.  But working out subtleties is what PEPs and discussion are for. 
:-)


I'd be inclined to assign the early-bound argument defaults before the
late-bound arguments, because their values are already known (they're stored
right on the function object), so they can't cause side effects, and it could
offer slight incremental benefits, like being able to write the following
(again, somewhat artificial):


```
def manipulate(top_list):
    def recurse(start=0, end := len(rec_list), rec_list=top_list): ...
```

But I don't feel strongly either way about either interpretation.

Mixing both types of default arguments breaks the analogy to tuple expressions 
above, alas.  The corresponding tuple expression with `=` is just invalid.


Personally, I'd expect to use late-bound defaults almost all or all the time; 
they behave more how I expect and how I usually need them (I use a fair amount 
of `[]` and `{}` and `set()` as default values).  The only context I'd 
use/want the current default behavior is to hack closures, as in:


```
for thing in things:
    thing.callback = lambda thing=thing: print(thing.name)
```

I believe the general preference for late-bound

[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-24 Thread Erik Demaine

On Sun, 24 Oct 2021, Erik Demaine wrote:

I think the semantics are easy to specify: the argument defaults get 
evaluated for unspecified order, in left to right order as specified in the 
def.  Those may trigger exceptions as usual.


Sorry, that should be:

I think the semantics are easy to specify: the argument defaults get evaluated 
for unspecified ARGUMENT(s), in left to right order as specified in the def. 
Those may trigger exceptions as usual.


___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/SNAYBJR52DHO3U76RLXZEC7HQFJLKVEX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-24 Thread Erik Demaine

On Mon, 25 Oct 2021, Chris Angelico wrote:


On Mon, Oct 25, 2021 at 3:47 AM Chris Angelico  wrote:


On Mon, Oct 25, 2021 at 3:43 AM Jonathan Fine  wrote:


Please forgive me if it's not already been considered. Is the following valid 
syntax, and if so what's the semantics? Here it is:

def puzzle(*, a=>b+1, b=>a+1):
    return a, b


There are two possibilities: either it's a SyntaxError, or it's a
run-time UnboundLocalError if you omit both of them (in which case it
would be perfectly legal and sensible if you specify one of them).

I'm currently inclined towards SyntaxError, since permitting it would
open up some hard-to-track-down bugs, but am open to suggestions about
how it would be of value to permit this.


In fact, on subsequent consideration, I'm inclining more strongly
towards SyntaxError, due to the difficulty of explaining the actual
semantics. Changing the PEP accordingly.


I think the semantics are easy to specify: the argument defaults get evaluated 
for unspecified order, in left to right order as specified in the def.  Those 
may trigger exceptions as usual.


Erik
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HYXNABI2ACLVCLQH5TNRDX6WWSHNOING/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: PEP 671: Syntax for late-bound function argument defaults

2021-10-24 Thread Erik Demaine

On Sun, 24 Oct 2021, Chris Angelico wrote:


Is anyone interested in coauthoring this with me? Anyone who has
strong interest in seeing this happen - whether you've been around the
Python lists for years, or you're new and interested in getting
involved for the first time, or anywhere in between!


I have a strong interest in seeing this happen, and would be happy to help how 
I can.  Teaching (and using) the behavior of Python argument initializers is 
definitely a thorn in my side. :-)  I'd love to be able to easily initialize
an empty list/set/dict.


For what it's worth, here are my thoughts on some of the syntaxes proposed so 
far:


* I don't like `def f(arg => default)` exactly because it looks like a lambda, 
and so I imagine arg is an argument to that lambda, but the intended meaning 
has nothing to do with that.  I understand lambdas give delegation, but in my 
mind that should look more like `def f(arg = => default)` or `def f(arg = () 
=> default)` -- except these will have a different meaning (arg's default is a 
function, and they would be evaluated in parent scope not the function's 
scope) once `=>` is short-hand for lambda.


* I find `def f(arg := default)` reasonable.  I was actually thinking about 
this very issue before the thread started, and this was the syntax that came 
to mind.  The main plus for this is that it uses an existing operator (so 
fewer to learn) and it is "another kind of assignment".  The main minus is 
that it doesn't really have much to do with the walrus operator; we're not 
using the assigned value inline like `arg := default` would mean outside 
`def`.  Then again, `def f(arg = default)` is quite different from `arg = 
default` outside `def`.


* I find `def f(arg ?= default)` (or `def f(arg ??= default)`) reasonable,
exactly because it is similar to None-aware operators (PEP 0505, which is
currently/recently under discussion on python-dev).  The main complaint about
PEP 0505 in those discussions is that it's very None-specific, which feels
biased.  But the meaning of "omitted value" is extremely clear in a def.  If
both this were added and PEP 0505 were accepted, `def f(arg ?= default)` would
be roughly equivalent to:


```
def f(arg = None):
    arg ??= default
```

except that `def f(arg ?= default)` wouldn't trigger the default in the case
of `f(None)`, whereas the above code would.  I find this an acceptable
difference.  (FWIW, I'm also in favor of 0505.)
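
A minimal illustration of that difference:

```
f(None)  # with arg ?= default: arg stays None (None was explicitly passed)
         # with arg = None plus arg ??= default: arg becomes default
```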


* I also find `def f(@arg = default)` reasonable, though it feels a little 
inconsistent with decorators.  I expect a decorator expression after @, not 
an argument, more like `def f(@later arg = default)`.


* I'm not very familiar with thunks, but they seem a bit too magical for my 
liking.  Evaluating argument defaults only sometimes (when they get read in 
the body) feels a bit unpredictable.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DBDNNYYOVVZ5MITYXC5Q3SC5U2P3ASUS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Unpacking in tuple/list/set/dict comprehensions

2021-10-16 Thread Erik Demaine

On Sun, 17 Oct 2021, Steven D'Aprano wrote:


On Sat, Oct 16, 2021 at 11:42:49AM -0400, Erik Demaine wrote:


I guess the question is whether to define `(*it for it in its)` to mean
tuple or generator comprehension or nothing at all.


I don't see why that is even a question. We don't have tuple
comprehensions and `(expr for x in items)` is always a generator, never
a tuple. There's no ambiguity there. Why would allowing unpacking turn
it into a tuple?


Agreed.  I got confused by the symmetry.


The only tricky corner case is that generator comprehensions can forgo
the surrounding brackets in the case of a function call:

   func( (expr for x in items) )
   func( expr for x in items )  # we can leave out the brackets

But with the unpacking operator, it is unclear whether the unpacking
star applies to the entire generator or the inner expression:

   func(*expr for x in items)

That could be read as either:

   it = (expr for x in items)
   func(*it)

or this:

   it = (*expr for x in items)
   func(it)

Of course we can disambiguate it with precedence rules, [...]


I'd be inclined to go that way, as the latter seems like the only reasonable 
(to me) parse for that syntax.  Indeed, that's how the current parser 
interprets this:


```
func(*expr for x in items)
 ^
SyntaxError: iterable unpacking cannot be used in comprehension
```

To get the former meaning, which is possible today, you already need 
parentheses, as in



   func(*(expr for x in items))




But it would be quite surprising for this minor issue to lead to the
major inconsistency of prohibiting unpacking inside generator comps when
it is allowed in list, dict and set comps.


Good point.  Now I'm much more inclined to define the generator expression 
`(*expr for x in items)`.  Thanks for your input!



On Sat, 16 Oct 2021, Serhiy Storchaka wrote:


It was considered and rejected in PEP 448. What was changed since? What
new facts or arguments have emerged?


I need to read the original discussion more (e.g. 
https://mail.python.org/pipermail/python-dev/2015-February/138564.html), but 
you can see the summary of why it was removed here: 
https://www.python.org/dev/peps/pep-0448/#variations


In particular, there was "limited support" before (and the generator ambiguity
issue discussed above).  I expect now that we've gotten to enjoy PEP 448 for 5
years, it's more "obvious" that this functionality is missing and useful.  So
far that seems true (all responses have been at least +0), but if anyone
disagrees, please say so.


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/DGPZMQXAZG55J4HLACIXMBZFCTEM6FPG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Unpacking in tuple/list/set/dict comprehensions

2021-10-16 Thread Erik Demaine

On Sat, 16 Oct 2021, David Mertz, Ph.D. wrote:


On Sat, Oct 16, 2021, 10:10 AM Erik Demaine wrote:

  (*it1, *it2, *it3)  # tuple with the concatenation of three iterables
  [*it1, *it2, *it3]  # list with the concatenation of three iterables
  {*it1, *it2, *it3}  # set with the union of three iterables
  {**dict1, **dict2, **dict3}  # dict with the combination of three dicts

I'm +0 on the last three of these. 


But the first one is much more suggestive of a generator comprehension. I
would want/expect it to be equivalent to itertools.chain(), not create a
tuple.


I guess you were referring to `(*it for it in its)` (proposed notation) rather 
than `(*it1, *it2, *it3)` (which already exists and builds a tuple).


Very good point!  This is confusing.  I could also read `(*it for it in its)` 
as wanting to build the following generator (or something like it):


```
def generate():
    for it in its:
        yield from it
```
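
For reference, that generator already has a standard-library spelling:

```
import itertools

flat = itertools.chain.from_iterable(its)  # same behavior as generate() above
```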

I guess the question is whether to define `(*it for it in its)` to mean tuple 
or generator comprehension or nothing at all.  Tuples are nice because they 
mirror `(*it1, *it2, *it3)` but bad for the reasons you raise:



Moreover, it is an anti-pattern to create large and indefinite sized tuples,
whereas such large collections as lists, sets, and dicts are common and
useful.


I'd be inclined to not define `(*it for it in its)`, given the ambiguity.

Assuming the support remains relatively unanimous for [*...], {*...}, and 
{**...} (thanks for all the quick replies!), I'll put together a PEP.


On Sat, 16 Oct 2021, Guido van Rossum wrote:


Seems sensible to me. I’d write the equivalency as

for x in y: answer.extend([…x…])


Oh, nice!  That indeed works in all cases.

Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2AZBMZGKL56PERIJRCPTIJ6BRITTWHGM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Unpacking in tuple/list/set/dict comprehensions

2021-10-16 Thread Erik Demaine
Extended unpacking notation (* and **) from PEP 448 gives us great ways to 
concatenate a few iterables or dicts:


```
(*it1, *it2, *it3)  # tuple with the concatenation of three iterables
[*it1, *it2, *it3]  # list with the concatenation of three iterables
{*it1, *it2, *it3}  # set with the union of three iterables
{**dict1, **dict2, **dict3}  # dict with the combination of three dicts
# roughly equivalent to dict1 | dict2 | dict3 thanks to PEP 584
```

I propose (not for the first time) that similarly concatenating an unknown 
number of iterables or dicts should be possible via comprehensions:


```
(*it for it in its)  # tuple with the concatenation of iterables in 'its'
[*it for it in its]  # list with the concatenation of iterables in 'its'
{*it for it in its}  # set with the union of iterables in 'its'
{**d for d in dicts} # dict with the combination of dicts in 'dicts'
```

The above is my attempt to argue that the proposed notation is natural:
`[*it for it in its]` is exactly analogous to `[*its[0], *its[1], ..., 
*its[len(its)-1]]`.


There are other ways to do this, of course:

```
[x for it in its for x in it]
itertools.chain(*its)
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])
```

But none are as concise and (to me, and hopefully others who understand * 
notation) as intuitive.  For example, I recently wanted to write a recursion 
like so, which accumulated a set of results from within a tree structure:


```
def recurse(node):
  # base case omitted
  return {*recurse(child) for child in node.children}
```

In fact, I am teaching a class and just asked a question on a written exam for 
which several students wrote this exact code in their solution (which inspired 
writing this message).  So I do think it's quite intuitive, even to those 
relatively new to Python.


Now, on to previous proposals.  I found this thread from 2016 (!); please let 
me know if there are others.


https://mail.python.org/archives/list/python-ideas@python.org/thread/SBM3LYESPJMI3FMTMP3VQ6JKKRDHYP7A/#DE4PCVNXBQJIGFBYRB2X7JUFZT75KYFR

There are several arguments for and against this feature in that thread.  I'll 
try to summarize:


Arguments for:

* Natural extension to PEP 448 (it's mentioned as a variant within PEP 448)

* Easy to implement: all that's needed in CPython is to *remove* some code 
blocking this.


Arguments against:

* Counterintuitive (to some)

* Hard to teach

* `[...x... for x in y]` is no longer morally equivalent to
`answer = []; for x in y: answer.append(...x...)`
(unless `list1.append(a, b)` were equivalent to `list1.extend([a, b])`)

Above I've tried to counter the first two "against" arguments.
Some counters to the third "against" argument:

1. `[*...x... for x in y]` is equivalent to
`answer = []; for x in y: answer.extend(...x...)`
(about as easy to teach, I'd say; see the demonstration after this list)

2. Maybe `list1.append(a, b)` should be equivalent to `list1.extend([a, b])`?
It is in JavaScript (`Array.push`).  And I don't see why one would expect
it to append a tuple `(a, b)`; that's what `list1.append((a, b))` is for.
I think the main argument against this is to avoid programming errors,
which is fine, but I don't see why it should hold back an advanced feature
involving both unpacking and comprehensions.
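
A quick demonstration of equivalence 1:

```
its = [[1, 2], [3], [4, 5]]
answer = []
for it in its:
    answer.extend(it)
assert answer == [1, 2, 3, 4, 5]
# the proposed [*it for it in its] would build the same list
```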

Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7G732VMDWCRMWM4PKRG6ZMUKH7SUC7SH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Accessing target name at runtime

2021-10-16 Thread Erik Demaine

On Sat, 16 Oct 2021, Steven D'Aprano wrote:


The token should preferably be:

* self-explanatory, not line-noise;

* shorter rather than longer, otherwise it is easier to just
 type the target name as a string: 'x' is easier to type than
 NAME_OF_ASSIGNMENT_TARGET;

* backwards compatible, which means it can't be anything that
 is already a legal name or expression;

* doesn't look like an error or typo.


A possible soft keyword: __lhs__ (short for 'left-hand side'):


REGION = os.getenv(__lhs__)
db_url = config[REGION][__lhs__]


It's not especially short, and it's not backward-compatible,
but at least there's a history of adding double-underscore things.
Perhaps, for backward compatibility, the feature could be disabled in any 
scope (or file?) where __lhs__ is assigned, in which case it's treated like a 
variable as usual.  The magic version only applies when it's used in a 
read-only fashion.  It's kind of like a builtin variable, but its value 
changes on every line (and it's valid only in an assignment line).


One thing I wonder: what happens if you write the following?


foo[1] = __lhs__  # or <<< or whatever


Maybe you get 'foo[1]', or maybe this is invalid syntax, in the same way that 
the following is.



def foo[1]: pass



Classes, functions, decorators and imports already satisfy the "low
hanging fruit" for this functionality. My estimate is that well over 99%
of the use-cases for this fall into just four examples, which are
already satisfied by the interpreter:
[...]
# like func = decorator(func)
# similarly for classes
@decorator
def func(): ...


This did get me wondering about how you could simulate this feature with 
decorators.  Probably obvious, but here's the best version I came up with:


```
import os

def env_var(x):
    return os.getenv(x.__name__)

@env_var
def REGION(): pass
```

It's definitely ugly to avoid repetition...  Using a class, I guess we could 
at least get several such variables at once.
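
Here's a sketch of that class-based variant (my own illustration, using class
annotations to supply the names; `Env` is a made-up helper):

```
import os

class Env:
    '''Subclasses get each annotated name looked up in the environment.'''
    def __init_subclass__(cls):
        for name in cls.__annotations__:
            setattr(cls, name, os.getenv(name))

class Config(Env):
    REGION: str
    DB_URL: str

# Config.REGION == os.getenv('REGION'), etc.
```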



If we didn't already have interpreter support for these four cases, it
would definitely be worth coming up with a solution. But the use-cases
that remain are, I think, quite niche and uncommon.


To me (a mathematician), the existence of this magic in def, class, import, 
etc. is a sign that this is indeed useful functionality.  As a fan of 
first-class language features, it definitely makes me wonder whether it could 
be generalized.


But I'm not sure what the best mechanism is.  (From the description in the 
original post, I gather that variable assignment decorators didn't work out 
well.)  I wonder about some generalized mechanism for automatically setting 
the __name__ of an assigned object (like def and class), but I'm not sure what 
it would look like...


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BHGDRTX3BBYB66NINSTOPROTCIRKZNRU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: dict_items.__getitem__?

2021-10-11 Thread Erik Demaine
There seems to be a growing list of issues with adding `itertools.first(x)` as 
shorthand for `next(iter(x))`:


* If `x` is an iterator, it modifies the iterator, which is counterintuitive 
from the name `first`.


* It'll still be difficult for new users to find/figure out.

In the end, I feel like the main case where I want `first` and `last`
functions is on `dict`s; other objects like `range`, `str`, `list`, `tuple`
all support `[0]` and `[-1]`.


So I wonder whether we should go back to this idea:

On Tue, 5 Oct 2021, Alex Waygood wrote:


[...] Another possibility I've been wondering about was
whether several methods should be added to the dict interface:
 *  dict.first_key = lambda self: next(iter(self))
 *  dict.first_val = lambda self: next(iter(self.values()))
 *  dict.first_item = lambda self: next(iter(self.items()))
 *  dict.last_key = lambda self: next(reversed(self))
 *  dict.last_val = lambda self: next(reversed(self.values()))
 *  dict.last_item = lambda self: next(reversed(self.items()))
But I think I like a lot more the idea of adding general ways of doing these
things to itertools.


At the least, I wonder whether a `dict.lastitem` method that's the 
nondestructive analog of `dict.popitem` would be good to add.  This would 
solve the case of "I want an arbitrary item from this dict, I don't care which 
one, but I don't want to modify the dict so I'd rather not use popitem" which 
I've seen repeated a few times in this thread.


By contrast, I don't think `next(iter(my_dict))` is an intuitive way to solve 
this problem, even for many experts; and I don't think it's as efficient as 
`my_dict.lastitem()` would be, because the current `dict` code maintains a 
pointer to the last item but not to the first item.


[I also admit that I've mostly forgotten the original situation where I wanted 
this functionality.  I believe it was an exhaustive search, where I wanted to 
branch on an arbitrary item of a dict, and nondestructively build new versions 
of that dict for recursive calls (instead of modifying before recursion and 
unmodifying afterward).]



One more idea to throw around: Consider the following "anonymous unpacking" 
syntax.


```
first, * = [1, 2, 3]
*, last = [1, 2, 3]
```

For someone used to unpacking syntax, this seems like a natural extension to 
what we have now, and is far more flexible than just extracting the first 
element.  The distinction from the existing methods (with e.g. `*_`) is that 
it wouldn't waste time extracting elements you don't want.  And it could work 
well with things like `dict` (and `dict_items` etc.).
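
For comparison, what we can write today, which materializes the unwanted rest:

```
first, *rest = [1, 2, 3]  # builds rest == [2, 3] even if unused
*rest, last = [1, 2, 3]   # builds rest == [1, 2]

d = {'a': 1, 'b': 2}
first_key, *_ = d         # works, at the cost of listing all the keys
```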


Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/IQ2EJM5BTDEO4URUHN3XGR6XSXX22HFR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] dict_items.__getitem__?

2021-10-04 Thread Erik Demaine
Have folks thought about allowing indexing dictionary views as in the 
following code, where d is a dict object?


d.keys()[0]
d.keys()[-1]
d.values()[0]
d.values()[-1]
d.items()[0]
d.items()[-1]  # item that would be returned by d.popitem()

I could see value to the last form in particular: you might want to inspect 
the last item of a dictionary before possibly popping it.


I've also often wanted to get an arbitrary item/key from a dictionary, and 
d.items()[0] seems natural for this.  Of course, the universal way to get the 
first item from an iterable x is


item = next(iter(x))

I can't say this is particularly readable, but it is functional and fast.  Or 
sometimes I use this pattern:


for item in x: break

If you wanted the last item of a dictionary d (the one to be returned from 
d.popitem()), you could write this beautiful code:


last = next(iter(reversed(d.items())))


Given the dictionary order guarantee from Python 3.7, adding indexing 
(__getitem__) to dict views seems natural.  The potential catch is that (I 
think) it would require linear time to access an item in the middle, because 
you need to count the dummy elements.  But accessing [i] and [-i] should be 
doable in O(|i|) time.  (I've wondered about the possibility of doing binary 
or interpolation search, but without some stored index signposts, I don't 
think it's possible.)


Python is also full of operations that take linear time to do: list.insert(0, 
x), list.pop(0), list.index(), etc.  But it may be that __getitem__ takes 
constant time on all built-in data structures, and the apparent symmetry but 
very different performance between dict()[i] and list()[i] might be confusing. 
That said, I really just want d[0] and d[-1], which are exactly the cases
where these are fast.


I found some related discussion in 
https://mail.python.org/archives/list/python-ideas@python.org/thread/QVTGZD6USSC34D4IJG76UPKZRXBBB4MM/

but not this exact idea.

Erik
--
Erik Demaine  |  edema...@mit.edu  |  http://erikdemaine.org/
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/PPI747IBFYYRAVPUJDY4DKFNTJGASH3K/
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-08 Thread Erik Bray
On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson  wrote:
>
> On 08Oct2018 10:56, Ram Rachum  wrote:
> >That's incredibly interesting. I've never used mmap before.
> >However, there's a problem.
> >I did a few experiments with mmap now, this is the latest:
> >
> >path = pathlib.Path(r'P:\huge_file')
> >
> >with path.open('r') as file:
> >mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>
> Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?
>
> >for match in re.finditer(b'.', mmap):
> >pass
> >
> >The file is 338GB in size, and it seems that Python is trying to load it
> >into memory. The process is now taking 4GB RAM and it's growing. I saw the
> >same behavior when searching for a non-existing match.
> >
> >Should I open a Python bug for this?
>
> Probably not. First figure out what is going on. BTW, how much RAM have you
> got?
>
> As you access the mapped file the OS will try to keep it in memory in case you
> need that again. In the absence of competition, most stuff will get paged out
> to accommodate it. That's normal. All the data are "clean" (unmodified) so the
> OS can simply release the older pages instantly if something else needs the
> RAM.
>
> However, another possibility is the the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
> can.

Yup. Windows will aggressively fill up your RAM in cases like this
because after all why not?  There's no use to having memory just
sitting around unused.  For read-only, non-anonymous mappings it's not
much problem for the OS to drop pages that haven't been recently
accessed and use them for something else.  So I wouldn't be too
worried about the process chewing up RAM.

I feel like this is veering more into python-list territory for
further discussion though.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] support toml for pyproject support

2018-10-08 Thread Erik Bray
On Mon, Oct 8, 2018 at 12:23 PM Nathaniel Smith  wrote:
>
> On Mon, Oct 8, 2018 at 2:55 AM, Steven D'Aprano  wrote:
> >
> > On Mon, Oct 08, 2018 at 09:10:40AM +0200, Jimmy Girardet wrote:
> >> Each tool which wants to use pyproject.toml has to add a toml lib  as a
> >> conditional or hard dependency.
> >>
> >> Since toml is now the standard configuration file format,
> >
> > It is? Did I miss the memo? Because I've never even heard of TOML before
> > this very moment.
>
> He's referring to PEPs 518 and 517 [1], which indeed standardize on
> TOML as a file format for Python package build metadata.
>
> I think moving anything into the stdlib would be premature though –
> TOML libraries are under active development, and the general trend in
> the packaging space has been to move things *out* of the stdlib (e.g.
> there's repeated rumblings about moving distutils out), because the
> stdlib release cycle doesn't work well for packaging infrastructure.

If I had the energy to argue it I would also argue against using TOML
in those PEPs.  I personally don't especially care for TOML and what's
"obvious" to Tom is not at all obvious to me.  I'd rather just stick
with YAML or perhaps something even simpler than either one.


Re: [Python-ideas] Asynchronous exception handling around with/try statement borders

2018-09-24 Thread Erik Bray
On Fri, Sep 21, 2018 at 12:58 AM Chris Angelico  wrote:
>
> On Fri, Sep 21, 2018 at 8:52 AM Kyle Lahnakoski  
> wrote:
> > Since the java.lang.Thread.stop() "debacle", it has been obvious that
> > stopping code to run other code has been dangerous.  KeyboardInterrupt
> > (any interrupt really) is dangerous. Now, we can probably code a
> > solution, but how about we remove the danger:
> >
> > I suggest we remove interrupts from Python, and make them act more like
> > java.lang.Thread.interrupt(); setting a thread local bit to indicate an
> > interrupt has occurred.  Then we can write explicit code to check for
> > that bit, and raise an exception in a safe place if we wish.  This can
> > be done with Python code, or convenient places in Python's C source
> > itself.  I imagine it would be easier to whitelist where interrupts can
> > raise exceptions, rather than blacklisting where they should not.
>
> The time machine strikes again!
>
> https://docs.python.org/3/c-api/exceptions.html#signal-handling

Although my original post did not explicitly mention
PyErr_CheckSignals() and friends, it had already taken that into
account and it is not a silver bullet, at least w.r.t. the exact issue
I raised, which had to do with the behavior of context managers versus
the

setup()
try:
    do_thing()
finally:
    cleanup()

pattern, and the question of how signals are handled between Python
interpreter opcodes.  There is a still-open bug on the issue tracker
discussing the exact issue in greater detail:
https://bugs.python.org/issue29988


Re: [Python-ideas] Move optional data out of pyc files

2018-04-11 Thread Erik Bray
On Tue, Apr 10, 2018 at 9:50 PM, Eric V. Smith  wrote:
>
>>> 3. Annotations. They are used mainly by third party tools that
>>> statically analyze sources. They are rarely used at runtime.
>>
>> Even less used than docstrings probably.
>
> typing.NamedTuple and dataclasses use annotations at runtime.

Astropy uses annotations at runtime for optional unit checking on
arguments that take dimensionful quantities:
http://docs.astropy.org/en/stable/api/astropy.units.quantity_input.html#astropy.units.quantity_input
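
For a concrete sense of what that looks like, a sketch based on the
linked documentation (the function itself is invented, and details vary
by astropy version):

import math
from astropy import units as u

@u.quantity_input
def pendulum_period(length: u.m, g: u.m / u.s**2):
    # Passing, say, length=2*u.s raises a UnitsError at the call
    # boundary instead of producing nonsense further down.
    return 2 * math.pi * (length / g) ** 0.5

print(pendulum_period(1 * u.m, 9.81 * u.m / u.s**2))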


Re: [Python-ideas] PEP proposal: unifying function/method classes

2018-03-28 Thread Erik Bray
On Fri, Mar 23, 2018 at 11:25 AM, Antoine Pitrou  wrote:
> On Fri, 23 Mar 2018 07:25:33 +0100
> Jeroen Demeyer  wrote:
>
>> On 2018-03-23 00:36, Antoine Pitrou wrote:
>> > It does make sense, since the proposal sounds ambitious (and perhaps
>> > impossible without breaking compatibility).
>>
>> Well, *some* breakage of backwards compatibility will be unavoidable.
>>
>>
>> My plan (just a plan for now!) is to preserve backwards compatibility in
>> the following ways:
>>
>> * Existing Python attributes of functions/methods should continue to
>> exist and behave the same
>>
>> * The inspect module should give the same results as now (by changing
>> the implementation of some of the functions in inspect to match the new
>> classes)
>>
>> * Everything from the documented Python/C API.
>>
>>
>> This means that I might break compatibility in the following ways:
>>
>> * Changing the classes of functions/methods (this is the whole point of
>> this PEP). So anything involving isinstance() checks might break.
>>
>> * The undocumented parts of the Python/C API, in particular the C structure.
>
> One breaking change would be to add __get__ to C functions.  This means
> e.g. the following:
>
> class MyClass:
>     my_open = open
>
> would make my_open a MyClass method, therefore you would need to spell
> it:
>
> class MyClass:
>     my_open = staticmethod(open)
>
> ... if you wanted MyClass().my_open('some file') to continue to work.
>
> Of course that might be considered a minor annoyance.

I don't really see your point in this example.  For one, why would
anyone do this?  Is this based on a real example?  For another, that's
how any function works.  If you put some arbitrary function in a class body,
and it's not able to accept an instance of that class as its first
argument, then it will always be broken unless you make it a
staticmethod.  I don't see how there should be any difference there if
the function were implemented in Python or in C.

Thanks,
E


[Python-ideas] importlib: making FileFinder easier to extend

2018-02-07 Thread Erik Bray
Hello,

Brief problem statement: Let's say I have a custom file type (say,
with extension .foo) and these .foo files are included in a package
(along with other Python modules with standard extensions like .py and
.so), and I want to make these .foo files importable like any other
module.

On its face, importlib.machinery.FileFinder makes this easy.  I make a
loader for my custom file type (say, FooSourceLoader), and I can use
the FileFinder.path_hook helper like:

sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
sys.path_importer_cache.clear()

Great--now I can import my .foo modules like any other Python module.
However, standard Python modules now cannot be imported.  The way
the PathFinder sys.meta_path hook works, sys.path_hooks entries are
first-come-first-served, and furthermore FileFinder.path_hook is very
promiscuous--it will take over module loading for *any* directory on
sys.path, regardless what the file extensions are in that directory.
So although this mechanism is provided by the stdlib, it can't really
be used for this purpose without breaking imports of normal modules
(and maybe it's not intended for that purpose, but the documentation
is unclear).

There are a number of different ways one could get around this.  One
might be to pass FileFinder.path_hook loaders/extension pairs for all
the basic file types known by the Python interpreter.  Unfortunately
there's no great way to get that information.  *I* know that I want to
support .py, .pyc, .so etc. files, and I know which loaders to use for
them.  But that's really information that should belong to the Python
interpreter, and not something that should be reverse-engineered.  In
fact, there is such a mapping provided by
importlib.machinery._get_supported_file_loaders(), but this is not a
publicly documented function.

One could probably think of other workarounds.  For example you could
implement a custom sys.meta_path hook.  But I think it shouldn't be
necessary to go to higher levels of abstraction in order to do
this--the default sys.path handler should be able to handle this use
case.

In order to support adding support for new file types to
sys.path_hooks, I ended up implementing the following hack:

#
import os
import sys

from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory.  So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook as follows: it should be
    inserted early on sys.path_hooks so that it can supersede anything else.
    Its find_spec method then calls each hook on sys.path_hooks after itself
    and, for each hook that can handle the given sys.path entry, it calls the
    hook to create a finder, and calls that finder's find_spec.  So each
    sys.path_hooks entry is tried until a spec is found or all finders are
    exhausted.
    """

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ImportError('only directories are supported', path=path)

        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        for hook in sys.path_hooks:
            if hook is self.__class__:
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

            try:
                self._finder_cache[hook] = finder
            except TypeError:
                # The hook is unhashable for some reason so we don't bother
                # caching it
                pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if spec is not None:
                    return spec

        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            # Hooks that raised ImportError are cached as None; skip them.
            if finder is not None:
                finder.invalidate_caches()
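
Installing the hack then looks something like this (FooSourceLoader
being the hypothetical loader from the problem statement above):

import sys
from importlib.machinery import FileFinder

# The class itself serves as the sys.path_hooks entry: calling
# MetaFileFinder(path) constructs the finder for a given directory.
sys.path_hooks.insert(0, MetaFileFinder)
# Further FileFinder hooks can now coexist with the default machinery:
sys.path_hooks.append(FileFinder.path_hook((FooSourceLoader, ['.foo'])))
sys.path_importer_cache.clear()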


Re: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__?

2017-12-29 Thread Erik Bray
On Thu, Dec 28, 2017 at 8:42 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
> 28.12.17 12:10, Erik Bray wrote:
>>
>> There's no index() alternative to int().
>
>
> operator.index()

Okay, and it's broken.  That doesn't change my other point that some
functions that could previously take non-int arguments can no
longer--if we agree on that at least then I can set about making a bug
report and fixing it.


Re: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__?

2017-12-28 Thread Erik Bray
On Fri, Dec 8, 2017 at 7:20 PM, Ethan Furman <et...@stoneleaf.us> wrote:
> On 12/08/2017 04:33 AM, Erik Bray wrote:
>
>> More importantly not as many objects that coerce to int actually
>> implement __index__.  They probably *should* but there seems to be
>> some confusion about how that's to be used.
>
>
> __int__ is for coercion (float, fraction, etc)
>
> __index__ is for true integers
>
> Note that if __index__ is defined, __int__ should also be defined, and
> return the same value.
>
> https://docs.python.org/3/reference/datamodel.html#object.__index__

This doesn't appear to be enforced, though I think maybe it should be.

I'll also note that because of the changes I pointed out in my
original post, it's now necessary for me to explicitly cast as int()
objects that previously "just worked" when passed as arguments in some
functions in itertools, collections, and other modules with C
implementations.  However, this is bad because if some broken code is
passing floats to these arguments, they will be quietly cast to int
and succeed, when really I should only be accepting objects that have
__index__.  There's no index() alternative to int().

I think changing all these functions to do the appropriate
PyIndex_Check is a correct and valid fix, but I think it also
stretches beyond the original purpose of __index__.  I think that
__index__ is relatively unknown, and perhaps there should be better
documentation as to when and how it should be used over the
better-known __int__.
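
To make the documented contract concrete, a toy sketch (the class is
invented purely for illustration):

import operator

class NativeInt:
    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Lossless integer conversion: enables operator.index(),
        # sequence indexing, range(), hex(), etc.
        return self._value

    def __int__(self):
        # Per the datamodel link above, this should return the same
        # value whenever __index__ is defined.
        return self._value

assert operator.index(NativeInt(3)) == 3
assert [10, 20, 30][NativeInt(1)] == 20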


Re: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__?

2017-12-08 Thread Erik Bray
On Fri, Dec 8, 2017 at 1:52 PM, Antoine Pitrou  wrote:
> On Fri, 8 Dec 2017 14:30:00 +0200
> Serhiy Storchaka 
> wrote:
>>
>> NumPy integers implement __index__.
>
> That doesn't help if a function calls e.g. PyLong_AsLongAndOverflow().

Right--pointing to __index__ basically implies that there should be
more calls to PyIndex_Check and subsequent PyNumber_AsSsize_t than
there currently are.  That I could agree with, but then it becomes a
question of where those cases are.  And what to do with, e.g.,
interfaces like PyLong_AsLongAndOverflow()--add more PyNumber_
conversion functions?


Re: [Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__?

2017-12-08 Thread Erik Bray
On Fri, Dec 8, 2017 at 12:26 PM, Serhiy Storchaka <storch...@gmail.com> wrote:
> 08.12.17 12:41, Erik Bray wrote:
>>
>> IIUC, it seems to be carry-over from Python 2's PyLong API, but I
>> don't see an obvious reason for it.  In every case there's an explicit
>> PyLong_Check first anyways, so not calling __int__ doesn't help for
>> the common case of exact int objects; adding the fallback costs
>> nothing in that case.
>
>
> There is also a case of int subclasses. It is expected that PyLong_AsLong is
> atomic, and calling __int__ can lead to crashes or similar consequences.
>
>> I ran into this because I was passing an object that implements
>> __int__ to the maxlen argument to deque().  On Python 2 this used
>> PyInt_AsSsize_t which does fall back to calling __int__, whereas
>> PyLong_AsSsize_t does not.
>
>
> PyLong_* functions provide an interface to PyLong objects. If they don't
> return the content of a PyLong object, how can it be retrieved? If you want
> to work with general numbers you should use PyNumber_* functions.

By "you " I assume you meant the generic "you".  I'm not the one who
broke things in this case :)

> In your particular case it is more reasonable to fallback to __index__
> rather than __int__. Unlikely maxlen=4.2 makes sense.

That's true, but in Python 2 that was possible:

>>> deque([], maxlen=4.2)
deque([], maxlen=4)

More importantly not as many objects that coerce to int actually
implement __index__.  They probably *should* but there seems to be
some confusion about how that's to be used.  It was mainly motivated
by slices, but it *could* be used in general cases where it definitely
wouldn't make sense to accept a float (I wonder if maybe the real
problem here is that floats can be coerced automatically to ints).

In other words, there are probably countless other cases throughout the
stdlib where it "doesn't make sense" to accept a float, but that
otherwise should accept objects that can be coerced to int without
having to manually wrap those objects with an int(o) call.

>> Currently the following functions fall back on __int__ where available:
>>
>> PyLong_AsLong
>> PyLong_AsLongAndOverflow
>> PyLong_AsLongLong
>> PyLong_AsLongLongAndOverflow
>> PyLong_AsUnsignedLongMask
>> PyLong_AsUnsignedLongLongMask
>
>
> I think this should be deprecated (and there should be an open issue for
> this). Calling __int__ is just a Python 2 legacy.

Okay, but then there are probably many cases where they should be
replaced with PyNumber_ equivalents or else who knows how much code
would break.


[Python-ideas] Is there a reason some of the PyLong_As* functions don't call an object's __int__?

2017-12-08 Thread Erik Bray
IIUC, it seems to be carry-over from Python 2's PyLong API, but I
don't see an obvious reason for it.  In every case there's an explicit
PyLong_Check first anyways, so not calling __int__ doesn't help for
the common case of exact int objects; adding the fallback costs
nothing in that case.

I ran into this because I was passing an object that implements
__int__ to the maxlen argument to deque().  On Python 2 this used
PyInt_AsSsize_t which does fall back to calling __int__, whereas
PyLong_AsSsize_t does not.
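
A minimal way to reproduce the change (the wrapper class is invented
for illustration):

from collections import deque

class IntLike:
    def __init__(self, n):
        self.n = n

    def __int__(self):
        return self.n

# Python 2 accepted this, via PyInt_AsSsize_t falling back to __int__;
# Python 3 raises TypeError because PyLong_AsSsize_t never calls it.
try:
    deque([], maxlen=IntLike(5))
except TypeError as exc:
    print(exc)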

Currently the following functions fall back on __int__ where available:

PyLong_AsLong
PyLong_AsLongAndOverflow
PyLong_AsLongLong
PyLong_AsLongLongAndOverflow
PyLong_AsUnsignedLongMask
PyLong_AsUnsignedLongLongMask

whereas the following (at least according to the docs--haven't checked
the code in all cases) do not:

PyLong_AsSsize_t
PyLong_AsUnsignedLong
PyLong_AsSize_t
PyLong_AsUnsignedLongLong
PyLong_AsDouble
PyLong_AsVoidPtr

I think this inconsistency should be fixed, unless there's some reason
for it I'm not seeing.

Thanks,
Erik


Re: [Python-ideas] install pip packages from Python prompt

2017-11-04 Thread Erik Bray
On Nov 4, 2017 08:31, "Stephen J. Turnbull" <turnbull.stephen...@u.tsukuba.ac.jp> wrote:

Erik Bray writes:

 > Nope.  I totally get that they don’t know what a shell or command prompt
 > is.  THEY. NEED. TO. LEARN.


Just to be clear I did not write this. Someone replying to me did.

I'm going to go over all the different proposals in this thread and see if
I can synthesize a list of options. I think, even if it's not a solution
that winds up in the stdlib, it would be good to have some user stories
about how package installation from within an interactive prompt might work
(even if not from the standard REPL, which it should be noted has had small
improvements made to it over the years).

I also have my doubts about whether this *shouldn't* be possible. I mean,
to a lot of beginners starting out the basic REPL *is* Python. They're so
new to the scene they don't even know what IPython or Jupyter is or why
they might want that. They aren't experienced enough to even know what
they're missing out on. In classrooms we can resolve that easily by
pointing our students to whatever tools we think will work best for them,
but not everyone has that privilege.

Best,
Erik

I don't want to take a position on the proposal, and I agree that we
should *strongly* encourage everyone to learn.  But "THEY. NEED. TO.
LEARN." is not obvious to me.

Anecdotally, my students are doing remarkably (to me, as a teacher)
complex modeling with graphical interfaces to statistical and
simulation packages (SPSS/AMOS, Artisoc, respectively), and collection
of large textual databases from SNS with cargo-culted Python programs.
For the past twenty years teaching social scientists, these accidental
barriers (as Fred Brooks would have called them) have dropped
dramatically, to the point where it's possible to do superficially
good-looking (= complex) but entirely meaningless :-/ empirical
research.  (In some ways I think this lowered cost has been horribly
detrimental to my work as an educator in applied social science. ;-)

The point being that "user-friendly" UI in many fields where (fairly)
advanced computing is used is more than keeping up with the perceived
needs of most computer users, while the essential (in the sense of
Brooks) non-computing modeling difficulties of their jobs remain.

By "perceived" I mean I want my students using TeX, but it's hard to
force them when all their professors (except me and a couple
mathematicians) use Word (speaking of irreproducible results).  It's
good enough for government work, and that's in fact where many of them
end up (and the great majority are either in government or in
equivalent corporate bureaucrat positions).  Yes, I meant the
deprecatory connotations of "perceived", but realistically, I admit
that maybe they *don't* *need* the more polished tech that I could
teach them.


I remember when I first started out teaching Software Carpentry I made the
embarrassing mistake (coming from Physics) of assuming that LaTeX is
the de facto standard in most other academic fields :)

 > Hiding it is not a good idea for anyone.

Agreed.  Command lines and REPLs teach humility, to me as well as my
students. :-)

Steve


--
Associate Professor  Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information
Email: turnb...@sk.tsukuba.ac.jp   University of Tsukuba
Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN


Re: [Python-ideas] install pip packages from Python prompt

2017-11-02 Thread Erik Bray
On Oct 30, 2017 8:57 PM, "Alex Walters" <tritium-l...@sdamon.com> wrote:



> -Original Message-
> From: Python-ideas [mailto:python-ideas-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Erik Bray
> Sent: Monday, October 30, 2017 6:28 AM
> To: Python-Ideas <python-ideas@python.org>
> Subject: Re: [Python-ideas] install pip packages from Python prompt
>
> On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters <tritium-l...@sdamon.com>
> wrote:
> > Then those users have more fundamental problems.  There is a minimum
> level
> > of computer knowledge needed to be successful in programming.
> Insulating
> > users from the reality of the situation is not preparing them to be
> > successful.  Pretending that there is no system command prompt, or
shell,
> or
> > whatever platform specific term applies, only hurts new programmers.
> Give
> > users an error message they can google, and they will be better off in
the
> > long run than they would be if we just ran pip for them.
>
> While I completely agree with this in principle, I think you
> overestimate the average beginner.

Nope.  I totally get that they don’t know what a shell or command prompt
is.  THEY. NEED. TO. LEARN.  Hiding it is not a good idea for anyone.  If
this is an insurmountable problem for the newbie, maybe they really
shouldn’t be attempting to program.  This field is not for everyone.


Reading this I get the impression, and correct me if I'm wrong, that you've
never taught beginners programming. Of course long term (heck in fact
fairly early on) they need to learn these nitty-gritty and sometimes
frustrating lessons, but not in a 2 hour intro to programming for total
beginners.

And I beg to differ--this field is for everyone, and increasingly more so
every day. Doesn't mean it's easy, but it is and can be for everyone.

Whether this specific proposal is technically feasible in a cross-platform
manner with the state of the Python interpreter and import system is
another question. But that's a discussion worth having. "Some people aren't
cut out for programming" isn't.


>  Many beginners I've taught or
> helped, even if they can manage to get to the correct command prompt,
> often don't even know how to run the correct Python.  They might often
> have multiple Pythons installed on their system--maybe they have
> Anaconda, maybe Python installed by homebrew, or a Python that came
> with an IDE like Spyder.  If they're on OSX often running "python"
> from the command prompt gives the system's crippled Python 2.6 and
> they don't know the difference.
>
> One thing that has been a step in the right direction is moving more
> documentation toward preferring running `python -m pip` over just
> `pip`, since this often has a better guarantee of running `pip` in the
> Python interpreter you intended.  But that still requires one to know
> how to run the correct Python interpreter from the command-line (which
> the newbie double-clicking on IDLE may not even have a concept of...).
>
> While I agree this is something that is important for beginners to
> learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for
> many newbies just to install one or two packages from pip, which they
> often might need/want to do for whatever educational pursuit they're
> following (heck, it's pretty common even just to want to install the
> `requests` module, as I would never throw `urllib` at a beginner).
>
> So while I don't think anything proposed here will work technically, I
> am in favor of an in-interpreter pip install functionality.  Perhaps
> it could work something like this:
>
> a) Allow it *only* in interactive mode:  running `pip(...)` (or
> whatever this looks like) outside of interactive mode raises a
> `RuntimeError` with the appropriate documentation
> b) When running `pip(...)` the user is supplied with an interactive
> prompt explaining that since installing packages with `pip()` can
> result in changes to the interpreter, it is necessary to restart the
> interpreter after installation--give them an opportunity to cancel the
> action in case they have any work they need to save.  If they proceed,
> install the new package then restart the interpreter for them.  This
> avoids any ambiguity as to states of loaded modules before/after pip
> install.
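
A rough sketch of what (a) and (b) could look like in practice (this is
entirely hypothetical--no such function exists in the stdlib):

import os
import subprocess
import sys

def pip(*packages):
    # (a) Only allow this from the interactive prompt.
    if not hasattr(sys, 'ps1'):
        raise RuntimeError('pip() is only available in interactive mode; '
                           'run "python -m pip" from your shell instead')
    # (b) Warn about the restart and give a chance to cancel.
    reply = input('Installing packages will restart the interpreter, so '
                  'unsaved work will be lost. Proceed? [y/N] ')
    if reply.strip().lower() != 'y':
        return
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', *packages])
    # Restart so the new packages are importable in a clean state.
    os.execv(sys.executable, [sys.executable])
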
> > From: Stephan Houben [mailto:stephan...@gmail.com]
> > Sent: Sunday, October 29, 2017 3:43 PM
> > To: Alex Walters <tritium-l...@sdamon.com>
> > Cc: Python-Ideas <python-ideas@python.org>
> > Subject: Re: [Python-ideas] install pip packages from Python prompt
> >
> >
> >
> > Hi Alex,
> >
> >
> >
> > 2017-10-29 20:26 GMT+01:00 Alex Walters <tri

Re: [Python-ideas] install pip packages from Python prompt

2017-10-30 Thread Erik Bray
On Mon, Oct 30, 2017 at 11:27 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters <tritium-l...@sdamon.com> wrote:
>> Then those users have more fundamental problems.  There is a minimum level
>> of computer knowledge needed to be successful in programming.  Insulating
>> users from the reality of the situation is not preparing them to be
>> successful.  Pretending that there is no system command prompt, or shell, or
>> whatever platform specific term applies, only hurts new programmers.  Give
>> users an error message they can google, and they will be better off in the
>> long run than they would be if we just ran pip for them.
>
> While I completely agree with this in principle, I think you
> overestimate the average beginner.  Many beginners I've taught or
> helped, even if they can manage to get to the correct command prompt,
> often don't even know how to run the correct Python.  They might often
> have multiple Pythons installed on their system--maybe they have
> Anaconda, maybe Python installed by homebrew, or a Python that came
> with an IDE like Spyder.  If they're on OSX often running "python"
> from the command prompt gives the system's crippled Python 2.6 and
> they don't know the difference.


I should add--another case that is becoming extremely common is
beginners learning Python for the first time inside the
Jupyter/IPython Notebook.  And in my experience it can be very
difficult for beginners to understand the connection between what's
happening in the notebook ("it's in the web-browser--what does that
have to do with anything on my computer??") and the underlying Python
interpreter, file system, etc.  Being able to pip install from within
the Notebook would be a big win.  This is already possible since
IPython allows running system commands and it is possible to run the
pip executable from the notebook, then manually restart the Jupyter
kernel.

It's not 100% clear to me how my proposal below would work within a
Jupyter Notebook, so that would also be an angle worth looking into.

Best,
Erik


> One thing that has been a step in the right direction is moving more
> documentation toward preferring running `python -m pip` over just
> `pip`, since this often has a better guarantee of running `pip` in the
> Python interpreter you intended.  But that still requires one to know
> how to run the correct Python interpreter from the command-line (which
> the newbie double-clicking on IDLE may not even have a concept of...).
>
> While I agree this is something that is important for beginners to
> learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for
> many newbies just to install one or two packages from pip, which they
> often might need/want to do for whatever educational pursuit they're
> following (heck, it's pretty common even just to want to install the
> `requests` module, as I would never throw `urllib` at a beginner).
>
> So while I don't think anything proposed here will work technically, I
> am in favor of an in-interpreter pip install functionality.  Perhaps
> it could work something like this:
>
> a) Allow it *only* in interactive mode:  running `pip(...)` (or
> whatever this looks like) outside of interactive mode raises a
> `RuntimeError` with the appropriate documentation
> b) When running `pip(...)` the user is supplied with an interactive
> prompt explaining that since installing packages with `pip()` can
> result in changes to the interpreter, it is necessary to restart the
> interpreter after installation--give them an opportunity to cancel the
> action in case they have any work they need to save.  If they proceed,
> install the new package then restart the interpreter for them.  This
> avoids any ambiguity as to states of loaded modules before/after pip
> install.
>
>
>
>> From: Stephan Houben [mailto:stephan...@gmail.com]
>> Sent: Sunday, October 29, 2017 3:43 PM
>> To: Alex Walters <tritium-l...@sdamon.com>
>> Cc: Python-Ideas <python-ideas@python.org>
>> Subject: Re: [Python-ideas] install pip packages from Python prompt
>>
>>
>>
>> Hi Alex,
>>
>>
>>
>> 2017-10-29 20:26 GMT+01:00 Alex Walters <tritium-l...@sdamon.com>:
>>
>> return “Please run pip from your system command prompt”
>>
>>
>>
>>
>>
>> The target audience for my proposal are people who do not know
>>
>> which part of the sheep the "system command prompt" is.
>>
>> Stephan
>>
>>
>>
>>
>>
>> From: Python-ideas
>> [mailto:python-ideas-bounces+tritium-list=sdamon@python.org] On Behalf
>> Of Stephan Houben
>> Sent: Sunda

Re: [Python-ideas] Asynchronous exception handling around with/try statement borders

2017-06-28 Thread Erik Bray
On Wed, Jun 28, 2017 at 3:19 PM, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Erik Bray wrote:
>>
>> At this point a potentially
>> waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
>> while inside the with statement's suite, and finally block, and hence
>> Lock.__exit__ are entered.
>
>
> Seems to me this is the behaviour you *want* in this case,
> otherwise the lock can be acquired and never released.
> It's disconcerting that it seems to be very difficult to
> get that behaviour with a pure Python implementation.

I think normally you're right--this is the behavior you would *want*,
but not the behavior that's consistent with how Python implements the
`with` statement, all else being equal.  Though it's still not
entirely fair either because if Lock.__enter__ were pure Python
somehow, it's possible the exception would be raised either before or
after the lock is actually marked as "acquired", whereas in the C
implementation acquisition of the lock will always succeed (assuming
the lock was free, and no other exceptional conditions) before the
signal handler is executed.

>> I think it might be possible to
>> gain more consistency between these cases if pending signals are
>> checked/handled after any direct call to PyCFunction from within the
>> ceval loop.
>
>
> IMO that would be going in the wrong direction by making
> the C case just as broken as the Python case.
>
> Instead, I would ask what needs to be done to make this
> work correctly in the Python case as well as the C case.

You have a point there, but at the same time the Python case, while
"broken" insofar as it can lead to broken code, seems correct from the
Pythonic perspective.  The other possibility would be to actually
change the semantics of the `with` statement. Or as you mention below,
a way to temporarily mask signals...

> I don't think it's even possible to write Python code that
> does this correctly at the moment. What's needed is a
> way to temporarily mask delivery of asynchronous exceptions
> for a region of code, but unless I've missed something,
> no such facility is currently provided.
>
> What would such a facility look like? One possibility
> would be to model it on the sigsetmask() system call, so
> there would be a function such as
>
>    mask_async_signals(bool)
>
> that turns delivery of async signals on or off.
>
> However, I don't think that would work. To fix the locking
> case, what we need to do is mask async signals during the
> locking operation, and only unmask them once the lock has
> been acquired. We might write a context manager with an
> __enter__ method like this:
>
>    def __enter__(self):
>        mask_async_signals(True)
>        try:
>            self.acquire()
>        finally:
>            mask_async_signals(False)
>
> But then we have the same problem again -- if a Keyboard
> Interrupt occurs after mask_async_signals(False) but
> before __enter__ returns, the lock won't get released.

Exactly.

> Another approach would be to provide a context manager
> such as
>
>    async_signals_masked(bool)
>
> Then the whole locking operation could be written as
>
>    with async_signals_masked(True):
>        lock.acquire()
>        try:
>            with async_signals_masked(False):
>                # do stuff here
>        finally:
>            lock.release()
>
> Now there's no possibility for a KeyboardInterrupt to
> be delivered until we're safely inside the body, but we've
> lost the ability to capture the pattern in the form of
> a context manager.
>
> The only way out of this I can think of at the moment is
> to make the above pattern part of the context manager
> protocol itself. In other words, async exceptions are
> always masked while the __enter__ and __exit__ methods
> are executing, and unmasked while the body is executing.

I think so too.  That's more or less in line with Nick's idea on njs's
issue (https://bugs.python.org/issue29988) of an ATOMIC_UNTIL opcode.
That's just one implementation possibility.  My question would be
whether to make that a language-level requirement of the context
manager protocol, or just something CPython does...

Thanks,
Erik


Re: [Python-ideas] Asynchronous exception handling around with/try statement borders

2017-06-28 Thread Erik Bray
On Wed, Jun 28, 2017 at 3:09 PM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Wed, Jun 28, 2017 at 2:26 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> On 28 June 2017 at 21:40, Erik Bray <erik.m.b...@gmail.com> wrote:
>>> My colleague's contention is that given
>>>
>>> lock = threading.Lock()
>>>
>>> this is simply *wrong*:
>>>
>>> lock.acquire()
>>> try:
>>>     do_something()
>>> finally:
>>> lock.release()
>>>
>>> whereas this is okay:
>>>
>>> with lock:
>>>     do_something()
>>
>> Technically both are slightly racy with respect to async signals (e.g.
>> KeyboardInterrupt), but the with statement form is less exposed to the
>> problem (since it does more of its work in single opcodes).
>>
>> Nathaniel Smith posted a good write-up of the technical details to the
>> issue tracker based on his work with trio:
>> https://bugs.python.org/issue29988
>
> Interesting; thanks for pointing this out.  Part of me felt like this
> has to have come up before but my searching didn't bring this up
> somehow (and even then it's only a couple months old itself).
>
> I didn't think about the possible race condition before
> WITH_CLEANUP_START, but obviously that's a possibility as well.
> Anyways since this is already acknowledged as a real bug I guess any
> further followup can happen on the issue tracker.

On second thought, maybe there is a case to be made w.r.t. making a
documentation change about the semantics of the `with` statement:

The old-style syntax cannot make any guarantees about atomicity w.r.t.
async events.  That is, there's no way syntactically in Python to
declare that no exception will be raised between "lock.acquire()" and
the setup of the "try/finally" blocks.

However, if issue-29988 were *fixed* somehow (and I'm not convinced it
can't be fixed in the limited case of `with` statements) then there
really would be a major semantic difference of the `with` statement in
that it does support this invariant.  Then the question is whether
that difference should be made a requirement of the language (probably too
onerous a requirement?), or just a feature of CPython (which should
still be documented one way or the other IMO).

Erik


Re: [Python-ideas] Asynchronous exception handling around with/try statement borders

2017-06-28 Thread Erik Bray
On Wed, Jun 28, 2017 at 2:26 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 28 June 2017 at 21:40, Erik Bray <erik.m.b...@gmail.com> wrote:
>> My colleague's contention is that given
>>
>> lock = threading.Lock()
>>
>> this is simply *wrong*:
>>
>> lock.acquire()
>> try:
>>     do_something()
>> finally:
>> lock.release()
>>
>> whereas this is okay:
>>
>> with lock:
>>     do_something()
>
> Technically both are slightly racy with respect to async signals (e.g.
> KeyboardInterrupt), but the with statement form is less exposed to the
> problem (since it does more of its work in single opcodes).
>
> Nathaniel Smith posted a good write-up of the technical details to the
> issue tracker based on his work with trio:
> https://bugs.python.org/issue29988

Interesting; thanks for pointing this out.  Part of me felt like this
has to have come up before but my searching didn't bring this up
somehow (and even then it's only a couple months old itself).

I didn't think about the possible race condition before
WITH_CLEANUP_START, but obviously that's a possibility as well.
Anyways since this is already acknowledged as a real bug I guess any
further followup can happen on the issue tracker.

Thanks,
Erik


[Python-ideas] Asynchronous exception handling around with/try statement borders

2017-06-28 Thread Erik Bray
Hi folks,

I normally wouldn't bring something like this up here, except I think
that there is possibility of something to be done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.

I've been having an argument with a colleague over the last couple
days over the proper way order of statements when setting up a
try/finally to perform cleanup of some action.  On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place.  The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.



My colleague's contention is that given

lock = threading.Lock()

this is simply *wrong*:

lock.acquire()
try:
    do_something()
finally:
    lock.release()

whereas this is okay:

with lock:
    do_something()


Ignoring other details of how threading.Lock is actually implemented,
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release() then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
suggest otherwise.

However, there *is* a difference, and has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):

If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called.  You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised.  Standard stuff.

However, if Lock.__enter__ is a PyCFunction things are quite
different.  If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg.  If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised.  However if it returns
non-NULL the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode.  At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite, and finally block, and hence
Lock.__exit__ are entered.

Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then

with lock:
    do_something()

always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with

lock.acquire()
try:
    ...
finally:
    lock.release()

there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY).  So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
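
The window is visible in the bytecode (a sketch; exact opcode names
vary across CPython versions):

import dis

def do_something():
    pass

def critical(lock):
    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

dis.dis(critical)
# In the versions discussed here, acquire() compiles to a call opcode
# followed by POP_TOP, and only then SETUP_FINALLY; a SIGINT handled
# between those opcodes leaves the lock acquired with no finally block
# armed to release it.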

Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).

At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear).  However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to PyCFunction from within the
ceval loop.

Sorry for the tl;dr; any thoughts?


Re: [Python-ideas] Run length encoding

2017-06-19 Thread Erik

On 19/06/17 02:47, David Mertz wrote:
As an only semi-joke, I have created a module on GH that meets the needs 
of this discussion (using the spelling I think are most elegant):


https://github.com/DavidMertz/RLE


It's a shame you have to build that list when encoding. I tried to work 
out a way to get the number of items in an iterable without having to 
capture all the values (on the understanding that if the iterable is 
already an iterator, it would be consumed).


The best I came up with so far (not general purpose, but it works in 
this scenario) is:


from itertools import groupby
from operator import countOf

def rle_encode(it):
    return ((k, countOf(g, k)) for k, g in groupby(it))

In your test code, this speeds things up quite a bit over building the 
list, but that's presumably only because both groupby() and countOf() 
will use the standard class comparison operator methods which in the 
case of ints will short-circuit with a C-level pointer comparison first.


For user-defined classes with complicated comparison methods, getting 
the length of the group by comparing the items will probably be worse.


Is there a better way of implementing a general-purpose "ilen()"? I 
tried a couple of other things, but they all required at least one 
lambda function and slowed things down by about 50% compared to the 
list-building version.
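
One approach that avoids both the intermediate list and a per-item
lambda is to let zip() drive an itertools.count at C speed (essentially
the trick more-itertools uses for its ilen()):

from collections import deque
from itertools import count

def ilen(iterable):
    counter = count()
    # zip() advances the counter once per item; deque(maxlen=0) merely
    # exhausts the pairs without storing any of them.
    deque(zip(iterable, counter), maxlen=0)
    return next(counter)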


(I agree this is sort of a joke, but it's still an interesting puzzle ...).

Regards, E.


Re: [Python-ideas] [Python-Dev] Language proposal: variable assignment in functional context

2017-06-16 Thread Erik

[cross-posted to python-ideas]

Hi Robert,

On 16/06/17 12:32, Robert Vanden Eynde wrote:
Hello, I would like to propose an idea for the language but I don't know 
where I can talk about it.


Can you please explain what the problem is that you are trying to solve?


In a nutshell, I would like to be able to write:
y = (b+2 for b = a + 1)


The above is (almost) equivalent to:

y = (a+1)+2

I realize the parentheses are not required, but I've included them 
because if your example mixed operators with different precedence then 
they might be necessary.


Other than binding 'b' (you haven't defined what you expect the scope of 
that to be, but I'll assume it's the outer scope for now), what is it 
about the form you're proposing that's different?



Or in list comprehension:
Y = [b+2 for a in L for b = a+1]

Which can already be done like this:
Y = [b+2 for a in L for b in [a+1]]


Y = [(a+1)+2 for a in L]

Which is less obvious, has a small overhead (iterating over a list) and 
get messy with multiple assignment:

Y =  [b+c+2 for a in L for b,c in [(a+1,a+2)]]

New syntax would allow to write:
Y =  [b+c+2 for a in L for b,c = (a+1,a+2)]


Y = [(a+1)+(a+2)+2 for a in L]

My first example (b+2 for b = a+1) can already be done using ugly syntax 
using lambda


y = (lambda b: b+2)(b=a+1)
y = (lambda b: b+2)(a+1)
y = (lambda b=a+1: b+2)()

Choice of syntax: "for" is good because it uses an existing keyword, and
the analogy of "for x = 5" vs "for x in [5]" is natural.


But the "for" loses the meaning of iteration.
The use of "with" would maybe sound more logical.

Python already have the "functional if", lambdas, list comprehension, 
but not simple assignment functional style.


Can you present an example that can't be re-written simply by reducing 
the expression as I have done above?


Regards, E.


Re: [Python-ideas] Dictionary destructing and unpacking.

2017-06-07 Thread Erik

On 07/06/17 23:42, C Anthony Risinger wrote:

Neither of these are really comparable to destructuring.


No, but they are comparable to the OP's suggested new built-in method 
(without requiring each mapping type - not just dicts - to implement 
it). That was what _I_ was responding to.


E.


Re: [Python-ideas] Dictionary destructing and unpacking.

2017-06-07 Thread Erik

On 07/06/17 19:14, Nick Humrich wrote:

a, b, c = mydict.unpack('a', 'b', 'c')


def retrieve(mapping, *keys):
    return (mapping[key] for key in keys)



$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def retrieve(mapping, *keys):
...     return (mapping[key] for key in keys)
...
>>> d = {'a': 1, 'b': None, 100: 'Foo' }
>>> a, b, c = retrieve(d, 'a', 'b', 100)
>>> a, b, c
(1, None, 'Foo')


E.


Re: [Python-ideas] π = math.pi

2017-06-02 Thread Erik Bray
On Fri, Jun 2, 2017 at 7:52 AM, Greg Ewing  wrote:
> Victor Stinner wrote:
>>
>> How do you write π (pi) with a keyboard on Windows, Linux or macOS?
>
>
> On a Mac, π is Option-p and ∑ is Option-w.

I don't have a strong opinion about it being in the stdlib, but I'd
also point out that a strong advantage to having these defined in a
module at all is that third-party interpreters (e.g. IPython, bpython,
some IDEs) that support tab-completion make these easy to type as
well, and I find them to be very readable for math-heavy code.


Re: [Python-ideas] Suggestion: push() method for lists

2017-05-22 Thread Erik

On 21/05/17 15:43, Paul Laos wrote:
 push(obj) would be 
equivalent to insert(index = -1, object), having -1 as the default index 
parameter. In fact, push() could replace both append() and insert() by 
unifying them.


I don't think list.insert() with an index of -1 does what you think it does:

$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> l = [0, 1, 2]
>>> l
[0, 1, 2]
>>> l.insert(-1, 99)
>>> l
[0, 1, 99, 2]
>>>

Because the indices can be thought of as referencing the spaces
_between_ the objects, having a push() in which -1 references a
different 'space' than a -1 given to insert() or a slice operation
would, I suspect, be a source of confusion (and off-by-one bugs).


E.


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Erik

On 04/05/17 01:24, Steven D'Aprano wrote:

On Thu, May 04, 2017 at 12:13:25AM +0100, Erik wrote:

I had a use-case where splitting an iterable into a sequence of
same-sized chunks efficiently improved the performance of my code

[...]

So I didn't propose it. I have no idea now what I spent my saved hours
doing, but I imagine that it was fun



Summary: I didn't present the argument because I'm not a masochist


I'm not sure what the point of that anecdote was, unless it was "I wrote
some useful code, and you missed out".


Then you have misunderstood me. Paul suggested that my use-case 
(chunking could be faster) was perhaps enough to propose that my patch 
may be considered. I responded with historical/empirical evidence that 
perhaps that would actually not be the case.


I was responding, honestly, to the questions raised by Paul's email.


Your comments come across as a passive-aggressive chastisment of the
core devs and the Python-Ideas community for being too quick to reject
useful code: we missed out on something good, because you don't have the
time or energy to deal with our negativity and knee-jerk rejection of
everything good. That's the way your series of posts come across to me.


I apologise if my words or my turn of phrase do not appeal to you. I am 
trying to be constructive with everything I post.


If you choose to interpret my messages in a different way then I'm not 
sure what I can do about that.


Back to the important stuff though:


- you could have offered it to the more-itertools project;


A more efficient version of more_itertools.chunked() is what we're
talking about.



- you could have published it on PyPy;


Does PyPy support C extension modules? If so, that's a possibility.


- you could have proposed it on Python-Ideas with an explicit statement


I may well do that - my current patch (because of when I did it) is 
against a Py2 codebase, but I could port it to Py3. I still have a 
nagging doubt that I'd be wasting my time though ;)




If
you care so little that you can't be bothered even to propose it, why do
you care if it is rejected?


You are mistaking not caring enough about the functionality with not 
caring enough to enter into an argument about including that 
functionality ...


I didn't propose it at the time because of the reasons I mentioned. But 
when I saw something being discussed yet again that I had a general 
solution for already written I thought I mention it in case it was 
useful. As I said, I'm _trying_ to be constructive.


E.


Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-03 Thread Erik

Hi Paul,

On 03/05/17 08:57, Paul Moore wrote:
> On 3 May 2017 at 02:48, Erik <pyt...@lucidity.plus.com> wrote:
>> Anyway, I know you can't stop anyone from *proposing* something like 
this,

>> but as soon as they do you may decide to quote the recipe from
>> "https://docs.python.org/3/library/functions.html#zip; and try to block
>> their proposition. There are already threads on fora that do that.
>>
>> That was my sticking point at the time when I implemented a general
>> solution. Why bother to propose something that (although it made my code
>> significantly faster) had already been blocked as being something that
>> should be a python-level operation and not something to be included in a
>> built-in?
>
> It sounds like you have a reasonable response to the suggestion of
> using zip- that you have a use case where performance matters, and
> your proposed solution is of value in that case.

I don't think so, though.

I had a use-case where splitting an iterable into a sequence of 
same-sized chunks efficiently improved the performance of my code 
significantly (processing a LOT of 24-bit, multi-channel - 16 to 32 - 
PCM streams from a WAV file).


Having thought "I need to split this stream by a fixed number of bytes"
and then found more_itertools.chunked() (and the
zip_longest(*([iter(foo)] * num)) trick), it turned out they were not
quick enough, so I implemented itertools.chunked() in C.
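
(For reference, the zip_longest trick mentioned above is the itertools
documentation's "grouper" recipe; in Python 3 terms:)

from itertools import zip_longest

def chunked(iterable, n, fillvalue=None):
    # One shared iterator repeated n times: each output tuple pulls n
    # consecutive items, and the last chunk is padded with fillvalue.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)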


That worked well for me, so when I was done I did a search in case it 
was worth proposing as an enhancement to feed it back to the community. 
Then I came across things such as the following:


http://bugs.python.org/issue6021

I am specifically referring to the "It has been rejected before" 
comment, also mentioned here:


https://mail.python.org/pipermail/python-dev/2012-July/120885.html

See this entire thread, too:

https://mail.python.org/pipermail/python-ideas/2012-July/015671.html

This is the reason why I really just didn't care enough to go through 
the process of proposing it in the end (even though the 
more_itertools.chunked function was one of the first 3 implemented in 
V1.0 and seems to _still_ be cropping up all the time in different 
guises - so is perhaps more fundamental than people recognise).


The strong implication of the discussions linked to above is that if it 
had been mentioned before it would be immediately rejected, and that was 
supported by several members of the community in good standing.


So I didn't propose it. I have no idea now what I spent my saved hours 
doing, but I imagine that it was fun


> Whether it's a
> *sufficient* response remains to be seen, but unless you present the
> argument we won't know.

Summary: I didn't present the argument because I'm not a masochist

Regards, E.



Re: [Python-ideas] Augmented assignment syntax for objects.

2017-05-02 Thread Erik

On 26/04/17 21:50, Chris Angelico wrote:

On Thu, Apr 27, 2017 at 6:24 AM, Erik <pyt...@lucidity.plus.com> wrote:

The background is that what I find myself doing a lot of for private
projects is importing data from databases into a structured collection of
objects and then grouping and analyzing the data in different ways before
graphing the results.

So yes, I tend to have classes that accept their entire object state as
parameters to the __init__ method (from the database values) and then any
other methods in the class are generally to do with the subsequent analysis
(including dunder methods for iteration, rendering and comparison etc).


You may want to try designing your objects as namedtuples. That gives
you a lot of what you're looking for.


I did look at this. It looked promising.

What I found was that I spent a lot of time working out how to subclass 
namedtuples properly (I do need to do that to add the extra logic - and 
sometimes some state - for my analysis) and once I got that working, I 
was left with a whole different set of boilerplate and special cases and 
therefore another set of things to remember if I return to this code at 
some point.
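
For illustration, the kind of subclassing involved looks roughly like
this (class and field names invented):

from collections import namedtuple

class Sample(namedtuple('Sample', ['timestamp', 'channel', 'value'])):
    __slots__ = ()  # subclass boilerplate: avoid a per-instance __dict__

    def scaled(self, factor):
        # Analysis helpers live on the subclass; since instances are
        # immutable, "updating" means building a new tuple via _replace().
        return self._replace(value=self.value * factor)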


So I've reverted to regular classes and multiple assignments in __init__.

E.



Re: [Python-ideas] Add an option for delimiters in bytes.hex()

2017-05-02 Thread Erik

On 02/05/17 12:31, Steven D'Aprano wrote:

I disagree with this approach. There's nothing special about bytes.hex()
here, perhaps we want to format the output of hex() or bin() or oct(),
or for that matter "%x" and any of the other string templates?

In fact, this is a string operation that could apply to any character
string, including decimal digits.

Rather than duplicate the API and logic everywhere, I suggest we add a
new string method. My suggestion is str.chunk(size, delimiter=' ') and
str.rchunk() with the same arguments:

"1234ABCDEF".chunk(4)
=> returns "1234 ABCD EF"
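
A rough pure-Python sketch of what those two methods might do, written 
as plain functions (illustrative only):

def chunk(s, size, delimiter=' '):
    # Group from the left, as in the example above.
    return delimiter.join(s[i:i + size] for i in range(0, len(s), size))

def rchunk(s, size, delimiter=' '):
    # Group from the right, e.g. for thousands separators.
    head = s[:len(s) % size]
    groups = [s[i:i + size] for i in range(len(s) % size, len(s), size)]
    return delimiter.join(([head] if head else []) + groups)

chunk("1234ABCDEF", 4)   # '1234 ABCD EF'
rchunk("1234567", 3)     # '1 234 567'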


FWIW, I implemented a version of something similar as a fixed-length 
"chunk" method in itertoolsmodule.c (it was similar to izip_longest - it 
had a "fill" keyword to pad the final chunk). It was ~100 LOC including 
the structure definitions. The chunk method was an iterator (so it 
returned a sequence of "chunks" as defined by the API).
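
As a rough pure-Python model of that C iterator (hypothetical signature; 
the real thing was C code, so this is only for illustration):

from itertools import islice

_NO_FILL = object()

def chunk(iterable, size, fill=_NO_FILL):
    # Yield successive fixed-size tuples, optionally padding the last one.
    it = iter(iterable)
    while True:
        block = tuple(islice(it, size))
        if not block:
            return
        if fill is not _NO_FILL and len(block) < size:
            block += (fill,) * (size - len(block))
        yield block

list(chunk(range(7), 3))          # [(0, 1, 2), (3, 4, 5), (6,)]
list(chunk(range(7), 3, fill=0))  # [(0, 1, 2), (3, 4, 5), (6, 0, 0)]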


Then I read that "itertools" should consist of primitives only and that 
we should defer to "more_itertools" for anything that is of a higher 
level (which this is - it can be done in terms of itertools functions). 
So I didn't propose it, although the processing of my WAV files (in 
which the sample data are groups of bytes - frames - of a fixed length) 
was significantly faster with it :(


I also looked at implementing itertools.chunk as a function that would 
make use of a "__chunk__" method on the source object if it existed 
(which allowed a class to support an even more efficient version of 
chunking - things like range() etc).
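
In sketch form, that dispatch might have looked something like this (a 
hypothetical dunder; chunk() is the generic fallback sketched above):

def chunked(obj, size):
    # Hypothetical protocol: let the object provide an optimised chunker
    # (e.g. range() could return sub-ranges without iterating).
    hook = getattr(type(obj), '__chunk__', None)
    if hook is not None:
        return hook(obj, size)
    return chunk(obj, size)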



I don't see any advantage to adding this to bytes.hex(), hex(), oct(),
bin(), and I really don't think it is helpful to be grouping the
characters by the number of bits. It's a string formatting operation, not
a bit operation.


Why do you want to limit it to strings? Isn't something like this 
potentially useful for all sequences (where the result is a tuple of 
chunks that are the same type as the source sequence - be that strings 
or lists or lazy ranges or whatever)? And why aren't the chunks returned 
via an iterator?


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-28 Thread Erik

On 28/04/17 10:47, Paul Moore wrote:

On 28 April 2017 at 00:18, Erik <pyt...@lucidity.plus.com> wrote:

The semantics are very different and there's little or no connection
between importing a module and setting an attribute on self.


At the technical level of what goes on under the covers, yes. At the higher
level of what the words mean in spoken English, it's really not so different
a concept.


I disagree. If you were importing into the *class* (instance?) I might
begin to see a connection, but importing into self?


I know you already understand the following, but I'll spell it out 
anyway. Here's a module:


-
$ cat foo.py
def foo():
  global sys
  import sys

  current_namespace = set(globals().keys())
  print(initial_namespace ^ current_namespace)

def bar():
  before_import = set(locals().keys())
  import os
  after_import = set(locals().keys())
  print(before_import ^ after_import)

initial_namespace = set(globals().keys())
-

Now, what happens when I run those functions:

$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo
>>> foo.foo()
{'sys', 'initial_namespace'}
>>> foo.bar()
{'before_import', 'os'}
>>>

... so the net effect of "import" is to bind an object into a namespace 
(a dict somewhere). In the case of 'foo()' it's binding the module 
object for "sys" into the dict of the module object that represents 
'foo.py'. In the case of 'bar()' it's binding the module object for "os" 
into the dict representing the local namespace of the current instance 
of the bar() call.
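
To make that concrete (a simplification - the real statement also 
consults sys.modules and handles packages):

# "import sys" inside foo() is roughly:
sys = __import__('sys')   # bound into globals(), due to 'global sys'

# "import os" inside bar() is roughly:
os = __import__('os')     # bound into bar()'s local namespace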


Isn't binding an object to a namespace the same operation that 
assignment performs?


So it's a type of assignment, and one that doesn't require the name to 
be spelled twice in the current syntax (and that's partly why I took 
offense at a suggestion - not by you - that I was picking "random or 
arbitrary" keywords. I picked it for that specific reason).


I realize that there are other semantic differences (importing a module 
twice doesn't do anything - and specifically, repeating "from mod import 
*" will not pick up changes if the module has mutated) - and perhaps 
this is your point.



Also, if you try to make the obvious generalisations (which you'd *have* 
to be able to make due to the way Python works) things quickly get out 
of hand:

def __init__(self, a):
self import a


self.a = a



OK, but self is just a variable name, so we can reasonably use a different name:

def __init__(foo, a):
foo import a


foo.a = a


So the syntax is <object> import <name>

Presumably the following also works, because there's nothing special
about parameters?

def __init__(x, a):
calc = a**2
x import calc


x.calc = calc


And of course there's nothing special about __init__

def my_method(self, a):
self import a


self.a = a


Or indeed about methods

def standalone(a, b):
a import b


a.b = b


or statements inside functions:

if __name__ == '__main__':
a = 12
b = 13
a import b


a.b = b


Hmm, I'd hope for a type error here. But what types would be allowed
for a?


I think you're assuming I'm suggesting some sort of magic around "self" 
or some such thing. I'm not. I've written above exactly what I would 
expect the examples to be equivalent to. It's just an assignment which 
doesn't repeat the name (and in the comma-separated version allows 
several names to be assigned using compact syntax without spelling them 
twice, which is where this whole thing spawned from).



See what I mean? Things get out of hand *very* fast.


I don't see how that's getting "out of hand". The proposal is nothing 
more complicated than a slightly-different spelling of assignment. It 
could be done today with a text-based preprocessor which converts the 
proposed form to an existing valid syntax. Therefore, if it's "out of 
hand" then so is the existing assignment syntax ;)


FWIW, I should probably state for the record that I'm not actually 
pushing for _anything_ right now. I'm replying to questions asked and 
also to statements made which I think have missed the point of what I 
was trying to say earlier. So I'm just engaging in the conversation at 
this point - if it appears confrontational, it's not meant to be.



To summarise:

1. There's some serious technical issues with your proposal, which as
far as I can see can only be solved by arbitrary restrictions on how
it can be used


To be honest, I still don't understand what the serious technical issues 
are (other than the parser probably doesn't handle this sort of 
keyword/operator hybrid!). Is it just that I'm seeing the word "import" 
in this context as a type of assignment and you're seeing any reference 
to the word "import" as being a 

Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-27 Thread Erik

On 27/04/17 23:43, Steven D'Aprano wrote:

On Wed, Apr 26, 2017 at 11:29:19PM +0100, Erik wrote:

def __init__(self, a, b, c):
   self import a, b
   self.foo = c * 100


[snarky]
If we're going to randomly choose arbitrary keywords with no connection
to the operation being performed,


The keyword I chose was not random or arbitrary and it _does_ have a 
connection to the operation being performed (bind a value in the source 
namespace to the target namespace using the same name it had in the 
source namespace - or rename it using the 'as' keyword).



can we use `del` instead of `import`
because it's three characters less typing?


Comments like this just serve to dismiss or trivialize the discussion. 
We acknowledged that we're bikeshedding so it was not a serious 
suggestion, just a "synapse prodder" ...



But seriously, I hate this idea.


Good. It's not a proposal, but something that was supposed to generate 
constructive discussion.



The semantics are very different and there's little or no connection
between importing a module and setting an attribute on self.


At the technical level of what goes on under the covers, yes. At the 
higher level of what the words mean in spoken English, it's really not 
so different a concept.



If we're going to discuss pie-in-the-sky suggestions,


That is just dismissing/trivializing the conversation again.


(If you don't like "inject", I'm okay with "load" or even "push".)


No you're not, because that's a new keyword which might break existing 
code and that is even harder to justify than re-using an existing 
keyword in a different context.



the problem this solves isn't big or
important enough for the disruption of adding a new keyword.


So far, you are the only one to have suggested adding a new keyword, I 
think ;)


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 23:28, Paul Moore wrote:

Or to put it another way, if the only
reason for the syntax proposal is performance then show me a case
where performance is so critical that it warrants a language change.


It's the other way around.

The proposal (arguably) makes the code clearer but does not impact 
performance (and is a syntax error today, so does not break existing code).


The suggestions (decorators etc) make the code (arguably) clearer today 
without a syntax change, but impact performance.


So, those who think the decorators make for clearer code have to choose 
between source code clarity and performance.


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 19:15, Mike Miller wrote:

As the new syntax ideas piggyback on existing syntax, it doesn't feel
like that its a complete impossibility to have this solved.  Could be
another "fixed papercut" to drive Py3 adoption.  Taken individually not
a big deal but they add up.


*sigh* OK, this has occurred to me over the last couple of days but I 
didn't want to suggest it as I didn't want the discussion to fragment 
even more.


But, if we're going to bikeshed and there is some weight behind the idea 
that this "papercut" should be addressed, then given my previous 
comparisons with importing, what about having 'import' as an operator:


def __init__(self, a, b, c):
   self import a, b
   self.foo = c * 100

Also allows renaming:

def __init__(self, a, b, c):
   self import a, b, c as _c

Because people are conditioned to think the comma-separated values after 
"import" are not tuples, perhaps the use of import as an operator rides 
on that wave ...


(I do realise that blurring the lines between statements and operators 
like this is probably not going to work for technical reasons (and it 
just doesn't quite read correctly anyway), but now we're bikeshedding 
and who knows what someone else might come up with in response ...).


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 22:28, Paul Moore wrote:

On 26 April 2017 at 21:51, Erik <pyt...@lucidity.plus.com> wrote:

It doesn't make anything more efficient, however all of the suggestions of
how to do it with current syntax (mostly decorators) _do_ make things less
efficient.


Is instance creation the performance bottleneck in your application?


No, not at all. This discussion has split into two:

1) How can I personally achieve what I want for my own personal 
use-cases. This should really be on -list, and some variation of the 
decorator thing will probably suffice for me.


2) The original proposal, which does belong on -ideas and has to take 
into account the general case, not just my specific use-case.


The post you are responding to is part of (2), and hence reduced 
performance is a consideration.


Regards, E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 01:39, Nathaniel Smith wrote:
[snip discussion of why current augmented assignment operators are 
better for other reasons]



Are there any similar arguments for .=?


It doesn't make anything more efficient, however all of the suggestions 
of how to do it with current syntax (mostly decorators) _do_ make things 
less efficient.


So rather than a win/win as with current augmented assignment 
(compact/clearer code *and* potentially a performance improvement), it's 
now a tradeoff (wordy code *or* a performance reduction).


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 16:10, Nick Timkovich wrote:

I was wondering that if there are so many arguments to a function that
it *looks* ugly, that it might just *be* ugly.

For one, too many required arguments to a function (constructor,
whatever) is already strange. Binding them as attributes of the object,
unmodified in a constructor also seems to be rare.


Yes, and perhaps it's more of a problem for me because of my 
possibly-atypical use of Python.


The background is that what I find myself doing a lot of for private 
projects is importing data from databases into a structured collection 
of objects and then grouping and analyzing the data in different ways 
before graphing the results.


So yes, I tend to have classes that accept their entire object state as 
parameters to the __init__ method (from the database values) and then 
any other methods in the class are generally to do with the subsequent 
analysis (including dunder methods for iteration, rendering and 
comparison etc).


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 18:42, Mike Miller wrote:

I want to be able to say:


def __init__(self, foo, bar, baz, spam):
  self .= foo, bar, spam
  self.baz = baz * 100



I don't see ALL being set as a big problem, and it's less work than 
typing several of them out again.


Because, some of the parameters might be things that are just passed to 
another constructor to create an object that is then referenced by the 
object being created.


If one doesn't want the object's namespace to be polluted by that stuff 
(which may be large and also now can't be garbage collected while the 
object is alive) then a set of "del self.xxx" statements is required 
instead, so you've just replaced one problem with another ;)


I'd rather just explicitly say what I want to happen rather than have 
*everything* happen and then have to tidy that up instead ...


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 08:59, Paul Moore wrote:

It should be possible to modify the decorator to take a list
of the variable names you want to assign, but I suspect you won't like
that


Now you're second-guessing me.

> class MyClass:
> @auto_args('a', 'b')
> def __init__(self, a, b, c=None):
> pass

I had forgotten that decorators could take parameters. Something like 
that pretty much ticks the boxes for me.
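
For completeness, one way such a parameterised decorator might be 
implemented (a sketch - Paul's actual version may well differ):

import functools
import inspect

def auto_args(*names):
    # Bind only the named parameters onto self before running the body.
    def decorator(init):
        sig = inspect.signature(init)
        @functools.wraps(init)
        def wrapper(self, *args, **kwargs):
            bound = sig.bind(self, *args, **kwargs)
            bound.apply_defaults()
            for name in names:
                setattr(self, name, bound.arguments[name])
            return init(self, *args, **kwargs)
        return wrapper
    return decorator

class MyClass:
    @auto_args('a', 'b')
    def __init__(self, a, b, c=None):
        pass

m = MyClass(1, 2)   # m.a == 1 and m.b == 2; no m.c attribute is set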


I'd _prefer_ something that sits inside the method body rather than just 
outside it, and I'd probably _prefer_ something that wasn't quite so 
heavyweight at runtime (which may be an irrational concern on my part 
;)), but those aren't deal breakers, depending on the project - and the 
vast majority of what I do in Python is short-lived one-off projects and 
rapid prototyping for later implementation in another language, so I do 
seem to be fleshing out a set of classes from scratch and writing a 
bunch of __init__ methods far more of the time than people with 
long-lived projects would do. Perhaps that's why it irritates me more 
than it does some others ;)


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-26 Thread Erik

On 26/04/17 13:19, Joao S. O. Bueno wrote:

On 25 April 2017 at 19:30, Erik <pyt...@lucidity.plus.com> wrote:

decorators don't cut it anyway (at least not those
proposed) because they blindly assign ALL of the arguments. I'm more than
happy to hear of something that solves both of those problems without
needing syntax changes though, as that means I can have it today ;)


Sorry -  a decorator won't "blindly assign all argments" - it will do
that just if it is written to do so.


Right, and the three or four variants suggested (and the 
vars(self).update() suggestion) all do exactly that. I was talking about 
the specific responses (though I can see my language is vague).


[FWIW I've been using Python the whole time that decorators have existed 
and I've yet to need to write one - I've _used_ some non-parameterized 
ones though - so I guess I'd forgotten that they can take parameters]


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-25 Thread Erik

On 25/04/17 22:15, Brice PARENT wrote:

it may be easier to get something like this
(I think, as there is no new operator involved) :


No new operator, but still a syntax change, so that doesn't help from 
that POV.




def __init__(self, *args, **kwargs):
  self.* = *args
  self.** = **kwargs


What is "self.* = *args" supposed to do? For each positional argument, 
what name in the object is it bound to?


E.


Re: [Python-ideas] Augmented assignment syntax for objects.

2017-04-25 Thread Erik

On 25/04/17 02:15, Chris Angelico wrote:

Bikeshedding: Your example looks a lot more like tuple assignment than
multiple assignment.


Well, originally, I thought it was just the spelling-the-same-name-twice 
thing that irritated me and I was just going to suggest a single 
assignment version like:


  self .= foo
  self .= bar

Then I thought that this is similar to importing (referencing an object 
from one namespace in another under the same name). In that scenario, 
instead of:


  from other import foo
  from other import bar

we have:

  from other import foo, bar

That's where the comma-separated idea came from, and I understand it 
looks like a tuple (which is why I explicitly mentioned that) but it 
does in the import syntax too ;)


The single argument version (though it doesn't help with vertical space) 
still reads better to me for the same reason that augmented assignment 
is clearer - there is no need to mentally parse that the same name is 
being used on both sides of the assignment because it's only spelled once.



self .= foo .= bar .= baz .= spam .= ham


Thanks for being the only person so far to understand that I don't 
necessarily want to bind ALL of the __init__ parameters to the object, 
just the ones I explicitly reference, but I'm not convinced by this 
suggestion. In chained assignment the thing on the RHS is bound to each 
name to the left of it and that is really not happening here.



The trouble is that this syntax is really only going to be used inside
__init__.


Even if that was true, who ever writes one of those? :D

E.


[Python-ideas] Augmented assignment syntax for objects.

2017-04-24 Thread Erik
Hi. I suspect that this may have been discussed to death at some point 
in the past, but I've done some searching and I didn't come up with 
much. Apologies if I'm rehashing an old argument ;)


I often find myself writing __init__ methods of the form:

def __init__(self, foo, bar, baz, spam, ham):
  self.foo = foo
  self.bar = bar
  self.baz = baz
  self.spam = spam
  self.ham = ham

This seems a little wordy and uses a lot of vertical space on the 
screen. Occasionally, I have considered something like:


def __init__(self, foo, bar, baz, spam, ham):
  self.foo, self.bar, self.baz, self.spam, self.ham = \
 foo, bar, baz, spam, ham

... just to make it a bit more compact - though in practice, I'd 
probably not do that with a list quite that long ... two or three items 
at most:


def __init__(self, foo, bar, baz):
   self.foo, self.bar, self.baz = foo, bar, baz

When I do that I'm torn because I know it has a runtime impact to create 
and unpack the implicit tuples and I'm also introducing a style 
asymmetry in my code just because of the number of parameters a method 
happens to have.


So why not have an augmented assignment operator for object attributes? 
It addresses one of the same broad issues that the other augmented 
assignment operators were introduced for (that of repeatedly spelling 
names).


The suggestion therefore is:

def __init__(self, foo, bar, baz, spam, ham):
  self .= foo, bar, baz, spam, ham

This is purely syntactic sugar for the original example:

def __init__(self, foo, bar, baz, spam, ham):
  self.foo = foo
  self.bar = bar
  self.baz = baz
  self.spam = spam
  self.ham = ham

... so if any of the attributes have setters, then they are called as 
usual. It's purely a syntactic shorthand. Any token which is not 
suitable on the RHS of the dot in a standard "obj.attr =" assignment is 
a syntax error (no "self .= 1").


The comma-separators in the example are not creating a tuple object, 
they would work at the same level in the parser as the import 
statement's comma-separated lists - in the same way that "from pkg 
import a, b, c" is the same as saying:


import pkg
a = pkg.a
b = pkg.b
c = pkg.c

... "self .= a, b, c" is the same as writing:

self.a = a
self.b = b
self.c = c

E.


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-08 Thread Erik

On 08/03/17 11:07, Steven D'Aprano wrote:

I mentioned earlier that I have code which has to track the type of list
items, and swaps to a different algorithm when the types are not all the
same.


Hmmm. Yes, I guess if the expensive version requires a lot of 
isinstance() messing or similar for each element then it could be better 
to have optimized versions for homogeneous lists of ints or strings etc.



A list.is_heterogeneous() method
could be implemented if it was necessary,


I would prefer to get the list item's type:

if mylist.__type_hint__ is float:


If you know the list is homogeneous then the item's type is 
"type(mylist[0])".


Also, having it be a function call gives an obvious place to put the 
transition from "unknown" to known state if the tri-state hint approach 
was taken. Otherwise, that would have to be hooked into the attribute 
access somehow.


That's for someone who wants to try implementing it to decide and 
propose though :)


E.


Re: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)

2017-03-07 Thread Erik

On 07/03/17 20:46, Erik wrote:

(unless it
was acceptable that once heterogeneous, a list is always considered
heterogeneous - i.e., delete always sets the hint to NULL).


Rubbish. I meant that delete would not touch the hint at all.

E.


Re: [Python-ideas] For/in/as syntax

2017-03-04 Thread Erik

Hi Brice,

On 04/03/17 08:45, Brice PARENT wrote:

* Creating a real object at runtime for each loop which needs to be
the target of a non-inner break or continue



However, I'm not sure the object should be constructed and fed for every
loop usage. It should probably only be instantiated if explicitly asked
by the coder (by the use of "as loop_name").


That's what I meant by "needs to be the target of a non-inner break or 
continue" (OK, you are proposing something more than just a referenced 
break/continue target, but we are talking about the same thing). Only 
loops which use the syntax get a loop manager object.



* For anything "funky" (my words, not yours ;)), there needs to be a
way of creating a custom loop object - what would the syntax for that
be? A callable needs to be invoked as well as the name bound (the
current suggestion just binds a name to some magical object that
appears from somewhere).



I don't really understand what this means, as I'm not aware of how those
things work in the background.


What I mean is, in the syntax "for spam in ham as eggs:" the name "eggs" 
is bound to your loop manager object. Where is the constructor call for 
this object? what class is it? That's what I meant by "magical".


If you are proposing the ability to create user-defined loop managers 
then there must be somewhere where your custom class's constructor is 
called. Otherwise how does Python know what type of object to create?


Something like (this is not a proposal, just something plucked out of 
the air to hopefully illustrate what I mean):


  for spam in ham with MyLoop() as eggs:
  eggs.continue()


I guess it would be magical in the sense it's not
the habitual way of constructing an object. But it's what we're already
used to with "as". When we use a context manager, like "with
MyPersonalStream() as my_stream:", my_stream is not an object of type
"MyPersonalStream" that has been built using the constructor, but the
return of __enter__()


But you have to spell the constructor (MyPersonalStream()) to see what 
type of object is being created (whether or not the eventual name bound 
in your context is the result of a method call on that object, the 
constructor of your custom context manager is explicitly called).



If you are saying that the syntax always implicitly creates an instance 
of a builtin class which can not be subclassed by a custom class then 
that's a bit different.




This solution, besides having been explicitly rejected by Guido himself,


I didn't realise that. Dead in the water then probably, which is fine, I 
wasn't pushing it.



brings two functionalities that are part of the proposal, but are not
its main purpose, which is having the object itself. Allowing to break
and continue from it are just things that it could bring to us, but
there are countless things it could also bring (not all of them being
good ideas, of course), like the .skip() and the properties I mentioned,


I understand that, but I concentrated on those because they were easily 
converted into syntax (and would probably be the only things I'd find 
useful - all the other stuff is mostly doable using a custom iterator, I 
think).
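
For instance, a rough sketch of the "custom iterator" route (a 
hypothetical Loop wrapper, not part of any proposal):

class Loop:
    # Wrap an iterable and expose loop-manager-ish operations on it.
    def __init__(self, iterable):
        self._it = iter(iterable)
        self.count = -1   # incremented for every item consumed
        self._skip = 0

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            value = next(self._it)
            self.count += 1
            if self._skip:
                self._skip -= 1
                continue
            return value

    def skip(self, n=1):
        self._skip += n   # silently drop the next n items

eggs = Loop(range(10))
for spam in eggs:
    if spam == 3:
        eggs.skip(2)      # 4 and 5 never reach the loop body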


I would agree that considering syntax for all of the extra things you 
mention would be a bad idea - which your loop manager object idea gets 
around.



but we could discuss about some methods like forloop.reset(),
forloop.is_first_iteration() which is just of shortcut to (forloop.count
== 0), forloop.is_last_iteration()


Also, FWIW, if I knew that in addition to the overhead of creating a 
loop manager object I was also incurring the overhead of a loop counter 
being maintained (usually, one is not required - if it is, use 
enumerate()) I would probably not use this construct and instead find 
ways of restructuring my code to avoid it using regular for loops.



I'm not beating up on you - like I said, I think the idea is interesting.

E.


Re: [Python-ideas] More classical for-loop

2017-02-18 Thread Erik

On 18/02/17 19:35, Mikhail V wrote:

You mean what my proposal would bring
technically better than e.g.:

for i,e in enumerate(Seq)

Well, nothing, and I will simply use it,
with only difference it could be:

for i,e over enumerate(Seq)

In this case only space holes will be
smoothed out, so pure optical fix.


But you also make the language's structure not make sense. For good or 
bad, English is the language that the keywords are written in so it 
makes sense for the Python language constructs to follow English constructs.


An iterable in Python (something that can be the target of a 'for' loop) 
is a collection of objects (whether they represent a sequence of 
integers, a set of unique values, a list of random things, whatever).


It is valid English to say "for each object in my collection, I will do 
the following:".


It is not valid English to say "for each object over my collection, I 
will do the following:".


In that respect, "in" is the correct keyword for Python to use. In the 
physical world, if the "collection" is some coins in your pocket, would 
you say "for each coin over my pocket, I will take it out and look at it"?


Other than that, I also echo Stephen's comments that not all iterables' 
lengths can be known in advance, and not all iterables can be indexed, 
so looping using length and indexing is a subset of what the 'for' loop 
can do today.


Why introduce new syntax for a restricted subset of what can already be 
done? Soon, someone else will propose another syntax for a different 
subset. This is why people are talking about the "burden" of learning 
these extra syntaxes. Rather than 10 different syntaxes for 10 different 
subsets, why not just learn the one syntax for the general case?


E.


Re: [Python-ideas] Python reviewed

2017-01-09 Thread Erik

On 10/01/17 01:44, Simon Lovell wrote:

Regarding the logical inconsistency of my argument, well I am saying
that I would prefer my redundancy at the end of the loop rather than the
beginning. To say that the status quo is better is to say that you
prefer your redundancy at the beginning.


It's not really that one prefers redundancy anywhere. It's more a 
question of:


a) Does the redundancy have any (however small) benefit?
b) How "expensive" is the redundancy (in this case, that equates to 
mandatory characters typed and subsequent screen noise when reading the 
code).


I don't understand how a "redundancy" of a trailing colon in any 
statement that will introduce a new level of indentation is worse than 
having to remember to type "end" when a dedent (which is zero 
characters) does that.


Trailing colon "cost": 1 * (0.n)
Block end "cost": (len("end") + len(statement_text)) * 1.0


I still struggle to see why it should be
mandatory though?


That looks like a statement, but you've ended it with a question mark. 
Are you asking if you still struggle? I can't tell. Perhaps it's just 
the correct use of punctuation that you're objecting to ;)


> One more comment I wanted to make about end blocks, is that a
> respectable editor will add them for you,

You are now asking me to write code with what you describe as a 
"respectable" editor. I use vim, which is very respectable, thank you. 
You'd like me to use "EditPlus 2" or equivalent. I struggle to see why 
that should be mandatory.


Thanks for starting an entertaining thread, though ;)

E.


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-30 Thread Erik Bray
On Fri, Dec 30, 2016 at 5:05 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 29 December 2016 at 22:12, Erik Bray <erik.m.b...@gmail.com> wrote:
>>
>> 1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the
>> implementation--that the keys are integers starting from zero)
>> 2) pthreads: Does not define an uninitialized default value for
>> keys, for reasons described at [1] under "Non-Idempotent Data Key
>> Creation".  I understand their reasoning, though I can't claim to know
>> specifically what they mean when they say that some implementations
>> would require the mutual-exclusion to be performed on
>> pthread_getspecific() as well.  I don't know that it applies here.
>
>
> That section is a little weird, as they describe two requests (one for a
> known-NULL default value, the other for implicit synchronisation of key
> creation to prevent race conditions), and only provide the justification for
> rejecting one of them (the second one).

Right, that is confusing to me as well. I'm guessing the reason for
rejecting the first is in part a way to force us to recognize the
second issue.

> If I've understood correctly, the situation they're worried about there is
> that pthread_key_create() has to be called at least once-per-process, but
> must be called before *any* call to pthread_getspecific or
> pthread_setspecific for a given key. If you do "implicit init" rather than
> requiring the use of an explicit mechanism like pthread_once (or our own
> Py_Initialize and module import locks), then you may take a small
> performance hit as either *every* thread then has to call
> pthread_key_create() to ensure the key exists before using it, or else
> pthread_getspecific() and pthread_setspecific() have to become potentially
> blocking calls. Neither of those is desirable, so it makes sense to leave
> that part of the problem to the API client.
>
> In our case, we don't want the implicit synchronisation, we just want the
> known-NULL default value so the "Is it already set?" check can be moved
> inside the library function.

Okay, we're on the same page here then.  I just wanted to make sure
there wasn't anything else I was missing in Python's case.

>> 3) windows: The return value of TlsAlloc() is a DWORD (unsigned int)
>> and [2] states that its value should be opaque.
>>
>> So in principle we can cover all cases with an opaque struct that
>> contains, as its first member, an is_initialized flag.  The tricky
>> part is how to initialize the rest of the struct (containing the
>> underlying implementation-specific key).  For 1) and 3) it doesn't
>> matter--it can just be zero.  For 2) it's trickier because there's no
>> defined constant value to initialize a pthread_key_t to.
>>
>> Per Nick's suggestion this can be worked around by relying on C99's
>> initialization semantics. Per [3] section 6.7.8, clause 21:
>>
>> """
>> If there are fewer initializers in a brace-enclosed list than there
>> are elements or members of an aggregate, or fewer characters in a
>> string literal used to initialize an array of known size than there
>> are elements in the array, the remainder of the aggregate shall be
>> initialized implicitly the same as objects that have static storage
>> duration.
>> """
>>
>> How objects with static storage are initialized is described in the
>> previous page under clause 10, but in practice it boils down to what
>> you would expect: Everything is initialized to zero, including nested
>> structs and arrays.
>>
>> So as long as we can use this feature of C99 then I think that's the
>> best approach.
>
>
>
> I checked PEP 7 to see exactly which features we've added to the approved C
> dialect, and designated initialisers are already on the list:
> https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
>
> So I believe that would allow the initializer to be declared as something
> like:
>
> #define Py_tss_NEEDS_INIT {.is_initialized = false}

Great!  One could argue about whether or not the designated
initializer syntax also incorporates omitted fields, but it would seem
strange to insist that it doesn't.
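
A throwaway program can confirm that behaviour (illustration only; 
demo_tss_t merely stands in for the proposed Py_tss_t):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool is_initialized;
    unsigned long key;   /* stand-in for the platform's TLS key type */
} demo_tss_t;

#define DEMO_NEEDS_INIT {.is_initialized = false}

int main(void)
{
    /* Members omitted from the designated initializer get the same
     * (zero) initialization as static-storage objects, per C99 6.7.8. */
    demo_tss_t t = DEMO_NEEDS_INIT;
    printf("%d %lu\n", (int)t.is_initialized, t.key);  /* prints: 0 0 */
    return 0;
}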

Have a happy new year,

Erik


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-29 Thread Erik Bray
On Wed, Dec 21, 2016 at 5:07 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 21 December 2016 at 20:01, Erik Bray <erik.m.b...@gmail.com> wrote:
>>
>> On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> > Option 2: Similar to option 1, but using a custom type alias, rather
>> > than
>> > using a C99 bool directly
>> >
>> > The closest API we have to these semantics at the moment would be
>> > PyGILState_Ensure, so the following API naming might work for option 2:
>> >
>> > Py_ensure_t
>> > Py_ENSURE_NEEDS_INIT
>> > Py_ENSURE_INITIALIZED
>> >
>> > Respectively, these would just be aliases for bool, false, and true.
>> >
>> > And then modify the proposed PyThread_tss_create and PyThread_tss_delete
>> > APIs to accept a "Py_ensure_t *init_flag" in addition to their current
>> > arguments.
>>
>> That all sounds good--between the two option 2 looks a bit more explicit.
>>
>> Though what about this?  Rather than adding another type, the original
>> proposal could be changed slightly so that Py_tss_t *is* partially
>> defined as a struct consisting of a bool, with whatever the native TLS
>> key is.   E.g.
>>
>> typedef struct {
>> bool init_flag;
>> #if defined(_POSIX_THREADS)
>> pthreat_key_t key;
>> #elif defined (NT_THREADS)
>> DWORD key;
>> /* etc... */
>> } Py_tss_t;
>>
>> Then it's just taking Masayuki's original patch, with the global bool
>> variables, and formalizing that by combining the initialized flag with
>> the key, and requiring the semantics you described above for
>> PyThread_tss_create/delete.
>>
>> For Python's purposes it seems like this might be good enough, with
>> the more general purpose pthread_once-like functionality not required.
>
>
> Aye, I also thought of that approach, but talked myself out of it since
> there's no definable default value for pthread_key_t. However, C99 partial
> initialisation may deal with that for us (by zeroing the memory without
> actually assigning a typed value to it), and if it does, I agree it would be
> better to handle the initialisation flag automatically rather than requiring
> callers to do it.

I think I understand what you're saying here...  To be clear, let me
enumerate the three currently supported cases and how they're
affected:

1) CPython's TLS: Defines -1 as an uninitialized key (by fact of the
implementation--that the keys are integers starting from zero)
2) pthreads: Does not define an uninitialized default value for
keys, for reasons described at [1] under "Non-Idempotent Data Key
Creation".  I understand their reasoning, though I can't claim to know
specifically what they mean when they say that some implementations
would require the mutual-exclusion to be performed on
pthread_getspecific() as well.  I don't know that it applies here.
3) windows: The return value of TlsAlloc() is a DWORD (unsigned int)
and [2] states that its value should be opaque.

So in principle we can cover all cases with an opaque struct that
contains, as its first member, an is_initialized flag.  The tricky
part is how to initialize the rest of the struct (containing the
underlying implementation-specific key).  For 1) and 3) it doesn't
matter--it can just be zero.  For 2) it's trickier because there's no
defined constant value to initialize a pthread_key_t to.

Per Nick's suggestion this can be worked around by relying on C99's
initialization semantics. Per [3] section 6.7.8, clause 21:

"""
If there are fewer initializers in a brace-enclosed list than there
are elements or members of an aggregate, or fewer characters in a
string literal used to initialize an array of known size than there
are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage
duration.
"""

How objects with static storage are initialized is described in the
previous page under clause 10, but in practice it boils down to what
you would expect: Everything is initialized to zero, including nested
structs and arrays.

So as long as we can use this feature of C99 then I think that's the
best approach.



[1] 
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
[2] 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).aspx
[3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-21 Thread Erik Bray
On Wed, Dec 21, 2016 at 11:01 AM, Erik Bray <erik.m.b...@gmail.com> wrote:
> That all sounds good--between the two option 2 looks a bit more explicit.
>
> Though what about this?  Rather than adding another type, the original
> proposal could be changed slightly so that Py_tss_t *is* partially
> defined as a struct consisting of a bool, with whatever the native TLS
> key is.   E.g.
>
> typedef struct {
> bool init_flag;
> #if defined(_POSIX_THREADS)
> pthreat_key_t key;

*pthread_key_t* of course, though I wonder if that was a Freudian slip :)

> #elif defined (NT_THREADS)
> DWORD key;
> /* etc... */
> } Py_tss_t;
>
> Then it's just taking Masayuki's original patch, with the global bool
> variables, and formalizing that by combining the initialized flag with
> the key, and requiring the semantics you described above for
> PyThread_tss_create/delete.
>
> For Python's purposes it seems like this might be good enough, with
> the more general purpose pthread_once-like functionality not required.

Of course, that's not to say it might not be useful for some other
purpose, but then it's outside the scope of this discussion as long as
it isn't needed for TLS key initialization.


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-21 Thread Erik Bray
On Wed, Dec 21, 2016 at 2:10 AM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 21 December 2016 at 01:35, Masayuki YAMAMOTO <ma3yuki.8mam...@gmail.com>
> wrote:
>>
>> 2016-12-20 22:30 GMT+09:00 Erik Bray <erik.m.b...@gmail.com>:
>>>
>>> This is probably an implementation detail, but ISTM that even with
>>> PyThread_call_once, it will be necessary to reset any used once_flags
>>> manually in PyOS_AfterFork, essentially for the same reason the
>>> autoTLSkey is reset there currently...
>>
>>
>> Deleting the thread keys is done in the *_Fini functions, but the
>> Py_FinalizeEx function that calls the *_Fini functions doesn't terminate the
>> CPython interpreter. Furthermore, the source comments and documentation
>> describe reinitialization after calling Py_FinalizeEx. [1] [2] That is to
>> say, there is an implicit possibility of reinitialization at the process
>> level, contrary to the name "call_once". Therefore, if the CPython
>> interpreter continues to allow reinitialization, I'd suggest renaming the
>> call_once API to avoid misleading semantics (for example, safe_init or
>> check_init).
>
>
> Ouch, I'd missed that, and I agree it's not a negligible implementation
> detail - there are definitely applications embedding CPython out there that
> rely on being able to run multiple Initialize/Finalize cycles in the same
> process and have everything "just work". It also means using the
> "PyThread_*" prefix for the initialisation tracking aspect would be
> misleading, since the life cycle details are:
>
> 1. Create the key for the first time if it has never been previously set in
> the process
> 2. Destroy and reinit if Py_Finalize gets called
> 3. Destroy and reinit if a new subprocess is forked
>
> It also means we can't use pthread_once even in the pthread TLS
> implementation, since it doesn't provide those semantics.
>
> So I see two main alternatives here.
>
> Option 1: Modify the proposed PyThread_tss_create and PyThread_tss_delete
> APIs to accept a "bool *init_flag" pointer in addition to their current
> arguments.
>
> If *init_flag is true, then PyThread_tss_create is a no-op, otherwise it
> sets the flag to true after creating the key.
> If *init_flag is false, then PyThread_tss_delete is a no-op, otherwise it
> sets the flag to false after deleting the key.
>
> Option 2: Similar to option 1, but using a custom type alias, rather than
> using a C99 bool directly
>
> The closest API we have to these semantics at the moment would be
> PyGILState_Ensure, so the following API naming might work for option 2:
>
> Py_ensure_t
> Py_ENSURE_NEEDS_INIT
> Py_ENSURE_INITIALIZED
>
> Respectively, these would just be aliases for bool, false, and true.
>
> And then modify the proposed PyThread_tss_create and PyThread_tss_delete
> APIs to accept a "Py_ensure_t *init_flag" in addition to their current
> arguments.

That all sounds good--between the two option 2 looks a bit more explicit.

Though what about this?  Rather than adding another type, the original
proposal could be changed slightly so that Py_tss_t *is* partially
defined as a struct consisting of a bool, with whatever the native TLS
key is.   E.g.

typedef struct {
bool init_flag;
#if defined(_POSIX_THREADS)
pthreat_key_t key;
#elif defined (NT_THREADS)
DWORD key;
/* etc... */
} Py_tss_t;

Then it's just taking Masayuki's original patch, with the global bool
variables, and formalizing that by combining the initialized flag with
the key, and requiring the semantics you described above for
PyThread_tss_create/delete.

For Python's purposes it seems like this might be good enough, with
the more general purpose pthread_once-like functionality not required.
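
In rough outline, the create/delete semantics being described might look 
like this (a sketch under the above assumptions, not the final API):

/* No-op if the key was already created; otherwise create it and
 * record that fact in the embedded flag. */
int PyThread_tss_create(Py_tss_t *key)
{
    if (key->init_flag)
        return 0;
#if defined(_POSIX_THREADS)
    if (pthread_key_create(&key->key, NULL) != 0)
        return -1;
#endif
    key->init_flag = true;
    return 0;
}

/* No-op if the key was never created (or was already deleted). */
void PyThread_tss_delete(Py_tss_t *key)
{
    if (!key->init_flag)
        return;
#if defined(_POSIX_THREADS)
    pthread_key_delete(key->key);
#endif
    key->init_flag = false;
}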

Best,
Erik


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-19 Thread Erik Bray
On Mon, Dec 19, 2016 at 3:45 PM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
>> On 17 December 2016 at 03:51, Antoine Pitrou <solip...@pitrou.net> wrote:
>>>
>>> On Fri, 16 Dec 2016 13:07:46 +0100
>>> Erik Bray <erik.m.b...@gmail.com> wrote:
>>> > Greetings all,
>>> >
>>> > I wanted to bring attention to an issue that's been languishing on the
>>> > bug tracker since last year, which I think would best be addressed by
>>> > changes to CPython's C-API.  The original issue is at
>>> > http://bugs.python.org/issue25658, but I have made an effort below in
>>> > a sort of proto-PEP to summarize the problem and the proposed
>>> > solution.
>>> >
>>> > I haven't written this up in the proper PEP format because I want to
>>> > see if the idea has some broader support first, and it's also not
>>> > clear to me whether C-API changes (especially to undocumented APIs)
>>> > even require their own PEP.
>>>
>>> This is a nice detailed write-up and I'm in favour of the proposal.
>>
>>
>> Likewise - we know the status quo isn't right, and the proposed change
>> addresses that. In reviewing the patch on the tracker, the one downside I've
>> found is that due to "pthread_key_t" being an opaque type with no defined
>> sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add
>> separate boolean flag variables to track whether or not the key had been
>> created. (The pthread examples at
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>> use pthread_once for a similar effect)
>>
>> I don't see any obvious way around that either, as even using a small struct
>> for native pthread TLS keys would still face the problem of how to
>> initialise the pthread_key_t field.
>
> Hmm...fair point that it's not pretty.  One way around it, albeit
> requiring more work/complexity, would be to extend this proposal to
> add a new function analogous to pthread_once--say--PyThread_call_once,
> and an associated Py_once_flag_t

Oops--fat-fingered a 'send' command before I finished.

So a workaround would be to add a PyThread_call_once function,
analogous to pthread_once.  Yet another interface one needs to
implement for a native thread implementation, but not too hard either.
For pthreads there's already an obvious analogue that can be wrapped
directly.  For other platforms that don't have a direct analogue a
(naive) implementation is still fairly simple: All you need in
Py_once_flag_t is a boolean flag with an associated mutex, and a
sentinel value analogous to PTHREAD_ONCE_INIT.
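
A naive sketch of that idea, using pthreads directly for brevity 
(hypothetical Py_* names):

#include <pthread.h>
#include <stdbool.h>

typedef struct {
    bool done;
    pthread_mutex_t mutex;
} Py_once_flag_t;

/* Sentinel value analogous to PTHREAD_ONCE_INIT. */
#define PY_ONCE_INIT {false, PTHREAD_MUTEX_INITIALIZER}

void PyThread_call_once(Py_once_flag_t *flag, void (*func)(void))
{
    pthread_mutex_lock(&flag->mutex);
    if (!flag->done) {
        func();   /* runs at most once per flag */
        flag->done = true;
    }
    pthread_mutex_unlock(&flag->mutex);
}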

Best,
Erik


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-19 Thread Erik Bray
On Mon, Dec 19, 2016 at 1:11 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 17 December 2016 at 03:51, Antoine Pitrou <solip...@pitrou.net> wrote:
>>
>> On Fri, 16 Dec 2016 13:07:46 +0100
>> Erik Bray <erik.m.b...@gmail.com> wrote:
>> > Greetings all,
>> >
>> > I wanted to bring attention to an issue that's been languishing on the
>> > bug tracker since last year, which I think would best be addressed by
>> > changes to CPython's C-API.  The original issue is at
>> > http://bugs.python.org/issue25658, but I have made an effort below in
>> > a sort of proto-PEP to summarize the problem and the proposed
>> > solution.
>> >
>> > I haven't written this up in the proper PEP format because I want to
>> > see if the idea has some broader support first, and it's also not
>> > clear to me whether C-API changes (especially to undocumented APIs)
>> > even require their own PEP.
>>
>> This is a nice detailed write-up and I'm in favour of the proposal.
>
>
> Likewise - we know the status quo isn't right, and the proposed change
> addresses that. In reviewing the patch on the tracker, the one downside I've
> found is that due to "pthread_key_t" being an opaque type with no defined
> sentinel, the consuming code in _tracemalloc.c and pystate.c needed to add
> separate boolean flag variables to track whether or not the key had been
> created. (The pthread examples at
> http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
> use pthread_once for a similar effect)
>
> I don't see any obvious way around that either, as even using a small struct
> for native pthread TLS keys would still face the problem of how to
> initialise the pthread_key_t field.

Hmm...fair point that it's not pretty.  One way around it, albeit
requiring more work/complexity, would be to extend this proposal to
add a new function analogous to pthread_once--say--PyThread_call_once,
and an associated Py_once_flag_t


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-19 Thread Erik Bray
On Sat, Dec 17, 2016 at 8:21 AM, Stephen J. Turnbull
<turnbull.stephen...@u.tsukuba.ac.jp> wrote:
> Erik Bray writes:
>
>  > Abstract
>  > 
>  >
>  > The proposal is to add a new Thread Local Storage (TLS) API to CPython
>  > which would supersede use of the existing TLS API within the CPython
>  > interpreter, while deprecating the existing API.
>
> Thank you for the analysis!

And thank *you* for the feedback!

> Question:
>
>  > Further, the old PyThread_*_key* functions will be marked as
>  > deprecated.
>
> Of course, but:
>
>  > Additionally, the pthread implementations of the old
>  > PyThread_*_key* functions will either fail or be no-ops on
>  > platforms where sizeof(pythead_t) != sizeof(int).
>
> Typo "pythead_t" in last line.

Thanks, yes, that was supposed to be pthread_key_t of course.  I think
I had a few other typos too.

> I don't understand this.  I assume that there are no such platforms
> supported at present.  I would think that when such a platform becomes
> supported, code supporting "key" functions becomes unsupportable
> without #ifdefs on that platform, at least directly.  So you should
> either (1) raise UnimplementedError, or (2) provide the API as a
> wrapper over the new API by making the integer keys indexes into a
> table of TSS'es, or some such device.  I don't understand how (3)
> "make it a no-op" can be implemented for PyThread_create_key -- return
> 0 or -1?  That would only work if there's a failure return status like
> 0 or -1, and it seems really dangerous to me since in general a lot of
> code doesn't check status even though it should.  Even for code
> checking the status, the error message will be suboptimal ("creation
> failed" vs. "unimplemented").

Masayuki already explained this downthread I think, but I could have
probably made that section more precise.  The point was that
PyThread_create_key should immediately return -1 in this case.  This
is just a subtle difference over the current situation, which is that
PyThread_create_key succeeds, but the key is corrupted by being cast
to an int, so that later calls to PyThread_set_key_value and the like
fail unexpectedly.  The point is that PyThread_create_key (and we're
only talking about the pthread implementation thereof, to be clear)
must fail immediately if it can't work correctly.

#ifdefs on the platform would not be necessary--instead, Masayuki's
patch adds a feature check in configure.ac for sizeof(int) ==
sizeof(pthread_key_t).  It should be noted that even this check is not
100% perfect, as on Linux pthread_key_t is an unsigned int, and so
technically can cause Python's signed int key to overflow, but there's
already an explicit check for that (which would be kept), and it's
also a very unlikely scenario.

> I gather from references to casting pthread_key_t to unsigned int and
> back that there's probably code that does this in ways making (2) too
> dangerous to support.  If true, perhaps that should be mentioned here.

It's not necessarily too dangerous, so much as not worth the trouble,
IMO.  Simpler to just provide, and immediately use the new API and
make the old one deprecated and explicitly not supported on those
platforms where it can't work.

Thanks,
Erik


Re: [Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-19 Thread Erik Bray
On Sun, Dec 18, 2016 at 12:10 AM, Masayuki YAMAMOTO wrote:
> 2016-12-17 18:35 GMT+09:00 Stephen J. Turnbull:
>>
>> I don't understand this.  I assume that there are no such platforms
>> supported at present.  I would think that when such a platform becomes
>> supported, code supporting "key" functions becomes unsupportable
>> without #ifdefs on that platform, at least directly.  So you should
>> either (1) raise UnimplementedError, or (2) provide the API as a
>> wrapper over the new API by making the integer keys indexes into a
>> table of TSS'es, or some such device.  I don't understand how (3)
>> "make it a no-op" can be implemented for PyThread_create_key -- return
>> 0 or -1?  That would only work if there's a failure return status like
>> 0 or -1, and it seems really dangerous to me since in general a lot of
>> code doesn't check status even though it should.  Even for code
>> checking the status, the error message will be suboptimal ("creation
>> failed" vs. "unimplemented").
>
>
> PyThread_create_key has always required the user to check the return value,
> since when key creation fails it returns -1 instead of a valid key value.
> Therefore, my patch changes PyThread_create_key to always return -1 on
> platforms where the key cannot safely be cast to int, so the current API
> never returns a valid key value on those platforms.  The advantage is that
> the function specifications don't change and there is no effect on supported
> platforms.  Hence, that is why the API doesn't raise any exception.
>
> Idea (2) could enable the current API on those specific platforms.  If it
> were simple, I'd have liked to select it.  However, bringing the current API
> to those platforms on top of native TLS means duplicating the implementation
> that manages keys, and that's ugly (the same reason given in Erik's draft,
> in the last item of Rejected Ideas).  Thus, I gave up on keeping the feature
> and decided to implement a "no-op", delegating error handling to API users.

Yep--I think it speaks to the sensibleness of that decision that I
pretty much read your mind :)


[Python-ideas] New PyThread_tss_ C-API for CPython

2016-12-16 Thread Erik Bray
with POSIX (and in
fact makes invalid assumptions about pthreads).


Rationale for Proposed Solution
===============================

The use of an opaque type (Py_tss_t) to key TLS values allows the API
to be compatible, at least in this regard, with CPython's internal TLS
implementation, as well as all present (NT and POSIX) and future
(C11?) native TLS implementations supported by CPython, as it allows
the definition of Py_tss_t to depend on the underlying implementation.

A new API must be introduced, rather than changing the function
signatures of the current API, in order to maintain backwards
compatibility.  The new API also more clearly groups together these
related functions under a single name prefix, "PyThread_tss_".  The
"tss" in the name stands for "thread-specific storage", and was
influenced by the naming and design of the "tss" API that is part of
the C11 threads API.  However, this is in no way meant to imply
compatibility with or support for the C11 threads API, or signal any
future intention of supporting C11--it's just the influence for the
naming and design.

Changing PyThread_create_key to immediately return a failure status on
systems using pthreads where sizeof(int) != sizeof(pthread_key_t) is
intended as a sanity check:  Currently, PyThread_create_key will
report initial success on such systems, but attempts to use the
returned key are likely to fail.  Although in practice this failure
occurs quickly during interpreter startup, it's better to fail
immediately at the source of failure (PyThread_create_key) rather than
sometime later when use of an invalid key is attempted.


Rejected Ideas
==============

* Do nothing: The status quo is fine because it works on Linux, and
platforms wishing to be supported by CPython should follow the
requirements of PEP-11.  As explained above, while this would be a
fair argument if CPython were being asked to make changes to
support particular quirks of a specific platform, in this case the
platforms in question are only asking to fix a quirk of CPython that
prevents it from being used to its full potential on those platforms.
The fact that the current implementation happens to work on Linux is a
happy accident, and there's no guarantee that this will never change.

* Affected platforms should just configure Python --without-threads:
This is a possible temporary workaround to the issue, but only that.
Python should not be hobbled on affected platforms despite them being
otherwise perfectly capable of running multi-threaded Python.

* Affected platforms should not define Py_HAVE_NATIVE_TLS: This is a
more acceptable alternative to the previous idea, and in fact there is
a patch to do just that [2].  However, CPython's internal TLS
implementation being "slower and clunkier" in general than native
implementations still needlessly hobbles performance on affected
platforms.  At least one other module (tracemalloc) is also broken if
Python is built without Py_HAVE_NATIVE_TLS.

* Keep the existing API, but work around the issue by providing a
mapping from pthread_key_t values to ints.  A couple of attempts were
made at this [3] [4], but this only injects needless complexity and
overhead into performance-critical code on platforms that are not
currently affected by this issue (such as Linux); a toy sketch of the
bookkeeping involved follows below.  Even if use of this workaround
were made conditional on platform compatibility, it introduces
platform-specific code to maintain, and it still has the problem of
the previous rejected ideas of needlessly hobbling performance on
affected platforms.
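
To make that concrete, here is a toy Python model of the rejected
key-table workaround (the names are hypothetical; this only sketches
the bookkeeping involved, not the actual C implementation, and locking
is omitted):

    import itertools

    _counter = itertools.count()
    _key_table = {}  # small int key -> opaque native key

    def create_key(native_key):
        # Every key creation must maintain an extra mapping...
        key = next(_counter)
        _key_table[key] = native_key
        return key

    def lookup_key(key):
        # ...and every TLS access pays an extra lookup/indirection.
        return _key_table[key]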


Implementation
==============

An initial version of a patch [5] is available on the bug tracker for
this issue.  The patch was proposed and written by Masayuki Yamamoto,
who should be considered a co-author of this proto-PEP, though I have
not consulted directly with him in writing this.  If he's reading, he
should chime in in case I've misrepresented anything.


If you've made it this far, thanks for reading and thank you for your
consideration,

Erik

[1] https://bugs.python.org/msg116292
[2] http://bugs.python.org/file45548/configure-pthread_key_t.patch
[3] http://bugs.python.org/file44269/issue25658-1.patch
[4] http://bugs.python.org/file44303/key-constant-time.diff
[5] http://bugs.python.org/file45763/pythread-tss.patch


Re: [Python-ideas] if-statement in for-loop

2016-10-03 Thread Erik

Hi,

On 11/09/16 10:36, Dominik Gresch wrote:

So I asked myself if a syntax as follows would be possible:

for i in range(10) if i != 5:
    body


I've read the thread and I understand the general issues with making the 
condition part of the expression.


However, what if this wasn't part of changing the expression syntax but 
changing the declarative syntax instead to remove the need for a newline 
and indent after the colon? I'm fairly sure this will have been 
suggested and shot down in the past, but I couldn't find any obvious 
references so I'll say it (again?).


The expression suggested could be spelled:

for i in range(10): if i != 5:
    body

So, if a colon followed by another suite is equivalent to the same
construct but without the INDENT (and the corresponding DEDENT then
unwinds up to the point of the first keyword), then we get something
that's pretty much as succinct as Dominik suggested.


Of course, we then might get:

for i in myweirdobject: if i != 5: while foobar(i) > 10: while frob(i+1) < 99:
    body

... which is hideous. But is it actually _likely_?
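
For comparison, a minimal sketch of how the filtered loop is already
spellable in current Python (print(i) standing in for the body):

    # Skip the unwanted value explicitly...
    for i in range(10):
        if i == 5:
            continue
        print(i)

    # ...or filter with a generator expression.
    for i in (x for x in range(10) if x != 5):
        print(i)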

E.


Re: [Python-ideas] if-statement in for-loop

2016-09-27 Thread Erik Bray
On Tue, Sep 27, 2016 at 5:33 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 28 September 2016 at 00:55, Erik Bray <erik.m.b...@gmail.com> wrote:
>> On Sun, Sep 11, 2016 at 12:28 PM, Bernardo Sulzbach
>> <mafagafogiga...@gmail.com> wrote:
>>> On 09/11/2016 06:36 AM, Dominik Gresch wrote:
>>>>
>>>> So I asked myself if a syntax as follows would be possible:
>>>>
>>>> for i in range(10) if i != 5:
>>>>     body
>>>>
>>>> Personally, I find this extremely intuitive since this kind of
>>>> if-statement is already present in list comprehensions.
>>>>
>>>> What is your opinion on this? Sorry if this has been discussed before --
>>>> I didn't find anything in the archives.
>>>>
>>>
>>> I find it interesting.
>>>
>>> I think that this will likely take up too many columns in more convoluted
>>> loops such as
>>>
>>> for element in collection if is_pretty_enough(element) and ...:
>>>     ...
>>>
>>> However, this "problem" is already faced by list comprehensions, so it is
>>> not a strong argument against your idea.
>>
>> Sorry to re-raise this thread--I'm inclined to agree that the case
>> doesn't really warrant new syntax.  I just wanted to add that I think
>> the very fact that this syntax is supported by list comprehensions is
>> an argument *in its favor*.
>>
>> I could easily see a Python newbie being confused that they can write
>> "for x in y if z" inside a list comprehension, but not in a bare
>> for-statement.  Sure they'd learn quickly enough that the filtering
>> syntax is unique to list comprehensions.  But to anyone who doesn't
>> know the historical progression of the Python language that would seem
>> highly arbitrary and incongruous I would think.
>>
>> Just $0.02 USD from a pedagogical perspective.
>
> This has come up before, and it's considered a teaching moment
> regarding how the comprehension syntax actually works: it's an
> *arbitrarily deep* nested chain of if statements and for statements.
>
> That is:
>
>   [f(x,y,z) for x in seq1 if p1(x) for y in seq2 if p2(y) for z in seq3 if p3(z)]
>
> can be translated mechanically to the equivalent nested statements
> (with the only difference being that the loop variables leak due to the
> missing implicit scope):
>
> result = []
> for x in seq1:
>     if p1(x):
>         for y in seq2:
>             if p2(y):
>                 for z in seq3:
>                     if p3(z):
>                         result.append(f(x, y, z))
>
> So while the *most common* cases are a single for loop (map
> equivalent), or a single for loop and a single if statement (filter
> equivalent), they're not the only forms folks may encounter in the
> wild.

Thanks for pointing this out, Nick.  Then, following my own logic, it
would be desirable to also allow the nested for loop syntax of list
comprehensions outside them as well.  That's a slippery slope to
incomprehensibility (they're bad enough in list comprehensions, though
occasionally useful).

This is a helpful way to think about list comprehensions though--I'll
remember it next time I teach them.
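
As a quick sanity check of that mechanical translation, a minimal
sketch with made-up sequences and predicates:

    seq1, seq2 = range(4), "ab"
    p1 = lambda x: x != 2

    comp = [(x, y) for x in seq1 if p1(x) for y in seq2]

    result = []
    for x in seq1:
        if p1(x):
            for y in seq2:
                result.append((x, y))

    assert comp == result  # same elements, in the same order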

Thanks,
Erik


Re: [Python-ideas] if-statement in for-loop

2016-09-27 Thread Erik Bray
On Sun, Sep 11, 2016 at 12:28 PM, Bernardo Sulzbach
<mafagafogiga...@gmail.com> wrote:
> On 09/11/2016 06:36 AM, Dominik Gresch wrote:
>>
>> So I asked myself if a syntax as follows would be possible:
>>
>> for i in range(10) if i != 5:
>>     body
>>
>> Personally, I find this extremely intuitive since this kind of
>> if-statement is already present in list comprehensions.
>>
>> What is your opinion on this? Sorry if this has been discussed before --
>> I didn't find anything in the archives.
>>
>
> I find it interesting.
>
> I think that this will likely take up too many columns in more convoluted
> loops such as
>
> for element in collection if is_pretty_enough(element) and ...:
>     ...
>
> However, this "problem" is already faced by list comprehensions, so it is
> not a strong argument against your idea.

Sorry to re-raise this thread--I'm inclined to agree that the case
doesn't really warrant new syntax.  I just wanted to add that I think
the very fact that this syntax is supported by list comprehensions is
an argument *in its favor*.

I could easily see a Python newbie being confused that they can write
"for x in y if z" inside a list comprehension, but not in a bare
for-statement.  Sure they'd learn quickly enough that the filtering
syntax is unique to list comprehensions.  But to anyone who doesn't
know the historical progression of the Python language that would seem
highly arbitrary and incongruous I would think.

Just $0.02 USD from a pedagogical perspective.

Erik


Re: [Python-ideas] real numbers with SI scale factors

2016-08-31 Thread Erik Bray
On Tue, Aug 30, 2016 at 5:48 AM, Ken Kundert
<python-id...@shalmirane.com> wrote:
> Erik,
> One aspect of astropy.units that differs significantly from what I am
> proposing is that with astropy.units a user would explicitly specify the scale
> factor along with the units, and that scale factor would not change even if
> the value became very large or very small. For example:
>
> >>> from astropy import units as u
> >>> d_andromeda = 7.8e5 * u.parsec
> >>> print(d_andromeda)
> 780000.0 pc
>
> >>> d_sun = 93e6*u.imperial.mile
> >>> print(d_sun.to(u.parsec))
> 4.850441695494146e-06 pc
>
> >>> print(d_andromeda.to(u.kpc))
> 780.0 kpc
>
> >>> print(d_sun.to(u.kpc))
> 4.850441695494146e-09 kpc
>
> I can see where this can be helpful at times, but it kind of goes against the
> spirit of SI scale factors, where you are generally expected to 'normalize' the
> scale factor (use the scale factor that results in the digits presented before
> the decimal point falling between 1 and 999). So I would expect
>
> d_andromeda = 780 kpc
> d_sun = 4.8504 upc
>
> Is the normalization available in astropy.units and I just did not find it?
> Is there some reason not to provide the normalization?
>
> It seems to me that pre-specifying the scale factor might be preferred if one
> is generating data for a table and the magnitudes of all the values are known
> in advance to within 2-3 orders of magnitude.
>
> It also seems to me that if these assumptions were not true, then normalizing
> the scale factors would generally be preferred.
>
> Do you believe that?

Hi Ken,

I see what you're getting at, and that's a good idea.  There's also
nothing in the current implementation preventing it, and I think I'll
even suggest this to Astropy (with proper attribution)!  I think there
are reasons not to always do this, but it's a nice option to have.

The point being that nothing about this particular feature requires
special support from the language, unless I'm missing something
obvious.  And given that Astropy (or any other units library) is
third-party, chances are a feature like this will land in place a lot
faster than it would have any chance of showing up in Python :)
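
To make the idea concrete, here is a minimal pure-Python sketch of
such a normalization (the helper name and details are illustrative
only, not any existing astropy API):

    import math

    _PREFIXES = {-24: 'y', -21: 'z', -18: 'a', -15: 'f', -12: 'p',
                 -9: 'n', -6: 'u', -3: 'm', 0: '', 3: 'k', 6: 'M',
                 9: 'G', 12: 'T', 15: 'P', 18: 'E', 21: 'Z', 24: 'Y'}

    def normalize_si(value, unit):
        # Pick the power-of-1000 scale factor that leaves 1-3 digits
        # before the decimal point, then attach the matching SI prefix.
        if value == 0:
            return '0 ' + unit
        exp3 = 3 * int(math.log10(abs(value)) // 3)
        exp3 = max(-24, min(24, exp3))  # clamp to the defined prefixes
        return '{:g} {}{}'.format(value / 10**exp3, _PREFIXES[exp3], unit)

    print(normalize_si(7.8e5, 'pc'))                  # 780 kpc
    print(normalize_si(4.850441695494146e-06, 'pc'))  # 4.85044 upc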

Best,
Erik

> On Mon, Aug 29, 2016 at 03:05:50PM +0200, Erik Bray wrote:
>> Astropy also has a very powerful units package--originally derived
>> from pyunit I think but long since diverged and grown:
>>
>> http://docs.astropy.org/en/stable/units/index.html
>>
>> It was originally developed especially for astronomy/astrophysics use
>> and has some pre-defined units that many other packages don't have, as
>> well as support for logarithmic units like decibel and optional (and
>> customizable) unit equivalences (e.g. frequency/wavelength or
>> flux/power).
>>
>> That said, its power extends beyond astronomy and I heard through last
>> week's EuroScipy that even some biology people have been using it.
>> There's been some (informal) talk about splitting it out from Astropy
>> into a stand-alone package.  This is tricky since almost everything in
>> Astropy has been built around it (dimensional calculations are always
>> used where possible), but not impossible.
>>
>> One of the other big advantages of astropy.units is the Quantity class
>> representing scale+dimension values.  This is deeply integrated into
>> Numpy so that units can be attached to Numpy arrays, and all Numpy
>> ufuncs can operate on them in a dimensionally meaningful way.  The
>> needs for this have driven a number of recent features in Numpy.  This
>> is work that, unfortunately, could never be integrated into the Python
>> stdlib.


Re: [Python-ideas] A proposal to rename the term "duck typing"

2016-08-29 Thread Erik Bray
On Sun, Aug 28, 2016 at 7:41 PM, Bruce Leban <br...@leban.us> wrote:
>
>
> On Sunday, August 28, 2016, ROGER GRAYDON CHRISTMAN <d...@psu.edu> wrote:
>>
>>
>> We have a term in our lexicon "duck typing" that traces its origins, in
>> part to a quote along the lines of
>> "If it walks like a duck, and talks like a duck, ..."
>>
>> ...
>>
>> In that case, it would be far more appropriate for us to call this sort
>> of type analysis "witch typing"
>
>
> I believe the duck is out of the bag on this one. First the "duck test" that
> you quote above is over 100 years old.
> https://en.m.wikipedia.org/wiki/Duck_test So that's entrenched.
>
> Second this isn't a Python-only term anymore and language is notoriously
> hard to change prescriptively.
>
> Third I think the duck test is more appropriate than the witch test which
> involves the testers faking the results.

Agreed.

It's also fairly problematic given that you're deriving the term from
a sketch about witch hunts.  While the Monty Python sketch is
hilarious, and it's the ignorant mob that's the butt of the joke
rather than the "witch", this joke doesn't necessarily play well
universally, especially given that there are places today where women
are being killed for being "witches".

Best,
Erik


Re: [Python-ideas] real numbers with SI scale factors

2016-08-29 Thread Erik Bray
On Mon, Aug 29, 2016 at 3:05 PM, Erik Bray <erik.m.b...@gmail.com> wrote:
> On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert
> <python-id...@shalmirane.com> wrote:
>> On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote:
>>> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote:
>>> > On 2016-08-28 18:44, Ken Kundert wrote:
>>> > >When working with a general purpose programming language, the above
>>> > >numbers become:
>>> > >
>>> > > 780kpc -> 7.8e+05
>>> [...]
>>>
>>> For the record, I don't know what kpc might mean. "kilo pico speed of
>>> light"? So I looked it up using units, and it is kilo-parsecs. That
>>> demonstrates that unless your audience is intimately familiar with the
>>> domain you are working with, adding units (especially units that aren't
>>> actually used for anything) adds confusion.
>>>
>>> Python is not a specialist application targeted at a single domain. It
>>> is a general purpose programming language where you can expect a lot of
>>> cross-domain people (e.g. a system administrator asked to hack on a
>>> script in a domain they know nothing about).
>>
>> I talked to an astrophysicist about your comments, and what she said was:
>> 1. She would love it if Python had built-in support for real numbers with SI
>>    scale factors.
>> 2. I told her about my library for reading and writing numbers with SI scale
>>    factors, and she was much less enthusiastic because using it would require
>>    convincing the rest of the group, which would be too much effort.
>> 3. She was amused by the "kilo pico speed of light" comment, but she was
>>    adamant that the fact that you, or some system administrator, does not
>>    understand what kpc means has absolutely no effect on her desire to use SI
>>    scale factors.  Her comment: I did not write it for him.
>> 4. She pointed out that the software she writes and uses is intended either
>>    for herself or other astrophysicists.  No system administrators involved.
>
> Astropy also has a very powerful units package--originally derived
> from pyunit I think but long since diverged and grown:
>
> http://docs.astropy.org/en/stable/units/index.html
>
> It was originally developed especially for astronomy/astrophysics use
> and has some pre-defined units that many other packages don't have, as
> well as support for logarithmic units like decibel and optional (and
> customizable) unit equivalences (e.g. frequency/wavelength or
> flux/power).
>
> That said, its power extends beyond astronomy and I heard through last
> week's EuroScipy that even some biology people have been using it.
> There's been some (informal) talk about splitting it out from Astropy
> into a stand-alone package.  This is tricky since almost everything in
> Astropy has been built around it (dimensional calculations are always
> used where possible), but not impossible.
>
> One of the other big advantages of astropy.units is the Quantity class
> representing scale+dimension values.  This is deeply integrated into
> Numpy so that units can be attached to Numpy arrays, and all Numpy
> ufuncs can operate on them in a dimensionally meaningful way.  The
> needs for this have driven a number of recent features in Numpy.  This
> is work that, unfortunately, could never be integrated into the Python
> stdlib.

I'll also add that syntactic support for units has rarely been an
issue in Astropy.  The existing algebraic rules for units work fine
with Python's existing order of operations.  It can be *nice* to be
able to write "1m" instead of "1 * m", but ultimately it doesn't add
much for clarity (and if really desired it could be handled with a
preparser--something I've considered adding for Astropy sources, via
codecs).
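
For instance, a minimal sketch of how units already compose under
ordinary operator precedence in astropy.units (assumes astropy is
installed; the numbers are made up):

    from astropy import units as u

    d = 780 * u.kpc               # multiplication attaches the unit
    t = 2.5 * u.Myr               # prefixed units come predefined
    v = (d / t).to(u.km / u.s)    # unit algebra follows normal precedence
    print(v)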

Best,
Erik


Re: [Python-ideas] real numbers with SI scale factors

2016-08-29 Thread Erik Bray
On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert wrote:
> On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote:
>> On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote:
>> > On 2016-08-28 18:44, Ken Kundert wrote:
>> > >When working with a general purpose programming language, the above
>> > >numbers become:
>> > >
>> > > 780kpc -> 7.8e+05
>> [...]
>>
>> For the record, I don't know what kpc might mean. "kilo pico speed of
>> light"? So I looked it up using units, and it is kilo-parsecs. That
>> demonstrates that unless your audience is intimately familiar with the
>> domain you are working with, adding units (especially units that aren't
>> actually used for anything) adds confusion.
>>
>> Python is not a specialist application targeted at a single domain. It
>> is a general purpose programming language where you can expect a lot of
>> cross-domain people (e.g. a system administrator asked to hack on a
>> script in a domain they know nothing about).
>
> I talked to an astrophysicist about your comments, and what she said was:
> 1. She would love it if Python had built-in support for real numbers with SI
>    scale factors.
> 2. I told her about my library for reading and writing numbers with SI scale
>    factors, and she was much less enthusiastic because using it would require
>    convincing the rest of the group, which would be too much effort.
> 3. She was amused by the "kilo pico speed of light" comment, but she was
>    adamant that the fact that you, or some system administrator, does not
>    understand what kpc means has absolutely no effect on her desire to use SI
>    scale factors.  Her comment: I did not write it for him.
> 4. She pointed out that the software she writes and uses is intended either
>    for herself or other astrophysicists.  No system administrators involved.

Astropy also has a very powerful units package--originally derived
from pyunit I think but long since diverged and grown:

http://docs.astropy.org/en/stable/units/index.html

It was originally developed especially for astronomy/astrophysics use
and has some pre-defined units that many other packages don't have, as
well as support for logarithmic units like decibel and optional (and
customizable) unit equivalences (e.g. frequency/wavelength or
flux/power).

That said, its power extends beyond astronomy and I heard through last
week's EuroScipy that even some biology people have been using it.
There's been some (informal) talk about splitting it out from Astropy
into a stand-alone package.  This is tricky since almost everything in
Astropy has been built around it (dimensional calculations are always
used where possible), but not impossible.

One of the other big advantages of astropy.units is the Quantity class
representing scale+dimension values.  This is deeply integrated into
Numpy so that units can be attached to Numpy arrays, and all Numpy
ufuncs can operate on them in a dimensionally meaningful way.  The
needs for this have driven a number of recent features in Numpy.  This
is work that, unfortunately, could never be integrated into the Python
stdlib.