Re: [Python-ideas] Syntax for allowing extra keys when unpacking a dict as keyword arguments

2019-04-12 Thread Nathaniel Smith
I don't think it's possible to make this work reliably. In particular, it's
an important feature of python that you can make wrappers that pass through
arguments and are equivalent to the original function:

def original(a=0):
    ...

def wrapper(*args, **kwargs):
    return original(*args, **kwargs)

Right now these can be called in exactly the same ways. But with the
proposal they would become different:

# ok
original(***{"a": 1, "b": 2})
# raises TypeError
wrapper(***{"a": 1, "b": 2})

The problem is that the extra star gets lost when passing through the
wrapper.

In this case you might be able to fix this by using functools.wraps to fix
up the signature introspection metadata, but that doesn't work in more
complex cases (e.g. when the wrapper adds/removes some args while passing
through the rest). In Python, signature introspection is a best-effort
thing, and IME not super reliable.
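
To make that concrete, here's an untested sketch of the kind of
signature-based filtering people write today (call_with_extra_keys is a
hypothetical helper, not something that exists); note how the wrapper
defeats it:

import inspect

def call_with_extra_keys(func, kwargs):
    # Drop any keys that func's signature doesn't accept, unless it takes
    # **kwargs, in which case everything gets passed through.
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs)
    allowed = {name for name, p in params.items()
               if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                             inspect.Parameter.KEYWORD_ONLY)}
    return func(**{k: v for k, v in kwargs.items() if k in allowed})

def original(a=0):
    return a

def wrapper(*args, **kwargs):
    return original(*args, **kwargs)

call_with_extra_keys(original, {"a": 1, "b": 2})  # fine, 'b' is dropped
call_with_extra_keys(wrapper, {"a": 1, "b": 2})   # TypeError: wrapper's
                                                  # signature advertises
                                                  # **kwargs, so 'b' slips
                                                  # through to original()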

-n

On Fri, Apr 12, 2019, 08:11 Viktor Roytman  wrote:

> Currently, unpacking a dict in order to pass its items as keyword
> arguments to a function will fail if there are keys present in the dict
> that are invalid keyword arguments:
>
> >>> def func(*, a):
> ...     pass
> ...
> >>> func(**{'a': 1, 'b': 2})
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: func() got an unexpected keyword argument 'b'
>
> The standard approach I have encountered in this scenario is to pass in
> the keyword arguments explicitly like so
>
> func(
>     a=kwargs_dict["a"],
>     b=kwargs_dict["b"],
>     c=kwargs_dict["c"],
> )
>
> But this grows more cumbersome as the number of keyword arguments grows.
>
> There are a number of other workarounds, such as using a dict
> comprehension to select only the required keys, but I think it would be
> more convenient to have this be a feature of the language. I don't know
> what a nice syntax for this would be, or even how feasible it is.
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Sorted lists

2019-04-08 Thread Nathaniel Smith
On Mon, Apr 8, 2019, 02:09 Steven D'Aprano  wrote:

> On Sun, Apr 07, 2019 at 08:26:24PM -0700, Nathaniel Smith wrote:
> > On Sun, Apr 7, 2019 at 7:37 PM Steven D'Aprano 
> wrote:
> > > There are quite a few important algorithms which require lists to be
> > > sorted. For example, the bisect module, and for statistics median and
> > > other quantiles.
> >
> > But this flag doesn't affect those modules, right? 'bisect' already
> > requires the user to ensure that the list is sorted appropriately
>
> Naturally the bisect and statistics modules (or any other that requires
> sorting) won't change to inspect this flag by magic, the code will
> require modification.


Right, by "doesn't affect" I meant "cannot get any benefit, even if their
code is modified".

> Possibly the maintainer of bisect may decide that its not worth the
> change. But for the statistics module, I would certainly change the
> implementation of median() to look something vaguely like this:
>
> # was
> data = sorted(data)  # may be expensive if data is large
>
> # may become
> if not (isinstance(data, list) and data.__issorted__):
>     data = sorted(data)


>
> statistics is soon to grow a quantiles() function, but the thing with
> quantiles is often you want to get a bunch of them:
>
> # This only makes sense if data is a sequence (list)
> # not an iterator.
> quartiles = quantiles(data, n=4)
> quintiles = quantiles(data, n=5)
> deciles = quantiles(data, n=10)
> percentiles = quantiles(data, n=100)
>

If only we had some kind of API that could compute multiple quantiles at
the same time...


>
> That's four calls to sorted(). The caller could drop that down to one:
>
> data.sort()
> quartiles = ... etc
>
>
> Now before anyone else mentions it, we could give the function a
> "dont_sort" argument, or "already_sorted" if you prefer, but I dislike
> that kind of constant-bool parameter and would prefer to avoid it.
>
>
> > and this bit:
> >
> > > The flag doesn't guarantee that the list is sorted the way you want
> > > (e.g. biggest to smallest, by some key, etc) only that it has been
> > > sorted. Its up to the user to ensure they sort it the right way:
> >
> > ...seems to mean that the 'statistics' module can't use this flag either.
>
> "Consenting adults" applies. If you want to pass an unsorted list to the
> functions, but pretend that its sorted, then on your own head be it.
> There's no real difference between these two hypothetical scenarios:
>
> data = [1, 4, 2, 0, 5, 3]
> garbage = median(data, already_sorted=True)
>
> versus:
>
> data = [1, 4, 2, 0, 5, 3]
> data.__issorted__ = True
> garbage = median(data)
>
>
> I'm perfectly comfortable with allowing the caller to lie if they want.
> Its their own foot they're shooting.
>

An already_sorted=True argument would be an explicit opt-in, and consenting
adults would apply. But your message was very explicit that __issorted__ can
also be set implicitly. For example, this would give garbage results:

# implicitly sets the sorted flag
data.sort()
# preserves the flag, because hey it's sorted by *some* key
data.reverse()
statistics.median(data)

You can't use this in statistics.median because it would break
compatibility. Also, isn't the whole point of 'statistics' to be the
simple, reliable module for folks who aren't that worried about speed? This
seems like a massive footgun.
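
To spell that out with numbers (a sketch; __issorted__ is of course
hypothetical, and so is this flag-trusting median_low):

import statistics

data = [1, 2, 3, 4]
data.sort()       # under the proposal: implicitly sets data.__issorted__
data.reverse()    # data == [4, 3, 2, 1], and per the above the flag survives

def median_low(seq):
    if not getattr(seq, "__issorted__", False):   # hypothetical attribute
        seq = sorted(seq)
    return seq[(len(seq) - 1) // 2]               # assumes ascending order

# Today (no flag exists) this prints 2 2, the correct low median. If the
# flag were real and survived reverse(), the first call would skip the sort
# and silently return data[1] == 3 instead.
print(median_low(data), statistics.median_low(data))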


> (I wouldn't be so blasé about this if it were a function written in C
> that could segfault if the list wasn't sorted.)
>

Silently giving the wrong answer is way worse than a segfault.

> > It doesn't seem very likely to me that the savings from this flag
> > could outweigh the extra overhead it introduces, just because list
> > operations are *so* common in Python. If you want to push this
> > forward, the thing I'd most like to see is some kind of measurements
> > to demonstrate that average programs will benefit.
>
> I'm not sure that the average program uses sort *at all*, so a better
> set of questions are:
>
> - how much will this hurt the average program?
>   (my gut feeling is "very little", but we'd need benchmarks to know)
>
> - are there any other use-cases for sorted data that could benefit?
>
> - how much will they benefit?
>
> Let's say, for the sake of the argument that this proposal makes the
> average program 0.01% slower, but helps sorting-heavy programs be 2%
> faster when dealing with large lists, then I think that might be a win.
>

Obviously these are made up numbers, but if they were real then for it to
be a net win you would still need at least 1 in 200 programs to be "sorting
heavy" in a way that could benefit from this flag, and I don't believe
that's true.

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Sorted lists

2019-04-07 Thread Nathaniel Smith
On Sun, Apr 7, 2019 at 7:37 PM Steven D'Aprano  wrote:
> There are quite a few important algorithms which require lists to be
> sorted. For example, the bisect module, and for statistics median and
> other quantiles.

But this flag doesn't affect those modules, right? 'bisect' already
requires the user to ensure that the list is sorted appropriately, and
this bit:

> The flag doesn't guarantee that the list is sorted the way you want
> (e.g. biggest to smallest, by some key, etc) only that it has been
> sorted. Its up to the user to ensure they sort it the right way:

...seems to mean that the 'statistics' module can't use this flag either.

It doesn't seem very likely to me that the savings from this flag
could outweigh the extra overhead it introduces, just because list
operations are *so* common in Python. If you want to push this
forward, the thing I'd most like to see is some kind of measurements
to demonstrate that average programs will benefit.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add output() helper function to subprocess module

2019-04-04 Thread Nathaniel Smith
On Thu, Apr 4, 2019 at 1:59 AM Greg Ewing  wrote:
>
> Nathaniel Smith wrote:
> > On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing  
> > wrote:
> >>output(args) --> (status, output)
> >
> > Isn't this already available as: run(args, stdout=PIPE)?
>
> Yes, but you need to do more than that to get the output
> as a string. This is the relevant part of the implementation
> of check_output():
>
>  process = Popen(stdout=PIPE, *popenargs, **kwargs)
>  output, unused_err = process.communicate()
>  retcode = process.poll()

>>> from subprocess import run, PIPE
>>> p = run(["grep", "njs", "/etc/passwd"], stdout=PIPE)
>>> p.returncode
0
>>> p.stdout
b'njs:x:1000:1000:Nathaniel J. Smith,,,:/home/njs:/usr/bin/zsh\n'

I do think it's a bit weird that you write 'stdout=PIPE' to mean
'please capture stdout' – it's leaking an internal implementation
detail across an abstraction boundary. But it's documented, and run()
allows any combination of check=True/False, capturing stdout or not,
and capturing stderr or not, without having to invent 8 different
functions.
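
For what it's worth, the proposed helper is a very thin wrapper over run()
today; a sketch (the name and the (status, output) return shape follow
Greg's suggestion, they're not in the stdlib):

from subprocess import run, PIPE

def output(*popenargs, **kwargs):
    result = run(*popenargs, stdout=PIPE, **kwargs)
    return result.returncode, result.stdout

status, out = output(["grep", "njs", "/etc/passwd"])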

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add output() helper function to subprocess module

2019-04-04 Thread Nathaniel Smith
On Thu, Apr 4, 2019 at 12:48 AM Greg Ewing  wrote:
>
> The check_output() function of the subprocess module raises an
> exception if the process returns a non-zero exit status. This is
> inconvenient for commands such as grep that use the return
> status to indicate something other than success or failure.
>
> The check_call() function has a companion call(), but here is
> currently no non-checking companion for check_call(). How
> about adding one with a signature such as
>
> output(args) --> (status, output)

Isn't this already available as: run(args, stdout=PIPE)? Is the objection
to the extra typing, or...?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Built-in parsing library

2019-04-01 Thread Nathaniel Smith
On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen  wrote:
> Installing a package out of stdlib does not solve the problem that motivated 
> this thread. The libraries included in the stdlib can't use those parsers.

Can you be more specific about exactly which code in the stdlib you
think should be rewritten to use a parsing library?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] New Project to Capture summaries from this

2019-03-28 Thread Nathaniel Smith
On Thu, Mar 28, 2019 at 4:52 PM Steven D'Aprano  wrote:
> On Thu, Mar 28, 2019 at 03:25:34PM -, Richard Whitehead wrote:
> > Chris,
> >
> > As a new member to this list, I can tell you that searching for relevant old
> > content was effectively impossible, so I'm all for some way of doing that.
>
> "Effectively impossible" is a gross exaggeration.
>
> The old mailman built-in search functionality is not fantastic, but it's
> not useless either, and more importantly, Google does a great job of
> indexing the archives.

It really doesn't. I often need to look up specific emails in the
mail.python.org archives, that I remember seeing or writing, in order
to link to them. IME, Google never works for this. For whatever
reason, most pages on mail.python.org are not included in Google's
index.

For example, here's a post of yours from a few weeks ago:
https://mail.python.org/pipermail/python-ideas/2019-March/055911.html

AFAICT, it is not possible to find that post with Google. For example,
doing a site-restricted search with an exact quote from your email
says that there are no pages that match:
https://www.google.com/search?q="Is+that+common+enough+that+it+needs+to+be+built-in+to+dict+itself%3F"+site%3Amail.python.org

(I also just tried a few variants of that search on Bing and
DuckDuckGo, and they both failed as well.)

The only reliable way that I know of to find emails on mail.python.org
is to (a) find the email in my MUA's archives, (b) note the author and the
date it was sent, (c) navigate through the mailman archives 'date'
index to narrow things down, and then click around manually until I
find the post I'm looking for.

I don't think this proves we should switch to using Github issues or
something instead. But I do think we should listen when people say
that they're struggling with something, instead of dismissing their
concerns.

-n

--
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Simpler thread synchronization using "Sticky Condition"

2019-03-26 Thread Nathaniel Smith
On Tue, Mar 26, 2019, 09:50 Richard Whitehead 
wrote:

> Nathaniel,
>
> Thanks very much for taking the time to comment.
>
> Clearing the event after waiting for it will introduce a race condition:
> if
> the sender has gone around its loop again and set the semaphore after we
> have woken but before we've cleared it.


Sounds fine to me. Why is that a problem? Can you write down an example of
how two threads could be interleaved to produce incorrect results?

As you said, this stuff is tricky!
> The only safe way is to make the wait-and-clear atomic, which can be done
> with a lock; and this comes essentially back to what I'm proposing.
>
> I realise this is not a fundamental new primitive - if it was, I wouldn't
> be
> able to build it in pure Python - but I've found it extremely useful in
> our
> generic threading and processing library.
>
> You're right about what you say regarding queues; I didn't want to go into
> the full details of the multi-threading and multi-processing situation at
> hand, but I will say that we have a pipeline of tasks that can run as
> either
> threads or processes, and we want to make it easy to construct this
> pipeline, "wiring" it as necessary; combining command queues with data
> queues just gets a real mess.
>

But you're effectively implementing a multi-producer single-consumer Queue
anyway, so without any details it's hard to guess why using a Queue would
be messier. I know you don't want to get into too many details, but if the
whole motivation for your proposal is based on some details then it's
usually a good idea to explain them :-).

-n

>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Simpler thread synchronization using "Sticky Condition"

2019-03-26 Thread Nathaniel Smith
These kinds of low-level synchronization primitives are notoriously
tricky, yeah, and I'm all in favor of having better higher-level
tools. But I'm not sure that AutoResetEvent adds enough to be worth
it.

AFAICT, you can get this behavior with an Event just fine – using your
pseudocode:

def sender():
    while alive():
        wait_for_my_data_from_hardware()
        send_data_to_receiver()
        auto_event.set()

def receiver():
    while alive():
        auto_event.wait()
        auto_event.clear()   # <-- this line added
        receive_all_data_from_sender()
        process_data()

It's true that if we use a regular Event then the .clear() doesn't
happen atomically with the wakeup, but that doesn't matter. If we call
auto_event.set() and then have new data arrive, then there are two
cases:

1) the new data arrives early enough to be seen by the current call to
receive_all_data_from_sender(): this is fine, the new data will be
processed in this call
2) the new data arrives too late to be seen by the current call to
receive_all_data_from_sender(): that means the new data arrived after
the call to receive_all_data_from_sender() started, which means it
arrived after auto_event.clear(), which means that the call to
auto_event.set() will successfully re-arm the event and another call
to receive_all_data_from_sender() will happen immediately

That said, this is still very tricky. It requires careful analysis,
and it's not very general (for example, if we want to support multiple
receivers than we need to throw out the whole approach and do
something entirely different). In Trio we've actually discussed
removing Event.clear(), since it's so difficult to use correctly:
https://github.com/python-trio/trio/issues/637

You said your original problem is that you have multiple event
sources, and the receiver needs to listen to all of them. And based on
your approach, I guess you only have one receiver, and that it's OK to
couple all the event sources directly to this receiver (i.e., you're
OK with passing them all a Condition object to use).

Under these circumstances, wouldn't it make more sense to use a single
Queue, pass it to all the sources, and have them each do
queue.put((source_id, event))? That's simple to implement, hard to
mess up, and can easily be extended to multiple receivers.
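
Concretely, I mean something like this toy sketch (the names and the fixed
event counts are just for illustration):

import queue
import threading
import time

events = queue.Queue()

def source(source_id):
    # stand-in for "wait for the hardware, then report what happened"
    for n in range(3):
        time.sleep(0.01)
        events.put((source_id, "event %d" % n))

def receiver(n_expected):
    for _ in range(n_expected):
        source_id, event = events.get()
        print("from source %d: %s" % (source_id, event))

threads = [threading.Thread(target=source, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
receiver(6)
for t in threads:
    t.join()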

If you want to further decouple the sources from the receiver, then
one approach would be to have each source expose its own Queue
independently, and then define some kind of 'select' operation (like
in Golang/CSP/concurrent ML) to let the receiver read from multiple
Queues simultaneously. This is non-trivial to do, but in return you
get a very general and powerful construct. There's some more links and
discussion here: https://github.com/python-trio/trio/issues/242

> Regarding the link you sent, I don't entirely agree with the opinion 
> expressed: if you try to use a Semaphore for this purpose you will soon find 
> that it is "the wrong way round", it is intended to protect resources from 
> multiple accesses, not to synchronize those multiple accesses

Semaphores are extremely generic primitives – there are a lot of
different ways to use them. I think the blog post is correct that an
AutoResetEvent is equivalent to a semaphore whose value is clamped so
that it can't exceed 1. Your 'auto_event.set()' would be implemented
as 'sem.release()', and 'auto_event.wait()' would be 'sem.acquire()'.

I guess technically the semantics might be slightly different when
there are multiple waiters: the semaphore wakes up exactly one waiter,
while I'm not sure what your AutoResetEvent would do. But I can't see
any way to use AutoResetEvent reliably with multiple waiters anyway.


-n

--
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] unittest: 0 tests pass means failure of the testsuite

2019-03-06 Thread Nathaniel Smith
On Wed, Mar 6, 2019 at 12:13 PM Matěj Cepl  wrote:
>
> Hi,
>
> I am a lead maintainer of Python packages in OpenSUSE and I can
> see the pattern of many packagers adding blindly
>
> python setup.py test
>
> to %check section of our SPEC file. The problem is that if the
> package doesn't use unittest (it actually uses nose, pytest or
> something), it could lead to zero found tests, which pass and
> Python returns exit code 0 (success) even though nothing has been
> tested. It seems from the outside that everything is all right,
> package is being tested on every build, but actually it is lie.
>
> Would it be possible to change unittest runner, so that when 0
> tests pass, whole test suite would end up failing?

You probably want to file a bug on the setuptools tracker:
https://github.com/pypa/setuptools

It's maintained by different people than Python itself, and is
responsible for defining 'setup.py test'.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Allow creation of polymorph function (async function executable syncronously)

2019-03-06 Thread Nathaniel Smith
On Wed, Mar 6, 2019 at 4:37 PM pylang  wrote:
>> def maybe_async(fn):
>>     @functools.wraps(fn)
>>     def wrapper(*args, **kwargs):
>>         coro = fn(*args, **kwargs)
>>         if asyncio.get_running_loop() is not None:
>>             return coro
>>         else:
>>             return await coro
>
> I was unable to run his example as-is (in Python 3.6 at least) since the 
> `await` keyword is only permitted inside an `async def` function.

Oh yeah, that was a brain fart. I meant to write:

def maybe_async(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        coro = fn(*args, **kwargs)
        if asyncio.get_running_loop() is not None:
            return coro
        else:
            return asyncio.run(coro)
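
One wrinkle with the above: on 3.7+ asyncio.get_running_loop() raises
RuntimeError rather than returning None when no loop is running, so in
practice the idea needs to look more like this sketch (fetch/main are just
example names):

import asyncio
import functools

def maybe_async(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        coro = fn(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # no loop running: called from sync code, so drive the
            # coroutine to completion ourselves
            return asyncio.run(coro)
        # a loop is running: hand the coroutine back to be awaited
        return coro
    return wrapper

@maybe_async
async def fetch():
    await asyncio.sleep(0)
    return 42

print(fetch())              # called synchronously -> 42

async def main():
    return await fetch()    # called from async code -> awaited normally

print(asyncio.run(main()))  # -> 42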

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] dict.merge(d1, d2, ...) (Counter proposal for PEP 584)

2019-03-05 Thread Nathaniel Smith
On Mon, Mar 4, 2019 at 11:41 PM INADA Naoki  wrote:
> Then, I propose `dict.merge` method.  It is outer-place version
> of `dict.update`, but accepts multiple dicts.  (dict.update()
> can be updated to accept multiple dicts, but it's not out of scope).
>
> * d = d1.merge(d2)  # d = d1.copy(); d.update(d2)
> * d = d1.merge(d2, d3)  # d = d1.copy(); d.update(d2); d.update(d3)
> * d = d1.merge(iter_of_pairs)
> * d = d1.merge(key=value)

Another similar option would be to extend the dict constructor to
allow: d = dict(d1, d2, d3, ...)
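
(For comparison, a sketch of what either spelling would be shorthand for,
using what already works today:)

d1, d2, d3 = {"a": 1}, {"a": 2, "b": 2}, {"c": 3}

d = {**d1, **d2, **d3}   # later dicts win: {'a': 2, 'b': 2, 'c': 3}

d = dict(d1)             # copy-then-update, like the proposed d1.merge(d2, d3)
for other in (d2, d3):
    d.update(other)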

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Potential PEP: with/except

2019-01-22 Thread Nathaniel Smith
The first concern that comes to my mind is... When I see:

with:
    ...
except:
    ...

Is that a shorthand for

try:
    with:
        ...
except:
    ...

or for

with:
    try:
        ...
    except:
        ...

? Both are plausible, and it makes a big difference, because 'with' already
has an implicit 'except' block built in.

-n

On Tue, Jan 22, 2019, 12:12 Paul Ferrell wrote:

> I've found that almost any time I'm writing a 'with' block, it's doing
> something that could throw an exception. As a result, each of those
> 'with' blocks needs to be nested within a 'try' block. Due to the
> nature of 'with', it is rarely (if ever) the case that the try block
> contains anything other than the with block itself.
>
> As a result, I would like to propose that the syntax for 'with' blocks
> be changed such that they can be accompanied by 'except', 'finally',
> and/or 'else' blocks as per a standard 'try' block. These would handle
> exceptions that occur in the 'with' block, including the execution of
> the applicable __enter__ and __exit__ methods.
>
> Example:
>
> try:
>     with open(path) as myfile:
>         ...   # Do stuff with file
> except (OSError, IOError) as err:
>     logger.error("Failed to read/open file {}: {}".format(path, err))
>
> The above would turn into simply:
>
> with open(path) as myfile:
>     ... # Do stuff with file
> except (OSError, IOError) as err:
>     logger.error(...)
>
>
> I think this is rather straightforward in meaning and easy to read,
> and simplifies some unnecessary nesting. I see this as the natural
> evolution of what 'with'
> is all about - replacing necessary try-finally blocks with something
> more elegant. We just didn't include the 'except' portion.
>
> I'm a bit hesitant to put this out there. I'm not worried about it
> getting shot down - that's kind of the point here. I'm just pretty
> strongly against to unnecessary syntactical additions to the language.
> This though, I think I can except. It introduces no new concepts and
> requires no special knowledge to use. There's no question about what
> is going on when you read it.
>
> --
> Paul Ferrell
> pfl...@gmail.com
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] NAN handling in the statistics module

2019-01-06 Thread Nathaniel Smith
On Sun, Jan 6, 2019 at 11:06 PM Steven D'Aprano  wrote:
> I'm not wedded to the idea that the default ought to be the current
> behaviour. If there is a strong argument for one of the others, I'm
> listening.

"Errors should never pass silently"? Silently returning nonsensical
results is hard to defend as a default behavior IMO :-)

> How would you answer those who say that the right behaviour is not to
> propogate unwanted NANs, but to fail fast and raise an exception?

Both seem defensible a priori, but every other mathematical operation
in Python propagates NaNs instead of raising an exception. Is there
something unusual about median that would justify giving it unusual
behavior?
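
For comparison (the exact median outputs depend on where the NaN ends up
after sorting, which is rather the point):

import math
import statistics

nan = float("nan")

print(nan + 1, nan * 0, math.sqrt(nan), sum([1.0, nan]))   # all nan

# median() just sorts, and sorting data containing a NaN gives an
# order-dependent result, so these two calls can disagree:
print(statistics.median([1, 2, nan, 4]))
print(statistics.median([nan, 1, 2, 4]))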

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] 回复:Python-ideas Digest, Vol 146, Issue 13

2019-01-05 Thread Nathaniel Smith
On Sat, Jan 5, 2019 at 9:13 PM Moon丶sun  wrote:
>
> Thanks for your reply.
> But the answer is not I except, I will show you some examples to explain what 
> result I except:
>
> @contextmanager
> def cm():
>     print('open file')
>     yield
>     print('close file')
> with cm():
>     1/0
>
> If I use a contextmanager ,I except it can help me to close the file 
> anytime,even raise an error,
> but if I define a function with @contextmanager like the example which I have 
> showed for you,
> it will never print('close file')
>
> I can only modify it like this:
> @contextmanager
> def cm():
>     try:
>         print('open file')
>         yield
>     except Exception as e:
>         print('Error',e)
>     finally:
>         print('close file')
>
> It is not friendly for us to use it, so I modify the contextlib to fix it,you 
> can catch it from the e-mail attachment.
> It's in the line 79 and line 97

This is intentional, and can't be changed without breaking lots of code.

With your version, there's no way for the context manager to catch or
modify the exception, which is a common use case. For example, here's
a context manager I wrote recently:

@contextmanager
def catch_and_log(exctype):
    try:
        yield
    except exctype:
        log.exception(...)

This can't be done using your version.

Of course you can have your own version of @contextmanager that works
however you prefer.
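
Fleshing my example out so it actually runs (the logging setup and message
text are just placeholders):

import logging
from contextlib import contextmanager

logging.basicConfig()
log = logging.getLogger(__name__)

@contextmanager
def catch_and_log(exctype):
    try:
        yield
    except exctype:
        log.exception("operation failed")

with catch_and_log(ZeroDivisionError):
    1 / 0    # logged with a traceback, not propagated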

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] struct.unpack should support open files

2018-12-26 Thread Nathaniel Smith
On Wed, Dec 26, 2018, 02:19 Andrew Svetlov wrote:
> Also I'm thinking about type annotations in typeshed.
> Now the type is Union[array[int], bytes, bytearray, memoryview]
> Should it be Union[io.BinaryIO, array[int], bytes, bytearray, memoryview] ?
>

Yeah, trying to support both buffers and file-like objects in the same
function seems like a clearly bad idea. If we do this at all it should be
by adding new convenience functions/methods that take file-like objects
exclusively, like the ones several people posted on the thread.

I don't really have an opinion on whether this is worth doing at all. I
guess I can think of some arguments against: Packing/unpacking multiple
structs to the same file-like object may be less efficient than using a
single buffer + a single call to read/write. And it's unfortunate that the
obvious pack_into/unpack_from names are already taken. And it's only 2
lines of code to write your own helpers. But none of these are particularly
strong arguments either, and clearly some people would find them handy.
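
(For reference, the "2 lines" look something like this sketch; the name
unpack_from_file is made up:)

import struct

def unpack_from_file(fmt, f):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:      # a short read means we hit EOF mid-struct
        raise EOFError("short read while unpacking %r" % fmt)
    return struct.unpack(fmt, data)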

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] [asyncio] Suggestion for a major PEP

2018-12-16 Thread Nathaniel Smith
If you want this style of concurrency, you don't need to write a PEP,
just 'pip install gevent' :-)

But unfortunately you're years too late to argue for making asyncio
work this way. This was discussed extensively at the time, and the
decision to use special syntax was made intentionally, and after
studying existing systems like gevent that made the other choice.

This section of the trio docs explain why explicit async/await syntax
makes life easier for developers:
https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints

It's also awkward but very doable to support both sync and async mode
with a single code base: https://github.com/python-trio/unasync/

In fact, when doing this, the async/await syntax isn't really the hard
part – the hard part is that different libraries have very different
networking APIs. E.g., the stdlib socket API and the stdlib asyncio
API are totally different.

-n
On Sun, Dec 16, 2018 at 12:21 AM Christophe Bailly  wrote:
>
> Hello,
>
> I copy paste the main idea from an article I have written:
> contextual async
>
> "
>
> Imagine you have some code written for monothread. And you want to include 
> your code in a multithread environment.  Do you need to adapt all your code 
> which is what you do when you want to migrate to async code ? The answer is 
> no.
>
> Functionnally these constraints are not justified neither technically
>
> Do we have the tools to do this ? Yes because thanks to boost::context we can 
> switch context between tasks. When a task suspends, it just calls a function 
> (the event loop or reactor) to potentially switch to another task. Just like 
> threads switch contexts…
>
> Async/Await logic has introduced a symetric relation wich introduces 
> unnecessary contraints. We should just the same logic as thread logic.
>
> "
>
> Read the examples in the article I have developped a prototype in C++ and 
> everything works perfectly.
>
> My opinion is that sooner or later, it will have to switch to this logic 
> because chaining async/aswait is a huge contraints and does not make sense in 
> my opinion.
>
> Maybe I am missing something,
>
> Feel free to give me your feedback.
>
> Regards,
>
>
> Chris
>
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Using sha512 instead of md5 on python.org/downloads

2018-12-07 Thread Nathaniel Smith
On Fri, Dec 7, 2018 at 3:38 PM Steven D'Aprano  wrote:

> On Fri, Dec 07, 2018 at 01:25:19PM -0800, Nathaniel Smith wrote:
>
> > For this specific purpose, md5 is just as good as a proper hash. But all
> > else being equal, it would still be better to use a proper hash, just so
> > people don't have to go through the whole security analysis to check
> that.
>
> I don't understand what you are trying to say here about "the whole
> security analysis" to check "that". What security analysis, and
> what is "that"?
>

The analysis that people posted in this thread, demonstrating that for the
particular purpose at hand, md5 and sha-whatever are equally useful.


> It seems to me that moving to a cryptographically-secure hash would give
> many people a false sense of security, that just because the hash
> matched, the download was not only not corrupted, but not compromised as
> well. For those two purposes:
>
> - testing for accidental corruption;
> - testing for deliberate compromise;
>
> md5 and sha512 are precisely equivalent: both are sufficient for the
> first, and useless for the second. But a crypto-hash can give a false
> sense of security. The original post in this thread is evidence of that.
>

If you're worried about giving people a false sense of security, I think it
would be more effective to post a prominent notice or link describing how
people should interpret the hashes. Maybe some people see md5 and think
"ah-hah, this is their way of warning me that the hash is suitable for
defending against accidental corruption but not malicious actors", but it
must be a small minority :-). (That's certainly not what the OP thought.)
Most people will just think we're fools who don't realize or care md5 is
broken. Statistically, that's a pretty reasonable guess when you see
someone using md5.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Using sha512 instead of md5 on python.org/downloads

2018-12-07 Thread Nathaniel Smith
For this specific purpose, md5 is just as good as a proper hash. But all
else being equal, it would still be better to use a proper hash, just so
people don't have to go through the whole security analysis to check that.

Of course all else isn't equal: switching from md5 to sha-whatever would
require someone do the work. Is anyone volunteering?

On Fri, Dec 7, 2018, 11:56 Devin Jeanpierre wrote:

> On Fri, Dec 7, 2018 at 10:48 AM Antoine Pitrou
> wrote:
>
>> If the site is vulnerable to modifications, then TLS doesn't help.
>> Again: you must verify the GPG signatures (since they are produced by
>> the release manager's private key, which is *not* stored on the
>> python.org Web site).
>>
>
> This is missing the point. They were asking why not to use SHA512. The
> answer is that the hash does not provide any extra security. GPG is
> separate: even if there was no GPG signature, SHA512 would still not
> provide any extra security. That's why I said "more to the point". :P
>
> Nobody "must" verify the GPG signatures. TLS doesn't protect against
> everything, but neither does GPG. A naive user might just download a public
> GPG key from a compromised python.org and use it to verify the
> compromised release, see everything is "OK", and still be hosed.
>
> -- Devin
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Enhancing range object string displays

2018-11-19 Thread Nathaniel Smith
On Mon, Nov 19, 2018 at 6:09 PM Steven D'Aprano  wrote:
>
> On Mon, Nov 19, 2018 at 05:09:25PM -0800, danish bluecheese wrote:
> > I think it is kind of useless effort. If somebody using range() then
> > probably knows about it.
>
> For experienced users, sure, but this is an enhancement to help
> beginners who may be confused by the half-open end points.
>
> Even non-beginners may find it nice to be able to easily see the end
> points when the step size is not 1.
>
> If range objects had this, I'd use it in the REPL to check the end
> points. Sure, I could convert to a list and take a slice, but giving the
> object a nicer print output makes less work for the user.

I feel like the kind of users who would benefit the most from this are
exactly the same users who are baffled by the distinction between
str() and repr() and which one is used when, and thus would struggle
to benefit from it?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Relative Imports

2018-11-13 Thread Nathaniel Smith
On Fri, Nov 9, 2018 at 4:32 PM, danish bluecheese
 wrote:
> you are right on the lines you mentioned. Those are all working if i run it
> as a module which i do every time.
> This is somewhat unpleasant to me, especially while developing something and
> trying to test it quickly.
> I just want to be able to use same relative imports and run single file with
> `python3 test_main.py` for example.
> Running files as modules every time is tiring. This is my problem.

Have you tried 'python3 -m test_main'? IIRC it should be effectively
the same as 'python3 test_main.py' but with working relative imports.
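
One wrinkle: for the relative imports to resolve, the module has to be run
as part of its package. E.g. with a (hypothetical) layout like

    mypkg/
        __init__.py
        helpers.py
        test_main.py    # contains: from . import helpers

'python3 mypkg/test_main.py' fails with an "attempted relative import"
error, while 'python3 -m mypkg.test_main', run from the directory
containing mypkg/, works.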

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Standardising ASGI as a PEP

2018-10-27 Thread Nathaniel Smith
The WSGI PEP is a bit of a funny thing, since it's a PEP that doesn't
really involve the language or stdlib. (I guess there's wsgiref, but I
don't think it being in the stdlib actually affects much these days.)

What are you hoping to accomplish by making ASGI a PEP?

-n

On Sat, Oct 27, 2018 at 4:42 PM, Andrew Godwin  wrote:
> Hi everyone,
>
> I'd like to breach the topic of standardising an asynchronous successor to
> WSGI - specifically, the ASGI specification I and a group of other Python
> web developers have been refining over the past couple of years (you can
> read more at https://asgi.readthedocs.io/en/latest/).
>
> I'm unsure of the best approach to take for this, given a couple of factors:
>
>  - Web SIG has been basically dead for at least two years and several
> maintainers I know unsubscribed from it last time as things got toxic. It
> appears to not be a good place to start this discussion, but maybe it can be
> revived?
>
>  - The specification as I would propose it is two or three parts - an
> overall interface for asynchronous servers to talk to applications and then
> a separate specification(s) of how to transport HTTP and WebSockets over
> that. Would this be multiple PEPs?
>
> I'd appreciate advice from you all on these questions as well as what you
> think the best way to even approach something like "let's add a WSGI
> successor" is.
>
> My initial approach was to go away and prove something in real-world use and
> across a variety of frameworks, and we seem to have got to that point, and
> so now I would like to start making it more official.
>
> I'm more than ready to take the specification we have and start prepping it
> to land into the PEP repo for further discussion, but I wanted to check in
> here first before jumping the gun (and besides, there's already plenty of
> specs, write ups, and reference code to discuss the merits of this).
>
> Andrew
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Contracts in python -- a report & next steps

2018-10-25 Thread Nathaniel Smith
On Thu, Oct 25, 2018, 14:44 Marko Ristin-Kaufmann 
wrote:

>
> Nathaniel Smith wrote:
>
>> In your position, I wouldn't be talking to the core devs; I'd be
>> writing blog posts to proselytize the advantages of contracts, working
>> with popular projects that are interested in adopting them, writing
>> case studies, going to meetups, submitting talks to pycons, that sort
>> of thing. If you want contracts to become a widely-used thing, you
>> have to convince people to use them. The core team doesn't have a
>> magic wand that can short-circuit that process.
>>
>
> I thought python-ideas is an open list to generally discuss ideas, not the
> core dev list (isn't that python-dev)? That's why I wrote here (after being
> forwarded from python-dev list).
>
> I agree that these efforts you mention would be worthwhile. Implementation
> of a (prototype) library and participating in the discussions on this mail
> list are the maximum effort I can put up at the moment. Maybe in 2019 I
> could manage to go to the local Python summit.
>

Python-ideas is for ideas for changing the python language or standard
library. And the subthread I was replying to was about adding contracts
support to the stdlib, so you're in the right place for that discussion. I
was trying to explain why trying to add contract support to the stdlib
doesn't make much sense right now. And if you switch your focus to trying
to recruit users and collaborators, like I recommend, then python-ideas
isn't the best place to do that. Most of your potential users aren't here!

-n

>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-22 Thread Nathaniel Smith
On Sun, Oct 21, 2018 at 8:31 PM, Guido van Rossum  wrote:
> On Sun, Oct 21, 2018 at 6:08 PM Nathaniel Smith  wrote:
>> I'm not sure if this is an issue the way Queue is used in practice, but in
>> general you have to be careful with this kind of circular flow because if
>> your queue communicates backpressure (which it should) then circular flows
>> can deadlock.
>
> Nathaniel, would you be able to elaborate more on the issue of backpressure?
> I think a lot of people here are not really familiar with the concepts and
> its importance, and it changes how you have to think about queues and the
> like.

Sure.

Suppose you have some kind of producer connected to some kind of
consumer. If the producer consistently runs faster than the consumer,
what should happen? By default with queue.Queue, there's no limit on
its internal buffer, so if the producer puts, say, 10 items per
second, and the consumer only gets, say, 1 item per second, then the
internal buffer grows by 9 items per second. Basically you have a
memory leak, which will eventually crash your program. And well before
that, your latency will become terrible. How can we avoid this?

I guess we could avoid this by carefully engineering our systems to
make sure that producers always run slower than consumers, but that's
difficult and fragile. Instead, what we usually want to do is to
dynamically detect when a producer is outrunning a consumer, and apply
*backpressure*. (It's called that b/c it involves the consumer
"pushing back" against the producer.) The simplest way is to put a
limit on how large our Queue's buffer can grow, and make put() block
if it would exceed this limit. That way producers are automatically
slowed down, because they have to wait for the consumer to drain the
buffer before they can continue executing.
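
In stdlib terms that's just a bounded Queue. A toy sketch (the buffer size
and the sleep are arbitrary):

import queue
import threading
import time

q = queue.Queue(maxsize=10)   # bounded buffer

def producer():
    for i in range(100):
        q.put(i)              # blocks whenever 10 items are already waiting

def consumer():
    while True:
        item = q.get()
        time.sleep(0.01)      # deliberately slow consumer
        q.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()                    # finishes at the consumer's pace, not its own
q.join()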

This simple approach also works well when you have several tasks
arranged in a pipeline like A -> B -> C, where B gets objects from A,
does some processing, and then puts new items on to C. If C is running
slow, this will eventually apply backpressure to B, which will block
in put(), and then since B is blocked and not calling get(), then A
will eventually get backpressure too. In fact, this works fine for any
acyclic network topology.

If you have a cycle though, like A -> B -> C -> A, then you at least
potentially have the risk of deadlock, where every task is blocked in
put(), and can't continue until the downstream task calls get(), but
it never will because it's blocked in put() too. Sometimes it's OK and
won't deadlock, but you need to think carefully about the details to
figure that out.

If a task gets and puts to the same queue, like someone suggested
doing for the sentinel value upthread, then that's a cycle and you
need to do some more analysis. (I guess if you have a single sentinel
value, then queue.Queue is probably OK, since the minimal buffer size
it supports is 1? So when the last thread get()s the sentinel, it
knows that there's at least 1 free space in the buffer, and can put()
it back without blocking. But if there's a risk of somehow getting
multiple sentinel values, or if Queues ever gain support for
zero-sized buffers, then this pattern could deadlock.)

There's a runnable example here:
https://trio.readthedocs.io/en/latest/reference-core.html#buffering-in-channels
And I also wrote about backpressure and asyncio here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-1-backpressure

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Nathaniel Smith
On Sun, Oct 21, 2018, 16:48 MRAB  wrote:

> On 2018-10-21 22:30, Antoine Pitrou wrote:
> > On Sun, 21 Oct 2018 19:58:05 +0200
> > Vladimir Filipović 
> > wrote:
> >>
> >> To anticipate a couple more possible questions:
> >>
> >> - What would this proposal do about multiple producers/consumers
> >> needing to jointly decide _when_ to close the queue?
> >>
> >> Explicitly nothing.
> >>
> >> The queue's state is either closed or not, and it doesn't care who
> >> closed it. It needs to interact correctly with multiple consumers and
> >> multiple producers, but once any one piece of code closes it, the
> >> correct interaction is acting like a closed queue for everybody.
> >
> > Ah.  This is the one statement that makes me favorable to this idea.
> > When there is a single consumer, it's easy enough to send a sentinel.
> > But when there are multiple consumers, suddenly you must send exactly
> > the right number of sentinels (which means you also have to careful
> > keep track of their number, which isn't always easy).  There's some
> > delicate code doing exactly that in concurrent.futures.
> >
> You don't need more than one sentinel. When a consumer sees the
> sentinel, it just needs to put it back for the other consumers.
>

I'm not sure if this is an issue the way Queue is used in practice, but in
general you have to be careful with this kind of circular flow because if
your queue communicates backpressure (which it should) then circular flows
can deadlock.

-n

>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Nathaniel Smith
Hi Vladimir,

It's great to see people revisiting these old stdlib tools. Closure
tracking is definitely a big point of awkwardness for Queues. In Trio we
started with a straight copy of threading.Queue, and this turned out to be
a major friction point for users. We just deprecated our version of Queue
and replaced it with a new design. Our new thing is probably more radical
than you want to get in the stdlib (we ended up splitting the object into
two pieces, a sender object and a receiver object), but you might find the
discussions interesting:

Manual:
https://trio.readthedocs.io/en/latest/reference-core.html#using-channels-to-pass-values-between-tasks

A more minimal proposal to add closure tracking to trio.Queue:
https://github.com/python-trio/trio/pull/573

Follow-up issue with design questions we're still thinking about (also
links to earlier design discussions):
https://github.com/python-trio/trio/issues/719

We only started shipping this last week, so we're still getting experience
with it.

-n

On Sun, Oct 21, 2018, 10:59 Vladimir Filipović  wrote:

> Hi!
>
> I originally submitted this as a pull request. Raymond Hettinger
> suggested it should be given a shakeout in python-ideas first.
>
> https://github.com/python/cpython/pull/10018
> https://bugs.python.org/issue35034
>
> --
>
> Briefly:
>
> Add a close() method to Queue, which should simplify many common uses
> of the class and reduce the space for some easy-to-make errors.
>
> Also add an __iter__() method which in conjunction with close() would
> further simplify some common use patterns.
>
> --
>
> At eye-watering length:
>
> Apologies in advance for the length of this message. This isn't a PEP
> in disguise, it's a proposal for a very small, simple and I dare
> imagine uncontroversial feature. I'm new to contributing to Python and
> after the BPO/github submission I didn't manage to come up with a
> better way to present it than this.
>
> The issue
>
> Code using threading.Queue often needs to coordinate a "work is
> finished as far as I care" state between the producing and consuming
> side. Not "work" in the task_done() sense of completion of processing
> of queue items, "work" in the simpler sense of just passing data
> through the queue.
>
> For example, a producer can be driving the communication by enqueuing
> e.g. names of files that need to be processed, and once it's enqueued
> the last filename, it can be useful to inform the consumers that no
> further names will be coming, so after they've retrieved what's
> in-flight currently, they don't need to bother waiting for any more.
> Alternatively, a consumer can be driving the communication, and may
> need to let the producers know "I'm not interested in any more, so you
> can stop wasting resources on producing and enqueuing them".
> Also, a third, coordinating component may need to let both sides know
> that "Our collective work here is done. Start wrapping it up y'all,
> but don't drop any items that are still in-flight."
>
> In practice it's probably the exception, not the rule, when any piece
> of code interacting with a Queue _doesn't_ have to either inform
> another component that its interest in transferring the data has
> ended, or watch for such information.
>
> In the most common case of producer letting consumers know that it's
> done, this is usually implemented (over and over again) with sentinel
> objects, which is at best needlessly verbose and at worst error-prone.
> A recipe for multiple consumers making sure nobody misses the sentinel
> is not complicated, but neither is it obvious the first time one needs
> to do it.
> When a generic sentinel (None or similar) isn't adequate, some
> component needs to create the sentinel object and communicate it to
> the others, which complicates code, and especially complicates
> interfaces between components that are not being developed together
> (e.g. if one of them is part of a library and expects the library-user
> code to talk to it through a Queue).
>
> In the less common cases where the producers are the ones being
> notified, there isn't even a typical solution - everything needs to be
> cooked up from scratch using synchronization primitives.
>
> --
>
> A solution
>
> Adding a close() method to the Queue that simply prohibits all further
> put()'s (with other methods acting appropriately when the queue is
> closed) would simplify a lot of this in a clean and safe way - for the
> most obvious example, multi-consumer code would not have to juggle
> sentinel objects.
>
> Adding a further __iter__() method (that would block as necessary, and
> stop its iteration once the queue is closed and exhausted) would
> especially simplify many unsophisticated consumers.
>
> This is a current fairly ordinary pattern:
>
> # Producer:
> while some_condition:
>     q.put(generate_item())
> q.put(sentinel)
>
> # Consumer:
> while True:
>     item = q.get()
>     if item == sentinel:
>         q.put(sentinel)
>         break

Re: [Python-ideas] support toml for pyproject support

2018-10-08 Thread Nathaniel Smith
On Mon, Oct 8, 2018 at 2:55 AM, Steven D'Aprano  wrote:
>
> On Mon, Oct 08, 2018 at 09:10:40AM +0200, Jimmy Girardet wrote:
>> Each tool which wants to use pyproject.toml has to add a toml lib  as a
>> conditional or hard dependency.
>>
>> Since toml is now the standard configuration file format,
>
> It is? Did I miss the memo? Because I've never even heard of TOML before
> this very moment.

He's referring to PEPs 518 and 517 [1], which indeed standardize on
TOML as a file format for Python package build metadata.

I think moving anything into the stdlib would be premature though –
TOML libraries are under active development, and the general trend in
the packaging space has been to move things *out* of the stdlib (e.g.
there's repeated rumblings about moving distutils out), because the
stdlib release cycle doesn't work well for packaging infrastructure.

-n

[1] https://www.python.org/dev/peps/pep-0518/
https://www.python.org/dev/peps/pep-0517

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-07 Thread Nathaniel Smith
On Sun, Oct 7, 2018 at 5:54 PM, Nathaniel Smith  wrote:
> Are you imagining something roughly like this? (Ignoring chunk
> boundary handling for the moment.)
>
> def find_double_line_end(buf):
>     start = 0
>     while True:
>         next_idx = buf.index(b"\n", start)
>         if (buf[next_idx - 1:next_idx + 1] == b"\n\n"
>                 or buf[next_idx - 3:next_idx] == b"\r\n\r"):
>             return next_idx
>         start = next_idx + 1
>
> That's much more complicated than using re.search, and on some random
> HTTP headers I have lying around it benchmarks ~70% slower too. Which
> makes sense, since we're basically trying to replicate re engine's
> work by hand in a slower language.
>
> BTW, if we only want to find a fixed string like b"\r\n\r\n", then
> re.search and bytearray.index are almost identical in speed. If you
> have a problem that can be expressed as a regular expression, then
> regular expression engines are actually pretty good at solving those
> :-)

Though... here's something strange.

Here's another way to search for the first appearance of either
\r\n\r\n or \n\n in a bytearray:

def find_double_line_end_2(buf):
    idx1 = buf.find(b"\r\n\r\n")
    idx2 = buf.find(b"\n\n", 0, idx1)
    if idx1 == -1:
        return idx2
    elif idx2 == -1:
        return idx1
    else:
        return min(idx1, idx2)

So this is essentially equivalent to our regex (notice they both pick
out position 505 as the end of the headers):

In [52]: find_double_line_end_2(sample_headers)
Out[52]: 505

In [53]: double_line_end_re = re.compile(b"\r\n\r\n|\n\n")

In [54]: double_line_end_re.search(sample_headers)
Out[54]: <_sre.SRE_Match object; span=(505, 509), match=b'\r\n\r\n'>

But, the Python function that calls bytearray.find twice is about ~3x
faster than the re module:

In [55]: %timeit find_double_line_end_2(sample_headers)
1.18 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [56]: %timeit double_line_end_re.search(sample_headers)
3.3 µs ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

The regex module is even slower:

In [57]: double_line_end_regex = regex.compile(b"\r\n\r\n|\n\n")

In [58]: %timeit double_line_end_regex.search(sample_headers)
4.95 µs ± 76.4 ns per loop (mean ± std. dev. of 7 runs, 10 loops each)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-07 Thread Nathaniel Smith
On Sun, Oct 7, 2018 at 5:09 PM, Terry Reedy  wrote:
> On 10/6/2018 5:00 PM, Nathaniel Smith wrote:
>>
>> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum  wrote:
>>>
>>> I'd like to use the re module to parse a long text file, 1GB in size. I
>>> wish
>>> that the re module could parse a stream, so I wouldn't have to load the
>>> whole thing into memory. I'd like to iterate over matches from the stream
>>> without keeping the old matches and input in RAM.
>>>
>>> What do you think?
>>
>>
>> This has frustrated me too.
>>
>> The case where I've encountered this is parsing HTTP/1.1. We have data
>> coming in incrementally over the network, and we want to find the end
>> of the headers. To do this, we're looking for the first occurrence of
>> b"\r\n\r\n" OR b"\n\n".
>>
>> So our requirements are:
>>
>> 1. Search a bytearray for the regex b"\r\n\r\n|\n\n"
>
>
> I believe that re is both overkill and slow for this particular problem.
> For O(n), search forward for \n with str.index('\n') (or .find)
> [I assume that this searches forward faster than
> for i, c in enumerate(s):
>     if c == '\n': break
> and leave you to test this.]
>
> If not found, continue with next chunk of data.
> If found, look back for \r to determine whether to look forward for \n or
> \r\n *whenever there is enough data to do so.

Are you imagining something roughly like this? (Ignoring chunk
boundary handling for the moment.)

def find_double_line_end(buf):
    start = 0
    while True:
        next_idx = buf.index(b"\n", start)
        if (buf[next_idx - 1:next_idx + 1] == b"\n\n"
                or buf[next_idx - 3:next_idx] == b"\r\n\r"):
            return next_idx
        start = next_idx + 1

That's much more complicated than using re.search, and on some random
HTTP headers I have lying around it benchmarks ~70% slower too. Which
makes sense, since we're basically trying to replicate re engine's
work by hand in a slower language.

BTW, if we only want to find a fixed string like b"\r\n\r\n", then
re.search and bytearray.index are almost identical in speed. If you
have a problem that can be expressed as a regular expression, then
regular expression engines are actually pretty good at solving those
:-)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-07 Thread Nathaniel Smith
On Sat, Oct 6, 2018, 18:40 Steven D'Aprano  wrote:

> The message I take from this is:
>
> - regex engines certainly can be written to support streaming data;
> - but few of them are;
> - and it is exceedingly unlikely to be able to easily (or at all)
>   retro-fit that support to Python's existing re module.
>

I don't know enough about the re module internals to make an informed guess
about the difficulty.

On a quick glance, it does seem to store most intermediate match state in
explicit structs rather than on the call stack...

-n

>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-06 Thread Nathaniel Smith
On Sat, Oct 6, 2018 at 2:04 PM, Chris Angelico  wrote:
> On Sun, Oct 7, 2018 at 8:01 AM Nathaniel Smith  wrote:
>>
>> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum  wrote:
>> > I'd like to use the re module to parse a long text file, 1GB in size. I 
>> > wish
>> > that the re module could parse a stream, so I wouldn't have to load the
>> > whole thing into memory. I'd like to iterate over matches from the stream
>> > without keeping the old matches and input in RAM.
>> >
>> > What do you think?
>>
>> This has frustrated me too.
>>
>> The case where I've encountered this is parsing HTTP/1.1. We have data
>> coming in incrementally over the network, and we want to find the end
>> of the headers. To do this, we're looking for the first occurrence of
>> b"\r\n\r\n" OR b"\n\n".
>>
>> So our requirements are:
>>
>> 1. Search a bytearray for the regex b"\r\n\r\n|\n\n"
>> 2. If there's no match yet, wait for more data to arrive and try again
>> 3. When more data arrives, start searching again *where the last
>> search left off*
>>
>> The last requirement is subtle, but important. The naive approach
>> would be to rescan your whole receive buffer after each new packet
>> arrives:
>>
>> end_of_headers = re.compile(b"\r\n\r\n|\n\n")
>> while True:
>>     m = end_of_headers.search(receive_buffer)
>>     if m is None:
>>         receive_buffer += await get_more_data_from_network()
>>         # loop around and try again
>>     else:
>>         break
>>
>> But the code above is quadratic! If the headers are N bytes long, then
>> on each pass through the loop we perform an O(N) regex search, and we
>> do O(N) passes through the loop, so the whole thing is O(N**2). That
>> means your HTTP client-or-server can be trivially DoSed by a peer who
>> sends their headers broken into lots of small fragments.
>
> Quadratic in the size of the headers only, so you could just cap it -
> if the receive buffer is too large, just reject it. Sometimes, the
> simplest approach is the best.

But OTOH, every problem has a solution that's simple, obvious, and wrong :-).

Of course you set a cap on the header size, to prevent other kinds of
DoS (e.g. memory exhaustion). But it turns out people stuff a lot of
data into HTTP headers [1], so if the cap is large enough to support
non-malicious usage, then it's also large enough to let people DoS the
naive O(N**2) algorithm. Production-quality HTTP/1.1 parsers really do
have to use an O(N) algorithm here.

And similarly, if you're building a generic helper library for people
implementing arbitrary unknown protocols, then you can't assume their
protocols were designed to use small frames only, to avoid hitting
arbitrary limitations in Python's re module.

-n

[1] E.g., 16 KiB total header size is already enough that on my
laptop, the naive O(N**2) algorithm takes ~750 ms CPU time, versus ~16
ms for the O(N) algorithm. HTTP/2 tried hard to simplify their header
encoding scheme by putting limits on header size, but got so much
push-back that they were eventually forced to add special hacks to
allow for arbitrarily large headers – in particular, it turns out that
people use *individual cookies* that are larger than 16 KiB:
https://http2.github.io/faq/#why-the-rules-around-continuation-on-headers-frames

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support parsing stream with `re`

2018-10-06 Thread Nathaniel Smith
On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum  wrote:
> I'd like to use the re module to parse a long text file, 1GB in size. I wish
> that the re module could parse a stream, so I wouldn't have to load the
> whole thing into memory. I'd like to iterate over matches from the stream
> without keeping the old matches and input in RAM.
>
> What do you think?

This has frustrated me too.

The case where I've encountered this is parsing HTTP/1.1. We have data
coming in incrementally over the network, and we want to find the end
of the headers. To do this, we're looking for the first occurrence of
b"\r\n\r\n" OR b"\n\n".

So our requirements are:

1. Search a bytearray for the regex b"\r\n\r\n|\n\n"
2. If there's no match yet, wait for more data to arrive and try again
3. When more data arrives, start searching again *where the last
search left off*

The last requirement is subtle, but important. The naive approach
would be to rescan your whole receive buffer after each new packet
arrives:

end_of_headers = re.compile(b"\r\n\r\n|\n\n")
while True:
    m = end_of_headers.search(receive_buffer)
    if m is None:
        receive_buffer += await get_more_data_from_network()
        # loop around and try again
    else:
        break

But the code above is quadratic! If the headers are N bytes long, then
on each pass through the loop we perform an O(N) regex search, and we
do O(N) passes through the loop, so the whole thing is O(N**2). That
means your HTTP client-or-server can be trivially DoSed by a peer who
sends their headers broken into lots of small fragments.

Fortunately, there's an elegant and natural solution: Just save the
regex engine's internal state when it hits the end of the string, and
then when more data arrives, use the saved state to pick up the search
where we left off. Theoretically, any regex engine *could* support
this – it's especially obvious for DFA-based matchers, but even
backtrackers like Python's re could support it, basically by making
the matching engine a coroutine that can suspend itself when it hits
the end of the input, then resume it when new input arrives. Like, if
you asked Knuth for the theoretically optimal design for this parser,
I'm pretty sure this is what he'd tell you to use, and it's what
people do when writing high-performance HTTP parsers in C.

But unfortunately, in reality, re *doesn't* support this kind of
pause/resume functionality, and you can't write efficient
character-by-character algorithms in Python, so you have to use really
awkward hacks instead. For the HTTP header case, the best I've been
able to come up with is to manually analyze the regex to figure out
the maximum size string it could match (in this case, 4 bytes), and
then write a loop that tracks how long the string was before the last
time we appended new data, and on each iteration searches the
substring receive_buffer[old_length - 4 + 1:]. This is super finicky,
and especially annoying if you want to offer this as a generic API for
using regexes to deconstruct network streams. (There are a lot of
Python network libraries that have accidentally-quadratic parsers in
them.)
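
To make the hack concrete, here's a rough sketch (reusing the hypothetical
get_more_data_from_network() helper from above; 4 is the longest string
this particular regex can possibly match):

import re

END_OF_HEADERS = re.compile(b"\r\n\r\n|\n\n")
MAX_MATCH_LEN = 4  # longest possible match for this particular regex

async def find_end_of_headers(get_more_data_from_network):
    receive_buffer = bytearray()
    old_length = 0
    while True:
        # A match we haven't already checked for can only start at
        # old_length - MAX_MATCH_LEN + 1 or later, so only re-search
        # the tail of the buffer; total work stays O(N).
        search_start = max(0, old_length - MAX_MATCH_LEN + 1)
        match = END_OF_HEADERS.search(receive_buffer, search_start)
        if match is not None:
            return match.end()
        old_length = len(receive_buffer)
        receive_buffer += await get_more_data_from_network()

It works, but the off-by-one bookkeeping is easy to get wrong, and it has
to be redone for every regex.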

In practice I suspect retrofitting this functionality into 're' would
be a lot of work. But it's definitely frustrating that we have 90% of
the machinery we'd need to do things the natural/efficient way, but
then are thwarted by this arbitrary API limitation.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Debugging: some problems and possible solutions

2018-10-03 Thread Nathaniel Smith
On Wed, Oct 3, 2018 at 10:48 AM, Chris Angelico  wrote:
> On Thu, Oct 4, 2018 at 2:30 AM Anders Hovmöller  wrote:
>>
>> Nothing is a keyword in that example or in my example. My suggestion is that 
>> we could do:
>>
>> my_func(=big_array[5:20])
>>
>> And it would be compile time transformed into
>>
>> my_func(**{'big_array[5:20]': big_array[5:20]})
>>
>> and then my_func is just a normal function:
>>
>> def my_func(**kwargs):
>>  Whatever
>>
>> It's a very simple textual transformation.
>>
>
> That is not guaranteed to work. In another thread it was pointed out
> that this is merely a CPython implementation detail, NOT a language
> feature.

I'm curious where this is written down. Can you point to the relevant
part of the language spec or pronouncement or whatever it was?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] f-string "debug" conversion

2018-10-03 Thread Nathaniel Smith
On Wed, Oct 3, 2018, 03:55 Eric V. Smith  wrote:
>
> On 10/3/2018 1:40 AM, Nathaniel Smith wrote:
> > I think the way I'd do it would be:
> >
> > Step 1: Take the current "lnotab" that lets us map bytecode offsets ->
> > line numbers, and extend it with more detailed information, so that we
> > can map e.g. a CALL operation to the exact start and end positions of
> > that call expression in the source. This is free at runtime, and would
> > allow more detailed tracebacks (see [1] for an example), and more
> > detailed coverage information. It would definitely take some work to
> > thread the necessary information through the compiler infrastructure,
> > but I think this would be a worthwhile feature even without the debug()
> > use case.
> >
> > Step 2: Add a 'debug' helper function that exploits the detailed
> > information to reconstruct its call, by peeking into the calling frame
> > and finding the source for the call. Of course this would be a strange
> > and ugly thing to do for a regular function, but for a debugging helper
> > it's reasonable. So e.g. if you had the code:
> >
> >total = debug(x) + debug(y / 10)
> >
> > The output might be:
> >
> >debug:myfile.py:10: 'x' is 3
> >debug:myfile.py:10: 'y / 10' is 7
>
> I'm not positive, but isn't this what q does?

The difference is that without "step 1", there's no reliable way to
figure out the value's source text. q does it by grabbing the source
line and making some guesses based on heuristics, but e.g. in the
example here it gets confused and prints:

 0.0s : x) + q(y / 10=3
 0.0s : x) + q(y / 10=7

So you can think of this idea as (1) make it possible to implement a
reliable version of q, (2) add a built-in implementation.
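
For reference, the heuristic version is easy to sketch (this isn't q's
actual code, just an illustration of the approach): all it can recover is
the caller's whole source line, which is exactly why it can't tell two
calls on the same line apart.

import inspect

def debug(value):
    # Peek at the calling frame and report its whole source line --
    # the best we can do without fine-grained position information.
    frame = inspect.currentframe().f_back
    info = inspect.getframeinfo(frame)
    line = info.code_context[0].strip() if info.code_context else "<unknown>"
    print(f"debug:{info.filename}:{info.lineno}: {line} -> {value!r}")
    return value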

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] f-string "debug" conversion

2018-10-02 Thread Nathaniel Smith
On Tue, Oct 2, 2018 at 8:44 PM, David Teresi  wrote:
> print(f'{value!d}') is a lot of symbols and boilerplate to type out just
for
> a debugging statement that will be deleted later. Especially now that
> breakpoint() exists, I can't really see myself using this.
>
> I also don't see the use case of it being within an f-string, because I've
> never had to interpolate a debug string within some other string or format
> it in a fancy way. You said it yourself, taking advantage of other
f-string
> features isn't very useful in this case.
>
> If other people can find a use for it, I'd suggest making it ita own
> function -- debug(value) or something similar.

There was some discussion of this back in April:

https://mail.python.org/pipermail/python-ideas/2018-April/050113.html

I think the way I'd do it would be:

Step 1: Take the current "lnotab" that lets us map bytecode offsets -> line
numbers, and extend it with more detailed information, so that we can map
e.g. a CALL operation to the exact start and end positions of that call
expression in the source. This is free at runtime, and would allow more
detailed tracebacks (see [1] for an example), and more detailed coverage
information. It would definitely take some work to thread the necessary
information through the compiler infrastructure, but I think this would be
a worthwhile feature even without the debug() use case.

Step 2: Add a 'debug' helper function that exploits the detailed
information to reconstruct its call, by peeking into the calling frame and
finding the source for the call. Of course this would be a strange and ugly
thing to do for a regular function, but for a debugging helper it's
reasonable. So e.g. if you had the code:

  total = debug(x) + debug(y / 10)

The output might be:

  debug:myfile.py:10: 'x' is 3
  debug:myfile.py:10: 'y / 10' is 7

Or if you have a clever UI, like in an IDE or ipython, maybe it overrides
the debug() operator to print something like:

  total = debug(*x*) + debug(y / 10)
 ^ *3*

  total = debug(x) + debug(*y / 10*)
  ^^^ *7*

(for anyone for whom the rendering is borked: on my screen the "x" on the
first line and the "y / 10" on the second line are highlighted in a
different font, and the carets draw an underline beneath them.)

-n

[1] https://mail.python.org/pipermail/python-ideas/2018-April/050137.html

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Suggestion: Extend integers to include iNaN

2018-09-29 Thread Nathaniel Smith
On Fri, Sep 28, 2018 at 11:31 PM, Steve Barnes  wrote:
> One specific use case that springs to mind would be for Libraries such
> as Pandas to return iNaN for entries that are not numbers in a column
> that it has been told to treat as integers.

Pandas doesn't use Python objects to store integers, though; it uses
an array of unboxed machine integers.

In places where you can use Python objects to represent numbers, can't
you just use float("nan") instead of iNaN?
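
E.g., just to illustrate the usual NaN semantics:

import math

x = float("nan")
print(x == x)         # False -- NaN compares unequal to everything
print(math.isnan(x))  # True -- so check with isnan(), not ==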

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Retire or reword the "Beautiful is better than ugly" Zen clause

2018-09-13 Thread Nathaniel Smith
On Thu, Sep 13, 2018 at 9:13 AM, Mark E. Haase  wrote:
> On Thu, Sep 13, 2018 at 10:49 AM Rhodri James  wrote:
>
>> More importantly, this whole idea of banning and/or changing terminology
>> is psychologically and sociologically wrong-headed.  The moment you say "You
>> may not use that word" you create a taboo, and give the word a power that it
>> did not have before.
>
>
> Samantha posted this as a *proposal* to python-*ideas*, the mailing list
> where we purportedly discuss... umm... ideas. Samantha has not banned any
> words from Python, so let's tone down the hyperbole.
>
> These responses that assume Samantha is a troll are based on... what? Other
> posters on this list use Yandex e-mails, and nobody called those people
> trolls. And there are a lot of disagreements about ideas, and most of those
> people don't get called trolls, either. The Python CoC calls for *respect*,
> and I posit that the majority reaction to Samantha's first post has been
> disrespectful.
>
> Engage the post on the ideas—or ignore it altogether—but please don't
> automatically label newcomers with controversial ideas as trolls. Let's
> assume her proposal was made in good faith.

It's not just automatic labeling of newcomers with controversial
ideas – this is a very common tactic that online organized bigotry
groups use: invent fake "socially progressive" personas, and use them
to stir up arguments, undermine trust, split communities, etc. The
larger campaigns are pretty well documented:

http://www.slate.com/blogs/xx_factor/2014/06/16/_endfathersday_is_a_hoax_fox_news_claims_feminists_want_to_get_rid_of_father.html
https://www.buzzfeednews.com/article/ryanhatesthis/your-slip-is-showing-4chan-trolls-operation-lollipop
https://birdeemag.com/free-bleeding-thing/
https://www.dailydot.com/parsec/femcon-4chan-convention-scam/
http://www.newnownext.com/clovergender-hoax-fake-prank-pharma-bro-martin-shkreli-4chan-troll/01/2017/

Smaller-scale versions are also common – these people love to jump
into difficult conversations and try to make them more difficult.

That said, in OP's case we don't actually know either way, and even
trolls can inadvertently suggest good ideas, so we should consider the
proposal on its merits.

Applied to people, lookism is a real and honestly kind of horrifying
thing: humans who happen to be born with less symmetric faces get paid
worse, receive worse health care, all kinds of unfair things. It
wasn't too long ago that being sufficiently ugly in public was
actually illegal in many places:
https://en.wikipedia.org/wiki/Ugly_law

But even if we all agree that beautiful and ugly people should be
treated equally, I don't see how it follows that beautiful and ugly
buildings should be treated equally, or beautiful and ugly music
should be treated equally, or beautiful and ugly code should be
treated equally. The situations are totally different. Maybe there's
some connection I'm missing, and if anyone (Samantha?) has links to
deeper discussion then I'll happily take a look. But until then I'm
totally comfortable with keeping the Zen as-is. (And I'm someone
pretty far on the "SJW" side of the spectrum, and 100% in favor of
Victor's original PR.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Asynchronous friendly iterables

2018-08-20 Thread Nathaniel Smith
On Mon, Aug 20, 2018 at 12:34 AM, Simon De Greve  wrote:
> Do you mean that for loops inside an "async def" statements are always
> executed as 'async for' loops? That's what I wanted to acheive by writing
> the AsyncDict class (c.f. the CodeReview link).

The only difference between an 'async for' and a regular 'for' is that
the former works on async iterables, and the latter works on regular
iterables. So "executed as 'async for'" doesn't really mean anything,
I think? If you have an async iterable, use 'async for', and if you
have a regular iterable, use 'for'.
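
For example, this is perfectly fine as-is -- no special AsyncDict needed
(a minimal sketch):

import asyncio

async def show(d):
    # A plain dict is a regular iterable, so a regular 'for' works
    # inside an 'async def'; you can still await in the loop body.
    for key, value in d.items():
        print(key, value)
        await asyncio.sleep(0)

asyncio.run(show({"a": 1, "b": 2}))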

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Asynchronous friendly iterables

2018-08-20 Thread Nathaniel Smith
On Mon, Aug 20, 2018 at 12:19 AM, Simon De Greve  wrote:
> Hello everyone,
>
> I'm quite new working with asyncio and thus maybe missing some things about
> it, but wouldn't it be quite easier to have some iterables to support async
> for loops "natively", since asyncio is now part of the Stdlib?
>
> I've tried to work with asyncio while using discord.py, and has some
> struggle with an "async for" loop on a dictionary, so I had to implement a
> new dict subclass that would just reimplement items(), keys() and values()
> functions.
>
> I think that it would be a cool improvement to implement some of those in
> some standard way. There's some code I wrote on a CodeReview thread but I
> still haven't got any feedback on it.
>
> Here's the link of the thread :
> https://codereview.stackexchange.com/questions/197551/asynchronous-dictionary-in-python

You can do this, but I don't see what it accomplishes...

Are you aware that you can use regular 'for' loops inside 'async def' functions?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Make "yield" inside a with statement a SyntaxError

2018-08-08 Thread Nathaniel Smith
On Tue, Aug 7, 2018 at 11:14 PM, Ken Hilton  wrote:
> This mostly springs off of a comment I saw in some thread.
>
> The point of a with statement is that it ensures that some resource will be
> disposed of, yes? For example, this:
>
> with open(filename) as f:
>     contents = f.read()
>
> is better than this:
>
> contents = open(filename).read()
>
> because the former definitely closes the file while the latter relies on
> garbage collection?
>
> The point of a yield expression is to suspend execution. This is nice for
> efficient looping because instead of having to hold all results in memory,
> each result can be consumed immediately, yes? Therefore this:
>
> def five_to_one():
>     for i in range(4):
>         yield 5 - i
>
> is better than this:
>
> def five_to_one():
>     result = []
>     for i in range(4):
>         result.append(5 - i)
>     return result
>
> because the former suspends execution of "five_to_one" while the latter
> holds all five results in memory?
>
> Now, let's take a look at the following scenario:
>
> def read_multiple(*filenames):
>     for filename in filenames:
>         with open(filename) as f:
>             yield f.read()
>
> Can you spot the problem? The "with open(filename)" statement is supposed to
> ensure that the file object is disposed of properly. However, the "yield
> f.read()" statement suspends execution within the with block, so if this
> happened:
>
> for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
>     if contents == 'hello':
>         break
>
> and the contents of "chunk2" were "hello" then the loop would exit, and
> "chunk2" would never be closed! Yielding inside a with block, therefore,
> doesn't make sense and can only lead to obscure bugs.

This is a real problem. (Well, technically the 'with' block's __exit__
function *will* eventually close the file, when the generator is
garbage-collected – see PEP 342 for details – but this is not exactly
satisfying, because the whole purpose of the 'with' block is to close
the file *without* relying on the garbage collector.)
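
You can see the mechanism in a small self-contained example (using
try/finally instead of a file, so it runs anywhere):

import gc

def gen_with_cleanup():
    try:
        yield 1
    finally:
        print("cleanup ran")

g = gen_with_cleanup()
next(g)       # the generator is now suspended inside the 'try' block
del g         # abandon it without closing it...
gc.collect()  # ...cleanup only runs when the object gets collected
              # (right away on CPython's refcounting, later on e.g. PyPy)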

Unfortunately, your proposal for solving it is a non-starter -- there
are lots of cases where 'yield' inside a 'with' block is not only
used, but is clearly the right thing to do. A notable one is when
defining a new contextmanager in terms of a pre-existing
contextmanager:

@contextmanager
def tempfile():
    # This is an insecure way of making a temp file but good enough
    # for an example
    tempname = pick_random_filename()
    with open(tempname, "w") as f:
        yield f

Here are some links for previous discussions around these kinds of
issues, none of which have really gone anywhere but might help you get
a sense of the landscape of options:

https://www.python.org/dev/peps/pep-0533/
https://www.python.org/dev/peps/pep-0521/
https://www.python.org/dev/peps/pep-0568/
https://github.com/python-trio/trio/issues/264

One small step that might be doable would be to start issuing
ResourceWarning whenever a generator that was suspended inside a
'with' or 'try' block is GC'ed.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Adding Python interpreter info to "pip install"

2018-07-19 Thread Nathaniel Smith
On Thu, Jul 19, 2018 at 5:45 PM, Al Sweigart  wrote:
> The goal of this idea is to make it easier to find out when someone has
> installed packages for the wrong python installation. I'm coming across
> quite a few StackOverflow posts and emails where beginners are using pip to
> install a package, but then finding they can't import it because they have
> multiple python installations and used the wrong pip.

This sounds like a great idea to me, but pip is developed separately
from Python itself, and I don't think the pip maintainers monitor
python-ideas. I'd suggest filing a feature request on the pip tracker:

https://github.com/pypa/pip/issues/new?template=feature-request.md

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-18 Thread Nathaniel Smith
On Wed, Jul 18, 2018 at 11:49 AM, Stephan Houben  wrote:
> Basically, what I am suggesting is a direct translation of Javascript's
> Web Worker API
> (https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API)
> to Python.
>
> The Web Worker API is generally considered a "share-nothing" approach,
> although
> as we will see some state can be shared.
>
> The basic principle is that any object lives in a single Worker (Worker =
> subinterpreter).
> If a message is send from Worker A to Worker B, the message is not shared,
> rather the so-called "structured clone" algorithm is used to create
> recursively a NEW message
> object in Worker B. This is roughly equivalent to pickling in A and then
> unpickling in B,
>
> Of course, this may become a bottleneck if large amounts of data need to be
> communicated.
> Therefore, there is a special object type designed to provide a view upon a
> piece
> of shared memory:  SharedArrayBuffer. Notable, this only provides a view
> upon
> raw "C"-style data (ints or floats or whatever), not on Javascript objects.

Note that everything you said here also exactly describes the
programming model for the existing 'multiprocessing' module:
"structured clone" is equivalent to how multiprocessing uses pickle to
transfer arbitrary objects, or you can use multiprocessing.Array to
get a shared view on raw "C"-style data.
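
E.g., a minimal sketch of the multiprocessing equivalent of a
SharedArrayBuffer:

import multiprocessing as mp

def worker(shared):
    # The child writes directly into the same C-level double buffer;
    # the array contents are never pickled or copied.
    shared[0] = 42.0

if __name__ == "__main__":
    arr = mp.Array("d", 4)  # shared view on 4 raw C doubles
    p = mp.Process(target=worker, args=(arr,))
    p.start()
    p.join()
    print(arr[:])  # [42.0, 0.0, 0.0, 0.0]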

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Nathaniel Smith
On Sun, Jul 15, 2018 at 6:00 PM, Chris Angelico  wrote:
> On Mon, Jul 16, 2018 at 10:31 AM, Nathaniel Smith  wrote:
>> On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
>>> * The Actor model can be used with some effort via the “multiprocessing”
>>> module, but it doesn’t seem that streamlined and forces there to be a
>>> separate OS process per line of execution, which is relatively expensive.
>>
>> What do you mean by "the Actor model"? Just shared-nothing
>> concurrency? (My understanding is that in academia it means
>> shared-nothing + every thread/process/whatever gets an associated
>> queue + queues are globally addressable + queues have unbounded
>> buffering + every thread/process/whatever is implemented as a loop
>> that reads messages from its queue and responds to them, with no
>> internal concurrency. I don't know why this particular bundle of
>> features is considered special. Lots of people seem to use it in
>> looser sense though.)
>
> Shared-nothing concurrency is, of course, the very easiest way to
> parallelize. But let's suppose you're trying to create an online
> multiplayer game. Since it's a popular genre at the moment, I'll go
> for a battle royale game (think PUBG, H1Z1, Fortnite, etc). A hundred
> people enter; one leaves. The game has to let those hundred people
> interact, which means that all hundred people have to be connected to
> the same server. And you have to process everyone's movements,
> gunshots, projectiles, etc, etc, etc, fast enough to be able to run a
> server "tick" enough times per second - I would say 32 ticks per
> second is an absolute minimum, 64 is definitely better. So what
> happens when the processing required takes more than one CPU core for
> 1/32 seconds? A shared-nothing model is either fundamentally
> impossible, or a meaningless abstraction (if you interpret it to mean
> "explicit queues/pipes for everything"). What would the "Actor" model
> do here?

"Shared-nothing" is a bit of jargon that means there's no *implicit*
sharing; your threads can still communicate, the communication just
has to be explicit. I don't know exactly what algorithms your
hypothetical game needs, but they might be totally fine in a
shared-nothing approach. It's not just for embarrassingly parallel
problems.

> Ideally, I would like to be able to write my code as a set of
> functions, then easily spin them off as separate threads, and have
> them able to magically run across separate CPUs. Unicorns not being a
> thing, I'm okay with warping my code a bit around the need for
> parallelism, but I'm not sure how best to do that. Assume here that we
> can't cheat by getting most of the processing work done with the GIL
> released (eg in Numpy), and it actually does require Python-level
> parallelism of CPU-heavy work.

If you need shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python, then yeah, you
basically need a free-threaded implementation of Python. Jython is
such an implementation. PyPy could be if anyone were interested in
funding it [1], but apparently no-one is. Probably removing the GIL
from CPython is impossible. (I'd be happy to be proven wrong.) Sorry I
don't have anything better to report.

The good news is that there are many, many situations where you don't
actually need "shared-memory threads, on multiple cores, for CPU-bound
logic, where the logic is implemented in Python". If you're in that
specific niche and don't have $100k to throw at PyPy, then I dunno, I
hear Rust is good at that sort of thing? It's frustrating for sure,
but there will always be niches where Python isn't the best choice.

-n

[1] 
https://morepypy.blogspot.com/2017/08/lets-remove-global-interpreter-lock.html

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] The future of Python parallelism. The GIL. Subinterpreters. Actors.

2018-07-15 Thread Nathaniel Smith
On Sun, Jul 8, 2018 at 11:27 AM, David Foster  wrote:
> * The Actor model can be used with some effort via the “multiprocessing”
> module, but it doesn’t seem that streamlined and forces there to be a
> separate OS process per line of execution, which is relatively expensive.

What do you mean by "the Actor model"? Just shared-nothing
concurrency? (My understanding is that in academia it means
shared-nothing + every thread/process/whatever gets an associated
queue + queues are globally addressable + queues have unbounded
buffering + every thread/process/whatever is implemented as a loop
that reads messages from its queue and responds to them, with no
internal concurrency. I don't know why this particular bundle of
features is considered special. Lots of people seem to use it in
looser sense though.)

> I'd like to solicit some feedback on what might be the most efficient way to
> make forward progress on efficient parallelization in Python inside the same
> OS process. The most promising areas appear to be:
>
> 1. Make the current subinterpreter implementation in Python have more
> complete isolation, sharing almost no state between subinterpreters. In
> particular not sharing the GIL. The "Interpreter Isolation" section of PEP
> 554 enumerates areas that are currently shared, some of which probably
> shouldn't be.
>
> 2. Give up on making things work inside the same OS process and rather focus
> on implementing better abstractions on top of the existing multiprocessing
> API so that the actor model is easier to program against. For example,
> providing some notion of Channels to communicate between lines of execution,
> a way to monitor the number of Messages waiting in each channel for
> throughput profiling and diagnostics, Supervision, etc. In particular I
> could do this by using an existing library like Pykka or Thespian and
> extending it where necessary.

I guess I would distinguish though between "multiple processes" and
"the multiprocessing module". The module might be at the point in its
lifecycle where starting over is at least worth considering, and one
thing I'm hoping to do with Trio is experiment with making worker
process patterns easier to work with.

But the nice thing about these two options is that subinterpreters are
basically a way to emulate multiple Python processes within a single
OS process, which means they're largely interchangeable. There are
trade-offs in terms of compatibility, how much work needs to be done,
probably speed, but if you come up with a great API based around one
model then you should be able to switch out the backend later without
affecting users. So if you want to start experimenting now, I'd use
multiple processes and plan to switch to subinterpreters later if it
turns out to make sense.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add new `Symbol` type

2018-07-05 Thread Nathaniel Smith
On Thu, Jul 5, 2018 at 1:25 PM, Flavio Curella  wrote:
>
>> What functionality does such a thing actually need?
>
> I think the requirements should be:
> * The resulting symbol behave exactly like None. IE: the symbol should not
> be an instance of object, but an instance of its own class
> * A symbol can optionally be globally unique.
> * Two symbols created by the same key must not be equal. IE: they have equal
> key, but different value
>* if we're trying to create global symbols with the same key, an
> exception is thrown
>
> This is mostly based on the Javascript spec.

I think the name "symbol" here is pretty confusing. It comes
originally from Lisp, where it's used to refer to an interned-string
data type. It's a common source of confusion even there. Then it
sounds like JS took that name, and it ended up drifting to mean
something that's almost exactly the opposite of a Lisp symbol. In
Lisp, symbols are always "global"; the whole point is that if two
different pieces of code use the same name for the same symbol then
they end up with the same object. So this is *super* confusing. I
think I see how JS ended up here [1], but the rationale really doesn't
translate to other languages.

The thing you're talking about is what Python devs call a "sentinel"
object. If your proposal is to add a sentinel type to the stdlib, then
your chance of success will be *much* higher if you use the word
"sentinel" instead of "symbol". People don't read mailing list threads
carefully, so if you keep calling it "symbol" then you'll likely spend
infinite time responding to people rushing in to critique your
proposal based on some misconception about what you're trying to do,
which is no fun at all. Honestly I'd probably start a new thread with
a new subject, ideally with an initial straw-man proposal for the
semantics of these objects.
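
(To be concrete, here's the kind of thing Python devs mean by a
"sentinel" -- a minimal sketch, not a proposal for the actual API:)

class Sentinel:
    # A unique marker object, compared by identity, with a readable repr.
    def __init__(self, name):
        self._name = name

    def __repr__(self):
        return f"<{self._name}>"

MISSING = Sentinel("MISSING")

def lookup(mapping, key, default=MISSING):
    # 'default is MISSING' distinguishes "no default given" from a
    # perfectly valid default like None.
    if default is MISSING:
        return mapping[key]
    return mapping.get(key, default)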

-n

[1] What was JS thinking? Well, I'm not sure I have all the details
right, but AFAICT it's all very logical... JS objects, like Python
objects, have attributes, e.g. 'console.log' is the 'log' attribute of
the 'console' object. There's a table inside the 'console' object
mapping keys like 'log' to their corresponding values, much like a
Python object's __dict__. But a Python dict can use arbitrary objects
as keys. JS attribute tables are different: the keys are required to
be Lisp-style symbol objects: they're arbitrary strings (and only
strings), that are then interned for speed. This kind of table lookup
is exactly why Lisp invented symbols in the first place; a Lisp scope
is also a table mapping symbols to values. BUT THEN, they decided to
enhance JS to add the equivalent of special methods like Python's
__add__. Now how do you tell which attributes are ordinary attributes,
and which ones are supposed to be special? In Python of course we use
a naming convention, which is simple and works well. But in JS, by the
time they decided to do this, it was too late: people might already be
using names like "__add__" for regular attributes, and making them
special would break compatibility. In fact, *all* possible strings
were potentially already in use for ordinary attributes; there were no
names left for special attributes. SO, they decided, they needed to
expand the set of symbol objects (i.e., attribute names) to include
new values that were different from all possible strings. So now the
JS Symbol class is effectively the union of {strings, compared as
values} + {sentinels, compared by identity}. And for string
attributes, you can mostly ignore all this and pretend they're
ordinary strings and the JS interpreter will paper over the details.
So the main kind of symbol that JS devs actually have to *know* about
is the new sentinel values. And that's how the name "symbol" flipped
to mean the opposite of what it used to. See? I told you it was all
very logical.

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-28 Thread Nathaniel Smith
On Thu, Jun 28, 2018 at 2:25 PM, Andrei Kucharavy wrote:
>> This is indeed a serious problem. I suspect python-ideas isn't the
>> best venue for addressing it though – there's nothing here that needs
>> changes to the Python interpreter itself (I think), and the people who
>> understand this problem the best and who are most affected by it,
>> mostly aren't here.
>
> There has been localized discussion popping up among the large scientific
> package maintainers and some attempts to solve the problem at the local
> level. Until now they seemed to be winding down due to a lack of a
> large-scale citation mechanism and a discussion about what is concretely
> doable at the scale of the language is likely to finalize

Those are the people with the most motivation and expertise to solve
this, and whose buy-in you'll need on any solution. If they haven't
solved it yet themselves, then there are basically two reasons why
that happens: either because they're busy and no-one's had enough time
to work on it, or else because they're uncertain about the best path
forward. Neither of these is a problem that python-ideas can help
with. If you want to be effective here, you need to talk to them to
figure out how you can help them move forward.

If I were you, I'd try organizing a birds-of-a-feather at the next
SciPy conference, or start getting in touch with others working on
this (duecredit devs, the folks listed on that citationPEP thing,
etc.), and go from there. (Feel free to CC me if you do start up some
effort like this.)

> As for the list, reserving a __citation__/__cite__ for packages at the same
> level as __version__ is now reserved and adding a citation()/cite() function
> to the standard library seemed large enough modifications to warrant
> searching a buy-in from the maintainers and the community at large.

There isn't actually any formal method for registering special names
like __version__, and they aren't treated specially by the language.
They're just variables that happen to have a funny name. You shouldn't
start using them willy-nilly, but you don't actually have to ask
permission or anything. And it's not very likely that someone else
will come along and propose using the name __citation__ for something
that *isn't* a citation :-).

>> You'll want to check out the duecredit project:
>> https://github.com/duecredit/duecredit
>> One of the things they've thought about is the ability to track
>> citation information at a more fine-grained way than per-package – for
>> example, there might be a paper that should be cited by anyone who
>> calls a particular method (or even passes a specific argument to some
>> specific method, when that turns on some fancy algorithm).
>
>
> Due credit looks amazing - I will definitely check it out. The idea was,
> however, to bring the barrier for adoption and usage as low as possible. In
> my experience, the vast majority of Python users in academic environment who
> aren't citing the packages properly are beginners. As such they are unlikely
> to search for third-party libraries beyond those they've found and used to
> solve their specific problem.
>
>  who just assembled a pipeline based on widely-used libraries and would need
> to generate a citation list for it to pass on to their colleagues
> responsible for the paper assembly and submission.

The way to do this is to first get your solution implemented as a
third-party library and adopted by the scientific packages, and then
start thinking about whether it would make sense to move the library
into the standard library. It's relatively easy to move things into
the standard library. The hard part is making sure that you
implemented the right thing in the first place, and that's MUCH more
likely if you start out as a third-party package.

>> I'd actually like to see a more general solution that isn't restricted
>> to any one language, because multi-language analysis pipelines are
>> very common. For example, we could standardize a convention where if a
>> certain environment variable is set, then the software writes out
>> citation information to a certain location, and then implement
>> libraries that do this in multiple languages. Of course, that's a
>> "dynamic" solution that requires running the software -- which is
>> probably necessary if you want to do fine-grained citations, but it
>> might be useful to also have static metadata, e.g. as part of the
>> package metadata that goes into sdists, wheels, and on PyPI. That
>> would be a discussion for the distutils-sig mailing list, which
>> manages that metadata.
>
>
> Thanks for the reference to the distutils-sig list. I will talk to them if
> the idea gets traction here

I think you misunderstand how these lists work :-). (Which is fine --
it's actually pretty opaque and confusing if you don't already know!)
Generally, distutils-sig operates totally independently from
python-{ideas,dev} -- if you have a packaging proposal, it goes there
and not here; if you 

Re: [Python-ideas] Add a __cite__ method for scientific packages

2018-06-27 Thread Nathaniel Smith
On Wed, Jun 27, 2018 at 2:20 PM, Andrei Kucharavy wrote:
> To remediate to that situation, I suggest a __citation__ method associated
> to each package installation and import. Called from the __main__,
> __citation__() would scan __citation__ of all imported packages and return
> the list of all relevant top-level citations associated to the packages.
>
> As a scientific package developer working in academia, the problem is quite
> serious, and the solution seems relatively straightforward.
>
> What does Python core team think about addition and long-term maintenance of
> such a feature to the import and setup mechanisms? What do other users and
> scientific package developers think of such a mechanism for citations
> retrieval?

This is indeed a serious problem. I suspect python-ideas isn't the
best venue for addressing it though – there's nothing here that needs
changes to the Python interpreter itself (I think), and the people who
understand this problem the best and who are most affected by it,
mostly aren't here.

You'll want to check out the duecredit project:
https://github.com/duecredit/duecredit
One of the things they've thought about is the ability to track
citation information at a more fine-grained way than per-package – for
example, there might be a paper that should be cited by anyone who
calls a particular method (or even passes a specific argument to some
specific method, when that turns on some fancy algorithm).

The R world also has some prior art -- in particular I know they have
citations as part of the standard metadata in every package.

I'd actually like to see a more general solution that isn't restricted
to any one language, because multi-language analysis pipelines are
very common. For example, we could standardize a convention where if a
certain environment variable is set, then the software writes out
citation information to a certain location, and then implement
libraries that do this in multiple languages. Of course, that's a
"dynamic" solution that requires running the software -- which is
probably necessary if you want to do fine-grained citations, but it
might be useful to also have static metadata, e.g. as part of the
package metadata that goes into sdists, wheels, and on PyPI. That
would be a discussion for the distutils-sig mailing list, which
manages that metadata.

One challenge in standardizing this kind of thing is choosing a
standard way to represent citation information. Maybe CSL-JSON?
There's a lot of complexity as you dig into this, though of course one
shouldn't let the perfect be the enemy of the good...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Replacing Infinite while Loops with an Iterator: async edition

2018-06-23 Thread Nathaniel Smith
On Sat, Jun 23, 2018 at 5:58 PM, Greg Ewing  wrote:
> j...@math.brown.edu wrote:
>>
>> it would be nice if we could write an async version of this, as in ``async
>> for chunk in aiter(...)``.
>
> The time machine seems to have taken care of this:
>
> https://docs.python.org/3.6/reference/compound_stmts.html#the-async-for-statement

He's asking for an async version of the 'iter' builtin, presumably
something like:

async def aiter(async_callable, sentinel):
    while True:
        value = await async_callable()
        if value == sentinel:
            break
        yield value
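
Used together with the definition above, it might look something like
this (a sketch, with a dummy async callable standing in for real I/O):

import asyncio

async def demo():
    values = iter([1, 2, 3, None])

    async def next_value():
        await asyncio.sleep(0)  # stand-in for real async work
        return next(values)

    async for v in aiter(next_value, None):
        print(v)  # prints 1, 2, 3, then stops at the None sentinel

asyncio.run(demo())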

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Secure string disposal (maybe other inmutable seq types too?)

2018-06-22 Thread Nathaniel Smith
On Fri, Jun 22, 2018 at 6:45 PM, Steven D'Aprano  wrote:
> On Sat, Jun 23, 2018 at 01:33:59PM +1200, Greg Ewing wrote:
>> Chris Angelico wrote:
>> >Downside:
>> >You can't say "I'm done with this string, destroy it immediately".
>>
>> Also it would be hard to be sure there wasn't another
>> copy of the data somewhere from a time before you
>> got around to marking the string as sensitive, e.g.
>> in a file buffer.
>
> Don't let the perfect be the enemy of the good.

That's true, but for security features it's important to have a proper
analysis of the threat and when the mitigation will and won't work;
otherwise, you don't know whether it's even "good", and you don't know
how to educate people on what they need to do to make effective use of
it (or where it's not worth bothering).

Another issue: I believe it'd be impossible for this proposal to work
correctly on implementations with a compacting GC (e.g., PyPy),
because with a compacting GC strings might get copied around in memory
during their lifetime. And crucially, this might have already happened
before the interpreter was told that a particular string object
contained sensitive data. I'm guessing this is part of why Java and C#
use a separate type.

There's a lot of prior art on this in other languages/environments,
and a lot of experts who've thought hard about it. Python-{ideas,dev}
doesn't have a lot of security experts, so I'd very much want to see
some review of that work before we go running off designing something
ad hoc.

The PyCA cryptography library has some discussion in their docs:
https://cryptography.io/en/latest/limitations/

One possible way to move the discussion forward would be to ask the
pyca devs what kind of API they'd like to see in the interpreter, if
any.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Copy (and/or pickle) generators

2018-06-19 Thread Nathaniel Smith
You might find this useful, either to use directly or as a source of
inspiration:

https://github.com/ll/cloudpickle-generators

-n

On Tue, Jun 19, 2018, 15:55 Micheál Keane  wrote:

>
> Add a function to generator objects to copy the entire state of it:
>
> Proposed example code:
>
> game1 = complicated_game_type_thing()
>
> # Progress the game to the first decision point
> choices = game1.send(None)
>
> # Choose something
> response = get_a_response(choices)
>
> # Copy the game generator
> game2 = game1.copy()
>
> # send the same response to each game
> x = game1.send(response)
> y = game2.send(response)
>
> # verify the new set of choices is the same
> assert x == y
>
>
> History:
>
> I found this stackoverflow Q
>  
> which
> among other things linked to an in-depth explanation of why generators
> could not be pickled
>  and
> this enhancement request for 2.6  on
> the bugtracker. All the reasons given there are perfectly valid but
> they were also given nearly 10 years ago. It may be time to revisit the
> issue.
>
> I couldn't turn up any previous threads here related to this so I'm
> throwing it out for discussion.
>
>
> Use case:
>
> My work involves Monte Carlo Tree Searches of games, eventually in
> combination with tensorflow. MCTS involves repeatedly copying the state of
> a simulation to explore the potential outcomes of various choices in depth.
>
> If you're doing a game like Chess or Go, a game state is dead simple to
> summarize - you have a list of board positions with which pieces they have
> and whose turn it is.
>
> If you're doing complex games that don't have an easily summarized state
> at any given moment, you start running into problems. Think something
> along the lines of Magic the Gathering with complex turn sequences between
> players and effect resolutions being done in certain orders that are
> dependent on choices made by players, etc.
>
> Generators are an ideal way to run these types of simulations but the
> inability to copy the state of a generator makes it impossible to do this
> in MCTS.
>
> As Python is being increasingly used for data science, this use case will
> be increasingly common. Being able to copy generators will save a lot of
> work.
>
> Keep in mind, I don't necessarily propose that generators should be fully
> picklable; there are obviously a number of concerns and problems there.
> Just being able to duplicate the generator's state within the interpreter
> would be enough for my use case.
>
>
> Workarounds:
>
> The obvious choice is to refactor the simulation as an iterator that
> stores each state as something that's easily copied/pickled. It's probably
> possible but it'll require a lot of thought and code for each type of
> simulation.
>
> There's a Python2 package from 2009 called generator_tools
>  that purports to do this. I
> haven't tried it yet to see if it still works in 2.x and it appears beyond
> my skill level to port to 3.x.
>
> PyPy & Stackless Python apparently support this within certain limits?
>
>
> Thoughts?
>
>
> Washington, DC  USA
> ffaristoc...@gmail.com
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Approximately equal operator

2018-06-15 Thread Nathaniel Smith
On Fri, Jun 15, 2018 at 3:56 PM, Andre Roberge  wrote:
> * people doing heavy numerical work and wanting code as readable as possible

IME serious numerical work doesn't use approximate equality tests at
all, except in test assertions.

> * teaching mostly beginners about finite precision for floating point
> arithmetics

Given that approximate equality tests are almost never the right
solution, I would be worried that emphasizing them to beginners would
send them down the wrong path. This is already a common source of
confusion and trap for non-experts.

> * people wishing to have trigonometric functions with arguments in degrees,
> as in a current discussion on this forum.

AFAICT approximate equality checks aren't really useful for that, no.
(I also don't understand why people in that argument are so worried
about exact precision for 90° and 30° when it's impossible for all the
other angles.)

Python is *very* stingy with adding new operators; IIRC only 3 have
been added over the last ~30 years (**, //, @). I don't think ~= is
going to make it.
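
(For completeness, the closest existing spelling is a function rather
than an operator -- math.isclose:)

import math

print(0.1 + 0.2 == 0.3)                        # False
print(math.isclose(0.1 + 0.2, 0.3))            # True (default rel_tol=1e-9)
print(math.isclose(1.0, 1.001, rel_tol=1e-2))  # True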

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Fwd: Trigonometry in degrees

2018-06-12 Thread Nathaniel Smith
On Tue, Jun 12, 2018, 00:03 Stephan Houben  wrote:

> Hi all,
>
> I wrote a possible implementation of sindg:
>
> https://gist.github.com/stephanh42/336d54a53b31104b97e46156c7deacdd
>
> This code first reduces the angle to the [0,90] interval.
> After doing so, it can be observed that the simple implementation
>   math.sin(math.radians(angle))
> produces exact results for 0 and 90, and a result already rounded to
> nearest for
> 60.
>

You observed this on your system, but math.sin uses the platform libm,
which might do different things on other people's systems.


> For 30 and 45, this simple implementation is one ulp too low.
> So I special-case those to return the correct/correctly-rounded value
> instead.
> Note that this does not affect monotonicity around those values.
>

Again, monotonicity is preserved on your system, but it might not be on
others. It's not clear that this matters, but then it's not clear that any
of this matters...

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add hooks to asyncio lifecycle

2018-06-05 Thread Nathaniel Smith
Twisted's reactor API has some lifecycle hooks:

https://twistedmatrix.com/documents/18.4.0/api/twisted.internet.interfaces.IReactorCore.html#addSystemEventTrigger
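
For reference, usage looks roughly like this (a sketch from memory -- see
the linked docs for the full interface):

from twisted.internet import reactor

def before_shutdown():
    print("reactor is about to shut down")

# Run before_shutdown() during the "before shutdown" phase.
reactor.addSystemEventTrigger("before", "shutdown", before_shutdown)

reactor.callLater(1, reactor.stop)  # stop the reactor after one second
reactor.run()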

My impression is that this is actually pretty awkward for twisted/asyncio
interoperability, because if you're trying to use a twisted library on top
of an asyncio loop then there's no reliable way to implement these methods.
And twisted uses these internally for things like managing its thread pool.

There is some subtlety though because in twisted, reactors can't transition
from the stopped state back into the running state, which implies an
invariant where the start and shutdown hooks can be called at most once.

Anyway, I'm not a twisted expert, but wanted to flag this so you all know
that if you're talking about adding lifecycle hooks then you know to go
talk to them and get the details.

(Trio does have some sorta-kinda analogous functionality. Specifically it
has a concept of "system tasks" that are automatically cancelled when the
main task exits, so they have a chance to do any cleanup at that point. But
trio's lifecycle model is so different that I'm not sure how helpful this
is.)

-n

On Tue, Jun 5, 2018, 05:48 Michel Desmoulin 
wrote:

> After years of playing with asyncio, I'm still having a harder time
> using it than any other async architecture around. There are a lot of
> different reasons for it, but this mail want to address one particular one:
>
> The event loop and policy can be tweaked at any time, by anyone.
>
> Now, it's hard enough to have to deal, manually, with a low-level event
> loop. But having it exposed that much, and it being that flexible means
> any code can just do whatever it wants with it, and make a mess.
>
> Several things in particular, comes to mind:
>
> - Changing the event loop policy
> - Changing the event loop
> - Spawning a new loop
> - Starting the loop
> - Stopping the loop
> - Closing the loop
>
> Now, if you want to make any serious project with it, you currently have
> to guard against all of those, especially if you want to have proper
> cleanup code, good error message and a decent debugging experience.
>
> I tried to do it for one year, and currently, it's very hard. You have a
> lot of checks to make, redundantly in a lot of places. Some things can
> only be done by providing a custom event policy/loop yourself, and, of
> course, expecting (aka documenting and praying) that it's used.
>
> For a lot of things, when it breaks, the people that haven't read the
> doc in depth will have a hard time to understand the problem after the
> fact.
>
> Sometimes, it's just that your code use somebody else code that is not
> here to read your doc anymore. Now you have to check their code to
> understand what they are doing that breaks your expectations about the
> loop / policy or workflow.
>
> Barring the creating of an entire higher level framework that everybody
> will agree on using and that makes messing up way harder, we can improve
> this situation by adding hooks to those events.
>
> I hence propose to add:
>
> - asyncio.on_change_policy(cb:Callable[[EventLoopPolicy,
> EventLoopPolicy], EventLoopPolicy])
>
> - asyncio.on_set_event_loop(cb:Callable[[EventLoop, EventLoop], EventLoop])
>
> - asyncio.on_create_event_loop(cb:Callable[[EventLoop], EventLoop])
>
> - EventLoop.on_start(cb:Callable[EventLoop])
>
> - EventLoop.on_stop(cb:Awaitable[EventLoop])
>
> - EventLoop.on_close(cb:Callable[EventLoop])
>
> - EventLoop.on_set_debug_mode(cb:Callable[[loop]])
>
> This would allow to implement safer, more robust and easier to debug
> code. E.G:
>
> - you can raise a warning stating that if somebody changes the event
> policy, it must inherit from your custom one or deal with disabled features
>
> - you can raise an exception on loop swap and forbid it, saying that
> your small script doesn't support it yet so that it's easy to understand
> the limit of your code
>
> - you can hook on the event loop life cycle to automatically get on
> board, or run clean up code, starting logging, warn that you were
> supposed to start the loop yourself, etc


Re: [Python-ideas] High Precision datetime

2018-05-17 Thread Nathaniel Smith
On Thu, May 17, 2018 at 9:49 AM, Chris Barker via Python-ideas
 wrote:
> On Tue, May 15, 2018 at 11:21 AM, Rob Speer  wrote:
>>
>>
>> I'm sure that the issue of "what do you call the leap second itself" is
>> not the problem that Chris Barker is referring to. The problem with leap
>> seconds is that they create unpredictable differences between UTC and real
>> elapsed time.
>>
>> You can represent a timedelta of exactly 10^8 seconds, but if you add it
>> to the current time, what should you get? What UTC time will it be in 10^8
>> real-time seconds? You don't know, and neither does anybody else, because
>> you don't know how many leap seconds will occur in that time.
>
>
> indeed -- even if you only care about the past, where you *could* know the
> leap seconds -- they are, by their very nature, of second precision -- which
> means right before leap second occurs, your "time" could be off by up to a
> second (or a half second?)

Not really. There are multiple time standards in use. Atomic clocks
count the duration of time – from their point of view, every second is
the same (modulo relativistic effects). TAI is the international
standard based on using atomic clocks to count seconds since a fixed
starting point, at mean sea level on Earth.

Another approach is to declare that each day (defined as "the time
between the sun passing directly overhead the Greenwich Observatory
twice") is 24 * 60 * 60 seconds long. This is what UT1 does. The
downside is that since the earth's rotation varies over time, this
means that the duration of a UT1 second varies from day to day in ways
that are hard to estimate precisely.

UTC is defined as a hybrid of these two approaches: it uses the same
seconds as TAI, but every once in a while we add or remove a leap
second to keep it roughly aligned with UT1. This is the time standard
that computers use the vast majority of the time. Importantly, since
we only ever add or remove an integer number of seconds, and only at
the boundary in between seconds, UTC is defined just as precisely as
TAI.

So if you're trying to measure time using UT1 then yeah, your computer
clock is wrong all the time by up to 0.9 seconds, and we don't even
know what UT1 is more precisely than ~milliseconds. Generally it gets
slightly more accurate just after a leap second, but it's not very
precise either before or after. Which is why no-one does this.

But if you're trying to measure time using UTC, then computers with
the appropriate setup (e.g. at CERN, or in HFT data centers) routinely
have clocks accurate to <1 microsecond, and leap seconds don't affect
that at all.

The datetime module still isn't appropriate for doing precise
calculations over periods long enough to include a leap second though,
e.g. Python simply doesn't know how many seconds passed between two
arbitrary UTC timestamps, even if they were in the past.
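
(A quick illustration, relying on the leap second that was inserted at the end of 2016:)

    from datetime import datetime

    delta = datetime(2017, 1, 1) - datetime(2016, 12, 31)
    print(delta.total_seconds())  # 86400.0, even though 86401 SI seconds elapsed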

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Crazy idea: allow keywords as names in certain positions

2018-05-13 Thread Nathaniel Smith
On Sun, May 13, 2018 at 9:00 PM, Greg Ewing  wrote:
> Guido van Rossum wrote:
>>
>> Of course this would still not help for names of functions that might be
>> imported directly (do people write 'from numpy import where'?).
>
>
> Maybe things could be rigged so that if you use a reserved word
> as a name in an import statement, it's treated as a name everywhere
> else in that module. Then "from numpy import where" would Just Work.

'from numpy import *' is also common.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] High Precision datetime

2018-05-10 Thread Nathaniel Smith
You don't mention the option of allowing time.microseconds to be a
float, and I was curious about that since if it did work, then that
might be a relatively smooth extension of the current API. The highest
value you'd store in the microseconds field is 1e6, and at values
around 1e6, double-precision floating point has precision of about
1e-10:

In [8]: 1e6 - np.nextafter(1e6, 0)
Out[8]: 1.1641532182693481e-10

So that could represent values to precision of ~0.116 femtoseconds, or
116 attoseconds. Too bad. Femtosecond precision would cover a lot of
cases, if you really need attoseconds then it won't work.

-n


On Thu, May 10, 2018 at 1:30 PM, Ed Page  wrote:
> Greetings,
>
> Is there interest in a PEP for extending time, datetime / timedelta for 
> arbitrary or extended precision fractional seconds?
>
> My company designs and manufactures scientific hardware that typically 
> operate with nanoseconds -- sometimes even attoseconds -- levels of 
> precision.  We’re in the process of providing Python APIs for some of these 
> products and need  to expose the full accuracy of the data to our customers.  
> Doing so would allow developers to do things like timestamp analog 
> measurements for correlating with other events in their system, or precisely 
> schedule a future time event for correctly interoperating  with other 
> high-speed devices.
>
> The API we’ve been toying with is adding two new fields to time, datetime and 
> timedelta
> - frac_seconds (int)
> - frac_seconds_exponent (int or new SITimeUnit enum)
>
> time.microseconds would be turned into a property that wraps frac_seconds for 
> compatibility
>
> Challenges
> - Defining the new `max` or `resolution`
> - strftime / strptime.  I propose that we do nothing, just leave formatting / 
> parsing to use `microseconds` at best.  On the other hand, __str__ could just 
> specify the fractional seconds using scientific or engineering notation.
>
> Alternatives
> - My company create our own datetime library
>   - Continued fracturing of time ... ecosystem (datetime, arrow, pendulum, 
> delorean, datetime64, pandas.Timestamp – all of which offer varying degrees 
> of compatibility)
> - Add an `attosecond` field and have `microsecond` wrap this.
>   - Effectively same except hard code `frac_seconds_exponent` to lowest value
>   - The most common cases (milliseconds, microseconds) will always pay the 
> cost of using a bigint as compared to the proposal which is a "pay for what 
> you use" approach
>   - How do we define what is "good enough" precision?
> - Continue to subdivide time by adding `nanosecond` that is "nanoseconds 
> since last micosecond", `picosecond` that is "picoseconds since last 
> micnanosecond", and  `attosecond` field that is "attoseconds since last 
> picosecond"
>   - Possibly surprising API; people might expect `picosecond` to be an offset 
> since last second
>   - Messy base 10 / base 2 conversions
> - Have `frac_seconds` be a float
>   - This has precision issues.
>
> If anyone wants to have an impromptu BoF on the subject, I'm available at 
> PyCon.
>
> Thanks
> Ed Page



-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Please consider skipping hidden directories in os.walk, os.fwalk, etc.

2018-05-09 Thread Nathaniel Smith
There are hidden directories, and then there are hidden directories :-). It
makes sense to me to add an option to the stdlib functions to skip
directories (and files) that the system considers hidden, so I guess that
means dotfiles on Unix and files with the hidden attribute on Windows. But
if you want "smart" matching that has special knowledge of CVS directories
and so forth, then that seems like something that would fit better as a
library on PyPI.
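
(The dotfile half of that is nearly a one-liner today -- a sketch, ignoring the Windows hidden-attribute part:)

    import os

    def walk_visible(top):
        for root, dirs, files in os.walk(top):
            # Pruning 'dirs' in place stops os.walk from descending into them.
            dirs[:] = [d for d in dirs if not d.startswith('.')]
            yield root, dirs, [f for f in files if not f.startswith('.')]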

The rust "ignore" crate has a pretty good set of semantics, for reference.
It's not trivial, but it sure is handy :-):

https://docs.rs/ignore/0.4.2/ignore/struct.WalkBuilder.html

-n

On Tue, May 8, 2018, 00:43 Steve Barnes  wrote:

> In a lot of uses of os.walk it is desirable to skip version control
> directories, (which are usually hidden directories), to the point that
> almost all of the examples given look like:
>
> import os
> for root, dirs, files in os.walk(some_dir):
>     if 'CVS' in dirs:
>         dirs.remove('CVS')  # or .svn or .hg etc.
>     # do something...
>
> But of course there are many version control systems to the point that
> much of my personal code looks like, (note that I have to use a
> multitude of version control systems due to project requirements):
>
>
> import os
> vcs_dirs = ['.hg', '.svn', 'CVS', '.git', '.bzr']  # Version control directory names I know
>
> for root, dirs, files in os.walk(some_dir):
>     for dirname in vcs_dirs:
>         if dirname in dirs:
>             dirs.remove(dirname)
>
> I am sure that I am missing many other version control systems but the
> one thing that all of the ones that I am familiar with default to
> creating their files in hidden directories. I know that the above
> sometimes hits problems on Windows if someone manually created a
> directory and you end up with abortions such as Csv\ or .SVN 
>
> Since it could be argued that hidden directories are possibly more
> common than simlinks, (especially in the Windows world of course), and
> that hidden directories have normally been hidden by someone for a
> reason it seems to make sense to me to normally ignore them in directory
> traversal.
>
> Obviously there are also occasions when it makes sense to include VCS,
> or other hidden, directories files, (e.g. "Where did all of my disk
> space go?" or "delete recursively"), so I would like to suggest
> including in the os.walk family of functions an additional parameter to
> control skipping all hidden directories - either positively or negatively.
>
> Names that spring to mind include:
>   * nohidden
>   * nohidden_dirs
>   * hidden
>   * hidden_dirs
>
> This change could be made with no impact on current behaviour by
> defaulting to hidden=True (or nohidden=False) which would just about
> ensure that no existing code is broken or quite a few bugs in existing
> code could be quietly fixed, (and some new ones introduced), by
> defaulting to this behaviour.
>
> Since the implementation of os.walk has changed to use os.scandir which
> exposes the returned file statuses in the os.DirEntry.stat() the
> overhead should be minimal.
>
> An alternative would be to add another new function, say os.vwalk(), to
> only walk visible entries.
>
> Note that a decision would have to be made on whether to include such
> filtering when topdown is False, personally I am tempted to include the
> filtering so as to maintain consistency but ignoring the filter when
> topdown is False, (or if topdown is False and the hidden behaviour is
> unspecified), might make sense if the skipping of hidden directories
> becomes the new default (then recursively removing files & directories
> would still include processing hidden items by default).
>
> If this receives a positive response I would be happy to undertake the
> effort involved in producing a PR.
> --
> Steve (Gadget) Barnes
> Any opinions in this message are my personal opinions and do not reflect
> those of my employer.
>
> ---
> This email has been checked for viruses by AVG.
> http://www.avg.com
>


Re: [Python-ideas] __dir__ in which folder is this py file

2018-05-07 Thread Nathaniel Smith
On Mon, May 7, 2018, 03:45 Steven D'Aprano <st...@pearwood.info> wrote:

> On Sun, May 06, 2018 at 09:33:03PM -0700, Nathaniel Smith wrote:
>
> > How is
> >
> > data_path = __filepath__.parent / "foo.txt"
> >
> > more distracting than
> >
> > data_path = joinpath(dirname(__file__), "foo.txt")
>
>
> Why are you dividing by a string? That's weird.
>
> [looks up the pathlib docs]
>
> Oh, that's why. It's still weird.
>
> So yes, its very distracting.
>

Well, yes, you do have to know the API to use it, and if you happen to have
learned the os.path API but not the pathlib API then of course the os.path
API will look more familiar. I'm not sure what this is supposed to prove.


> First I have to work out what __filepath__ is, then I have to remember
> the differences between all the various flavours of pathlib.Path
> and suffer a moment or two of existential dread as I try to work out
> whether or not *this* specific flavour is the one I need. This might not
> matter for heavy users of pathlib, but for casual users, it's a big,
> intimidating API with:
>
> - an important conceptual difference between pure paths and
>   concrete paths;
> - at least six classes;
>

The docs could perhaps be more beginner friendly. For casual users, the
answer is always "you want pathlib.Path".

> - about 50 or so methods and properties
>

Yeah, filesystems have lots of operations. That's why before pathlib users
had to learn about os and os.path and shutil and glob and maybe some more
I'm forgetting.


> As far as performance goes, I don't think it matters that we could
> technically make pathlib imported lazily. Many people put all their
> pathname manipulations at the beginning of their script, so lazy or not,
> the pathlib module is going to be loaded *just after* startup, .
>
> For many scripts, this isn't going to matter, but for those who want to
> avoid the overhead of pathlib, making it lazy doesn't help. That just
> delays the overhead, it doesn't remove it.
>

AFAIK there were two situations where laziness has been mentioned in this thread:

- my suggestion that we delay loading pathlib until someone accesses
__filepath__. I don't actually know how to implement this so it was mostly
intended to try to spur new ideas, but if we could do it, the point of the
laziness would be so that scripts that didn't use __filepath__ wouldn't pay
for it.

- Nick's observation that pathlib could load faster if it loaded fnmatch
lazily. Since this is only used for a few methods, this would benefit any
script that didn't use those methods. (And for scripts that do need
fnmatch's functionality, without pathlib they'd just be importing it
directly, so pathlib importing it isn't really an extra cost.)

It's true that laziness isn't a silver bullet, though, yeah. We should also
look for ways to speed things up.

-n


Re: [Python-ideas] __dir__ in which folder is this py file

2018-05-06 Thread Nathaniel Smith
On Sun, May 6, 2018 at 8:47 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> On 7 May 2018 at 13:33, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Spit-balling: how about __filepath__ as a
>> lazily-created-on-first-access pathlib.Path(__file__)?
>>
>> Promoting os.path stuff to builtins just as pathlib is emerging as
>> TOOWTDI makes me a bit uncomfortable.
>
> pathlib *isn't* TOOWTDI, since it takes almost 10 milliseconds to import it,
> and it introduces a higher level object-oriented abstraction that's
> genuinely distracting when you're using Python as a replacement for shell
> scripting.

Hmm, the feedback I've heard from at least some folks teaching
intro-python-for-scientists is like, "pathlib is so great for
scripting that it justifies upgrading to python 3".

How is

data_path = __filepath__.parent / "foo.txt"

more distracting than

data_path = joinpath(dirname(__file__), "foo.txt")

? And the former gives you far more power: the full Path interface,
not just 2-3 common operations.
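
(And of course the explicit spelling already works today -- a sketch, with "foo.txt" standing in for whatever data file you need:)

    from pathlib import Path

    data_path = Path(__file__).resolve().parent / "foo.txt"
    text = data_path.read_text()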

Import times are certainly a consideration, but I'm uncomfortable with
jumping straight to adding things to builtins based on current import
times, without at least exploring options for speeding that up...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] __dir__ in which folder is this py file

2018-05-06 Thread Nathaniel Smith
Spit-balling: how about __filepath__ as a
lazily-created-on-first-access pathlib.Path(__file__)?

Promoting os.path stuff to builtins just as pathlib is emerging as
TOOWTDI makes me a bit uncomfortable.

On Sun, May 6, 2018 at 8:29 PM, Nick Coghlan  wrote:
> On 7 May 2018 at 12:35, Chris Angelico  wrote:
>>
>> On Mon, May 7, 2018 at 12:13 PM, Nick Coghlan  wrote:
>> > So I have a different suggestion: perhaps it might make sense to propose
>> > promoting a key handful of path manipulation operations to the status of
>> > being builtins?
>> >
>> > Specifically, the ones I'd have in mind would be:
>> >
>> > - dirname (aka os.path.dirname)
>> > - joinpath (aka os.path.join)
>>
>> These two are the basics of path manipulation. +1 for promoting to
>> builtins, unless pathlib becomes core (which I suspect isn't
>> happening).
>
>
> pathlib has too many dependencies to ever make the type available as a
> builtin:
>
> $ ./python -X importtime -c pass 2>&1 | wc -l
> 25
> $ ./python -X importtime -c "import pathlib" 2>&1 | wc -l
> 53
>
> It's a good way of unifying otherwise scattered standard library APIs, but
> it's overkill if all you want to do is to calculate and resolve some
> relative paths.
>
>>
>> > - abspath (aka os.path.abspath)
>>
>> Only +0.5 on this, as it has to do file system operations. It may be
>> worthwhile, instead, to promote os.path.normpath, which (like the
>> others) is purely processing the string form of the path. It'll return
>> the same value regardless of the file system.
>
>
> My rationale for suggesting abspath() over any of its component parts is
> based on a few key considerations:
>
> - "make the given path absolute" is a far more common path manipulation
> activitity than "normalise the given path" (most users wouldn't even know
> what the latter means - the only reason *I* know what it means is because I
> looked up the docs for abspath while writing my previous comment)
> - __file__ isn't always absolute (especially in __main__), so you need to be
> able to do abspath(__file__) in order to reliably apply dirname() more than
> once
> - it can stand in for both os.getcwd() (when applied to the empty string or
> os.curdir) and os.path.normpath() (when the given path is already absolute),
> so we get 3 new bits of builtin functionality for the price of one new
> builtin name
> - I don't want to read "normpath(joinpath(getcwd(), relpath))" when I could
> be reading "abspath(relpath)" instead
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Inline assignments using "given" clauses

2018-05-04 Thread Nathaniel Smith
On Fri, May 4, 2018 at 6:56 PM, Alexander Belopolsky
 wrote:
> On Fri, May 4, 2018 at 8:06 AM, Nick Coghlan  wrote:
>> ...
>> With that spelling, the three examples above would become:
>>
>> # Exactly one branch is executed here
>> if m given m = pattern.search(data):
>>     ...
>> elif m given m = other_pattern.search(data):
>>     ...
>> else:
>>     ...
>>
>> # This name is rebound on each trip around the loop
>> while m given m = pattern.search(remaining_data):
>>     ...
>>
>> # "f(x)" is only evaluated once on each iteration
>> result = [(x, y, x/y) for x in data if y given y = f(x)]
>
> I think this is a step in the right direction.  I stayed away from the
> PEP 572 discussions because while intuitively it felt wrong, I could
> not formulate what exactly was wrong with the assignment expressions
> proposals.  This proposal has finally made me realize why I did not
> like PEP 572.  The strong expression vs. statement dichotomy is one of
> the key features that set Python apart from many other languages and
> it makes Python programs much easier to understand.  Right from the
> title, "Assignment Expressions", PEP 572 was set to destroy the very
> feature that in my view is responsible for much of Python's success.

This is what makes me uncomfortable too. As Dijkstra once wrote:

"our intellectual powers are rather geared to master static relations
and ... our powers to visualize processes evolving in time are
relatively poorly developed. For that reason we should do (as wise
programmers aware of our limitations) our utmost to shorten the
conceptual gap between the static program and the dynamic process, to
make the correspondence between the program (spread out in text space)
and the process (spread out in time) as trivial as possible." [1]

Normally, Python code strongly maps *time* onto *vertical position*:
one side-effect per line. Of course there is some specific
order-of-operations for everything inside an individual line that the
interpreter has to keep track of, but I basically never have to care
about that myself. But by definition, := involves embedding
side-effects within expressions, so suddenly I do have to care after
all. Except... for the three cases Nick wrote above, where the
side-effect occurs at the very end of the evaluation. And these also
seem to be the three cases that have the most compelling use cases
anyway. So restricting to just those three cases makes it much more
palatable to me.

(I won't comment on Nick's actual proposal, which is a bit more
complicated than those examples, since it allows things like 'if
m.group(1) given m = ...'.)

(And on another note, I also wonder if all this pent-up desire to
enrich the syntax of comprehensions means that we should add some kind
of multi-line version of comprehensions, that doesn't require the
awkwardness of explicitly accumulating a list or creating a nested
function to yield out of. Not sure what that would look like, but
people sure seem to want it.)

-n

[1] This is from "Go to statement considered harmful". Then a few
lines later he uses a sequence of assignment statements as an example,
and says that the wonderful thing about this example is that there's a
1-1 correspondence between lines and distinguishable program states,
which is also uncannily apropos.

-- 
Nathaniel J. Smith -- https://vorpus.org


[Python-ideas] Auto-wrapping coroutines into Tasks

2018-05-04 Thread Nathaniel Smith
Hi all,

This is a bit of a wacky idea, but I think it might be doable and have
significant benefits, so throwing it out there to see what people
think.

In asyncio, there are currently three kinds of calling conventions for
asynchronous functions:

1) Ones which return a Future
2) Ones which return a raw coroutine object
3) Ones which return a Future, but are documented to return a
coroutine object, because we want to possibly switch to doing that in
the future and are hoping people won't depend on them returning a
Future

In practice these have slightly different semantics. For example,
types (1) and (3) start executing immediately, while type (2) doesn't
start executing until passed to 'await' or some function like
asyncio.gather. For type (1), you can immediately call
.add_done_callback:

  func_returning_future().add_done_callback(...)

while for type (2) and (3), you have to explicitly call ensure_future first:

  asyncio.ensure_future(func_returning_coro()).add_done_callback(...)

In practice, these distinctions are mostly irrelevant and annoying to
users; the only thing you can do with a raw coroutine is pass it to
ensure_future() or equivalent, and the existence of type (3) functions
means that you can't even assume that functions documented as
returning raw coroutines actually return raw coroutines, or that these
will stay the same across versions. But it is a source of confusion,
see e.g. this thread on async-sig [1], or this one [2]. It also makes
it harder to evolve asyncio, since any function documented as
returning a Future cannot take advantage of async/await syntax. And
it's forced the creation of awkward APIs like the "coroutine hook"
used in asyncio's debug mode.

Other languages with async/await, like C# and Javascript, don't have
these problems, because they don't have raw coroutine objects at all:
when you mark a function as async, that directly converts it into a
function that returns a Future (or local equivalent). So the
difference between async functions and Future-returning functions is
only relevant to the person writing the function; callers don't have
to care, and can assume that the full Future interface is always
available.

I think Python did a very smart thing in *not* hard-coding Futures
into the language, like C#/JS do. But, I also think it would be nice
if we didn't force regular asyncio users to be aware of all these
details.

So here's an idea: we add a new kind of hook that coroutine runners
can set. In async_function.__call__, it creates a coroutine object,
and then invokes this hook, which then can wrap the coroutine into a
Task (or Deferred or whatever is appropriate for the current coroutine
runner). This way, from the point of view of regular asyncio users,
*all* async functions become functions-returning-Futures (type 1
above):

async def foo():
    pass

# This returns a Task running on the current loop
foo()

Of course, async loops need a way to get at the actual coroutine
objects, so we should also provide some method on async functions to
do that:

foo.__corocall__() -> returns a raw coroutine object

And as an optimization, we can make 'await <call>' invoke this, so
that in regular async function -> async function calls, we don't pay
the cost of setting up an unnecessary Task object:

# This
await foo(*args, **kwargs)
# Becomes sugar for:
try:
    _callable = foo.__corocall__
except AttributeError:
    # Fallback, so 'await function_returning_promise()' still works:
    _callable = foo
_awaitable = _callable(*args, **kwargs)
await _awaitable

(So this hook is actually quite similar to the existing coroutine
hook, except that it's specifically only invoked on bare calls, not on
await-calls.)

Of course, if no coroutine runner hook is registered, then the default
should remain the same as now. This also means that common idioms
like:

loop.run_until_complete(asyncfn())

still work, because at the time asyncfn() is called, no loop is
running, asyncfn() silently returns a regular coroutine object, and
then run_until_complete knows how to handle that.
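
(To make the intended default concrete, here is a rough approximation you could write today as a decorator -- a sketch only, since the real proposal hooks async_function.__call__ itself and the loop detection here is simplified:)

    import asyncio
    import functools

    def autowrap(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            coro = fn(*args, **kwargs)
            try:
                loop = asyncio.get_event_loop()
            except RuntimeError:
                return coro                         # no loop at all: behave like today
            if loop.is_running():
                return asyncio.ensure_future(coro)  # hand back a Task
            return coro                             # loop not running: raw coroutine
        return wrapper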

This would also help libraries like Trio that remove Futures
altogether; in Trio, the convention is that 'await asyncfn()' is
simply the only way to call asyncfn, and writing a bare 'asyncfn()' is
always a mistake – but one that is currently confusing and difficult
to detect because all it does is produce a warning ("coroutine was
never awaited") at some potentially-distant location that depends on
what the GC does. In this proposal, Trio could register a hook that
raises an immediate error on bare 'asyncfn()' calls.

This would also allow libraries built on Trio-or-similar to migrate a
function from sync->async or async->sync with a deprecation period.
Since in Trio sync functions would always use __call__, and async
functions would always use __corocall__, then during a transition
period one could use a custom object that defines both, and has one of
them emit a DeprecationWarning. This is a problem that comes up a 

Re: [Python-ideas] Objectively Quantifying Readability

2018-05-01 Thread Nathaniel Smith
On Tue, May 1, 2018, 02:55 Matt Arcidy  wrote:

>
> I am not inferring causality when creating a measure.


No, but when you assume that you can use that measure to *make* code more
readable, then you're assuming causality.

> Measuring the
> temperature of a steak doesn't infer why people like it medium rare.
> It just quantifies it.
>

Imagine aliens who have no idea how cooking works decide to do a study of
steak rareness. They go to lots of restaurants, order steak, ask people to
judge how rare it was, and then look for features that could predict these
judgements.

They publish a paper with an interesting finding: it turns out that
restaurant decor is highly correlated with steak rareness. Places with
expensive leather seats and chandeliers tend to serve steak rare, while
cheap diners with sticky table tops tend to serve it well done.

(I haven't done this study, but I bet if you did then you would find this
correlation is actually true in real life!)

Now, should we conclude based on this that if we want to get rare steak,
the key is to *redecorate the dining room*? Of course not, because we
happen to know that the key thing that changes the rareness of steak is how
it's exposed to heat.

But for code readability, we don't have this background knowledge; we're
like the aliens. Maybe the readability metric in this study is like
quantifying temperature; maybe it's like quantifying how expensive the
decor is. We don't know.

(This stuff is extremely non-obvious; that's why we force
scientists-in-training to take graduate courses on statistics and
experimental design, and it still doesn't always take.)


> > And yeah, it doesn't help that they're only looking at 3 line blocks
> > of code and asking random students to judge readability – hard to say
> > how that generalizes to real code being read by working developers.
>
> Respectfully, this is practical application and not a PhD defense,  so
> it will be generated by practical coding.
>

Well, that's the problem. In a PhD defense, you can get away with this kind
of stuff; but in a practical application it has to actually work :-). And
generalizability is a huge issue.

People without statistical training tend to look at studies and worry about
how big the sample size is, but that's usually not the biggest concern; we
have ways to quantify how big your sample needs to be. The bigger problem
is whether your sample is *representative*. If you're trying to guess who
will become governor of California, then if you had some way to pick voters
totally uniformly at random, you'd only need to ask 50 or 100 of them how
they're voting to get an actually pretty good idea of what all the millions
of real votes will do. But if you only talk to Republicans, it doesn't
matter how many you talk to, you'll get a totally useless answer. Same if
you only talk to people of the same age, or who all live in the same town,
or who all have land-line phones, or... This is what makes political
polling difficult, is getting a representative sample.

Similarly, if we only look at out-of-context Java read by students, that
may or may not "vote the same way" as in-context Python read by the average
user. Science is hard :-(.

-n

>


Re: [Python-ideas] Objectively Quantifying Readability

2018-05-01 Thread Nathaniel Smith
On Mon, Apr 30, 2018 at 8:46 PM, Matt Arcidy  wrote:
> On Mon, Apr 30, 2018 at 5:42 PM, Steven D'Aprano  wrote:
>> (If we know that, let's say, really_long_descriptive_identifier_names
>> hurt readability, how does that help us judge whether adding a new kind
>> of expression will hurt or help readability?)
>
> A new feature can remove symbols or add them.  It can increase density
> on a line, or remove it.  It can be a policy of variable naming, or it
> can specifically note that variable naming has no bearing on a new
> feature.  This is not limited in application.  It's just scoring.
> When anyone complains about readability, break out the scoring
> criteria and assess how good the _comparative_ readability claim is:
> 2 vs 10?  4 vs 5?  The arguments will no longer be singularly about
> "readability," nor will the be about the question of single score for
> a specific statement.  The comparative scores of applying the same
> function over two inputs gives a relative difference.  This is what
> measures do in the mathematical sense.

Unfortunately, they kind of study they did here can't support this
kind of argument at all; it's the wrong kind of design. (I'm totally
in favor of making more evidence-based decisions about language design,
but interpreting evidence is tricky!) Technically speaking, the issue
is that this is an observational/correlational study, so you can't use
it to infer causality. Or put another way: just because they found
that unreadable code tended to have a high max variable length,
doesn't mean that taking those variables and making them shorter would
make the code more readable.

This sounds like a finicky technical complaint, but it's actually a
*huge* issue in this kind of study. Maybe the reason long variable
length was correlated with unreadability was that there was one
project in their sample that had terrible style *and* super long
variable names, so the two were correlated even though they might not
otherwise be related. Maybe if you looked at Perl, then the worst
coders would tend to be the ones who never ever used long variables
names. Maybe long lines on their own are actually fine, but in this
sample, the only people who used long lines were ones who didn't read
the style guide, so their code is also less readable in other ways.
(In fact they note that their features are highly correlated, so they
can't tell which ones are driving the effect.) We just don't know.

And yeah, it doesn't help that they're only looking at 3 line blocks
of code and asking random students to judge readability – hard to say
how that generalizes to real code being read by working developers.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Should __builtins__ have some kind of pass-through print function, for debugging?

2018-04-28 Thread Nathaniel Smith
On Sat, Apr 28, 2018 at 7:29 PM, Greg Ewing  wrote:
> but he sent it in HTML using a proportional font, which spoils the effect!

Uh...? https://vorpus.org/~njs/tmp/monospace.png

It looks like my client used "font-family: monospace", maybe yours
only understands  or something? Anyway, if anyone else is having
trouble viewing it, it seems to have come through correctly in the
archives:

https://mail.python.org/pipermail/python-ideas/2018-April/050137.html

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Should __builtins__ have some kind of pass-through print function, for debugging?

2018-04-28 Thread Nathaniel Smith
On Fri, Apr 27, 2018 at 5:58 AM, Chris Angelico  wrote:
> On Fri, Apr 27, 2018 at 9:27 PM, Steven D'Aprano wrote:
> I don't think this needs any specific compiler magic or making 'dp' a
> reserved name, but it might well be a lot easier to write if there
> were some compiler features provided to _all_ functions. For instance,
> column positions are currently available in SyntaxErrors, but not
> other exceptions:
>
> >>> x = 1
> >>> print("spam" + x)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can only concatenate str (not "int") to str
> >>> print("spam" : x)
>   File "<stdin>", line 1
>     print("spam" : x)
>                  ^
> SyntaxError: invalid syntax
>
> Imagine if the TypeError could show a caret, pointing to the plus
> sign. That would require that a function store column positions, not
> just line numbers. I'm not sure how much overhead it would add, nor
> how much benefit you'd really get from those markers, but it would
> then be the same mechanic for exception tracebacks and for
> semi-magical functions like this.

Being able to add carets to tracebacks in general would be quite nice
actually. Imagine:

Traceback (most recent call last):
  File "/tmp/blah.py", line 16, in 
print(foo())
  ^
  File "/tmp/blah.py", line 6, in foo
return bar(1) + bar(2)
^^
 File "/tmp/blah.py", line 10, in bar
return baz(2 * x) / baz(2 * x + 1)
   ^^
  File "/tmp/blah.py", line 13, in baz
return 1 + 1 / (x - 4)
   ^^^
ZeroDivisionError: division by zero

This is how I report error messages in patsy[1], and people seem to
appreciate it... it would also help Python catch back up with other
languages whose error reporting has gotten much friendlier in recent years
(e.g., rust, clang).

Threading column numbers through the compiler might be tedious but AFAICT
should be straightforward in principle. (Peephole optimizations and similar
might be a bit of a puzzle, but you can do pretty crude things like saying
new_span_start = min(*old_span_starts); new_span_end = max(*old_span_ends)
and still get something that's at least useful, even if not necessarily
100% theoretically accurate.) The runtime overhead would be essentially
zero, since this would be a static table that only gets consulted when
printing tracebacks, similar to the lineno table. (Tracebacks already
preserve f_lasti.)
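
(The raw data is already visible on any traceback object today -- a quick sketch:)

    import sys

    try:
        1 / 0
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        # bytecode offset and line number of the failing instruction
        print(tb.tb_lasti, tb.tb_lineno)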

So I think the main issue would be the extra memory in each code object to
hold the bytecode offset -> column numbers table. We'd need some actual
numbers to judge this for real, but my guess is that the gain in
usability+friendliness would be easily worth it for 99% of users, and the
other 1% are already plotting how to add options to strip out unnecessary
things like type annotations so if it's a problem then this could be
another thing for them to add to their list – leave out these tables at
-OOO or whatever.

-n

[1] https://patsy.readthedocs.io/en/latest/overview.html

-- 
Nathaniel J. Smith -- https://vorpus.org


[Python-ideas] Should __builtins__ have some kind of pass-through print function, for debugging?

2018-04-27 Thread Nathaniel Smith
Hi all,

This came up in passing in one of the PEP 572 threads, and I'm curious
if folks think it's a good idea or not. When debugging, sometimes you
have a somewhat complicated expression that's not working:

# Hmm, is func2() returning the right thing?
while (func1() + 2 * func2()) < func3():
    ...

It'd be nice to print out what func2() returns, but to do that we have
to refactor this code, which might be rather tricky in a case like
this. I think if you want to use print() directly here, the simplest
way to do that is:

while True:
    tmp = func2()
    print(tmp)
    if not (func1() + 2 * tmp) < func3():
        break
    ...

Obviously this is annoying and error prone – especially for beginners,
who are the ones most likely to need to print out lots of stuff to
figure out why their code isn't working. (Chris Angelico mentioned
that he finds this to be a common problem when teaching beginners.)

There is a better way: if you define a trivial helper like:

# "debug print": prints and then returns its argument
def dp(obj):
    print(repr(obj))
    return obj

then the rewritten code becomes:

while (func1() + 2 * dp(func2())) < func3():
    ...

Of course, this is trivial -- for me or you. But the leap to first
realize that this is a useful thing, and then implement it correctly,
is really asking a lot of beginners, who by assumption are struggling
to do *anything* with Python syntax. And similarly, putting a package
on PyPI is useful (cf. the venerable 'q' package), but still adds a
significant barrier to entry: you need to be able to install packages,
and you need to add an import. In fact, I can imagine that you might
want to teach this trick even before you teach what imports are.

So, would it make sense to include a utility like this in __builtins__?

PEP 553, the breakpoint() builtin, provides some relevant precedent.
Looking at it, I see it also emphasized the value of letting IDEs
override the debugger, and I can see some similar value here: e.g.
fancy REPLs like Spyder or Jupyter could potentially capture the
objects passed to dp() and make them available for interactive viewing
(imagine if they're like a large dataframe or something).

Points to argue over if people like the general idea:

- The name: p(), dp(), debug(), debugprint(), ...?
- __str__ or __repr__? Presumably __repr__ since it's a debugging tool.
- Exact semantics: there should probably be some way to add a bit of
metadata that gets printed out, for cases like:

while (dp(func1(), "func1") + 2 * dp(func2(), "func2")) < dp(func3(), "func3"):
    ...

Maybe other tweaks would be useful as well.
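
(One possible shape for the metadata variant, as a sketch:)

    def dp(obj, label=None):
        if label is None:
            print(repr(obj))
        else:
            print(f"{label}: {obj!r}")
        return obj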

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Providing a public API for creating and parsing HTTP messages

2018-04-17 Thread Nathaniel Smith
On Mon, Apr 16, 2018 at 11:21 PM, Derek Maciel  wrote:
> The modules http.client and http.server both do a wonderful job when
> implementing HTTP clients and servers, respectively. However, occasionally
> there may be a need to create and parse HTTP messages themselves without
> needing to implement a client or server.

The way http.client/http.server are written, the code for creating and
parsing messages is very tangled up with the code for sending and
receiving data, so this wouldn't be easy to do without rewriting them
from scratch. But would you accept a third-party package?
https://h11.readthedocs.io
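
(Very roughly, message construction there is decoupled from any socket -- a sketch along the lines of the h11 docs; details may differ between versions:)

    import h11

    conn = h11.Connection(our_role=h11.CLIENT)
    data = conn.send(h11.Request(method="GET", target="/",
                                 headers=[("Host", "example.com")]))
    data += conn.send(h11.EndOfMessage())
    # 'data' now holds the serialized request; shipping it over a socket is up to you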

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] New PEP proposal -- Pathlib Module Should Contain All File Operations

2018-03-20 Thread Nathaniel Smith
On Tue, Mar 20, 2018 at 1:03 AM, Wes Turner  wrote:
> I added trio to the comparison table
> (Things are mostly just async-wrapped,
> though pathlib_not_trio does show a few missing methods?).

trio.Path is an automatically generated, exact mirror of pathlib.Path,
so I don't think it's very useful to have in your table? Also the
missing attributes are actually handled via __getattr__, so they
aren't actually missing, they're just invisible to your detection
mechanism :-)

In [21]: trio.Path("/a/b").anchor
Out[21]: '/'

In [22]: trio.Path("/a/b").name
Out[22]: 'b'

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] New PEP proposal -- Pathlib Module Should Contain All File Operations

2018-03-18 Thread Nathaniel Smith
On Sun, Mar 18, 2018 at 4:58 AM, George Fischhof  wrote:
> Of course several details could be put into it, but I think it would better
> to let the developers decide the details, because they know the environment
> and the possibilities.

That's not how PEPs work :-). Someone has to do the work of collating
contradictory feedback and making opinionated design proposals, and
the person who does that is called the PEP author.

In this case, I'd also suggest framing the PEP as a list of specific
things that should be added to pathlib.Path, with justifications for
each. If your argument is "X should be in pathlib because it's in some
other module", then that's not very compelling -- by definition it
already means we have an X, so why do we need another? I think for a
number of these cases there actually is a good answer to that
question, but your PEP has to actually provide that answer :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] New PEP proposal -- Pathlib Module Should Contain All File Operations

2018-03-17 Thread Nathaniel Smith
On Sat, Mar 17, 2018 at 10:15 AM, Stephen J. Turnbull
 wrote:
> (5) perform operations on several objects denoted by Paths at once
> (copy and its multiple operand variants),

Sure it does: Path.rename and Path.replace. I know why rename and copy
have historically been in separate modules, but the distinction is
pretty arcane and matters a lot more to implementers than it does to
users.

Similarly, it's hard to explain why we have Path.mkdir but not
Path.makedirs -- and these have historically both lived in the 'os'
module, so we can't blame it on Path being a mirror of os.path. It's
also not obvious why we should have Path.rmdir, but not Path.rmtree.

My understanding is that the point of Path is to be a convenient,
pleasant-to-use mechanism for accessing common filesystem operations.
And it does a pretty excellent job of that. But it seems obvious to me
that it's still missing a number of fairly basic operations that
people need all the time. I don't think the PEP is there yet, and we
can quibble over the details -- just copying over all the historical
decisions in shutil isn't obviously the right move (maybe it should be
Path.mkdir(include_parents=True) and Path.unlink(recursive=True)
instead of Path.makedirs and Path.rmtree?), but there's definitely
room for improvement.
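
(For reference, the closest spellings available today -- a sketch; the recursive delete still has to fall back to shutil:)

    from pathlib import Path
    import shutil

    p = Path("some/deep/dir")                # example path
    p.mkdir(parents=True, exist_ok=True)     # the os.makedirs equivalent
    shutil.rmtree(p)                         # no Path.rmtree today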

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Adding quantile to the statistics module

2018-03-17 Thread Nathaniel Smith
On Fri, Mar 16, 2018 at 11:19 PM, Stephen J. Turnbull
 wrote:
> PLIQUE Guillaume writes:
>
>  > That's really interesting. I did not know there were so many way to
>  > consider quantiles. Maybe we should indeed wait for numpy to take a
>  > decision on the matter and go with their default choice so we remain
>  > consistent with the ecosystem?
>
> The example of R with 9 variants baked into one function suggests that
> numpy is unlikely to come up with a single "good" choice.  If R's
> default is to Steven's taste, I would say go with that for cross-
> language consistency, and hope that numpy makes the same decision.  In
> fact, I would argue that numpy might very well make a decision for a
> default that has nice mathematical properties, while the stdlib module
> might very well prefer consistency with R's default since defaults
> will be used in the same kind of "good enough for government work"
> contexts in both languages.

NumPy already has a default and supports a number of variants. I'd
have to go digging to figure out which languages/tools use which
methods and how those match to theoretical properties, but IIRC numpy,
R, and matlab all have different defaults.

The 9 types that R supports come from a well-known review article
(Hyndman & Fan, 1996). Their docs note that Hyndman & Fan's
recommendation is different from the default, because the default was
chosen to match a previous package (S) before they read Hyndman & Fan.
It's all a bit messy.

None of this is to say that Python shouldn't have some way to compute
quantiles, but unfortunately you're not going to find TOOWTDI.
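
(For a flavor of what one common variant looks like -- a sketch of the linear-interpolation method, Hyndman & Fan's "type 7", which is at least R's default:)

    def quantile(data, p):
        # interpolate linearly between the two nearest order statistics
        xs = sorted(data)
        if not xs:
            raise ValueError("no data points")
        h = (len(xs) - 1) * p
        lo = int(h)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])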

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Adding quantile to the statistics module

2018-03-15 Thread Nathaniel Smith
On Thu, Mar 15, 2018 at 12:39 PM, PLIQUE Guillaume
 wrote:
> Hello everyone,
>
> Sorry if this subject has already been covered in the mailing list but I
> could not find it.
>
> My question is very simple: should the `quantile` function be added to
> python `statistics` module.

This seems like a reasonable idea to me -- but be warned that there
are actually quite a few slightly-different definitions of "quantile"
in use. R supports 9 different methods of calculating quantiles
(exposed via an interesting API: their quantile function takes a type=
argument, which is an integer between 1 and 9; the default is 7). And
there's currently an open issue at numpy discussing whether numpy
implements the right approaches:
https://github.com/numpy/numpy/issues/10736
So this would require some research to decide on which definition(s)
you wanted to support.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] New PEP proposal -- Pathlib Module Should Contain All File Operations

2018-03-14 Thread Nathaniel Smith
On Mar 12, 2018 1:57 PM, "George Fischhof"  wrote:

This PEP proposes pathlib module to be a centralized place for all
file-system related operations.


I'd find this useful for another reason that hasn't been mentioned yet:
having a single class collecting all the common/basic file operations makes
it much easier to provide a convenient async version of that interface. For
example: https://trio.readthedocs.io/en/latest/reference-io.html#trio.Path

Obviously it can never be complete, since there are always going to be
standalone functions that take paths and work on them internally (for
example in third party libraries), but the operations we're talking about
here are all pretty basic primitives.

-n


Re: [Python-ideas] Add MutableSet.update?

2018-03-10 Thread Nathaniel Smith
On Sat, Mar 10, 2018 at 2:45 PM, Chris Barker  wrote:
> On Sat, Mar 10, 2018 at 3:24 AM, MRAB  wrote:
>>
>> On 2018-03-10 01:15, Guido van Rossum wrote:
>>>
>>> Yes, you can use the |= operator instead.
>>>
>> |= is not quite the same as .update because it rebinds,
>
>
> isn't that an "in-place operator" i.e. if it's a mutable object it should
> mutate rather than rebinding?

Normally on a mutable object, |= will mutate the object in-place AND
ALSO rebind the name to the same object it started with (like writing
'x = x'). The latter part raises an error if 'x' is not a legal target
for assignment.
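
(A quick illustration of that second part -- the mutation happens even when the rebinding fails:)

    t = (set(),)
    try:
        t[0] |= {1}      # tuples don't support item assignment...
    except TypeError:
        pass
    print(t[0])          # {1} -- ...but the set was mutated in place anyway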

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Class autoload

2018-03-03 Thread Nathaniel Smith
On Sat, Mar 3, 2018 at 9:12 AM, Jamesie Pic  wrote:
>
> Hello everybody,
>
> I thought perhaps we could allow the usage of a "new" keyword to instanciate
> an object, ie:
>
>obj = new yourmodule.YourClass()
>
> In this case, it would behave the same as from yourmodule import YourClass;
> obj = YourClass(), except that it wouldn't need to be imported. This would
> also eliminate the need to manage an import list at the beginning of a
> script in most case.

The 'py' library has something like this for stdlib libraries. You
could imagine extending it to handle arbitrary auto-imports, e.g.

  import auto_import as ai

  obj = ai.yourmodule.YourClass()

The 'py' version never really caught on, but if you really like the
idea there's nothing stopping you from implementing and using
something similar today.
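
(A toy sketch of what that could look like on Python 3.7+, using a module-level __getattr__ -- "auto_import" is a made-up module name:)

    # auto_import.py
    import importlib

    def __getattr__(name):
        return importlib.import_module(name)

Then 'import auto_import as ai; ai.collections.OrderedDict()' works, and so
would the ai.yourmodule.YourClass() spelling above.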

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Consider making Decimal's context use PEP 567

2018-02-07 Thread Nathaniel Smith
On Feb 7, 2018 1:54 PM, "Neil Girdhar"  wrote:

Decimal could just pull its Context object from a context variable rather
than having to pass it in to all functions.  This would be similar to how
numpy works.


Decimal has always used a thread local context the same way numpy does, and
in 3.7 it's switching to use a PEP 567 context:

https://bugs.python.org/issue32630

-n


Re: [Python-ideas] Format mini-language for lakh and crore

2018-01-28 Thread Nathaniel Smith
On Sun, Jan 28, 2018 at 5:31 PM, David Mertz  wrote:
> I actually didn't know about `locale.format("%d", 10e9, grouping=True)`.
> But it's still much less general than having the option in the
> f-string/.format() mini-language.  This is really about the formatted
> string, not necessarily about the locale.  So, e.g. I'd like to be able to
> write:
>
 print(f"In European format x is {x:,.2f}, in Indian format it is
 {x:`.2f}")
>
> I don't want the format necessarily to be some pseudo-global setting, even
> if it can get stored in thread-locals.  That said, having a locale-aware
> symbol for delimiting numbers in the format mini-language would also not be
> a bad thing.

I don't understand the format mini-language well enough to know what
would fit in, but maybe some way to (a) request localified formatting,
(b) some way to explicitly say which locale you want to use? Like if
"h" means "human friendly", it might be something like:

f"In the current locale x is {x:h.2f}, in Indian format it is {x:h(en_IN).2f}"

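(For comparison, the locale-module spelling of roughly the same thing
today -- global state, and it assumes the system actually has an en_IN
locale installed:)

import locale

locale.setlocale(locale.LC_ALL, "en_IN.UTF-8")   # assumes this locale exists
print(locale.format_string("%.2f", 10000000, grouping=True))
# e.g. 1,00,00,000.00
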
-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Format mini-language for lakh and crore

2018-01-28 Thread Nathaniel Smith
On Sun, Jan 28, 2018 at 5:46 AM, Eric V. Smith  wrote:
> If I recall correctly, we discussed this at the time, and the problem with
> locale is that it's not thread safe. I agree that if it were, it would be
> nice to be able to use it, either with 'n', or in some other mode just for
> grouping.
>
> The underlying C setlocale()/localeconv() just isn't very friendly to this
> use case.

POSIX.1-2008 added thread-local locales (say that 3x fast); see
uselocale(3). This appears to be supported on Linux (since glibc 2.3,
which is older than all supported enterprise distros), MacOS, and the
BSDs, but not Windows. OTOH Windows, MacOS, and the BSDs all seem to
provide the non-standard sprintf_l, which takes an explicit locale to
use.

So it looks like all mainstream OSes actually make it possible to use
a specific locale to do arbitrary formatting in a thread-safe way.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support WHATWG versions of legacy encodings

2018-01-18 Thread Nathaniel Smith
On Thu, Jan 18, 2018 at 7:51 PM, Guido van Rossum  wrote:
> Can someone explain to me why this is such a controversial issue?

I guess practicality versus purity is always controversial :-)

> It seems reasonable to me to add new encodings to the stdlib that do the
> roundtripping requested in the first message of the thread. As long as they
> have new names that seems to fall under "practicality beats purity".
> (Modifying existing encodings seems wrong -- did the feature request somehow
> transmogrify into that?)

Someone did discover that Microsoft's current implementations of the
windows-* encodings matches the WHAT-WG spec, rather than the Unicode
spec that Microsoft originally wrote. So there is some argument that
the Python's existing encodings are simply out of date, and changing
them would be a bugfix. (And standards aside, it is surely going to be
somewhat error-prone if Python's windows-1252 doesn't match everyone
else's implementations of windows-1252.) But yeah, AFAICT the original
requesters would be happy either way; they just want it available
under some name.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support WHATWG versions of legacy encodings

2018-01-17 Thread Nathaniel Smith
On Wed, Jan 17, 2018 at 10:13 AM, Rob Speer  wrote:
> I'm going to push back on the idea that this should only be used for
> decoding, not encoding.
>
> The use case I started with -- showing people how to fix mojibake using
> Python -- would *only* use these codecs in the encoding direction. To fix
> the most common case of mojibake, you encode it as web-1252 and decode it as
> UTF-8 (because you got the data from someone who did the opposite).

It's also nice to be able to parse some HTML data, make a few changes
in memory, and then serialize it back to HTML. Having this crash on
random documents is rather irritating, esp. if these documents are
standards-compliant HTML as in this case.
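
(For anyone following along, the round trip Rob describes looks like
this -- with latin-1 standing in for the proposed web-1252 codec, since
the principle is identical:)

good = "déjà vu, naïve café"
mangled = good.encode("utf-8").decode("latin-1")    # how the mojibake arises
print(mangled)                                      # 'dÃ©jÃ  vu, naÃ¯ve cafÃ©'-ish
fixed = mangled.encode("latin-1").decode("utf-8")   # the round trip described above
print(fixed == good)                                # True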

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Support WHATWG versions of legacy encodings

2018-01-11 Thread Nathaniel Smith
On Jan 11, 2018 4:05 AM, "Antoine Pitrou"  wrote:

Define "widely used".  If web-XXX is a superset of windows-XXX, then
perhaps web-XXX is "used" in the sense of "used to decode valid
windows-XXX data" (but windows-XXX could be used just as well to
decode the same data).  The question is rather: how often does web-XXX
mojibake happen?  We're well in the 2010s now and you'd hope that
mojibake doesn't happen as often as it used to in, e.g., 1998.


I'm not an expert here or anything, but from what we've been hearing it
sounds like it must be used by all standard-compliant HTML parsers. I don't
*like* the standard much, but I don't think that the stdlib should refuse
to handle standard-compliant HTML, or help users handle standard-compliant
HTML correctly, just because the HTML standard has unfortunate things in
it. We're not going to convince them to change the standard or anything.
And this whole thread started with someone said that their mojibake fixing
library is having trouble because of this, so clearly mojibake does still
exist.

Does it help if we reframe it as not that whatwg is "wrong" about
windows-1252, but rather that there is this encoding web-1252, and thanks
to an interesting quirk of history, in HTML documents the byte sequence
b'<meta charset="windows-1252">' indicates a file using this encoding? In
fact the mapping between byte sequences and character sets here is so
arbitrary that in standards-compliant HTML, the byte sequences
b'<meta charset="latin1">', b'<meta charset="iso-8859-1">', and
b'<meta charset="ascii">' *also* indicate that the file is encoded using web-1252.
(See: https://encoding.spec.whatwg.org/#names-and-labels)

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] make Connections iterable

2018-01-09 Thread Nathaniel Smith
On Jan 9, 2018 04:12, "Random832"  wrote:

On Tue, Jan 9, 2018, at 05:46, Nick Coghlan wrote:
> If you view them as comparable to subprocess pipes, then it can be
> surprising that they're not iterable when using a line-oriented
> protocol.
>
> If you instead view them as comparable to socket connections, then the
> lack of iteration support seems equally reasonable.

Sockets are files - there's no fundamental reason a stream socket using a
line-oriented protocol (which is a common enough case), or a datagram
socket, shouldn't be iterable. Why aren't they?


Supporting line iteration on sockets would require adding a whole buffering
layer, which would be a huge change in semantics. Also, due to the way the
BSD socket API works, stream and datagram sockets are the same Python type,
so which one would socket.__next__ assume? (Plus datagrams are a bit messy
anyway; you need to know the protocol's max size before you can call recv.)

I know this was maybe a rhetorical question, but this particular case does
have an answer beyond "we never did it that way before" :-).
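
(And the existing escape hatch is to ask for that buffering layer
explicitly, e.g.:)

import socket

sock = socket.create_connection(("example.com", 80))
sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
reader = sock.makefile("rb")   # buffered wrapper; *this* object is line-iterable
for line in reader:
    print(line)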

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] make Connections iterable

2018-01-09 Thread Nathaniel Smith
On Tue, Jan 9, 2018 at 2:07 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Mon, 8 Jan 2018 21:22:56 -0800
> Nathaniel Smith <n...@pobox.com> wrote:
>>
>> The only documented error from multiprocessing.Connection.recv is EOFError,
>> which is basically equivalent to a StopIteration.
>
> Actually recv() can raise an OSError corresponding to any system-level
> error.
>
>> I'm surprised that multiprocessing.Connection isn't iterable -- it seems
>> like an obvious oversight.
>
> What is obvious about making a connection iterable?  It's the first
> time I see someone requesting this.

On the receive side, it's a stream of incoming objects that you fetch
one at a time until you get to the end, probably processed with a loop
like:

while True:
    try:
        next_message = conn.recv()
    except EOFError:
        break
    ...

Why wouldn't it be iterable?
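
And if you want the iterator today, the wrapper is tiny -- which is part
of why the omission feels like an oversight. A sketch:

from multiprocessing import Pipe

def iter_connection(conn):
    # turn "recv() until EOFError" into an ordinary iterator
    while True:
        try:
            yield conn.recv()
        except EOFError:
            return

parent, child = Pipe()
child.send("hello")
child.send("world")
child.close()
for message in iter_connection(parent):
    print(message)   # hello, then world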

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] make Connections iterable

2018-01-08 Thread Nathaniel Smith
On Mon, Jan 8, 2018 at 7:27 PM, Amit Green  wrote:

> An argument against this API, is that any caller of recv should be doing
> error handling (i.e.: catching exceptions from the socket).
>

It's still not entirely clear, but I'm pretty sure this thread is talking
about multiprocessing.Connection objects, which don't have anything to do
with sockets. (I think. They might use sockets internally on some
platforms.)

The only documented error from multiprocessing.Connection.recv is EOFError,
which is basically equivalent to a StopIteration.

I'm surprised that multiprocessing.Connection isn't iterable -- it seems
like an obvious oversight.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org 
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Deprecate "slice" on built-ins, move it to "types"?

2017-12-28 Thread Nathaniel Smith
On Dec 28, 2017 12:10, "Joao S. O. Bueno"  wrote:

This is probably too little to justify the compatibility breakage, but is
there a motive for the "slice" type to be on built-ins?
(besides people forgot it there at PEP-3000 time?)

It is normally used in super-specialized cases, mostly when one is
implementing a Sequence type, and even there just for type-checking,
not to create new slice objects.


It does get called sometimes in numerical code to construct complex
indexing operations.
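
For example, building the index programmatically rather than writing it
literally:

data = list(range(10))
idx = slice(1, None, 2)       # built at runtime; same as data[1::2]
print(data[idx])              # [1, 3, 5, 7, 9]

# numpy-ish code does the same with whole index tuples:
# arr[(slice(None), 0)] is the spelled-out form of arr[:, 0]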

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] a sorting protocol dunder method?

2017-12-04 Thread Nathaniel Smith
On Sun, Dec 3, 2017 at 10:48 PM, Carl Meyer  wrote:
> It'd be nice to be able to eliminate an import and have the lines of
> code and instead write that as:
>
> class BankAccount:
>     def __init__(self, balance):
>         self.balance = balance
>
>     def __sort_key__(self):
>         return self.balance

What if we added a @key_ordering decorator, like @total_ordering but
using __key__ to generate the comparisons? I know you'd have to do an
import, but usually adding things to the core language requires more
of a benefit than that :-).
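
A rough sketch of what that decorator could look like (key_ordering and
__key__ are just the names floated in this thread, not an existing API):

import operator

def key_ordering(cls):
    # derive the rich comparison methods from a user-defined __key__(),
    # in the same spirit as functools.total_ordering
    def make(name, op):
        def method(self, other):
            if not isinstance(other, cls):
                return NotImplemented
            return op(self.__key__(), other.__key__())
        method.__name__ = name
        return method
    for name, op in [("__eq__", operator.eq), ("__lt__", operator.lt),
                     ("__le__", operator.le), ("__gt__", operator.gt),
                     ("__ge__", operator.ge)]:
        setattr(cls, name, make(name, op))
    return cls

@key_ordering
class BankAccount:
    def __init__(self, balance):
        self.balance = balance
    def __key__(self):
        return self.balance

accounts = [BankAccount(30), BankAccount(10)]
print(sorted(accounts)[0].balance)   # 10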

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Using an appropriate tone in emails (was: Adding a thin wrapper class around the functions in stdlib.heapq)

2017-11-27 Thread Nathaniel Smith
On Mon, Nov 27, 2017 at 7:22 PM, bunslow  wrote:
> My first submission to this list was predicated on what I'd read in PEPs --
> and many of those, since they recommend major-enough changes to require a
> PEP, have sections (often lengthy) dedicated to "what's wrong with the
> status quo". My attempt to imitate that obviously crossed some boundaries in
> retrospect, and of course now that it's brought up here I see that spinning
> it as "what can be done to make it better" is psychologically much more
> effective than "why the current way sucks" (because semantically these are
> either approximately or exactly the same). But that's where it came from, at
> least with some of my earlier threads, and I suspect the author of the topic
> message of the OP will have a similar sentiment.

To quote Brett's original email:
> So obviously Nick doesn't like the design of the heapq module. ;) And that's 
> okay! And he's totally within his rights to express the feeling that the 
> heapq module as it stands doesn't meet his needs.
> But calling it "atrocious" and so bad that it needs to be fixed "immediately" 
> as if it's a blight upon the stdlib is unnecessarily insulting to those that 
> have worked on the module.

You can and should talk about problems with the status quo! But it's
totally possible to do this without insulting anyone. Brett's talking
about tone, not content.

> (One major example I can point to is PEP 465 -- because it proposed such a
> major change to the language, literally half its text amounts to "what's
> wrong with the status quo", quantifiably and repeatedly. It was also a
> highly persuasive PEP due in no small part to its "why current things suck"
> section.)

Maybe, but you won't find the word "suck" anywhere in that section
:-). And of course, the nice thing about PEP 465 is that it's
complaining about a missing feature, which sort of by definition means
that it's not complaining about anyone in particular's work.

Nonetheless, an earlier draft of PEP 465 did inadvertently talk about
an old PEP in an overly-flippant manner, and I ended up apologizing to
the author and fixing it. (Which of course also made the PEP
stronger.) It's cool, no-one's perfect. If you think you've made a
mistake, then apologize and try to do better, that's all.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Looking for input to help with the pip situation

2017-11-14 Thread Nathaniel Smith
On Tue, Nov 14, 2017 at 12:56 AM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 14 November 2017 at 03:08, Nathaniel Smith <n...@pobox.com> wrote:
>> On Nov 13, 2017 6:47 PM, "Nick Coghlan" <ncogh...@gmail.com> wrote:
>
>>> and a pip.bat with the equivalent contents on Windows?
>>> (Bonus: maybe this would fix the problem with upgrading pip on
>>> Windows?)
>>
>> Depending on how the batch file was written, I think the answer to
>> that is "maybe":
>> https://stackoverflow.com/questions/2888976/how-to-make-bat-file-delete-it-self-after-completion/20333152#20333152
>>
>>
>> Sigh.
>
> Batch files are not suitable for this task. The wrappers have to be
> executables. See
> http://paul-moores-notes.readthedocs.io/en/latest/wrappers.html for a
> detailed analysis I did some time ago.

Ah, interesting. My reason for suggesting it in the first place
because I was hoping to avoid paying the process spawn overhead twice,
but it sounds like this specific trick is misguided all around :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Looking for input to help with the pip situation

2017-11-13 Thread Nathaniel Smith
On Nov 13, 2017 6:47 PM, "Nick Coghlan" <ncogh...@gmail.com> wrote:

On 14 November 2017 at 11:51, Nathaniel Smith <n...@pobox.com> wrote:
> What if instead of installing a standard entry point, the pip
> executable was installed as
>
> #!/bin/sh
> exec python -m pip "$@"
>
> on Unix-likes

It would technically be enough to make the shebang line
`#!/usr/bin/env python` so the interpreter used was picked up from the
environment, rather than being preconfigured at install time. However,
the problem is that you don't know for certain that that python will
actually have `pip` installed, so it might just fail with a cryptic
error instead.


This would still be a massive improvement over the status quo, which in
this situation would present a perfect simulacrum of downloading and
installing the package you asked for, except then when you start python the
import still fails.

I did think of another issue: when installing into a virtualenv, we
probably want to keep the current system, so explicit/path/bin/pip
continues to work as expected.


However, `pip` could potentially be updated with a `checkenv`
subcommand that complains if `sys.executable` and
`shutil.which('python')` don't match (and could presumably include
other checks as well).


Unfortunately, by the time you know to run this command you've already
understood the problem and how to fix it :-). That said, this is probably
cheap enough we could do it automatically at startup. Does pip know whether
it's pip, pip3, etc. that invoked it? I guess sys.argv[0] should tell us?
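
Something like this, just to make the idea concrete (checkenv itself is
hypothetical, not an existing pip subcommand):

import os
import shutil
import sys

def checkenv():
    # warn when "python" on PATH is not the interpreter pip installs into
    on_path = shutil.which("python")
    if on_path and os.path.realpath(on_path) != os.path.realpath(sys.executable):
        print("warning: installing into %s, but 'python' on PATH is %s"
              % (sys.executable, on_path))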


> and a pip.bat with the equivalent contents on Windows?
> (Bonus: maybe this would fix the problem with upgrading pip on
> Windows?)

Depending on how the batch file was written, I think the answer to
that is "maybe":
https://stackoverflow.com/questions/2888976/how-to-make-bat-file-delete-it-self-after-completion/20333152#20333152


Sigh.

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] install pip packages from Python prompt

2017-11-02 Thread Nathaniel Smith
On Wed, Nov 1, 2017 at 2:47 PM, Terry Reedy  wrote:
> When pip installs a package into site_packages, does it at any point run
> package-specific installation code?  setup.py?

Nope. That's a can of worms that we've so far avoided opening.

> More specifically, can pip
> install an IDLE extension. If so, I think installing pipgui should add
> 'x_pipgui.py' to idlelib, if it exists, and add a section to
> idlelib/config-extension.def.  Using the existing extension mechanism would
> be an alternative to patching IDLE to conditionally add pipgui to some menu.

There is a de facto standard way to do this, which is to advertise a
setuptools entrypoint:

https://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins

This is some static metadata that a plugin can include in their
package in a well known place, and that tools like pkg_resources can
then look up.
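
Roughly, the plugin declares the entry point in its setup.py and the
host enumerates installed plugins at runtime (the group name here is
purely illustrative -- IDLE defines no such group today):

# plugin's setup.py (abridged)
from setuptools import setup

setup(
    name="pipgui",
    packages=["pipgui"],
    entry_points={
        "idlelib.extensions": ["pipgui = pipgui.main:PipGui"],
    },
)

# host side: discovery via pkg_resources, which ships with setuptools
import pkg_resources

for ep in pkg_resources.iter_entry_points("idlelib.extensions"):
    extension_class = ep.load()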

But unfortuately this hasn't been standardized, and there's currently
no way to do the lookup from the stdlib, so maybe this is not so
helpful for IDLE...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Defining an easily installable "Recommended baseline package set"

2017-11-01 Thread Nathaniel Smith
On Wed, Nov 1, 2017 at 7:41 AM, Guido van Rossum <gu...@python.org> wrote:
> Can you write 1-2 paragraphs with the argument for each?
>
> On Tue, Oct 31, 2017 at 10:01 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> - lxml

My impression (probably others are more knowledgeable) is that lxml
has more or less replaced the stdlib 'xml' package as the de facto
standard -- sort of similar to the urllib2/requests situation. AFAIK
lxml has never been proposed for stdlib inclusion and I believe the
fact that it's all in Cython would be a barrier even if the
maintainers were amenable. But it might be helpful to our users to put
a box at the top of the 'xml' docs suggesting people check out 'lxml',
similar to the one on the urllib2 docs.

>> - numpy

Numpy's arrays are a foundational data structure and de facto
standard, and would probably fit naturally in the stdlib semantically,
but for a number of logistical/implementational reasons it doesn't
make sense to merge. Probably doesn't make much difference whether
python-dev "blesses" it or not in practice, since there aren't any
real competitors inside or outside the stdlib; it'd more just be an
acknowledgement of the status quo.

>> - cryptography

Conceptually, core cryptographic operations are the kind of
functionality that you might expect to see in the stdlib, but the
unique sensitivity of crypto code makes this a bad idea. Historically
there have been a variety of basic crypto packages for Python, but at
this point IIUC the other ones are all considered
obsolete-and-potentially-dangerous and the consensus is everyone
should move to 'cryptography', so documenting that in this PEP might
help send people in the right direction.

>> - idna

This is a bit of a funny one. IDNA functionality is pretty fundamental
-- you need it to do unicode<->bytes conversions on hostnames, so
basically anyone doing networking needs it. Python ships with some
built-in IDNA functionality (as the "idna" codec), but it's using an
obsolete standard (IDNA2003, versus the current IDNA2008, see
bpo-17305), and IIRC Christian thinks the whole codec-based design is
the wrong approach... basically what we have in the stdlib has been
broken for most of a decade and there doesn't seem to be anyone
interested in fixing it. So... in the long run the stdlib support
should either be fixed or deprecated. I'm not sure which is better.
(The argument for deprecating it would be that IIUC you need to update
the tables whenever a new unicode standard comes out, and since it's a
networking thing you want to keep in sync with the rest of the world,
which is easier with a standalone library. I don't know how much this
matters in practice.) But right now, this library is just better than
the stdlib functionality, and it wouldn't hurt to document that.
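
(To make the difference concrete -- this is my recollection of the
classic sharp-s example, so treat the exact outputs as approximate:)

"faß.de".encode("idna")   # stdlib codec, IDNA2003: ß folds to ss -> b'fass.de'

import idna               # the third-party package, IDNA2008
idna.encode("faß.de")     # keeps the sharp s -> b'xn--fa-hia.de'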

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Defining an easily installable "Recommended baseline package set"

2017-10-31 Thread Nathaniel Smith
On Oct 31, 2017 4:42 AM, "Nick Coghlan"  wrote:

On 31 October 2017 at 02:29, Guido van Rossum  wrote:

> What's your proposed process to arrive at the list of recommended packages?
>

I'm thinking it makes the most sense to treat inclusion in the recommended
packages list as a possible outcome of proposals for standard library
inclusion, rather than being something we'd provide a way to propose
specifically.

We'd only use it in cases where a proposal would otherwise meet the
criteria for stdlib inclusion, but the logistics of actually doing so don't
work for some reason.

Running the initial 5 proposals through that filter:

* six: a cross-version compatibility layer clearly needs to be outside the
standard library
* setuptools: we want to update this in line with the PyPA interop specs,
not the Python language version
* cffi: updates may be needed for PyPA interop specs, Python implementation
updates or C language definition updates
* requests: updates are more likely to be driven by changes in network
protocols and client platform APIs than Python language changes
* regex: we don't want two regex engines in the stdlib, transparently
replacing _sre would be difficult, and _sre is still good enough for most
purposes


Some other packages that might meet these criteria, or at least be useful
for honing them:

- lxml
- numpy
- cryptography
- idna

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Add processor generation to wheel metadata

2017-10-31 Thread Nathaniel Smith
On Mon, Oct 30, 2017 at 5:45 AM, Ivan Pozdeev via Python-ideas
 wrote:
> Generally, packages are compiled for the same processor generation as the
> corresponding Python.
> But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some
> compiler bug
> (https://github.com/numpy/numpy/issues/6428).
> I was bitten by that at an old machine once and found out that there is no
> way for `pip' to have checked for that.
> Besides, performance-oriented packages like the one mentioned could probably
> benefit from newer instructions.

You should probably resend this to distutils-sig instead of
python-ideas -- that's where discussions about python packaging
happen. (Python-ideas is more for discussions about the language
itself.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] install pip packages from Python prompt

2017-10-30 Thread Nathaniel Smith
On Mon, Oct 30, 2017 at 10:25 AM, Alexander Belopolsky
 wrote:
> On Mon, Oct 30, 2017 at 11:44 AM, Nick Coghlan  wrote:
> ..
>> 3. We can't replicate it as readily in the regular REPL, since that runs
>> Python code directly in the current process, but even there I believe we
>> could potentially trigger a full process restart via execve (or the C++
>> style _execve on Windows)
>
> This exact problem is solved rather elegantly in Julia.  When you
> upgrade a package that is already loaded in the REPL, it prints a
> warning:
>
> "The following packages have been updated but were already imported:
> ... Restart Julia to use the updated versions."
>
> listing the affected packages.
>
> See 
> .

This seems like the obvious solution to me too. Pip knows exactly
which files it modified. The interpreter knows which packages have
been imported. Having the REPL provide a friendly interface that ran
pip and then compared the lists would need some coordination between
the projects but wouldn't be rocket science, and would be *much* more
new-user-friendly than the current system.
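
(A crude sketch of that comparison, ignoring the fact that distribution
names and import names don't always match:)

import subprocess
import sys

def install(package):
    # hypothetical REPL helper: run pip, then warn about stale imports
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    if package in sys.modules:
        print("note: %r was already imported; restart to pick up the new version"
              % package)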

(Also, I'm kind of grossed out by the attitude that it's a good thing
to drive people away by giving a bad first impression. Sure the shell
is worth learning, but it can wait until you actually need it. If you
make people fail for opaque reasons on basic tasks then the lesson
they learn isn't "oh I need to learn the shell", it's "oh I must be
stupid / maybe girls really can't do programming / I should give up".)

If you want to support conda too then cool, conda can install a
site.py that provides a conda() builtin that uses the same machinery.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP draft: context variables

2017-10-14 Thread Nathaniel Smith
On Sat, Oct 14, 2017 at 9:53 PM, M.-A. Lemburg  wrote:
> I have a hard time seeing the advantage of having a default
> where the context at the time of execution is dependent on
> where it happens rather than where it's defined.
>
> IMO, the default should be to use the context where the line
> was defined in the code, since that matches the intuitive
> way of writing and defining code.

Of course, that's already the default: it's now regular variables and
function arguments work. The reason we have forms like 'with
decimal.localcontext', 'with numpy.errstate' is to handle the case
where you want the context value to be determined by the runtime
context when it's accessed rather than the static context where it's
accessed. That's literally the whole point.

It's not like this is a new and weird concept in Python either -- e.g.
when you raise an exception, the relevant 'except' block is determined
based on where the 'raise' happens (the runtime stack), not where the
'raise' was written:

try:
    def foo():
        raise RuntimeError
except RuntimeError:
    print("this is not going to execute, because Python doesn't work that way")
foo()

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP draft: context variables

2017-10-07 Thread Nathaniel Smith
On Oct 7, 2017 12:20, "Koos Zevenhoven"  wrote:


​Unfortunately, we actually need a third kind of generator semantics,
something like this:

@contextvars.caller_context
def genfunc():
    assert cvar.value is the_value
    yield
    assert cvar.value is the_value

with cvar.assign(the_value):
    gen = genfunc()

next(gen)

with cvar.assign(1234567890):
    try:
        next(gen)
    except StopIteration:
        pass

Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly
missed the reasons for this in discussions related to PEP 550. Perhaps
because we had mostly been looking at it from an async angle.


That's certainly a semantics that one can write down (and it's what the
very first version of PEP 550 did), but why do you say it's needed? What
are these reasons that were missed? Do you have a use case?

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

2017-09-12 Thread Nathaniel Smith
On Tue, Sep 12, 2017 at 1:46 PM, Eric Snow <ericsnowcurren...@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 11:19 PM, Nathaniel Smith <n...@pobox.com> wrote:
>> On Thu, Sep 7, 2017 at 8:11 PM, Eric Snow <ericsnowcurren...@gmail.com> 
>> wrote:
>>> My concern is that this is a chicken-and-egg problem.  The situation
>>> won't improve until subinterpreters are more readily available.
>>
>> Okay, but you're assuming that "more libraries work well with
>> subinterpreters" is in fact an improvement. I'm asking you to convince
>> me of that :-). Are there people saying "oh, if only subinterpreters
>> had a Python API and less weird interactions with C extensions, I
>> could do "? So far they haven't exactly taken the
>> world by storm...
>
> The problem is that most people don't know about the feature.  And
> even if they do, using it requires writing a C-extension, which most
> people aren't comfortable doing.
>
>>> Other than C globals, is there some other issue?
>>
>> That's the main one I'm aware of, yeah, though I haven't looked into it 
>> closely.
>
> Oh, good.  I haven't missed something. :)  Do you know how often
> subinterpreter support is a problem for users?  I was under the
> impression from your earlier statements that this is a recurring issue
> but my understanding from mod_wsgi is that it isn't that common.

It looks like we've been averaging one bug report every ~6 months for
the last 3 years:

https://github.com/numpy/numpy/issues?utf8=%E2%9C%93=is%3Aissue%20subinterpreter%20OR%20subinterpreters

They mostly come from Jep, not mod_wsgi. (Possibly because Jep has
some built-in numpy integration.) I don't know how many people file
bugs versus just living with it or finding some workaround. I suspect
for mod_wsgi in particular they probably switch to something else --
it's not like there's any shortage of WSGI servers that avoid these
problems. And for Jep there are prominent warnings to expect problems
and suggesting workarounds:
  https://github.com/ninia/jep/wiki/Workarounds-for-CPython-Extensions

>> I guess I would be much more confident in the possibilities here if
>> you could give:
>>
>> - some hand-wavy sketch for how subinterpreter A could call a function
>> that as originally defined in subinterpreter B without the GIL, which
>> seems like a precondition for sharing user-defined classes
>
> (Before I respond, note that this is way outside the scope of the PEP.
> The merit of subinterpreters extends beyond any benefits of running
> sans-GIL, though that is my main goal.  I've been updating the PEP to
> (hopefully) better communicate the utility of subinterpreters.)

Subinterpreters are basically an attempt to reimplement the OS's
process isolation in user-space, right? Classic trade-off where we
accept added complexity and fragility in the hopes of gaining some
speed? I just looked at the PEP again, and I'm afraid I still don't
understand what the benefits are unless we can remove the GIL and
somehow get a speedup over processes. Implementing CSP is a neat idea,
but you could do it with subprocesses too. AFAICT you could implement
the whole subinterpreters module API with subprocesses on 3.6, and
it'd be multi-core and have perfect extension module support.

> Code objects are immutable so that part should be relatively
> straight-forward.  There's the question of closures and default
> arguments that would have to be resolved.  However, those are things
> that would need to be supported anyway in a world where we want to
> pass functions and user-defined types between interpreters.  Doing so
> will be a gradual process of starting with immutable non-container
> builtin types and expanding out from there to other immutable types,
> including user-defined ones.

I tried arguing that code objects were immutable to the PyPy devs too
:-). The problem is that to call a function you need both its
__code__, which is immutable, and its __globals__, which is
emphatically not. The __globals__ thing means that if you start from
an average function you can often follow pointers to reach every other
global object (e.g. if the function uses regular expressions, you can
probably reach any module by doing
func.__globals__["re"].sys.modules[...]). You might hope that you
could somehow restrict this, but I can't think of any way that's
really useful :-(.
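
(To make the pointer-chasing concrete -- os happens to keep a reference
to sys, so for example:)

import os

def current_dir():
    return os.getcwd()

# starting from nothing but the function object, walk out to
# interpreter-wide mutable state:
modules = current_dir.__globals__["os"].sys.modules
import sys
print(modules is sys.modules)   # True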

>
> Note that sharing mutable objects between interpreters would be a
> pretty advanced usage (i.e. opt-in shared state vs. threading's
> share-everything).  If it proves desirable then we'd sort that out
> then.  However, I don't see that as a more than an esoteric feature
> relative to subinterpreters.
>
> In my mind, the key advantage of being able to share more 

Re: [Python-ideas] PEP 562

2017-09-12 Thread Nathaniel Smith
On Sep 12, 2017 7:08 AM, "Ionel Cristian Mărieș via Python-ideas" <
python-ideas@python.org> wrote:

Wouldn't a better approach be a way to customize the type of the module?
That would allow people to define behavior for almost anything (__call__,
__getattr__, __setattr__, __dir__, various operators etc). This question
shouldn't exist "why can't I customize behavior X in a module when I can do
it for a class". Why go half-way.


If you're ok with replacing the object in sys.modules then the ability to
totally customize your module's type has existed since the dawn era. And if
you're not ok with that, then it's still existed since 3.5 via the
mechanism of assigning to __class__ to change the type in-place. So this
discussion isn't about adding new functionality per se, but about trying to
find some way to provide a little bit of sugar that provides most of the
value in a less obscure way.

(And unfortunately there's a chicken and egg problem for using custom
module types *without* the __class__ assignment hack, because you can't
load any code from a package until after you've created the top level
module object. So we've kind of taken custom module types as far as they
can go already.)
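
For reference, the in-place spelling looks roughly like this on 3.5+:

# mymodule.py -- minimal sketch of the __class__-assignment approach
import sys
import types

class _MyModule(types.ModuleType):
    @property
    def answer(self):          # a computed module attribute
        return 42

sys.modules[__name__].__class__ = _MyModule

# elsewhere: import mymodule; mymodule.answer -> 42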

-n
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] PEP 562

2017-09-10 Thread Nathaniel Smith
The main two use cases I know of for this and PEP 549 are lazy imports of
submodules, and deprecating attributes. If we assume that you only want
lazy imports to show up in dir() and don't want deprecated attributes to
show up in dir() (and I'm not sure this is what you want 100% of the time,
but it seems like the most reasonable default to me), then currently you
need one of the PEPs for one of the cases and the other PEP for the other
case.

Would it make more sense to add direct support for lazy imports and
attribute deprecation to ModuleType? This might look something like
metamodule's FancyModule type:

https://github.com/njsmith/metamodule/blob/ee54d49100a9a06341bb10a4d3549642139f/metamodule.py#L20
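
To make the comparison concrete, here's a rough sketch of covering both
cases in one place, written against a module-level __getattr__ plus a
__dir__ hook (the __dir__ part isn't in the draft below, and all the
names here are made up):

# pkg/__init__.py -- sketch: lazy submodule import plus attribute deprecation
import importlib
import warnings

_lazy_submodules = {"heavy"}              # imported only on first access
_deprecated = {"old_name": "new_name"}    # old alias -> current name

new_name = object()

def __getattr__(name):
    if name in _lazy_submodules:
        return importlib.import_module("." + name, __name__)
    if name in _deprecated:
        warnings.warn(name + " is deprecated", DeprecationWarning, stacklevel=2)
        return globals()[_deprecated[name]]
    raise AttributeError("module %r has no attribute %r" % (__name__, name))

def __dir__():
    # lazy submodules show up in dir(); deprecated aliases do not
    return sorted(set(globals()) | _lazy_submodules)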

-n

On Sep 10, 2017 11:49, "Ivan Levkivskyi"  wrote:

> I have written a short PEP as a complement/alternative to PEP 549.
> I will be grateful for comments and suggestions. The PEP should
> appear online soon.
>
> --
> Ivan
>
> ***
>
> PEP: 562
> Title: Module __getattr__
> Author: Ivan Levkivskyi 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 09-Sep-2017
> Python-Version: 3.7
> Post-History: 09-Sep-2017
>
>
> Abstract
> 
>
> It is proposed to support ``__getattr__`` function defined on modules to
> provide basic customization of module attribute access.
>
>
> Rationale
> =
>
> It is sometimes convenient to customize or otherwise have control over
> access to module attributes. A typical example is managing deprecation
> warnings. Typical workarounds are assigning ``__class__`` of a module
> object
> to a custom subclass of ``types.ModuleType`` or substituting
> ``sys.modules``
> item with a custom wrapper instance. It would be convenient to simplify
> this
> procedure by recognizing ``__getattr__`` defined directly in a module that
> would act like a normal ``__getattr__`` method, except that it will be
> defined
> on module *instances*. For example::
>
>   # lib.py
>
>   from warnings import warn
>
>   deprecated_names = ["old_function", ...]
>
>   def _deprecated_old_function(arg, other):
>   ...
>
>   def __getattr__(name):
>   if name in deprecated_names:
>   warn(f"{name} is deprecated", DeprecationWarning)
>   return globals()[f"_deprecated_{name}"]
>   raise AttributeError(f"module {__name__} has no attribute {name}")
>
>   # main.py
>
>   from lib import old_function  # Works, but emits the warning
>
> There is a related proposal PEP 549 that proposes to support instance
> properties for a similar functionality. The difference is this PEP proposes
> a faster and simpler mechanism, but provides more basic customization.
> An additional motivation for this proposal is that PEP 484 already defines
> the use of module ``__getattr__`` for this purpose in Python stub files,
> see [1]_.
>
>
> Specification
> =
>
> The ``__getattr__`` function at the module level should accept one argument
> which is a name of an attribute and return the computed value or raise
> an ``AttributeError``::
>
>   def __getattr__(name: str) -> Any: ...
>
> This function will be called only if ``name`` is not found in the module
> through the normal attribute lookup.
>
> The reference implementation for this PEP can be found in [2]_.
>
>
> Backwards compatibility and impact on performance
> =
>
> This PEP may break code that uses module level (global) name
> ``__getattr__``.
> The performance implications of this PEP are minimal, since ``__getattr__``
> is called only for missing attributes.
>
>
> References
> ==
>
> .. [1] PEP 484 section about ``__getattr__`` in stub files
>(https://www.python.org/dev/peps/pep-0484/#stub-files)
>
> .. [2] The reference implementation
>(https://github.com/ilevkivskyi/cpython/pull/3/files)
>
>
> Copyright
> =
>
> This document has been placed in the public domain.
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

