[Python-ideas] Re: Add a line_offsets() method to str

2022-06-19 Thread Jonathan Slenders
 Thanks all for all the responses! That's quite a bit to think about.

A couple of thoughts:

1. First, I do support a transition to UTF-8, so I understand we don't want
to add more methods that deal with character offsets. (I'm familiar with
how strings work in Rust.) However, does that mean we won't be
using/exposing any offset at all, or will it become possible to slice using
byte offsets?

2. The commercial application I mentioned where this is critical is
actually using bytes instead of str. Sorry for not mentioning that earlier.
We were doing the following:
list(accumulate(chain([0], map(len, text.splitlines(True)))))
where text is a bytes object. This is significantly faster than a binary
regex for finding all universal line endings. This application is an
asyncio web app that streams Cisco show-tech files (often several
gigabytes) from a file server over HTTP; stores them chunk by chunk into a
local cache file on disk; and builds an index of byte offsets in the
meantime by running the above expression over every chunk. That way the
client web app can quickly load the lines from disk as the user scrolls
through the file. A very niche application indeed, so use of Cython would
be acceptable in this particular case. I published the relevant snippet
here to be studied:
https://gist.github.com/jonathanslenders/59ddf8fe2a0954c7f1865fba3b151868
It does handle an interesting edge case regarding UTF-16.
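
In outline, the chunked indexing looks like this (a minimal sketch with
made-up chunks, not the gist's code; a real version must also handle a
chunk that ends mid-line or that splits a b"\r\n" pair in two):

from itertools import accumulate, chain

def chunk_line_starts(chunk: bytes, base: int) -> list[int]:
    # Absolute byte offsets of the line starts inside this chunk, where
    # `base` is the offset of the chunk's first byte in the whole stream.
    offsets = list(accumulate(chain([base], map(len, chunk.splitlines(True)))))
    return offsets[:-1]  # the last value is just base + len(chunk)

index: list[int] = []
base = 0
for chunk in (b"line one\nline two\r\n", b"line three\n"):
    index.extend(chunk_line_starts(chunk, base))
    base += len(chunk)
# index == [0, 9, 19]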

3. The code in prompt_toolkit can be found here:
https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/src/prompt_toolkit/document.py#L209
(It's not yet using 'accumulate' there, but for the rest it's the same.)
Here too, universal line endings support is important, because the editing
buffer can in theory contain a mix of line endings. It has to be
performant, because it executes on every keystroke. In this case, a more
complex data structure could probably solve the performance issues, but
it's really not worth the complexity that it introduces in every text
manipulation (like every key binding). Also, try using the "re" library to
search over a list of lines, or anything that's not a simple string.

4. I tested on 3.11.0b3. The splitlines() approach is still 2.5 times
faster than re. Imagine if splitlines() didn't have to do the work of
actually creating the substrings, but only had to return the offsets: that
should be much faster still, and not require nearly as much memory. (I have
a benchmark that does it one chunk at a time, to prevent using too much
memory:
https://gist.github.com/jonathanslenders/bfca8e4f318ca64e718b4085a737accf )

So, talking about bytes: would it be acceptable to have a
`bytes.line_offsets()` method instead? Or
`bytes.splitlines(return_offsets=True)`? Are byte offsets okay, or not?
`str.splitlines(return_offsets=True)` would be very nice, but I
understand the concerns.
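
To pin down the semantics, this pure-Python model (hypothetical, written
here only to define the behavior; the real thing would be a single pass in
C) shows what I'd expect such a method to return:

from itertools import accumulate

def line_offsets(data: bytes) -> list[int]:
    # Start offset of every line, using the same line boundaries that
    # bytes.splitlines() recognizes (\n, \r and \r\n).
    offsets = [0]
    offsets.extend(accumulate(map(len, data.splitlines(True))))
    return offsets[:-1]  # the final value is just len(data); drop it

line_offsets(b"a\nbb\r\nccc")  # -> [0, 2, 6]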

It's somewhat frustrating to know that for `splitlines()`, the
information is there, already computed, just not immediately accessible
(without having Python do lots of unnecessary work).

Jonathan


On Sun, Jun 19, 2022 at 15:34, Jonathan Fine  wrote:

> Hi
>
> This is a nice problem, well presented. Here are four comments/questions.
>
> 1. How does the introduction of faster CPython in Python 3.11 affect the
> benchmarks?
> 2. Is there an across-the-board change that would speedup this
> line-offsets task?
> 3. To limit splitlines memory use (at small performance cost), chunk the
> input string into say 4 kb blocks.
> 4. Perhaps anything done here for strings should also be done for bytes.
>
> --
> Jonathan
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/AETGT5HDF3QOFODOWKB4X45ZE4CZ7Y3M/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FZ7V4FFKR45YLQDHTD2JZYEWZ5HEI3P2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Add a line_offsets() method to str

2022-06-18 Thread Jonathan Slenders
Good catch! One correction here: I somewhat mixed up the benchmarks. I
forgot that both projects of mine required support for universal line
endings, exactly like splitlines() handles them out of the box. That
requires a more complex regex pattern. I was actually using:
re.compile(r"\n|\r(?!\n)")
And then the regex becomes significantly slower than the splitlines()
solution, which is still much slower than it has to be.
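
As a quick self-check of that equivalence (note: str.splitlines() also
treats \v, \f, \x1c-\x1e, \x85, \u2028 and \u2029 as line boundaries, so
the regex only agrees with it on input limited to \n, \r and \r\n):

import re
from itertools import accumulate, chain

text = "one\ntwo\rthree\r\nfour"
newline_re = re.compile(r"\n|\r(?!\n)")

starts_re = [0] + [m.end() for m in newline_re.finditer(text)]
starts_split = list(accumulate(chain([0], map(len, text.splitlines(True)))))[:-1]
assert starts_re == starts_split == [0, 4, 8, 15]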

This makes me realize that `str.indexes(char)` is actually not what I need,
but rather a `str.line_offsets()` which returns exactly the positions that
`str.splitlines()` would use. Does that make sense?

If this is reasonable, I wouldn't mind working on the implementation.

(@Christophe: In Python, a single string as a data structure is often much
easier to deal with and overall extremely performant. Try searching over a
list of lines.)

Thanks,
Jonathan




On Sat, Jun 18, 2022 at 21:09, Lucas Wiman  wrote:

> I'm a little confused by the benchmark. Using re looks pretty competitive
> in terms of speed, and should be much more memory efficient.
>
> # https://www.gutenberg.org/cache/epub/100/pg100.txt (5.7mb; ~170K lines)
> with open('/tmp/shakespeare.txt', 'r') as f:
>     text = f.read()
> import re
> from itertools import *
> line_re = re.compile(r"\n")
>
> Then when I run it:
> In [25]: %timeit _ = list(accumulate(chain([0], map(len,
> text.splitlines(True)))))
> 30.4 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> In [26]: %timeit _ = [m.start() for m in line_re.finditer(text)]
> 29 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> This is on 3.10.3 on an Intel 2.3 GHz i9 MacBook. (Note that the regex is
> off-by-one from the splitlines implementation.)
>
> What benchmark shows the regex to be significantly slower?
>
> That said, str.indexes(char) sounds like a reasonable addition.
>
> Best wishes,
> Lucas Wiman
>
> On Fri, Jun 17, 2022 at 1:12 PM Jonathan Slenders 
> wrote:
>
>> Hi everyone,
>>
>> Today was the 3rd time I came across a situation where I needed to
>> retrieve all the positions of the line endings (or beginnings) in a very
>> long Python string as efficiently as possible. The first time, it was
>> needed in prompt_toolkit, where I spent a crazy amount of time looking for
>> the most performant solution. The second time was in a commercial project
>> where performance was very critical too. The third time is for the
>> Rich/Textual project from Will McGugan. (See:
>> https://twitter.com/willmcgugan/status/1537782771137011715 )
>>
>> The problem is that the `str` type doesn't expose any API to efficiently
>> find all \n positions. Every Python implementation is either calling
>> `.index()` in a loop and collecting the results or running a regex over the
>> string and collecting all positions.
>>
>> For long strings, depending on the implementation, this results in a lot
>> of overhead due to either:
>> - calling Python functions (or any other Python instruction) for every \n
>> character in the input. The number of executed Python instructions is O(n)
>> here.
>> - Copying string data into new strings.
>>
>> The fastest solution I've been using for some time does this
>> (simplified): `accumulate(chain([0], map(len, text.splitlines(True))))`.
>> The performance is great here, because the number of Python instructions is
>> O(1). Everything is chained in C code thanks to itertools. Because of that,
>> it can outperform the regex solution by a factor of ~2.5. (Regex isn't
>> slow, but iterating over the results is.)
>>
>> The bad things about this solution, however, are:
>> - Very cumbersome syntax.
>> - We call `splitlines()`, which internally allocates a huge number of
>> strings, only to use their lengths. That is still much more overhead than a
>> simple for-loop in C would be.
>>
>> Performance matters here, because for these kinds of problems, the list of
>> integers that gets produced is typically used as an index to quickly find
>> character offsets in the original string, depending on which line is
>> displayed/processed. The bisect library also helps to quickly convert any
>> index position in that string into a line number. The point is that for
>> big inputs, the number of Python instructions executed is not O(n), but
>> O(1). Of course, some of the C code remains O(n).
>>
>> So, my ask here:
>> Would it make sense to add a `line_offsets()` method to `str`?
>> Or even `character_offsets(character)` if we want to do that for any
>> character?
>> Or `indexes(...)/indices(...)` if we would allow substrings of arbitrary
>> lengths?

[Python-ideas] Add a line_offsets() method to str

2022-06-17 Thread Jonathan Slenders
Hi everyone,

Today was the 3rd time I came across a situation where I needed to
retrieve all the positions of the line endings (or beginnings) in a very
long Python string as efficiently as possible. The first time, it was
needed in prompt_toolkit, where I spent a crazy amount of time looking for
the most performant solution. The second time was in a commercial project
where performance was very critical too. The third time is for the
Rich/Textual project from Will McGugan. (See:
https://twitter.com/willmcgugan/status/1537782771137011715 )

The problem is that the `str` type doesn't expose any API to efficiently
find all \n positions. Every Python implementation is either calling
`.index()` in a loop and collecting the results or running a regex over the
string and collecting all positions.

For long strings, depending on the implementation, this results in a lot of
overhead due to either:
- calling Python functions (or any other Python instruction) for every \n
character in the input. The number of executed Python instructions is O(n)
here.
- Copying string data into new strings.

The fastest solution I've been using for some time does this (simplified):
`accumulate(chain([0], map(len, text.splitlines(True))))`. The performance
is great here, because the number of Python instructions is O(1).
Everything is chained in C code thanks to itertools. Because of that, it
can outperform the regex solution by a factor of ~2.5. (Regex isn't slow,
but iterating over the results is.)

The bad things about this solution, however, are:
- Very cumbersome syntax.
- We call `splitlines()`, which internally allocates a huge number of
strings, only to use their lengths. That is still much more overhead than a
simple for-loop in C would be.

Performance matters here, because for these kinds of problems, the list of
integers that gets produced is typically used as an index to quickly find
character offsets in the original string, depending on which line is
displayed/processed. The bisect library also helps to quickly convert any
index position in that string into a line number. The point is that for
big inputs, the number of Python instructions executed is not O(n), but
O(1). Of course, some of the C code remains O(n).
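
To make that concrete, here is a small sketch of how such an offsets list
is typically consumed (using the cumbersome expression above as a stand-in
for the proposed method):

import bisect
from itertools import accumulate, chain

text = "first line\nsecond line\nthird line\n"
offsets = list(accumulate(chain([0], map(len, text.splitlines(True)))))[:-1]
# offsets == [0, 11, 23]: the character offset at which each line starts.

# The (0-based) line number containing character index 15:
line = bisect.bisect_right(offsets, 15) - 1   # -> 1, i.e. "second line"
# And the start of that line, for slicing without splitting the string:
start = offsets[line]                         # -> 11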

So, my ask here:
Would it make sense to add a `line_offsets()` method to `str`?
Or even `character_offsets(character)` if we want to do that for any
character?
Or `indexes(...)/indices(...)` if we would allow substrings of arbitrary
lengths?

Thanks,
Jonathan
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6WAMKYXOYA3SKL5HIRZP4WARMYYKXI3Q/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Regex timeouts

2022-02-14 Thread Jonathan Slenders
For what it's worth, the "regex" library on PyPI (not "re") supports
timeouts:

https://pypi.org/project/regex/
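
For example (a minimal sketch; per the regex documentation, the matching
functions accept a timeout in seconds and raise TimeoutError when it is
exceeded):

import regex

# A classic catastrophic-backtracking pattern and an input that triggers it:
try:
    regex.search(r"(a+)+$", "a" * 40 + "b", timeout=0.1)
except TimeoutError:
    print("match attempt aborted after 0.1s")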

On Mon, Feb 14, 2022, 6:54 PM J.B. Langston  wrote:

> Hello,
>
> I had opened this bug because I had a bad regex in my code that was
> causing python to hang in the regex evaluation:
> https://bugs.python.org/issue46627. I have fixed the problem with that
> specific regex, but more generally I think it would be good to have a
> timeout option that could be configurable when compiling the regex so that
> if the regex didn't complete within the specified timeframe, it would abort
> and throw an exception.
>
> The bug was closed as Won't Fix and it was suggested that I bring up my
> idea here. The suggestion made to me on the bug was to read Mastering
> Regular Expressions and get better at writing regexes. I will take this
> advice, but this isn't really a reasonable solution to my problem for a few
> reasons.
> My use case is log parsing and I have a large number of regexes that run
> over many different log lines. With the volume of regexes I have, it's hard
> to make sure every regex has no potential problems, especially when the
> pathological behavior only occurs on certain inputs that may not have been
> anticipated when developing the regex.
> Also because of the volume of data these regexes are parsing, I would
> never want to allow a regex to run longer than a few milliseconds because
> if it did, that would kill my log processing throughput. I'd rather that it
> just raise an exception and move on to the next log entry.
> Thanks,
> J.B.
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/BJJFVT6WECWOWDIMBEHCMYC4V5YCTGFN/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OPZWKKRSJ27MPQSEBJTTA5RTLOCCZCOT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Generics alternative syntax

2022-02-08 Thread Jonathan Slenders
Personally, I very much like this approach, although I'd definitely prefer
<> brackets instead, like in other languages.

We could possibly think of it as being syntactic sugar for the current
TypeVar approach, and have it translated into TypeVars at runtime.
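
For reference, one plausible desugaring into today's TypeVar machinery
(based on the Indexable/get examples quoted below, and reading the proposed
"T: int | str" as a bound):

from typing import Protocol, TypeVar, Union

T_co = TypeVar("T_co", covariant=True)

class Indexable(Protocol[T_co]):
    def __getitem__(self, n: int) -> T_co: ...

T = TypeVar("T", bound=Union[int, str])

def get(n: int, obj: Indexable[T]) -> T:
    return obj[n]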

However, if we defined it that way, then these type variables would still
end up in the global scope.
We don't have scopes at the class/function definition level. That would
mean that we can't reuse the same symbol for separate functions, unless the
types are identical.

Jonathan



Le mar. 8 févr. 2022 à 09:15, Chris Angelico  a écrit :

> On Tue, 8 Feb 2022 at 18:59, Abdulla Al Kathiri
>  wrote:
> >
> > I thought this is called python-ideas, meaning it’s ok to bring ideas
> even if they are ugly or stupid. No need to introduce it in Python if it’s
> too much but it might induce discussions.
> >
>
> Yes, it's absolutely okay to bring ideas of all kinds; but every idea
> has consequences. What Stephen pointed out was a consequence that you
> perhaps hadn't been aware of, due to Python's type syntax being
> strictly the same as its expression syntax.
>
> There's no such thing as "impossible", but the more barriers there are
> for an idea, the more useful it has to be in order to be of interest.
> Introducing a new @ operator to allow you to create an EmailAddress
> class that uses it? Not interesting. Introducing a new @ operator to
> allow you to multiply numpy arrays in a different way? Significantly
> more interesting, even though the cost to the language is effectively
> the same. (And even then it took a long time to happen, because it
> broke the pattern that every operator is important to core data
> types.)
>
> What has to change for your idea to be able to happen?
>
> > class Indexable[T_co::](Protocol[T_co]):
> > def __getitem__(self, n: int) -> T_co: …
>
> This is completely new syntax: between the class name and what it
> subclasses, you have a subscript-like syntax.
>
> This is new semantics for subclassing, I think, unless you want
> Indexable to be a subclass of Protocol[T_co], but I'm not entirely
> sure how to read this. New semantics aren't as hard as new syntax,
> though.
>
> > def get[T: int | str](n: int, object: Indexable[T]) -> T:
> > return object[n]
>
> Perhaps unsurprisingly, the new syntax needs to work for functions as
> well as classes. So that's not really an additional cost, it's just
> the same cost seen in another place.
>
> What would be the semantics of this subscripting? What would be the
> meaning of, say, "def spam[42]():"? What if you leave the brackets
> empty? Cute syntax on its own doesn't make an idea; the cute syntax
> needs to be backed by a demonstration of how it's beneficial.
>
> ChrisA
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/2C6G3THB5ARRRJ6NQU7LDTNDDZOEH2EX/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5L65SMPYTS3QE4HLEJF5G4VWOT5AO5AX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Barrier Object in asyncio lib

2021-02-26 Thread Jonathan Slenders
>> Why would you need locks for async? Is it to sync with things outside of
the async process?

`asyncio.Lock` is needed to lock across async operations (when there is an
`await` inside the section that holds the lock).
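
A minimal illustration: without the lock, the await below would suspend
the task mid-update and let another task read the same stale value.

import asyncio

counter = 0
lock = asyncio.Lock()

async def increment() -> None:
    global counter
    async with lock:
        value = counter
        await asyncio.sleep(0)  # suspension point inside the critical section
        counter = value + 1

async def main() -> None:
    await asyncio.gather(*(increment() for _ in range(100)))
    assert counter == 100  # without the lock, most increments would be lost

asyncio.run(main())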


On Fri, Feb 26, 2021 at 10:45, Barry Scott  wrote:

>
>
> On 26 Feb 2021, at 08:31, Jonathan Slenders  wrote:
>
> Barry,
>
> What you describe sounds like `asyncio.gather(...)`, if I understand
> correctly.
>
> The thing with a Barrier is that it's usable in situations where we don't
> know the other tasks. Maybe there is no reference to them from the current
> scope. Maybe they haven't even been created yet.
> It certainly can be done with a list of `asyncio.Future` and
> `asyncio.gather(...)`, but that's a lot of boilerplate.
>
> IMHO, Yves is right. For both asyncio and threading, we have Lock, Event,
> Condition, Semaphore and BoundedSemaphore. Only Barrier is missing among the
> asyncio primitives. (RLock doesn't make sense.)
>
>
> Why would you need locks for async? Is it to sync with things outside of
> the async process?
>
> With the large and complex async app I work on, there are no locks at all.
>
> (I guess we can probably go to bugs.python.org with this proposal.)
>
>
> Having shown that a Barrier for async is a missing piece, it would be good
> to get a thumbs up here.
>
> Barry
>
>
> Jonathan
>
>
>
>
>
>
> On Thu, Feb 25, 2021 at 23:38, Barry Scott  wrote:
>
>>
>>
>> On 25 Feb 2021, at 17:15, Jonathan Slenders  wrote:
>>
>> It does make sense to have a barrier synchronization primitive for
>> asyncio.
>> The idea is to make a coroutine block until at least X coroutines are
>> waiting to enter the barrier.
>> This is very useful if certain actions need to be synchronized.
>>
>>
>> I do most of my async coding with Twisted, where what you're calling a
>> barrier is a DeferredList.
>>
>> The way it's used is that you add all the deferreds that you want to
>> complete before you continue into the list. Once all the deferreds have
>> completed, the DeferredList completes and its callback is run.
>>
>> Barry
>>
>>
>>
>> Recently, I had to implement a barrier myself for our use case. See the
>> code below:
>>
>> It is simple to implement, but I too would like to have one for asyncio,
>> in order to be consistent with the concurrency primitives we have for
>> threading.
>>
>> Jonathan
>>
>>
>> class Barrier:
>>     """
>>     Make a coroutine block until there are at least X waiters.
>>
>>     Similar to the threading Barrier objects, but for asyncio:
>>     https://docs.python.org/3/library/threading.html#barrier-objects
>>     """
>>
>>     def __init__(self, parties: int) -> None:
>>         self.parties = parties
>>         self._waiting: int = 0  # number of coroutines waiting so far
>>         self._event = asyncio.Event()
>>
>>     def add_one(self) -> None:
>>         self._waiting += 1
>>         if self._waiting == self.parties:
>>             self._event.set()
>>
>>     async def wait(self) -> None:
>>         """
>>         Wait until we have at least `parties` waiters.
>>         """
>>         self.add_one()
>>         await self._event.wait()
>>
>>
>>
>>
>> On Thu, Feb 25, 2021 at 16:42, Barry Scott  wrote:
>>
>>>
>>>
>>> > On 25 Feb 2021, at 13:14, Yves Duprat  wrote:
>>> >
>>> > Hi,the list,
>>> >
>>> > I'm wondering why Barrier object does not exist in the synchronization
>>> primitives of the asyncio lib while it is present in threading and
>>> multiprocessing libs ?
>>> > This may not be the right place to ask this question, but I never
>>> found an answer on the web.
>>> > Thanks for your help.
>>>
>>>
>>> I'm assuming that the barrier you are speaking of is the mechanism that
>>> is used to
>>> synchronise threads/processes running in parallel to prevent data races.
>>>
>>> With async code that is never an issue. Each function runs to completion
>>> uninterrupted.
>>> There are no data races. Each time an async function runs, it can know
>>> that the state of the objects it uses will not be changed while it is
>>> running.
>>>
>>> Barry
>>>
>>>
>>>
>>> >
>>> > Yves

[Python-ideas] Re: Barrier Object in asyncio lib

2021-02-26 Thread Jonathan Slenders
Barry,

What you describe sounds like `asyncio.gather(...)`, if I understand
correctly.

The thing with a Barrier is that it's usable in situations where we don't
know the other tasks. Maybe there is no reference to them from the current
scope. Maybe they haven't even been created yet.
It certainly can be done with a list of `asyncio.Future` and
`asyncio.gather(...)`, but that's a lot of boilerplate.

IMHO, Yves is right. For both asyncio and threading, we have Lock, Event,
Condition, Semaphore and BoundedSemaphore. Only Barrier is missing among the
asyncio primitives. (RLock doesn't make sense.)
(I guess we can probably go to bugs.python.org with this proposal.)

Jonathan






On Thu, Feb 25, 2021 at 23:38, Barry Scott  wrote:

>
>
> On 25 Feb 2021, at 17:15, Jonathan Slenders  wrote:
>
> It does make sense to have a barrier synchronization primitive for asyncio.
> The idea is to make a coroutine block until at least X coroutines are
> waiting to enter the barrier.
> This is very useful if certain actions need to be synchronized.
>
>
> I do most of my async coding with Twisted, where what you're calling a
> barrier is a DeferredList.
>
> The way it's used is that you add all the deferreds that you want to
> complete before you continue into the list. Once all the deferreds have
> completed, the DeferredList completes and its callback is run.
>
> Barry
>
>
>
> Recently, I had to implement a barrier myself for our use case. See the
> code below:
>
> It is simple to implement, but I too would like to have one for asyncio,
> in order to be consistent with the concurrency primitives we have for
> threading.
>
> Jonathan
>
>
> class Barrier:
>     """
>     Make a coroutine block until there are at least X waiters.
>
>     Similar to the threading Barrier objects, but for asyncio:
>     https://docs.python.org/3/library/threading.html#barrier-objects
>     """
>
>     def __init__(self, parties: int) -> None:
>         self.parties = parties
>         self._waiting: int = 0  # number of coroutines waiting so far
>         self._event = asyncio.Event()
>
>     def add_one(self) -> None:
>         self._waiting += 1
>         if self._waiting == self.parties:
>             self._event.set()
>
>     async def wait(self) -> None:
>         """
>         Wait until we have at least `parties` waiters.
>         """
>         self.add_one()
>         await self._event.wait()
>
>
>
>
> On Thu, Feb 25, 2021 at 16:42, Barry Scott  wrote:
>
>>
>>
>> > On 25 Feb 2021, at 13:14, Yves Duprat  wrote:
>> >
>> > Hi,the list,
>> >
>> > I'm wondering why Barrier object does not exist in the synchronization
>> primitives of the asyncio lib while it is present in threading and
>> multiprocessing libs ?
>> > This may not be the right place to ask this question, but I never found
>> an answer on the web.
>> > Thanks for your help.
>>
>>
>> I'm assuming that the barrier you are speaking of is the mechanism that
>> is used to
>> synchronise threads/processes running in parallel to prevent data races.
>>
>> With async code that is never an issue. Each function runs to completion
>> uninterrupted.
>> There are no data races. Each time an async function runs, it can know
>> that the state of the objects it uses will not be changed while it is
>> running.
>>
>> Barry
>>
>>
>>
>> >
>> > Yves
>> > ___
>> > Python-ideas mailing list -- python-ideas@python.org
>> > To unsubscribe send an email to python-ideas-le...@python.org
>> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> > Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/IAFAH7PWMUDUTLXYLNSXES7VMDQ26A3W/
>> > Code of Conduct: http://python.org/psf/codeofconduct/
>> >
>> ___
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/B6WDPXNZH5KYK2BLHJXUFZF2DLFBLCBR/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/Q74V5X7FGPGZ6BC2C6MHV6D65JH3IE4H/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: Barrier Object in asyncio lib

2021-02-25 Thread Jonathan Slenders
It does make sense to have a barrier synchronization primitive for asyncio.
The idea is to make a coroutine block until at least X coroutines are
waiting to enter the barrier.
This is very useful if certain actions need to be synchronized.

Recently, I had to implement a barrier myself for our use case. See the
code below:

It is simple to implement, but I too would like to have one for asyncio, in
order to be consistent with the concurrency primitives we have for
threading.

Jonathan


import asyncio


class Barrier:
    """
    Make a coroutine block until there are at least X waiters.

    Similar to the threading Barrier objects, but for asyncio:
    https://docs.python.org/3/library/threading.html#barrier-objects
    """

    def __init__(self, parties: int) -> None:
        self.parties = parties
        self._waiting: int = 0  # number of coroutines waiting so far
        self._event = asyncio.Event()

    def add_one(self) -> None:
        self._waiting += 1
        if self._waiting == self.parties:
            self._event.set()

    async def wait(self) -> None:
        """
        Wait until we have at least `parties` waiters.
        """
        self.add_one()
        await self._event.wait()
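
A quick usage sketch (illustrative only, not from the original message):

async def main() -> None:
    barrier = Barrier(parties=3)

    async def task(i: int) -> None:
        print(f"task {i} arrived")
        await barrier.wait()  # blocks until all three tasks have arrived
        print(f"task {i} released")

    await asyncio.gather(*(task(i) for i in range(3)))

asyncio.run(main())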




On Thu, Feb 25, 2021 at 16:42, Barry Scott  wrote:

>
>
> > On 25 Feb 2021, at 13:14, Yves Duprat  wrote:
> >
> > Hi,the list,
> >
> > I'm wondering why Barrier object does not exist in the synchronization
> primitives of the asyncio lib while it is present in threading and
> multiprocessing libs ?
> > This may not be the right place to ask this question, but I never found
> an answer on the web.
> > Thanks for your help.
>
>
> I'm assuming that the barrier you are speaking of is the mechanism that is
> used to
> synchronise threads/processes running in parallel to prevent data races.
>
> With async code that is never an issue. Each function runs to completion
> uninterrupted.
> There are no data races. Each time an async function runs, it can know
> that the state of the objects it uses will not be changed while it is
> running.
>
> Barry
>
>
>
> >
> > Yves
> > ___
> > Python-ideas mailing list -- python-ideas@python.org
> > To unsubscribe send an email to python-ideas-le...@python.org
> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
> > Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/IAFAH7PWMUDUTLXYLNSXES7VMDQ26A3W/
> > Code of Conduct: http://python.org/psf/codeofconduct/
> >
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/B6WDPXNZH5KYK2BLHJXUFZF2DLFBLCBR/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2S7T7N3D2W6UH2WDDBSGUTEFZPNMZO2G/
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] New PEP 550: Execution Context

2017-08-13 Thread Jonathan Slenders
For what it's worth, as part of prompt_toolkit 2.0, I implemented something
very similar to Nathaniel's idea some time ago.
It works pretty well, but I don't have a strong opinion against an
alternative implementation.

- The active context is stored as a monotonically increasing integer.
- For each local, the actual values are stored in a dictionary that maps
the context ID to the value. (Could cause a GC issue - I'm not sure.)
- Every time an executor is started, I have to wrap the callable in a
context manager that applies the current context to that thread.
- When a new 'Future' is created, I grab the context ID and apply it to the
callbacks when the result is set.

https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad9422a3c6a218a939843bdd2cc76f16/prompt_toolkit/eventloop/context.py
https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad9422a3c6a218a939843bdd2cc76f16/prompt_toolkit/eventloop/future.py
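
In outline, the mechanism looks something like this (a condensed,
hypothetical sketch of the idea, not the actual prompt_toolkit code linked
above):

import itertools

_context_ids = itertools.count(1)
_current_context = 0  # ID of the active context (monotonically increasing)

def new_context() -> int:
    global _current_context
    _current_context = next(_context_ids)
    return _current_context

class ContextLocal:
    def __init__(self) -> None:
        self._values: dict[int, object] = {}  # context ID -> value

    def set(self, value: object) -> None:
        self._values[_current_context] = value

    def get(self) -> object:
        return self._values[_current_context]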

FYI: In my case, I did not want to pass the currently active "Application"
object around through all of the code. But when I started supporting telnet,
multiple applications could be alive at once, each with a different I/O
backend. Therefore the active application needed to be stored in a kind of
execution context.

When PEP 550 gets approved, I'll probably make this compatible. It should at
least be possible to run prompt_toolkit on the asyncio event loop.

Jonathan







2017-08-13 1:35 GMT+02:00 Nathaniel Smith :

> I had an idea for an alternative API that exposes the same
> functionality/semantics as the current draft, but that might have some
> advantages. It would look like:
>
> # a "context item" is an object that holds a context-sensitive value
> # each call to create_context_item creates a new one
> ci = sys.create_context_item()
>
> # Set the value of this item in the current context
> ci.set(value)
>
> # Get the value of this item in the current context
> value = ci.get()
> value = ci.get(default)
>
> # To support async libraries, we need some way to capture the whole context
> # But an opaque token representing "all context item values" is enough
> state_token = sys.current_context_state_token()
> sys.set_context_state_token(state_token)
> coro.cr_state_token = state_token
> # etc.
>
> The advantages are:
> - Eliminates the current PEP's issues with namespace collision; every
> context item is automatically distinct from all others.
> - Eliminates the need for the None-means-del hack.
> - Lets the interpreter hide the details of garbage collecting context
> values.
> - Allows for more implementation flexibility. This could be
> implemented directly on top of Yury's current prototype. But it could
> also, for example, be implemented by storing the context values in a
> flat array, where each context item is assigned an index when it's
> allocated. In the current draft this is suggested as a possible
> extension for particularly performance-sensitive users, but this way
> we'd have the option of making everything fast without changing or
> extending the API.
>
> As precedent, this is basically the API that low-level thread-local
> storage implementations use; see e.g. pthread_key_create,
> pthread_getspecific, pthread_setspecific. (And the
> allocate-an-index-in-a-table is the implementation that fast
> thread-local storage implementations use too.)
>
> -n
>
> On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov 
> wrote:
> > Hi,
> >
> > This is a new PEP to implement Execution Contexts in Python.
> >
> > The PEP is in-flight to python.org, and in the meanwhile can
> > be read on GitHub:
> >
> > https://github.com/python/peps/blob/master/pep-0550.rst
> >
> > (it contains a few diagrams and charts, so please read it there.)
> >
> > Thank you!
> > Yury
> >
> >
> > PEP: 550
> > Title: Execution Context
> > Version: $Revision$
> > Last-Modified: $Date$
> > Author: Yury Selivanov 
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 11-Aug-2017
> > Python-Version: 3.7
> > Post-History: 11-Aug-2017
> >
> >
> > Abstract
> > ========
> >
> > This PEP proposes a new mechanism to manage execution state--the
> > logical environment in which a function, a thread, a generator,
> > or a coroutine executes.
> >
> > A few examples of where having a reliable state storage is required:
> >
> > * Context managers like decimal contexts, ``numpy.errstate``,
> >   and ``warnings.catch_warnings``;
> >
> > * Storing request-related data such as security tokens and request
> >   data in web applications;
> >
> > * Profiling, tracing, and logging in complex and large code bases.
> >
> > The usual solution for storing state is to use a Thread-local Storage
> > (TLS), implemented in the standard library as ``threading.local()``.
> > Unfortunately, TLS does not work for isolating state of generators or
> > asynchronous code because such code shares a single thread.
> >
> >
> > Rationale
> > =

Re: [Python-ideas] Make partial a built-in

2016-09-20 Thread Jonathan Slenders
On Sep 20, 2016 at 18:42, "Ryan Gonzalez"  wrote:
> Doing something like:
>
> lambda x, y: myfunc(partial_arg, x, y)
>
> is more error-prone to changes in myfunc's signature.

No, if the signature of the function changes, then the signature of the
partial would also change. The risk is the same.
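
To illustrate the point (a small, hypothetical example): both forms pick
up a change in myfunc's signature in exactly the same way -- either both
keep working, or both break at call time.

from functools import partial

def myfunc(a, x, y):
    return (a, x, y)

f_lambda = lambda x, y: myfunc(10, x, y)
f_partial = partial(myfunc, 10)
assert f_lambda(1, 2) == f_partial(1, 2) == (10, 1, 2)

# If myfunc later grows a required fourth parameter, both f_lambda(1, 2)
# and f_partial(1, 2) fail with the same TypeError at call time.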

Jonathan
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/