[Python-ideas] Abstract dataclasses and dataclass fields
I am finding that it would be useful to be able to define a dataclass that is an abstract base class and to define some of its fields as abstract. As I am typing this, I realize that I could presumably write some code to implement what I'm asking for. Maybe it is a good enough idea to make part of the standard API in any case, though?

I'm thinking that a field would be made abstract by passing `abstract=True` as an argument to `dataclasses.field()`.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TFDJDTM7ZOYKBOPAYSDCM3T7SYD2RIJL/
Code of Conduct: http://python.org/psf/codeofconduct/
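As a rough illustration of the "write some code" route mentioned above, here is a minimal sketch that approximates an abstract field with today's API. It assumes the marker is stored in `field(metadata=...)`, since `field()` has no `abstract` parameter; the `Shape` and `Triangle` classes and the `"abstract"` metadata key are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Shape:
    # Hypothetical marker: field() has no `abstract` argument, so the flag
    # is carried in the field's metadata instead.
    sides: int = field(default=0, metadata={"abstract": True})

    def __post_init__(self):
        # A subclass that redeclares the field replaces this Field object,
        # so the marker no longer shows up in fields(self).
        still_abstract = [f.name for f in fields(self) if f.metadata.get("abstract")]
        if still_abstract:
            raise TypeError(
                f"Can't instantiate {type(self).__name__} with abstract "
                f"fields: {', '.join(still_abstract)}"
            )


@dataclass
class Triangle(Shape):
    sides: int = 3  # redeclaring the field makes it concrete
```

One limitation of this approach: a subclass that defines its own `__post_init__` has to call `super().__post_init__()` for the check to run.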
[Python-ideas] Re: Bind/normalize params for @functools.cache
Steve Jorgensen wrote:
> I was surprised to find, when I pass arguments to a function decorated with
> `@functools.cache` in different, equivalent ways, the cache does not
> recognize them as the same.
>     counter = itertools.count(1)
>     @functools.cache
>     def example(a, b, c=0):
>         return (next(counter), a, b, c)
>     example(1, 2)  # => (1, 1, 2, 0)
>     example(1, b=2)  # => (2, 1, 2, 0)
>     example(1, 2, 0)  # => (3, 1, 2, 0)
> When I wrote my own implementation as a coding exercise, I noticed the same
> weakness while testing it and solved that by having the decorator function
> get the signature of the decorated function, then use the bind method of the
> signature to bind the parameter values, then call the apply_defaults method
> on the bound arguments, and then finally, use the args and kwargs properties
> of the bound arguments to make the cache key.
> It seems like functools.cache should do the same thing. If it is undesirable
> for that to be the default behavior, then it could be optional (e.g.
> @functools.cache(normalize=True) ).
> I have not tested to see if functools.lru_cache has the same issue. I presume
> that it does, so my suggestion would apply to that as well.

After saying that, I realized that, if the behavior should be optional, then maybe it would make sense to provide another wrapper to normalize the parameters instead (see possible implementation below)? On the other hand, since the primary use of such a thing would be for caching, maybe it does make more sense to include the behavior in `functools.cache` et al., as I originally suggested, or maybe have both.

    from functools import wraps
    from inspect import signature

    def bind_call_params(func):
        """
        Transform a function to always receive its arguments in the same form
        (which are positional and which are keyword), even if its implementation
        is less strict than what is described by its signature.

        This is for use in cases where the form in which the parameters are
        passed may be significant to a decorator (e.g. '@functools.cache').
        """
        sig = signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Bind the call to the signature and fill in defaults, so that
            # equivalent calls are forwarded in one canonical form.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            return func(*bound.args, **bound.kwargs)

        return wrapper

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/B7UC2472UBGCMO2S3NZWRTDZLJ7OOPRJ/
[Python-ideas] Bind/normalize params for @functools.cache
I was surprised to find, when I pass arguments to a function decorated with `@functools.cache` in different, equivalent ways, the cache does not recognize them as the same.

    import functools
    import itertools

    counter = itertools.count(1)

    @functools.cache
    def example(a, b, c=0):
        return (next(counter), a, b, c)

    example(1, 2)  # => (1, 1, 2, 0)
    example(1, b=2)  # => (2, 1, 2, 0)
    example(1, 2, 0)  # => (3, 1, 2, 0)

When I wrote my own implementation as a coding exercise, I noticed the same weakness while testing it. I solved that by having the decorator function get the signature of the decorated function, use the `bind` method of the signature to bind the parameter values, call the `apply_defaults` method on the bound arguments, and finally use the `args` and `kwargs` properties of the bound arguments to make the cache key.

It seems like `functools.cache` should do the same thing. If it is undesirable for that to be the default behavior, then it could be optional (e.g. `@functools.cache(normalize=True)`).

I have not tested to see if `functools.lru_cache` has the same issue. I presume that it does, so my suggestion would apply to that as well.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/DS72SMJRJNKO3UVDS7ZVKAAPES45PLOQ/
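The bind-then-apply-defaults steps described above can be sketched as a separate normalizing wrapper placed outside the cache. This is only an illustration under stated assumptions: `bind_call_params`, `make_cached` and `call_three_ways` are names made up for the example, and `functools.lru_cache(maxsize=None)` is used so that `cache_info()` can report how many distinct cache entries each approach creates.

```python
import functools
from inspect import signature


def bind_call_params(func):
    """Forward every call in one canonical positional/keyword form."""
    sig = signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        return func(*bound.args, **bound.kwargs)

    return wrapper


def make_cached():
    @functools.lru_cache(maxsize=None)
    def example(a, b, c=0):
        return (a, b, c)

    return example


def call_three_ways(func):
    # Three equivalent spellings of the same call.
    func(1, 2)
    func(1, b=2)
    func(1, 2, 0)


# Without normalization, each spelling produces its own cache entry.
plain = make_cached()
call_three_ways(plain)
plain_misses = plain.cache_info().misses

# With the normalizing wrapper outside the cache, all three share one entry.
cached = make_cached()
call_three_ways(bind_call_params(cached))
normalized_misses = cached.cache_info().misses

print(plain_misses, normalized_misses)
```

The wrapper has to be the outermost layer; placed under the cache, it would normalize the call only after the cache key had already been computed.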
[Python-ideas] Re: Add InvalidStateError to the standard exception hierarchy
Paul Moore wrote:
> What's wrong with defining a custom exception? It's literally one line:
> `class InvalidStateError(Exception): pass`. Two lines if you want to put
> the `pass` on its own line.
> The built in exceptions are ones that are raised by the core interpreter.
> Even the stdlib doesn't get builtin exceptions, look at sqlite3.Error, for
> example. Defining a custom exception in the module alongside the function
> that raises it is both normal practice, and far more discoverable.
> Paul
> On Thu, 1 Sept 2022 at 22:42, Steve Jorgensen stevec...@gmail.com wrote:
> > I frequently find that I want to raise an exception when the target of a
> > call is not in an appropriate state to perform the requested operation.
> > Rather than choosing between `Exception` or defining a custom exception, it
> > would be nice if there were a built-in `InvalidStateError` exception that
> > my code could raise.
> > In cases where I want to define a custom exception anyway, I think it
> > would be nice if it could have a generic `InvalidStateError` exception
> > class for it to inherit from.
> > Of course, I would be open to other ideas for what the name of this
> > exception should be. Other possibilities off the top of my head are
> > `BadStateError` or `StateError`.

OK, but by that logic, why do we have standard exceptions like `ValueError` when we could define custom exceptions for the cases where that should be raised?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5JGBBE7JEKWYPEQO6NC4B7UFKJN2UK6K/
[Python-ideas] Re: Add InvalidStateError to the standard exception hierarchy
Matthias Görgens wrote:
> > If the target of the call isn't in an appropriate state, isn't that a
> > bug in the constructor that it allows you to construct objects that are
> > in an invalid state?
> > You should fix the object so that it is never in an invalid state rather
> > than blaming the caller.
> You can't really do that with files that have been closed.
> Unless you disallow manual closing of files altogether.
> That being said, I'd suggest that people raise custom exceptions, so your
> callers can catch exactly what they want to handle.
> A generic exception like ValueError or the proposed InvalidStateError
> could be thrown by almost anything you call in your block, instead of just
> what you actually intend to catch.

I didn't say that I was talking about a file. In fact, today, I'm talking about an object that manages a subprocess. If a caller tries to call a method of the manager to interact with the subprocess when the subprocess has not yet been started or after it has been terminated, then I want to raise an appropriate exception.

I am raising a custom exception, and it annoys me that it has to simply inherit from `Exception` when I think that an invalid state condition is a common enough kind of issue that it should have a standard exception class in the hierarchy.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YC62WQIXUTM3ULVA64SBXBS5YZ3M2XGT/
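To make the subprocess-manager scenario above concrete, here is a minimal sketch of the pattern as it has to be written today. All names are hypothetical: `InvalidStateError` stands in for the proposed generic class (defined locally, because no such built-in exists), and `ProcessManager` only tracks a flag instead of spawning a real subprocess.

```python
class InvalidStateError(Exception):
    """Local stand-in for the proposed generic exception (not a built-in)."""


class ProcessNotRunningError(InvalidStateError):
    """Raised when an operation needs a running subprocess."""


class ProcessManager:
    """Toy manager that only tracks whether its subprocess is running."""

    def __init__(self):
        self._running = False

    def start(self):
        self._running = True

    def stop(self):
        self._running = False

    def send(self, data):
        # Nothing is wrong with `data`; the object is simply in the wrong state.
        if not self._running:
            raise ProcessNotRunningError("send() requires a running subprocess")
        return f"sent {data!r}"


manager = ProcessManager()
try:
    manager.send("ping")
except InvalidStateError as exc:
    print(f"refused: {exc}")

manager.start()
print(manager.send("ping"))
```

Callers can catch either the specific `ProcessNotRunningError` or the broader `InvalidStateError`, which is the role the proposal wants a standard class to fill.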
[Python-ideas] Re: Add InvalidStateError to the standard exception hierarchy
Matthias Görgens wrote:
> > If the target of the call isn't in an appropriate state, isn't that a
> > bug in the constructor that it allows you to construct objects that are
> > in an invalid state?
> > You should fix the object so that it is never in an invalid state rather
> > than blaming the caller.
> You can't really do that with files that have been closed.
> Unless you disallow manual closing of files altogether.
> That being said, I'd suggest that people raise custom exceptions, so your
> callers can catch exactly what they want to handle.
> A generic exception like ValueError or the proposed InvalidStateError
> could be thrown by almost anything you call in your block, instead of just
> what you actually intend to catch.

It depends on context whether it makes sense to define a custom exception, and I agree that I frequently should define a custom exception. In that case though, it would still be nice to have an appropriate generic exception for that to inherit from, just as I would inherit from `ValueError` for a special case of a value error.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GNBFLWNXWBV54C73MOZJDEXJPDIOVBGM/
[Python-ideas] Re: Add InvalidStateError to the standard exception hierarchy
Jean Abou Samra wrote:
> Le 01/09/2022 à 23:40, Steve Jorgensen a écrit :
> > I frequently find that I want to raise an exception when the target of a
> > call is not in an appropriate state to perform the requested operation.
> > Rather than choosing between `Exception` or defining a custom exception, it
> > would be nice if there were a built-in `InvalidStateError` exception that
> > my code could raise.
> > In cases where I want to define a custom exception anyway, I think it would
> > be nice if it could have a generic `InvalidStateError` exception class for
> > it to inherit from.
> > Of course, I would be open to other ideas for what the name of this
> > exception should be. Other possibilities off the top of my head are
> > `BadStateError` or `StateError`.
> https://docs.python.org/3/library/exceptions.html#ValueError states that
> ValueError is “Raised when an operation or function receives an argument
> that has the right type but an inappropriate value, and the situation is
> not described by a more precise exception such as IndexError.” How would
> a "state error" differ from this more precisely? What value would this new
> exception type add? Both ValueError and this proposed StateError are very
> generic.

`ValueError` is for when the value of an argument passed to the function is unacceptable. The exception that I propose would be for when there is nothing wrong with any argument value, but the object is not in the correct state for that method to be called.

I should have provided an example. One example is when trying to call methods to interact with a remote system either before a connection has been made or after the connection has been terminated.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/W2G5XNWKS6KHXSCH45QPLFRUMZIVNS4L/
[Python-ideas] Add InvalidStateError to the standard exception hierarchy
I frequently find that I want to raise an exception when the target of a call is not in an appropriate state to perform the requested operation. Rather than choosing between `Exception` or defining a custom exception, it would be nice if there were a built-in `InvalidStateError` exception that my code could raise.

In cases where I want to define a custom exception anyway, I think it would be nice if it could have a generic `InvalidStateError` exception class for it to inherit from.

Of course, I would be open to other ideas for what the name of this exception should be. Other possibilities off the top of my head are `BadStateError` or `StateError`.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NMHNKSEZG7UZ6AIFTVGQXVECCNYYVODT/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
Ethan Furman wrote:
> On 7/9/22 12:19, Steve Jorgensen wrote:
> > [...] It works great to combine them by defining the dataclass as a mixin
> > for the Enum class. Why would
> > it not be good to include that as an example in the official docs, assuming
> > (as I believe) that it is a
> > particularly useful combination?
> Do you have some real-world examples that show this?
> --
> ~Ethan~

I have only used it in 1 real-world case so far. It's a good use case but not a good example case. I'll keep using this pattern though, and I'll probably end up with a good example soonish.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FGK4R4ES3STAS2PZLYX5UOV5HZRIFSF2/
[Python-ideas] Re: Dataclasses for complex models, A proposal for datatrees,
I think there have not been any replies to this so far because it's too much effort to figure out what you're actually suggesting. Can you try to make the request again, starting with a clear summary and then breaking out some of the details?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7XPPPC63XVXFIXP2WIT6ARRX7CTYPRSX/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
Chris Angelico wrote:
> On Mon, 11 Jul 2022 at 03:54, Steve Jorgensen stevec...@gmail.com wrote:
> > David Mertz, Ph.D. wrote:
> > I've seen this thread, and also wondered why anyone could EVER want a
> > dataclass that is an enum. Nothing I've seen in the thread gives me any
> > hint about that, really.
> > On Sun, Jul 10, 2022 at 7:44 AM Barry Scott ba...@barrys-emacs.org wrote:
> > On 9 Jul 2022, at 22:53, Steve Jorgensen stevec...@gmail.com wrote:
> > I don't think that dataclasses have the limited set of intended uses
> > that you are interpreting them as having. To me, the fact that they can be
> > frozen makes them a good fit with Enum.
> > Please quote the email that you are replying to.
> > It is usually considered a code smell to have a class that is two or more
> > things.
> > This seems to be what you are trying to do.
> > How can one class be a set of fields and also the enum for one of its own
> > fields?
> > I do not understand why this is reasonable.
> > Barry
> >
> > Sorry, I don't know how I communicated that I was trying to have one class
> > be a set of fields and also the enum for one of its own fields.
> > I'm really just wanting to have each member of the enum be an instance of a
> > frozen dataclass. If any of the dataclass fields were of an enum type, then
> > it would presumably not be for the same enum. In my example, none of the
> > fields of the dataclass contains an enum. One contains a string, and the
> > other contains an int.
> Just throwing an idea out there, but would it work better to have an
> enum-namedtuple instead?
> ChrisA

The only benefit I can think of for a namedtuple vs. a dataclass is compactness in memory, but the number of members of an enum is typically very small. I think the extra flexibility of a dataclass makes it more desirable for this purpose.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/F4YM66UAQ3GXXBIMPNX6MLEQA22K7UVL/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
David Mertz, Ph.D. wrote:
> I've seen this thread, and also wondered why anyone could EVER want a
> dataclass that is an enum. Nothing I've seen in the thread gives me any
> hint about that, really.
> On Sun, Jul 10, 2022 at 7:44 AM Barry Scott ba...@barrys-emacs.org wrote:
> > On 9 Jul 2022, at 22:53, Steve Jorgensen stevec...@gmail.com wrote:
> > I don't think that dataclasses have the limited set of intended uses
> > that you are interpreting them as having. To me, the fact that they can be
> > frozen makes them a good fit with Enum.
> > Please quote the email that you are replying to.
> > It is usually considered a code smell to have a class that is two or more
> > things.
> > This seems to be what you are trying to do.
> > How can one class be a set of fields and also the enum for one of its own
> > fields?
> > I do not understand why this is reasonable.
> > Barry

Sorry, I don't know how I communicated that I was trying to have one class be a set of fields and also the enum for one of its own fields.

I'm really just wanting to have each member of the enum be an instance of a frozen dataclass. If any of the dataclass fields were of an enum type, then it would presumably not be for the same enum. In my example, none of the fields of the dataclass contains an enum. One contains a string, and the other contains an int.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KWL2FXQ2FKRMGBAB5PMR3GIRAQBC6CLR/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
I don't think that dataclasses have the limited set of intended uses that you are interpreting them as having. To me, the fact that they can be frozen makes them a good fit with Enum.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V6U7UMQRTLDZ2W6SWREL472L6ZH7MHB5/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
Ethan Furman wrote:
> On 7/7/22 09:01, Steve Jorgensen wrote:
> > Actually, maybe these are fundamentally incompatible?
> Their intended use seems fundamentally incompatible:
> - dataclass was designed for making many mutable records (hundreds,
>   thousands, or more)
> - enum was designed to make a handful of named constants (I haven't yet seen
>   one with even a hundred elements)
> The repr from a combined dataclass/enum looks like a dataclass, giving no
> clue that the object is an enum, and omitting
> any information about which enum member it is and which enum it is from.
> Given these conflicts of interest, I don't see any dataclass examples making
> it into the enum documentation.
> --
> ~Ethan~

Per my subsequent self-reply, they are only incompatible when trying to do them at the same time in the same class definition. It works great to combine them by defining the dataclass as a mixin for the Enum class. Why would it not be good to include that as an example in the official docs, assuming (as I believe) that it is a particularly useful combination?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VFGXT4QOWYF3UJVWYOR54GNTKEG2XT7D/
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
After some playing around, I figured out a pattern that works without any changes to the implementations of `dataclass` or `Enum`, and I like this because it keeps the 2 kinds of concern separate. Maybe I'll try submitting an MR to add an example like this to the documentation for `Enum`.

    In [1]: from dataclasses import dataclass

    In [2]: from enum import Enum

    In [3]: @dataclass(frozen=True)
       ...: class CreatureDataMixin:
       ...:     size: str
       ...:     legs: int
       ...:

    In [4]: class Creature(CreatureDataMixin, Enum):
       ...:     BEETLE = ('small', 6)
       ...:     DOG = ('medium', 4)
       ...:

    In [5]: Creature.DOG
    Out[5]: Creature(size='medium', legs=4)

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/G2VALQ4RIVFKIOKVW4XZAHZMLSZWL2XS/
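As a small follow-up to the mixin pattern in the message above, the sketch below exercises the combined class to show that its members behave both as enum members and as frozen dataclass instances. The checks themselves are additions made for illustration; the class definitions are the ones from the message. No repr output is shown, because the repr of such members differs between Python versions.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


@dataclass(frozen=True)
class CreatureDataMixin:
    size: str
    legs: int


class Creature(CreatureDataMixin, Enum):
    BEETLE = ('small', 6)
    DOG = ('medium', 4)


# Enum behavior: lookup by name and iteration in definition order.
names = [member.name for member in Creature]
dog = Creature['DOG']

# Dataclass behavior: named fields, and assignment to a field is rejected.
legs = dog.legs
try:
    dog.legs = 5
except FrozenInstanceError:
    is_frozen = True
else:
    is_frozen = False
```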
[Python-ideas] Re: Make dataclass aware that it might be used with Enum
Steve Jorgensen wrote:
> Perhaps, this has already been addressed in a newer release (?) but in Python
> 3.9, making `@dataclass` work with `Enum` is a bit awkward.
> Currently, in order to make it work, I have to:
> 1. Pass `init=False` to `@dataclass` and hand-write the `__init__` method
> 2. Pass `repr=False` to `@dataclass` and use `Enum`'s representation or write
>    a custom __repr__
> Example:
>     In [72]: @dataclass(frozen=True, init=False, repr=False)
>         ...: class Creature(Enum):
>         ...:     legs: int
>         ...:     size: str
>         ...:     Beetle = (6, 'small')
>         ...:     Dog = (4, 'medium')
>         ...:     def __init__(self, legs, size):
>         ...:         self.legs = legs
>         ...:         self.size = size
>         ...:
>     In [73]: Creature.Dog
>     Out[73]:

Actually, maybe these are fundamentally incompatible? `@dataclass` is a decorator, so it acts on the class after it has already been defined, but `Enum` acts before that, when `@dataclass` cannot have generated the `__init__` yet. Right?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/T775WMOLR6TNOXDAU37ZA2FKQB3SMJT6/
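The ordering question raised above can be checked with a generic sketch that has nothing enum-specific in it: a metaclass (which is how `Enum` hooks into class creation) finishes building the class before any class decorator receives it. `RecordingMeta`, `recording_decorator` and `events` are hypothetical names used only for this illustration.

```python
events = []


class RecordingMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Runs while the class statement is being executed.
        events.append("metaclass")
        return super().__new__(mcls, name, bases, namespace)


def recording_decorator(cls):
    # Runs only after the metaclass has returned the finished class.
    events.append("decorator")
    return cls


@recording_decorator
class Example(metaclass=RecordingMeta):
    pass


print(events)  # the metaclass entry comes first
```

This is consistent with the observation in the message: by the time `@dataclass` could add an `__init__`, the enum machinery has already created the members.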
[Python-ideas] Make dataclass aware that it might be used with Enum
Perhaps, this has already been addressed in a newer release (?) but in Python 3.9, making `@dataclass` work with `Enum` is a bit awkward.

Currently, in order to make it work, I have to:
1. Pass `init=False` to `@dataclass` and hand-write the `__init__` method
2. Pass `repr=False` to `@dataclass` and use `Enum`'s representation or write a custom `__repr__`

Example:

    In [72]: @dataclass(frozen=True, init=False, repr=False)
        ...: class Creature(Enum):
        ...:     legs: int
        ...:     size: str
        ...:     Beetle = (6, 'small')
        ...:     Dog = (4, 'medium')
        ...:     def __init__(self, legs, size):
        ...:         self.legs = legs
        ...:         self.size = size
        ...:

    In [73]: Creature.Dog
    Out[73]:

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EXPSE4KQYM5SWPFCWH4QPOTS6UCP5FNL/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Dexter Hill wrote:
> Steve Jorgensen wrote:
> > Would we want something more general that could deal with cases where the
> > input does not have a 1-to-1 mapping to the field that differ only,
> > perhaps, in type hint? What if we want 1 argument to initialize 2
> > properties or vice versa, etc.?
> That's definitely an improvement that could be made, although I think it
> would require a large amount of changes. I don't know if you had syntax in
> mind for it, or an easy way to represent it, but at least from what I
> understand you would probably need a whole new function like `field`, but that
> handles just that functionality, otherwise it would add a lot of arguments
> to `field`.
> Steve Jorgensen wrote:
> > In any case, having a new `InitFn` is worth digging into, I don't think it
> > needs to have 2 arguments for type since the type annotation already covers
> > 1 of those cases. I think it makes the most sense for the type annotation
> > to apply to the property and the type of the argument to be provided either
> > through an optional argument to `InitFn` or maybe that can be derived from
> > the signature of the function that `InitFn` refers to.
> So the use case would be either this:
> ```py
> @dataclass
> class Foo:
>     x: InitFn[str] = field(converter=chr)
> ```
> where the field `x` has the type string, and the type for the `x` parameter
> in `__init__` would be derived from `chr`, or optionally:
> ```py
> @dataclass
> class Foo:
>     x: InitFn[str, int] = field(converter=chr)
> ```
> where you can provide a second type argument that specifies the type
> parameter for `__init__`?

How about this variation? Use `init_using` instead of `converter` as the name of the argument to `field`, allow either a callable or a method name to be supplied, and expect the custom init function to behave like `__post_init__` in that it assigns to properties rather than returning a converted value. That will allow it to initialize more than 1 property.

Next, we can say that if the same callable object or the same method name is passed to `init_using` for several fields, then it is called only once. Finally, we say that the class' init argument(s) and their type hints are taken from the `init_using` target.

```
import mimetypes
from dataclasses import dataclass, field
from pathlib import Path

# Proposed usage (`init_using` does not exist in `dataclasses` today)
@dataclass
class DocumentFile:
    filename: str = field(init_using='_init_name_and_ctype')
    content_type: str = field(init_using='_init_name_and_ctype')
    description: str | None = field(default=None)

    # In this case, the function takes a `filename` argument, which has the
    # same name as one of the properties that it initializes, but it could
    # take an argument with a completely different name, and the class init
    # would have that as its argument instead.
    def _init_name_and_ctype(self, filename: str | Path = '/tmp/example.txt') -> None:
        self.filename = str(filename)
        # guess_type() returns a (type, encoding) tuple; keep only the type.
        self.content_type = mimetypes.guess_type(filename)[0]

# Roughly translates to
class DocumentFile:
    filename: str
    content_type: str
    description: str | None

    def __init__(self, filename: str | Path = '/tmp/example.txt', description: str | None = None):
        self.description = description
        self._init_name_and_ctype(filename)

    def _init_name_and_ctype(self, filename: str | Path = '/tmp/example.txt') -> None:
        self.filename = str(filename)
        self.content_type = mimetypes.guess_type(filename)[0]
```

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CGOCLL2YRITOXJWQB55PHYUTYKF4BLSB/
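For comparison with the `init_using` idea above, here is a sketch of how one init argument can populate two fields using only features that exist today (`InitVar`, `field(init=False)` and `__post_init__`). It is an approximation, not the proposal itself: the init argument is named `path` here, an assumption made because an `InitVar` pseudo-field and a real field cannot share the name `filename`.

```python
import mimetypes
from dataclasses import InitVar, dataclass, field
from pathlib import Path
from typing import Optional, Union


@dataclass
class DocumentFile:
    # Init-only argument: passed to __post_init__, not stored as a field.
    path: InitVar[Union[str, Path]] = '/tmp/example.txt'
    description: Optional[str] = None
    # Stored fields that are computed rather than passed in.
    filename: str = field(init=False)
    content_type: Optional[str] = field(init=False)

    def __post_init__(self, path):
        self.filename = str(path)
        # guess_type() returns (type, encoding); only the type is kept.
        self.content_type = mimetypes.guess_type(self.filename)[0]


doc = DocumentFile(Path('notes.txt'), description='meeting notes')
```

The trade-off matches the concern raised in the thread: this approach occupies `__post_init__`, so a class hierarchy that already uses `__post_init__` for something else has to chain the calls by hand.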
[Python-ideas] Re: dataclass field argument to allow converting value on init
Paul Bryan wrote:
> Could the type hint for the __init__ parameter be inferred from the
> (proposed) init_fn's own parameter type hint itself?
> On Tue, 2022-06-28 at 16:39 +0000, Steve Jorgensen wrote:

I think I was already suggesting that possibility: "an optional argument to `InitFn` or maybe that can be derived from the signature of the function that `InitFn` refers to." Are we saying the same thing?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/W6SYLYIQLORAJJCVXYPZFLV25XZG43DH/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Dexter Hill wrote: > Ah right I see what you mean. In my example I avoided the use of `__init__` > and specifically `__post_init__` as (and it's probably a fairly uncommon use > case), in my actual project, `__post_init__` is defined on a base class, and > inherited by all other classes, and I wanted to avoid overriding > `__post_init__` (and super-ing). The idea was to have the conversion > generated by the dataclass, within the `__init__` no function were required > to be defined (similarly to how converters work in attrs). > With your suggestion, what do you think about having something similar to > `InitVar` so it's more in line with how `__post_init__` currently works? For > example, like one of my other suggestions, having a type called `InitFn` > which takes two types: the type for `__init__` and the type of the actual > field. Now I see why you wanted to avoid using __post_init__. I had been thinking to try to use __post_init_ instead of adding more ways to initialize, but your reasoning makes a lot of sense. Would we want something more general that could deal with cases where the input does not have a 1-to-1 mapping to the field that differ only, perhaps, in type hint? What if we want 1 argument to initializes 2 properties or vice verse, etc.? In any case, having a new `InitFn` is worth digging into, I don't think it needs to have 2 arguments for type since the type annotation already covers 1 of those cases. I think it makes the most sense for the type annotation to apply to the property and the type of the argument to be provided either through an optional argument to `InitFn` or maybe that can be derived from the signature of the function that `InitFn` refers to. 
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BCN2BUZSM6KH5VSTKHYWI3CB5UVDDNUH/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Dexter Hill wrote:
> Do you mind providing a little example of what you mean? I'm not sure I 100%
> understand what your use of `__post_init__` is. In my mind, it would be
> something like:
> ```py
> @dataclass
> class Foo:
>     x: str = field(init=int, converter=chr)
> # which converts to
> class Foo:
>     def __init__(self, x: int):
>         self.x = chr(x)
> ```
> without any use of `__post_init__`. If it were to be something like:
> ```py
> class Foo:
>     def __init__(self, x: int):
>         self.__post_init__(x)
>     def __post_init__(x: int):
>         self.x = chr(x)
> ```
> which, I think is what you are suggesting (please correct me if I'm wrong),
> then I feel that may be confusing if you were to override `__post_init__`,
> which is often much easier than overriding `__init__`.
> For exmple, in a situation like:
> ```py
> @dataclass
> class Foo:
>     x: str = field(init=int, converter=chr)
>     y: InitVar[str]
> ```
> if the user were to override `__post_init__`, would they know that they need
> to include `x` as the first argument? It's not typed with `InitVar` so it
> might not be clear that it's passed to `__post_init__`.

That's close to what I mean. I'm actually suggesting not having `converter`, though, and instead using an explicit `__post_init__` for the conversion, so:

```py
@dataclass
class Foo:
    x: str = field(init=int)

    def __post_init__(self, x: int):
        self.x = chr(x)

# converts to
class Foo:
    def __init__(self, x: int):
        self.__post_init__(x)

    def __post_init__(self, x: int):
        self.x = chr(x)
```

Writing that out is helpful because now I see that the argument type can possibly be taken from the `__post_init__` signature, meaning there is no need to use the type as the value for the `init` argument to `field`. In that case, instead of `init=int`, it could maybe be something like `post_init=True`.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LI7ZSAZ6VGQV4OEP7ZOXIWIKA4VLMWXJ/
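For comparison, something close to this can already be written with today's `dataclasses` by pairing an `init=False` field with a separate `InitVar` pseudo-field. This is only a sketch of the existing idiom, not of the proposed `field(init=int)` behaviour. The name `x_code` is made up for the example, and unlike the proposal, the constructor argument cannot share the field's name.

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class Foo:
    # The real field: not accepted by __init__, filled in by __post_init__.
    x: str = field(init=False)
    # Init-only pseudo-field: accepted by __init__ and forwarded to
    # __post_init__, but never stored on the instance.
    x_code: InitVar[int]

    def __post_init__(self, x_code: int) -> None:
        self.x = chr(x_code)


f = Foo(65)
print(f.x)
```

The cost of this version is the second name, which is what the `init=<type>` idea would remove.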
[Python-ideas] Re: dataclass field argument to allow converting value on init
Dexter Hill wrote: > I don't mind that solution although my concern is whether it would be > confusing to have `init` have two different purposes depending on the > argument. And, if `__post_init__` was overrided, which I would say it > commonly is, that would mean the user would have to manually do the > conversion, as well as remembering to add an extra argument for the > conversion function (assuming I'm understanding what you're saying). > If no type was provided to `init` but a conversion function was, it would be > a case of getting the type from the function signature, right? The reason I am saying to use the 'init' argument is that it seems to me to be a variation on what that argument already does. It controls whether the argument is passed to the generated `__init__` method. Passing a type as the value for 'init' would now behave like sort of a cross between `init=False` and `InitVar`. The field would still be created (unlike `InitVar`) but would not be automatically assigned the value passed as its corresponding argument, leaving that responsibility to `__post_init__`. Like with `InitVar`, the argument would be passed to `__post_init__` since it was not processed by `__init__`. The type annotation would continue to specify the type of the field, and the type passed to the 'init' argument would specify the type of its constructor argument. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4DUTNRIRLJKOY3CDRGIU6TZ4NV2RWP5Q/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Steve Jorgensen wrote: > Simão Afonso wrote: > > On 2022-06-23 17:35:59, Steve Jorgensen wrote: > > What if, instead, the `init` parameter could accept either a boolean > > (as it does now) or a type? When given a type, that would mean that to > > created the property and accept the argument but pass the argument ti > > `__post_init__` rather than using it to initialize the property > > directly. The type passed to `init` would become the type hint for the > > argument. > > What if you wanted to create a boolean type from a function? > > Then you would pass `type=bool` Oops. That was another typo. You would pass `init=bool`. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YAAMMAP4YWLJ5YWZG6DLFLVTBX73MFGR/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Simão Afonso wrote: > On 2022-06-23 17:35:59, Steve Jorgensen wrote: > > What if, instead, the `init` parameter could accept either a boolean > > (as it does now) or a type? When given a type, that would mean that to > > created the property and accept the argument but pass the argument ti > > `__post_init__` rather than using it to initialize the property > > directly. The type passed to `init` would become the type hint for the > > argument. > > What if you wanted to create a boolean type from a function? Then you would pass `type=bool` ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/C5RAIKFHB3KCXJGGGWYWZAGNQ7OJ3AUS/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Generalized deferred computation in Python
Steve Jorgensen wrote: > I think I have an idea how to do something like what you're asking with less > magic, and I think an example implementation of this could actually be done > in pure Python code (though a more performant implementation would need > support at the C level). > What if a deferred object has 1 magic method ( __isdeferred__ ) that is > invoked directly rather than causing a thunk, and invocation of any other > method does cause a thunk. For the example implementation, a thunk would > simply mean that the value is computed and stored within the instance, and > method calls on the wrapper are now delegated to that. In the proper > implementation, the object would change its identity to become its computed > result. I haven't had any replies to this, but I think it warrants some attention, so I'll try to clarify what I'm suggesting. Basically, have a deferred object be a wrapper around any kind of callable, and give the wrapper a single method __is_deferred__ that does not trigger unwrapping. Any other method call or anything else that depends on knowing the actual object results in the callable being executed and the wrapper object being replaced by that result. From then on, it is no longer deferred. I like this idea because it is very easy to reason about and fairly flexible. Whether the deferred object is a closure or not depends entirely on its callable. When it gets unwrapped is easy to understand (basically anything other than assignment, passing as an argument, or asking whether it is deferred). What this does NOT help much with is using for argument defaults. Personally, I think that's OK. I think that there are good arguments (separately) for dynamic argument defaults and deferred objects and that trying to come up with 1 concept that covers both of those is not necessarily a good idea. It's not a good idea if we can't come up with a way to do it that IS easy to reason about, anyway. 
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/OWDM7AUSYECALBQ2JVNQL3H2GH2NFSYV/
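A rough pure-Python sketch of the wrapper described above might look like the following. The class name `Deferred` and its private attributes are hypothetical. Because Python looks up special methods on the type rather than through `__getattr__`, only a few operators are forwarded explicitly here, and the wrapper delegates to the computed value instead of changing identity, which pure Python cannot do.

```python
class Deferred:
    """Wrap a zero-argument callable and run it on first real use."""

    def __init__(self, func):
        self._func = func
        self._computed = False
        self._value = None

    def __is_deferred__(self):
        # The one method that never triggers the computation.
        return not self._computed

    def _force(self):
        if not self._computed:
            self._value = self._func()
            self._computed = True
            self._func = None  # drop the callable once it has run
        return self._value

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. anything that
        # belongs to the wrapped result rather than to the wrapper.
        return getattr(self._force(), name)

    # Special methods bypass __getattr__, so forward a few by hand.
    def __add__(self, other):
        return self._force() + other

    def __eq__(self, other):
        return self._force() == other

    def __repr__(self):
        return repr(self._force())


calls = []
d = Deferred(lambda: calls.append("ran") or "abc")
print(d.__is_deferred__(), calls)
print(d.upper(), d.__is_deferred__(), calls)
```

Assignment, argument passing, and `__is_deferred__()` leave the callable untouched; the first attribute access or forwarded operator runs it exactly once.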
[Python-ideas] Re: dataclass field argument to allow converting value on init
Sorry for typos. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2RMKRA2GEQ3HADXG4TXYTCLRUX2CR5QG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dataclass field argument to allow converting value on init
Dexter Hill wrote: > The idea is to have a `default_factory` like argument (either in the `field` > function, or a new function entirely) that takes a function as an argument, > and that function, with the value provided by `__init__`, is called and the > return value is used as the value for the respective field. For example: > ```py > @dataclass > class Foo: > x: str = field(init_fn=chr) > f = Foo(65) > f.x # "A" > ``` > The `chr` function is called, given the value `65` and `x` is set to its > return value of `"A"`. I understand that there is both `__init__` and > `__post_init__` which can be used for this purpose, but sometimes it isn't > ideal to override them. If you overrided `__init__`, and were using > `__post_init__`, you would need to manually call it, and in my case, > `__post_init__` is implemented on a base class, which all other classes > inherit, and so overloading it would require re-implementing the logic from > it (and that's ignoring the fact that you also need to type the field with > `InitVar` to even have it passed to `__post_init__` in the first place). > I've created a proof of concept, shown below: > ```py > def initfn(fn, default=None): > class Inner: > def __set_name__(_, owner_cls, owner_name): > old_setattr = getattr(owner_cls, "__setattr__") > def __setattr__(self, attr_name, value): > if attr_name == owner_name: > # Bypass `__setattr__` > self.__dict__[attr_name] = fac(value) > else: > old_setattr(self, attr_name, value) > setattr(owner_cls, "__setattr__", __setattr__) > def fac(value): > if isinstance(value, Inner): > return default > return fn(value) > return field(default=Inner()) > ``` > It makes use of the fact that providing `default` as an argument to `field` > means it checks the value for a `__set_name__` function, and calls it with > the class and field name as arguments. 
Overriding `__setattr__` is just used > to catch when a value is being assigned to a field, and if that field's name > matches the name given to `__set_name__`, it calls the function on the value, > at sets the field to that instead. > It can be used like so: > ```py > @dataclass > class Foo: > x: str = initfn(fn=chr, default="Z") > f = Foo(65) > f2 = Foo() > f.x # "A" > f2.x # "Z" > ``` > It adds a little overhead, especially with having to override `__setattr__` > however, I believe it would have very little overhead if directly implemented > in the dataclass library. > Even in the case of being able to override one of the init functions, I still > think it would be nice to have as a quality of life feature as I feel calling > a function is too simple to want to override the functions, if that makes > sense. > Thanks. > Dexter What if, instead, the `init` parameter could accept either a boolean (as it does now) or a type? Given a type, the dataclass would still create the property and accept the argument, but would pass the argument to `__post_init__` rather than using it to initialize the property directly. The type passed to `init` would become the type hint for the argument. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YERVGXA5QJUHOQW357GVN7JERB2AJT6P/ Code of Conduct: http://python.org/psf/codeofconduct/
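A variation on the quoted proof of concept that avoids replacing `__setattr__` is to make the field's default a descriptor, which `dataclasses` already tolerates as a descriptor-typed field. A minimal sketch, assuming a hypothetical helper class named `Converted` (not part of the standard library):

```python
from dataclasses import dataclass


class Converted:
    """Descriptor that passes every assigned value through `fn`."""

    def __init__(self, fn, default=None):
        self.fn = fn
        self.default = default

    def __set_name__(self, owner, name):
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class-level access: dataclass picks this up as the default,
            # so the descriptor itself becomes the __init__ default value.
            return self
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        if value is self:
            # No argument was supplied to __init__.
            converted = self.default
        else:
            converted = self.fn(value)
        setattr(obj, self.private, converted)


@dataclass
class Foo:
    x: str = Converted(chr, default="Z")


print(Foo(65).x, Foo().x)
```

One difference from an init-only converter: the conversion here also runs on every later assignment to `x`, not just inside `__init__`.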
[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
> No need to have an object there - you could just define it as a syntactic > construct instead. Assignment targets aren't themselves objects (although the > same syntax can often be used on the RHS, when it would resolve to one). Right. Thanks. That _should_ have been obvious. :) > Having a way to say "allow additional elements without iterating over them" > would be useful, but creating a new way to spell the non-assignment wouldn't > be of sufficiently great value to justify the syntax IMO. I mostly agree. I included that option for completeness. It would still have the benefit of avoiding the memory usage of creating a list and keeping references to the items until the list itself can be collected. Come to think of it, can (or could) Python already optimize that using current syntax, noticing that the variable assigned to is never used after it is "assigned" to? If that optimization were implemented (I presume it is not implemented now) then there is actually no point to this proposal at all except to allow "..." in final positions in the expression to the left of "=" and to have that mean to not iterate. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AHUOIVOS4GXHAI3AT7O5M2MI4BJJER24/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
Steve Jorgensen wrote: > This is based on previous discussions of possible ways of matching all > remaining items during destructuring but without iterating of remaining final > items. This is not exactly a direct replacement for that idea though, and > skipping iteration of final items might or might not be part of the goal. > In this proposal, the ellipsis (...) can be used in the expression on the > left side of the equals sign in destructuring anywhere that `*` can > appear and has approximately the same meaning. The difference is that when > the ellipsis is used, the matched items are not stored in variables. This can > be useful when the matched data might be very large. > ..., last_one = > a, ..., z = > first_one, ... = > Additionally, when the ellipsis comes last and the data is being retrieved by > iterating, stop retrieving items since that might be expensive and we know > that we will not use them. > Alternative A: > Still iterate over items when the ellipsis comes last (for side effects) but > introduce a new `final_elipsis` object that is used to stop iteration. The > negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that > case. > Alternative B: > Still iterate over items when the ellipsis comes last (for side effects) and > don't provide any new means of skipping iteration over final items. The > programmer can use islice to achieve that. Correction: "are not stored in variables" should say "are not stored in a variable" ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CCQYEZH465W4ARBMBIUWK6YN4J5HNA5B/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Ellipsis (...) to be roughly synonymous with * in destructuring but without capture.
This is based on previous discussions of possible ways of matching all remaining items during destructuring but without iterating over the remaining final items. This is not exactly a direct replacement for that idea though, and skipping iteration of final items might or might not be part of the goal.

In this proposal, the ellipsis (...) can be used in the expression on the left side of the equals sign in destructuring anywhere that `*` can appear and has approximately the same meaning. The difference is that when the ellipsis is used, the matched items are not stored in variables. This can be useful when the matched data might be very large.

    ..., last_one =
    a, ..., z =
    first_one, ... =

Additionally, when the ellipsis comes last and the data is being retrieved by iterating, stop retrieving items since that might be expensive and we know that we will not use them.

Alternative A: Still iterate over items when the ellipsis comes last (for side effects) but introduce a new `final_ellipsis` object that is used to stop iteration. The negation of `ellipsis` (e.g. `-...`) could return `final_ellipsis` in that case.

Alternative B: Still iterate over items when the ellipsis comes last (for side effects) and don't provide any new means of skipping iteration over final items. The programmer can use islice to achieve that.

___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QPMFXOOHKQJ6YFM35SJXZMANBQTRZ3FY/ Code of Conduct: http://python.org/psf/codeofconduct/
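Part of what the leading-ellipsis form asks for, taking the last item without keeping the earlier ones, can be approximated with existing tools. A small sketch using `collections.deque` with `maxlen=1`; the helper name `last_item` is made up for illustration:

```python
from collections import deque


def last_item(iterable):
    """Return the final item while holding at most one item at a time."""
    tail = deque(iterable, maxlen=1)  # older items are discarded as it fills
    if not tail:
        raise ValueError("need at least one item")
    return tail[0]


# Compare with `*_, last_one = ...`, which builds a full list first.
last_one = last_item(n * n for n in range(1000))
print(last_one)
```

This still iterates over everything, as the proposal would for a leading ellipsis; it only avoids storing the skipped items.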
[Python-ideas] Re: Generalized deferred computation in Python
I think I have an idea how to do something like what you're asking with less magic, and I think an example implementation of this could actually be done in pure Python code (though a more performant implementation would need support at the C level). What if a deferred object has 1 magic method ( __isdeferred__ ) that is invoked directly rather than causing a thunk, and invocation of any other method does cause a thunk. For the example implementation, a thunk would simply mean that the value is computed and stored within the instance, and method calls on the wrapper are now delegated to that. In the proper implementation, the object would change its identity to become its computed result. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RNZXM55GFZ5DHOHP6QZZ744HUVNDB2BV/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Add a line_offsets() method to str
Steve Jorgensen wrote: > Jonathan Slenders wrote: > > Hi everyone, > > Today was the 3rd time I came across a situation where it was needed to > > retrieve all the positions of the line endings (or beginnings) in a very > > long python string as efficiently as possible. First time, it was needed in > > prompt_toolkit, where I spent a crazy amount of time looking for the most > > performant solution. Second time was in a commercial project where > > performance was very critical too. Third time is for the Rich/Textual > > project from Will McGugan. (See: > > https://twitter.com/willmcgugan/status/1537782771137011715 ) > > The problem is that the `str` type doesn't expose any API to efficiently > > find all \n positions. Every Python implementation is either calling > > `.index()` in a loop and collecting the results or running a regex over the > > string and collecting all positions. > > For long strings, depending on the implementation, this results in a lot of > > overhead due to either: > > > > calling Python functions (or any other Python instruction) for every \n > > > > character in the input. The amount of executed Python instructions is O(n) > > here. > > > > Copying string data into new strings. > > > > The fastest solution I've been using for some time, does this (simplified): > > `accumulate(chain([0], map(len, text.splitlines(True`. The performance > > is great here, because the amount of Python instructions is O(1). > > Everything is chained in C-code thanks to itertools. Because of that, it > > can outperform the regex solution with a factor of ~2.5. (Regex isn't slow, > > but iterating over the results is.) > > The bad things about this solution is however: > > > > Very cumbersome syntax. > > We call `splitlines()` which internally allocates a huge amount of > > > > strings, only to use their lengths. That is still much more overhead then a > > simple for-loop in C would be. 
> > Performance matters here, because for these kind of problems, the list of > > integers that gets produced is typically used as an index to quickly find > > character offsets in the original string, depending on which line is > > displayed/processed. The bisect library helps too to quickly convert any > > index position of that string into a line number. The point is, that for > > big inputs, the amount of Python instructions executed is not O(n), but > > O(1). Of course, some of the C code remains O(n). > > So, my ask here. > > Would it make sense to add a `line_offsets()` method to `str`? > > Or even `character_offsets(character)` if we want to do that for any > > character? > > Or `indexes(...)/indices(...)` if we would allow substrings of arbitrary > > lengths? > > Thanks, > > Jonathan > > I presume there is some reason that `re.findall` did not work or was not > > optimal? I just saw your reply elsewhere in the conversation that says > That requires a more complex regex pattern. I was actually using: > re.compile(r"\n|\r(?!\n)") > And then the regex becomes significantly slower than the splitlines() > solution, which is still much slower than it has to be. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JGY2YNOCKZ2KS7BMQMNCEY3YHIRJC3UL/ Code of Conduct: http://python.org/psf/codeofconduct/
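For reference, the regex approach mentioned here can be sketched as follows. The pattern is the one quoted above; the function name `line_end_positions` is an assumption for the example. The list comprehension is where the per-match Python overhead discussed in the thread comes from.

```python
import re

# Matches "\n", or a lone "\r" that is not the first half of "\r\n".
_LINE_END = re.compile(r"\n|\r(?!\n)")


def line_end_positions(text):
    """Return the index of the terminating character of each line."""
    return [m.start() for m in _LINE_END.finditer(text)]


print(line_end_positions("a\r\nb\rc\n"))
```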
[Python-ideas] Re: Add a line_offsets() method to str
Jonathan Slenders wrote: > Hi everyone, > Today was the 3rd time I came across a situation where it was needed to > retrieve all the positions of the line endings (or beginnings) in a very > long python string as efficiently as possible. First time, it was needed in > prompt_toolkit, where I spent a crazy amount of time looking for the most > performant solution. Second time was in a commercial project where > performance was very critical too. Third time is for the Rich/Textual > project from Will McGugan. (See: > https://twitter.com/willmcgugan/status/1537782771137011715 ) > The problem is that the `str` type doesn't expose any API to efficiently > find all \n positions. Every Python implementation is either calling > `.index()` in a loop and collecting the results or running a regex over the > string and collecting all positions. > For long strings, depending on the implementation, this results in a lot of > overhead due to either: > - calling Python functions (or any other Python instruction) for every \n > character in the input. The amount of executed Python instructions is O(n) > here. > - Copying string data into new strings. > The fastest solution I've been using for some time, does this (simplified): > `accumulate(chain([0], map(len, text.splitlines(True`. The performance > is great here, because the amount of Python instructions is O(1). > Everything is chained in C-code thanks to itertools. Because of that, it > can outperform the regex solution with a factor of ~2.5. (Regex isn't slow, > but iterating over the results is.) > The bad things about this solution is however: > - Very cumbersome syntax. > - We call `splitlines()` which internally allocates a huge amount of > strings, only to use their lengths. That is still much more overhead then a > simple for-loop in C would be. 
> Performance matters here, because for these kind of problems, the list of > integers that gets produced is typically used as an index to quickly find > character offsets in the original string, depending on which line is > displayed/processed. The bisect library helps too to quickly convert any > index position of that string into a line number. The point is, that for > big inputs, the amount of Python instructions executed is not O(n), but > O(1). Of course, some of the C code remains O(n). > So, my ask here. > Would it make sense to add a `line_offsets()` method to `str`? > Or even `character_offsets(character)` if we want to do that for any > character? > Or `indexes(...)/indices(...)` if we would allow substrings of arbitrary > lengths? > Thanks, > Jonathan I presume there is some reason that `re.findall` did not work or was not optimal? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PO3V3XXHZL7CF4YCD635AF57OYG2RORC/ Code of Conduct: http://python.org/psf/codeofconduct/
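The itertools-based approach from the quoted message, written out with balanced parentheses and combined with `bisect` for the index-to-line lookup it mentions, might look like this. The function names `line_offsets` and `line_number` are placeholders for the sketch, not an existing `str` API.

```python
from bisect import bisect_right
from itertools import accumulate, chain


def line_offsets(text):
    """Start offset of every line, plus len(text) as a final entry."""
    # splitlines(True) keeps the line endings, so the lengths add up
    # to exact character offsets in the original string.
    return list(accumulate(chain([0], map(len, text.splitlines(True)))))


def line_number(offsets, index):
    """Zero-based line number containing the character at `index`."""
    return bisect_right(offsets, index) - 1


text = "ab\ncd\n\nef"
offsets = line_offsets(text)
print(offsets)
print(line_number(offsets, text.index("d")))
```

As the message notes, this still allocates one string per line just to measure it, which is the overhead a built-in method could avoid.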
[Python-ideas] Re: Bare wildcard in de-structuring to ignore remainder and stop iterating (restart)
Steven D'Aprano wrote: > Okay, I'm convinced. > If we need this feature (and I'm not convinced about that part), then it > makes sense to keep the star and write it as `spam, eggs, *... = items`. I thought about that, but to me, there are several reasons to not do that and to have the ellipsis mean multiple rather than prepending * for that: 1. In common usage outside of programming, the ellipsis means a continuation and not just a single additional thing. 2. Having `*...` mean any number of things implies that `...` means a single thing, and I don't think there is a reason to match 1 thing but not assign it to a variable. It is also already fine to repeat `_` in the left side expression. 3. I am guessing (though I could be wrong) that support for `*...` would be a bigger change and more complicated in the Python source code. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2YGDMCGY5NBMIO57F6M7K3HP6HRYKTWZ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Bare wildcard in de-structuring to ignore remainder and stop iterating (restart)
Also in reply to Paul & Stephen, … Yes. I really like the idea of using the ellipsis in the expression on the left. It avoids any breaking changes, avoids adding new semantics to '*', and also reads quite well. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GAMCAQDMLKDDNMRITIJHWZEHKCRMZ5DE/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Bare wildcard in de-structuring to ignore remainder and stop iterating (restart)
Steve Jorgensen wrote: > Restarting this with an improved title "Bare" vs "Raw", and I will try not to > digress so much in the new thread. > My suggestion is to allow a bare asterisk at the end of a desctructuring > expression to indicate that additional elements are to be ignored if present > and not iterated over if the rhs is being evaluated by iterating. > (first, second, *) = items > This provides a way of using destructuring from something that will be > processed by iterating and for which the number of items might be very large > and/or accessing of successive items is expensive. > As Paul Moore pointed out in the original thread, itertools.islice can be > used to limit the number of items iterated over. That's a nice solution, but > it required knowing or thinking of the solution, an additional import, and > repetition of the count of items to be destrucured at the outermost nesting > level on the lhs. > What are people's impressions of this idea. Is it valuable enough to pursue > writing a PEP? > If so, then what should I do in writing the PEP to make sure that it's > somewhat close to something that can potentially be accepted? Perhaps, there > is a guide for doing that? First, thanks very much for the thoughtful and helpful replies so far. Since my last message here, I have noticed a couple of issues with the suggestion. 1. In a function declaration, the bare "*" specifically expects to match nothing, and in this case, I am suggesting that it have no expectation. That's a bit of a cognitive dissonance. 2. The new structural pattern matching that was introduced in Python 3.10 introduces a very similar concept by using an underscore as a wildcard that matches and doesn't bind to anything. 
That leads me to want to change the proposal to say that we give the same meaning to "_" in ordinary destructuring that it has in structural pattern matching, and then, I believe that a final "*_" in the expression on the left would end up with exactly the same meaning that I originally proposed for the bare "*". Although that would be a breaking change, it is already conventional to use "_" as a variable name only when we specifically don't care what it contains following its assignment, so for any code to be affected by the change would be highly unusual. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2DJXQ22GN3ABWGT2VUTGIXEUMMA6XOLO/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Bare wildcard in de-structuring to ignore remainder and stop iterating (restart)
Restarting this with an improved title "Bare" vs "Raw", and I will try not to digress so much in the new thread. My suggestion is to allow a bare asterisk at the end of a destructuring expression to indicate that additional elements are to be ignored if present and not iterated over if the rhs is being evaluated by iterating.

    (first, second, *) = items

This provides a way of using destructuring from something that will be processed by iterating and for which the number of items might be very large and/or accessing successive items is expensive. As Paul Moore pointed out in the original thread, itertools.islice can be used to limit the number of items iterated over. That's a nice solution, but it requires knowing or thinking of the solution, an additional import, and repetition of the count of items to be destructured at the outermost nesting level on the lhs. What are people's impressions of this idea? Is it valuable enough to pursue writing a PEP? If so, then what should I do in writing the PEP to make sure that it's somewhat close to something that can potentially be accepted? Perhaps there is a guide for doing that? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4DN7T3NZEAUPJBA2SNJ4YWM564QPVE5N/ Code of Conduct: http://python.org/psf/codeofconduct/
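The `itertools.islice` workaround mentioned above looks roughly like this. The endless generator `numbered` and the `log` list are illustrative only; the point is that the count `2` has to be repeated by hand to match the number of targets on the left.

```python
from itertools import islice


def numbered(log):
    """Yield 0, 1, 2, ... forever, recording each value produced."""
    n = 0
    while True:
        log.append(n)
        yield n
        n += 1


log = []
# islice stops after two items, so the endless source is never drained.
first, second = islice(numbered(log), 2)

print(first, second)
print(log)
```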
[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!
Is there anything that I can do, as a random Python user to help move this to the next stage? I'm happy to go along with whatever the preponderance of responses here seem to think in terms of which syntax choice is best. Although I have a slight preference, all of the options seem decent to me. I am definitely in favor of having the PEP accepted and implemented. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5572SP7T2GR5PYIVTYN5VESHV5XJ2JA5/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!
To clarify my statement about readability of the '@' prefix option… I think that its meaning is less clear if one doesn't already know what the syntax means. I think the code would be easier to skim, however, using that option after one does know its meaning. My favorite options are '@' or '?=' (tied), followed by ':=' followed by '=>'. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TDPKOPGWQ4ORRJDHWJMX5GMW2TQ5FI5B/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!
Ah and since previous parameters can be referenced, and `self` or `cls` is the first argument to any method, that is always available to default value expressions. Correct? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZB3FSVZH2JVRI6LAMK7WCUSITC4RYBUO/ Code of Conduct: http://python.org/psf/codeofconduct/
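To make the question concrete, here is a sketch of how a default that depends on `self` has to be written today, using a `None` sentinel. The `Buffer` class and its `head` method are made up for illustration. The comment shows how the same default might be spelled with the `=>` syntax from the PEP 671 draft; that spelling is an assumption about a proposal that is not part of Python.

```python
class Buffer:
    """Hypothetical container used only to illustrate the idiom."""

    def __init__(self, items):
        self.items = list(items)

    # With PEP 671 as drafted, this might be: def head(self, n=>len(self.items))
    def head(self, n=None):
        if n is None:
            # Late-bound default computed from an earlier parameter (self).
            n = len(self.items)
        return self.items[:n]
```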
[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!
One thing was not clear to me from the current PEP 671 text. When that is used in a method, what is the closure for the expressions? Would/should assignments in the class definition be available or only global variables in the module and local variables in the function (if applicable) in which the class definition happens? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XQD56GF3W2L223HSSBOVMIWTKF2AERH6/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: PEP 671 (late-bound arg defaults), next round of discussion!
I couldn't figure out the best place in the reply tree to post this, so replying to the OP, answering the questions, taking into account other discussion that has happened. > 1) If this feature existed in Python 3.11 exactly as described, would you use it? Definitely > 2) Independently: Is the syntactic distinction between "=" and "=>" a cognitive burden? No, but I feel there is some cognitive burden with the distinction between that and other arrow notations that we have now and will likely have later. 4) If "no" to question 1, is there some other spelling or other small change that WOULD mean you would use it? (Some examples in the PEP.) Technically this is not applicable since I would use it anyway, but… I would slightly prefer any one of the alternative syntaxes. At first, I was not liking the '@' prefix idea because the '@' is separated from the default expression that it is conceptually associated with. That option does have a strong redeeming aspect though, which is that I think it might be the easiest to read. 5) Do you know how to compile CPython from source, and would you be willing to try this out? Please? :) Sure. I don't think I need to try it to know that I would appreciate it though, unless I were to find that it is buggy or something. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/Y6VHQZI5FDR25WUBFDF2NRRRPVSTT7RL/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Null wildcard in de-structuring to ignore remainder and stop iterating
I had actually not thought about the question of what should happen when performing multiple index operations on the same iterator, and maybe that's a reason that the idea of adding index lookup using brackets is not as good as it first seems. The whole point of adding that would be to reduce the number of situations in which it matters whether you have a sequence or an iterator. As soon as we consider what should happen for multiple index lookups on a single iterator, that concept breaks down.

The next thing that makes me think of, which is even farther afield from the initial topic of this thread, would be to have some new function in the standard library that is similar to 'islice' but returns an array instead of a new iterator and performs optimally when given a list or tuple as an argument. Maybe it could be named something like 'gslice', short for "greedy slice".

Hypothetical simplistic implementation:

    import collections.abc
    from itertools import islice

    def gslice(source, start_or_stop=None, stop=None, step=None):
        # All three values are always passed on, so a lone first argument acts as "start".
        if isinstance(source, collections.abc.Sequence):
            return source[slice(start_or_stop, stop, step)]
        elif isinstance(source, collections.abc.Iterable):
            return list(islice(source, start_or_stop, stop, step))
        else:
            raise TypeError("'source' must be a sequence or iterable")

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RRSJ65RYDRJ2X4K235M4M4AYJSTQAINB/
[Python-ideas] Re: Null wildcard in de-structuring to ignore remainder and stop iterating
My current thinking in response to that is that using islice is a decent solution except that it's not obvious. You have to jump outside of the thinking about the destructuring capability and consider what else could be used to help. Probably, first thing that _would_ come to mind from outside would be slicing with square brackets, but that would restrict the solution to only work with sequences and not other iterables and iterators as islice does. That brings up a tangential idea. Why not allow square-bracket indexing of generators instead of having to import and utilize islice for that? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZGMFS4Y56MDPQLEIKW6PQVW2WDHRSGZV/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Add .except() and .only(), and .values_at(). instance methods to dict
I think these are extremely common needs that are worth having standard methods for. If adding instance methods seems like a bad idea, then maybe add functions to the standard library that perform the same operations.

    m = {'a': 123, 'b': 456, 'c': 789}
    m.except(('a', 'c'))     # {'b': 456}
    m.only(('b', 'c'))       # {'b': 456, 'c': 789}
    m.values_at(('a', 'b'))  # [123, 456]

…or…

    from mappings import except, only, values_at
    m = {'a': 123, 'b': 456, 'c': 789}
    except(m, ('a', 'c'))     # {'b': 456}
    only(m, ('b', 'c'))       # {'b': 456, 'c': 789}
    values_at(m, ('a', 'b'))  # [123, 456]

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/SMHI3ABM4XLASYYDGSTY45BKHTM7QMK2/
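A minimal sketch of the three operations as plain functions, to show how little code they need today. Because `except` is a reserved word in Python, the sketch uses the name `without` instead; that name is an assumption for illustration, not part of the proposal. How missing keys are treated is also an assumption: `only` silently skips them, while `values_at` raises `KeyError`.

```python
def without(mapping, keys):
    """Return a new dict that omits the given keys."""
    excluded = set(keys)
    return {k: v for k, v in mapping.items() if k not in excluded}


def only(mapping, keys):
    """Return a new dict restricted to the given keys (missing keys are skipped)."""
    wanted = set(keys)
    return {k: v for k, v in mapping.items() if k in wanted}


def values_at(mapping, keys):
    """Return the values for the given keys, in the order the keys were given."""
    return [mapping[k] for k in keys]


m = {'a': 123, 'b': 456, 'c': 789}
```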
[Python-ideas] Re: Null wildcard in de-structuring to ignore remainder and stop iterating
OK. That's not terrible. It is a redundancy though, having to re-state the count of variables that are to be de-structured into on the left. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/F3YHX7F3HGKFYAX7JH3LJNJRSDN2XOYE/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Null wildcard in de-structuring to ignore remainder and stop iterating
I was using the reading of lines from a file as a contrived example. There are many other possible cases, such as de-structuring from an iterator like `itertools.repeat()` with no `times` argument, which will generate values endlessly.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YH722VXD32AX4MDIDOXVP64YVPNXTNQ6/
[Python-ideas] Null wildcard in de-structuring to ignore remainder and stop iterating
A contrived use case:

    with open('document.txt', 'r') as io:
        (line1, line2, *) = io

It is possible to achieve much the same result using `*_`, except that would actually read all the lines from the file, even if we only want the first 2.

So I am suggesting that we use the bare `*` here to mean that we don't care whether there are additional items in the sequence, _and_ we want to stop iterating.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NURDVNQUMKDH7242FCQBBYIU7WSATTB6/
[Python-ideas] Means of avoiding accidental import of/from an Implicit Namespace Package
More than once, I've had bugs that were hard to track down because I was accidentally using an implicit namespace package without realizing it. The last time this happened, it was a typo, and my init file was named `_init__.py` instead of `__init__.py`. The init file imported from sub-modules, including one with a class that was supposed to be registered via an `__init_subclass__` callback, which was not happening.

I'm sure that implicit namespace packages are here to stay, and I imagine I will actually want to use them on purpose at some point, but it would be nice if we could come up with a straightforward way to avoid the accidental usages. One idea that comes to mind is to add a new built-in context manager within which the importing of a purely implicit namespace package raises an exception.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V7QU3IDGKITJ3J4FL7G6YAFKIXM44IC2/
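In the meantime, a check along these lines can be written in user code. This is only a sketch under assumptions: the helper names `is_implicit_namespace` and `import_regular` are made up, and the detection is a heuristic (a package that has `__path__` but whose `__file__` is missing or `None`). It is not the context manager suggested above, just a stricter import function.

```python
import importlib
import os
import sys
import tempfile


def is_implicit_namespace(module):
    """Heuristic: a package (has __path__) with no init file behind it."""
    return hasattr(module, "__path__") and getattr(module, "__file__", None) is None


def import_regular(name):
    """Import `name`, but refuse implicit namespace packages (hypothetical helper)."""
    module = importlib.import_module(name)
    if is_implicit_namespace(module):
        raise ImportError(f"{name!r} is an implicit namespace package (missing __init__.py?)")
    return module


# Demonstration: reproduce the typo described above with a misnamed init file.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "typo_pkg_demo")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "_init__.py"), "w").close()
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        import_regular("typo_pkg_demo")
        outcome = "imported"
    except ImportError:
        outcome = "rejected"
    finally:
        sys.path.remove(root)
        sys.modules.pop("typo_pkg_demo", None)
```

A regular package such as `json` passes the check, because its `__file__` points at its `__init__.py`.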
[Python-ideas] Re: Enhance flexibility of dataclass repr
I should add that… I did find it is already possible to define a dataclass field for a property that is implemented as a `@property`-decorated function, but it's a bit of a hack. It only works if the property has a setter that succeeds, even if the attribute is supposed to be read-only or if it is not appropriate for its setter to be called during initialization.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LHEBFGTOZRO4LD7N5JLQLPV46CE5NCNP/
[Python-ideas] Enhance flexibility of dataclass repr
The desire for this came up for me in relation to a set of dataclasses used to define a tree structure where each item has a reference to its parent. Including the complete expansion of the parent (with its children and its parent with its children, etc.) is WAY too much information, but at the same time, I do want to at least identify the parent.

Currently, the only way to get what I'm looking for is to write a custom `__repr__` from scratch. It would be great if there were at least one way to take advantage of the automatic repr and still customize its handling for specific fields.

The first thing I thought of in that regard for my example was to add a `parent_name` property using `@property` and specify `repr=False` for `parent`. The auto-generated repr is not aware of properties defined that way though. Maybe that could be solved by adding an argument named something like `descriptor=` to `field()` where a `True` value means that getting and setting happens through a separately defined descriptor (e.g. via `@property`) and should not be implemented automatically, even though it should otherwise be treated as a dataclass property.

The second thought I had is to be able to customize `repr` for any field. One way to do that might be to allow `field()`'s `repr` argument to accept a method name string and/or a callable that accepts an instance of the class, in addition to accepting `True` or `False`.

I actually like the idea of having both of those capabilities. Opinions?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BE5I4MXLPBW3RUKSV5M35CEJRJHISKNW/
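Something close to the second idea can be approximated today with field metadata. The following is only a sketch: the `repr_with` metadata key and the `repr_with_overrides` helper are made up for illustration and are not part of the `dataclasses` API. The helper walks `dataclasses.fields()` and lets a per-field callable decide how that field is shown.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


def repr_with_overrides(obj):
    """Build a dataclass-style repr, honoring a hypothetical 'repr_with' metadata key."""
    parts = []
    for f in fields(obj):
        formatter = f.metadata.get("repr_with")
        if formatter is not None:
            # The callable receives the instance, as suggested in the message above.
            parts.append(f"{f.name}={formatter(obj)}")
        elif f.repr:
            parts.append(f"{f.name}={getattr(obj, f.name)!r}")
    return f"{type(obj).__name__}({', '.join(parts)})"


@dataclass(repr=False)
class Node:
    name: str
    parent: Optional["Node"] = field(
        default=None,
        repr=False,
        # Show only the parent's name instead of expanding the whole parent.
        metadata={"repr_with": lambda node: repr(node.parent.name) if node.parent else "None"},
    )

    __repr__ = repr_with_overrides


root = Node("root")
child = Node("child", parent=root)
```

Here `repr(child)` identifies the parent by name only, which is the behavior wanted for the tree example.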
[Python-ideas] Re: Sanitize filename (path part) 2nd try
Andrew Barnert wrote: > On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote: > > Proposal: > > Add a new function (possibly os.path.sanitizepart) to sanitize a value for > > use as a single component of a path. In the default case, the value must > > also not be a > > reference to the current or parent directory ("." or "..") and must not > > contain control > > characters. > If not: the result can contain the path separator, illegal characters that > aren’t > control characters, nonprinting characters that aren’t control characters, > and characters > whose bytes (in the filesystem’s encoding) are ASCII control characters? > And it can be a reserved name, or even something like C:; as long as it’s not > the Unix > . or ..? Are there non-printing characters outside of those in the Unicode general category of "C" that make sense to omit? There are combining characters and such that do not have glyphs but are visible in the sense that they modify the glyphs displayed for the characters that they combine with. Regarding names like "C:", you are absolutely right to point that out. When the platform is Windows, certainly, ":" should not be allowed, and perhaps colon should not be allowed at all. I'll need to research that a bit. This matters because if the path part is used without explicit "./" prefixed to it, then it will refer to a root path, so same problem as allowing a name starting with "/" in *NIX. That should be unconditionally disallowed in the case of WIN or GENERAL systems. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/SDMTI5KQKWYZV3MOTFRS27M7RED56THZ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part) 2nd try
Andrew Barnert wrote:
> On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote:
> > Proposal:
> > Add a new function (possibly os.path.sanitizepart) to sanitize a value for use as a single component of a path. In the default case, the value must also not be a reference to the current or parent directory ("." or "..") and must not contain control characters.
> “Also” in addition to what? Are there other requirements enforced besides these two that aren’t specified anywhere?

Sorry that was not clear. In addition to ensuring that it is a single part, meaning that it contains no path separators.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LVJ62W42HQNNQOJXIKS7KLRIOY5IE7JT/
[Python-ideas] Re: Sanitize filename (path part) 2nd try
Steve Jorgensen wrote: > Based on responses to my previous proposal, I am convinced that it was > over-ambitious > and not appropriate for inclusion in the Python standard library, so starting > over with a > more narrowly scoped suggestion. > Proposal: > Add a new function (possibly os.path.sanitizepart) to sanitize a value for > use as a single component of a path. In the default case, the value must also > not be a > reference to the current or parent directory ("." or "..") and must not > contain control > characters. > When an invalid character is encountered, then ValueError will be raised > in the default case, or the character may be replaced or escaped. > When an invalid name is encountered, then ValueError will be raised in the > default case, or the first character may be replaced, escaped, or prefixed. > Control characters (those in the Unicode general category of "C") are treated > as invalid > by default. > After applying any transformations, if the result would still be invalid, > then an > exception is raised. > Proposed function signature: sanitizepart(name, replace=None, escape=None, > prefix=None, flags=0) > When replace is supplied, it is used as a replacement for any invalid > characters or for the first character of an invalid name. When prefix is not > also supplied, this is also used as the replacement for the first character > of the name if > it is invalid, not simply due to containing invalid characters. > When escape is supplied (typically "%") it is used as the escape character > in the same way that "%" is used in URL encoding. When a non-ASCII character > is escaped, > it is represented as a sequence of encoded bytes/octets. When prefix is not > also supplied, this is also used to escape the first character of the name if > it is > invalid, not simply due to containing invalid characters. > replace and escape are mutually exclusive. 
> When prefix is supplied (typically "_"), it is prepended the name if it is > invalid, not simply due to containing invalid characters. > Flags: > > path.PERMIT_RELATIVE (1): Permit relative path values ("." "..") > path.PERMIT_CTRL (2): Permit characters in the Unicode general category of > "C". Somewhere between the 1st and 2nd proposal, I lost track of the system-specificity issue. Even with this more focused proposal, there is the issue of different path separators on Windows vs *nix, so the function needs another argument for that. Presumably, it would have a default of `None` meaning to use the current platform and would have constants for `NIX`, `WIN`, and `GENERAL` where `WIN` and `GENERAL` behave the same, recognizing either "/" or "\" as a file separator character. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/SRQJ2BZHYYVIPW7CGABLNCWLZMOMCZO3/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part) 2nd try
Steve Jorgensen wrote: > When escape is supplied (typically "%") it is used as the escape character > in the same way that "%" is used in URL encoding. When a non-ASCII character > is escaped, > it is represented as a sequence of encoded bytes/octets. I neglected to say that the octet sequence would be for the UTF-8 representation of the non-ASCII character. This is consistent with ECMAScript's `encodeURI` (see https://www.ecma-international.org/ecma-262/5.1/#sec-15.1.3). Also, to clarify why this is needed, it is for when there are non-ASCII control characters such as \u2066 (Left-to-Right Isolate) in the given name value and control characters are not being allowed. Other non-ASCII Unicode characters are permitted, so this is not applicable to those. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/O6AIDG4BDFQUYYZJYVX24LSNHYHO5JFL/ Code of Conduct: http://python.org/psf/codeofconduct/
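As an illustration of the escaping described in this message, the sketch below percent-escapes only characters in the Unicode general category "C", using the UTF-8 octets of each character. The helper name `escape_control_chars` is made up, and the escape character is fixed to "%" here because the sketch leans on `urllib.parse.quote`. Note that it does not escape "%" itself, so the result is not guaranteed to be reversible.

```python
import unicodedata
from urllib.parse import quote


def escape_control_chars(name):
    """Percent-escape characters whose Unicode general category starts with "C"."""
    return "".join(
        # quote() encodes the character as UTF-8 and emits one %XX per octet.
        quote(ch, safe="") if unicodedata.category(ch).startswith("C") else ch
        for ch in name
    )


# U+2066 (Left-to-Right Isolate) and U+0001 are escaped; "é" is left alone.
escaped = escape_control_chars("a\u2066b\x01é")
```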
[Python-ideas] Sanitize filename (path part) 2nd try
Based on responses to my previous proposal, I am convinced that it was over-ambitious and not appropriate for inclusion in the Python standard library, so I am starting over with a more narrowly scoped suggestion.

Proposal:

Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for use as a single component of a path. In the default case, the value must also not be a reference to the current or parent directory ("." or "..") and must not contain control characters.

When an invalid character is encountered, then `ValueError` will be raised in the default case, or the character may be replaced or escaped.

When an invalid name is encountered, then `ValueError` will be raised in the default case, or the first character may be replaced, escaped, or prefixed.

Control characters (those in the Unicode general category of "C") are treated as invalid by default.

After applying any transformations, if the result would still be invalid, then an exception is raised.

Proposed function signature: `sanitizepart(name, replace=None, escape=None, prefix=None, flags=0)`

When `replace` is supplied, it is used as a replacement for any invalid characters or for the first character of an invalid name. When `prefix` is not also supplied, this is also used as the replacement for the first character of the name if it is invalid, not simply due to containing invalid characters.

When `escape` is supplied (typically "%") it is used as the escape character in the same way that "%" is used in URL encoding. When a non-ASCII character is escaped, it is represented as a sequence of encoded bytes/octets. When `prefix` is not also supplied, this is also used to escape the first character of the name if it is invalid, not simply due to containing invalid characters.

`replace` and `escape` are mutually exclusive.

When `prefix` is supplied (typically "_"), it is prepended to the name if it is invalid, not simply due to containing invalid characters.

Flags:
- path.PERMIT_RELATIVE (1): Permit relative path values ("." "..")
- path.PERMIT_CTRL (2): Permit characters in the Unicode general category of "C".

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LRIKMG3G4I4YQNK6BTU7MICHT7X67MEF/
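To make the proposed behavior easier to discuss, here is a rough sketch of how `sanitizepart` might be written. It is not an existing function in `os.path`. Several details are assumptions that the proposal does not settle: the `separators` parameter (defaulting to "/") is added only for illustration, an empty name is rejected, escaped octets are written as uppercase hex of the UTF-8 encoding, and the escape character itself is not escaped.

```python
import unicodedata

PERMIT_RELATIVE = 1  # flag values as listed in the proposal
PERMIT_CTRL = 2


def sanitizepart(name, replace=None, escape=None, prefix=None, flags=0, separators="/"):
    """Sketch of the proposed function; `separators` is an illustrative assumption."""
    if replace is not None and escape is not None:
        raise ValueError("replace and escape are mutually exclusive")

    def is_invalid_char(ch):
        if ch in separators:
            return True
        return not (flags & PERMIT_CTRL) and unicodedata.category(ch).startswith("C")

    def is_invalid_name(value):
        return value in (".", "..") and not (flags & PERMIT_RELATIVE)

    def fix(ch):
        if replace is not None:
            return replace
        if escape is not None:
            # One escape sequence per UTF-8 octet, as in URL encoding.
            return "".join(f"{escape}{byte:02X}" for byte in ch.encode("utf-8"))
        raise ValueError(f"invalid character {ch!r} in {name!r}")

    result = "".join(fix(ch) if is_invalid_char(ch) else ch for ch in name)

    if is_invalid_name(result):
        if prefix is not None:
            result = prefix + result
        elif replace is not None or escape is not None:
            result = fix(result[0]) + result[1:]
        else:
            raise ValueError(f"invalid name {name!r}")

    # After applying any transformations, the result must itself be valid.
    if not result or is_invalid_name(result) or any(is_invalid_char(ch) for ch in result):
        raise ValueError(f"cannot sanitize {name!r} with the given options")
    return result
```

For example, `sanitizepart("a/b", replace="_")` gives `"a_b"`, while `sanitizepart("a/b")` raises `ValueError` because no transformation was requested.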
[Python-ideas] Re: Sanitize filename (path part)
Stephen J. Turnbull wrote: > Steve Jorgensen writes: > > I'm thinking of this specifically in terms of > > sanitizing input, > > assuming that later usage of the value might or might not properly > > protect against potential vulnerabilities. This is also limited to > > the case where the value is supposed to be a single path referring > > to an entry within a single directory context. > > This sounds extremely specialized to me. For example, presumably > you're not referring to dotted module specifications in Python, but > those usually do map to filesystem paths in implementations, and I can > imagine vulnerabilities (the one on top of my head requires a fair > amount of Python ignorance and environmental serendipity, which sort > of proves my point about situation-specificity) using Python module > paths as mapped to filesystem paths. > ISTM that it might be useful to provide a toolbox for scanning paths > with various validation operations, but that it's really up to > applications to decide which operations to use and what parameters > (eg, evil code point set, bytes vs code points vs code units vs > characters), and so on. PyPI seems ideal for that, until it matures > more than a discussion on the mailing lists can provide. > Steve (T) …so maybe it makes sense to have only the more specific sanitization in the standard library, then. In the POSIX case, I think that means just blocking "/" characters and "." or ".." values. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EVSXG4ZPE5OXNV3NCHPIU5YKAJRMM3NF/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Dan Sommers wrote: > I know what sanitize means (in English and in the technical sense I > believe you intend here), but can you provide some context and actual > use cases? > Sanitize on input so that your application code doesn't "accidentally" > spit out the contents of /etc/shadow? Sanitize on output so that your > code doesn't produce syntactically broken links in an HTML document or > weird results in an xterm? Sanitize in both directions for safe round > tripping to a database server? I'm thinking of this specifically in terms of sanitizing input, assuming that later usage of the value might or might not properly protect against potential vulnerabilities. This is also limited to the case where the value is supposed to be a single path referring to an entry within a single directory context. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/UYMWQOXF26M2O52JZJJAJ76MI2NYKTNC/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Dan Sommers wrote: > On Sun, 10 May 2020 00:34:43 - > "Steve Jorgensen" ste...@stevej.name wrote: > > I believe the Python standard library should include > > a means of > > sanitizing a filesystem entry, and this should not be something > > requiring a 3rd party package. > > I'm not disagreeing. > > What I am envisioning is a function (presumably in > > os.path with a signature roughly like > > {{{ > > sanitizepart(name, permissive=False, mode=ESCAPE, system=None) > > }}} > > When permissive is False, characters that are generally > > unsafe are > > rejected. When permissive is True, only path separator > > characters > > are rejected. Generally unsafe characters besides path separators > > would include things like a leading ".", any non-printing character, > > any wildcard, piping and redirection characters, etc. > > Okay, now I'm disagreeing. ;-) > I know what sanitize means (in English and in the technical sense I > believe you intend here), but can you provide some context and actual > use cases? > Sanitize on input so that your application code doesn't "accidentally" > spit out the contents of /etc/shadow? Sanitize on output so that your > code doesn't produce syntactically broken links in an HTML document or > weird results in an xterm? Sanitize in both directions for safe round > tripping to a database server? All of those use cases potentially > require separate handling, especially in terms of quoting and escaping. > For another example, suppose I'm writing a command line utility on a > POSIX system to compute a hash of the contents of a file. There's > nothing wrong with ".profile" as a file name. Why are you rejecting > leading "." characters? What about leading "-"s, or embedded "|"s? > Yes, certain shells and shell commands can make them "difficult" to deal > with in one way or another, but they're not "generally unsafe." 
> A very, very, very long time ago, we wrote some software for a customer > who liked to "editing" our data files to make minor corrections instead > of using our software. Our solution was to use "illegal" filenames that > the shell rejected, but that an application could access directly > anyway. I guess the point is that "sanitize" can mean different things > to different parts of a system. > Dan I totally get what you're saying. For the sake of simplicity, I thought that the 2 permissiveness options should be one that only prevents path traversal and one that is extremely conservative, omitting characters that are often safe and appropriate but may be unsafe in some cases. In regard to dot files, those can be safe in some cases, but unsafe in others — writing to configuration files that will be read by shell helpers or editors, for instance. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QQ2FO6ARZD4WM45OPYGBXEGXYQO72PRY/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > I believe the Python standard library should include > > a means of sanitizing a filesystem > > entry, and this should not be something requiring a 3rd party package. > > One of reasons I think this should be in the standard lib is because that > > provides a > > common, simple means for code reviewers and static analysis services such > > as Veracode to > > recognize that a value is sanitized in an accepted manner. > > What I am envisioning is a function (presumably in os.path with a > > signature roughly like > > {{{ > > sanitizepart(name, permissive=False, mode=ESCAPE, system=None) > > }}} > > When permissive is False, characters that are generally > > unsafe are rejected. When permissive is True, only path > > separator characters are rejected. Generally unsafe characters besides path > > separators > > would include things like a leading ".", any non-printing character, any > > wildcard, piping > > and redirection characters, etc. > > The mode argument indicates what to do with unacceptable characters. > > Escape them (ESCAPE), omit them (OMIT) or raise an exception > > (RAISE). This could also double as an escape character argument when a > > string > > is given. The default escape character should probably be "%" (same as URL > > encoding). > > The system argument accepts a combination of bit flags indicating what > > operating system's rules to apply, or None meaning to use rules for the > > current platform. Systems would probably include SYS_POSIX, > > SYS_WIN, and SYS_MISC where miscellaneous means to enforce rules > > for all commonly used systems. One example of a distinction is that on a > > POSIX system, > > backslash characters are not path separators, but on Windows, both forward > > and backward > > slashes are path separators. 
> > {{{ > > from os import path > > from os.path import sanitizepart > > print(repr( > > os.path.sanitizepart('/ABC\QRS%', system=path.SYS_WIN)) > > # => '%2fABC%5cQRS%%' > > os.path.sanitizepart('/ABC\QRS%', True, mode=path.STRIP, > > system=path.SYS_POSIX)) > > # => 'ABC\QRS%' > > os.path.sanitizepart('../AB*\x01\n', system=path.SYS_POSIX)) > > # => '%2e.%2fABC%26CD%2a%01%10' > > os.path.sanitizepart('../AB*\x01\n', True, system=path.SYS_POSIX)) > > # => '..%2eAB*\x01\n' > > }}} > > Existing work: > https://pypi.org/project/pathvalidate/#sanitize-a-filename More existing work: * https://pypi.org/project/sanitize-filename/ * http://detox.sourceforge.net/ * https://sourceforge.net/p/glindra/news/2005/08/glindra-rename--lower--portable/ ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ITEHIWIFNGM5WOMOC5UAHKQVMLVIBR6Z/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > Andrew Barnert wrote: > > On May 9, 2020, at 17:35, Steve Jorgensen > > ste...@stevej.name wrote: > > I believe the Python standard library should > > include > > a means of sanitizing a filesystem entry, and this should not be something > > requiring a > > 3rd > > party package. > > One of reasons I think this should be in the standard lib is because that > > provides a > > common, simple means for code reviewers and static analysis services such > > as Veracode to > > recognize that a value is sanitized in an accepted manner. > > This does seem like a good idea. People who do this themselves get it wrong > > all > > the time, occasionally with disastrous consequences, so if Python can solve > > that, that > > would be great. > > But, at least historically, this has been more complicated than what you’re > > suggesting > > here. For example, don’t you have to catch things like directories named > > “Con” or files > > whose 8.3 representation has “CON” as the 8 part? I don’t think you can > > hang an entire > > Windows system by abusing those anymore, but you can still produce > > filenames that some > > APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, > > mingw/native > > shells, Python itself…) can’t access (or can only access if the user > > manually specified a > > .\ absolute path, or whatever). > > Yes. I am aware of some of the unsafe names in DOS and older Windows. As I > > mentioned in my other reply, there is a distinction between the ones that > > are merely > > invalid and those that are actually unsafe. In researching existing Linux > > tools just now, > > I was reminded that a leading dash is frequently unsafe because many tools > > will treat an > > argument starting with dash as an option argument. > > Is there an established algorithm/rule that lots of > > people in the industry trust that > > Python can just reference, instead of having to research or invent it? 
> > Because otherwise, > > we run the risk of making things worse instead of better. > > An excellent point! I just started digging into that and found references to > > detox and Glindra. Neither of those seems to be well maintained though. The > > documentation > > pages for Glindra no longer exist and detox is not in standard package > > repositories for > > CentOS later than 6 (and only in EPEL for that. Still digging. > > Extremely apropos to the question of what charters might be problematic > and/or unsafe: https://dwheeler.com/essays/fixing-unix-linux-filenames.html That article links to another by the same author that is specific to vulnerabilities caused by file names. https://dwheeler.com/secure-programs/Secure-Programs-HOWTO/file-names.html ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FDZOXS2BNZHJ4XAG7WU7BO3AA7KF6WWK/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Steve Jorgensen wrote: > Andrew Barnert wrote: > > On May 9, 2020, at 17:35, Steve Jorgensen > > ste...@stevej.name wrote: > > I believe the Python standard library should > > include > > a means of sanitizing a filesystem entry, and this should not be something > > requiring a > > 3rd > > party package. > > One of reasons I think this should be in the standard lib is because that > > provides a > > common, simple means for code reviewers and static analysis services such > > as Veracode to > > recognize that a value is sanitized in an accepted manner. > > This does seem like a good idea. People who do this themselves get it wrong > > all > > the time, occasionally with disastrous consequences, so if Python can solve > > that, that > > would be great. > > But, at least historically, this has been more complicated than what you’re > > suggesting > > here. For example, don’t you have to catch things like directories named > > “Con” or files > > whose 8.3 representation has “CON” as the 8 part? I don’t think you can > > hang an entire > > Windows system by abusing those anymore, but you can still produce > > filenames that some > > APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, > > mingw/native > > shells, Python itself…) can’t access (or can only access if the user > > manually specified a > > .\ absolute path, or whatever). > > Yes. I am aware of some of the unsafe names in DOS and older Windows. As I > mentioned in my other reply, there is a distinction between the ones that are > merely > invalid and those that are actually unsafe. In researching existing Linux > tools just now, > I was reminded that a leading dash is frequently unsafe because many tools > will treat an > argument starting with dash as an option argument. > > Is there an established algorithm/rule that lots of > > people in the industry trust that > > Python can just reference, instead of having to research or invent it? 
> > Because otherwise, > > we run the risk of making things worse instead of better. > > An excellent point! I just started digging into that and found references to > detox and Glindra. Neither of those seems to be well maintained though. The > documentation > pages for Glindra no longer exist and detox is not in standard package > repositories for > CentOS later than 6 (and only in EPEL for that. Still digging. Extremely apropos to the question of what characters might be problematic and/or unsafe: https://dwheeler.com/essays/fixing-unix-linux-filenames.html ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EDJQA7SDUWEHJ53GYXIGX2HPTU3JEM6X/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Sanitize filename (path part)
Andrew Barnert wrote: > On May 9, 2020, at 17:35, Steve Jorgensen ste...@stevej.name wrote: > > I believe the Python standard library should include > > a means of sanitizing a filesystem entry, and this should not be something > > requiring a 3rd > > party package. > > One of reasons I think this should be in the standard lib is because that > > provides a > > common, simple means for code reviewers and static analysis services such > > as Veracode to > > recognize that a value is sanitized in an accepted manner. > > This does seem like a good idea. People who do this themselves get it wrong > > all > the time, occasionally with disastrous consequences, so if Python can solve > that, that > would be great. > But, at least historically, this has been more complicated than what you’re > suggesting > here. For example, don’t you have to catch things like directories named > “Con” or files > whose 8.3 representation has “CON” as the 8 part? I don’t think you can hang > an entire > Windows system by abusing those anymore, but you can still produce filenames > that some > APIs, and some tools (possibly including Explorer, cmd, powershell, Cygwin, > mingw/native > shells, Python itself…) can’t access (or can only access if the user manually > specified a > \.\ absolute path, or whatever). Yes. I am aware of some of the unsafe names in DOS and older Windows. As I mentioned in my other reply, there is a distinction between the ones that are merely invalid and those that are actually unsafe. In researching existing Linux tools just now, I was reminded that a leading dash is frequently unsafe because many tools will treat an argument starting with dash as an option argument. > Is there an established algorithm/rule that lots of people in the industry > trust that > Python can just reference, instead of having to research or invent it? > Because otherwise, > we run the risk of making things worse instead of better. An excellent point! 
I just started digging into that and found references to detox and Glindra. Neither of those seems to be well maintained, though. The documentation pages for Glindra no longer exist, and detox is not in standard package repositories for CentOS later than 6 (and only in EPEL for that). Still digging. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2LLQWDJJFDM7QJHLMUE73VNJ2T2FA2VM/ Code of Conduct: http://python.org/psf/codeofconduct/
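To make the two hazards mentioned in the message above concrete (DOS device names on Windows, and a leading dash being read as an option), here is a minimal sketch. The helper names are hypothetical and not part of any library; the device-name list follows Microsoft's file naming documentation, and the check is deliberately simple rather than exhaustive.

```python
# Hypothetical helpers for illustration only; not an existing os.path API.
WINDOWS_RESERVED = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


def is_windows_reserved(name):
    """Return True if the part before the first dot is a DOS device name.

    Windows treats "con", "CON.txt" and similar as the device, so the
    comparison ignores case and any extension.
    """
    stem = name.split(".", 1)[0].rstrip(" ").upper()
    return stem in WINDOWS_RESERVED


def looks_like_option(name):
    """Return True if many command-line tools would parse the name as an option."""
    return name.startswith("-")


for candidate in ["Con", "con.txt", "console.log", "-rf", "notes-2020.txt"]:
    print(candidate, is_windows_reserved(candidate), looks_like_option(candidate))
```

Whether a sanitizer should escape such names, reject them, or leave them alone is exactly the open question in this thread; the sketch only shows how they could be detected.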
[Python-ideas] Re: Sanitize filename (path part)
Responding to points individually to avoid confusing multi-topic threads. :) Andrew Barnert wrote: < snip > > > When permissive is False, > > characters that are generally unsafe are rejected. When permissive is > > True, only path separator characters are rejected. Generally unsafe > > characters besides path separators would include things like a leading ".", > > any > > non-printing character, any wildcard, piping and redirection characters, > > etc. > > I think neither of these is what I’d usually want. > I never want to sanitize just pathsep characters without sanitizing all > illegal > characters. > I do often want to sanitize all illegal characters (just \0 and the path sep > on POSIX, > a larger set that I don’t know by heart on Windows). Sanitization and validation are not the same thing though. \0 is invalid and will result in an error when passed to a function that attempts to use it to reference a file, so allowing that character to pass through sanitization doesn't constitute an exploitable vulnerability. Having said that, it's usually friendlier to fail sooner rather than later, so maybe it actually does make sense for sanitization to fail for illegal characters as well as for valid, unsafe characters. Hmm. I just realized that ".." and (to a lesser extent) "." are valid path parts but are nevertheless usually not safe to allow. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/V6EH7JSEKJTT57HHQU3CCQOYE3E7I2G3/ Code of Conduct: http://python.org/psf/codeofconduct/
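The distinction between validation and sanitization drawn in the message above can be sketched roughly as follows. The function names are hypothetical, and the POSIX rule used (a part is invalid if it is empty or contains NUL or "/") is the one quoted from Andrew's reply. The last few lines show CPython itself refusing a path with an embedded NUL before any file is touched.

```python
import os


def is_valid_posix_part(name):
    """Validation: could this string name a directory entry at all on POSIX?"""
    return name != "" and "\0" not in name and "/" not in name


def is_traversal_part(name):
    """Safety: valid as a path part, but refers to the current or parent directory."""
    return name in (".", "..")


for part in ["notes.txt", "..", ".", "a/b", "a\0b"]:
    print(repr(part), is_valid_posix_part(part), is_traversal_part(part))

# An embedded NUL never reaches the filesystem: CPython raises ValueError first.
try:
    os.stat("report\0.txt")
except ValueError as exc:
    print("rejected before any filesystem access:", exc)
```

Note that ".." passes validation and fails the safety check, which is why the two questions need separate answers.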
[Python-ideas] Re: Sanitize filename (path part)
Steve Jorgensen wrote: > I believe the Python standard library should include a means of sanitizing a > filesystem > entry, and this should not be something requiring a 3rd party package. > One of reasons I think this should be in the standard lib is because that > provides a > common, simple means for code reviewers and static analysis services such as > Veracode to > recognize that a value is sanitized in an accepted manner. > What I am envisioning is a function (presumably in os.path with a > signature roughly like > {{{ > sanitizepart(name, permissive=False, mode=ESCAPE, system=None) > }}} > When permissive is False, characters that are generally > unsafe are rejected. When permissive is True, only path > separator characters are rejected. Generally unsafe characters besides path > separators > would include things like a leading ".", any non-printing character, any > wildcard, piping > and redirection characters, etc. > The mode argument indicates what to do with unacceptable characters. > Escape them (ESCAPE), omit them (OMIT) or raise an exception > (RAISE). This could also double as an escape character argument when a string > is given. The default escape character should probably be "%" (same as URL > encoding). > The system argument accepts a combination of bit flags indicating what > operating system's rules to apply, or None meaning to use rules for the > current platform. Systems would probably include SYS_POSIX, > SYS_WIN, and SYS_MISC where miscellaneous means to enforce rules > for all commonly used systems. One example of a distinction is that on a > POSIX system, > backslash characters are not path separators, but on Windows, both forward > and backward > slashes are path separators. 
> {{{ > from os import path > from os.path import sanitizepart > print(repr( > os.path.sanitizepart('/ABC\QRS%', system=path.SYS_WIN)) > # => '%2fABC%5cQRS%%' > os.path.sanitizepart('/ABC\\QRS%', True, mode=path.STRIP, > system=path.SYS_POSIX)) > > # => 'ABC\QRS%' > os.path.sanitizepart('../AB*\x01\n', system=path.SYS_POSIX)) > > # => '%2e.%2fABC%26CD%2a%01%10' > os.path.sanitizepart('../AB*\x01\n', True, system=path.SYS_POSIX)) > > # => '..%2eAB*\x01\n' > }}} Existing work: https://pypi.org/project/pathvalidate/#sanitize-a-filename ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FI2V2EZGLSYB3AAV5V5RNEOFJQWQE45S/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Sanitize filename (path part)
I believe the Python standard library should include a means of sanitizing a filesystem entry, and this should not be something requiring a 3rd party package. One of the reasons I think this should be in the standard lib is that it provides a common, simple means for code reviewers and static analysis services such as Veracode to recognize that a value is sanitized in an accepted manner. What I am envisioning is a function (presumably in `os.path`) with a signature roughly like {{{ sanitizepart(name, permissive=False, mode=ESCAPE, system=None) }}} When `permissive` is `False`, characters that are generally unsafe are rejected. When `permissive` is `True`, only path separator characters are rejected. Generally unsafe characters besides path separators would include things like a leading ".", any non-printing character, any wildcard, piping and redirection characters, etc. The `mode` argument indicates what to do with unacceptable characters: escape them (`ESCAPE`), omit them (`OMIT`), or raise an exception (`RAISE`). This could also double as an escape character argument when a string is given. The default escape character should probably be "%" (same as URL encoding). The `system` argument accepts a combination of bit flags indicating which operating system's rules to apply, or `None` meaning to use rules for the current platform. Systems would probably include `SYS_POSIX`, `SYS_WIN`, and `SYS_MISC`, where miscellaneous means to enforce rules for all commonly used systems. One example of a distinction is that on a POSIX system, backslash characters are not path separators, but on Windows, both forward and backward slashes are path separators. 
{{{
from os import path
from os.path import sanitizepart

sanitizepart('/ABC\\QRS%', system=path.SYS_WIN)
# => '%2fABC%5cQRS%%'
sanitizepart('/ABC\\QRS%', True, mode=path.OMIT, system=path.SYS_POSIX)
# => 'ABC\\QRS%'
sanitizepart('../AB*\x01\n', system=path.SYS_POSIX)
# => '%2e.%2fAB%2a%01%0a'
sanitizepart('../AB*\x01\n', True, system=path.SYS_POSIX)
# => '..%2fAB*\x01\n'
}}}
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/SQH4LPERFLKBLXPDUOVJMV24JBCBUCYO/ Code of Conduct: http://python.org/psf/codeofconduct/
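For anyone who wants to experiment with the proposal above, here is one way it could be prototyped in pure Python. This is only a sketch under several assumptions, not a vetted sanitizer: the constants are plain module-level stand-ins (nothing like them exists in `os.path`), `SYS_MISC` and the "escape character as mode" idea are left out, the conservative character set (ASCII letters, digits, ".", "_", "-") is my own guess at "generally safe", and "%" is escaped as `%25` rather than `%%` so that the result can be decoded unambiguously.

```python
import os
import string

# Hypothetical stand-ins for the constants named in the proposal.
ESCAPE, OMIT, RAISE = "escape", "omit", "raise"
SYS_POSIX, SYS_WIN = 1, 2

# Assumed "generally safe" set for the non-permissive case.
_CONSERVATIVE = frozenset(string.ascii_letters + string.digits + "._-")


def sanitizepart(name, permissive=False, mode=ESCAPE, system=None):
    """Sanitize a single path part according to the rules sketched above."""
    if system is None:
        system = SYS_WIN if os.name == "nt" else SYS_POSIX
    separators = {"/", "\\"} if system & SYS_WIN else {"/"}

    out = []
    for index, ch in enumerate(name):
        if permissive:
            bad = ch in separators
        else:
            # A leading "." hides the file; a leading "-" looks like an option.
            bad = ch not in _CONSERVATIVE or (index == 0 and ch in ".-")
        if mode == ESCAPE and ch == "%":
            bad = True  # the escape character itself must be escaped

        if not bad:
            out.append(ch)
        elif mode == ESCAPE:
            encoded = ch.encode("utf-8", "surrogatepass")
            out.append("".join(f"%{byte:02x}" for byte in encoded))
        elif mode == RAISE:
            raise ValueError(f"unacceptable character {ch!r} at index {index}")
        # mode == OMIT: the character is simply dropped

    result = "".join(out)
    if result in ("", ".", ".."):
        raise ValueError(f"{name!r} does not sanitize to a usable path part")
    return result


print(sanitizepart('/ABC\\QRS%', system=SYS_WIN))
# => %2fABC%5cQRS%25
print(repr(sanitizepart('../AB*\x01\n', system=SYS_POSIX)))
# => '%2e.%2fAB%2a%01%0a'
print(repr(sanitizepart('../AB*\x01\n', True, system=SYS_POSIX)))
# => '..%2fAB*\x01\n'
```

The final check refuses results such as ".." outright instead of trying to repair them, since omitting characters can otherwise turn a harmless-looking input like "/../" into a traversal.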
[Python-ideas] Re: Instance method to test equivalence between set and iterable
Steven D'Aprano wrote: > On Mon, Mar 23, 2020 at 12:03:50AM -0000, Steve Jorgensen wrote: > > Every set is a superset of itself and a subset of > > itself. A set may > > not be a "formal" subset or a "formal" superset of itself. issubset > > and issuperset refer to standard subsets and supersets, not formal > > subsets and supersets. > > Sorry, I don't understand your terminology "formal" and "standard". I > think you might mean "proper" rather than formal? But I don't know what > you mean by "standard". Right. I meant "proper". Not "formal". By "standard", I simply mean without the "proper" qualifier. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GGAKYH5HFZHIPTOIXJA64MY2W7BAIZMQ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Instance method to test equivalence between set and iterable
Paul Moore wrote: > On Sun, 22 Mar 2020 at 20:01, Steve Jorgensen ste...@stevej.name wrote: > > > > Currently, the issubset and > > issuperset methods of set objects accept arbitrary iterables as arguments. > > An > > iterable that is both a subset and superset is, in a sense, "equal" to the > > set. It would > > be inappropriate for == to return True for such a comparison, > > however, since that would break the Hashable contract. > > Should sets have an additional method, something like like(other), > > issimilar(other), or isequivalent(other), that returns > > True for any iterable that contains the all of the items in the set and no > > items that are not in the set? It would therefore be true in the same cases > > where > > = set(other) or .issubset(other) and > > .issuperset(other) is true. > > What is the practical use case for this? It seems like it would be a > pretty rare need, at best. > Paul Basically, it is for a sense of completeness. It feels weird that there is a way to check whether an iterable is a subset of a set or a superset of a set but no way to directly ask whether it is equivalent to the set. Even though the need for it might not be common, I think that the collection of methods makes more sense if a method like this is present. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/MRCHHRVCXEUAB3HBV4WRMZ56O3HUJQYL/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Instance method to test equivalence between set and iterable
Steven D'Aprano wrote: > On Sun, Mar 22, 2020 at 07:59:59PM -0000, Steve Jorgensen wrote: > > Currently, the issubset and > > issuperset methods of set objects > > accept arbitrary iterables as arguments. An iterable that is both a > > subset and superset is, in a sense, "equal" to the set. It would be > > inappropriate for == to return True for such a comparison, > > however, since that would break the Hashable contract. > > I think the "arbitrary iterables" part is a distraction. We are > fundamentally talking about a comparison on sets, even if Python relaxes > the requirements and also allows one operand to be a arbitrary iterable. > I don't believe that a set A can be both a superset and subset of > another set B at the same time. On a Venn Diagram, that would require A > to be both completely surrounded by B and B to be completely surrounded > by A at the same time, which is impossible. > I think you might be talking about sets which partially overlap: > A = {1, 2, 3, 4} > B = {2, 3, 4, 5} Every set is a superset of itself and a subset of itself. A set may not be a "formal" subset or a "formal" superset of itself. `issubset` and `issuperset` refer to standard subsets and supersets, not formal subsets and supersets. In Python, you can trivially check that…
```
In [1]: {1, 2, 3}.issubset({1, 2, 3})
Out[1]: True

In [2]: {1, 2, 3}.issuperset({1, 2, 3})
Out[2]: True

In [3]: {1, 2, 3}.issubset((1, 2, 3))
Out[3]: True

In [4]: {1, 2, 3}.issuperset((1, 2, 3))
Out[4]: True
```
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QRI7LQAR7TZXSWOVYY5KLS52HK2GU7IK/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Instance method to test equivalence between set and iterable
Bar Harel wrote: > Hey Steve, > How about set.symmetric_difference()? > Does it not do what you want? > Best regards, > Bar Harel > On Sun, Mar 22, 2020, 10:03 PM Steve Jorgensen ste...@stevej.name wrote: > > Currently, the issubset and > > issuperset methods of set objects accept > > arbitrary iterables as arguments. An iterable that is both a subset and > > superset is, in a sense, "equal" to the set. It would be inappropriate for > > == to return True for such a comparison, however, since that > > would > > break the Hashable contract. > > Should sets have an additional method, something like like(other), > > issimilar(other), or isequivalent(other), that returns > > True for any > > iterable that contains the all of the items in the set and no items that > > are not in the set? It would therefore be true in the same cases where > > = set(other) or .issubset(other) and > > .issuperset(other) > > is true. > > > > Python-ideas mailing list -- python-ideas@python.org > > To unsubscribe send an email to python-ideas-le...@python.org > > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > > Message archived at > > https://mail.python.org/archives/list/python-ideas@python.org/message/ULQQ7T... > > Code of Conduct: http://python.org/psf/codeofconduct/ > > Indirectly, it does, but that returns a set, not a `bool`. It would also, therefore, do more work than necessary to determine the result in many cases. A python implementation for what I'm talking about would be something like the following. 
```
def like(self, other):
    found = set()
    for item in other:
        if item not in self:
            return False
        found.add(item)
    return len(found) == len(self)
```
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XURB3B3RVM23ECR7BZZFFW7ISLLR63NQ/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Instance method to test equivalence between set and iterable
Currently, the `issubset` and `issuperset` methods of set objects accept arbitrary iterables as arguments. An iterable that is both a subset and superset is, in a sense, "equal" to the set. It would be inappropriate for `==` to return `True` for such a comparison, however, since that would break the `Hashable` contract. Should sets have an additional method, something like `like(other)`, `issimilar(other)`, or `isequivalent(other)`, that returns `True` for any iterable that contains all of the items in the set and no items that are not in the set? For a set `s`, it would therefore be true in the same cases where `s == set(other)` or `s.issubset(other) and s.issuperset(other)` is true. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ULQQ7TZBPQN3RAGKIP52XHFD6LR4HIB4/ Code of Conduct: http://python.org/psf/codeofconduct/
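A short demonstration of the two existing spellings mentioned in the message above may help. The helper names are made up for this example. One thing it shows, as far as I can tell from CPython's behavior, is that the `issubset`/`issuperset` pair only agrees with `set(other)` when `other` can be iterated twice, which is arguably a further point in favor of a single method.

```python
def via_set(s, other):
    """Existing spelling 1: build a throwaway set and compare."""
    return s == set(other)


def via_both(s, other):
    """Existing spelling 2: iterate `other` twice, once per call."""
    return s.issubset(other) and s.issuperset(other)


s = {1, 2, 3}

print(s == [3, 2, 1, 1])           # False: == never treats a list as a set
print(via_set(s, [3, 2, 1, 1]))    # True
print(via_both(s, [3, 2, 1, 1]))   # True
print(via_set(s, [1, 2, 3, 4]))    # False
print(via_both(s, [1, 2, 3, 4]))   # False

# Pitfall with a one-shot iterator: issubset() consumes it, so issuperset()
# then sees an empty iterable and is trivially true.
print(via_both(s, iter([1, 2, 3, 4])))  # True, although the contents differ
```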
[Python-ideas] Re: Formalized pretty & encoding-aware object representation (was dunder methods for...)
Steve Jorgensen wrote: > Based on the conversations stemming from my previous post, it is clear that > the topic > was too implementation-specific. It is not clear whether dunder methods are > an appropriate > component of the solution (they might or might not be). > Also, it presumably makes sense to start by looking at prior art rather than > inventing > from scratch. > Quotes from previous thread regarding prior art to look at: > Jonathan Fine wrote: > > > Here's some comments on the state of the art. In > > addition to > > https://docs.python.org/3/library/pprint.html > > there's also > > https://docs.python.org/3/library/reprlib.html > > and > > https://docs.python.org/3/library/json.html > > I expect that these three modules have some overlap in purpose and design > > (but probably not in code). > > And if you're brave, there's also > > https://docs.python.org/3/library/pickle.html > > and > > https://github.com/psf/black > > Time to declare a special interest. I'm a long-time user and great fan of > > TeX / LaTeX. And some nice way of pretty-printing Python objects using TeX > > notation could be useful. > > And also related is Geoffrey French's Larch environment for editing Python, > > which has a pretty-printing component. > > http://www.britefury.com/larch_site/ > > with best wishes > > Jonathan > > Alex Hall wrote: > > Might be helpful to look at https://github.com/tommikaikkonen/prettyprinter > > and https://github.com/wolever/pprintpp > > > Angus Hollands wrote: > > Has anyone mentioned the IPython pretty printer yet? I'm late to the > > conversation unfortunately, so apologies if someone else already raised it. 
> > https://ipython.readthedocs.io/en/stable/api/generated/IPython.lib.pretty.html#IPython.lib.pretty.pretty ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3WCI2E3BL4VBJ6W33PWNZLR25YUW3662/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Syntax for loop invariants
haael wrote: > Python has more and more optional tools for formal correctness checking. > Why not loop invariants? > Loop invariant is such a statement that if it was true before the loop > iteration, it will also be true after the iteration. It can be > implemented as an assertion of an implication.
>     now_value = False
>     while running_condition(...):
>         prev_value = now_value
>         now_value = invariant_condition(...)
>         assert now_value if prev_value else True
> Here for ellipsis we can substitute any values and variables. > I propose the new syntax:
>     while running_condition(...):
>         invariant invariant_condition(...)
> The keyword 'invariant' is allowed only inside a loop. The interpreter > will create a separate boolean variable holding the truth value of each > invariant. On the loop entry, the value is reset to false. When the > 'invariant' statement is encountered, the interpreter will evaluate the > expression, test the implication 'prev_value -> now_value' and update > the value. If the implication is not met, an exception will be thrown > 'InvariantError' which is a subclass of 'AssertionError'. > Like assertions, invariants will be checked only in debug mode. > I am developing a library for formal proofs and such a feature would be > handy. > haael If something like this would be appropriate to have, then maybe it would be more appropriate to have a more generic-purpose DbC-like capability that could be used to check various kinds of pre/post conditions around various kinds of code construct. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TEBTN2W4TNZUCELN3ZBORZEYWSYI6XHK/ Code of Conduct: http://python.org/psf/codeofconduct/
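For comparison, the semantics haael describes in the message above can be approximated today without new syntax. The sketch below is one possible library-level emulation; the `Invariant` class and its `check` method are hypothetical names of my own, and only `InvariantError` is taken from the proposal. It tracks one boolean per invariant, starts it at false, and raises when the value goes from true to false. The check is guarded by `__debug__`, so it is skipped under `python -O`, much like `assert`.

```python
class InvariantError(AssertionError):
    """Raised when an invariant that held at the previous check no longer holds."""


class Invariant:
    """Emulates the proposed 'invariant' statement for a single loop."""

    def __init__(self):
        self._prev = False  # reset to false on loop entry, as in the proposal

    def check(self, now):
        now = bool(now)
        # Implication prev -> now: only a true-to-false transition is an error.
        if __debug__ and self._prev and not now:
            raise InvariantError("invariant held previously but not now")
        self._prev = now
        return now


# Usage: the running total of non-negative numbers never becomes negative.
total = 0
non_negative = Invariant()
for n in range(5):
    total += n
    non_negative.check(total >= 0)
print(total)
```

A more general design-by-contract facility, as suggested in the reply, could presumably reuse the same building block for pre- and postconditions.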
[Python-ideas] Re: Formalized pretty & encoding-aware object representation (was dunder methods for...)
Oops. Somehow this subject was posted twice. Please ignore this thread & follow the other thread with the same subject line. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NBPCLSFPDZIK2SGDUDK7CHHMHXROD7X5/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Formalized pretty & encoding-aware object representation (was dunder methods for...)
Steve Jorgensen wrote: > Based on the conversations stemming from my previous post, it is clear that > the topic > was too implementation-specific. It is not clear whether dunder methods are > an appropriate > component of the solution (they might or might not be). > Also, it presumably makes sense to start by looking at prior art rather than > inventing > from scratch. There has been some argument regarding whether objects should say how to present themselves "prettily". I think a case can be made either way, but in either case, it makes sense that it should be easy to override the representation for an object type without subclassing or monkey-patching it. Also, it might make sense not to clutter up the dunder-method space for all kinds of objects with this kind of thing. Without using dunder methods, it could still be possible for any body of code to provide default special-representational rules for its objects by registering hooks. Also, as a hybrid-approach, it could be that the defaults for representation are determined first by looking at a default registry and then falling back to dunder methods if present. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EMXMEPFSXTUMFGY2LN5UHWCJYSVBKEEK/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Formalized pretty & encoding-aware object representation (was dunder methods for...)
Based on the conversations stemming from my previous post, it is clear that the topic was too implementation-specific. It is not clear whether dunder methods are an appropriate component of the solution (they might or might not be). This suggestion is to try to solve 2 inter-related but different issues, possibly through the same mechanism, 2 unrelated mechanisms, or partially overlapping mechanisms. Although the current `__str__` and `__repr__` concepts seem perfectly appropriate to me, I think there also is justification for a means of having standard pretty-informal (str-like) and pretty-formal (repr-like) representations for various types of object. In the informal case, it should be possible to pass information about a file object that it will be written to (especially encoding & possibly isatty()) to the representation code, and in the formal case, either the representation code should interact with the pretty-printer or it should be able to return data in a form that tells the pretty printer how to nest portions of the representation. It presumably makes sense to start by looking at prior art rather than inventing from scratch. Quotes from previous thread regarding prior art to look at: Jonathan Fine wrote: > Here's some comments on the state of the art. In addition to > https://docs.python.org/3/library/pprint.html > there's also > https://docs.python.org/3/library/reprlib.html > and > https://docs.python.org/3/library/json.html > I expect that these three modules have some overlap in purpose and design > (but probably not in code). > And if you're brave, there's also > https://docs.python.org/3/library/pickle.html > and > https://github.com/psf/black > Time to declare a special interest. I'm a long-time user and great fan of > TeX / LaTeX. And some nice way of pretty-printing Python objects using TeX > notation could be useful. > And also related is Geoffrey French's Larch environment for editing Python, > which has a pretty-printing component. 
> http://www.britefury.com/larch_site/ > with best wishes > Jonathan Alex Hall wrote: > Might be helpful to look at https://github.com/tommikaikkonen/prettyprinter > and https://github.com/wolever/pprintpp ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GLXVPG6UAOTEKDVCV362CTGB4EGYYWPP/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Formalized pretty & encoding-aware object representation (was dunder methods for...)
Based on the conversations stemming from my previous post, it is clear that the topic was too implementation-specific. It is not clear whether dunder methods are an appropriate component of the solution (they might or might not be). Also, it presumably makes sense to start by looking at prior art rather than inventing from scratch. Quotes from previous thread regarding prior art to look at: Jonathan Fine wrote: > Here's some comments on the state of the art. In addition to > https://docs.python.org/3/library/pprint.html > there's also > https://docs.python.org/3/library/reprlib.html > and > https://docs.python.org/3/library/json.html > I expect that these three modules have some overlap in purpose and design > (but probably not in code). > And if you're brave, there's also > https://docs.python.org/3/library/pickle.html > and > https://github.com/psf/black > Time to declare a special interest. I'm a long-time user and great fan of > TeX / LaTeX. And some nice way of pretty-printing Python objects using TeX > notation could be useful. > And also related is Geoffrey French's Larch environment for editing Python, > which has a pretty-printing component. > http://www.britefury.com/larch_site/ > with best wishes > Jonathan Alex Hall wrote: > Might be helpful to look at https://github.com/tommikaikkonen/prettyprinter > and https://github.com/wolever/pprintpp ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/MQVTUDPIX7LWTPMPSBAQLPCDZSPMBUEU/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > Steve Jorgensen wrote: > > > > The problem I came up with trying to spike out > > my > > proposal last night is that there > > doesn't seem to be anyway to implement it without creating infinite > > recursion in the > > issublcass call. If I make Orderable a real or virtual subclass > > of ProtoOrderable and Orderable's __subclasshook__ > > or metaclass __subclasscheck__ (I tried both ways) tries to check whether > > C is a subclass of ProtoOrderable, then an infinite recursion > > occurs. > > It wasn't immediately obvious to me why that is the case, but when I > > thought about it > > deeply, I can see why that must happen. > > An alternative that I thought about previously but seems very smelly to me > > for several > > reasons is to have both Orderable and NonOrderable ABCs. In that > > case, what should be done to prevent a class from being both orderable and > > non-orderable > > or figure out which should take precedence in that case? > > As a meta-solution (wild-assed idea) what if metaclass registration could > > accept > > keyword arguments, similar to passing keyword arguments to a class > > definition? That way, > > a > > single ABC (ProtoOrderable or whatever better name) could be a real or > > virtual subclass that is explicitly orderable or non-orderable depending on > > orderable=. > > I have been unable to implement the class hierarchy that I proposed, and I > > think > > I've determined that it's just not a practical fit with how the virtual bas > > class > > mechanism works, so… > > Maybe just a single TotalOrdered or TotalOrderable ABC with a > > register_explicit_only method. The __subclasshook__ method would > > skip the rich comparison methods check and return NotImplemented for any > > class registered using register_explicit_only (or any of its true > > subclasses). 
> > The only weird edge case in the above is that is someone registers another > > ABC using > > TotalOrdered.register_explicit_only and uses that as a virtual base class of > > something else, the register_explicit_only registration will not apply to > > the > > virtual subclass. I'm thinking that's completely acceptable as a known > > limitation if > > documented? > > Code spike of that idea: > from abc import ABCMeta > from weakref import WeakSet > > > class TotallyOrderable(metaclass=ABCMeta): > _explicit_only_registry = WeakSet() > > @classmethod > def register_explicit_only(cls, C): > if cls is not TotallyOrderable: > raise NotImplementedError( > f"{cls.__name__} does not implement 'register_explicit_only'") > > cls._explicit_only_registry.add(C) > > @classmethod > def __subclasshook__(cls, C): > if cls is not TotallyOrderable: > return NotImplemented > > for B in C.__mro__: > if B in cls._explicit_only_registry: > return NotImplemented > > return cls._check_overrides_rich_comparison_methods(C) > > @classmethod > def _check_overrides_rich_comparison_methods(cls, C): > mro = C.__mro__ > for method in ('__lt__', '__le__', '__gt__', '__ge__'): > for B in mro: > if B is not object and method in B.__dict__: > if B.__dict__[method] is None: > return NotImplemented > break > else: > return NotImplemented > return True Naming question: Should an abstract base class for this concept be named `TotalOrderable`, `TotallyOrderable`, `TotalOrdered`, or `TotallyOrdered`? ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/IPVNBE6VQZJZPF5ZB7XLPCAIX47SBMIL/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dunder methods for encoding & prettiness aware formal & informal representations
Alex Hall wrote: > Might be helpful to look at https://github.com/tommikaikkonen/prettyprinter > and https://github.com/wolever/pprintpp Right! Thx. :) ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HHE6NKJ5C7HBYJO2ASHXMKYLVC6ZBVLE/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dunder methods for encoding & prettiness aware formal & informal representations
Jonathan Fine wrote: > Hi Steve (for clarity Jorgensen) > Thank you for your good idea, and your enthusiasm. And I thank Guido, for > suggesting a good contribution this list can make. > Here's some comments on the state of the art. In addition to > https://docs.python.org/3/library/pprint.html > there's also > https://docs.python.org/3/library/reprlib.html > and > https://docs.python.org/3/library/json.html > I expect that these three modules have some overlap in purpose and design > (but probably not in code). > And if you're brave, there's also > https://docs.python.org/3/library/pickle.html > and > https://github.com/psf/black > Time to declare a special interest. I'm a long-time user and great fan of > TeX / LaTeX. And some nice way of pretty-printing Python objects using TeX > notation could be useful. > And also related is Geoffrey French's Larch environment for editing Python, > which has a pretty-printing component. > http://www.britefury.com/larch_site/ > with best wishes > Jonathan I feel kind of silly for jumping right to the idea of prototyping rather than looking for prior art. :) It clearly makes more sense to choose an existing popular library as a candidate starting point for promotion into the stdlib rather than starting from scratch. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/OHMJBHPBLOUQ42E6J4YGOGNUBLWUFAH4/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: dunder methods for encoding & prettiness aware formal & informal representations
Guido van Rossum wrote: > I think the idea you're looking for is an alternative for the pprint module > that allows classes to have formatting hooks that get passed in some > additional information (or perhaps a PrettyPrinter object) that can affect > the formatting. > This would seem to be an ideal thing to try to design and put on PyPI, > except it would be more effective if there was a standard, rather than > several competing such modules, with different APIs for the formatting > hooks. > So I encourage having a discussion (might as well be here) about the design > of the new PrettyPrinter API. I like it. :) I'll do some prototyping to see if I come up with any promising patterns. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RPW742V6X3UAXGNZ5GZVOFFZIKSO5MCL/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] dunder methods for encoding & prettiness aware formal & informal representations
This is really an idea for an idea. I'm not sure what the ideal dunder method names or APIs should be.

Encoding awareness: The informal (`str`) representations of `inf` and `-inf` are "inf" and "-inf", and that seems appropriate as a known-safe value, but if we're writing the representation to a stream, and the stream has a Unicode encoding, then those might prefer to represent themselves as "∞" and "-∞". If there were a dunder method for informal representation to which the destination stream was passed, then the object could decide how to represent itself based on the properties of the stream.

Prettiness awareness: It would be nice if an object could have control of how it is represented when pretty-printed. If there is any way for that to be done now, it is not at all evident from the pprint module documentation. It would be nice if there were some method that, if implemented for the object, would be used to allow the object to tell the pretty printer to treat it as a composite with starting text, component objects, and ending text.

Additional thoughts & open questions: Perhaps there should only be stream awareness for informal representation and prettiness awareness for formal representation (separate concepts and APIs) or perhaps both ideas are applicable to both kinds of representation. Is it better for a stream-aware representation method to return the value to be written to the stream or to directly append its representation to that stream?
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/OQPPJ7SNM5CZUI5RYT5R4Z6YZWMNNTZS/ Code of Conduct: http://python.org/psf/codeofconduct/
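One way to read the encoding-awareness idea is as a helper that asks the destination stream whether it can encode the preferred text and falls back to the known-safe ASCII form otherwise. The sketch below is only an illustration under that assumption: `informal_for_stream` is a hypothetical name, not an existing or proposed API, and it handles only the infinity case from the message.

```python
import io
import math


def informal_for_stream(value, stream):
    """Return an informal representation of value suited to stream.

    Hypothetical helper (not a stdlib API); only infinities are special-cased.
    """
    text = str(value)
    if isinstance(value, float) and math.isinf(value):
        pretty = "∞" if value > 0 else "-∞"
        # Streams with no usable encoding attribute keep the safe form.
        encoding = getattr(stream, "encoding", None)
        if encoding:
            try:
                pretty.encode(encoding)
            except (UnicodeEncodeError, LookupError):
                pass  # the stream cannot represent "∞"; keep "inf"
            else:
                text = pretty
    return text


# Two in-memory text streams that differ only in encoding.
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

utf8_text = informal_for_stream(float("inf"), utf8_stream)
ascii_text = informal_for_stream(float("inf"), ascii_stream)
```

This version returns the text rather than writing it, which is one of the two options raised in the open questions; a variant that writes directly to the stream would be equally plausible.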
[Python-ideas] More appropriate behavior for the NotImplemented object
I realize this is probably something that would be hard to change for compatibility reasons. Maybe someone can think of a way around that though? It seems to me that `not NotImplemented` should result in `NotImplemented` and attempting to convert it to `bool` should raise a `TypeError` exception. Take the following example:
```
def __lt__(self, other):
    return not self.__ge__(other)

def __le__(self, other):
    return not self.__gt__(other)

def __ge__(self, other):
    ...  # body missing from the original post
```
Currently, this will not work because `NotImplemented` is truthy and `not NotImplemented` is `False`, so it is necessary to complicate the implementations of `__lt__` and `__le__` to specifically check whether the complementary method returned `NotImplemented`. If the value of `not NotImplemented` was `NotImplemented` then the coding pattern above would simply work.
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/U7GOYMMMBQQPSD45JDNCSOO7VULDZTD6/ Code of Conduct: http://python.org/psf/codeofconduct/
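For comparison, this is roughly what the "complicated" version described in the message looks like today, with the explicit `NotImplemented` check spelled out. `Version` is a toy class invented for this sketch, and deriving `__lt__` from `__ge__` by negation is only valid for totally ordered types.

```python
class Version:
    """Toy totally ordered type, used only for illustration."""

    def __init__(self, number):
        self.number = number

    def __ge__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number >= other.number

    def __lt__(self, other):
        # Propagate NotImplemented explicitly instead of negating it,
        # so Python can still try the reflected operation on `other`.
        result = self.__ge__(other)
        if result is NotImplemented:
            return NotImplemented
        return not result


older, newer = Version(1), Version(2)
```

With the check in place, `older < newer` evaluates normally, while comparing a `Version` with an unrelated type ends in the usual `TypeError` rather than silently returning `False`.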
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Andrew Barnert wrote: > On Mar 5, 2020, at 11:05, Steve Jorgensen ste...@stevej.name wrote: > > Steve Jorgensen wrote: > > Steve Jorgensen wrote: > > > > The problem I came up with trying to spike out > > my > > proposal last night is that there > > doesn't seem to be anyway to implement it without creating infinite > > recursion in the > > issublcass call. > > Is this something we should be looking to add to the ABC mechanism in > general? > Would a way to “unregister” classes that would be implicitly accepted be > simpler than a > way to “register_explicit_only” classes so they skip the implicit test? > > If I make Orderable a real or virtual subclass > > of ProtoOrderable and Orderable's __subclasshook__ > > or metaclass __subclasscheck__ (I tried both ways) tries to check whether > > C is a subclass of ProtoOrderable, then an infinite recursion > > occurs. > > It wasn't immediately obvious to me why that is the case, but when I > > thought about it > > deeply, I can see why that must happen. > > An alternative that I thought about previously but seems very smelly to me > > for several > > reasons is to have both Orderable and NonOrderable ABCs. In that > > case, what should be done to prevent a class from being both orderable and > > non-orderable > > or figure out which should take precedence in that case? > > As a meta-solution (wild-assed idea) what if metaclass registration could > > accept > > keyword arguments, similar to passing keyword arguments to a class > > definition? That way, > > a > > single ABC (ProtoOrderable or whatever better name) could be a real or > > virtual subclass that is explicitly orderable or non-orderable depending on > > orderable=. 
> > I have been unable to implement the class hierarchy that I proposed, and I > > think > > I've determined that it's just not a practical fit with how the virtual bas > > class > > mechanism works, so… > > Maybe just a single TotalOrdered or TotalOrderable ABC with a > > register_explicit_only method. The __subclasshook__ method would > > skip the rich comparison methods check and return NotImplemented for any > > class registered using register_explicit_only (or any of its true > > subclasses). > > The only weird edge case in the above is that is someone registers another > > ABC using > > TotalOrdered.register_explicit_only and uses that as a virtual base class of > > something else, the register_explicit_only registration will not apply to > > the > > virtual subclass. I'm thinking that's completely acceptable as a known > > limitation if > > documented? > > Code spike of that idea: > > from abc import ABCMeta > > from weakref import WeakSet > > > > > > class TotallyOrderable(metaclass=ABCMeta): > >_explicit_only_registry = WeakSet() > > > >@classmethod > >def register_explicit_only(cls, C): > >if cls is not TotallyOrderable: > >raise NotImplementedError( > >f"{cls.__name__} does not implement > > 'register_explicit_only'") > > > >cls._explicit_only_registry.add(C) > > > >@classmethod > >def __subclasshook__(cls, C): > >if cls is not TotallyOrderable: > >return NotImplemented > > > >for B in C.__mro__: > >if B in cls._explicit_only_registry: > >return NotImplemented > > > >return cls._check_overrides_rich_comparison_methods(C) > > > >@classmethod > >def _check_overrides_rich_comparison_methods(cls, C): > >mro = C.__mro__ > >for method in ('__lt__', '__le__', '__gt__', '__ge__'): > >for B in mro: > >if B is not object and method in B.__dict__: > >if B.__dict__[method] is None: > >return NotImplemented > >break > >else: > >return NotImplemented > >return True > > > > > > Python-ideas mailing list -- python-ideas@python.org > > To unsubscribe send an email to 
python-ideas-le...@python.org > > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > > Message archived at > > https://mail.python.org/archives/list/python-ideas@python.org/message/2OZBPQ... > > Code of Conduct: http://python.org/psf/codeofconduct/ > > Maybe so because I found a limitation with my code spike. Calling `register_explicit_only` doesn't bust
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > > The problem I came up with trying to spike out my > > proposal last night is that there > > doesn't seem to be anyway to implement it without creating infinite > > recursion in the > > issublcass call. If I make Orderable a real or virtual subclass > > of ProtoOrderable and Orderable's __subclasshook__ > > or metaclass __subclasscheck__ (I tried both ways) tries to check whether > > C is a subclass of ProtoOrderable, then an infinite recursion > > occurs. > > It wasn't immediately obvious to me why that is the case, but when I > > thought about it > > deeply, I can see why that must happen. > > An alternative that I thought about previously but seems very smelly to me > > for several > > reasons is to have both Orderable and NonOrderable ABCs. In that > > case, what should be done to prevent a class from being both orderable and > > non-orderable > > or figure out which should take precedence in that case? > > As a meta-solution (wild-assed idea) what if metaclass registration could > > accept > > keyword arguments, similar to passing keyword arguments to a class > > definition? That way, > > a > > single ABC (ProtoOrderable or whatever better name) could be a real or > > virtual subclass that is explicitly orderable or non-orderable depending on > > orderable=. > > I have been unable to implement the class hierarchy that I proposed, and I > > think > I've determined that it's just not a practical fit with how the virtual bas > class > mechanism works, so… > Maybe just a single TotalOrdered or TotalOrderable ABC with a > register_explicit_only method. The __subclasshook__ method would > skip the rich comparison methods check and return NotImplemented for any > class registered using register_explicit_only (or any of its true > subclasses). 
> The only weird edge case in the above is that is someone registers another > ABC using > TotalOrdered.register_explicit_only and uses that as a virtual base class of > something else, the register_explicit_only registration will not apply to the > virtual subclass. I'm thinking that's completely acceptable as a known > limitation if > documented?
Code spike of that idea:
```
from abc import ABCMeta
from weakref import WeakSet


class TotallyOrderable(metaclass=ABCMeta):
    # Classes (and their true subclasses) that are never inferred as orderable.
    _explicit_only_registry = WeakSet()

    @classmethod
    def register_explicit_only(cls, C):
        if cls is not TotallyOrderable:
            raise NotImplementedError(
                f"{cls.__name__} does not implement 'register_explicit_only'")

        cls._explicit_only_registry.add(C)

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not TotallyOrderable:
            return NotImplemented

        # Skip inference for anything registered as explicit-only.
        for B in C.__mro__:
            if B in cls._explicit_only_registry:
                return NotImplemented

        return cls._check_overrides_rich_comparison_methods(C)

    @classmethod
    def _check_overrides_rich_comparison_methods(cls, C):
        mro = C.__mro__
        for method in ('__lt__', '__le__', '__gt__', '__ge__'):
            for B in mro:
                # The defaults inherited from object do not count.
                if B is not object and method in B.__dict__:
                    if B.__dict__[method] is None:
                        return NotImplemented
                    break
            else:
                return NotImplemented
        return True
```
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2OZBPQPYIFFG2E6BS2EYLDJF2QP5FRTG/ Code of Conduct: http://python.org/psf/codeofconduct/
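To see how the spike behaves, the snippet below repeats the class (so that it runs on its own) and checks a few types against it. `Plain` and `Tags` are hypothetical classes added for the demonstration. The opt-out is registered before any `issubclass()` call involving `Tags`, since ABC subclass checks are cached.

```python
from abc import ABCMeta
from weakref import WeakSet


class TotallyOrderable(metaclass=ABCMeta):
    _explicit_only_registry = WeakSet()

    @classmethod
    def register_explicit_only(cls, C):
        if cls is not TotallyOrderable:
            raise NotImplementedError(
                f"{cls.__name__} does not implement 'register_explicit_only'")
        cls._explicit_only_registry.add(C)

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not TotallyOrderable:
            return NotImplemented
        for B in C.__mro__:
            if B in cls._explicit_only_registry:
                return NotImplemented
        return cls._check_overrides_rich_comparison_methods(C)

    @classmethod
    def _check_overrides_rich_comparison_methods(cls, C):
        mro = C.__mro__
        for method in ('__lt__', '__le__', '__gt__', '__ge__'):
            for B in mro:
                if B is not object and method in B.__dict__:
                    if B.__dict__[method] is None:
                        return NotImplemented
                    break
            else:
                return NotImplemented
        return True


class Plain:
    """Hypothetical class with no rich comparison methods of its own."""


class Tags(set):
    """Hypothetical set subclass that opts out of inference."""


# Opt out first; a later registration would not undo a cached result.
TotallyOrderable.register_explicit_only(Tags)

results = {
    "int": issubclass(int, TotallyOrderable),      # inferred from its methods
    "set": issubclass(set, TotallyOrderable),      # also inferred
    "Plain": issubclass(Plain, TotallyOrderable),  # nothing to infer from
    "Tags": issubclass(Tags, TotallyOrderable),    # explicitly opted out
}
```

Note that plain `set` is still inferred to be totally orderable because it defines all four methods; only the explicit opt-out on `Tags` prevents that.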
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Steve Jorgensen wrote: > The problem I came up with trying to spike out my proposal last night is that > there > doesn't seem to be anyway to implement it without creating infinite recursion > in the > issublcass call. If I make Orderable a real or virtual subclass > of ProtoOrderable and Orderable's __subclasshook__ > or metaclass __subclasscheck__ (I tried both ways) tries to check whether > C is a subclass of ProtoOrderable, then an infinite recursion > occurs. > It wasn't immediately obvious to me why that is the case, but when I thought > about it > deeply, I can see why that must happen. > An alternative that I thought about previously but seems very smelly to me > for several > reasons is to have both Orderable and NonOrderable ABCs. In that > case, what should be done to prevent a class from being both orderable and > non-orderable > or figure out which should take precedence in that case? > As a meta-solution (wild-assed idea) what if metaclass registration could > accept > keyword arguments, similar to passing keyword arguments to a class > definition? That way, a > single ABC (ProtoOrderable or whatever better name) could be a real or > virtual subclass that is explicitly orderable or non-orderable depending on > orderable=.
I have been unable to implement the class hierarchy that I proposed, and I think I've determined that it's just not a practical fit with how the virtual base class mechanism works, so… Maybe just a single `TotalOrdered` or `TotalOrderable` ABC with a `register_explicit_only` method. The `__subclasshook__` method would skip the rich comparison methods check and return `NotImplemented` for any class registered using `register_explicit_only` (or any of its true subclasses). The only weird edge case in the above is that if someone registers another ABC using `TotalOrdered.register_explicit_only` and uses that as a virtual base class of something else, the `register_explicit_only` registration will not apply to the virtual subclass. 
I'm thinking that's completely acceptable as a known limitation if documented? Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AF4O3PFQ7VNHCUVBWB3NENYNGPU74SVX/
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Andrew Barnert wrote: > On Mar 4, 2020, at 00:07, Steve Jorgensen ste...@stevej.name wrote: > > Taking one step back out of the realm of mathematical > > definition, however, the original idea was simply to distinguish what I now > > understand to > > be "totally ordered" types from other types, be they "partially ordered" or > > unordered — > > not even having a full complement of rich comparison operators or having > > all but using > > them in weirder ways than sets do. > > Is there any commonly used or even imaginable useful type that uses them in > weirder ways than set and float (which are both partially ordered) or > np.array (where they > aren’t even Boolean-values)? In particular, transitivity keeps coming up, but > all of those > examples are transitive (it’s never true that a being > true than a to > distinguish them, but if there aren’t, it doesn’t seem unreasonable for > PartiallyOrdered > to “wrongly” pick up hypothetical pathological types that no one will ever > write in > exchange for automatically being right about every actual type anyone uses. > After all, > Iterable is a virtual superclass of any type with __iter__, even if it > returns the number > 42 instead of an Iterator, and so on; technically every implicit ABC in > Python is “wrong” > like this, but in practice it doesn’t come up and implicit ABCs are very > useful.
I see what you're saying. I guess what I was getting at is that for purposes of determining whether something is totally orderable or not, it doesn't matter what kind of not-totally-orderable the thing is — partially orderable (like sets), non-orderable (without a full complement of operators), or some other weird thing that has the full complement of operators. 
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/WDB6UPXAMCJUMWNZBEJ2466JCBGU5PIH/
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Stéfane Fermigier wrote: > On Wed, Mar 4, 2020 at 8:24 AM Steve Jorgensen ste...@stevej.name wrote: > > Chris Angelico wrote: > > On Wed, Mar 4, 2020 at 6:04 PM Steve Jorgensen > > ste...@stevej.name wrote: > > > > https://en.wikipedia.org/wiki/Partially_ordered_set > > "Partially ordered" means you can compare pairs of elements and find > > which one comes first. "Totally ordered" means you can compare ANY > > pair of elements, and you'll always know which comes first. > > ChrisA > > Ah. Good to know. I don't think "Partially ordered" actually applies, > > then, because that still seems to imply that transitivity would apply to > > comparisons between any given pair of objects. Simply having > > implementations of all the rich comparison operators does not make that > > true, however, and in particular, that's not true for sets. > > Not quite: https://en.wikipedia.org/wiki/Partially_ordered_set#Examples > (see > example 2). > Or: > https://math.stackexchange.com/questions/1305004/what-is-meant-by-ordering-o... > S. Ah! That Wikipedia article is very helpful. I see that it is not necessary for all items in a partially ordered set to be comparable. Taking one step back out of the realm of mathematical definition, however, the original idea was simply to distinguish what I now understand to be "totally ordered" types from other types, be they "partially ordered" or unordered — not even having a full complement of rich comparison operators or having all but using them in weirder ways than sets do. ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/S6VZ4DWZBL3NLBFZKJYPN5EE5OMRAF3V/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Steve Jorgensen wrote: > Chris Angelico wrote: > > On Wed, Mar 4, 2020 at 6:04 PM Steve Jorgensen > > ste...@stevej.name wrote: > > > > https://en.wikipedia.org/wiki/Partially_ordered_set > > "Partially ordered" means you can compare pairs of elements and find > > which one comes first. "Totally ordered" means you can compare ANY > > pair of elements, and you'll always know which comes first. > > ChrisA > > Ah. Good to know. I don't think "Partially ordered" actually applies, then, > because that still seems to imply that transitivity would apply to > comparisons between any > given pair of objects. Simply having implementations of all the rich > comparison operators > does not make that true, however, and in particular, that's not true for sets. > If we consider just the sets {1, 2} and {1, 3}, … > In [1]: {1, 2} < {1, 3} > Out[1]: False > > In [2]: {1, 2} >= {1, 3} > Out[2]: False > > Neither is a subset of the other, so both of those tests return > False.
Ah. Maybe I'm arguing against a different point than what you were making then. Just because sets are not partially ordered does not mean that "partially ordered" is not a useful distinction in addition to "totally ordered". In that case, maybe the hierarchy would be something like…
* ProtoOrdered (or ProtoOrderable): Orderability is explicit and never inferred. Unordered unless also a subclass of PartiallyOrdered or TotallyOrdered.
  * PartiallyOrdered
    * TotallyOrdered

A class that does not directly or virtually subclass any of those but implements all the rich comparison operators would be treated as an inferred virtual subclass of `TotallyOrdered`.
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7FV27SQYFR6M66JHHYMFW7EDKHXNJ3MJ/ Code of Conduct: http://python.org/psf/codeofconduct/
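The explicit part of that hierarchy can be sketched with ordinary ABCs and `register()`. This is only a rough illustration: the inference step for unregistered classes is deliberately left out, and `Money`, `Interval`, `Color`, and `declared_orderability` are hypothetical names invented for the example.

```python
from abc import ABC


class ProtoOrdered(ABC):
    """Orderability is declared explicitly; this level alone means unordered."""


class PartiallyOrdered(ProtoOrdered):
    """Comparisons are meaningful, but some pairs may be incomparable."""


class TotallyOrdered(PartiallyOrdered):
    """Any two instances can be compared."""


class Money: ...
class Interval: ...
class Color: ...


# Explicit declarations via virtual subclass registration.
TotallyOrdered.register(Money)
PartiallyOrdered.register(Interval)
ProtoOrdered.register(Color)  # has an opinion: explicitly unordered


def declared_orderability(cls):
    """Report the most specific level a class has been declared at."""
    if issubclass(cls, TotallyOrdered):
        return "total"
    if issubclass(cls, PartiallyOrdered):
        return "partial"
    if issubclass(cls, ProtoOrdered):
        return "unordered"
    return "undeclared"
```

Because `TotallyOrdered` is a real subclass of `PartiallyOrdered`, a class registered as totally ordered is automatically accepted wherever a partially ordered one is expected.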
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Chris Angelico wrote: > On Wed, Mar 4, 2020 at 6:04 PM Steve Jorgensen ste...@stevej.name wrote: > https://en.wikipedia.org/wiki/Partially_ordered_set > "Partially ordered" means you can compare pairs of elements and find > which one comes first. "Totally ordered" means you can compare ANY > pair of elements, and you'll always know which comes first. > ChrisA
Ah. Good to know. I don't think "Partially ordered" actually applies, then, because that still seems to imply that transitivity would apply to comparisons between any given pair of objects. Simply having implementations of all the rich comparison operators does not make that true, however, and in particular, that's not true for sets. If we consider just the sets `{1, 2}` and `{1, 3}`, …
```
In [1]: {1, 2} < {1, 3}
Out[1]: False

In [2]: {1, 2} >= {1, 3}
Out[2]: False
```
Neither is a subset of the other, so both of those tests return `False`.
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3IOKPBV6DIQCJ5FNLVSMP3M7HHJ2STO2/ Code of Conduct: http://python.org/psf/codeofconduct/
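The same two sets can be used to separate the two properties being discussed. The sketch below, using only built-in set comparisons, checks that the pair is incomparable in every direction while transitivity still holds along a chain of subsets, which matches the partial-order definition cited elsewhere in the thread.

```python
a, b = {1, 2}, {1, 3}

# Incomparable pair: none of the four ordering operators holds.
incomparable = not (a < b or a <= b or a > b or a >= b)

# Transitivity still holds wherever the comparisons do succeed.
x, y, z = {1}, {1, 2}, {1, 2, 3}
transitive = (x < y) and (y < z) and (x < z)

# Practical consequence: `not (a < b)` does not imply `a >= b`.
negation_is_not_complement = (not (a < b)) and not (a >= b)
```

The last line is why deriving one comparison by negating another is only safe for totally ordered types.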
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Greg Ewing wrote: > On 4/03/20 7:42 am, Steve Jorgensen wrote: > > That's a much better term. Orderable and > > ProtoOrderable. > > I would suggest "TotallyOrdered" and "PartiallyOrdered".
Possibly, but the reasoning is not obvious to me. Can you explain? I get that `TotallyOrdered` is consistent with https://docs.python.org/2/library/functools.html#functools.total_ordering, but I don't get the `PartiallyOrdered` term. In case I was not sufficiently clear about my proposal (just making sure) the `Proto`… prefix in my concept simply means that the determination of whether the class is orderable is explicit and not determined by whether the rich comparison methods are present. A class that has `ProtoOrderable` but not `Orderable` as an actual or virtual base class is not orderable, but a class that is not a subclass of either is assumed to be orderable if it implements all the rich comparison methods.
___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LU3UFEXBQZJS2TUZQFCPVFH7Q37I62E7/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Guido van Rossum wrote:
> On Tue, Mar 3, 2020 at 10:43 AM Steve Jorgensen ste...@stevej.name wrote:
> > Guido van Rossum wrote:
> > > I think it’s usually called Orderable. It’s a useful concept in static type checking too (e.g. mypy), where we’d use it as an upper bound for type variables, if we had it. I guess to exclude sets you’d have to introduce TotalOrderable.
> > Right. That's a much better term. Orderable and ProtoOrderable.
> Or even PartialOrderable and Orderable. This would follow Rust's PartialOrd and Ord (https://doc.rust-lang.org/std/cmp/trait.PartialOrd.html and https://doc.rust-lang.org/std/cmp/trait.Ord.html).
> But beware, IIRC there are pathological cases involving floats, (long) ints and rounding where transitivity may be violated in Python (though I believe only Tim Peters can produce an example :-). I'm honestly not sure that that's enough to sink the idea. (If it were, NaN would be a bigger problem.)

Yeah. Violations of transitivity are already breaking their contracts, so having a new way of expressing the contract has no effect on that.

The problem I came up with while trying to spike out my proposal last night is that there doesn't seem to be any way to implement it without creating infinite recursion in the `issubclass` call. If I make `Orderable` a real or virtual subclass of `ProtoOrderable` and `Orderable`'s `__subclasshook__` or metaclass `__subclasscheck__` (I tried both ways) tries to check whether `C` is a subclass of `ProtoOrderable`, then an infinite recursion occurs. It wasn't immediately obvious to me why that is the case, but having thought about it more deeply, I can see why that must happen.

An alternative that I thought about previously, but which seems very smelly to me for several reasons, is to have both `Orderable` and `NonOrderable` ABCs. In that case, what should be done to prevent a class from being both orderable and non-orderable, or to figure out which should take precedence?

As a meta-solution (wild-assed idea), what if metaclass registration could accept keyword arguments, similar to passing keyword arguments to a class definition? That way, a class could be a real or virtual subclass of a single ABC (`ProtoOrderable` or whatever better name) and be explicitly orderable or non-orderable depending on `orderable=`.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BFFPWPZGNPJCT3KFFU6DJHI5RBG2NBYC/
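As far as I understand `ABCMeta`, the recursion comes from `issubclass(C, ProtoOrderable)` also consulting the real and registered subclasses of `ProtoOrderable`, which includes `Orderable` and so re-enters the same hook. One way to sidestep that, sketched below, is to keep a single ABC and record explicit opt-outs in a plain set instead of a second ABC. This is only an illustration of one possible approach, not what was proposed in the thread; the names `declare_not_orderable`, `_opted_out` and `_defines` are invented here.

```python
from abc import ABC

_RICH_COMPARISONS = ("__lt__", "__le__", "__gt__", "__ge__")


def _defines(klass, name):
    """True if a class in klass's MRO other than object defines name."""
    return any(name in vars(base) for base in klass.__mro__ if base is not object)


class Orderable(ABC):
    """Hypothetical ABC for classes whose comparisons express an ordering."""

    _opted_out = set()  # classes explicitly declared as not orderable

    @classmethod
    def declare_not_orderable(cls, klass):
        """Record an explicit opt-out (stands in for the ProtoOrderable idea)."""
        cls._opted_out.add(klass)
        return klass

    @classmethod
    def __subclasshook__(cls, C):
        if cls is not Orderable:
            return NotImplemented
        # Walk the MRO directly; no issubclass() call here, so the hook
        # cannot re-enter itself.
        if any(base in cls._opted_out for base in C.__mro__):
            return False
        if all(_defines(C, name) for name in _RICH_COMPARISONS):
            return True
        return NotImplemented


# Opt out before any issubclass() check, because ABCMeta caches results.
Orderable.declare_not_orderable(set)

print(issubclass(int, Orderable))
print(issubclass(set, Orderable))
```

A limitation worth noting: since `ABCMeta` caches subclass checks, an opt-out recorded after a class has already been reported as orderable may not take effect, so a real implementation would need to deal with cache invalidation.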
[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation
Guido van Rossum wrote: > I think it’s usually called Orderable. It’s a useful concept in static type > checking too (e.g. mypy), where we’d use it as an upper bound for type > variables, if we had it. I guess to exclude sets you’d have to introduce > TotalOrderable. > On Tue, Mar 3, 2020 at 04:03 Steve Jorgensen ste...@stevej.name wrote: > > I have encountered cases in which I would like to > > validate that an > > argument can be properly compared with other instances of its type. This is > > true of numbers, strings, dates, … but not for NoneClass, type, > > …. > > One way that I have tried to handle this is to check whether the object > > can be compared to itself using >, <, >=, > > and <= and that it is > > neither > or < itself and is both >= and > > <= itself. The most > > glaring example of why this is insufficient is the set type. A > > set > > object meets all of those criteria, but given any 2 instances, it is not > > true that if set a > b is False then a <= b > > is True. The operators > > are not acting as comparisons of relative magnitude in this case but as > > tests for superset/subset relations — which is fine and good but doesn't > > help with this situation. > > What I think would be helpful is to have a Magnitude abstract base class > > that is a subclass of ProtoMagnitude (or whatever better names anyone can > > imagine). > > The subclass hook for Magnitude would return True for any > > class with > > instance methods for all of the rich comparison methods, but it would skip > > that check and return False for any real or virtual subclass of > > ProtoMagnitude (overridable by registering as a Magnitude > > subclass). > > The, set type would then be registered as a virtual base class of > > ProtoMagnitude but not Magnitude so that issubclass(set, > > Magnitude) > > would return False. 
> > For performance optimization, the module that defines these ABCs would > > register the obviously appropriate built-in and standard-lib types with > > Magnitude: Number, str, list, > > tuple, date, … > > Why not have this be a separate distribution package? This concept is only > > reliable if all of the code that makes use of it shares a common > > implementation. It does no good to register a class as ProtoMagnitude, > > for instance, if an instance of that will passed to code in another library > > that is unaware of the ProtoMagnitude and Magnitude ABCs in the > > package, or maybe has its own independent system for attempting to > > accomplish the same goal. > > -- > > --Guido (mobile)
Right. That's a much better term. `Orderable` and `ProtoOrderable`.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/DIZPDIRF3254ZZZMCWSPEUOBLKC2MQZZ/
[Python-ideas] Magnitude and ProtoMagnitude ABCs — primarily for argument validation
I have encountered cases in which I would like to validate that an argument can be properly compared with other instances of its type. This is true of numbers, strings, dates, … but not of `NoneType`, `type`, ….

One way that I have tried to handle this is to check whether the object can be compared to itself using `>`, `<`, `>=`, and `<=`, and that it is neither `>` nor `<` itself and is both `>=` and `<=` itself. The most glaring example of why this is insufficient is the `set` type. A `set` object meets all of those criteria, but for two arbitrary instances `a` and `b`, it is not guaranteed that `a <= b` is `True` whenever `a > b` is `False`. The operators are not acting as comparisons of relative magnitude in this case but as tests for superset/subset relations, which is fine and good but doesn't help with this situation.

What I think would be helpful is to have a `Magnitude` abstract base class that is a subclass of `ProtoMagnitude` (or whatever better names anyone can imagine). The subclass hook for `Magnitude` would return `True` for any class with instance methods for all of the rich comparison methods, but it would skip that check and return `False` for any real or virtual subclass of `ProtoMagnitude` (overridable by registering as a `Magnitude` subclass). The `set` type would then be registered as a virtual subclass of `ProtoMagnitude` but not `Magnitude` so that `issubclass(set, Magnitude)` would return `False`.

For performance optimization, the module that defines these ABCs would register the obviously appropriate built-in and standard-lib types with `Magnitude`: `Number`, `str`, `list`, `tuple`, `date`, …

Why not have this be a separate distribution package? This concept is only reliable if all of the code that makes use of it shares a common implementation. It does no good to register a class as `ProtoMagnitude`, for instance, if an instance of that class will be passed to code in another library that is unaware of the `ProtoMagnitude` and `Magnitude` ABCs in the package, or maybe has its own independent system for attempting to accomplish the same goal.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7WC4SF2GYVLP56K6Q74OKFPJGHGWAPIP/
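The self-comparison check described above could look roughly like the following. This is a sketch of the heuristic as I read it, with an invented function name, mainly to show that a `set` passes it even though two sets may be incomparable.

```python
def passes_self_comparison_check(obj):
    """Sketch of the heuristic: compare obj with itself using all four operators."""
    try:
        return bool(
            not (obj > obj)
            and not (obj < obj)
            and obj >= obj
            and obj <= obj
        )
    except TypeError:
        # Raised when the type does not support ordering comparisons at all.
        return False


a = {1, 2}
b = {1, 3}

print(passes_self_comparison_check(3))       # numbers pass
print(passes_self_comparison_check(None))    # None does not support ordering
print(passes_self_comparison_check(a))       # a set passes the heuristic too
print(a > b, a <= b)                         # yet both of these are False
```

So the heuristic accepts `set`, while `a > b` and `a <= b` are both `False` for these two sets, which is the gap the proposed ABCs are meant to close.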
[Python-ideas] Re: Means of comparing slices for intersection or containment and computing intersections or unions
Andrew Barnert wrote:
> On Feb 29, 2020, at 10:03, Steve Jorgensen ste...@stevej.name wrote:
> > In that case, I still do think that this kind of functionality is of enough general use to have something for it in the Python standard library, though it should probably be through the introduction of a new type (possibly named something like "bounds") since neither range nor slice is really a good fit. I'm thinking it should/would be much more limited in scope than intervaltree (which does look really nice).
> There are a ton of different libraries on PyPI for interval/discrete range/range values and sets and algebra and/or arithmetic on them, not to mention related things like saturating values within bounds. They all provide different functionality with different interfaces. Why do we need to pick one (or redesign and reimplement one without even looking for it) in particular?

To me, it just feels like a missing core feature. What I'm talking about is something far simpler and less ambitious than what I would expect to see in an external addon, but something that would likely be useful to any/all such things. I have decided, though, that it makes more sense for me to publish something like what I'm looking for in a library of tools, and then use that as the basis for a new post once I have that ready.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5JQXHETABMXTKPGDLDCAWQS3W3C7LBK4/
[Python-ideas] Re: Means of comparing slices for intersection or containment and computing intersections or unions
Steve Jorgensen wrote:
> Christopher Barker wrote:
> > On Sat, Feb 29, 2020 at 4:37 AM Alex Hall alex.moj...@gmail.com wrote:
> > > It seems like most of this would be very easy to implement yourself with the exact semantics that you prefer and find most intuitive, while other people might have different expectations.
> > I have to agree here. You are proposing that a slice object be treated as a general purpose interval, but that is not, in fact what they are. This is made clear by: " Presumably, these operations would raise exceptions when used with slices that have step values other than None."
> > and also: "whereas a slice represents a possibly continuous range of any kind of value to which magnitude is applicable." well, sort of. Given the implementation and duck typing, I suppose that's true. But in fact, slices were designed for, and are (at least mostly) used to, well, slice sequences, which are always integer indexes, and have semantics specific to that use case:
> OK. That does make sense to me.

In that case, I still do think that this kind of functionality is of enough general use to have something for it in the Python standard library, though it should probably be through the introduction of a new type (possibly named something like "bounds") since neither range nor slice is really a good fit. I'm thinking it should/would be much more limited in scope than intervaltree (which does look really nice).
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CKEPZYKUIZWGNA27NFXHMAMNPSXZZJIT/
[Python-ideas] Re: Means of comparing slices for intersection or containment and computing intersections or unions
Christopher Barker wrote:
> On Sat, Feb 29, 2020 at 4:37 AM Alex Hall alex.moj...@gmail.com wrote:
> > It seems like most of this would be very easy to implement yourself with the exact semantics that you prefer and find most intuitive, while other people might have different expectations.
> I have to agree here. You are proposing that a slice object be treated as a general purpose interval, but that is not, in fact what they are. This is made clear by: " Presumably, these operations would raise exceptions when used with slices that have step values other than None."
> and also: "whereas a slice represents a possibly continuous range of any kind of value to which magnitude is applicable." well, sort of. Given the implementation and duck typing, I suppose that's true. But in fact, slices were designed for, and are (at least mostly) used to, well, slice sequences, which are always integer indexes, and have semantics specific to that use case:

OK. That does make sense to me.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RDGKOLMVBJECFT2YO6UBD2KVVQ25WJTL/
[Python-ideas] Re: Means of comparing slices for intersection or containment and computing intersections or unions
Steve Jorgensen wrote: > I am purposefully proposing this for slices as opposed to ranges because it > is about > the bounds of the slices, not the items in synthetic sequences. Also, slices > can refer to > any type of value, not just integers. > Presumably, these operations would raise exceptions when used with slices > that have > step values other than None. Alternatively, those could > hypothetically be valid in restricted cases such as when all properties are > either int or > Fraction types. Probably better to have them be simply unsupported though. > a in b # True if is fully contained within > a.intersects(b) # True if any value could be within both and > a & b# Intersection of and or None if no intersection > a | b# Union of and or Exception if neither contiguous nor > overlapping. > > Also, it might be nice to be able to test whether a non-slice value falls > within the > slice's bounds. This would be using x in s as a shorthand for s.start > <= x < s.end. Again, this is different than asking whether a value is "in" a > range because a rage is a sequence of discrete integers whereas a slice > represents a > possibly continuous range of any kind of value to which magnitude is > applicable. > slice(1, 2) in (0, 3) > # => True because 1 >= 0 and 2 <= 3 > > slice(0.5, 1.5) in slice(0, 2) > # => True because 0.5 >= 1.5 and 0.5 < 2 > > 1 in (0, 3) > # => True because 0 <= 1 < 3 > > 'Joe' in slice('Alice', 'Riley') > # => True because 'Alice' <= 'Joe' < 'Riley' > > slice(1.1, 5.9).intersects(slice(2, 10.5)) > # => True because either... > # 1.1 <= 2< 5.9 or > # 1.1 < 10.5 < 5.5 or > # 2 <= 1.1 < 10.5 or > # 2 < 5.9 < 10.5 > > slice(5.5, 15.5) & slice(10.25, 20.25) > # => slice(10.25, 15.5) > > slice(5.5, 15.5) | slice(10.25, 20.25) > # => slice(5.5, 20.25) > > slice('abc', 'fff') & slice('eee', 'xyz') > # => slice('fff', 'eee') I notice I made a couple of subtle mistakes in the above. All the more reason to implement these concepts correctly as standard. 
:)
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/O6TLJRFOW2Q4A6PWY4O6IKY3P5ZPA5PQ/
[Python-ideas] Means of comparing slices for intersection or containment and computing intersections or unions
I am purposefully proposing this for slices as opposed to ranges because it is about the bounds of the slices, not the items in synthetic sequences. Also, slices can refer to any type of value, not just integers.

Presumably, these operations would raise exceptions when used with slices that have `step` values other than `None`. Alternatively, those could hypothetically be valid in restricted cases such as when all properties are either `int` or `Fraction` types. Probably better to have them be simply unsupported though.

```
a in b           # True if a is fully contained within b
a.intersects(b)  # True if any value could be within both a and b
a & b            # Intersection of a and b, or None if no intersection
a | b            # Union of a and b, or Exception if neither contiguous nor overlapping
```

Also, it might be nice to be able to test whether a non-slice value falls within the slice's bounds. This would be using `x in s` as a shorthand for `s.start <= x < s.stop`. Again, this is different from asking whether a value is "in" a range, because a range is a sequence of discrete integers whereas a slice represents a possibly continuous range of any kind of value to which magnitude is applicable.

```
slice(1, 2) in slice(0, 3)
# => True because 1 >= 0 and 2 <= 3

slice(0.5, 1.5) in slice(0, 2)
# => True because 0.5 >= 0 and 1.5 <= 2

1 in slice(0, 3)
# => True because 0 <= 1 < 3

'Joe' in slice('Alice', 'Riley')
# => True because 'Alice' <= 'Joe' < 'Riley'

slice(1.1, 5.9).intersects(slice(2, 10.5))
# => True because at least one of these holds:
#   1.1 <= 2    < 5.9  or
#   1.1 <  10.5 < 5.9  or
#   2   <= 1.1  < 10.5 or
#   2   <  5.9  < 10.5

slice(5.5, 15.5) & slice(10.25, 20.25)
# => slice(10.25, 15.5)

slice(5.5, 15.5) | slice(10.25, 20.25)
# => slice(5.5, 20.25)

slice('abc', 'fff') & slice('eee', 'xyz')
# => slice('eee', 'fff')
```
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/EGAMXMDCKBA4I5BSQL4KCI2DE3NM7L7F/
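Since `slice` itself supports none of these operations, the proposed semantics can be tried out with a small stand-alone class. The following is only a sketch under my own assumptions: the class name `Bounds` and its half-open `[start, stop)` interpretation are illustrative choices, there is no `step`, and empty or reversed bounds are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical half-open interval [start, stop) over any orderable values."""

    start: object
    stop: object

    def __contains__(self, item):
        if isinstance(item, Bounds):
            # Another Bounds is "in" this one if it is fully contained.
            return self.start <= item.start and item.stop <= self.stop
        return self.start <= item < self.stop

    def intersects(self, other):
        # True if some value could fall within both bounds.
        return self.start < other.stop and other.start < self.stop

    def __and__(self, other):
        if not self.intersects(other):
            return None
        return Bounds(max(self.start, other.start), min(self.stop, other.stop))

    def __or__(self, other):
        # Overlapping or touching (contiguous) bounds can be merged.
        if self.start > other.stop or other.start > self.stop:
            raise ValueError("bounds are neither contiguous nor overlapping")
        return Bounds(min(self.start, other.start), max(self.stop, other.stop))


print(Bounds(1, 2) in Bounds(0, 3))
print("Joe" in Bounds("Alice", "Riley"))
print(Bounds(5.5, 15.5) & Bounds(10.25, 20.25))
print(Bounds("abc", "fff") & Bounds("eee", "xyz"))
```

The `intersects` test is written as a single two-sided condition, which for non-empty half-open bounds should agree with the four-way "either ... or" check listed in the examples.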
[Python-ideas] Re: Incremental step on road to improving situation around iterable strings
Steven D'Aprano wrote:
> On Sun, Feb 23, 2020 at 11:25:12PM +0200, Alex Hall wrote:
> > "Strings are not iterable - you cannot loop over them or treat them as a collection.
> Are you implying that we should deprecate the in operator for strings too?

I would not get rid of the `in` behavior, but the `in` behavior of a string is actually not like that of the `in` operator for a typical collection. Seen as simply a collection of single-character strings, "b" would be in "abcd", but "bc" would not. The `in` operator for strings checks whether the left operand is a substring, as opposed to an item. By contrast, `(2, 3) in (1, 2, 3, 4)` is `False`.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4H2IEA6MNOBH2JKENGLOYIE33O7BT4ST/
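The difference can be seen directly in a few lines. This is just an illustration of current behavior, with variable names picked for the example.

```python
text = "abcd"
items = (1, 2, 3, 4)

# For str, `in` is a substring test.
substring_found = "bc" in text

# Treating the string as a collection of single characters gives a different answer.
item_found = "bc" in list(text)

# For tuples, `in` only looks for an element equal to the left operand.
subsequence_found = (2, 3) in items

print(substring_found, item_found, subsequence_found)
```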