Re: [Python-Dev] Dataclasses and correct hashability

2018-02-07 Thread Nick Coghlan
On 7 February 2018 at 12:48, Guido van Rossum  wrote:
> That seems a rare case (though I hadn't thought of it). I had thought of the
> use case where you want a frozen type without a hash; that you can
> presumably implement using
>
> def __hash__(self): raise TypeError("not hashable")

Now that attributes in the class dict pre-empt the generated versions,
"__hash__ = None" in the class body will also turn off hash generation
without enabling hashing.

> We can do a similar thing to preserve the superclass __hash__ if it's rare
> enough:
>
> def __hash__(self): return super().__hash__()

Similarly for this variant, you can do "__hash__ =
BaseClassWithDesiredHashAlgorithm.__hash__". (That may be more
appropriate than dynamic lookup in the MRO if you're extending a
frozen class with a subclass that adds more fields, but you want to
keep using the base class hash definition for some reason)

> If at all possible I'd like to kill the tri-state hash= flag -- the amount
> of time spent creating and discussing the huge table in the bpo issue are an
> indication of how much effort it would take other people to understand it.

+1 - the nice thing about "unsafe_hash=True" is that you *only* need
it if adding the hash would be unsafe, and you haven't set __hash__
explicitly in the class body.

While "frozen=True, unsafe_hash=True" is redundant (since the hash is
safe when instances are frozen), it isn't a surprising defect waiting
to happen the way "frozen=False, hash=True" is in the current
interface.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
I'm not a fan, sorry.

On Tue, Feb 6, 2018 at 7:33 PM, Ethan Furman  wrote:

> On 02/06/2018 06:48 PM, Guido van Rossum wrote:
>
> That seems a rare case (though I hadn't thought of it). I had thought of
>> the use case where you want a frozen type
>> without a hash; that you can presumably implement using
>>
>> def __hash__(self): raise TypeError("not hashable")
>>
>> We can do a similar thing to preserve the superclass __hash__ if it's
>> rare enough:
>>
>> def __hash__(self): return super().__hash__()
>>
>> If at all possible I'd like to kill the tri-state hash= flag -- the
>> amount of time spent creating and discussing the
>> huge table in the bpo issue are an indication of how much effort it would
>> take other people to understand it.
>>
>
> I think the biggest reason this has become so complicated is because we
> are refusing to use an Enum:
>
> class Hashable(Enum):
> IF_SAFE = 1
> ADD = 2
> DEFER = 3
> NONE = 4
>
> IF_SAFE is currently the False value.
> ADD is currently the True value
> DEFER means don't add one
> NONE means set __hash__ to None
>
> The only thing missing now is a setting to indicate that dataclass should
> do nothing if the class already has a __hash__ method -- possibly DEFER,
> although I think IF_SAFE can include "the class already has one, it's not
> safe to override it".
>
> If either of those use cases becomes annoyingly common we'll have to think
>> of something else.
>>
>
> Or we could solve it now and not have to deal with backwards-compatibility
> issues in the future.
>
> --
> ~Ethan~
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%
> 40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Ethan Furman

On 02/06/2018 06:48 PM, Guido van Rossum wrote:


That seems a rare case (though I hadn't thought of it). I had thought of the 
use case where you want a frozen type
without a hash; that you can presumably implement using

def __hash__(self): raise TypeError("not hashable")

We can do a similar thing to preserve the superclass __hash__ if it's rare 
enough:

def __hash__(self): return super().__hash__()

If at all possible I'd like to kill the tri-state hash= flag -- the amount of 
time spent creating and discussing the
huge table in the bpo issue are an indication of how much effort it would take 
other people to understand it.


I think the biggest reason this has become so complicated is because we are 
refusing to use an Enum:

class Hashable(Enum):
IF_SAFE = 1
ADD = 2
DEFER = 3
NONE = 4

IF_SAFE is currently the False value.
ADD is currently the True value
DEFER means don't add one
NONE means set __hash__ to None

The only thing missing now is a setting to indicate that dataclass should do nothing if the class already has a __hash__ 
method -- possibly DEFER, although I think IF_SAFE can include "the class already has one, it's not safe to override it".



If either of those use cases becomes annoyingly common we'll have to think of 
something else.


Or we could solve it now and not have to deal with backwards-compatibility 
issues in the future.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
That seems a rare case (though I hadn't thought of it). I had thought of
the use case where you want a frozen type without a hash; that you can
presumably implement using

def __hash__(self): raise TypeError("not hashable")

We can do a similar thing to preserve the superclass __hash__ if it's rare
enough:

def __hash__(self): return super().__hash__()

If at all possible I'd like to kill the tri-state hash= flag -- the amount
of time spent creating and discussing the huge table in the bpo issue are
an indication of how much effort it would take other people to understand
it.

If either of those use cases becomes annoyingly common we'll have to think
of something else.

On Tue, Feb 6, 2018 at 5:38 PM, Eric V. Smith  wrote:

> Sorry for the late reply. Still recovering from a computer failure.
>
> My only concern with this approach is: what if you don’t want any __hash__
> added? Say you want to use your base class’s hashing. I guess you could
> always “del cls.__hash__” after the class is created, but it’s not elegant.
>
> That’s what we got from the tri-state option: never add (False), always
> add (True), or add if it’s safe (None).
>
> --
> Eric
>
> On Feb 5, 2018, at 12:49 AM, Guido van Rossum  wrote:
>
> Looks like this is turning into a major flamewar regardless of what I say.
> :-(
>
> I really don't want to lose the ability to add a hash function to a
> mutable dataclass by flipping a flag in the decorator. I'll explain below.
> But I am fine if this flag has a name that clearly signals it's an unsafe
> thing to do.
>
> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
> a simple bool (not a tri-state flag like the current hash=None|False|True).
> The default would be False, and the behavior then would be to add a hash
> function automatically only if it's safe (using the same rules as for
> hash=None currently). With unsafe_hash=True, a hash function would always
> be generated that takes all fields into account except those declared using
> field(hash=False). If there's already a `def __hash__` in the function I
> don't care what it does, maybe it should raise rather than quietly doing
> nothing or quietly overwriting it.
>
> Here's my use case.
>
> A frozen class requires a lot of discipline, since you have to compute the
> values of all fields before calling the constructor. A mutable class allows
> other initialization patterns, e.g. manually setting some fields after the
> instance has been constructed, or having a separate non-dunder init()
> method. There may be good reasons for using these patterns, e.g. the object
> may be part of a cycle (e.g. parent/child links in a tree). Or you may just
> use one of these patterns because you're a pretty casual coder. Or you're
> modeling something external.
>
> My point is that once you have one of those patterns in place, changing
> your code to avoid them may be difficult. And yet your code may treat the
> objects as essentially immutable after the initialization phase (e.g. a
> parse tree). So if you create a dataclass and start coding like that for a
> while, and much later you need to put one of these into a set or use it as
> a dict key, switching to frozen=True may not be a quick option. And writing
> a __hash__ method by hand may feel like a lot of busywork. So this is where
> [unsafe_]hash=True would come in handy.
>
> I think naming the flag unsafe_hash should take away most objections,
> since it will be clear that this is not a safe thing to do. People who
> don't understand the danger are likely to copy a worse solution from
> StackOverflow anyway. The docs can point to frozen=True and explain the
> danger.
>
> --
> --Guido van Rossum (python.org/~guido)
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> eric%2Ba-python-dev%40trueblade.com
>
>


-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Eric V. Smith
Sorry for the late reply. Still recovering from a computer failure. 

My only concern with this approach is: what if you don’t want any __hash__ 
added? Say you want to use your base class’s hashing. I guess you could always 
“del cls.__hash__” after the class is created, but it’s not elegant. 

That’s what we got from the tri-state option: never add (False), always add 
(True), or add if it’s safe (None).

--
Eric

> On Feb 5, 2018, at 12:49 AM, Guido van Rossum  wrote:
> 
> Looks like this is turning into a major flamewar regardless of what I say. :-(
> 
> I really don't want to lose the ability to add a hash function to a mutable 
> dataclass by flipping a flag in the decorator. I'll explain below. But I am 
> fine if this flag has a name that clearly signals it's an unsafe thing to do.
> 
> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the 
> @dataclass decorator with a simpler flag named unsafe_hash=. This would be a 
> simple bool (not a tri-state flag like the current hash=None|False|True). The 
> default would be False, and the behavior then would be to add a hash function 
> automatically only if it's safe (using the same rules as for hash=None 
> currently). With unsafe_hash=True, a hash function would always be generated 
> that takes all fields into account except those declared using 
> field(hash=False). If there's already a `def __hash__` in the function I 
> don't care what it does, maybe it should raise rather than quietly doing 
> nothing or quietly overwriting it.
> 
> Here's my use case.
> 
> A frozen class requires a lot of discipline, since you have to compute the 
> values of all fields before calling the constructor. A mutable class allows 
> other initialization patterns, e.g. manually setting some fields after the 
> instance has been constructed, or having a separate non-dunder init() method. 
> There may be good reasons for using these patterns, e.g. the object may be 
> part of a cycle (e.g. parent/child links in a tree). Or you may just use one 
> of these patterns because you're a pretty casual coder. Or you're modeling 
> something external.
> 
> My point is that once you have one of those patterns in place, changing your 
> code to avoid them may be difficult. And yet your code may treat the objects 
> as essentially immutable after the initialization phase (e.g. a parse tree). 
> So if you create a dataclass and start coding like that for a while, and much 
> later you need to put one of these into a set or use it as a dict key, 
> switching to frozen=True may not be a quick option. And writing a __hash__ 
> method by hand may feel like a lot of busywork. So this is where 
> [unsafe_]hash=True would come in handy.
> 
> I think naming the flag unsafe_hash should take away most objections, since 
> it will be clear that this is not a safe thing to do. People who don't 
> understand the danger are likely to copy a worse solution from StackOverflow 
> anyway. The docs can point to frozen=True and explain the danger.
> 
> -- 
> --Guido van Rossum (python.org/~guido)
> 
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/eric%2Ba-python-dev%40trueblade.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
On Tue, Feb 6, 2018 at 12:44 PM, Ethan Furman  wrote:

> Although, couldn't we add a field-level frozen attribute (using property
> for the implementation), and check that all equality fields are properties
> as well as hashable?
>

That would be a totally unrelated feature request. Let's wait whether it's
actually needed a lot before designing it.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Ethan Furman

On 02/06/2018 12:24 PM, Guido van Rossum wrote:

On Tue, Feb 6, 2018 at 11:40 AM, Ethan Furman wrote:



It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is,

>> mutable data is involved in the hash calculation), but there still seems
>> to be one possibility for an "unsafe_hash" to actually be safe -- that is,
>> if only immutable fields are used in __eq__, then dataclass could safely
>> generate a hash for us.


Do we have a way to know if the equality fields are hashable?  I suppose

>> we could check each one for a for a non-None __hash__.  Then we could
>> modify that first condition from


- frozen=True

to

- frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for

>>   eq_field in equality_fields)


There seems to be a misunderstanding underlying these questions. Even if

> all fields have an immutable type (e.g. all ints, supporting __eq__ and
> __hash__), if the containing class isn't frozen, they can be assigned to.
> E.g.


@dataclass()
class Point:
  x: int
  y: int

p = Point(1, 1)
p.x = 2  # This is legal

The only way to make that assignment to p.x illegal is to make the *class*

> frozen (using @dataclass(frozen=True)) -- nothing we can do about the *field*
> will change this.

Oh, right.  When I was thinking this I thought a field could be frozen individually, didn't find the option at the field 
level when I checked the PEP, and then promptly forgot and suggested it anyway.


Although, couldn't we add a field-level frozen attribute (using property for the implementation), and check that all 
equality fields are properties as well as hashable?


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
On Tue, Feb 6, 2018 at 11:40 AM, Ethan Furman  wrote:

> It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is,
> mutable data is involved in the hash calculation), but there still seems to
> be one possibility for an "unsafe_hash" to actually be safe -- that is, if
> only immutable fields are used in __eq__, then dataclass could safely
> generate a hash for us.
>
> Do we have a way to know if the equality fields are hashable?  I suppose
> we could check each one for a for a non-None __hash__.  Then we could
> modify that first condition from
>
> - frozen=True
>
> to
>
> - frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for
> eq_field in equality_fields)
>

There seems to be a misunderstanding underlying these questions. Even if
all fields have an immutable type (e.g. all ints, supporting __eq__ and
__hash__), if the containing class isn't frozen, they can be assigned to.
E.g.

@dataclass()
class Point:
x: int
y: int

p = Point(1, 1)
p.x = 2  # This is legal

The only way to make that assignment to p.x illegal is to make the *class*
frozen (using @dataclass(frozen=True)) -- nothing we can do about the
*field* will change this.

Of course if you use @dataclass(frozen=True, unsafe_hash=True) you may
still get a safe hash. :-)

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
That's much less self-descriptive and harder to search Google or
StackOverflow for. It's also easier to overlook. We really want to send the
signal that this is unsafe and requires serious consideration before it is
turned on.

On Tue, Feb 6, 2018 at 11:57 AM, David Mertz  wrote:

> Honestly, the name I would most want for the keyword argument is '_hash'.
> That carries the semantics I desire.
>
> On Feb 6, 2018 10:13 AM, "Ethan Furman"  wrote:
>
>> On 02/06/2018 09:38 AM, Guido van Rossum wrote:
>>
>> Where do you get the impression that one would have to explicitly request
>>> __hash__ if frozen=True is set? To the
>>> contrary, my proposal is for @dataclass to automatically add a __hash__
>>> method when frozen=True is set. This is what the
>>> code currently released as 3.7.0b1 does if hash=None (the default).
>>>
>>
>> Which is my issue with the naming -- although, really, it's more with the
>> parameter/argument:  in a hand-written class,
>>
>>   __hash__ = None
>>
>> means the object in is not hashable, but with the decorator:
>>
>>   @dataclass(..., hash=None, ...)
>>
>> it means something else.
>>
>> My preference for "fixing" the issue:
>>
>> 1) make the default be a custom object (not None), so that `hash=None`
>>means disable hashing
>>
>> 2) change the param name -- maybe to `add_hash` (I agree with D'Aprano
>>that `unsafe_hash` can be misleading)
>>
>> --
>> ~Ethan~
>> ___
>> Python-Dev mailing list
>> Python-Dev@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%40g
>> nosis.cx
>>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>
>


-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Eric V. Smith

On 2/6/18 2:40 PM, Ethan Furman wrote:

On a different note, should the PEP be updated with the current
signature?  It still talks about hash=None being the default.


Once we've reached an agreement, I'll update the PEP. I don't think 
we're there quite yet.


Eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread David Mertz
Honestly, the name I would most want for the keyword argument is '_hash'.
That carries the semantics I desire.

On Feb 6, 2018 10:13 AM, "Ethan Furman"  wrote:

> On 02/06/2018 09:38 AM, Guido van Rossum wrote:
>
> Where do you get the impression that one would have to explicitly request
>> __hash__ if frozen=True is set? To the
>> contrary, my proposal is for @dataclass to automatically add a __hash__
>> method when frozen=True is set. This is what the
>> code currently released as 3.7.0b1 does if hash=None (the default).
>>
>
> Which is my issue with the naming -- although, really, it's more with the
> parameter/argument:  in a hand-written class,
>
>   __hash__ = None
>
> means the object in is not hashable, but with the decorator:
>
>   @dataclass(..., hash=None, ...)
>
> it means something else.
>
> My preference for "fixing" the issue:
>
> 1) make the default be a custom object (not None), so that `hash=None`
>means disable hashing
>
> 2) change the param name -- maybe to `add_hash` (I agree with D'Aprano
>that `unsafe_hash` can be misleading)
>
> --
> ~Ethan~
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%
> 40gnosis.cx
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Ethan Furman

On 02/06/2018 11:18 AM, Guido van Rossum wrote:


We may be in violent agreement.

I propose *not* to add a way to *disable* hashing when the rest of the flags to 
@dataclass() would indicate that it's
safe to add a __hash__ method.


Okay.


I propose that with @dataclass(unsafe_hash=False) (the default), a __hash__ 
method is added when the following
conditions are true:

- frozen=True (not the default)
- compare=True (the default)
- no __hash__ method is defined in the class

If we instead use @dataclass(unsafe_hash=True), a __hash__ will be added 
regardless of the other flags, but if a
__hash__ method is present, an exception is raised.

Other values (e.g. unsafe_hash=None) are illegal for this flag.


Ah!  Excellent, that greatly allays my worries.


Note that the the hash= flag to the field() function is unchanged from what's 
currently in PEP 557 or in the
implementation in 3.7.0b1. In particular, the generated __hash__ method will 
disregard fields declared using
field(hash=False). It will also disregard fields declared using 
field(compare=False, hash=False|None).


It sounds like `unsafe_hash=True` indicates a truly unsafe hash (that is, mutable data is involved in the hash 
calculation), but there still seems to be one possibility for an "unsafe_hash" to actually be safe -- that is, if only 
immutable fields are used in __eq__, then dataclass could safely generate a hash for us.


Do we have a way to know if the equality fields are hashable?  I suppose we could check each one for a for a non-None 
__hash__.  Then we could modify that first condition from


- frozen=True

to

- frozen=True or all(getattr(eq_fld, '__hash__', None) is not None for eq_field 
in equality_fields)

Thoughts?


On a different note, should the PEP be updated with the current signature?  It still talks about hash=None being the 
default.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
We may be in violent agreement.

I propose *not* to add a way to *disable* hashing when the rest of the
flags to @dataclass() would indicate that it's safe to add a __hash__
method.

I propose that with @dataclass(unsafe_hash=False) (the default), a __hash__
method is added when the following conditions are true:

- frozen=True (not the default)
- compare=True (the default)
- no __hash__ method is defined in the class

If we instead use @dataclass(unsafe_hash=True), a __hash__ will be added
regardless of the other flags, but if a __hash__ method is present, an
exception is raised.

Other values (e.g. unsafe_hash=None) are illegal for this flag.

Note that the the hash= flag to the field() function is unchanged from
what's currently in PEP 557 or in the implementation in 3.7.0b1. In
particular, the generated __hash__ method will disregard fields declared
using field(hash=False). It will also disregard fields declared using
field(compare=False, hash=False|None).


On Tue, Feb 6, 2018 at 10:11 AM, Ethan Furman  wrote:

> On 02/06/2018 09:38 AM, Guido van Rossum wrote:
>
> Where do you get the impression that one would have to explicitly request
>> __hash__ if frozen=True is set? To the
>> contrary, my proposal is for @dataclass to automatically add a __hash__
>> method when frozen=True is set. This is what the
>> code currently released as 3.7.0b1 does if hash=None (the default).
>>
>
> Which is my issue with the naming -- although, really, it's more with the
> parameter/argument:  in a hand-written class,
>
>   __hash__ = None
>
> means the object in is not hashable, but with the decorator:
>
>   @dataclass(..., hash=None, ...)
>
> it means something else.
>
> My preference for "fixing" the issue:
>
> 1) make the default be a custom object (not None), so that `hash=None`
>means disable hashing
>
> 2) change the param name -- maybe to `add_hash` (I agree with D'Aprano
>that `unsafe_hash` can be misleading)
>
> --
> ~Ethan~
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%
> 40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Ethan Furman

On 02/06/2018 09:38 AM, Guido van Rossum wrote:


Where do you get the impression that one would have to explicitly request 
__hash__ if frozen=True is set? To the
contrary, my proposal is for @dataclass to automatically add a __hash__ method 
when frozen=True is set. This is what the
code currently released as 3.7.0b1 does if hash=None (the default).


Which is my issue with the naming -- although, really, it's more with the 
parameter/argument:  in a hand-written class,

  __hash__ = None

means the object in is not hashable, but with the decorator:

  @dataclass(..., hash=None, ...)

it means something else.

My preference for "fixing" the issue:

1) make the default be a custom object (not None), so that `hash=None`
   means disable hashing

2) change the param name -- maybe to `add_hash` (I agree with D'Aprano
   that `unsafe_hash` can be misleading)

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Guido van Rossum
Where do you get the impression that one would have to explicitly request
__hash__ if frozen=True is set? To the contrary, my proposal is for
@dataclass to automatically add a __hash__ method when frozen=True is set.
This is what the code currently released as 3.7.0b1 does if hash=None (the
default).

On Tue, Feb 6, 2018 at 9:26 AM, Steven D'Aprano  wrote:

> On Mon, Feb 05, 2018 at 10:50:21AM -0800, David Mertz wrote:
>
> > Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.
>
> (I don't mean to pick on David specifically, I had to reply to some
> message in this thread and I just picked his.)
>
> I'm rather gobsmacked at the attitudes of many people here about hashing
> data classes. I thought *I* was the cynical pessimist who didn't have a
> high opinion of the quality of the average programmer, but according to
> this thread apparently I'm positively Pollyanna-esque for believing that
> most people will realise that if an API offers separate switches for
> hashable and frozen, you need to set *both* if you want both.
>
> Greg Smith even says that writing dunders apart from __init__ is a code
> smell, and warns people not to write dunders. Seriously? I get that
> __hash__ is hard to write correctly, which is why we have a hash=True to
> do the hard work for us, but I can't help feeling that at the point
> we're saying "don't write dunders, any dunder, you'll only do it wrong"
> we have crossed over to the wrong side of the pessimist/optimist line.
>
> But here we are: talking about naming a perfectly reasonable argument
> "unsafe_hash". Why are we trying to frighten people?
>
> There is nothing unsafe about a DataClass with hash=True, frozen=True,
> but this scheme means that even people who know what they're doing will
> write unsafe_hash=True, frozen=True, as if hashability was some sort of
> hand grenade waiting to go off.
>
> Perhaps we ought to deprecate __hash__ and start calling it
> __danger_danger_hash__ too? No, I don't think so.
>
> In the past, we've (rightly!) rejected proposals to call things like
> eval "unsafe_eval", and that really is dangerously unsafe when used
> naively with untrusted, unsanitised data. Hashing mutable objects by
> accident might be annoyingly difficult and frustrating to debug, but
> code injection attacks can lead to identity theft and worse, serious
> consequences for real people.
>
> I'm 100% in favour of programmer education, but I think this label is
> *miseducation*. We're suggesting that hashability is unsafe, regardless
> of whether the object is frozen or not.
>
> I'd far prefer to get a runtime warning:
>
> "Are you sure you want hash=True without frozen=True?"
>
> (or words to that extent) rather than burden all uses of the hash
> parameter, good and bad, with the unsafe label.
>
>
> --
> Steve
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Steven D'Aprano
On Mon, Feb 05, 2018 at 10:50:21AM -0800, David Mertz wrote:

> Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.

(I don't mean to pick on David specifically, I had to reply to some 
message in this thread and I just picked his.)

I'm rather gobsmacked at the attitudes of many people here about hashing 
data classes. I thought *I* was the cynical pessimist who didn't have a 
high opinion of the quality of the average programmer, but according to 
this thread apparently I'm positively Pollyanna-esque for believing that 
most people will realise that if an API offers separate switches for 
hashable and frozen, you need to set *both* if you want both.

Greg Smith even says that writing dunders apart from __init__ is a code 
smell, and warns people not to write dunders. Seriously? I get that 
__hash__ is hard to write correctly, which is why we have a hash=True to 
do the hard work for us, but I can't help feeling that at the point 
we're saying "don't write dunders, any dunder, you'll only do it wrong" 
we have crossed over to the wrong side of the pessimist/optimist line.

But here we are: talking about naming a perfectly reasonable argument 
"unsafe_hash". Why are we trying to frighten people?

There is nothing unsafe about a DataClass with hash=True, frozen=True, 
but this scheme means that even people who know what they're doing will 
write unsafe_hash=True, frozen=True, as if hashability was some sort of 
hand grenade waiting to go off.

Perhaps we ought to deprecate __hash__ and start calling it 
__danger_danger_hash__ too? No, I don't think so.

In the past, we've (rightly!) rejected proposals to call things like 
eval "unsafe_eval", and that really is dangerously unsafe when used 
naively with untrusted, unsanitised data. Hashing mutable objects by 
accident might be annoyingly difficult and frustrating to debug, but 
code injection attacks can lead to identity theft and worse, serious 
consequences for real people.

I'm 100% in favour of programmer education, but I think this label is 
*miseducation*. We're suggesting that hashability is unsafe, regardless 
of whether the object is frozen or not.

I'd far prefer to get a runtime warning:

"Are you sure you want hash=True without frozen=True?"

(or words to that extent) rather than burden all uses of the hash 
parameter, good and bad, with the unsafe label.


-- 
Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-06 Thread Lukasz Langa
To add a counter-example that I'm painfully familiar with: the old Thrift for 
Python makes its (mutable) structures hashable. This is "useful" because you 
can memoize functions that take Thrift structures as arguments. You can key 
dictionaries with them. And so on.

Unfortunately, more often then not this same structure is passed around and 
inevitably gets mutated in place. Everything breaks. What should be memoized 
isn't (and in pathological cases, what shouldn't be memoized is), dictionary 
access *using the same object* raises unexpected key errors but iterating on 
the dictionary reveals the key.

It's clearly clowntown and in hindsight we shouldn't have enabled this 
behavior. But we can't go back since too much code relies on hashability now 
and it's "mostly" fine.

As a solution, the new asyncio-only Thrift implementation for Python uses 
C-level structures to make sure they are truly immutable in Python. We hash 
them and cache them like there's no tomorrow.

- Ł


> On Feb 5, 2018, at 11:54 PM, Ivan Levkivskyi  wrote:
> 
> Just wanted to add my 5 cents here. I am a bit surprised how people are 
> scared by adding `__hash__` to mutable classes.
> From my experience it is quite normal, I was always thinking about `hash()` 
> as hashing a _value_,
> and never as hashing _identity_, if I would need the latter, there is a 
> different function for this, `id()`.
> Moreover, I often did this in situations where dataclasses would be useful: a 
> class with several fields,
> necessary dunders, and few other methods (record-like classes).
> My motivation was mostly speed-up by memorization.
> To be fair my use cases were mostly about some tree-like strictures, but this 
> is probably a coincidence.
> 
> FWIW it is not a super-safe practice, but neither super-dangerous.
> 
> --
> Ivan
> 
> 
> 
> On 5 February 2018 at 22:56, Nick Coghlan  > wrote:
> On 6 February 2018 at 03:47, Guido van Rossum  > wrote:
> > If there's going to be an API for it, it should be in the class, not
> > something that mutates the class afterwards.
> 
> Something I realised after posting the __class__ setting idea is that
> you can actually use a comparable trick to inject an unsafe hash from
> the frozen version into the mutable version:
> 
> >>> from dataclasses import dataclass
> >>> @dataclass
> ... class Example:
> ... a: int
> ... b: int
> ...
> >>> c = Example(1, 2)
> >>> hash(c)
> Traceback (most recent call last):
>  File "", line 1, in 
> TypeError: unhashable type: 'Example'
> 
> >>> @dataclass(frozen=True)
> ... class LockedExample(Example):
> ... pass
> ...
> >>> Example.__hash__ = LockedExample.__hash__
> >>> hash(c)
> 3713081631934410656
> 
> So "unsafe_hash=True" would just be a shorthand spelling of that which
> skips creating the full frozen version of the class (and with the
> explicit parameter, we can better document the risks of making
> something hashable without also freezing it post-creation).
> 
> Cheers,
> Nick.
> 
> --
> Nick Coghlan   |   ncogh...@gmail.com    |   
> Brisbane, Australia
> ___
> Python-Dev mailing list
> Python-Dev@python.org 
> https://mail.python.org/mailman/listinfo/python-dev 
> 
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/levkivskyi%40gmail.com 
> 
> 
> ___
> Python-Dev mailing list
> Python-Dev@python.org 
> https://mail.python.org/mailman/listinfo/python-dev 
> 
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/lukasz%40langa.pl 
> 


signature.asc
Description: Message signed with OpenPGP
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Ivan Levkivskyi
Just wanted to add my 5 cents here. I am a bit surprised how people are
scared by adding `__hash__` to mutable classes.
>From my experience it is quite normal, I was always thinking about `hash()`
as hashing a _value_,
and never as hashing _identity_, if I would need the latter, there is a
different function for this, `id()`.
Moreover, I often did this in situations where dataclasses would be useful:
a class with several fields,
necessary dunders, and few other methods (record-like classes).
My motivation was mostly speed-up by memorization.
To be fair my use cases were mostly about some tree-like strictures, but
this is probably a coincidence.

FWIW it is not a super-safe practice, but neither super-dangerous.

--
Ivan



On 5 February 2018 at 22:56, Nick Coghlan  wrote:

> On 6 February 2018 at 03:47, Guido van Rossum  wrote:
> > If there's going to be an API for it, it should be in the class, not
> > something that mutates the class afterwards.
>
> Something I realised after posting the __class__ setting idea is that
> you can actually use a comparable trick to inject an unsafe hash from
> the frozen version into the mutable version:
>
> >>> from dataclasses import dataclass
> >>> @dataclass
> ... class Example:
> ... a: int
> ... b: int
> ...
> >>> c = Example(1, 2)
> >>> hash(c)
> Traceback (most recent call last):
>  File "", line 1, in 
> TypeError: unhashable type: 'Example'
>
> >>> @dataclass(frozen=True)
> ... class LockedExample(Example):
> ... pass
> ...
> >>> Example.__hash__ = LockedExample.__hash__
> >>> hash(c)
> 3713081631934410656
>
> So "unsafe_hash=True" would just be a shorthand spelling of that which
> skips creating the full frozen version of the class (and with the
> explicit parameter, we can better document the risks of making
> something hashable without also freezing it post-creation).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> levkivskyi%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Nick Coghlan
On 6 February 2018 at 03:47, Guido van Rossum  wrote:
> If there's going to be an API for it, it should be in the class, not
> something that mutates the class afterwards.

Something I realised after posting the __class__ setting idea is that
you can actually use a comparable trick to inject an unsafe hash from
the frozen version into the mutable version:

>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
... a: int
... b: int
...
>>> c = Example(1, 2)
>>> hash(c)
Traceback (most recent call last):
 File "", line 1, in 
TypeError: unhashable type: 'Example'

>>> @dataclass(frozen=True)
... class LockedExample(Example):
... pass
...
>>> Example.__hash__ = LockedExample.__hash__
>>> hash(c)
3713081631934410656

So "unsafe_hash=True" would just be a shorthand spelling of that which
skips creating the full frozen version of the class (and with the
explicit parameter, we can better document the risks of making
something hashable without also freezing it post-creation).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Paul G
I don't think it matters so much whether you are stacking two decorators or a 
single decorator, but would an @add_unsafe_hash decorator be useful for 
anything *except* data classes? If not, then there's no point in having a 
*second* decorator that can *only* modify the first one - particularly 
considering @dataclass actually takes arguments.

On 02/05/2018 02:12 PM, Guido van Rossum wrote:
> Yes, that's what I meant -- "afterwards" meaning after the @dataclass
> decorator is applied.
> 
> On Mon, Feb 5, 2018 at 11:09 AM, Kirill Balunov 
> wrote:
> 
>>
>> 2018-02-05 20:47 GMT+03:00 Guido van Rossum :
>>
>>> If there's going to be an API for it, it should be in the class, not
>>> something that mutates the class afterwards.
>>>
>>
>>
>> I apologize and don't want to make unnecessary noise. But the already
>> selected design with decorator @dataclass implies that it will mutate
>> the freshly created class (which in its turn already limits some
>> possibilities), or I've missed something? If you meant that everything
>> should be defined in one place, then I basically understand your desire as
>> the least of two evils.
>>
>> With kind regards,
>> -gdg
>>
> 
> 
> 
> 
> 
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/paul%40ganssle.io
> 



signature.asc
Description: OpenPGP digital signature
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Guido van Rossum
Yes, that's what I meant -- "afterwards" meaning after the @dataclass
decorator is applied.

On Mon, Feb 5, 2018 at 11:09 AM, Kirill Balunov 
wrote:

>
> 2018-02-05 20:47 GMT+03:00 Guido van Rossum :
>
>> If there's going to be an API for it, it should be in the class, not
>> something that mutates the class afterwards.
>>
>
>
> I apologize and don't want to make unnecessary noise. But the already
> selected design with decorator @dataclass implies that it will mutate
> the freshly created class (which in its turn already limits some
> possibilities), or I've missed something? If you meant that everything
> should be defined in one place, then I basically understand your desire as
> the least of two evils.
>
> With kind regards,
> -gdg
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Kirill Balunov
2018-02-05 20:47 GMT+03:00 Guido van Rossum :

> If there's going to be an API for it, it should be in the class, not
> something that mutates the class afterwards.
>


I apologize and don't want to make unnecessary noise. But the already
selected design with decorator @dataclass implies that it will mutate
the freshly created class (which in its turn already limits some
possibilities), or I've missed something? If you meant that everything
should be defined in one place, then I basically understand your desire as
the least of two evils.

With kind regards,
-gdg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread David Mertz
Absolutely I agree. 'unsafe_hash' as a name is clear warning to users.

On Feb 4, 2018 10:43 PM, "Chris Barker"  wrote:



On Sun, Feb 4, 2018 at 11:57 PM, Gregory P. Smith  wrote:

> +1 using unsafe_hash as a name addresses my concern.
>
mine too -- anyone surprised by using this deserves what they get :-)

-CHB


On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum  wrote:
>
>> Looks like this is turning into a major flamewar regardless of what I
>> say. :-(
>>
>> I really don't want to lose the ability to add a hash function to a
>> mutable dataclass by flipping a flag in the decorator. I'll explain below.
>> But I am fine if this flag has a name that clearly signals it's an unsafe
>> thing to do.
>>
>> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
>> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
>> a simple bool (not a tri-state flag like the current hash=None|False|True).
>> The default would be False, and the behavior then would be to add a hash
>> function automatically only if it's safe (using the same rules as for
>> hash=None currently). With unsafe_hash=True, a hash function would always
>> be generated that takes all fields into account except those declared using
>> field(hash=False). If there's already a `def __hash__` in the function I
>> don't care what it does, maybe it should raise rather than quietly doing
>> nothing or quietly overwriting it.
>>
>> Here's my use case.
>>
>> A frozen class requires a lot of discipline, since you have to compute
>> the values of all fields before calling the constructor. A mutable class
>> allows other initialization patterns, e.g. manually setting some fields
>> after the instance has been constructed, or having a separate non-dunder
>> init() method. There may be good reasons for using these patterns, e.g. the
>> object may be part of a cycle (e.g. parent/child links in a tree). Or you
>> may just use one of these patterns because you're a pretty casual coder. Or
>> you're modeling something external.
>>
>> My point is that once you have one of those patterns in place, changing
>> your code to avoid them may be difficult. And yet your code may treat the
>> objects as essentially immutable after the initialization phase (e.g. a
>> parse tree). So if you create a dataclass and start coding like that for a
>> while, and much later you need to put one of these into a set or use it as
>> a dict key, switching to frozen=True may not be a quick option. And writing
>> a __hash__ method by hand may feel like a lot of busywork. So this is where
>> [unsafe_]hash=True would come in handy.
>>
>> I think naming the flag unsafe_hash should take away most objections,
>> since it will be clear that this is not a safe thing to do. People who
>> don't understand the danger are likely to copy a worse solution from
>> StackOverflow anyway. The docs can point to frozen=True and explain the
>> danger.
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>> ___
>> Python-Dev mailing list
>> Python-Dev@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%
>> 40krypto.org
>>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/chris.
> barker%40noaa.gov
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/
mertz%40gnosis.cx
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Guido van Rossum
If there's going to be an API for it, it should be in the class, not
something that mutates the class afterwards.

On Mon, Feb 5, 2018 at 1:59 AM, Kirill Balunov 
wrote:

> On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum  > wrote:
>>
>> Looks like this is turning into a major flamewar regardless of what I say.
>> :-(
>> I really don't want to lose the ability to add a hash function to a
>> mutable dataclass by flipping a flag in the decorator. I'll explain below.
>> But I am fine if this flag has a name that clearly signals it's an unsafe
>> thing to do.
>>
>> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
>> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
>> a simple bool (not a tri-state flag like the current hash=None|False|True).
>> The default would be False, and the behavior then would be to add a hash
>> function automatically only if it's safe (using the same rules as for
>> hash=None currently). With unsafe_hash=True, a hash function would always
>> be generated that takes all fields into account except those declared using
>> field(hash=False). If there's already a `def __hash__` in the function I
>> don't care what it does, maybe it should raise rather than quietly doing
>> nothing or quietly overwriting it.
>>
>> Here's my use case.
>>
>>
> May be it is better to provide a special purpose function
> `make_unsafe_hash` in
> dataclass module which will patch a dataclass, instead of to clutter
> @dataclass
> API with arguments which are rather special case.
>
> This `unsafe_hash` argument will constantly raise questions among ordinary
> users
> like me, and will be possibly considered as a non-obvious design - there
> is a
> public API but it is somehow unsafe. On the other hand, with a function,
> when
> the user asks how to make a `frozen=False` dataclass hashable, you can
> suggest
> to use this `make_unsafe_hash` function with all its cautions in its docs
> or to try to
> implement __hash__ by yourself.
>
> Also taking into account the Python approach for backward compatibility it
> is
> better to stick with function and if it will be usefull to add a
> `unsafe_hash`
> argument in Python 3.8. It is easier to add later than to deprecate in the
> future.
>
> With kind regards,
> -gdg
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>
>


-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Guido van Rossum
I'm sorry, but a solution that requires __class__ assignment is way too
fragile for my taste.

On Mon, Feb 5, 2018 at 6:28 AM, Nick Coghlan  wrote:

> On 5 February 2018 at 15:49, Guido van Rossum  wrote:
> > My point is that once you have one of those patterns in place, changing
> your
> > code to avoid them may be difficult. And yet your code may treat the
> objects
> > as essentially immutable after the initialization phase (e.g. a parse
> tree).
> > So if you create a dataclass and start coding like that for a while, and
> > much later you need to put one of these into a set or use it as a dict
> key,
> > switching to frozen=True may not be a quick option. And writing a
> __hash__
> > method by hand may feel like a lot of busywork. So this is where
> > [unsafe_]hash=True would come in handy.
> >
> > I think naming the flag unsafe_hash should take away most objections,
> since
> > it will be clear that this is not a safe thing to do. People who don't
> > understand the danger are likely to copy a worse solution from
> StackOverflow
> > anyway. The docs can point to frozen=True and explain the danger.
>
> Aye, calling the flag unsafe_hash would convert me from -1 to -0.
>
> The remaining -0 is because I think there's a different and more
> robust way to tackle your example use case:
>
> # Mutable initialization phase
> >>> from dataclasses import dataclass
> >>> @dataclass
> ... class Example:
> ... a: int
> ... b: int
> ...
> >>> c = Example(None, None)
> >>> c
> Example(a=None, b=None)
> >>> c.a = 1
> >>> c.b = 2
> >>> c
> Example(a=1, b=2)
>
>
> # Frozen usage phase
> >>> @dataclass(frozen=True)
> ... class LockedExample(Example):
> ... pass
> ...
> >>> c.__class__ = LockedExample
> >>> c.a = 1
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
> in _frozen_setattr
>raise FrozenInstanceError(f'cannot assign to field {name!r}')
> dataclasses.FrozenInstanceError: cannot assign to field 'a'
> >>> c.b = 2
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
> in _frozen_setattr
>raise FrozenInstanceError(f'cannot assign to field {name!r}')
> dataclasses.FrozenInstanceError: cannot assign to field 'b'
> >>> hash(c)
> 3713081631934410656
>
> The gist of that approach is to assume that there will be *somewhere*
> in the code where it's possible to declare the construction of the
> instance "complete", and flip the nominal class over to the frozen
> subclass to make further mutation unlikely, even though the true
> underlying type is still the mutable version.
>
> That said, if we do provide "unsafe_hash", then the documentation for
> that flag becomes a place where we can explicitly suggest using a
> frozen subclass instead.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Kirill Balunov
>
> On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum  > wrote:
>
> Looks like this is turning into a major flamewar regardless of what I say.
> :-(
> I really don't want to lose the ability to add a hash function to a
> mutable dataclass by flipping a flag in the decorator. I'll explain below.
> But I am fine if this flag has a name that clearly signals it's an unsafe
> thing to do.
>
> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
> a simple bool (not a tri-state flag like the current hash=None|False|True).
> The default would be False, and the behavior then would be to add a hash
> function automatically only if it's safe (using the same rules as for
> hash=None currently). With unsafe_hash=True, a hash function would always
> be generated that takes all fields into account except those declared using
> field(hash=False). If there's already a `def __hash__` in the function I
> don't care what it does, maybe it should raise rather than quietly doing
> nothing or quietly overwriting it.
>
> Here's my use case.
>
>
May be it is better to provide a special purpose function
`make_unsafe_hash` in
dataclass module which will patch a dataclass, instead of to clutter
@dataclass
API with arguments which are rather special case.

This `unsafe_hash` argument will constantly raise questions among ordinary
users
like me, and will be possibly considered as a non-obvious design - there is
a
public API but it is somehow unsafe. On the other hand, with a function,
when
the user asks how to make a `frozen=False` dataclass hashable, you can
suggest
to use this `make_unsafe_hash` function with all its cautions in its docs
or to try to
implement __hash__ by yourself.

Also taking into account the Python approach for backward compatibility it
is
better to stick with function and if it will be usefull to add a
`unsafe_hash`
argument in Python 3.8. It is easier to add later than to deprecate in the
future.

With kind regards,
-gdg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Nick Coghlan
On 5 February 2018 at 15:49, Guido van Rossum  wrote:
> My point is that once you have one of those patterns in place, changing your
> code to avoid them may be difficult. And yet your code may treat the objects
> as essentially immutable after the initialization phase (e.g. a parse tree).
> So if you create a dataclass and start coding like that for a while, and
> much later you need to put one of these into a set or use it as a dict key,
> switching to frozen=True may not be a quick option. And writing a __hash__
> method by hand may feel like a lot of busywork. So this is where
> [unsafe_]hash=True would come in handy.
>
> I think naming the flag unsafe_hash should take away most objections, since
> it will be clear that this is not a safe thing to do. People who don't
> understand the danger are likely to copy a worse solution from StackOverflow
> anyway. The docs can point to frozen=True and explain the danger.

Aye, calling the flag unsafe_hash would convert me from -1 to -0.

The remaining -0 is because I think there's a different and more
robust way to tackle your example use case:

# Mutable initialization phase
>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
... a: int
... b: int
...
>>> c = Example(None, None)
>>> c
Example(a=None, b=None)
>>> c.a = 1
>>> c.b = 2
>>> c
Example(a=1, b=2)


# Frozen usage phase
>>> @dataclass(frozen=True)
... class LockedExample(Example):
... pass
...
>>> c.__class__ = LockedExample
>>> c.a = 1
Traceback (most recent call last):
 File "", line 1, in 
 File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
in _frozen_setattr
   raise FrozenInstanceError(f'cannot assign to field {name!r}')
dataclasses.FrozenInstanceError: cannot assign to field 'a'
>>> c.b = 2
Traceback (most recent call last):
 File "", line 1, in 
 File "/home/ncoghlan/devel/cpython/Lib/dataclasses.py", line 448,
in _frozen_setattr
   raise FrozenInstanceError(f'cannot assign to field {name!r}')
dataclasses.FrozenInstanceError: cannot assign to field 'b'
>>> hash(c)
3713081631934410656

The gist of that approach is to assume that there will be *somewhere*
in the code where it's possible to declare the construction of the
instance "complete", and flip the nominal class over to the frozen
subclass to make further mutation unlikely, even though the true
underlying type is still the mutable version.

That said, if we do provide "unsafe_hash", then the documentation for
that flag becomes a place where we can explicitly suggest using a
frozen subclass instead.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Terry Reedy

On 2/5/2018 2:28 AM, Glenn Linderman wrote:

This is an interesting use case. I haven't got the internals knowledge 
to know just how just different mutable and immutable classes and 
objects are under the hood.


I believe there is no internal difference.  An object is immutable if 
there is not way to mutate it with Python code that not poke into 
internals, such as one can do with ctypes or 3rd party extensions. 
Numbers and strings have no mutation methods, including no .__init__.


A tuple is a fixed sequence of objects and has no .__init__.  But if any 
object in a tuple is mutable, then the tuple is.  But the tuple does not 
know its status, and there is no 'is_mutable' function. However, 
tuple.__hash__ calls the .__hash__ method of each object and if that is 
missing for one, tuple.__hash raises.


>>> hash((1, 'a', []))
Traceback (most recent call last):
  File "", line 1, in 
hash((1, 'a', []))
TypeError: unhashable type: 'list'

The built-in immutable objects are mutated from their initial blank 
values in the C code of their .__new__ methods.  So they are only 
'immutable' once constructed.  Guido pointed out that users constructing 
objects in Python code might reasonably do so other than only with 
.__new__, but still want to treat the object as immutable once constructed.


In Lisp, for instance, lists are actually trees.  To be immutable, they 
can only be singly linked and must be constructed from leaf nodes to the 
root (or head).  Python programmers should be able to link in both 
directions and start from the root, and still treat the result as frozen 
and hashable.


But this use case makes me wonder if, even 
at the cost of some performance that "normal" immutable classes and 
objects might obtain, if it would be possible to use the various 
undisciplined initialization patterns as desired, followed by as 
declaration "This OBJECT is now immutable" which would calculate its 
HASH value, and prevent future mutations of the object?


Something like this has been proposed, at least for dicts, and rejected.

--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Glenn Linderman

On 2/5/2018 12:11 AM, Nathaniel Smith wrote:

On Sun, Feb 4, 2018 at 11:28 PM, Glenn Linderman  wrote:

This is an interesting use case. I haven't got the internals knowledge to
know just how just different mutable and immutable classes and objects are
under the hood. But this use case makes me wonder if, even at the cost of
some performance that "normal" immutable classes and objects might obtain,
if it would be possible to use the various undisciplined initialization
patterns as desired, followed by as declaration "This OBJECT is now
immutable" which would calculate its HASH value, and prevent future
mutations of the object?

It would be technically possible to support something like

@dataclass(freezable=True)
class Foo:
 blah: int

foo = Foo()
# Initially, object is mutable, and hash(foo) raises an error
foo.blah = 1
assertRaises(hash, foo)

# This method is automatically generated for classes with freezable=True
foo.freeze()

# Now object is immutable, and hash(foo) is allowed
assertRaises(foo.__setattr__, "blah", 2)
hash(foo)

I don't know if it's worth the complexity, but I guess it would cover
at least some of the use cases Guido raised.

-n

Thanks, Nathaniel, for confirming that what I was suggesting is not 
impossible, even if it turns out to be undesirable for some reason, or 
unwanted by anyone else. But I have encountered a subset of the use 
cases Guido mentioned, and had to make a 2nd class to gather/hold the 
values of the eventual immutable class, before I could make it, because 
pieces of the data for the class values were obtained from different 
sources at different times. Once all collected, then the immutability 
could be obtained, the rest of the processing performed. Thrashes the 
allocator pretty well doing it that way, but the job got done.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-05 Thread Nathaniel Smith
On Sun, Feb 4, 2018 at 11:28 PM, Glenn Linderman  wrote:
> This is an interesting use case. I haven't got the internals knowledge to
> know just how just different mutable and immutable classes and objects are
> under the hood. But this use case makes me wonder if, even at the cost of
> some performance that "normal" immutable classes and objects might obtain,
> if it would be possible to use the various undisciplined initialization
> patterns as desired, followed by as declaration "This OBJECT is now
> immutable" which would calculate its HASH value, and prevent future
> mutations of the object?

It would be technically possible to support something like

@dataclass(freezable=True)
class Foo:
blah: int

foo = Foo()
# Initially, object is mutable, and hash(foo) raises an error
foo.blah = 1
assertRaises(hash, foo)

# This method is automatically generated for classes with freezable=True
foo.freeze()

# Now object is immutable, and hash(foo) is allowed
assertRaises(foo.__setattr__, "blah", 2)
hash(foo)

I don't know if it's worth the complexity, but I guess it would cover
at least some of the use cases Guido raised.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-04 Thread Glenn Linderman

On 2/4/2018 9:49 PM, Guido van Rossum wrote:
A frozen class requires a lot of discipline, since you have to compute 
the values of all fields before calling the constructor. A mutable 
class allows other initialization patterns, e.g. manually setting some 
fields after the instance has been constructed, or having a separate 
non-dunder init() method. There may be good reasons for using these 
patterns, e.g. the object may be part of a cycle (e.g. parent/child 
links in a tree). Or you may just use one of these patterns because 
you're a pretty casual coder. Or you're modeling something external.


My point is that once you have one of those patterns in place, 
changing your code to avoid them may be difficult. And yet your code 
may treat the objects as essentially immutable after the 
initialization phase (e.g. a parse tree). So if you create a dataclass 
and start coding like that for a while, and much later you need to put 
one of these into a set or use it as a dict key, switching to 
frozen=True may not be a quick option. And writing a __hash__ method 
by hand may feel like a lot of busywork. So this is where 
[unsafe_]hash=True would come in handy.


I think naming the flag unsafe_hash should take away most objections, 
since it will be clear that this is not a safe thing to do. People who 
don't understand the danger are likely to copy a worse solution from 
StackOverflow anyway. The docs can point to frozen=True and explain 
the danger.


This is an interesting use case. I haven't got the internals knowledge 
to know just how just different mutable and immutable classes and 
objects are under the hood. But this use case makes me wonder if, even 
at the cost of some performance that "normal" immutable classes and 
objects might obtain, if it would be possible to use the various 
undisciplined initialization patterns as desired, followed by as 
declaration "This OBJECT is now immutable" which would calculate its 
HASH value, and prevent future mutations of the object?


Yes, I'm aware that the decision for immutability has historically been 
done at the class level, not the object level, but in my ignorance of 
the internals, I wonder if that is necessary, for performance or more 
importantly, for other reasons.


And perhaps the implementation is internally almost like two classes, 
one mutable, and the other immutable, and the declaration would convert 
the object from one to the other.  But if I say more, I'd just be babbling.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-04 Thread Chris Barker
On Sun, Feb 4, 2018 at 11:57 PM, Gregory P. Smith  wrote:

> +1 using unsafe_hash as a name addresses my concern.
>
mine too -- anyone surprised by using this deserves what they get :-)

-CHB


On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum  wrote:
>
>> Looks like this is turning into a major flamewar regardless of what I
>> say. :-(
>>
>> I really don't want to lose the ability to add a hash function to a
>> mutable dataclass by flipping a flag in the decorator. I'll explain below.
>> But I am fine if this flag has a name that clearly signals it's an unsafe
>> thing to do.
>>
>> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
>> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
>> a simple bool (not a tri-state flag like the current hash=None|False|True).
>> The default would be False, and the behavior then would be to add a hash
>> function automatically only if it's safe (using the same rules as for
>> hash=None currently). With unsafe_hash=True, a hash function would always
>> be generated that takes all fields into account except those declared using
>> field(hash=False). If there's already a `def __hash__` in the function I
>> don't care what it does, maybe it should raise rather than quietly doing
>> nothing or quietly overwriting it.
>>
>> Here's my use case.
>>
>> A frozen class requires a lot of discipline, since you have to compute
>> the values of all fields before calling the constructor. A mutable class
>> allows other initialization patterns, e.g. manually setting some fields
>> after the instance has been constructed, or having a separate non-dunder
>> init() method. There may be good reasons for using these patterns, e.g. the
>> object may be part of a cycle (e.g. parent/child links in a tree). Or you
>> may just use one of these patterns because you're a pretty casual coder. Or
>> you're modeling something external.
>>
>> My point is that once you have one of those patterns in place, changing
>> your code to avoid them may be difficult. And yet your code may treat the
>> objects as essentially immutable after the initialization phase (e.g. a
>> parse tree). So if you create a dataclass and start coding like that for a
>> while, and much later you need to put one of these into a set or use it as
>> a dict key, switching to frozen=True may not be a quick option. And writing
>> a __hash__ method by hand may feel like a lot of busywork. So this is where
>> [unsafe_]hash=True would come in handy.
>>
>> I think naming the flag unsafe_hash should take away most objections,
>> since it will be clear that this is not a safe thing to do. People who
>> don't understand the danger are likely to copy a worse solution from
>> StackOverflow anyway. The docs can point to frozen=True and explain the
>> danger.
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>> ___
>> Python-Dev mailing list
>> Python-Dev@python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
>> greg%40krypto.org
>>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> chris.barker%40noaa.gov
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-04 Thread Gregory P. Smith
+1 using unsafe_hash as a name addresses my concern. It's a good signal
that there are caveats worth considering.

-gps

On Sun, Feb 4, 2018, 9:50 PM Guido van Rossum  wrote:

> Looks like this is turning into a major flamewar regardless of what I say.
> :-(
>
> I really don't want to lose the ability to add a hash function to a
> mutable dataclass by flipping a flag in the decorator. I'll explain below.
> But I am fine if this flag has a name that clearly signals it's an unsafe
> thing to do.
>
> I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
> @dataclass decorator with a simpler flag named unsafe_hash=. This would be
> a simple bool (not a tri-state flag like the current hash=None|False|True).
> The default would be False, and the behavior then would be to add a hash
> function automatically only if it's safe (using the same rules as for
> hash=None currently). With unsafe_hash=True, a hash function would always
> be generated that takes all fields into account except those declared using
> field(hash=False). If there's already a `def __hash__` in the function I
> don't care what it does, maybe it should raise rather than quietly doing
> nothing or quietly overwriting it.
>
> Here's my use case.
>
> A frozen class requires a lot of discipline, since you have to compute the
> values of all fields before calling the constructor. A mutable class allows
> other initialization patterns, e.g. manually setting some fields after the
> instance has been constructed, or having a separate non-dunder init()
> method. There may be good reasons for using these patterns, e.g. the object
> may be part of a cycle (e.g. parent/child links in a tree). Or you may just
> use one of these patterns because you're a pretty casual coder. Or you're
> modeling something external.
>
> My point is that once you have one of those patterns in place, changing
> your code to avoid them may be difficult. And yet your code may treat the
> objects as essentially immutable after the initialization phase (e.g. a
> parse tree). So if you create a dataclass and start coding like that for a
> while, and much later you need to put one of these into a set or use it as
> a dict key, switching to frozen=True may not be a quick option. And writing
> a __hash__ method by hand may feel like a lot of busywork. So this is where
> [unsafe_]hash=True would come in handy.
>
> I think naming the flag unsafe_hash should take away most objections,
> since it will be clear that this is not a safe thing to do. People who
> don't understand the danger are likely to copy a worse solution from
> StackOverflow anyway. The docs can point to frozen=True and explain the
> danger.
>
> --
> --Guido van Rossum (python.org/~guido)
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/greg%40krypto.org
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-04 Thread Guido van Rossum
Looks like this is turning into a major flamewar regardless of what I say.
:-(

I really don't want to lose the ability to add a hash function to a mutable
dataclass by flipping a flag in the decorator. I'll explain below. But I am
fine if this flag has a name that clearly signals it's an unsafe thing to
do.

I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
@dataclass decorator with a simpler flag named unsafe_hash=. This would be
a simple bool (not a tri-state flag like the current hash=None|False|True).
The default would be False, and the behavior then would be to add a hash
function automatically only if it's safe (using the same rules as for
hash=None currently). With unsafe_hash=True, a hash function would always
be generated that takes all fields into account except those declared using
field(hash=False). If there's already a `def __hash__` in the function I
don't care what it does, maybe it should raise rather than quietly doing
nothing or quietly overwriting it.

Here's my use case.

A frozen class requires a lot of discipline, since you have to compute the
values of all fields before calling the constructor. A mutable class allows
other initialization patterns, e.g. manually setting some fields after the
instance has been constructed, or having a separate non-dunder init()
method. There may be good reasons for using these patterns, e.g. the object
may be part of a cycle (e.g. parent/child links in a tree). Or you may just
use one of these patterns because you're a pretty casual coder. Or you're
modeling something external.

My point is that once you have one of those patterns in place, changing
your code to avoid them may be difficult. And yet your code may treat the
objects as essentially immutable after the initialization phase (e.g. a
parse tree). So if you create a dataclass and start coding like that for a
while, and much later you need to put one of these into a set or use it as
a dict key, switching to frozen=True may not be a quick option. And writing
a __hash__ method by hand may feel like a lot of busywork. So this is where
[unsafe_]hash=True would come in handy.

I think naming the flag unsafe_hash should take away most objections, since
it will be clear that this is not a safe thing to do. People who don't
understand the danger are likely to copy a worse solution from
StackOverflow anyway. The docs can point to frozen=True and explain the
danger.

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-04 Thread Chris Barker - NOAA Federal
>> IMO, the danger of
>> "@dataclass(hash=True)" far overweighs whatever convenience it might
>> provide.

Is there any reason specifying has=True could set frozen=True unless
the user specifically sets frozen=False?

Or is that already the case?

I think the folks that are concerned about this issue are quite right
— most Python users equate immutable and hashable—so the dataclass API
should reflect that.

And this would still make it easy and clear to specify the unusual
(and arguably dangerous) case of:

hash=True, frozen=False

-CHB

>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-03 Thread Steve Holden
As a Bear of Relatively Little Brain, I've grown up understanding, and
teaching, that mutable things aren't to be used as dict keys. I'm aware
that immutability isn't strictly the required condition, but it for most
people, that's the primary reason for using frozen sets and tuples, for
example, and immutability serves as a practical and comprehensible first
approximation. So I'm at a loss to understand why I am being offered a
feature that (especially during maintenance by a different developer) might
be prone to bizarre errors caused by a change in hash.

I realise that this won't happen very often, but the difficulty of the
debug task should surely merit at least some warning for us bears - you
know, the ones that take your work and use it to do mundane things with.

On a slightly tangential note, us bears are very glad that such questions
are taken seriously and discussed in such depth. Thank you all.

Steve Holden

On Sat, Feb 3, 2018 at 6:44 AM, Guido van Rossum  wrote:

> It appears Eric and I are the only ones in favor of keeping the current
> behavior. But I still am not convinced by all the worries about "attractive
> nuisances" and all the other bad names this feature has been called. We
> don't know that any of the doomsday scenarios will happen. In my
> experience, usually once something is rolled out the set of issues that are
> *actually* raised is entirely different from the things its designers were
> worried about.
>
> Please don't commit a change to roll this back without checking in with
> me; I have some misgivings about the problem being raised here that I still
> need to think through more carefully. In the meantime, please try to use
> dataclasses with 3.7b1!
>
> On Fri, Feb 2, 2018 at 10:25 PM, Nick Coghlan  wrote:
>
>>
>>
>> On 3 Feb. 2018 1:09 am, "Eric V. Smith"  wrote:
>>
>>
>> The problem with dropping hash=True is: how would you write __hash__
>> yourself? It seems like a bug magnet if you're adding fields to the class
>> and forget to update __hash__, especially in the presence of per-field
>> hash=False and eq=False settings. And you'd need to make sure it matches
>> the generated __eq__ (if 2 objects are equal, they need to have the same
>> hash value).
>>
>>
>> I think anyone that does this needs to think *very* carefully about how
>> they do it, and offering both "hash=True" and "frozen=True" is an
>> attractive nuisance that means people will write "hash=True" when what they
>> wanted was "frozen=True".
>>
>> In particular, having to work out how write a maintainable "__hash__"
>> will encourage folks to separate out the hashed fields as a separate frozen
>> subrecord or base class.
>>
>> If this proves to be an intolerable burden then the short hand spelling
>> could be added back in 3.8, but once we ship it we're going to be stuck
>> with explaining the interactions indefinitely.
>>
>> Cheers,
>> Nick.
>>
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-03 Thread Ethan Furman

On 02/02/2018 10:44 PM, Guido van Rossum wrote:


It appears Eric and I are the only ones in favor of keeping the current 
behavior. But I still am not convinced by all
the worries about "attractive nuisances" and all the other bad names this 
feature has been called. We don't know that
any of the doomsday scenarios will happen. In my experience, usually once 
something is rolled out the set of issues that
are *actually* raised is entirely different from the things its designers were 
worried about.


This may all be true, but consider how many times we have asked, "How does attrs 
handle that?"

It would be wise to also ask, "What pitfalls have attrs discovered, and what would 
they do different if they could?"

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-03 Thread Gregory P. Smith
On Fri, Feb 2, 2018 at 10:25 PM Nick Coghlan  wrote:

>
>
> On 3 Feb. 2018 1:09 am, "Eric V. Smith"  wrote:
>
>
> The problem with dropping hash=True is: how would you write __hash__
> yourself? It seems like a bug magnet if you're adding fields to the class
> and forget to update __hash__, especially in the presence of per-field
> hash=False and eq=False settings. And you'd need to make sure it matches
> the generated __eq__ (if 2 objects are equal, they need to have the same
> hash value).
>
>
> I think anyone that does this needs to think *very* carefully about how
> they do it, and offering both "hash=True" and "frozen=True" is an
> attractive nuisance that means people will write "hash=True" when what they
> wanted was "frozen=True".
>

> In particular, having to work out how write a maintainable "__hash__" will
> encourage folks to separate out the hashed fields as a separate frozen
> subrecord or base class.
>
> If this proves to be an intolerable burden then the short hand spelling
> could be added back in 3.8, but once we ship it we're going to be stuck
> with explaining the interactions indefinitely.
>

+1 Nick put words to my chief concerns.

It is easy for an author see hash=True in existing code somewhere (cargo
culting) and assume it does what they want, or quickly glance at the the
API and see "hash=True" without actually taking the time to understand the
implications of that to see that the parameter named "frozen" is the one
they are supposed to want that _safely_ makes their dataclass properly
hashable, not the more attractive parameter named "hash" that enables
dangerous behavior.

Forcing people who need a __hash__ method to write it explicitly sounds
like a good thing to me.  I am not at all worried about someone forgetting
to add a new field to an implementation of the __hash__ method when adding
a new field, the fields and __hash__ method are all defined in the same
place in the code.

I expect someone with a common need for always having a __hash__ method
will produce a library on top of dataclasses that implements something like
our current hash=True behavior.  If that kind of thing turns out to be
widely used, we can reintroduce the feature in dataclasses in 3.8 or later,
informed by what we see practical uses actually doing.

In my practical experience, people writing Python code do not need to learn
and understand the intricacies of what it means to have a __hash__ method,
be hashable, or "frozen".  We intentionally warn people against writing
dunder methods other than __init__ in their code as they are often power
features with less obvious semantics than it may seem at first glance
making such code harder to maintain.

Even calling the parameter "hash=" and saying it adds a __hash__ method as
the PEP currently does seems to launder the danger, washing away the
"dunder smell" that adding a special considerations __hash__ method carries.

The PEP (and presumably forthcoming dataclasses module documentation) says
"This is a specialized use case and should be considered carefully" which I
agree with.  But any time we suggest that in an API, how about having the
API name make it clear that this is special and not to be done lightly?  I
guess i'm arguing against using "hash=" as the arg name in favor of
"danger_there_be_vorpal_rabbits_hash_me_maybe=" or something more usefully
similar if we're going to have it.

-gps
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Guido van Rossum
It appears Eric and I are the only ones in favor of keeping the current
behavior. But I still am not convinced by all the worries about "attractive
nuisances" and all the other bad names this feature has been called. We
don't know that any of the doomsday scenarios will happen. In my
experience, usually once something is rolled out the set of issues that are
*actually* raised is entirely different from the things its designers were
worried about.

Please don't commit a change to roll this back without checking in with me;
I have some misgivings about the problem being raised here that I still
need to think through more carefully. In the meantime, please try to use
dataclasses with 3.7b1!

On Fri, Feb 2, 2018 at 10:25 PM, Nick Coghlan  wrote:

>
>
> On 3 Feb. 2018 1:09 am, "Eric V. Smith"  wrote:
>
>
> The problem with dropping hash=True is: how would you write __hash__
> yourself? It seems like a bug magnet if you're adding fields to the class
> and forget to update __hash__, especially in the presence of per-field
> hash=False and eq=False settings. And you'd need to make sure it matches
> the generated __eq__ (if 2 objects are equal, they need to have the same
> hash value).
>
>
> I think anyone that does this needs to think *very* carefully about how
> they do it, and offering both "hash=True" and "frozen=True" is an
> attractive nuisance that means people will write "hash=True" when what they
> wanted was "frozen=True".
>
> In particular, having to work out how write a maintainable "__hash__" will
> encourage folks to separate out the hashed fields as a separate frozen
> subrecord or base class.
>
> If this proves to be an intolerable burden then the short hand spelling
> could be added back in 3.8, but once we ship it we're going to be stuck
> with explaining the interactions indefinitely.
>
> Cheers,
> Nick.
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>
>


-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Nick Coghlan
On 3 Feb. 2018 1:09 am, "Eric V. Smith"  wrote:


The problem with dropping hash=True is: how would you write __hash__
yourself? It seems like a bug magnet if you're adding fields to the class
and forget to update __hash__, especially in the presence of per-field
hash=False and eq=False settings. And you'd need to make sure it matches
the generated __eq__ (if 2 objects are equal, they need to have the same
hash value).


I think anyone that does this needs to think *very* carefully about how
they do it, and offering both "hash=True" and "frozen=True" is an
attractive nuisance that means people will write "hash=True" when what they
wanted was "frozen=True".

In particular, having to work out how write a maintainable "__hash__" will
encourage folks to separate out the hashed fields as a separate frozen
subrecord or base class.

If this proves to be an intolerable burden then the short hand spelling
could be added back in 3.8, but once we ship it we're going to be stuck
with explaining the interactions indefinitely.

Cheers,
Nick.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread David Mertz
I agree with Ethan, Elvis, and a few others. I think 'hash=True,
frozen=False' should be disabled in 3.7.  It's an attractive nuisance.
Maybe not so attractive because its obscurity, but still with no clear
reason to exist.

If many users of of dataclass find themselves defining '__hash__' with
mutable dataclass, it's perfectly possible to allow the switch combination
later. But taking it out after previously allowing it—even if every use in
the wild is actually a bug in waiting—is harder.

On Feb 2, 2018 2:10 PM, "Ethan Furman"  wrote:

> On 02/02/2018 08:09 AM, Eric V. Smith wrote:
>
>> On 2/2/2018 10:56 AM, Elvis Pranskevichus wrote:
>>
>
> My point is exactly that there is _no_ valid use case, so (hash=True,
>>> frozen=False) should not be a thing!  Why are you so insistent on adding
>>> a dangerous option which you admit is nearly useless?
>>>
>>
>> Because it's not the default, it will be documented as being an advanced
>> use case, and it's useful in rare instances.
>>
>
> Personally, I don't think advanced use-cases need to be supported by flags
> as they can be supported by just writing the __dunder__ methods.
>
> --
> ~Ethan~
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/mertz%
> 40gnosis.cx
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Ethan Furman

On 02/02/2018 08:09 AM, Eric V. Smith wrote:

On 2/2/2018 10:56 AM, Elvis Pranskevichus wrote:



My point is exactly that there is _no_ valid use case, so (hash=True,
frozen=False) should not be a thing!  Why are you so insistent on adding
a dangerous option which you admit is nearly useless?


Because it's not the default, it will be documented as being an advanced use 
case, and it's useful in rare instances.


Personally, I don't think advanced use-cases need to be supported by flags as they can be supported by just writing the 
__dunder__ methods.


--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Ethan Furman

On 02/02/2018 08:14 AM, Yury Selivanov wrote:


Eric, in my opinion we shouldn't copy attrs.  [...]



We are designing a new API that is going to be hugely popular.  Why
can't we ship it with dangerous options prohibited in 3.7 (it's easy
to do that!) and then enable them in 3.8 when there's an actual clear
use case?


+1

--
~Ethan~

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Chris Barker
On Fri, Feb 2, 2018 at 7:38 AM, Elvis Pranskevichus 
wrote:

> On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
> > However, I don't feel very strongly about this. As I've said, I expect
> > the use cases for hash=True to be very, very rare.
>
> Why do you think that the requirement to make a dataclass hashable is a
> "very, very rare" requirement?


I think what's rare is wanting hashability without it being frozen.


> Just put yourself in the shoes of an average Python developer.  You try
> to put a dataclass in a set, you get a TypeError.  Your immediate
> reaction is to add "hash=True".  Things appear to work.


agreed, the easy and obvious way should be to make it frozen -- if it's
hard to make it hashable and not frozen, then that's good.

But it is nice to have the __hash__ generated more you so maybe a flag
for "unfrozen_hashable" -- really klunky, but if that is a rare need, then
there you go.

Or maybe:

If either hash or frozen is specified, it become both frozen and hashable.

If both are specified, then it does what the user is asking for.

-CHB
-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Eric V. Smith

On 2/2/2018 10:56 AM, Elvis Pranskevichus wrote:

On Friday, February 2, 2018 10:51:11 AM EST Eric V. Smith wrote:

I was specifically talking about the case of a non-frozen, hashable
class. If you want to make a class frozen and hashable, then:

@dataclass(frozen=True)

will do just that.

The case you brought up initially is the non-frozen, hashable class.
It's that case that I think is very rare. I'll ask again: what's your
use case for wanting a non-frozen, hashable class? I'm genuinely
interested.


My point is exactly that there is _no_ valid use case, so (hash=True,
frozen=False) should not be a thing!  Why are you so insistent on adding
a dangerous option which you admit is nearly useless?


Because it's not the default, it will be documented as being an advanced 
use case, and it's useful in rare instances.


And as I've said a number of times, both here and in other discussions, 
I'm not arguing strenuously for this. I just think that, given that it's 
not the default and it's not recommended and is useful in advanced 
cases, I would prefer to leave it in. I understand that you disagree 
with me.


Eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Elvis Pranskevichus
> Because it's not the default, it will be documented as being an
> advanced use case, and it's useful in rare instances.
> 
> And as I've said a number of times, both here and in other
> discussions, I'm not arguing strenuously for this. I just think that,
> given that it's not the default and it's not recommended and is
> useful in advanced cases, I would prefer to leave it in. I understand
> that you disagree with me.

Is there a real world example of such an "advanced case"?  

Eric, have you read https://github.com/python-attrs/attrs/issues/136 ?  
Specifically this comment from Hynek [1]: 

"I never really thought about it, but yeah mutable objects shouldn’t 
have a __hash__ at all." 

It is clear from that thread that "hash=True" was an early design 
mistake, which was left in for compatibility reasons.  Why are we 
copying bad design to the standard library?

  Elvis


[1] https://github.com/python-attrs/attrs/issues/
136#issuecomment-277425421
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Yury Selivanov
On Fri, Feb 2, 2018 at 10:51 AM, Paul Moore  wrote:
[..]
> To put it another way, using your words above, "The moment you want to
> use a dataclass a a dict key, or put it in a set, you need it to be
> *immutable*" (not hashable, unless you really know what you're doing).

Can someone clarify what is the actual use case of someone *knowingly*
making a mutable collection hashable?  Why can't that advanced user
write their own __hash__ implementation?  It's easy to do so.

For what it's worth I think this argument is being blindly used to
justify the current questionable design of "dataclass(hash=True)"
being the same as "dataclass(hash=True, frozen=False) case.  At least
a few other core developers are concerned with this, but all I see is
"attrs does the same thing".

Eric, in my opinion we shouldn't copy attrs.  It was designed as an
external package with its own backwards-compatibility story.  At some
point it was realized that "attrs(hash=True, frozen=False)" is an
anti-pattern, but it couldn't be removed at that point.  Hence the
warning in the documentation.  We can do better.

We are designing a new API that is going to be hugely popular.  Why
can't we ship it with dangerous options prohibited in 3.7 (it's easy
to do that!) and then enable them in 3.8 when there's an actual clear
use case?

Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Elvis Pranskevichus
On Friday, February 2, 2018 10:51:11 AM EST Eric V. Smith wrote:
> I was specifically talking about the case of a non-frozen, hashable
> class. If you want to make a class frozen and hashable, then:
> 
> @dataclass(frozen=True)
> 
> will do just that.
> 
> The case you brought up initially is the non-frozen, hashable class.
> It's that case that I think is very rare. I'll ask again: what's your
> use case for wanting a non-frozen, hashable class? I'm genuinely
> interested.

My point is exactly that there is _no_ valid use case, so (hash=True, 
frozen=False) should not be a thing!  Why are you so insistent on adding 
a dangerous option which you admit is nearly useless?

   Elvis
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Paul Moore
On 2 February 2018 at 15:38, Elvis Pranskevichus  wrote:
> On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
>> However, I don't feel very strongly about this. As I've said, I expect
>> the use cases for hash=True to be very, very rare.
>
> Why do you think that the requirement to make a dataclass hashable is a
> "very, very rare" requirement?  The moment you want to use a dataclass a
> a dict key, or put it in a set, you need it to be hashable.
>
> Just put yourself in the shoes of an average Python developer.  You try
> to put a dataclass in a set, you get a TypeError.  Your immediate
> reaction is to add "hash=True".  Things appear to work.  Then, you, or
> someone else, decides to mutate the dataclass object and then you are
> looking at a very frustrating debug session.

If I saw someone try to put a dataclass into a set, I'd point out that
dataclasses are *mutable*, and if they want immutable values they
should use "frozen=True". If it were me in that situation, that's what
I'd do as well. Adding hashability to a mutable object would *never*
be my immediate reaction.

To put it another way, using your words above, "The moment you want to
use a dataclass a a dict key, or put it in a set, you need it to be
*immutable*" (not hashable, unless you really know what you're doing).

Paul
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Eric V. Smith

On 2/2/2018 10:38 AM, Elvis Pranskevichus wrote:

On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:

However, I don't feel very strongly about this. As I've said, I expect
the use cases for hash=True to be very, very rare.


Why do you think that the requirement to make a dataclass hashable is a
"very, very rare" requirement?  The moment you want to use a dataclass a
a dict key, or put it in a set, you need it to be hashable.


I was specifically talking about the case of a non-frozen, hashable 
class. If you want to make a class frozen and hashable, then:


@dataclass(frozen=True)

will do just that.

The case you brought up initially is the non-frozen, hashable class. 
It's that case that I think is very rare. I'll ask again: what's your 
use case for wanting a non-frozen, hashable class? I'm genuinely interested.


You seem to think that hash=True means "make the class hashable". That's 
not true. It means "add a __hash__" to this class". There are other, 
better ways to make the class hashable, specifically frozen=True. You 
might want to read all of https://bugs.python.org/issue32513 for the 
background on the current behavior.



Just put yourself in the shoes of an average Python developer.  You try
to put a dataclass in a set, you get a TypeError.  Your immediate
reaction is to add "hash=True".  Things appear to work.  Then, you, or
someone else, decides to mutate the dataclass object and then you are
looking at a very frustrating debug session.


I will be documented that the correct way to do this is frozen=True.


In all, I think we're better off documenting best practices and making
them the default, like attrs does, and leave it to the programmer to
follow them. I realize we're handing out footguns


I don't think attrs doing the same thing is a valid justification.  This
is a major footgun that is very easy to trigger, and there's really no
precedent in standard data types.


the alternatives seem even more complex and are limiting.


The alternative is simple and follows the design of other standard
containers: immutable containers are hashable, mutable containers are
not.  @dataclass(frozen=False) gives you a SimpleNamespace-like and
@dataclass(frozen=True) gives you a namedtuple-like.  If you _really_
know what you are doing, then you can always declare an explicit
__hash__.


I'm not sure what you're arguing for here. This is how dataclasses work.


The problem with dropping hash=True is: how would you write __hash__
yourself?


Is "def __hash__(self): return hash((self.field1, self.field2))" that
hard?  It is explicit, and you made a concious choice, i.e you
understand how __hash__ works.


I didn't say it was hard, I said it needed to be kept up to date as you 
add fields. That is, you'd have to duplicate the field list. dataclasses 
is trying to prevent you from repeating the field list anywhere.



IMO, the danger of
"@dataclass(hash=True)" far overweighs whatever convenience it might
provide.


We'll just have to disagree about this. As I said, I don't feel very 
strongly about it, but I lean toward leaving it in and documenting it 
for what it is and does.


Eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Elvis Pranskevichus
On Friday, February 2, 2018 10:08:43 AM EST Eric V. Smith wrote:
> However, I don't feel very strongly about this. As I've said, I expect
> the use cases for hash=True to be very, very rare. 

Why do you think that the requirement to make a dataclass hashable is a 
"very, very rare" requirement?  The moment you want to use a dataclass a 
a dict key, or put it in a set, you need it to be hashable.  

Just put yourself in the shoes of an average Python developer.  You try 
to put a dataclass in a set, you get a TypeError.  Your immediate 
reaction is to add "hash=True".  Things appear to work.  Then, you, or 
someone else, decides to mutate the dataclass object and then you are 
looking at a very frustrating debug session.

> In all, I think we're better off documenting best practices and making 
> them the default, like attrs does, and leave it to the programmer to 
> follow them. I realize we're handing out footguns

I don't think attrs doing the same thing is a valid justification.  This 
is a major footgun that is very easy to trigger, and there's really no 
precedent in standard data types.

> the alternatives seem even more complex and are limiting.

The alternative is simple and follows the design of other standard 
containers: immutable containers are hashable, mutable containers are 
not.  @dataclass(frozen=False) gives you a SimpleNamespace-like and 
@dataclass(frozen=True) gives you a namedtuple-like.  If you _really_ 
know what you are doing, then you can always declare an explicit 
__hash__.  

> The problem with dropping hash=True is: how would you write __hash__ 
> yourself?

Is "def __hash__(self): return hash((self.field1, self.field2))" that 
hard?  It is explicit, and you made a concious choice, i.e you 
understand how __hash__ works.  IMO, the danger of 
"@dataclass(hash=True)" far overweighs whatever convenience it might 
provide.

   Elvis
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Eric V. Smith

On 2/2/2018 12:33 AM, Nick Coghlan wrote:

For  3.7, I think we should seriously considered just straight up
disallowing the "hash=True, frozen=False" combination, and instead
require folks to provide their own hash function in that case.
"Accidentally hashable" (whether by identity or field hash) isn't a
thing that data classes should be allowing to happen.

If we did that, then the public "hash" parameter could potentially be
dropped entirely for the time being - the replacement for "hash=True"
would be a "def __hash__: ..." in the body of the class definition,
and the replacement for "hash=False" would be "__hash__ = None" in the
class body.


attrs has the same behavior (if you ignore how dataclasses handles the 
cases where __hash__ or __eq__ already exist in the class definition). 
Here's what attrs says about adding __hash__ via hash=True:


"Although not recommended, you can decide for yourself and force attrs 
to create one (e.g. if the class is immutable even though you didn’t 
freeze it programmatically) by passing True or not. Both of these cases 
are rather special and should be used carefully."


The problem with dropping hash=True is: how would you write __hash__ 
yourself? It seems like a bug magnet if you're adding fields to the 
class and forget to update __hash__, especially in the presence of 
per-field hash=False and eq=False settings. And you'd need to make sure 
it matches the generated __eq__ (if 2 objects are equal, they need to 
have the same hash value).


If we're going to start disallowing things, how about the per-field 
hash=True, eq=False case?


However, I don't feel very strongly about this. As I've said, I expect 
the use cases for hash=True to be very, very rare. And now that we allow 
overriding __hash__ in the class body without setting hash=False, there 
aren't a lot of uses for hash=False, either. But we would need to think 
through how you'd get the behavior of hash=False with multiple 
inheritance, if that's what you wanted. Again, a very, very rare case.


In all, I think we're better off documenting best practices and making 
them the default, like attrs does, and leave it to the programmer to 
follow them. I realize we're handing out footguns, the alternatives seem 
even more complex and are limiting.


Eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-02 Thread Elvis Pranskevichus
On Friday, February 2, 2018 12:33:04 AM EST Nick Coghlan wrote:
> For  3.7, I think we should seriously considered just straight up
> disallowing the "hash=True, frozen=False" combination, and instead
> require folks to provide their own hash function in that case.
> "Accidentally hashable" (whether by identity or field hash) isn't a
> thing that data classes should be allowing to happen.
> 
> If we did that, then the public "hash" parameter could potentially
> be dropped entirely for the time being - the replacement for
> "hash=True" would be a "def __hash__: ..." in the body of the class
> definition, and the replacement for "hash=False" would be "__hash__
> = None" in the class body.

I think "frozen=True" should just imply hashability (by fields).  You 
can always do "__hash__ = None" to opt out.  I don't see the default 
hashability of an immutable object as a problem.  tuples and 
frozensets are hashable after all.

   Elvis


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Nick Coghlan
On 2 February 2018 at 11:49, Elvis Pranskevichus  wrote:
> In my experience this type of breakage is so subtle that people will
> happily write code lots of code like this without noticing.  My main
> objection here is that the dataclass does not go far enough to prevent
> obviously wrong behaviour.  Or it goes too far with the whole hash/
> frozen distinction.

For  3.7, I think we should seriously considered just straight up
disallowing the "hash=True, frozen=False" combination, and instead
require folks to provide their own hash function in that case.
"Accidentally hashable" (whether by identity or field hash) isn't a
thing that data classes should be allowing to happen.

If we did that, then the public "hash" parameter could potentially be
dropped entirely for the time being - the replacement for "hash=True"
would be a "def __hash__: ..." in the body of the class definition,
and the replacement for "hash=False" would be "__hash__ = None" in the
class body.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Elvis Pranskevichus
On Thursday, February 1, 2018 8:37:41 PM EST Eric V. Smith wrote:
> > hash=None and compare=True leads to the same result, which, I
> > think is even worse.
> 
> Have you actually tried that?

I meant this:

@dataclasses.dataclass(hash=True)
class A:
foo: int = dataclasses.field(compare=True)

> I don't recommend ever specifying the decorator hash= parameter
> unless you have an unusual use case, in which case it's on you to
> get it right. 

In my experience this type of breakage is so subtle that people will 
happily write code lots of code like this without noticing.  My main 
objection here is that the dataclass does not go far enough to prevent 
obviously wrong behaviour.  Or it goes too far with the whole hash/
frozen distinction.

 Elvis


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Eric V. Smith

On 2/1/2018 8:29 PM, Elvis Pranskevichus wrote:

On Thursday, February 1, 2018 8:21:03 PM EST Eric V. Smith wrote:

I should add: This is why you shouldn't override the default
(hash=None) unless you know what the consequences are. Can I ask
why you want to specify hash=True?


hash=None and compare=True leads to the same result, which, I think is
even worse.


Have you actually tried that?

>>> @dataclass(hash=None)
... class A:
...   foo: int = field(hash=True, compare=True)
...
>>> hash(A(2))
Traceback (most recent call last):
  File "", line 1, in 
TypeError: unhashable type: 'A'

I believe the default hash=None on the class decorator does right thing. 
Please provide a counter-example.


>> You're allowed to override the parameters to dataclasses.dataclass
>> for cases where you know what you're doing. Consenting adults,
>> and all.
>
> I don't agree with this.  You are comparing implicit dataclass
> behavior with an explicit shoot-in-the-foot __hash__() definition.

I don't recommend ever specifying the decorator hash= parameter unless 
you have an unusual use case, in which case it's on you to get it right. 
There was recently a long python-dev discussion on this issue. I need to 
update the PEP to reflect it, but the advice still stands: you almost 
always want to use the default hash=None.


Do you have a use case for specifying a hash function on a class with 
mutable instances? Maybe you want frozen=True?


Eric
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Elvis Pranskevichus
On Thursday, February 1, 2018 8:21:03 PM EST Eric V. Smith wrote:
> I should add: This is why you shouldn't override the default
> (hash=None) unless you know what the consequences are. Can I ask
> why you want to specify hash=True?

hash=None and compare=True leads to the same result, which, I think is 
even worse.  

> You're allowed to override the parameters to dataclasses.dataclass
> for cases where you know what you're doing. Consenting adults,
> and all.

I don't agree with this.  You are comparing implicit dataclass 
behavior with an explicit shoot-in-the-foot __hash__() definition.

Elvis


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Eric V. Smith

On 2/1/2018 8:17 PM, Eric V. Smith wrote:

On 2/1/2018 7:34 PM, Elvis Pranskevichus wrote:

There appears to be a critical omission from the current dataclass
implementation: it does not make hash=True fields immutable.

Per Python spec:

"the implementation of hashable collections requires that a key’s hash
value is immutable (if the object’s hash value changes, it will be in
the wrong hash bucket)"

Yet:

 import dataclasses

 @dataclasses.dataclass(hash=True)
 class A:
 foo: int = dataclasses.field(hash=True, compare=True)

 a = A(foo=1)

 s = set()
 s.add(a)   # s == {a}
 a.foo = 2

 print(a in s)
 print({a} == s}
 print(s == s)

 # prints False False True


This looks to me like a clearly wrong behavior.


 Elvis


Data classes do not protect you from doing the wrong thing. This is the 
same as writing:


class A:
     def __init__(self, foo):
     self.foo = foo
     def __hash__(self):
     return hash((self.foo,))

You're allowed to override the parameters to dataclasses.dataclass for 
cases where you know what you're doing. Consenting adults, and all.


I should add: This is why you shouldn't override the default (hash=None) 
unless you know what the consequences are. Can I ask why you want to 
specify hash=True?


Eric

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Eric V. Smith

On 2/1/2018 7:34 PM, Elvis Pranskevichus wrote:

There appears to be a critical omission from the current dataclass
implementation: it does not make hash=True fields immutable.

Per Python spec:

"the implementation of hashable collections requires that a key’s hash
value is immutable (if the object’s hash value changes, it will be in
the wrong hash bucket)"

Yet:

 import dataclasses

 @dataclasses.dataclass(hash=True)
 class A:
 foo: int = dataclasses.field(hash=True, compare=True)

 a = A(foo=1)

 s = set()
 s.add(a)   # s == {a}
 a.foo = 2

 print(a in s)
 print({a} == s}
 print(s == s)

 # prints False False True


This looks to me like a clearly wrong behavior.


 Elvis


Data classes do not protect you from doing the wrong thing. This is the 
same as writing:


class A:
def __init__(self, foo):
self.foo = foo
def __hash__(self):
return hash((self.foo,))

You're allowed to override the parameters to dataclasses.dataclass for 
cases where you know what you're doing. Consenting adults, and all.


Eric.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Dataclasses and correct hashability

2018-02-01 Thread Elvis Pranskevichus
There appears to be a critical omission from the current dataclass 
implementation: it does not make hash=True fields immutable.  

Per Python spec: 

"the implementation of hashable collections requires that a key’s hash 
value is immutable (if the object’s hash value changes, it will be in 
the wrong hash bucket)"

Yet:

import dataclasses

@dataclasses.dataclass(hash=True)
class A:
foo: int = dataclasses.field(hash=True, compare=True)

a = A(foo=1)

s = set()
s.add(a)   # s == {a}
a.foo = 2

print(a in s)
print({a} == s}
print(s == s)

# prints False False True


This looks to me like a clearly wrong behavior.


Elvis


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com