Re: [Python-Dev] Immutability vs. hashability

2018-02-12 Thread Chris Barker
On Mon, Feb 5, 2018 at 5:17 PM, Steven D'Aprano  wrote:

> I'm not happy about the concept of pandering to the least capable, most
> ignorant programmers by baking a miscomprehension into an important
> standard library API.


I don't think this is "baking a miscomprehension", but rather adhering to
the principle of least surprise.


> (Things would be different if we just outright banned mutable+hashable,
> but I don't think anyone wants that.)
>

yeah, though I'm still not sure what the hashable mutable dataclass use
case is.

> Fortunately, I also believe that the number of programmers who would
> fail to draw the right conclusion from the existence of separate
> switches will actually be pretty small in practice. The fact that there
> are two separate switches is a pretty big clue that mutability and
> hashability can be controlled separately.
>

well, yes, but as we all know, more people read code than write it -- so we
should be concerned about what a mid-level programmer will conclude when
s/he sees:

@dataclass(frozen=True)

or

@dataclass(hash=True)

It may not be clear that the other option is available, or what its
default is.


> I believe that the proposed API is much simpler to understand than your
> revision. We have:
>
> - frozen and hash both default to False;
> - if you explicitly set one, the other uses the default.
>
> This corresponds to a very common, Pythonic pattern that nearly
> everyone is familiar with:
>
> def spam(frozen=False, hash=False):
>     ...
>
> which is easy to understand and easy to explain. Versus your proposal:
>
> - if you set neither, then frozen and hash both default to False;
> - but if you explicitly set one, the other uses True, namely the
>   opposite of the standard default.
>
> which corresponds to something harder to describe and much less common:
>
> def spam(frozen=None, hash=None):
>     if frozen is hash is None:
>         frozen = hash = False
>     elif frozen is None:
>         frozen = True
>     elif hash is None:
>         hash = True
>     ...
>
> "frozen and hash default to True, unless neither are set, in which case
> they default to False."
>

I agree that it's easier to document, but we all know no one reads
documentation anyway.

My argument (which may not be correct) is based on the idea that
hash=True, frozen=False is the rare, specialized use case -- the folks that
need that will figure it out.

> (1) I set frozen=True thinking that makes the class hashable so I can
> use it in a set or hash. The first time I actually do so, I get an
> explicit and obvious TypeError. Problem solved.[1]
>
> (2) I set hash=True thinking that makes the class frozen. This scenario
> is more problematic, because there's no explicit and obvious error when
> I get it wrong. Instead, my program could silently do the wrong thing if
> my instances are quietly mutated.
>
> The first error is self-correcting, and so I believe that the second is
> the only one we should worry about.


fair enough.


> - how much should we worry? (how often will this happen?);
>

I probably wouldn't be worried at all, except that the word "frozen" is
used in the built-in frozenset, with this documentation:

"""
The frozenset type is immutable and hashable — its contents cannot be
altered after it is created; it can therefore be used as a dictionary key
or as an element of another set.
"""

and frozenset pops up pretty high in google if you search "frozen python
frozen" (second to the topic of frozen binaries...)


> I don't think this is a failure mode that we need to be concerned with.
> We can't protect everyone from everything.
>

I suppose you're right -- I think where we disagree is how confusing my
proposal is -- I think the common, less error-prone use cases should be easy
and obvious to do, and the uncommon, more "expert" use cases should be
possible and clearly documented when the code is read.

But it's not that big a deal -- I'm done -- thanks for specifically
addressing the issues I brought up.

-CHB



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Immutability vs. hashability

2018-02-12 Thread Chris Barker
I don't seem to be getting my own messages back to reply to, but:

yes, of course, dataclasses won't hash if a field is mutable:


In [58]: @dataclasses.dataclass(hash=True, frozen=True)
    ...: class Hash:
    ...:     x: int = 5
    ...:     l: list = dataclasses.field(default_factory=list)
    ...:

In [59]: h = Hash()

In [60]: hash(h)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-60-...> in <module>()
----> 1 hash(h)

/Users/chris.barker/miniconda2/envs/py3/lib/python3.6/site-packages/dataclasses.py in __hash__(self)

TypeError: unhashable type: 'list'


And the hash does depend on the values of the fields. So you can REALLY
ignore my previous note.
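
A minimal sketch of the same check against the released stdlib dataclasses (Python 3.7+), where the hash=True spelling used in this thread became unsafe_hash=True. The class AllHashable and the helper try_hash are hypothetical names added for illustration; they are not from the session above.

```python
import dataclasses


# Released spelling: the thread's hash=True is unsafe_hash=True in the stdlib.
@dataclasses.dataclass(unsafe_hash=True, frozen=True)
class Hash:
    x: int = 5
    l: list = dataclasses.field(default_factory=list)


# Hypothetical companion class whose fields are all hashable.
@dataclasses.dataclass(unsafe_hash=True, frozen=True)
class AllHashable:
    x: int = 5
    t: tuple = ()


def try_hash(obj):
    """Return hash(obj), or the TypeError message if hashing fails."""
    try:
        return hash(obj)
    except TypeError as exc:
        return str(exc)


# A list-valued field makes this particular instance unhashable.
list_field_result = try_hash(Hash())

# Instances with equal field values produce equal hashes.
same_values_equal = hash(AllHashable(1, (2,))) == hash(AllHashable(1, (2,)))

print(list_field_result)
print(same_values_equal)
```

The generated __hash__ is computed from the field values, so the frozen flag alone does not decide whether a given instance can be hashed.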

Again, sorry for the noise.

-CHB



On Mon, Feb 5, 2018 at 3:37 PM, Steven D'Aprano  wrote:

> On Sun, Feb 04, 2018 at 09:18:25PM -0800, Guido van Rossum wrote:
>
> > The way I think of it generally is that immutability is a property of
> > types, while hashability is a property of values.
>
> That's a great way to look at it, thanks.
>
>
> --
> Steve





Re: [Python-Dev] Immutability vs. hashability

2018-02-12 Thread Chris Barker
On Mon, Feb 5, 2018 at 3:37 PM, Steven D'Aprano  wrote:

> On Sun, Feb 04, 2018 at 09:18:25PM -0800, Guido van Rossum wrote:
>
> > The way I think of it generally is that immutability is a property of
> > types, while hashability is a property of values.
>
> That's a great way to look at it, thanks.
>

hmm -- maybe we should get a ValueError then when you try to use a
non-hashable value?

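In [9]: t = ([1,2,3],)

In [10]: set(t)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-...> in <module>()
----> 1 set(t)

TypeError: unhashable type: 'list'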


Of course, in this case, the error is triggered by the type of the zeroth
element of the tuple, not by the value of the tuple per se.

Which means that hashability really is a property of type -- but container
types require a specific way of thinking -- hashability is not determined
by the type of the container (or may not be), but by the types of its
contents. Is that the value of the container?

So maybe: an object is hashable if it is a hashable type and, if it is a
container, its contents are hashable types.
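
A small sketch of that rule, using nothing but the builtin hash(). The helper name is_hashable is hypothetical; it simply asks the interpreter, which recurses into tuple and frozenset contents on its own.

```python
def is_hashable(obj):
    """Report whether hash(obj) succeeds for this particular value."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Same container type (tuple) in most rows; only the contents differ.
results = {
    "(1, 2, 3)": is_hashable((1, 2, 3)),
    "([1, 2, 3],)": is_hashable(([1, 2, 3],)),
    "((1, 2), frozenset({3}))": is_hashable(((1, 2), frozenset({3}))),
    "((1, [2]),)": is_hashable(((1, [2]),)),
    "[1, 2, 3]": is_hashable([1, 2, 3]),
}

for label, ok in results.items():
    print(f"{label:28} hashable={ok}")
```

A list is unhashable no matter what it holds, while a tuple is hashable only when everything inside it is, at any nesting depth.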


With dataclasses as they stand -- it seems the values of the fields do
not affect hashability:

(this is version 0.4 from PyPI -- disregard if it's out of date)

Unhashable by default:

In [14]: @dataclasses.dataclass()
    ...: class NoHash:
    ...:     x = 5
    ...:     l = [1,2,3]
    ...:

In [15]: set((nh,))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-...> in <module>()
----> 1 set((nh,))

TypeError: unhashable type: 'NoHash'



OK, that's what we expect.



But then if it is hashable:

In [19]: @dataclasses.dataclass(hash=True)
    ...: class Hash:
    ...:     x = 5
    ...:     l = [1,2,3]
    ...:

In [20]: h = Hash()

In [21]: set((h,))
Out[21]: {Hash()}



All works, regardless of the values of the fields

I haven't looked at the code -- but it appears the hash has nothing to do
with the values of the fields:

In [23]: hash(h)
Out[23]: 3527539

In [24]: h.l.append(6)

In [25]: hash(h)
Out[25]: 3527539

In [26]: h.x = 7

In [27]: hash(h)
Out[27]: 3527539

and it looks like all instances hash the same:

In [31]: h2 = Hash()

In [32]: hash(h2)
Out[32]: 3527539

In [33]: hash(h)
Out[33]: 3527539

So I'm wondering how hashability is useful at all?

But it sure looks like there's a lot of room for confusion and error, even
if it's a frozen dataclass.

This may be a case where we need to really make sure the docs are good!
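
One likely explanation for the constant hash, sketched against the released stdlib dataclasses (Python 3.7+, where hash=True is spelled unsafe_hash=True): attributes written without a type annotation are plain class attributes, not dataclass fields, so nothing feeds the generated __hash__. That would also fit the empty repr `Hash()` in the session above. Whether version 0.4 computed the hash exactly the same way is an assumption here; the class names Unannotated and Annotated are hypothetical.

```python
import dataclasses


@dataclasses.dataclass(unsafe_hash=True)
class Unannotated:
    x = 5            # no annotation: a class attribute, not a field
    l = [1, 2, 3]    # likewise, so the mutable default is never checked


@dataclasses.dataclass(unsafe_hash=True)
class Annotated:
    x: int = 5       # annotated: these are real fields
    y: int = 0


a, b = Unannotated(), Unannotated()
a.x = 7              # shadows the class attribute on this instance
a.l.append(6)        # mutates the list shared through the class

no_fields = dataclasses.fields(Unannotated)   # no fields were collected
same_hash = hash(a) == hash(b)                # nothing to distinguish them
field_names = [f.name for f in dataclasses.fields(Annotated)]

print(no_fields, same_hash, field_names)
print(repr(a))       # the repr lists no fields either
```

With annotations in place, the hash is derived from the field values.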

-CHB





Re: [Python-Dev] Immutability vs. hashability

2018-02-05 Thread Steven D'Aprano
On Mon, Feb 05, 2018 at 12:09:52AM -0600, Chris Barker wrote:

> But a bit more detail -- I'm commenting on the API, not the capability -
> that is, since users often equate hashability and immutability, they will
> expect that if they say hash=True, they will get an immutable, and if they
> say frozen=True, they will get something hashable (as long as the fields
> are hashable, just like a tuple).
> 
> That is, even though these concepts are independent, the defaults shouldn't
> reflect that.

I'm not happy about the concept of pandering to the least capable, most 
ignorant programmers by baking a miscomprehension into an important 
standard library API. The fact is that mutability and hashability ARE 
independent qualities, and the API ought to reflect reality, not 
ignorance. That's why there are two separate switches, frozen and hash, 
not just one "frozen_hashable" switch.

(Things would be different if we just outright banned mutable+hashable, 
but I don't think anyone wants that.)

Fortunately, I also believe that the number of programmers who would 
fail to draw the right conclusion from the existence of separate 
switches will actually be pretty small in practice. The fact that there 
are two separate switches is a pretty big clue that mutability and 
hashability can be controlled separately.

I believe that the proposed API is much simpler to understand than your 
revision. We have:

- frozen and hash both default to False;
- if you explicitly set one, the other uses the default.

This corresponds to a very common, Pythonic pattern that nearly 
everyone is familiar with:

def spam(frozen=False, hash=False):
    ...

which is easy to understand and easy to explain. Versus your proposal:

- if you set neither, then frozen and hash both default to False;
- but if you explicitly set one, the other uses True, namely the 
  opposite of the standard default.

which corresponds to something harder to describe and much less common:

def spam(frozen=None, hash=None):
    if frozen is hash is None:
        frozen = hash = False
    elif frozen is None:
        frozen = True
    elif hash is None:
        hash = True
    ...

"frozen and hash default to True, unless neither are set, in which case 
they default to False."
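
To make the difference concrete, here is a runnable sketch of both signatures that returns the effective flags for a few calls. The function names simple_defaults and linked_defaults are hypothetical; the branching mirrors the two snippets above and is not taken from any dataclasses implementation.

```python
def simple_defaults(frozen=False, hash=False):
    """Each flag keeps its own default, independent of the other."""
    return frozen, hash


def linked_defaults(frozen=None, hash=None):
    """Setting one flag switches the other flag's default to True."""
    if frozen is None and hash is None:
        frozen = hash = False
    elif frozen is None:
        frozen = True
    elif hash is None:
        hash = True
    return frozen, hash


calls = [{}, {"frozen": True}, {"hash": True}, {"frozen": False, "hash": True}]

# Each row: (keyword arguments, simple result, linked result),
# where a result is the effective (frozen, hash) pair.
table = [(kw, simple_defaults(**kw), linked_defaults(**kw)) for kw in calls]

for kw, simple, linked in table:
    print(f"{kw!s:34} simple={simple!s:15} linked={linked}")
```

The two schemes agree when neither flag or both flags are given; they differ only when exactly one is set.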


Let's look at the two possible scenarios you are worried about:

(1) I set frozen=True thinking that makes the class hashable so I can 
use it in a set or hash. The first time I actually do so, I get an 
explicit and obvious TypeError. Problem solved.[1]

(2) I set hash=True thinking that makes the class frozen. This scenario 
is more problematic, because there's no explicit and obvious error when 
I get it wrong. Instead, my program could silently do the wrong thing if 
my instances are quietly mutated.
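
A minimal sketch of scenario (2), assuming the released stdlib dataclasses (Python 3.7+), where the hash=True switch discussed here is spelled unsafe_hash=True. The class name Point is hypothetical.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)   # hashable, but NOT frozen
class Point:
    x: int
    y: int


p = Point(1, 2)
seen = {p}

found_before = p in seen       # the hash matches the stored entry
p.x = 99                       # allowed, because frozen defaults to False
found_after = p in seen        # the hash changed, so the lookup misses

# The object is still physically in the set, just unreachable by lookup.
still_stored = any(q is p for q in seen)

print(found_before, found_after, still_stored)
```

No exception is raised at any point, which is what makes this the scenario that fails silently.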

The first error is self-correcting, and so I believe that the second is 
the only one we should worry about. There are two questions:

- how much should we worry? (how often will this happen?);

- what do we do about it?

I think the answers ought to be, not much and nothing. Or *at most*, 
raise a *warning* when hash=True is set without also explicitly setting 
frozen. But even that seems unnecessary to me.
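
The "at most a warning" option could look roughly like the following. This is a hypothetical wrapper, not anything the dataclasses module provides: checked_dataclass is an invented name, and it uses the released spelling unsafe_hash=True (Python 3.7+) in place of the hash=True discussed here.

```python
import warnings
from dataclasses import dataclass


def checked_dataclass(*, frozen=None, unsafe_hash=False, **kwargs):
    """Hypothetical wrapper: warn if a hash is requested but frozen is unset."""
    if unsafe_hash and frozen is None:
        warnings.warn(
            "unsafe_hash=True without an explicit frozen=...; "
            "instances will be hashable but mutable",
            stacklevel=2,
        )

    def wrap(cls):
        # frozen=None means "not specified", which maps to the real default.
        return dataclass(cls, frozen=bool(frozen), unsafe_hash=unsafe_hash, **kwargs)

    return wrap


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @checked_dataclass(unsafe_hash=True)                # warns
    class Loose:
        x: int

    @checked_dataclass(unsafe_hash=True, frozen=False)  # explicit, so silent
    class Deliberate:
        x: int

    @checked_dataclass(unsafe_hash=True, frozen=True)   # silent
    class Strict:
        x: int

warning_count = len(caught)
print(warning_count)
```

Using None as the "not specified" sentinel lets an explicit frozen=False suppress the warning, so the deliberate mutable-and-hashable case stays available.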

I think that the intersection of events needed for this to be a real 
problem will be fairly small:

- people using DataClasses;
- who want a frozen, hashable class;
- and believe that the two are equivalent;
- and who weren't clued in by the existence of separate switches;
- and set hash=True without frozen=True;
- and don't write a unit test to confirm that their data is immutable;
- and accidentally mutate an instance which they thought was immutable;
- in such a way as to cause a silent failure.

I don't think this is a failure mode that we need to be concerned with. 
We can't protect everyone from everything.



[1] Yes, I'm glossing over the possible annoyance if not difficulty of 
actually solving the problem: somebody has to raise a bug report, 
someone has to fix the bug which in principle could involve a lot of 
disruption, there should be regression tests and maybe a new release of 
the application, etc. But this is par for the course for *any* bug -- 
there's no need to imagine that this specific bug is so terrible that 
the standard library needs to protect programmers from the possibility 
of ordinary, run-of-the-mill bugs.

-- 
Steve


Re: [Python-Dev] Immutability vs. hashability

2018-02-05 Thread Steven D'Aprano
On Sun, Feb 04, 2018 at 09:18:25PM -0800, Guido van Rossum wrote:

> The way I think of it generally is that immutability is a property of
> types, while hashability is a property of values.

That's a great way to look at it, thanks.


-- 
Steve


Re: [Python-Dev] Immutability vs. hashability

2018-02-05 Thread Antoine Pitrou
On Sun, 4 Feb 2018 14:31:06 -0800
Guido van Rossum  wrote:
> On Sun, Feb 4, 2018 at 11:59 AM, Chris Barker - NOAA Federal <
> chris.bar...@noaa.gov> wrote:  
> 
> > I think the folks that are concerned about this issue are quite right
> > — most Python users equate immutable and hashable—so the dataclass API
> > should reflect that.
> >  
> 
> Since they are *not* equivalent (consider a tuple containing a list) I'm
> not at all convinced that any API in the core language should "reflect"
> this misconception, depending on how you meant that.

+1 from me.

Regards

Antoine.




Re: [Python-Dev] Immutability vs. hashability

2018-02-04 Thread Chris Barker
On Sun, Feb 4, 2018 at 7:54 PM, Nick Coghlan  wrote:

> On 5 February 2018 at 08:31, Guido van Rossum  wrote:
> > On Sun, Feb 4, 2018 at 11:59 AM, Chris Barker - NOAA Federal
> >  wrote:
> >>
> >> I think the folks that are concerned about this issue are quite right
> >> — most Python users equate immutable and hashable—so the dataclass API
> >> should reflect that.
> >
> > Since they are *not* equivalent (consider a tuple containing a list) I'm not
> > at all convinced that any API in the core language should "reflect" this
> > misconception, depending on how you meant that.
>
> Lists are themselves mutable, and hence inherently unhashable.
>
> Tuples are themselves immutable, and hence hashable if their contents are.
>
> I interpret Chris's comment as saying that data classes should behave
> the same way that the builtin container types do:
>

pretty much, yes,

But a bit more detail -- I'm commenting on the API, not the capability -
that is, since users often equate hashability and immutability, they will
expect that if they say hash=True, they will get an immutable, and if they
say frozen=True, they will get something hashable (as long as the fields
are hashable, just like a tuple).

That is, even though these concepts are independent, the defaults shouldn't
reflect that.

> It's the ability to ask the interpreter to guess what you mean
> "frozen=False, hash=True" that creates the likelihood of confusion.
>

Actually, I think if the user does explicitly specify: "frozen=False,
hash=True", then that's what they should get, and it's a pretty fragile
beast, but apparently there's enough of a use case for folks to want it,
and I don't think it's a confusing API.

-CHB



Re: [Python-Dev] Immutability vs. hashability

2018-02-04 Thread Guido van Rossum
That's a lot to read between the lines. I was unhappy that Chris took the
statement that immutability and hashability are equivalent, claimed that
most people think of it that way, and did not point out that it was false,
thereby giving the impression that he wasn't aware of the difference.

The way I think of it generally is that immutability is a property of
types, while hashability is a property of values.
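
One way to see the distinction is to compare a type-level check with a value-level one. This is a sketch only; the helper name value_is_hashable is hypothetical.

```python
from collections.abc import Hashable

good = (1, 2, 3)
bad = ([1, 2, 3],)

# Type-level view: tuple defines __hash__, so both values pass the ABC check.
type_says = (isinstance(good, Hashable), isinstance(bad, Hashable))


def value_is_hashable(obj):
    """Value-level view: only calling hash() gives the real answer."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


value_says = (value_is_hashable(good), value_is_hashable(bad))

print(type_says)
print(value_says)
```

Both values share one immutable type, yet only one of them can actually be hashed.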

I don't want the original debate (about what to do with hash=True for
dataclasses) to be spread across multiple threads so I'll reply separately
there.

On Sun, Feb 4, 2018 at 5:54 PM, Nick Coghlan  wrote:

> On 5 February 2018 at 08:31, Guido van Rossum  wrote:
> > On Sun, Feb 4, 2018 at 11:59 AM, Chris Barker - NOAA Federal
> >  wrote:
> >>
> >> I think the folks that are concerned about this issue are quite right
> >> — most Python users equate immutable and hashable—so the dataclass API
> >> should reflect that.
> >
> > Since they are *not* equivalent (consider a tuple containing a list) I'm not
> > at all convinced that any API in the core language should "reflect" this
> > misconception, depending on how you meant that.
>
> Lists are themselves mutable, and hence inherently unhashable.
>
> Tuples are themselves immutable, and hence hashable if their contents are.
>
> I interpret Chris's comment as saying that data classes should behave
> the same way that the builtin container types do:
>
> * if the data class itself is mutable (frozen=False, comparable to
> list, dict, set), then it is *not* hashable (unless you explicitly
> implement __hash__)
>
> * if the data class itself is immutable (frozen=True, comparable to
> tuple or frozenset), then whether or not it is hashable depends on
> whether or not the field values are hashable.
>
> It's the ability to ask the interpreter to guess what you mean
> "frozen=False, hash=True" that creates the likelihood of confusion.
>
> Whereas if we leave out the "hash=True" option entirely, then the most
> natural way to obtain a partially-mutable record, which has a fixed
> comparison key and selectively mutable state, would be through
> containment, where the mutable state is moved out to a subrecord that
> gets excluded from hashes and comparisons.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Immutability vs. hashability

2018-02-04 Thread Nick Coghlan
On 5 February 2018 at 08:31, Guido van Rossum  wrote:
> On Sun, Feb 4, 2018 at 11:59 AM, Chris Barker - NOAA Federal
>  wrote:
>>
>> I think the folks that are concerned about this issue are quite right
>> — most Python users equate immutable and hashable—so the dataclass API
>> should reflect that.
>
> Since they are *not* equivalent (consider a tuple containing a list) I'm not
> at all convinced that any API in the core language should "reflect" this
> misconception, depending on how you meant that.

Lists are themselves mutable, and hence inherently unhashable.

Tuples are themselves immutable, and hence hashable if their contents are.

I interpret Chris's comment as saying that data classes should behave
the same way that the builtin container types do:

* if the data class itself is mutable (frozen=False, comparable to
list, dict, set), then it is *not* hashable (unless you explicitly
implement __hash__)

* if the data class itself is immutable (frozen=True, comparable to
tuple or frozenset), then whether or not it is hashable depends on
whether or not the field values are hashable.
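
For comparison, the released stdlib dataclasses (Python 3.7+) behave much like these two rules when eq is left at its default of True. A minimal sketch, with hypothetical class names and a hypothetical is_hashable helper:

```python
from dataclasses import dataclass, field


@dataclass                   # mutable with eq=True: __hash__ is set to None
class MutableRecord:
    x: int = 0


@dataclass(frozen=True)      # immutable: __hash__ is generated from the fields
class FrozenRecord:
    x: int = 0
    items: tuple = ()


@dataclass(frozen=True)      # immutable type, but it holds an unhashable value
class FrozenWithList:
    items: list = field(default_factory=list)


def is_hashable(obj):
    """Report whether hash(obj) succeeds for this particular value."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


outcomes = {
    "mutable": is_hashable(MutableRecord()),
    "frozen, hashable fields": is_hashable(FrozenRecord(1, (2, 3))),
    "frozen, list field": is_hashable(FrozenWithList([1])),
}

for label, ok in outcomes.items():
    print(f"{label:26} hashable={ok}")
```

The frozen class mirrors tuple: the type permits hashing, and each instance is hashable only if its field values are.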

It's the ability to ask the interpreter to guess what you mean
"frozen=False, hash=True" that creates the likelihood of confusion.

Whereas if we leave out the "hash=True" option entirely, then the most
natural way to obtain a partially-mutable record, which has a fixed
comparison key and selectively mutable state, would be through
containment, where the mutable state is moved out to a subrecord that
gets excluded from hashes and comparisons.
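
A sketch of that containment pattern using the released stdlib dataclasses (Python 3.7+). The class names Counters and Account are hypothetical; field(compare=False) is the real mechanism, and by default it also keeps the field out of the generated hash.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass
class Counters:
    """Mutable sub-record (hypothetical name, for illustration)."""
    hits: int = 0


@dataclass(frozen=True)
class Account:
    account_id: int          # the fixed comparison key
    # Excluded from __eq__ and, since hash follows compare by default,
    # from __hash__ as well.
    state: Counters = field(default_factory=Counters, compare=False)


a = Account(42)
index = {a: "first"}

a.state.hits += 1                        # mutate the contained sub-record
still_found = index[a] == "first"        # the hash depends only on account_id
equal_despite_state = a == Account(42)   # state is ignored when comparing

try:
    a.account_id = 7                     # the key itself stays frozen
    key_is_frozen = False
except FrozenInstanceError:
    key_is_frozen = True

print(still_found, equal_despite_state, key_is_frozen)
```

The outer record keeps a stable hash while the selectively mutable state lives in an object that hashing and comparison never look at.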

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] Immutability vs. hashability

2018-02-04 Thread Guido van Rossum
On Sun, Feb 4, 2018 at 11:59 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:

> I think the folks that are concerned about this issue are quite right
> — most Python users equate immutable and hashable—so the dataclass API
> should reflect that.
>

Since they are *not* equivalent (consider a tuple containing a list) I'm
not at all convinced that any API in the core language should "reflect"
this misconception, depending on how you meant that.

-- 
--Guido van Rossum (python.org/~guido)