Re: Planning a Python Course for Beginners

2017-08-11 Thread Ned Batchelder
On 8/11/17 6:37 AM, Python wrote:
>  Marko Rauhamaa wrote:
>> Python :
>>
>>> Marko Rauhamaa wrote:
>>> I didn't disagree with any of these statements about __hash__, but only
>>> your statement about id and __eq__:
>>>
>>>> id() is actually an ideal return value of __hash__(). The only
>>>> criterion is that the returned number should be different if the
>>>> __eq__() is False. That is definitely true for id()
>>>
>>> nan is a clear, simple, undeniable counterexample to that claim.
>>
>> Still, I don't see the point you are trying to make.
>
> You do have a cognitive disease, don't you?
>
>
>

Maybe it's time to just drop it, rather than starting to insult each other?

--Ned.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Planning a Python Course for Beginners

2017-08-11 Thread Python

Marko Rauhamaa wrote:
> Python :
>
>> Marko Rauhamaa wrote:
>> I didn't disagree with any of these statements about __hash__, but only
>> your statement about id and __eq__:
>>
>>> id() is actually an ideal return value of __hash__(). The only
>>> criterion is that the returned number should be different if the
>>> __eq__() is False. That is definitely true for id()
>>
>> nan is a clear, simple, undeniable counterexample to that claim.
>
> Still, I don't see the point you are trying to make.

You do have a cognitive disease, don't you?





Re: Planning a Python Course for Beginners

2017-08-10 Thread Gregory Ewing

Marko Rauhamaa wrote:

> Of course, some algorithms can (and, we have learned, do) prefer some
> bits over others, but that's inside the implementation black box. I
> would think every bit should carry an approximately equal weight.


Ideally that would be true, but you need to consider the performance
cost of making it so. Dict could go to the trouble of thoroughly
scrambling the hash bits before even making the first probe, but
that would slow down *every* dict lookup.

The way things are, it uses a very simple technique for the first
probe that *usually* gives good results, which speeds things up
overall.

--
Greg


Re: Planning a Python Course for Beginners

2017-08-10 Thread Gregory Ewing

Steve D'Aprano wrote:

> On Thu, 10 Aug 2017 07:00 pm, Peter Otten wrote:
>
>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>    excessive hash collisions for dicts and sets */
>
> which I think agrees with my comment: using the id() itself would put too many
> objects in the same bucket (i.e. too many collisions).


I suspect this is more of a minor performance tweak than a vital
issue. Otherwise it would mean that dict's algorithm for
assigning items to buckets based on the hash isn't all that
great.

--
Greg


Re: Planning a Python Course for Beginners

2017-08-10 Thread Gregory Ewing

Python wrote:

> Marko Rauhamaa wrote:
>
>> id() is actually an ideal return value of __hash__(). The only criterion
>> is that the returned number should be different if the __eq__() is
>> False. That is definitely true for id()
>
> nan is a clear, simple, undeniable counterexample to that claim.


It's a counterexample to the claim that id() *must* be different if
__eq__() is False, but that's not the claim that was made. The
claim was that it *should* be different, which allows for the
possibility that it might not be different.

(I'll put away my hairsplitting axe and go away now.)

--
Greg


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Chris Angelico :

> On Fri, Aug 11, 2017 at 7:17 AM, Marko Rauhamaa  wrote:
>> That's interesting, but suggests there's something weird (~ suboptimal)
>> going on with CPython's scrambling algorithm. Also, your performance
>> test might yield completely different results on other Python
>> implementations.
>>
>> Apart from uniqueness, there's no particular reason to prefer one
>> __hash__() value over another as long as the interesting bits are inside
>> the CPU's simple integer range.
>
> Not true. Every time you probe a new location [1], you have to fetch
> more data from RAM. That delays you significantly. An ideal hashing
> system is going to give a high probability of giving you an empty
> bucket on the first try, to minimize the number of main memory
> accesses required.
>
> CPython's scrambling algorithm means that, even when its first try
> doesn't succeed, there's a good chance that its second will succeed.
> But that doesn't change the fact that you want the first one to
> succeed as often as possible.

What does all that have to do with where the unique bits are in the hash
value?

  0x1000
  0x2000
  0x3000
  0x4000

should be no worse as __hash__() values than

  0x1000
  0x2000
  0x3000
  0x4000

or

  1
  2
  3
  4

Of course, some algorithms can (and, we have learned, do) prefer some
bits over others, but that's inside the implementation black box. I
would think every bit should carry an approximately equal weight.


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Chris Angelico
On Fri, Aug 11, 2017 at 7:17 AM, Marko Rauhamaa  wrote:
> That's interesting, but suggests there's something weird (~ suboptimal)
> going on with CPython's scrambling algorithm. Also, your performance
> test might yield completely different results on other Python
> implementations.
>
> Apart from uniqueness, there's no particular reason to prefer one
> __hash__() value over another as long as the interesting bits are inside
> the CPU's simple integer range.

Not true. Every time you probe a new location [1], you have to fetch
more data from RAM. That delays you significantly. An ideal hashing
system is going to give a high probability of giving you an empty
bucket on the first try, to minimize the number of main memory
accesses required.

CPython's scrambling algorithm means that, even when its first try
doesn't succeed, there's a good chance that its second will succeed.
But that doesn't change the fact that you want the first one to
succeed as often as possible.

ChrisA

[1] Unless it's within the same cache line, which is eight pointers
wide on my CPU. Highly unlikely when working with 100,000 pointers.


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Chris Angelico :

> I'm aware of this. Doesn't change the fact that the *INITIAL INDEX* is
> based on exactly what I said.
>
> Yaknow?

What you're saying is that CPython heavily prefers the low-order bits to
be unique performance-wise. I don't know why that particular heuristic
bias was chosen.


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Peter Otten <__pete...@web.de>:

> Marko Rauhamaa wrote:
>> I see no point in CPython's rotation magic.
>
> Let's see:
>
> $ cat hashperf.py
> class A(object):
>     __slots__ = ["_hash"]
>
>     def __hash__(self):
>         return self._hash
>
> def no_magic():
>     a = A()
>     a._hash = id(a)
>     return a
>
> def magic():
>     a = A()
>     a._hash = id(a) >> 4
>     return a
>
> $ python3 -m timeit -s 'from hashperf import magic, no_magic; s = {no_magic() for _ in range(10**5)}' 'for x in s: x in s'
> 10 loops, best of 3: 70.7 msec per loop
>
> $ python3 -m timeit -s 'from hashperf import magic, no_magic; s = {magic() for _ in range(10**5)}' 'for x in s: x in s'
> 10 loops, best of 3: 52.8 msec per loop
>
> "magic" wins this makeshift test. Other than that you're right ;)

That's interesting, but suggests there's something weird (~ suboptimal)
going on with CPython's scrambling algorithm. Also, your performance
test might yield completely different results on other Python
implementations.

Apart from uniqueness, there's no particular reason to prefer one
__hash__() value over another as long as the interesting bits are inside
the CPU's simple integer range.


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Chris Angelico
On Fri, Aug 11, 2017 at 6:56 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Fri, Aug 11, 2017 at 6:03 AM, Marko Rauhamaa  wrote:
>>> I see no point in CPython's rotation magic.
>>
>> Have you ever implemented a hashtable? The most common way to pick a
>> bucket for an object is to use modulo on the number of buckets.
>
> Like I said earlier, CPython takes the __hash__() value and scrambles
> it. Look for "perturb" in:
>
>   https://github.com/python/cpython/blob/master/Objects/dictobject.c
>
> From a comment:
>
>Now the probe sequence depends (eventually) on every bit in the hash
>code, and the pseudo-scrambling property of recurring on 5*j+1 is
>more valuable, because it quickly magnifies small differences in the
>bits that didn't affect the initial index.

I'm aware of this. Doesn't change the fact that the *INITIAL INDEX* is
based on exactly what I said.

Yaknow?

ChrisA


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Chris Angelico :

> On Fri, Aug 11, 2017 at 6:03 AM, Marko Rauhamaa  wrote:
>> I see no point in CPython's rotation magic.
>
> Have you ever implemented a hashtable? The most common way to pick a
> bucket for an object is to use modulo on the number of buckets.

Like I said earlier, CPython takes the __hash__() value and scrambles
it. Look for "perturb" in:

  https://github.com/python/cpython/blob/master/Objects/dictobject.c

From a comment:

   Now the probe sequence depends (eventually) on every bit in the hash
   code, and the pseudo-scrambling property of recurring on 5*j+1 is
   more valuable, because it quickly magnifies small differences in the
   bits that didn't affect the initial index.
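
The recurrence described in that comment can be modelled in a few lines of
Python. This is only a sketch of the idea, not CPython's actual C code:
probe_sequence is a hypothetical helper, the table is assumed to have a
power-of-two size, the 64-bit unsigned conversion is an assumption, and
PERTURB_SHIFT = 5 is the value I believe dictobject.c uses.

```python
from itertools import islice

PERTURB_SHIFT = 5  # assumed to match the constant in CPython's dictobject.c


def probe_sequence(hash_value, table_size=8):
    """Yield slot indices in the order a perturb-style probe would try them.

    Hypothetical helper: a simplified model of the scheme in the comment.
    """
    mask = table_size - 1                      # table_size must be a power of two
    perturb = hash_value & ((1 << 64) - 1)     # treat the hash as unsigned
    index = perturb & mask                     # initial index: low bits only
    while True:
        yield index
        perturb >>= PERTURB_SHIFT              # feed in higher bits gradually
        index = (index * 5 + perturb + 1) & mask


first = list(islice(probe_sequence(16), 8))
second = list(islice(probe_sequence(32), 3))
print(first)
print(second)
```

In an 8-slot table the hashes 16 and 32 share the initial slot, since only the
low three bits decide it, but they already part ways on the second probe
because the higher bits arrive through perturb.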


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Peter Otten
Marko Rauhamaa wrote:

> Peter Otten <__pete...@web.de>:
> 
>> Steve D'Aprano wrote:
>>> The C code says:
>>> 
>>>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>>    excessive hash collisions for dicts and sets */
>>> 
>>> which I think agrees with my comment: using the id() itself would put
>>> too many objects in the same bucket (i.e. too many collisions).
>>
>> There's a subtle difference: you expected objects with nearby memory
>> addresses to end up in the same "bucket" while actually all addresses
>> (are likely to) have the same low bits, and creation time does not
>> matter.
> 
> I see no point in CPython's rotation magic.

Let's see:

$ cat hashperf.py
class A(object):
    __slots__ = ["_hash"]

    def __hash__(self):
        return self._hash

def no_magic():
    a = A()
    a._hash = id(a)
    return a

def magic():
    a = A()
    a._hash = id(a) >> 4
    return a

$ python3 -m timeit -s 'from hashperf import magic, no_magic; s = {no_magic() for _ in range(10**5)}' 'for x in s: x in s'
10 loops, best of 3: 70.7 msec per loop

$ python3 -m timeit -s 'from hashperf import magic, no_magic; s = {magic() for _ in range(10**5)}' 'for x in s: x in s'
10 loops, best of 3: 52.8 msec per loop

"magic" wins this makeshift test. Other than that you're right ;)




Re: Planning a Python Course for Beginners

2017-08-10 Thread Chris Angelico
On Fri, Aug 11, 2017 at 6:03 AM, Marko Rauhamaa  wrote:
> Peter Otten <__pete...@web.de>:
>
>> Steve D'Aprano wrote:
>>> The C code says:
>>>
>>>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>>    excessive hash collisions for dicts and sets */
>>>
>>> which I think agrees with my comment: using the id() itself would put
>>> too many objects in the same bucket (i.e. too many collisions).
>>
>> There's a subtle difference: you expected objects with nearby memory
>> addresses to end up in the same "bucket" while actually all addresses
>> (are likely to) have the same low bits, and creation time does not
>> matter.
>
> I see no point in CPython's rotation magic.

Have you ever implemented a hashtable? The most common way to pick a
bucket for an object is to use modulo on the number of buckets.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Peter Otten <__pete...@web.de>:

> Steve D'Aprano wrote:
>> The C code says:
>> 
>>>/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>excessive hash collisions for dicts and sets */
>> 
>> which I think agrees with my comment: using the id() itself would put
>> too many objects in the same bucket (i.e. too many collisions).
>
> There's a subtle difference: you expected objects with nearby memory
> addresses to end up in the same "bucket" while actually all addresses
> (are likely to) have the same low bits, and creation time does not
> matter.

I see no point in CPython's rotation magic.


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Peter Otten
Steve D'Aprano wrote:

> The C code says:
> 
>>/* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>excessive hash collisions for dicts and sets */
> 
> which I think agrees with my comment: using the id() itself would put too
> many objects in the same bucket (i.e. too many collisions).

There's a subtle difference: you expected objects with nearby memory addresses 
to end up in the same "bucket" while actually all addresses (are likely to) 
have the same low bits, and creation time does not matter.




Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Python :

> Marko Rauhamaa wrote:
> I didn't disagree with any of these statements about __hash__, but only
> your statement about id and __eq__:
>
>> id() is actually an ideal return value of __hash__(). The only
>> criterion is that the returned number should be different if the
>> __eq__() is False. That is definitely true for id()
>
> nan is a clear, simple, undeniable counterexample to that claim.

Still, I don't see the point you are trying to make.


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Chris Angelico
On Fri, Aug 11, 2017 at 2:41 AM, Steve D'Aprano
 wrote:
> On Fri, 11 Aug 2017 12:58 am, Chris Angelico wrote:
>
>> On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano
>>  wrote:
>>
>>> The C code says:
>>>
>>>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>>    excessive hash collisions for dicts and sets */
>>>
>>> which I think agrees with my comment: using the id() itself would put too
>>> many objects in the same bucket (i.e. too many collisions).
>>>
>>>
>>>> If that were the problem it wouldn't be solved by the current approach:
>>>>
>>>> >>> sample = [object() for _ in range(10)]
>>>> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>>>> [1, 1, 1, 1, 1, 1, 1, 1, 1]
>>
>> A difference of 1 in a hash is usually going to mean dropping
>> something into the next bucket. A difference of 4, 8, or 16 would mean
>> that a tiny dictionary (which has 8 slots and thus uses modulo-8)
>> would have everything on the same slot.
>
> Um... yes? And how does that relate to the comment given in the source code?
>
> "bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash
> collisions for dicts and sets"
> Are we in agreement so far?

Yes, we're in agreement. It may have been unclear from my quoting
style, but the main point I was disagreeing with was this:

>>>> If that were the problem it wouldn't be solved by the current approach:
>>>>
>>>> >>> sample = [object() for _ in range(10)]
>>>> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>>>> [1, 1, 1, 1, 1, 1, 1, 1, 1]

Incrementing hashes by 1 usually will put things into successive
buckets. Incrementing by 8 or 16 will usually put things into the same
bucket.
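
A small sketch of that point, assuming a power-of-two table whose first slot
is simply the low bits of the hash. The helper first_bucket is hypothetical
and the addresses are made up for illustration.

```python
def first_bucket(hash_value, table_size=8):
    """Initial slot in a power-of-two table: the low bits of the hash.

    Hypothetical helper; it ignores any later probing.
    """
    return hash_value & (table_size - 1)


# Made-up addresses spaced 16 bytes apart, as an allocator might return them.
addresses = [0x7F1F05807000 + 16 * n for n in range(8)]

raw_slots = [first_bucket(a) for a in addresses]           # hash == address
shifted_slots = [first_bucket(a >> 4) for a in addresses]  # low 4 bits dropped

print(raw_slots)
print(shifted_slots)
```

With the raw addresses every object asks for slot 0 first; after dropping the
four low bits the same objects spread over all eight slots.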

Sorry for the confusion.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-10 Thread Steve D'Aprano
On Fri, 11 Aug 2017 12:58 am, Chris Angelico wrote:

> On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano
>  wrote:
> 
>> The C code says:
>>
>>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>>excessive hash collisions for dicts and sets */
>>
>> which I think agrees with my comment: using the id() itself would put too
>> many objects in the same bucket (i.e. too many collisions).
>>
>>
>>> If that were the problem it wouldn't be solved by the current approach:
>>>
>>> >>> sample = [object() for _ in range(10)]
>>> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>>> [1, 1, 1, 1, 1, 1, 1, 1, 1]
> 
> A difference of 1 in a hash is usually going to mean dropping
> something into the next bucket. A difference of 4, 8, or 16 would mean
> that a tiny dictionary (which has 8 slots and thus uses modulo-8)
> would have everything on the same slot.

Um... yes? And how does that relate to the comment given in the source code?

"bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid excessive hash
collisions for dicts and sets"

According to the comment, IDs of objects are typically:

0b(bunch of bits)1000
0b(bunch of bits)0000

i.e. they're typically multiples of 8 or 16. Right? So modulo 8, they'll all map
to the zeroth bucket; modulo 16, they'll all map to the zeroth or eighth
bucket. Collisions, just like I said. (Was I wrong?)

By stripping off the lowest four bits, you get:

0b(bunch of bits)

which will hopefully be well-mixed modulo 8 or 16. Are we in agreement so far?


>> So my money is on object() being anomalous: because it is so small, the
>> hashes end up so similar. For "typical" classes, the hash function does a
>> much better job of mixing the hash values up.
>>
> 
> And this is also possible, but most likely the difference would simply
> widen; small dictionaries would still have all objects landing in the
> zeroth bucket. Hence rotating away the low bits.

I can't tell if you're disagreeing with me or not.

I said: using the object ID itself would be a terrible hash, because there would
be lots of collisions. Lo and behold the source code for the default hash says
(paraphrased), "don't use the ID itself, that will have lots of collisions" and
processes the ID by doing a >> 4 to strip off the least significant four bits.

So I'm genuinely puzzled on where (if anywhere) our point of disagreement is.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-10 Thread Python

Marko Rauhamaa wrote:
> Python :
>
>> Marko Rauhamaa wrote:
>>> Python :
>>>
>>>> Marko Rauhamaa wrote:
>>>>> id() is actually an ideal return value of __hash__(). The only criterion
>>>>> is that the returned number should be different if the __eq__() is
>>>>> False. That is definitely true for id().
>>>>
>>>> $ python
>>>> Python 2.7.13 (default, Jan 19 2017, 14:48:08)
>>>> [GCC 6.3.0 20170118] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> >>> nan = float('NaN')
>>>> >>> id(nan) == id(nan)
>>>> True
>>>> >>> nan == nan
>>>> False
>>>
>>> Point being?
>>
>> It is a counter example to your claim that if __eq__(...) is false
>> then id should return different values.
>
> No it's not:
>
>  * __hash__() *should* return different values. It is neither possible
>    nor necessary in the general case.
>
>  * For NaN, there's no better alternative.
>
>  * Dictionaries and sets try "is" before __eq__(...) so everything works
>    anyway.
>
>
> So, to be precise, the __hash__() rule is:
>
> a.__hash__() *should* return a different number than b.__hash__() if
> a is not b and not a.__eq__(b)
>
> a.__hash__() *must* return the same number as b.__hash__() if
> a is b or a.__eq__(b)

I didn't disagree with any of these statements about __hash__, but only
your statement about id and __eq__:

> id() is actually an ideal return value of __hash__(). The only criterion
> is that the returned number should be different if the __eq__() is
> False. That is definitely true for id()

nan is a clear, simple, undeniable counterexample to that claim.

the hash function for floats is quite interesting btw, you may want to
look what is its value for nan.




Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Python :

> Marko Rauhamaa wrote:
>> Python :
>>
>>> Marko Rauhamaa wrote:
>>>> id() is actually an ideal return value of __hash__(). The only criterion
>>>> is that the returned number should be different if the __eq__() is
>>>> False. That is definitely true for id().
>>>
>>> $ python
>>> Python 2.7.13 (default, Jan 19 2017, 14:48:08)
>>> [GCC 6.3.0 20170118] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> nan = float('NaN')
>>> >>> id(nan) == id(nan)
>>> True
>>> >>> nan == nan
>>> False
>>
>>
>> Point being?
>
> It is a counter example to your claim that if __eq__(...) is false
> then id should return different values.

No it's not:

 * __hash__() *should* return different values. It is neither possible
   nor necessary in the general case.

 * For NaN, there's no better alternative.

 * Dictionaries and sets try "is" before __eq__(...) so everything works
   anyway.


So, to be precise, the __hash__() rule is:

a.__hash__() *should* return a different number than b.__hash__() if
a is not b and not a.__eq__(b)

a.__hash__() *must* return the same number as b.__hash__() if
a is b or a.__eq__(b)
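
The rule can be checked interactively. A minimal sketch, assuming CPython,
where each float("nan") call creates a distinct object (other
implementations may treat float identity differently):

```python
nan = float("nan")
other_nan = float("nan")   # a second, distinct NaN object (CPython assumption)

members = {nan}

# The "must" half: equal objects have equal hashes.
equal_hashes = hash(1) == hash(1.0)

# NaN is unequal even to itself, yet the set still finds the same object,
# because membership tests try identity ("is") before __eq__().
found_same_object = nan in members

# A different NaN object is neither identical nor equal, so it is not found.
found_other_object = other_nan in members

print(nan == nan, found_same_object, found_other_object, equal_hashes)
```

So the identity shortcut is what keeps NaN usable as a set member or dict key,
even though it compares unequal to itself.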


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Chris Angelico
On Fri, Aug 11, 2017 at 12:45 AM, Steve D'Aprano
 wrote:

> The C code says:
>
>> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>>excessive hash collisions for dicts and sets */
>
> which I think agrees with my comment: using the id() itself would put too many
> objects in the same bucket (i.e. too many collisions).
>
>
>> If that were the problem it wouldn't be solved by the current approach:
>>
>> >>> sample = [object() for _ in range(10)]
>> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
>> [1, 1, 1, 1, 1, 1, 1, 1, 1]

A difference of 1 in a hash is usually going to mean dropping
something into the next bucket. A difference of 4, 8, or 16 would mean
that a tiny dictionary (which has 8 slots and thus uses modulo-8)
would have everything on the same slot.

>
> So my money is on object() being anomalous: because it is so small, the hashes
> end up so similar. For "typical" classes, the hash function does a much better
> job of mixing the hash values up.
>

And this is also possible, but most likely the difference would simply
widen; small dictionaries would still have all objects landing in the
zeroth bucket. Hence rotating away the low bits.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-10 Thread Steve D'Aprano
On Thu, 10 Aug 2017 07:00 pm, Peter Otten wrote:

> Steven D'Aprano wrote:
> 
>> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>> 
>>> Good point! A very good __hash__() implementation is:
>>> 
>>>     def __hash__(self):
>>>         return id(self)
>>> 
>>> In fact, I didn't know Python (kinda) did this by default already. I
>>> can't find that information in the definition of object.__hash__():
>> 
>> 
>> Hmmm... using id() as the hash would be a terrible hash function. Objects
> 
> It's actually id(self) >> 4 (almost, see C code below), to account for
> memory alignment.

Thanks for tracking that down. As you show, the default hash isn't id() itself.


The C code says:

> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
>excessive hash collisions for dicts and sets */

which I think agrees with my comment: using the id() itself would put too many
objects in the same bucket (i.e. too many collisions).


> >>> obj = object()
> >>> hex(id(obj))
> '0x7f1f058070b0'
> >>> hex(hash(obj))
> '0x7f1f058070b'
>
> >>> sample = (object() for _ in range(10))
> >>> all(id(obj) >> 4 == hash(obj) for obj in sample)
> True
> 
>> would fall into similar buckets if they were created at similar times,
>> regardless of their value, rather than being well distributed.
> 
> If that were the problem it wouldn't be solved by the current approach:
> 
> >>> sample = [object() for _ in range(10)]
> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
> [1, 1, 1, 1, 1, 1, 1, 1, 1]


Arguably that's a flaw with the current approach that (maybe?) clusters object()'s
hashes too closely. But:

- perhaps it doesn't matter in practice, since the hash is taken modulo 
  the size of the hash table;

- or maybe Python's dicts and sets are good enough that a difference 
  of 1 is sufficient to give a good distribution of objects in the
  hash table;

- or maybe it does matter, but since people hardly ever use object() 
  itself as the keys in dicts, it doesn't come up.


Here's your example with a class that inherits from object, rather than object
itself:

py> class X(object):
...     pass
...
py> sample = [X() for x in range(10)]
py> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[-5338, -10910, -2976, -2284, -21326, 4, -8, 2, -4]


So my money is on object() being anomalous: because it is so small, the hashes
end up so similar. For "typical" classes, the hash function does a much better
job of mixing the hash values up.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-10 Thread Python

Marko Rauhamaa wrote:
> Python :
>
>> Marko Rauhamaa wrote:
>>> id() is actually an ideal return value of __hash__(). The only criterion
>>> is that the returned number should be different if the __eq__() is
>>> False. That is definitely true for id().
>>
>> $ python
>> Python 2.7.13 (default, Jan 19 2017, 14:48:08)
>> [GCC 6.3.0 20170118] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> nan = float('NaN')
>> >>> id(nan) == id(nan)
>> True
>> >>> nan == nan
>> False
>
> Point being?

It is a counter example to your claim that if __eq__(...) is false
then id should return different values.





Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Python :

> Marko Rauhamaa wrote:
>> id() is actually an ideal return value of __hash__(). The only criterion
>> is that the returned number should be different if the __eq__() is
>> False. That is definitely true for id().
>
> $ python
> Python 2.7.13 (default, Jan 19 2017, 14:48:08)
> [GCC 6.3.0 20170118] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> nan = float('NaN')
> >>> id(nan) == id(nan)
> True
> >>> nan == nan
> False


Point being?


Marko


Re: Planning a Python Course for Beginners

2017-08-10 Thread Python

Marko Rauhamaa wrote:

> id() is actually an ideal return value of __hash__(). The only criterion
> is that the returned number should be different if the __eq__() is
> False. That is definitely true for id().


$ python
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float('NaN')
>>> id(nan) == id(nan)
True
>>> nan == nan
False
>>>


Re: Planning a Python Course for Beginners

2017-08-10 Thread Marko Rauhamaa
Peter Otten <__pete...@web.de>:
> Steven D'Aprano wrote:
>> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>> 
>>> Good point! A very good __hash__() implementation is:
>>> 
>>>     def __hash__(self):
>>>         return id(self)
>>> 
>>> In fact, I didn't know Python (kinda) did this by default already. I
>>> can't find that information in the definition of object.__hash__():
>> 
>> 
>> Hmmm... using id() as the hash would be a terrible hash function.

id() is actually an ideal return value of __hash__(). The only criterion
is that the returned number should be different if the __eq__() is
False. That is definitely true for id().

> It's actually id(self) >> 4 (almost, see C code below), to account for
> memory alignment.

Memory alignment makes no practical difference. If it is any good, the
internal implementation will further scramble and scale the returned
hash value. For example:

index = hash(obj) % prime_table_size

>> would fall into similar buckets if they were created at similar
>> times, regardless of their value, rather than being well distributed.
>
> If that were the problem it wouldn't be solved by the current approach:

It is not a problem. Hash values don't need to be well distributed, they
simply need to be discerning to tiny differences in equality.

> >>> sample = [object() for _ in range(10)]
> >>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
> [1, 1, 1, 1, 1, 1, 1, 1, 1]

Nice demo :-)


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Planning a Python Course for Beginners

2017-08-10 Thread Peter Otten
Steven D'Aprano wrote:

> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
> 
>> Good point! A very good __hash__() implementation is:
>> 
>>     def __hash__(self):
>>         return id(self)
>> 
>> In fact, I didn't know Python (kinda) did this by default already. I
>> can't find that information in the definition of object.__hash__():
> 
> 
> Hmmm... using id() as the hash would be a terrible hash function. Objects

It's actually id(self) >> 4 (almost, see C code below), to account for 
memory alignment.

>>> obj = object()
>>> hex(id(obj))
'0x7f1f058070b0'
>>> hex(hash(obj))
'0x7f1f058070b'

>>> sample = (object() for _ in range(10))
>>> all(id(obj) >> 4 == hash(obj) for obj in sample)
True

> would fall into similar buckets if they were created at similar times,
> regardless of their value, rather than being well distributed. 

If that were the problem it wouldn't be solved by the current approach:

>>> sample = [object() for _ in range(10)]
>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[1, 1, 1, 1, 1, 1, 1, 1, 1]


Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}
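
A rough Python model of that C function, assuming 64-bit pointers
(pointer_hash is a hypothetical name, not a CPython API). It reproduces the
hex value shown above, and it also suggests why some of the default hashes
posted elsewhere in this thread are negative: an address that is a multiple
of 8 but not of 16 has its set low bit rotated into the sign bit.

```python
def pointer_hash(address, pointer_bits=64):
    """Model of the rotate-right-by-4 pointer hash (hypothetical helper).

    `address` stands in for id(obj); `pointer_bits` is an assumed word size.
    """
    mask = (1 << pointer_bits) - 1
    # Rotate right by 4: the low four bits wrap around to the top.
    rotated = ((address >> 4) | (address << (pointer_bits - 4))) & mask
    # Reinterpret as a signed integer, mirroring the cast to Py_hash_t.
    if rotated >= 1 << (pointer_bits - 1):
        rotated -= 1 << pointer_bits
    # -1 is reserved as an error marker, so it is mapped to -2.
    return -2 if rotated == -1 else rotated


print(hex(pointer_hash(0x7F1F058070B0)))   # 16-byte aligned address
print(pointer_hash(139932454939952))       # id ending in binary ...0000
print(pointer_hash(139932454939896))       # id ending in binary ...1000
```

For 16-byte aligned addresses the rotation is the same as id(obj) >> 4; the
"almost" only matters when one of the low four bits is set.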




Re: Planning a Python Course for Beginners

2017-08-10 Thread Steven D'Aprano
On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:

> Good point! A very good __hash__() implementation is:
> 
>     def __hash__(self):
>         return id(self)
> 
> In fact, I didn't know Python (kinda) did this by default already. I
> can't find that information in the definition of object.__hash__():


Hmmm... using id() as the hash would be a terrible hash function. Objects 
would fall into similar buckets if they were created at similar times, 
regardless of their value, rather than being well distributed. But let's 
see whether or not objects actually do so, as you claim:

>>> a, b, c, d = "abc", "def", "ghi", "jki"
>>> [id(obj) for obj in (a,b,c,d)]
[139932454814752, 139932454814808, 139932454814920, 139932454913616]
>>> [hash(obj) for obj in (a,b,c,d)]
[7231609897320296628, -876470178105133015, -5049894847448874792, 
5697571649565117128]



Wait, maybe you're referring to hash() of object(), inherited by classes 
that don't define their own __hash__. Let's check it out:

>>> a, b, c, d = [object() for i in range(4)]
>>> [id(obj) for obj in (a,b,c,d)]
[139932455747696, 139932455747712, 139932455747728, 139932455747744]
>>> [hash(obj) for obj in (a,b,c,d)]
[8745778484231, 8745778484232, 8745778484233, 8745778484234]



Maybe object does something different for itself than for subclasses?

>>> class X(object):
...     pass
... 
>>> a, b, c, d = [X() for i in range(4)]
>>> [id(obj) for obj in (a,b,c,d)]
[139932454939952, 139932454939896, 139932454940008, 139932454940064]
>>> [hash(obj) for obj in (a,b,c,d)]
[8745778433747, -9223363291076342065, -9223363291076342058, 8745778433754]



I see zero evidence that Python uses id() as the default hash.

Not even for classic classes in Python 2.




-- 
“You are deluded if you think software engineers who can't write 
operating systems or applications without security holes, can write 
virtualization layers without security holes.” —Theo de Raadt


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Chris Angelico :

> On Wed, Aug 9, 2017 at 11:46 PM, Marko Rauhamaa  wrote:
>> Really, the most obvious use case for hashed objects is their membership
>> in a set. For example:
>>
>> invitees = set(self.bff)
>> invitees |= self.classmates()
>> invitees |= self.relatives()
>
> Okay. So you should define value by object identity - NOT any sort of
> external primary key.

Good point! A very good __hash__() implementation is:

    def __hash__(self):
        return id(self)

In fact, I didn't know Python (kinda) did this by default already. I
can't find that information in the definition of object.__hash__():

   <https://docs.python.org/3/reference/datamodel.html?#object.__hash__>

I only found it out by trying it.

> That goes completely against your original statement, which I shall
> quote again:
>
>>>> In relational-database terms, your "value" is the primary key and
>>>> your "metadata" is the rest of the columns.
>
> If there is any possibility that you could have two objects in memory
> with the same primary key but other attributes different, you'd have
> major MAJOR problems with this kind of set operation.

In light of the above realization, don't override __hash__() in any way
in your class, and your object works perfectly as a key or a set member.

A __hash__() definition is only needed when your __eq__() definition is
different from "is".
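
A rough sketch of that rule, with made-up class names (Guest, NamedGuest and
HashableNamedGuest are purely illustrative). It assumes Python 3, where
defining __eq__() without __hash__() makes instances unhashable:

```python
class Guest:
    """No __eq__, no __hash__: equality is identity, default hash applies."""
    def __init__(self, name):
        self.name = name


class NamedGuest:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, NamedGuest) and self.name == other.name


class HashableNamedGuest(NamedGuest):
    """Adds a __hash__ that agrees with the inherited __eq__."""
    def __hash__(self):
        return hash(self.name)


by_identity = {Guest("Ann"), Guest("Ann")}                        # two members
by_name = {HashableNamedGuest("Ann"), HashableNamedGuest("Ann")}  # one member
```

Trying to put a NamedGuest into a set raises TypeError, which is Python's
way of saying that equality was changed without saying how to hash.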

As for running into "major MAJOR" problems, yes, you need to know what
you're doing and face the consequences. It's a bit analogous to sort()
depending on the definitions of the "rich" comparison.


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Steve D'Aprano
On Wed, 9 Aug 2017 11:46 pm, Marko Rauhamaa wrote:

> Typically, an object's equality is simply the "is" relation.

"Typically"? I don't think so. Are you sure you've programmed in Python before?
*wink*

py> [1, 2] is [1, 2]
False

The most commonly used objects don't define equality as identity, e.g. strings,
lists, tuples, ints, bytes, dicts etc don't. It would mean that two objects
with the same value would nevertheless compare as unequal.

In general, caring about `is` (identity) is a failure of abstraction. Why should
we care about object identity? Take the value 42 -- why should anyone care
whether that is represented in computer memory by a single object or by a
billion separate objects?

You might care about memory constraints, but that's a leaky abstraction.
Ideally, where memory is not a constraint, if you care about identity, you are
probably doing it wrong.

I'll allow, in principle, that caring about the identity of stateless, valueless
objects that are defined only by their identity such as None and NotImplemented
may be acceptable. But the Singleton design pattern, as beloved by Java
programmers, puts the emphasis on the wrong place: identity, instead of state.
Why not have one or a million objects, so long as they have the same state?
Hence the Borg design pattern.
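
For readers who haven't met it, the Borg idea can be sketched in a few lines.
This is the commonly quoted recipe, shown only as an illustration; the
attribute name setting is made up:

```python
class Borg:
    """Every instance shares one attribute dictionary."""
    _shared_state = {}

    def __init__(self):
        # all instances point at the same dict, so state is shared
        self.__dict__ = self._shared_state


first = Borg()
second = Borg()
first.setting = "on"   # visible through every other instance too
```

Here first and second are distinct objects, yet second.setting is also "on":
same state, different identity.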

There are, in my opinion, very few legitimate uses for `is` and identity
checking, and nearly all of them are either:

- testing for None; or
- debugging implementation details.
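
A small sketch of why the None test is usually written with `is`. The class
AgreesWithEverything and the helper is_missing are invented for the example:

```python
class AgreesWithEverything:
    """A deliberately silly class whose __eq__ always says yes."""
    def __eq__(self, other):
        return True


def is_missing(value):
    """True only for the None object itself, whatever __eq__ claims."""
    return value is None


odd = AgreesWithEverything()
equal_to_none = (odd == None)   # misleading: __eq__ decides the answer
really_none = is_missing(odd)   # identity cannot be overridden
```

Because `is` cannot be overridden by the class, it gives the same answer for
every object, which is what you want when None is used as a sentinel.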



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-09 Thread Chris Angelico
On Wed, Aug 9, 2017 at 11:46 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Wed, Aug 9, 2017 at 10:00 PM, Marko Rauhamaa  wrote:
>>> Chris Angelico :
>>>
>>>> Which means that its value won't change. That's what I said. Two
>>>> things will be equal regardless of that metadata.
>>>
>>> In relational-database terms, your "value" is the primary key and
>>> your "metadata" is the rest of the columns.
>>
>> I would say the primary key is the "identity" and the rest of the
>> columns are the "value".
>
> Your response illustrates why you and I are not yet on the same page on
> this.
>
> Typically, an object's equality is simply the "is" relation. The only
> thing remaining for its usability as a key is a hash method. In fact,
> just defining:
>
>    def __hash__(self):
>        return 0
>
> will technically make any class applicable as a key or set member.

Yes, at the cost of making all your set operations into linear
searches. Basically, if all your objects have the same hashes, you
might as well use lists instead of sets, except that lists don't have
the methods/operators you want.
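
To make that cost visible, here is a rough sketch that counts __eq__ calls
while a set of constant-hash objects is built. ConstHash and its eq_calls
counter are invented for the illustration, and the exact count is a CPython
implementation detail:

```python
class ConstHash:
    """Every instance hashes to 0, so every insertion collides."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0

    def __eq__(self, other):
        ConstHash.eq_calls += 1
        return isinstance(other, ConstHash) and self.value == other.value


items = [ConstHash(n) for n in range(200)]
as_set = set(items)
calls_to_build = ConstHash.eq_calls   # far more than one per item
```

The set still works, but each insertion has to be compared against the
members that are already there, so building it takes many more comparisons
than there are items.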

> The interesting fields of the object (which you disparagingly referred
> to as "metadata") don't need to participate in the calculation of the
> hash. You ought to pick the maximal collection of immutable fields as a
> basis of your hash, and you are all set (no pun intended).

The rules are (1) two objects that compare equal (__eq__) MUST have
the same hash; and (2) an object's hash must never change. Also, for
efficiency's sake, objects that compare unequal should ideally have
different hashes. That's why I refer to it as metadata; it's not
allowed to be part of the object's value, because two objects MUST
compare equal even if those other fields change. Can you give me a
real-world example of where two objects are equal but have important
attributes that differ?
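
A minimal sketch of what breaking rule (2) looks like in practice. Point is a
made-up class whose hash depends on a field that is later changed:

```python
class Point:
    """Hash and equality both depend on the mutable attribute x."""
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


p = Point(1)
points = {p}
p.x = 2                      # the hash of p changes while it sits in the set

found_after_mutation = p in points            # the set looks in the wrong place
still_stored = any(q is p for q in points)    # yet the object is still in there
```

After the mutation the set can no longer find its own member, even though
iterating over the set still yields it.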

>> But if you're defining "value" solely by the PK, then you have to ask
>> yourself what you're using this in a dictionary for - are you going to
>> construct multiple objects representing the same underlying database
>> row, and expect them to compare equal?
>
> Let's leave the relational world and return to objects.
>
> Really, the most obvious use case for hashed objects is their membership
> in a set. For example:
>
> invitees = set(self.bff)
> invitees |= self.classmates()
> invitees |= self.relatives()

Okay. So you should define value by object identity - NOT any sort of
external primary key. That goes completely against your original
statement, which I shall quote again:

>>> In relational-database terms, your "value" is the primary key and
>>> your "metadata" is the rest of the columns.

If there is any possibility that you could have two objects in memory
with the same primary key but other attributes different, you'd have
major MAJOR problems with this kind of set operation. The best
solution would be to use the IDs themselves (as integers) in the set,
and ignore the whole question of identity and value.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Chris Angelico :

> On Wed, Aug 9, 2017 at 10:00 PM, Marko Rauhamaa  wrote:
>> Chris Angelico :
>>
>>> Which means that its value won't change. That's what I said. Two
>>> things will be equal regardless of that metadata.
>>
>> In relational-database terms, your "value" is the primary key and
>> your "metadata" is the rest of the columns.
>
> I would say the primary key is the "identity" and the rest of the
> columns are the "value".

Your response illustrates why you and I are not yet on the same page on
this.

Typically, an object's equality is simply the "is" relation. The only
thing remaining for its usability as a key is a hash method. In fact,
just defining:

   def __hash__(self):
       return 0

will technically make any class applicable as a key or set member.

The interesting fields of the object (which you disparagingly referred
to as "metadata") don't need to participate in the calculation of the
hash. You ought to pick the maximal collection of immutable fields as a
basis of your hash, and you are all set (no pun intended).

> But if you're defining "value" solely by the PK, then you have to ask
> yourself what you're using this in a dictionary for - are you going to
> construct multiple objects representing the same underlying database
> row, and expect them to compare equal?

Let's leave the relational world and return to objects.

Really, the most obvious use case for hashed objects is their membership
in a set. For example:

invitees = set(self.bff)
invitees |= self.classmates()
invitees |= self.relatives()

>>>> And Python doesn't enforce this in any way except for lists. That's
>>>> somewhat unfortunate since sometimes you really would like an
>>>> immutable (or rather, no-longer-mutable) list to act as a key.
>>>
>>> Then make a tuple out of it. Job done. You're trying to say that its
>>> value won't now change.
>>
>> Yeah, when there's a will, there's a way.
>
> I don't understand your comment. Do you mean that if someone wants to
> change it, s/he will?

No. I mean coercing lists to tuples can be quite a hefty operation. On
the other hand, so could hashing a list unless the value is memoized.

More importantly, tuple(collection) goes against the grain of what a
tuple is. Tuples are not collections. In particular, tuples in a given
role ordinarily have the same arity.


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Verde Denim


On 8/9/2017 9:25 AM, Marko Rauhamaa wrote:
> r...@zedat.fu-berlin.de (Stefan Ram):
>
>> Steve D'Aprano  writes:
>>> There's a word for frozen list: "tuple".
>>   Yes, but one should not forget that a tuple
>>   can contain mutable entries (such as lists).
> Not when used as keys:
>
> >>> hash(([], []))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'list'
>
>
> Marko
Hence the word 'can' and not 'will' or 'must' or 'shall' ...



Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
r...@zedat.fu-berlin.de (Stefan Ram):

> Steve D'Aprano  writes:
>>There's a word for frozen list: "tuple".
>
>   Yes, but one should not forget that a tuple
>   can contain mutable entries (such as lists).

Not when used as keys:

>>> hash(([], []))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
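
The same check can be wrapped in a small helper. This is just a sketch;
is_hashable is a name invented here, not a built-in:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


plain_tuple = is_hashable((1, 2, 3))      # only immutable entries
nested_tuple = is_hashable((1, (2, 3)))   # tuples inside tuples are fine
tuple_of_lists = is_hashable(([], []))    # a list inside spoils it
```

So a tuple is usable as a key only as long as everything inside it is
hashable as well.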


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Chris Angelico
On Wed, Aug 9, 2017 at 10:00 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> Which means that its value won't change. That's what I said. Two
>> things will be equal regardless of that metadata.
>
> In relational-database terms, your "value" is the primary key and your
> "metadata" is the rest of the columns.

I would say the primary key is the "identity" and the rest of the
columns are the "value". But if you're defining "value" solely by the
PK, then you have to ask yourself what you're using this in a
dictionary for - are you going to construct multiple objects
representing the same underlying database row, and expect them to
compare equal? And if they're equal without being identical, how do
you know which one of them actually corresponds to the database? Down
this path lies a form of madness that I want nothing to do with.

>>> And Python doesn't enforce this in any way except for lists. That's
>>> somewhat unfortunate since sometimes you really would like an
>>> immutable (or rather, no-longer-mutable) list to act as a key.
>>
>> Then make a tuple out of it. Job done. You're trying to say that its
>> value won't now change.
>
> Yeah, when there's a will, there's a way.

I don't understand your comment. Do you mean that if someone wants to
change it, s/he will? Because that's not really the point. If you're
declaring that a list can now be safely compared by value, you don't
want it to be mutable in any way. That's what a tuple is for.
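
A short sketch of the tuple approach, with invented names (route, visits):

```python
route = ["home", "office", "gym"]

# freeze the current contents of the list into a key
visits = {tuple(route): 3}

# later changes to the list do not touch the key already stored
route.append("cafe")

count = visits[("home", "office", "gym")]
```

The key is a snapshot of the list at the time tuple() was called, so the
dictionary stays consistent whatever happens to the list afterwards.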

ChrisA


Re: Planning a Python Course for Beginners

2017-08-09 Thread Steve D'Aprano
On Wed, 9 Aug 2017 08:38 pm, Marko Rauhamaa wrote:

> sometimes you really would like an immutable
> (or rather, no-longer-mutable) list to act as a key.


There's a word for frozen list: "tuple".

:-)


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-09 Thread Steve D'Aprano
On Wed, 9 Aug 2017 02:19 pm, Dennis Lee Bieber wrote:

> On Tue, 8 Aug 2017 15:38:42 + (UTC), Grant Edwards
>  declaimed the following:
> 
>>On 2017-08-08, Peter Heitzer  wrote:
[...]
>>> The differences between blanks and tabs :-)
>>
>>You've misspelled "Tabs are evil and should never be used". ;)
> 
> 
> Tabs are logical entities indicating structure and should always be used!
> 

Amen to that brother!



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-09 Thread Steve D'Aprano
On Wed, 9 Aug 2017 07:51 pm, Marko Rauhamaa wrote:

> Dennis Lee Bieber :
> 
>> Then there is the facet that tuples (being unmutable) can be used as
>> keys into a dictionary...
> 
> Mutable objects can be used as keys into a dictionary.

Indeed.

And people can also put their hand into a fire in order to pull out a red-hot
burning coal. I wouldn't recommend either, at least not under uncontrolled
conditions.


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Planning a Python Course for Beginners

2017-08-09 Thread Larry Martell
On Wed, Aug 9, 2017 at 8:00 AM, Marko Rauhamaa  wrote:
> Yeah, when there's a will, there's a way.

My Dad used to say "Where there's a will, there's relatives."


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Chris Angelico :

> On Wed, Aug 9, 2017 at 8:38 PM, Marko Rauhamaa  wrote:
>> Chris Angelico :
>>
>>> On Wed, Aug 9, 2017 at 7:51 PM, Marko Rauhamaa  wrote:
>>>> Mutable objects can be used as keys into a dictionary.
>>>
>>> Only when the objects' mutability does not affect their values.
>>
>> Up to equality. The objects can carry all kinds of mutable payload as
>> long as __hash__() and __eq__() don't change with it.
>
> Which means that its value won't change. That's what I said. Two
> things will be equal regardless of that metadata.

In relational-database terms, your "value" is the primary key and your
"metadata" is the rest of the columns.

>> And Python doesn't enforce this in any way except for lists. That's
>> somewhat unfortunate since sometimes you really would like an
>> immutable (or rather, no-longer-mutable) list to act as a key.
>
> Then make a tuple out of it. Job done. You're trying to say that its
> value won't now change.

Yeah, when there's a will, there's a way.


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Chris Angelico
On Wed, Aug 9, 2017 at 8:38 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Wed, Aug 9, 2017 at 7:51 PM, Marko Rauhamaa  wrote:
>>> Mutable objects can be used as keys into a dictionary.
>>
>> Only when the objects' mutability does not affect their values.
>
> Up to equality. The objects can carry all kinds of mutable payload as
> long as __hash__() and __eq__() don't change with it.

Which means that its value won't change. That's what I said. Two
things will be equal regardless of that metadata.

> And Python doesn't enforce this in any way except for lists. That's
> somewhat unfortunate since sometimes you really would like an immutable
> (or rather, no-longer-mutable) list to act as a key.

Then make a tuple out of it. Job done. You're trying to say that its
value won't now change.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-09 Thread alister via Python-list
On Tue, 08 Aug 2017 14:19:53 +, Stefan Ram wrote:

> I am planning a Python course.
> 
>   I started by writing the course akin to courses I gave in other
>   languages, that means, the course starts roughly with these topics:
> 
> - number and string literals - types of number and string literals
>   (just giving the names »int«, »float«, and »string«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a" and "a"+"b")
> - calling simple predefined functions (len, type, ...)
> 
>   . This is a little bit boring however and might not show off Python's
>   strength early in the course.
> 
>   So, I now think that maybe I should start to also include list (like
> 
> [1,2,3]
> 
>   ) right from the start. A list conceptually is not much more difficult
>   than a string since a string "abc" resembles a list ["a","b","c"].
>   I.e., the course then would start as follows:
> 
> - number, string, and list literals - types of number, string and list
> literals
>   (just giving the names »int«, »float«, »string«, and »list«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a", "a"+"b",  2*["a"], and [1]+[2])
> - calling simple predefined functions (len, type, ...)
> 
>   However, once the box has been opened, what else to let out? What
>   about tuples (like
> 
> (1,2,3)
> 
>   ). Should I also teach tuples right from the start?
> 
>   But then how to explain to beginners why two different types (lists
>   AND tuples) are needed for the concept of a linear arrangement of
>   things?
> 
>   Are there any other very simple things that I have missed and that
>   should be covered very early in a Python course?
> 
>   (Especially things that can show off fantastic Python features that
>   are missing from other programming languages, but still only using
>   literals, operators and function calls.)


If these are beginners with no basic programming knowledge, then try not
to confuse them with anything unduly complicated. I would even go so far
as to start with pseudocode on a pen & paper processor, and introduce the
concept of different data types only once they have progressed to the
point that they need to know.



-- 
Round Numbers are always false.
-- Samuel Johnson


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Chris Angelico :

> On Wed, Aug 9, 2017 at 7:51 PM, Marko Rauhamaa  wrote:
>> Mutable objects can be used as keys into a dictionary.
>
> Only when the objects' mutability does not affect their values.

Up to equality. The objects can carry all kinds of mutable payload as
long as __hash__() and __eq__() don't change with it.
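
Roughly what that looks like, as a sketch with a made-up Account class. The
number field is assumed never to change after construction; nothing in the
code enforces that:

```python
class Account:
    """Equality and hash use only number; balance is mutable payload."""
    def __init__(self, number):
        self.number = number   # assumed fixed for the object's lifetime
        self.balance = 0       # free to change

    def __eq__(self, other):
        return isinstance(other, Account) and self.number == other.number

    def __hash__(self):
        return hash(self.number)


acct = Account(1001)
owners = {acct: "Ann"}
acct.balance += 250            # mutating the payload does not disturb the key
still_found = owners[acct]
```

The lookup keeps working because neither __eq__() nor __hash__() ever looks
at balance.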

And Python doesn't enforce this in any way except for lists. That's
somewhat unfortunate since sometimes you really would like an immutable
(or rather, no-longer-mutable) list to act as a key.


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Chris Angelico
On Wed, Aug 9, 2017 at 7:51 PM, Marko Rauhamaa  wrote:
> Dennis Lee Bieber :
>
>>   Then there is the facet that tuples (being unmutable) can be used as
>> keys into a dictionary...
>
> Mutable objects can be used as keys into a dictionary.

Only when the objects' mutability does not affect their values.

ChrisA


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Dennis Lee Bieber :

>   Then there is the facet that tuples (being unmutable) can be used as
> keys into a dictionary...

Mutable objects can be used as keys into a dictionary.


Marko


Re: Planning a Python Course for Beginners

2017-08-09 Thread Marko Rauhamaa
Dennis Lee Bieber :

> Tabs are logical entities indicating structure and should always be
> used! 

I wrote an entire database program using only tabs.



<http://dilbert.com/strip/1992-09-08>


Marko


Re: Planning a Python Course for Beginners

2017-08-08 Thread Bob Gailer
On Aug 8, 2017 10:20 AM, "Stefan Ram"  wrote:
>
>   I am planning a Python course.
>
>   I started by writing the course akin to courses I gave
>   in other languages, that means, the course starts roughly
>   with these topics:
>
> - number and string literals
> - types of number and string literals
>   (just giving the names »int«, »float«, and »string«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a" and "a"+"b")
> - calling simple predefined functions (len, type, ...)
>
>   . This is a little bit boring however and might not
>   show off Python's strength early in the course.
>
>   So, I now think that maybe I should start to also
>   include list (like
>
> [1,2,3]
>
>   ) right from the start. A list conceptually is not
>   much more difficult than a string since a string
>   "abc" resembles a list ["a","b","c"]. I.e., the
>   course then would start as follows:
>
> - number, string, and list literals
> - types of number, string and list literals
>   (just giving the names »int«, »float«, »string«,
>   and »list«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a", "a"+"b",  2*["a"], and [1]+[2])
> - calling simple predefined functions (len, type, ...)
>
>   However, once the box has been opened, what else
>   to let out? What about tuples (like
>
> (1,2,3)
>
>   ). Should I also teach tuples right from the start?
>
>   But then how to explain to beginners why two
>   different types (lists AND tuples) are needed for
>   the concept of a linear arrangement of things?
>
>   Are there any other very simple things that
>   I have missed and that should be covered very
>   early in a Python course?
IMHO it's a good idea to introduce conversational programming early. Start
with input() and print(), then int(), if, while, break. Add one item at a
time. This will be more interesting and useful than a bunch of data types
and operators, and will answer a lot of questions that otherwise show up on
the help and tutor lists. Also explain tracebacks. None of the above in great
detail; just let students know there is more detail to come later.
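
Something along these lines might serve as a first conversational program. It
is only a sketch: guessing_game is an invented name, and the ask and say
parameters exist solely so the function can be tried without a keyboard. In
class one would probably call input() and print() directly.

```python
def guessing_game(secret, ask=input, say=print):
    """Keep asking until the secret number is guessed; return the tries."""
    tries = 0
    while True:
        answer = ask("Guess a number: ")
        tries += 1
        guess = int(answer)        # text from input() must be converted
        if guess == secret:
            say("Correct!")
            break
        if guess < secret:
            say("Too low.")
        else:
            say("Too high.")
    return tries
```

Calling guessing_game(7) at the prompt starts the dialogue. Typing something
that is not a number produces a traceback from int(), which makes a natural
opening for explaining tracebacks.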
>
>   (Especially things that can show off fantastic
>   Python features that are missing from other
>   programming languages, but still only using
>   literals, operators and function calls.)
I think program flow is more important than fantastic or unique features.
>
> --
> https://mail.python.org/mailman/listinfo/python-list


Re: Planning a Python Course for Beginners

2017-08-08 Thread Grant Edwards
On 2017-08-08, Peter Heitzer  wrote:
> Stefan Ram  wrote:
>>  I am planning a Python course.
> [different topics]
>>  Are there any other very simple things that
>>  I have missed and that should be covered very 
>>  early in a Python course? 
>
> The differences between blanks and tabs :-)

You've misspelled "Tabs are evil and should never be used". ;)


-- 
Grant Edwards               grant.b.edwards        Yow! We just joined the
                                  at               civil hair patrol!
                              gmail.com



Re: Planning a Python Course for Beginners

2017-08-08 Thread justin walters
On Tue, Aug 8, 2017 at 7:19 AM, Stefan Ram  wrote:

>   I am planning a Python course.
>
>   I started by writing the course akin to courses I gave
>   in other languages, that means, the course starts roughly
>   with these topics:
>
> - number and string literals
> - types of number and string literals
>   (just giving the names »int«, »float«, and »string«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a" and "a"+"b")
> - calling simple predefined functions (len, type, ...)
>
>   . This is a little bit boring however and might not
>   show off Python's strength early in the course.
>
>   So, I now think that maybe I should start to also
>   include list (like
>
> [1,2,3]
>
>   ) right from the start. A list conceptually is not
>   much more difficult than a string since a string
>   "abc" resembles a list ["a","b","c"]. I.e., the
>   course then would start as follows:
>
> - number, string, and list literals
> - types of number, string and list literals
>   (just giving the names »int«, »float«, »string«,
>   and »list«)
> - using simple predefined operators (+, -, *, /)
>   (including 2*"a", "a"+"b",  2*["a"], and [1]+[2])
> - calling simple predefined functions (len, type, ...)
>
>   However, once the box has been opened, what else
>   to let out? What about tuples (like
>
> (1,2,3)
>
>   ). Should I also teach tuples right from the start?
>
>   But then how to explain to beginners why two
>   different types (lists AND tuples) are needed for
>   the concept of a linear arrangement of things?
>
>   Are there any other very simple things that
>   I have missed and that should be covered very
>   early in a Python course?
>
>   (Especially things that can show off fantastic
>   Python features that are missing from other
>   programming languages, but still only using
>   literals, operators and function calls.)
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>

One thing I find that gets overlooked in a lot of beginner Python
courses is Python's module system. Covering the module system
will allow you to hit on the subject of namespacing and scope.

Python's module system is one of its greatest strengths in my opinion.
No other language I've used makes it so simple to structure a project.

Another idea: Dictionaries

Dictionaries are a conceptually simple data structure that should be easy
for beginners to grok. Obviously, their implementation is a bit complex, but
I don't think you would need to get into that. Dictionaries are very powerful
data structures that can be used to keep code DRY and store important
data to be accessed from anywhere in an application or script. Dictionaries
also have an average time complexity of O(1) for read access, which
makes them fairly efficient.
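
As a sketch of how little is needed to show dictionaries to beginners, here
is a word counter. The function name count_words and the sample sentence are
made up for the example:

```python
def count_words(text):
    """Map each word in text to the number of times it occurs."""
    counts = {}
    for word in text.lower().split():
        # get() returns 0 for a word that has not been seen yet
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("the cat and the hat")
times_the = counts["the"]   # reading by key, no searching needed
```

It uses nothing beyond a loop, a dictionary literal and one method call, and
it gives students something they can run on their own sentences.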


Re: Planning a Python Course for Beginners

2017-08-08 Thread Chris Angelico
On Wed, Aug 9, 2017 at 1:02 AM, Stefan Ram  wrote:
> Chris Angelico  writes:
>>Why a new Python course?
>
>   It is not a course in the sense of a written text
>   (which I would call "course notes").
>
>   It is a course in the sense of an event, where I will meet
>   participants in a classroom. I will get paid for it, so this
>   payment is the reason I do it (simplified ;-).

Ah. Well, that's a good answer to the question of "why are you even
bothering to write this", but unfortunately doesn't answer the
questions that I hoped it would, about target audience and such. Heh.
C'est la vie.

>   Since this is my first-ever Python course, but has not yet
>   begun, I do not know the participants, but I can say this:
>
> - the course description requires that the participant
>   have experiences "working with a computer", but not that
>   they have any knowledge about programming.
>
> - the participants in my other courses for other
>   programming languages usually are slow learners, so I
>   prepare for this kind of audience. I try to avoid topics
>   that are abstract, advanced or complicated as much as
>   possible. I try to include very simple exercises.
>   Most books and tutorials assume faster learners,
>   so that's another reason why I don't use them.

So, this here is the important info. In that case, I would start with
wowing them with the amazing stuff Python can do. Start with a few
demonstrations of the simplicity and beauty of expression evaluation.
It's okay to use concepts you haven't yet explained; instead of
starting with concrete info and building up to something interesting,
start with something interesting and then explain how it all works.
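
A few expressions of the sort that could open such a session. These are only
suggestions, and the variable names are arbitrary:

```python
big = 2 ** 100                    # integers never overflow
banner = "spam " * 3              # strings can be multiplied
total = sum(range(1, 101))        # adds the numbers 1 through 100
backwards = "Python"[::-1]        # slicing with a negative step
longest = max(["ant", "hippopotamus", "cat"], key=len)
```

Each line can be typed at the interactive prompt on its own, and the
explanation of how it works can follow once the result has raised the
question.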

Just my opinion, of course, but if you didn't want totally unbacked
opinions, you wouldn't have come to this list :)

ChrisA


Re: Planning a Python Course for Beginners

2017-08-08 Thread Peter Heitzer
Stefan Ram  wrote:
>  I am planning a Python course.
[different topics]
>  Are there any other very simple things that
>  I have missed and that should be covered very 
>  early in a Python course? 

The differences between blanks and tabs :-)

-- 
Dipl.-Inform(FH) Peter Heitzer, peter.heit...@rz.uni-regensburg.de


Re: Planning a Python Course for Beginners

2017-08-08 Thread Chris Angelico
On Wed, Aug 9, 2017 at 12:19 AM, Stefan Ram  wrote:
>   I am planning a Python course.
>

Before answering any other questions, answer this one:

Why a new Python course? How is it different from what already exists?

The answer to that will govern just about everything else. The
specifics that you're asking about are unanswerable without first
knowing your audience, for instance.

ChrisA