Re: [Python-Dev] gc ideas -- sparse memory

2010-12-05 Thread Stephen J. Turnbull
Martin v. Löwis writes:
   Why is useful to expose an identity hash?  AFAICS it is *only* useful
   in building an identity hash table.  If so, why not just provide id()
   or the is operator or both and be done with it?
  
  That's precisely James' point: Java provides the identity hash
  *instead* of the id() function (i.e. it does not have an equivalent
  of id()). Doing so gives greater liberties in implementing Java.

Yes, we understand that it makes the implementer's job easier.  *Why
bother having an identity hash at all?*  Having taken away id() and
provided maximum leisure to the implementer via

def identity_hash(object):
return 42

is there *any* benefit left for the user/developer?  All I see is
costs: costs in implementation, costs in debugging.  And AFAICS this
is a problem that can be solved once and reused by everybody who needs
id(); why does every developer need to write his own id() function?

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-05 Thread Greg Ewing

Martin v. Löwis wrote:

The hole C API would break if objects would move in memory.
Since they have to stay at fixed addresses, it's easy enough to use the
address as ID.


Yes. Some of the discussion here seems to be assuming that the
reason Python doesn't move objects is so that it can use the
address as a unique ID. But it's the other way around.

--
Greg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Martin v. Löwis
 Am I still missing something?

Apparently. The hole C API would break if objects would move in memory.
Since they have to stay at fixed addresses, it's easy enough to use the
address as ID. There is no problem to be solved here.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Martin v. Löwis
 I'm afraid I don't follow you. Unless you're suggesting some sort of
 esoteric object system whereby objects *don't* have identity (e.g. where
 objects are emergent properties of some sort of distributed,
 non-localised information), any object naturally has an identity --
 itself.

Not in Java or C#. It is in these languages possible to determine
whether to references refer to the same object. However, objects don't
naturally have a distinct identification (be it an integer or something
else).

If you really want to associate unique numbers with objects in these
languages, the common approach is to put them into an identity
dictionary as keys.

 It seems counter-productive to me to bother with an identity function
 which doesn't meet that constraint. If id(x) == id(y) implies nothing
 about x and y (they may, or may not, be the same object) then what's the
 point?

See James' explanation: it would be possible to use this as the
foundation of an identity hash table.

 Why would you bother using that function when you could just use
 x == y instead?

Because in a hash table, you also need a hash value.

Regards,
Martn
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Steven D'Aprano

Martin v. Löwis wrote:

I'm afraid I don't follow you. Unless you're suggesting some sort of
esoteric object system whereby objects *don't* have identity (e.g. where
objects are emergent properties of some sort of distributed,
non-localised information), any object naturally has an identity --
itself.


Not in Java or C#. It is in these languages possible to determine
whether to references refer to the same object. However, objects don't
naturally have a distinct identification (be it an integer or something
else).


Surely even in Java or C#, objects have an *identity* even if the 
language doesn't provide a way to query their distinct *identification*. 
An object stored at one memory location is distinct from any other 
object stored at different memory locations at that same moment in time, 
regardless of whether or not the language gives you a convenient label 
for that identity. Even if that memory location can change during the 
lifetime of the object, at any one moment, object X is a different 
object from every other object.


The fact that we can even talk about this object versus that object 
implies that objects have identity.


To put it in Python terms, if the id() function were removed, it would 
no longer be possible to get the unique identification associated with 
an object, but you could still compare the identity of two objects using 
`is`.


Of course, I'm only talking about objects. In Java there are values 
which aren't objects, such as ints and floats. That's irrelevant for our 
discussion, because Python has no such values.




If you really want to associate unique numbers with objects in these
languages, the common approach is to put them into an identity
dictionary as keys.


It seems counter-productive to me to bother with an identity function
which doesn't meet that constraint. If id(x) == id(y) implies nothing
about x and y (they may, or may not, be the same object) then what's the
point?


See James' explanation: it would be possible to use this as the
foundation of an identity hash table.


I'm afraid James' explanation didn't shed any light on the question to 
me. It seems to me that Java's IdentityHashValue[sic -- I think the 
correct function name is actually IdentityHashCode] is equivalent to 
Python's hash(), not to Python's id(), and claiming it is related to 
identity is misleading and confusing.


I don't think I'm alone here -- it seems to me that even among Java 
programmers, the same criticisms have been raised:


http://bugs.sun.com/bugdatabase/view_bug.do?bug%5Fid=6321873
http://deepakjha.wordpress.com/2008/07/31/interesting-fact-about-identityhashcode-method-in-javalangsystem-class/

Like hash(), IdentityHashCode doesn't make any promises about identity 
at all. Two distinct objects could have the same hash code, and a 
further test is needed to distinguish them.




Why would you bother using that function when you could just use
x == y instead?


Because in a hash table, you also need a hash value.


Well, sure, in a hash table you need a hash value. But I was talking 
about an id() function.


So is that it? Is IdentityHashValue (or *Code, as the case may be) just 
a longer name for hash()?




--
Steven

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Antoine Pitrou
On Sat, 4 Dec 2010 15:06:45 +1100
Cameron Simpson c...@zip.com.au wrote:
 On 03Dec2010 18:15, James Y Knight f...@fuhm.net wrote:
 | On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:
 |  gc is implementation specific. CPython uses ref counting + cycle
 |  gc. A constraint on all implementations is that objects have a fixed,
 |  unique id during their lifetime. CPython uses the address as the id, so
 |  it cannot move objects. Other implementations do differently. Compacting
 |  gc requires an id to current address table or something.
 | 
 | It's somewhat unfortuante that python has this constraint, instead of
 | the looser: objects have a fixed id during their lifetime, which is
 | much easier to implement, and practically as useful.
 
 Python doesn't have the constraint you imagine; it _does_ have objects
 have a fixed id during their lifetime.
 
 _CPython_ has this constraint because it uses the address as the id,
 which is free and needs no papping or extra storage.

Well, the main reason CPython has this constraint is that the C API
mandates constant object pointers. That we can then reuse the pointer
for id() is only a consequence of this.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Martin v. Löwis
 Surely even in Java or C#, objects have an *identity* even if the
 language doesn't provide a way to query their distinct *identification*.

When people said the id of an object should this or that, they always
meant identification, not identity (perhaps except for you). Nobody
would propose that the identity (in your wording) of two distinct
objects might be the same, or might change over time.

 I'm afraid James' explanation didn't shed any light on the question to
 me. It seems to me that Java's IdentityHashValue[sic -- I think the
 correct function name is actually IdentityHashCode] is equivalent to
 Python's hash()

No, it's not. java.lang.Object.hashCode() is equivalent to Python's
hash(). In particular, for both of these, the following requirement
holds: object that *compare* equal must hash equal.

This is not the case for the identity hash code, or Python's id
function: objects that compare equal may still have different ids,
or different java identity hash codes.

 not to Python's id(), and claiming it is related to
 identity is misleading and confusing.

No, it's right on spot. This is *not* about the regular value hash,
but about the identity hash.

 Like hash(), IdentityHashCode doesn't make any promises about identity
 at all.

It sure does: calling identityHashCode on the very same object twice
will always return the same value - even after a garbage collection.

So if two references refer to the same object, calling identityHashCode
on both of them gives the same value.

This is similar to the relationship between equals and hashCode.

 Two distinct objects could have the same hash code, and a
 further test is needed to distinguish them.

Correct. However, references with different identity hash codes will
never refer to the same object.

 Why would you bother using that function when you could just use
 x == y instead?

 Because in a hash table, you also need a hash value.
 
 Well, sure, in a hash table you need a hash value. But I was talking
 about an id() function.
 
 So is that it? Is IdentityHashValue (or *Code, as the case may be) just
 a longer name for hash()?

Of course not.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Stephen J. Turnbull
Steven D'Aprano writes:
  Martin v. Löwis wrote:

   It seems counter-productive to me to bother with an identity function
   which doesn't meet that constraint. If id(x) == id(y) implies nothing
   about x and y (they may, or may not, be the same object) then what's the
   point?
   
   See James' explanation: it would be possible to use this as the
   foundation of an identity hash table.
  
  I'm afraid James' explanation didn't shed any light on the question to 
  me. It seems to me that Java's IdentityHashValue[sic -- I think the 
  correct function name is actually IdentityHashCode] is equivalent to 
  Python's hash(), not to Python's id(), and claiming it is related to 
  identity is misleading and confusing.

Not quite equivalent.  Python's hash() obeys a certain additional
constraint (that numeric values that compare equal have the same
hash), and IdentityHashValue presumably won't work at all on numbers
in Java (since they aren't objects).

Bikeshedding aside, I have to agree with you: there's no point in
calling such a function id() or IdentityAnything().  Everything you
need to know (including both the fact that it could be an efficient
way to search for a unique object, and the possible non-uniqueness of
the object assigned that code) is contained in the word hash.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Maciej Fijalkowski
On Sat, Dec 4, 2010 at 1:37 PM, Antoine Pitrou solip...@pitrou.net wrote:
 On Sat, 4 Dec 2010 15:06:45 +1100
 Cameron Simpson c...@zip.com.au wrote:
 On 03Dec2010 18:15, James Y Knight f...@fuhm.net wrote:
 | On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:
 |  gc is implementation specific. CPython uses ref counting + cycle
 |  gc. A constraint on all implementations is that objects have a fixed,
 |  unique id during their lifetime. CPython uses the address as the id, so
 |  it cannot move objects. Other implementations do differently. Compacting
 |  gc requires an id to current address table or something.
 |
 | It's somewhat unfortuante that python has this constraint, instead of
 | the looser: objects have a fixed id during their lifetime, which is
 | much easier to implement, and practically as useful.

 Python doesn't have the constraint you imagine; it _does_ have objects
 have a fixed id during their lifetime.

 _CPython_ has this constraint because it uses the address as the id,
 which is free and needs no papping or extra storage.

 Well, the main reason CPython has this constraint is that the C API
 mandates constant object pointers. That we can then reuse the pointer
 for id() is only a consequence of this.

 Regards

 Antoine.


Yes.

I think further discussion of Java-vs-Python makes no sense (and it's
proven that C-API-less python can exist and be compatible)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Stephen J. Turnbull
Quotes are out of order.

Martin v. Löwis writes:

  No, it's not. java.lang.Object.hashCode() is equivalent to Python's
  hash(). In particular, for both of these, the following requirement
  holds: object that *compare* equal must hash equal.

Aha.  I see, now.  That is indeed a different concept.  Mutable
objects retain their identities across mutations.

   Two distinct objects could have the same hash code, and a
   further test is needed to distinguish them.
  
  Correct. However, references with different identity hash codes will
  never refer to the same object.

Why is useful to expose an identity hash?  AFAICS it is *only* useful
in building an identity hash table.  If so, why not just provide id()
or the is operator or both and be done with it?

This is different from exposing the value hash AFAICS, where it may be
possible to optimize value hashing of composite objects by computing
their hashes from the hashes of subobjects, perhaps omitting some
subobjects that don't affect equality (eg, Emacs strings may carry
display properties that are ignored when they are compared as
strings).

With identity, I don't see much point in leaving that up to the
programmer.  In implementations with stable object addresses, the
address is a fine implementation of identity_hash() -- perfect, in
fact.  Otherwise, you either have to omit mutable members from the
computation or store the hashcode in the object.  In the former
situation, you will likely get very coarse hashing, and the set of
immutable members (if any ;-) can be computed by the compiler.  In the
latter, you may as well store an id directly and avoid the whole hash
concept for that type (except as a space optimization, but that again
implies coarse hashing).

  [F]or the identity hash code, or Python's id function: objects that
  compare equal may still have different ids, or different java
  identity hash codes.

ISTM that's no problem, you have to do an extra comparison for
identity when the codes are equal either way.  Of course the identity
hash code is a more precise optimization, but that doesn't make hash()
unusable for this purpose (cf Sun's built-in implementation of
IdentityHashCode that always returns 2).

The problem presumably lies in the fact that there are id()-ifiable
objects that, because they are mutable, either are unhashable (Python
hash) or would have an unstable value hash (Lisp sxhash).

Ie, AFAICS

def identity_hash (thing):
Wrapper for hash() suitable for building an identity hash table.
try:
return hash(thing)
except TypeError:
return 0

would be a fine implementation for Python.wink

   Like hash(), IdentityHashCode doesn't make any promises about identity
   at all.
  
  It sure does: calling identityHashCode on the very same object twice
  will always return the same value - even after a garbage
  collection.

Well, if you want to put it that way, so does hash().  I read Steven
as placing more emphasis on like hash() than on at all.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-04 Thread Martin v. Löwis
 Why is useful to expose an identity hash?  AFAICS it is *only* useful
 in building an identity hash table.  If so, why not just provide id()
 or the is operator or both and be done with it?

That's precisely James' point: Java provides the identity hash
*instead* of the id() function (i.e. it does not have an equivalent
of id()). Doing so gives greater liberties in implementing Java.
For example, an implementation that would only use the type and
not the instance for identity hash would still be conforming
(as would one that always returns 0).

 With identity, I don't see much point in leaving that up to the
 programmer.  In implementations with stable object addresses, the
 address is a fine implementation of identity_hash() -- perfect, in
 fact.

Of course. James' complaint is that Python-the-language mandates
support for an id() function, though - a requirement that even
implementations that don't have stable object addresses now must
support. If Python mandated only identity hash properties of the
id() function, alternative implementations could be simplified.

 ISTM that's no problem, you have to do an extra comparison for
 identity when the codes are equal either way.  Of course the identity
 hash code is a more precise optimization, but that doesn't make hash()
 unusable for this purpose (cf Sun's built-in implementation of
 IdentityHashCode that always returns 2).

That's only the case if the hash() result is guaranteed not
to change. In some applications, it may be desirable to have that
as an absolute guarantee (rather than just being a convention).

No, you can't substitute identity hash with hash: the value hash of an
object may change over time, whereas the identity hash must not.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Terry Reedy

On 12/3/2010 5:55 PM, Dima Tisnek wrote:

How hard or reasonable would it be to free memory pages on OS level?

[pcmiiw] Gabage collection within a generation involves moving live
objects to compact the generation storage. This separates the memory
region into 2 parts live and cleared, the pointer to the beginning
of the cleared part is where next allocation is going to happen.

When this is done, does Python gc move the objects preserving order or
does it try to populate garbaged slot with some live object
disregarding order? Earlier case is more applicable, but latter case
is a target for below too.

If we were to look at memory regions from the OS point of view, they
are allocated as pages (or possibly as hugetlb pages). So if we are to
compact something like this [LL__][_L__][][L_LL][LFFF]  where []
is a page, L is live object and _ is garbage and F is free memory,
would it not make more sense to tell OS that [] is not needed
anymore, and not move some of the consequtive [L_LL][LFFF] at all, or
at least not move those objects as far down the memory region?

This would of course have a certain overhead of tracking which pages
are given back to OS and mapping them back when needed, but at the
same time, be beneficial because fewer objets are moved and also
possibly improve cpu cache performance because objects won't be moved
so far out.

p.s. if someone has an athoritative link to modern python gc design,
please let me know.


gc is implementation specific. CPython uses ref counting + cycle gc. A 
constraint on all implementations is that objects have a fixed, unique 
id during their lifetime. CPython uses the address as the id, so it 
cannot move objects. Other implementations do differently. Compacting gc 
requires an id to current address table or something.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread James Y Knight
On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:
 gc is implementation specific. CPython uses ref counting + cycle gc. A 
 constraint on all implementations is that objects have a fixed, unique id 
 during their lifetime. CPython uses the address as the id, so it cannot move 
 objects. Other implementations do differently. Compacting gc requires an id 
 to current address table or something.

It's somewhat unfortuante that python has this constraint, instead of the 
looser: objects have a fixed id during their lifetime, which is much easier 
to implement, and practically as useful.

James

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Martin v. Löwis
Am 03.12.2010 23:55, schrieb Dima Tisnek:
 How hard or reasonable would it be to free memory pages on OS level?

Very easy. Python already does that.

 [pcmiiw] Gabage collection within a generation involves moving live
 objects to compact the generation storage. This separates the memory
 region into 2 parts live and cleared, the pointer to the beginning
 of the cleared part is where next allocation is going to happen.

I think you are talking about copying collectors here. This is not how
Python's garbage collection works.

 When this is done, does Python gc move the objects preserving order or
 does it try to populate garbaged slot with some live object
 disregarding order? Earlier case is more applicable, but latter case
 is a target for below too.

(C)Python's garbage collector is not moving objects at all.

 If we were to look at memory regions from the OS point of view, they
 are allocated as pages (or possibly as hugetlb pages). So if we are to
 compact something like this [LL__][_L__][][L_LL][LFFF]  where []
 is a page, L is live object and _ is garbage and F is free memory,
 would it not make more sense to tell OS that [] is not needed
 anymore, and not move some of the consequtive [L_LL][LFFF] at all, or
 at least not move those objects as far down the memory region?

See above. Python does no moving of objects whatsoever.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Dima Tisnek
Oh my bad, I must've confused python with some research paper.
Unique id is not so hard to make without an address.

While on this topic, what is the real need for unique ids?
Also I reckon not all objects need a unique id like this, e.g.
interned strings, simple data types and hashable and comparable
objects could perhaps survive without unique id?

On 3 December 2010 16:04, Terry Reedy tjre...@udel.edu wrote:
 On 12/3/2010 5:55 PM, Dima Tisnek wrote:

 How hard or reasonable would it be to free memory pages on OS level?

 [pcmiiw] Gabage collection within a generation involves moving live
 objects to compact the generation storage. This separates the memory
 region into 2 parts live and cleared, the pointer to the beginning
 of the cleared part is where next allocation is going to happen.

 When this is done, does Python gc move the objects preserving order or
 does it try to populate garbaged slot with some live object
 disregarding order? Earlier case is more applicable, but latter case
 is a target for below too.

 If we were to look at memory regions from the OS point of view, they
 are allocated as pages (or possibly as hugetlb pages). So if we are to
 compact something like this [LL__][_L__][][L_LL][LFFF]  where []
 is a page, L is live object and _ is garbage and F is free memory,
 would it not make more sense to tell OS that [] is not needed
 anymore, and not move some of the consequtive [L_LL][LFFF] at all, or
 at least not move those objects as far down the memory region?

 This would of course have a certain overhead of tracking which pages
 are given back to OS and mapping them back when needed, but at the
 same time, be beneficial because fewer objets are moved and also
 possibly improve cpu cache performance because objects won't be moved
 so far out.

 p.s. if someone has an athoritative link to modern python gc design,
 please let me know.

 gc is implementation specific. CPython uses ref counting + cycle gc. A
 constraint on all implementations is that objects have a fixed, unique id
 during their lifetime. CPython uses the address as the id, so it cannot move
 objects. Other implementations do differently. Compacting gc requires an id
 to current address table or something.

 --
 Terry Jan Reedy

 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 http://mail.python.org/mailman/options/python-dev/dimaqq%40gmail.com

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Martin v. Löwis
 Oh my bad, I must've confused python with some research paper.
 Unique id is not so hard to make without an address.
 
 While on this topic, what is the real need for unique ids?

They are absolutely needed for mutable objects. For immutable ones,
it would be ok to claim that they are identical if they are equal
(assuming they support equality - which is tricky for things like NaN).

Of course, the C API has lots of assumptions that identity and address
are really the same thing.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Terry Reedy

On 12/3/2010 6:15 PM, James Y Knight wrote:

On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:

gc is implementation specific. CPython uses ref counting + cycle
gc. A constraint on all implementations is that objects have a
fixed, unique id during their lifetime. CPython uses the address as
the id, so it cannot move objects. Other implementations do
differently. Compacting gc requires an id to current address table
or something.


I left out that the id must be an int.


It's somewhat unfortuante that python has this constraint, instead of
the looser: objects have a fixed id during their lifetime, which is
much easier to implement, and practically as useful.


Given that the only different between 'fixed and unique' and 'fixed' is 
the uniqueness part, I do not understand 'practically as useful'. 
Duplicate ids (in the extreme, that same for all objects) hardly seem 
useful at all.


--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread James Y Knight
On Dec 3, 2010, at 7:05 PM, Terry Reedy wrote:
 I left out that the id must be an int.
 
 It's somewhat unfortuante that python has this constraint, instead of
 the looser: objects have a fixed id during their lifetime, which is
 much easier to implement, and practically as useful.
 
 Given that the only different between 'fixed and unique' and 'fixed' is the 
 uniqueness part, I do not understand 'practically as useful'. Duplicate ids 
 (in the extreme, that same for all objects) hardly seem useful at all.

Sure they are. This is what Java provides you, for example. If you have fixed, 
but potentially non-unique ids (in Java you get this using 
identityHashCode()), you can still make an identity hashtable. You simply 
need to *also* check using is that the two objects really are the same one 
after finding the hash bin using id.

It'd be a quality of implementation issue whether an implementation gives you 
the same value for every object. It should not, of course, if it wants programs 
to have reasonable performance. Same sort of thing as how __hash__ should not 
return 0 for everything.

Besides identity-hashtables, the main other thing id gets used for is to have 
some identifier string in a printer (e.g. class Foo at 0x1234567890). 
There, it's generally good enough to use an id which is not guaranteed to be, 
but often is, unique. It works for Java...

James
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Terry Reedy

On 12/3/2010 7:46 PM, James Y Knight wrote:


Sure they are. This is what Java provides you, for example. If you
have fixed, but potentially non-unique ids (in Java you get this
using identityHashCode()), you can still make an identity


I do not see the point of calling a (non-unique) hash value the identity


hashtable. You simply need to *also* check using is that the two


In Python, that unique isness is the identify.

(a is b) == (id(a) == id(b)) by definition.


objects really are the same one after finding the hash bin using id.


by using the hash value, which is how Python dict operate.

--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Cameron Simpson
On 03Dec2010 18:15, James Y Knight f...@fuhm.net wrote:
| On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:
|  gc is implementation specific. CPython uses ref counting + cycle
|  gc. A constraint on all implementations is that objects have a fixed,
|  unique id during their lifetime. CPython uses the address as the id, so
|  it cannot move objects. Other implementations do differently. Compacting
|  gc requires an id to current address table or something.
| 
| It's somewhat unfortuante that python has this constraint, instead of
| the looser: objects have a fixed id during their lifetime, which is
| much easier to implement, and practically as useful.

Python doesn't have the constraint you imagine; it _does_ have objects
have a fixed id during their lifetime.

_CPython_ has this constraint because it uses the address as the id,
which is free and needs no papping or extra storage. Of course, it
removes certain freedoms from the GC etc as a side effect.
-- 
Cameron Simpson c...@zip.com.au DoD#743
http://www.cskk.ezoshosting.com/cs/

The number of cylinders for this disk is set to 364737.
There is nothing wrong with that, but this is larger than 1024, [...]
- fdisk on our new RAID 07oct2007 :-)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread James Y Knight
On Dec 3, 2010, at 10:50 PM, Terry Reedy wrote:
 On 12/3/2010 7:46 PM, James Y Knight wrote:
 
 Sure they are. This is what Java provides you, for example. If you
 have fixed, but potentially non-unique ids (in Java you get this
 using identityHashCode()), you can still make an identity
 
 I do not see the point of calling a (non-unique) hash value the identity

My point was simply that a) it's an unfortunate constraint on potential GC 
implementations that objects need to have a fixed and unique id in Python, and 
b) that it's not actually necessary to have such a constraint (in the abstract 
sense of required; obviously it's a requirement upon Python *today*, due to 
existing code which depends upon that promise). 

Would you be happier if I had said it's unfortunate that Python has an id 
function instead of an identityHashValue function? I suppose that's what I 
really meant. Python the language would not have been harmed had it had from 
the start an identityHashValue() function instead of an id() function. In the 
CPython implementation, it may even have had the exact same behavior, but 
would've allowed other implementations more flexibility.

James

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Terry Reedy

On 12/3/2010 11:06 PM, Cameron Simpson wrote:

On 03Dec2010 18:15, James Y Knightf...@fuhm.net  wrote:
| On Dec 3, 2010, at 6:04 PM, Terry Reedy wrote:
|  gc is implementation specific. CPython uses ref counting + cycle
|  gc. A constraint on all implementations is that objects have a fixed,
|  unique id during their lifetime. CPython uses the address as the id, so
|  it cannot move objects. Other implementations do differently. Compacting
|  gc requires an id to current address table or something.
|
| It's somewhat unfortuante that python has this constraint, instead of
| the looser: objects have a fixed id during their lifetime, which is
| much easier to implement, and practically as useful.

Python doesn't have the constraint you imagine; it _does_ have objects
have a fixed id during their lifetime.


id(object)
Return the “identity” of an object. This is an integer which is 
guaranteed to be unique and constant for this object during its lifetime.


Of course, other implementations are free to change builtins, but code 
that depends on CPython's definitions will not run.


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Dima Tisnek
On 3 December 2010 16:45, Martin v. Löwis mar...@v.loewis.de wrote:
 Oh my bad, I must've confused python with some research paper.
 Unique id is not so hard to make without an address.

 While on this topic, what is the real need for unique ids?

 They are absolutely needed for mutable objects. For immutable ones,
 it would be ok to claim that they are identical if they are equal
 (assuming they support equality - which is tricky for things like NaN).

Indeed, but do ids really need to be unique and fixed at the same time?

a is b # (if atomic) needs unique ids, but doesn't really need fixed ids
a[b]   # needs fixed hash, but not strictly a globally unique id

I can imagine an implementaion of pickle for example that uses unique
and fixed as a given to detect cycles, etc; but that would be
implementation detail.

It seems to me unique and fixed id implies that it is stored somewhere
(with incref beforehands and decref afterwards), however a proper
reference to an object could be used just as well.

Am I still missing something?


 Of course, the C API has lots of assumptions that identity and address
 are really the same thing.

 Regards,
 Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Maciej Fijalkowski
On Sat, Dec 4, 2010 at 6:34 AM, James Y Knight f...@fuhm.net wrote:
 On Dec 3, 2010, at 10:50 PM, Terry Reedy wrote:
 On 12/3/2010 7:46 PM, James Y Knight wrote:

 Sure they are. This is what Java provides you, for example. If you
 have fixed, but potentially non-unique ids (in Java you get this
 using identityHashCode()), you can still make an identity

 I do not see the point of calling a (non-unique) hash value the identity

 My point was simply that a) it's an unfortunate constraint on potential GC 
 implementations that objects need to have a fixed and unique id in Python, 
 and b) that it's not actually necessary to have such a constraint (in the 
 abstract sense of required; obviously it's a requirement upon Python *today*, 
 due to existing code which depends upon that promise).

 Would you be happier if I had said it's unfortunate that Python has an id 
 function instead of an identityHashValue function? I suppose that's what I 
 really meant. Python the language would not have been harmed had it had from 
 the start an identityHashValue() function instead of an id() function. In the 
 CPython implementation, it may even have had the exact same behavior, but 
 would've allowed other implementations more flexibility.

 James


I don't see how this related to moving vs non-moving GC. PyPy (and I
believe IronPython and Java) all have fixed unique ids that are not
necesarilly their addresses. The only problem is that id() computed
that way is more costly performance-wise, but works.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] gc ideas -- sparse memory

2010-12-03 Thread Steven D'Aprano

James Y Knight wrote:

On Dec 3, 2010, at 10:50 PM, Terry Reedy wrote:

On 12/3/2010 7:46 PM, James Y Knight wrote:


Sure they are. This is what Java provides you, for example. If you
have fixed, but potentially non-unique ids (in Java you get this
using identityHashCode()), you can still make an identity

I do not see the point of calling a (non-unique) hash value the identity


My point was simply that a) it's an unfortunate constraint on potential GC implementations that objects need to have a fixed and unique id in Python, and b) that it's not actually necessary to have such a constraint (in the abstract sense of required; obviously it's a requirement upon Python *today*, due to existing code which depends upon that promise). 


I'm afraid I don't follow you. Unless you're suggesting some sort of 
esoteric object system whereby objects *don't* have identity (e.g. where 
objects are emergent properties of some sort of distributed, 
non-localised information), any object naturally has an identity -- 
itself.


It seems to me that an identify function must obey one constraint:

* Two objects which exist simultaneously have the same identity if
  and only if they are the same object i.e. id(x) == id(y) if and
  only if x is y.

Other than that, an implementation is free to make id() behave any way 
they like. CPython uses the memory location of the object, but Jython 
and IronPython use an incrementing counter which is never re-used for 
the lifetime of the Python process. CPython's implementation implies 
that objects may not be moved in memory, but that's not a language 
constraint, that's an implementation issue.


It seems counter-productive to me to bother with an identity function 
which doesn't meet that constraint. If id(x) == id(y) implies nothing 
about x and y (they may, or may not, be the same object) then what's the 
point? Why would you bother using that function when you could just use 
x == y instead?



Would you be happier if I had said it's unfortunate that Python has an id function 
instead of an identityHashValue function? I suppose that's what I really meant. Python the 
language would not have been harmed had it had from the start an identityHashValue() function instead of 
an id() function. In the CPython implementation, it may even have had the exact same behavior, but 
would've allowed other implementations more flexibility.


No, because I don't see what the point of identityHashValue or why you 
would ever bother to use it.



--
Steven
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com