On May 10, 2005, at 10:57 AM, Jim Hugunin wrote:
Bob Ippolito wrote:
(1) Don't have mutable value types, use a reference type that points
to a value type (some kind of proxy)
I don't think that this is possible to do in a consistent way and my
suspicion is that doing this half-way would be more confusing than not
doing it at all. Let's walk through the original example:
apt = Array.CreateInstance(Point, 1)
This creates a true CLI array of Point structs
pt = Point(1,2)
Today this makes a new Point struct and returns the boxed version of
that struct. We could instead return a new instance of an imaginary new
type, ValueProxy<Point>. This new instance is a standard reference type
that holds a point as its data. This proxy will need to forward all
field, property and method accesses to the contained Point struct.
apt[0] = pt
What do we do here? We need to copy the data in pt into apt[0]. This
is what it means to have an array of structs. No matter what we do with
proxies or wrappers there's no way out of this copy. We could add some
kind of pointer to the ValueProxy<Point> keeping track of the fact that
there's a copy of this variable now held in apt[0]. This would need to
be an arbitrarily large list of pointers. This list would also be easy
to break with CLI code that directly modified apt or other containers
holding on to the value types.
pt.X = 0
The only way this can modify apt[0] is if we keep the full list of
references in ValueProxy. See above for why keeping that full list
still wouldn't always work.
apt[0].X = 0
This example would work using the ValueProxy that pointed to apt[0];
however, when apt[0] is assigned to a variable the situation becomes as
bad as it is for pt.
for pt in apt:
    pt.X = 0
The for loop uses an Enumerator to loop through the points in apt.
Without constructing a custom enumerator for arrays there's no way to
get anything but copy semantics here. While we could build a custom
enumerator for arrays this wouldn't solve the general case of value
types being returned from methods.
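The copy semantics in this walkthrough can be imitated in plain Python. This is only a sketch: Point and StructArray below are hypothetical stand-ins, not IronPython or CLI types, and the copies are made by hand to mimic what a true array of structs does.

```python
import copy


class Point:
    """Hypothetical stand-in for a mutable CLI struct."""

    def __init__(self, x, y):
        self.X = x
        self.Y = y


class StructArray:
    """Hypothetical stand-in for a CLI array of structs (copies in, copies out)."""

    def __init__(self, length):
        self._slots = [Point(0, 0) for _ in range(length)]

    def __setitem__(self, index, value):
        # Storing a struct copies its data into the slot.
        self._slots[index] = copy.copy(value)

    def __getitem__(self, index):
        # Reading a struct out hands back a copy as well.
        return copy.copy(self._slots[index])

    def __iter__(self):
        # The enumerator also yields copies.
        return (copy.copy(p) for p in self._slots)


apt = StructArray(1)
pt = Point(1, 2)
apt[0] = pt

pt.X = 0                      # changes only the local pt
x_after_pt_change = apt[0].X

apt[0].X = 0                  # changes only a temporary copy of apt[0]
x_after_item_change = apt[0].X

for p in apt:                 # each p is a copy, so this changes nothing
    p.X = 0
x_after_loop = apt[0].X
```

In all three cases the value stored in the array keeps its original X, which is the surprise being discussed.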
When I played with this example in C#, I discovered something
interesting:
Point[] pa = new Point[3];
foreach (Point p in pa) {
    p.X = 10;
}
The code above generates an error from the C# compiler:
"Cannot modify members of 'p' because it is a 'foreach iteration
variable'"
The C# compiler is treating these iteration variables as semi-immutable
in order to minimize the confusion that can come from the copy semantics
of value types. This seems like a promising idea...
Actually the idea I had was different: leave boxed type handling
as-is, but have the __getitem__ of the Point[] instance return
"ValueProxy" instances, which would give you similar semantics to C#
as long as you don't keep the proxy around for a long time. Of course,
you could deviate from standard Python a little bit and have an
optional extension to the __getitem__ protocol that would recognize
that the __getitem__ is really just to find a "pointer" so that it
can set an attribute somewhere. __getitemforsetattr__ or something...
I only really had that idea because it would fix the reported bug.
You're probably right that doing this half-way would be confusing;
however, I think it might be less confusing than the current state.
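A rough sketch of that idea in plain Python, to make the semantics concrete. Everything here is hypothetical: Point, ValueProxy and PointArray are illustrative stand-ins, not anything IronPython provides.

```python
class Point:
    """Hypothetical stand-in for a mutable CLI struct."""

    def __init__(self, x=0, y=0):
        self.X = x
        self.Y = y


class ValueProxy:
    """Hypothetical proxy that points at one slot of an array."""

    def __init__(self, slots, index):
        # Bypass our own __setattr__ while storing the bookkeeping fields.
        object.__setattr__(self, "_slots", slots)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        # Only called for names not found on the proxy: forward to the slot.
        return getattr(self._slots[self._index], name)

    def __setattr__(self, name, value):
        # Write through to the slot instead of to the proxy.
        setattr(self._slots[self._index], name, value)


class PointArray:
    """Hypothetical stand-in for Point[] whose __getitem__ returns proxies."""

    def __init__(self, length):
        self._slots = [Point() for _ in range(length)]

    def __getitem__(self, index):
        return ValueProxy(self._slots, index)

    def __setitem__(self, index, value):
        # Storing still copies the data, as an array of structs must.
        self._slots[index] = Point(value.X, value.Y)


apt = PointArray(1)
pt = Point(1, 2)
apt[0] = pt

apt[0].X = 0     # the proxy forwards the set to the slot
pt.X = 5         # pt is still an independent copy
result = (apt[0].X, apt[0].Y, pt.X)
```

A proxy that is kept in a variable goes on pointing at slot 0 whatever is later stored there, which is the "don't keep it around for a long time" caveat.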
(2) Make value types immutable (or at least the ones you grab from
collections)
All of the problems with value types stem from their mutability. Nobody
ever complains that int, double, char, etc. are value types because
those are all immutable. For immutable objects there's no difference
between pass by reference and pass by value.
The CLR team's API Design Guidelines say this:
- Do not create mutable value types.
http://blogs.msdn.com/kcwalina/archive/2004/09/28/235232.aspx
(or see here - http://peter.golde.org/2003/10/13.html#a16)
In some ways, this would be just reflecting in IronPython this good
design sense.
One advantage of immutability is that it would make failures like the
following much more obvious:
apt[0].X = 0
If value types were immutable this would throw. The exception message
might give people enough information to get started tracking down the
issue and modifying their code to work correctly.
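As an analogy only, here is how that failure looks with an immutable type in plain Python. The frozen dataclass Point is a hypothetical stand-in for a value type, and the exception is the one Python's dataclasses module raises, not anything IronPython does.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    """Hypothetical immutable stand-in for a value type."""

    X: int = 0
    Y: int = 0


apt = [Point(1, 2)]

try:
    apt[0].X = 0          # with mutable structs this would change a copy
    error = None
except FrozenInstanceError as exc:
    error = exc           # with an immutable type the mistake is visible

# One way to rewrite the code: build a new value and store it back.
apt[0] = replace(apt[0], X=0)
```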
What are the problems with this approach?
1. C#/VB examples won't port very naturally to IronPython and the docs
will need a section explaining the various workarounds to the fact that
IronPython doesn't support this idiom. This isn't ideal, but I could
easily live with this doc burden.
2. There's no way that I know of to make a value type 100% immutable
without controlling its implementation. IronPython could block setting
of fields and properties on value types, but there's no way to reliably
detect and block all sets that come through methods. Just covering the
properties and fields would probably handle 95% of the cases where
people try to mutate a value type, but it seems pretty awkward to me to
say that value types in IronPython are sort-of immutable unless there
are mutating methods. The fact that this is what the C# compiler does
for iteration variables is encouraging, at least in that it's a
precedent.
3. There might be things that are impossible to express with this
restriction. I don't think that's true, particularly with the use of
named parameters to initialize fields and properties in the value type's
constructor. However, one of the principles of IronPython is that it
should be able to use any CLS library and it's possible there's some
weird library design with value types that wouldn't work if they were
considered virtually immutable by IronPython.
If we went down the immutable value type route, it would be interesting
to look at different kinds of sugar that could be provided to make the
impact on most programs less than it currently is.
In PyObjC we have similar problems to this.. the mutable value type
problem exists, but isn't a problem in practice because people Just
Don't Do That. What *is* a problem is that Foundation has a mutable
string type.
Now this sounds like a small problem at first, but since Foundation
NSDictionary is key-copying, mutable strings are hashable and are
allowed to pass for a regular string anywhere. Also, since unicode
objects are immutable in Python and their hash cannot change, weird
things can happen.
In practice, this is also not a problem (anymore). From Python, the
NSMutableString is bridged to a subclass of unicode. So, it has a
copy of the contents at the time of its creation, and all of the
Python methods will behave as documented since they are using
Python's implementation. However, it also has all of the methods of
NSMutableString and they also act correctly. In order to get an
updated Python representation, you simply call some
Objective-C-implemented method that will return the object again and
you'll get a new proxy (normally proxies are guaranteed unique so "is"
works, but this is not true for most classes that we conveniently
bridge to immutable Python built-in types). Fortunately, the NSObject
protocol has a "self" instance method that will return that instance:
>>> from Foundation import *
>>> s = NSMutableString.string()
>>> s
u''
>>> hash(s)
0
>>> s.description()
u''
>>> s.appendString_('foo')
>>> hash(s)
0
>>> s
u''
>>> s.description()
u'foo'
>>> s.self()
u'foo'
It looks confusing in a contrived example like this, but in practice
you're generally either using one set of methods or the other, so
I've never been confused by it and we haven't had any complaints.
You could provide some similar workaround, with a function or method
that mutates a field (because unlike in the PyObjC case, you're not
guaranteed mutating methods).
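Something like the following is what I have in mind, sketched in plain Python. The name set_field and the Point class are made up for illustration, and copy.copy stands in for the copy you get when a value type comes out of a container.

```python
import copy


class Point:
    """Hypothetical stand-in for a mutable CLI struct."""

    def __init__(self, x=0, y=0):
        self.X = x
        self.Y = y


def set_field(container, key, name, value):
    """Hypothetical helper: copy the item out, set one field, store it back."""
    item = copy.copy(container[key])   # what you get from a container is a copy
    setattr(item, name, value)         # mutate the copy
    container[key] = item              # write the result back explicitly
    return item


apt = [Point(1, 2)]
before = apt[0]
set_field(apt, 0, "X", 0)
```

The store back into the container is explicit, so nobody is left expecting apt[0].X = 0 to do it for them.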
-bob