Noam Raphael <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> It seems that we both agree that freezing is cool (-; . We disagree on
> whether a copy-on-write behaviour is desired. Your arguments against
> copy-on-write are:
> 1. It's not needed.
> 2. It's complicated to implement.
>
> But first of all, you didn't like my use cases. I want to argue with that.
>
> > In reading your description of a 'table of values', I can't help but be
> > reminded of the wxPython (and wxWidgets) wx.Grid and its semantics. It
> > offers arbitrary tables of values (whose editors and viewers you can
> > change at will), which offers a mechanism by which you can "listen" to
> > changes that occur to the contents of a cell. I can't help but think
> > that if you offered a protocol by which a user can signal that a cell
> > has been changed, perhaps by writing the value to the table itself
> > (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351
> > freeze), etc., that both you and the users of your table would be much
> > happier.
>
> Perhaps I didn't make it clear. The difference between wxPython's Grid
> and my table is that in the table, most values are *computed*. This
> means that there's no point in changing the values themselves. They
> are also used frequently as set members (I can describe why, but it's
> a bit complicated.)

Again, user semantics. Tell your users not to modify entries, and/or
make copies of the objects you return. If your users are too daft to
read and/or follow directions, then they deserve to have their software
not work.

Also, from the sounds of it, you are storing both source and destination
values in the same table...hrm, that sounds quite a bit like a
spreadsheet. How does every spreadsheet handle that again? Oh yeah, they
only ever store immutables (generally strings, which are interpreted).
But since you are (of course) storing mutable objects, you need to work
a bit harder: store the mutables, and return immutable copies (which you
can cache if you want, and invalidate whenever your application updates
the results...much like a wx.Grid update on change).
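Something along these lines is what I have in mind. Just a sketch, with
hypothetical names, and assuming for simplicity that each cell value is
a list of hashable items:

class Table:
    def __init__(self):
        self._values = {}       # (row, col) -> mutable computed value
        self._frozen = {}       # (row, col) -> cached immutable copy

    def SetValue(self, row, col, value):
        # The application recomputes a cell: store the mutable value
        # and invalidate any cached immutable copy of it.
        self._values[(row, col)] = value
        self._frozen.pop((row, col), None)

    def GetValue(self, row, col):
        # Readers only ever see a cached tuple, so nothing they do can
        # corrupt the table's internal state.
        key = (row, col)
        if key not in self._frozen:
            self._frozen[key] = tuple(self._values[key])
        return self._frozen[key]

The tuples are hashable, so they can go straight into your sets, and the
cache costs you one copy per recomputation rather than one per read.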
> > As for the graph issue, you've got a bigger problem than users just
> > being able to edit edge lists: users can clear the entire dictionary
> > of vertices (outgoing.clear()). It seems to me that a more reasonable
> > method to handle this particular case is to tell your users "don't
> > modify the dictionaries or the edge lists", and/or store your edge
> > lists as tuples instead of lists or dictionaries, and/or use an
> > immutable dictionary (as offered by Barry in the PEP).
>
> As I wrote before, telling my users "don't modify the edge lists" is
> just like making lists hashable, and telling all Python users, "don't
> modify lists that are dictionary keys." There's no way to tell the
> users that - there's no convention for objects which should not be
> changed. You can write it in the documentation, but who'll bother
> looking there?

When someone complains that something doesn't work, I tell them to read
the documentation. If your users haven't been told to RTFM often enough
to actually make it happen, then you need an RTFM-bat. Want to know how
you make one? You start wrapping the objects you return in something
that segfaults the process if they change things. When they start
asking, tell them it is documented quite clearly: "do not modify the
objects returned, or else". Then there's the other option, which I
provide below.

> I don't think that your other suggestions will work: the data
> structure of the graph itself can't be made of immutable objects,
> because of the fact that the graph is a mutable object - you can
> change it. It can be made of immutable objects, but this means copying
> all the data every time the graph changes.

So let me get this straight: you've got a graph. You want to be able to
change the graph, but you don't want your users to accidentally change
the graph. Sounds to me like an API problem, not a freeze()/mutable
problem. Want an API?

class graph:
    ...
    def IterOutgoing(self, node):
        ...
    def IterIncoming(self, node):
        ...
    def IsAdjacent(self, node1, node2):
        ...
    def IterNodes(self):
        ...
    def AddEdge(self, f_node, t_node):
        ...
    def RemEdge(self, node1, node2):
        ...
    def AddNode(self):
        ...

If you are reasonable in your implementation, all of the above
operations can be fast, and you will never have to worry about your
users accidentally mucking about with your internal data structures,
because you aren't exposing them. If you are really paranoid, you can
take the next step and implement it in Pyrex or C, so that only a
malicious user can muck about with internal structures, at which point
you stop supporting them.
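To be concrete, here is a rough, untested sketch of one way such a graph
might be implemented with plain dicts of sets (the attribute names are
mine; only the method names above come from the API sketch):

class graph:
    def __init__(self):
        self._next_id = 0
        self._outgoing = {}     # node -> set of successor nodes
        self._incoming = {}     # node -> set of predecessor nodes

    def AddNode(self):
        node = self._next_id
        self._next_id += 1
        self._outgoing[node] = set()
        self._incoming[node] = set()
        return node

    def AddEdge(self, f_node, t_node):
        self._outgoing[f_node].add(t_node)
        self._incoming[t_node].add(f_node)

    def RemEdge(self, node1, node2):
        self._outgoing[node1].discard(node2)
        self._incoming[node2].discard(node1)

    def IsAdjacent(self, node1, node2):
        return node2 in self._outgoing[node1]

    def IterNodes(self):
        return iter(self._outgoing)

    def IterOutgoing(self, node):
        return iter(self._outgoing[node])

    def IterIncoming(self, node):
        return iter(self._incoming[node])

Edge insertion, edge removal, and the adjacency test are all dictionary
and set operations, so they stay O(1) on average, and callers only ever
receive node values and iterators, never the internal dicts themselves.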
> Now, about copy-on-write:
>
> > There's also this little issue of "copy on write" semantics with
> > Python. Anyone who tells you that "copy on write" is easy is probably
> > hanging out with the same kind of people who say that "threading is
> > easy". Of course both are easy if you limit your uses to some small
> > subset of interesting interactions, but "copy on write" gets far
> > harder when you start thinking of dictionaries, lists, StringIOs,
> > arrays, and all the possible user-defined classes, which may be
> > mutated beyond obj[key] = value and/or obj.attr = value (some have
> > obj.method() which mutates the object). As such, offering a callback
> > mechanism similar to weak references is probably pretty close to
> > impossible with CPython.
>
> Let's limit ourselves to copy-on-write of objects which do not contain
> nonfrozen objects. Perhaps it's enough - the table, the graph, and
> strings are perfect examples of these. Implementation doesn't seem too
> complicated to me - whenever the object is about to change, and there
> is a connected frozen copy, you make a shallow copy of the object,
> point the frozen copy to it, release the reference to the frozen copy,
> and continue as usual. That's all.

What you have written here is fairly unintelligible, but thankfully you
clarify yourself...pity it still doesn't work, as I explain below.

[snip]

> The problem is that we have to keep the created frozen object as long
> as the original object stays alive. So we have to know if it has
> changed. This is where those callbacks come in. As long as what is done
> with them is correct, there should be no problems. They are used only
> to disengage the frozen copies from their original objects. The action
> they should trigger is simply this:
>
> def on_contained_object_change(self):
>     self._frozen_copy = None
>     while self._callbacks:
>         self._callbacks.pop()()
>
> What's also interesting is that this freezing mechanism can be
> provided automatically for user-created classes, since those are
> simply containers of other objects, which behave exactly like dicts
> for this matter.
>
> It allows everything in Python to be both mutable and hashable,
> without changing the O() complexity! Wow!
>
> Ok, I'm going to sleep now. If you find something wrong with this
> idea, please tell me.
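If I understand you correctly, for a flat container (one holding only
hashable values) the mechanism amounts to something like this sketch
(the names are mine; the callback machinery for containers of nonfrozen
objects is left out, and that is exactly where things fall apart, as I
show below):

class FrozenView(object):
    # Hashable read-only view; shares its data list with the source
    # object until the source is about to mutate.
    def __init__(self, data):
        self._data = data
    def __getitem__(self, i):
        return self._data[i]
    def __len__(self):
        return len(self._data)
    def __eq__(self, other):
        if isinstance(other, FrozenView):
            return self._data == other._data
        return NotImplemented
    def __hash__(self):
        return hash(tuple(self._data))

class CowList(object):
    def __init__(self, items=()):
        self._data = list(items)
        self._frozen_copy = None

    def freeze(self):
        # Hand out (and cache) a frozen view sharing our data.
        if self._frozen_copy is None:
            self._frozen_copy = FrozenView(self._data)
        return self._frozen_copy

    def _about_to_change(self):
        # Copy-on-write: give the frozen view its own shallow copy of
        # the data, then disengage it from this object.
        if self._frozen_copy is not None:
            self._frozen_copy._data = list(self._data)
            self._frozen_copy = None

    def __setitem__(self, i, value):
        self._about_to_change()
        self._data[i] = value

For a single flat object that much works; the trouble is keeping a
container's frozen copy in sync with the frozen copies of its (still
mutable) contents.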
Here is where you are wrong.

x = []
for i in xrange(k):        # k is some fixed size
    x.append(range(k))

We now have a simple list of lists: k lists, each of length k. Let's
freeze it.

y = frozen(x)

Ok, now we have a recursively frozen list of lists y, implemented
however you want, and you've amortized this ONE call to creation time.
We'll give y to someone who does whatever he wants to it, and we'll
continue on.

z = frozen(x)

Your claim is that, thanks to the cache, the above operation can be done
in constant time after you have already frozen x. This is wrong, but
we'll get to that. Let us mutate one of the contained lists, and see if
this can continue...

x[0][0] = 7

Oh hrm. This invalidates x[0]'s cached frozen object, which would
suggest that x's cached frozen object was also invalidated, even though
Python objects tend to know nothing about the objects which point to
them. Well, that's a rub. In order to /validate/ that an object's cached
frozen object is still valid, you must check that the contents of your
cached frozen object point to the cached frozen objects of your
contents. Or in code (for this two-level example)...

def frozen(x):
    if x.frozen_cache:
        for i, j in zip(x.contents, x.frozen_cache):
            if i.frozen_cache is not j:
                # a child was mutated; rebuild our frozen copy
                x.frozen_cache = None
                x.frozen_cache = frozen(x)
                return x.frozen_cache
        return x.frozen_cache      # every child still matches; reuse the cache
    x.frozen_cache = tuple(map(frozen, x.contents))
    return x.frozen_cache

Ouch: for the list-of-lists example with a total size O(k**2), you need
to spend O(k) time. We'll say that n == k**2, so really, for this
particular object of size O(n), you need to spend O(sqrt(n)) time
verifying. Not quite constant.

But wait...in order to verify that every cached frozen object is valid,
we should have been checking the contents of each x[i] to verify that
they were frozen too! Wow, that would take us O(n) time just to verify.
And we would need to do that EVERY TIME WE CALLED frozen(x), REGARDLESS
OF WHETHER x WAS MUTATED! Wait a second...isn't that the same as just
recursively calling freeze? Yes. Are we actually saving any time? No.

What has this idea resulted in? The incorrect belief that caching frozen
objects will reduce the computation necessary in freeze(x), and a
proposed new attribute on literally every object which points to an
immutable copy of itself, generally doubling the amount of memory used.

Like I said before, it's not so easy. In order to actually implement the
system, every object in an object hierarchy would need to know about its
parents, but that cannot work, because...

a = range(10)
b = [a]
c = [a]
bf = frozen(b)
cf = frozen(c)
a[0] = 10    # oops!

That last line is the killer. Every object would necessarily need to
know about all of the containers which point to it, and would
necessarily need to notify them all that their contents had changed. I
personally don't see Python doing that any time soon.

Hope you sleep/slept well,
 - Josiah