Question regarding structural sharing, records, and maps
Hi all, this is my first post. I'm currently building a tree data structure (quadtree to be exact) while doing some exploratory game server programming using Clojure. This tree will end up with a lot of entities stored in the nodes, whose positions determine which node they end up in. As game state progresses, updates will come from player and AI controlled entities which need to be reflected in the tree. Since player updates will come from any number of external clients, the tree will need some form of concurrency management. So far, I have settled on just using a single ref wrapping the root node of the tree with all update functions acting on that ref via dosync. My question is one of implementation, as I don't yet have a good feeling how different approaches to storing data will affect memory usage and performance in the long run and I'm trying to better my intuition for designing applications with Clojure. I am currently using maps to represent the nodes of the tree, with the hypothesis being that new versions of the tree will save memory and do less thrashing this way. I'm aware that Clojure's persistent data structures have some method of structural sharing between different states of the same identity, which is what leads me to that guess. I haven't found anything that conclusively says that structures defined with defrecord do not perform this sort of sharing, but from what I can tell it seems like they do not, which makes sense given that they provide accessors and expose a Java class. So, some trade-off questions would be: 1.) Does structural sharing play well with nested structures? With a tree whose nodes are represented by nested maps, if a leaf node is updated with new data, will structural sharing efficiently represent the new version of the tree as all the rest of the tree + modified node? 2.) If one were to use nested records instead, I assume producing a new version of the tree will result in a cloning operation over the nodes of the tree, less the nodes we are modifying ourselves. If updates were being performed very frequently, is there a possibility of adverse interaction with a GC? I know that GC implementations differ, but it seems like this could put a lot of pressure on a typical GC. 3.) Are there alternatives to a single ref around the root node of the tree that are worth exploring? I've considered wrapping each node in a ref and performing updates that way, or possibly taking the async route and dispatching cals using agents, but my experience there is nil. Any insight there? As an aside, I am not looking for specific answers as without numbers or hard data there really can't be any right now. I will end up measuring the performance of these approaches, but I do want to have some understanding of how the variables are related when trying to understand the empirical data. Thanks, Trent -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Question regarding structural sharing, records, and maps
On Thu, Jul 28, 2011 at 11:59 PM, Trenton Strong trenton.str...@gmail.com wrote: 1.) Does structural sharing play well with nested structures? With a tree whose nodes are represented by nested maps, if a leaf node is updated with new data, will structural sharing efficiently represent the new version of the tree as all the rest of the tree + modified node? Yes. In fact this should work with records, too -- all the native fields of the record need to be copied for each version, but most will typically be pointers and most of those will typically be to existing objects. Only the ones that were changed will point to new nested objects. So unless your records have huge numbers of predefined fields there shouldn't be much time or memory cost in copying them. 3.) Are there alternatives to a single ref around the root node of the tree that are worth exploring? Probably. Any two changes to the tree will cause one to retry in the single-ref case, unless you're sure all the changes will commute. (Changes to distinct fields where one change doesn't depend on what the other changes will commute, and changes to the same field that don't matter as to sequencing, such as inc and dec, will commute, but otherwise no.) You will probably want something structured more like a database, with tables that are maps and rows that are records or more complex, nested structures, such that most concurrent changes will not affect a common row. The tables can then be refs of maps of refs, with most changes operating on the row-refs. The table refs only need to change if rows are inserted or deleted; those could be bottlenecks, though if all objects have GUIDs as keys in these tables then row insertions and deletions will commute, leaving only the GUID generators for each table. Unfortunately, (let [new-id (commute table-x-guid-counter inc)]) sounds nice but might result in new-id taking on the same value in two concurrent transactions, so you'll need to use alter and GUID generation may be a bottleneck. But it may be much faster than your other, more complex transactions. Be sure to do it separately, e.g.: (let [new-id (dosync (alter table-x-guid-counter inc))] (dosync (commute table-x assoc new-id some-thingy))) You can even use atoms for the guid counters instead of refs, and then the ! in swap! will remind you to get the ID before entering a transaction. Do as much of what's needed to generate some-thingy before the transaction, too -- though if it depends on other refs' values, and those values should be current rather than it being acceptable for them to be values that were there recently but might have changed, then some of the work will need to be done in the transaction. I've considered wrapping each node in a ref and performing updates that way, or possibly taking the async route and dispatching cals using agents, but my experience there is nil. Any insight there? I'd avoid having refs nesting more than two deep. Complex arrangements of cross-references among nodes can create a giant headache in combination with refs and transactions, so using a database approach with table maps, ID token keys, and cross-references to ID tokens rather than objects is often sensible. In particular, complex objects with nested refs effectively have mutable state inside them, which is usually not desired in Clojure. (One can argue, though, that the ref object is not different in principle from the ID token and removes a layer of indirection, and possibly obviates the need to generate IDs from counters that act as bottlenecks -- though, really, the Java new operator's allocation of memory now takes the role of ID generator. I'm given to understand modern VMs do some clever things to make new able to be fairly concurrent, though, such as giving each thread its own subset of the eden generation to allocate in.) -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Question regarding structural sharing, records, and maps
On Jul 28, 2011, at 10:59 PM, Trenton Strong wrote: So far, I have settled on just using a single ref wrapping the root node of the tree with all update functions acting on that ref via dosync. An atom would be more appropriate, unless you're coordinating updates to the tree with updates to some other ref. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Question regarding structural sharing, records, and maps
On Jul 29, 8:58 am, Michael Gardner gardne...@gmail.com wrote: An atom would be more appropriate, unless you're coordinating updates to the tree with updates to some other ref. Ah, for some reason that didn't occur to me, thanks. Since the tree in this case acts as a supporting index on spatial data, it pretty much defers to the outcome of any updates to the original entities. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: Question regarding structural sharing, records, and maps
On Jul 29, 4:38 am, Ken Wesson kwess...@gmail.com wrote: Yes. In fact this should work with records, too -- all the native fields of the record need to be copied for each version, but most will typically be pointers and most of those will typically be to existing objects. Only the ones that were changed will point to new nested objects. So unless your records have huge numbers of predefined fields there shouldn't be much time or memory cost in copying them. Thanks for clearing that up. I wasn't exactly sure how the memory model of records worked. The semantics that records provide seem like a good fit for applications like this. Probably. Any two changes to the tree will cause one to retry in the single-ref case, unless you're sure all the changes will commute. (Changes to distinct fields where one change doesn't depend on what the other changes will commute, and changes to the same field that don't matter as to sequencing, such as inc and dec, will commute, but otherwise no.) Yeah, tree updates are pretty much strictly non-commutative. The retry overhead will probably grow quite fast with the number of concurrent updates. You will probably want something structured more like a database, with tables that are maps and rows that are records or more complex, nested structures, such that most concurrent changes will not affect a common row. The tables can then be refs of maps of refs, with most changes operating on the row-refs. The table refs only need to change if rows are inserted or deleted; those could be bottlenecks, though if all objects have GUIDs as keys in these tables then row insertions and deletions will commute, leaving only the GUID generators for each table. I think I understand the structure here, but just to be clear, you're proposing flattening of the tree structure from something like this: (ref { :contents (some stuff), :child-node1 { :contents (other things) :child-node1 ... }, :child-node2 { :contents (thingy1 thingy2) ... } ... }) into something like this? (ref { :some-guid (ref { could be a map, record, or what have you :contents (some stuff) :child-node1 :another-guid :child-node2 :and-another-guid }) :another-guid (ref { :contents (other things) ... }) :and-another-guid (ref { :contents (thingy1 thingy2) ... }) } Interesting. I feel like I should have thought of this since I've done something similar in more object oriented code, but riddled with lots of fine grained locking. Here if updating one row references another the transaction is going to handle the snapshotting. I begin to see... Unfortunately, (let [new-id (commute table-x-guid-counter inc)]) sounds nice but might result in new-id taking on the same value in two concurrent transactions, so you'll need to use alter and GUID generation may be a bottleneck. But it may be much faster than your other, more complex transactions. Be sure to do it separately, e.g.: (let [new-id (dosync (alter table-x-guid-counter inc))] (dosync (commute table-x assoc new-id some-thingy))) You can even use atoms for the guid counters instead of refs, and then the ! in swap! will remind you to get the ID before entering a transaction. Do as much of what's needed to generate some-thingy before the transaction, too -- though if it depends on other refs' values, and those values should be current rather than it being acceptable for them to be values that were there recently but might have changed, then some of the work will need to be done in the transaction. I've considered wrapping each node in a ref and performing updates that way, or possibly taking the async route and dispatching cals using agents, but my experience there is nil. Any insight there? I'd avoid having refs nesting more than two deep. Complex arrangements of cross-references among nodes can create a giant headache in combination with refs and transactions, so using a database approach with table maps, ID token keys, and cross-references to ID tokens rather than objects is often sensible. In particular, complex objects with nested refs effectively have mutable state inside them, which is usually not desired in Clojure. (One can argue, though, that the ref object is not different in principle from the ID token and removes a layer of indirection, and possibly obviates the need to generate IDs from counters that act as bottlenecks -- though, really, the Java new operator's allocation of memory now takes the role of ID generator. I'm given to understand modern VMs do some clever things to make new able to be fairly concurrent, though, such as giving each thread its own subset of the eden generation to allocate in.) Makes sense. I need to look into generating GUIDs quickly in a concurrent manner for other parts of the application anyways. I'm sure if this becomes the bottleneck there are workarounds like pooling that can
Re: Question regarding structural sharing, records, and maps
On Fri, Jul 29, 2011 at 4:33 PM, Trenton Strong trenton.str...@gmail.com wrote: Thanks for clearing that up. ... Thanks for taking the time to craft such a thoughtful reply, it is really helpful. You're welcome. -- Protege: What is this seething mass of parentheses?! Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age. -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en