Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
Hi all, this is my first post.

I'm currently building a tree data structure (quadtree to be exact)
while doing some exploratory game server programming using Clojure.
This tree will end up with a lot of entities stored in the nodes,
whose positions determine which node they end up in.  As game state
progresses, updates will come from player and AI controlled entities
which need to be reflected in the tree.

Since player updates will come from any number of external clients,
the tree will need some form of concurrency management.  So far, I
have settled on just using a single ref wrapping the root node of the
tree with all update functions acting on that ref via dosync.

My question is one of implementation, as I don't yet have a good
feeling how different approaches to storing data will affect memory
usage and performance in the long run and I'm trying to better my
intuition for designing applications with Clojure.  I am currently
using maps to represent the nodes of the tree, with the hypothesis
being that new versions of the tree will save memory and do less
thrashing this way.  I'm aware that Clojure's persistent data
structures have some method of structural sharing between different
states of the same identity, which is what leads me to that guess.  I
haven't found anything that conclusively says that structures defined
with defrecord do not perform this sort of sharing, but from what I
can tell it seems like they do not, which makes sense given that they
provide accessors and expose a Java class.

So, some trade-off questions would be:

1.) Does structural sharing play well with nested structures? With a
tree whose nodes are represented by nested maps, if a leaf node is
updated with new data, will structural sharing efficiently represent
the new version of the tree as all the rest of the tree + modified
node?

2.) If one were to use nested records instead, I assume producing a
new version of the tree will result in a cloning operation over the
nodes of the tree, less the nodes we are modifying ourselves.  If
updates were being performed very frequently, is there a possibility
of adverse interaction with a GC?  I know that GC implementations
differ, but it seems like this could put a lot of pressure on a
typical GC.

3.) Are there alternatives to a single ref around the root node of the
tree that are worth exploring?  I've considered wrapping each node in
a ref and performing updates that way, or possibly taking the async
route and dispatching cals using agents, but my experience there is
nil.  Any insight there?

As an aside, I am not looking for specific answers as without numbers
or hard data there really can't be any right now.  I will end up
measuring the performance of these approaches, but I do want to have
some understanding of how the variables are related when trying to
understand the empirical data.

Thanks,

Trent


-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Ken Wesson
On Thu, Jul 28, 2011 at 11:59 PM, Trenton Strong
trenton.str...@gmail.com wrote:
 1.) Does structural sharing play well with nested structures? With a
 tree whose nodes are represented by nested maps, if a leaf node is
 updated with new data, will structural sharing efficiently represent
 the new version of the tree as all the rest of the tree + modified
 node?

Yes. In fact this should work with records, too -- all the native
fields of the record need to be copied for each version, but most will
typically be pointers and most of those will typically be to existing
objects. Only the ones that were changed will point to new nested
objects. So unless your records have huge numbers of predefined fields
there shouldn't be much time or memory cost in copying them.

 3.) Are there alternatives to a single ref around the root node of the
 tree that are worth exploring?

Probably. Any two changes to the tree will cause one to retry in the
single-ref case, unless you're sure all the changes will commute.
(Changes to distinct fields where one change doesn't depend on what
the other changes will commute, and changes to the same field that
don't matter as to sequencing, such as inc and dec, will commute, but
otherwise no.)

You will probably want something structured more like a database, with
tables that are maps and rows that are records or more complex,
nested structures, such that most concurrent changes will not affect a
common row. The tables can then be refs of maps of refs, with most
changes operating on the row-refs. The table refs only need to change
if rows are inserted or deleted; those could be bottlenecks, though if
all objects have GUIDs as keys in these tables then row insertions and
deletions will commute, leaving only the GUID generators for each
table.

Unfortunately, (let [new-id (commute table-x-guid-counter inc)])
sounds nice but might result in new-id taking on the same value in two
concurrent transactions, so you'll need to use alter and GUID
generation may be a bottleneck. But it may be much faster than your
other, more complex transactions. Be sure to do it separately, e.g.:

(let [new-id (dosync (alter table-x-guid-counter inc))]
  (dosync
(commute table-x assoc new-id some-thingy)))

You can even use atoms for the guid counters instead of refs, and then
the ! in swap! will remind you to get the ID before entering a
transaction. Do as much of what's needed to generate some-thingy
before the transaction, too -- though if it depends on other refs'
values, and those values should be current rather than it being
acceptable for them to be values that were there recently but might
have changed, then some of the work will need to be done in the
transaction.

 I've considered wrapping each node in
 a ref and performing updates that way, or possibly taking the async
 route and dispatching cals using agents, but my experience there is
 nil.  Any insight there?

I'd avoid having refs nesting more than two deep. Complex arrangements
of cross-references among nodes can create a giant headache in
combination with refs and transactions, so using a database approach
with table maps, ID token keys, and cross-references to ID tokens
rather than objects is often sensible. In particular, complex objects
with nested refs effectively have mutable state inside them, which is
usually not desired in Clojure. (One can argue, though, that the ref
object is not different in principle from the ID token and removes a
layer of indirection, and possibly obviates the need to generate IDs
from counters that act as bottlenecks -- though, really, the Java new
operator's allocation of memory now takes the role of ID generator.
I'm given to understand modern VMs do some clever things to make new
able to be fairly concurrent, though, such as giving each thread its
own subset of the eden generation to allocate in.)

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Michael Gardner
On Jul 28, 2011, at 10:59 PM, Trenton Strong wrote:

 So far, I
 have settled on just using a single ref wrapping the root node of the
 tree with all update functions acting on that ref via dosync.

An atom would be more appropriate, unless you're coordinating updates to the 
tree with updates to some other ref.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
On Jul 29, 8:58 am, Michael Gardner gardne...@gmail.com wrote:

 An atom would be more appropriate, unless you're coordinating updates to the 
 tree with updates to some other ref.

Ah, for some reason that didn't occur to me, thanks.  Since the tree
in this case acts as a supporting index on spatial data, it pretty
much defers to the outcome of any updates to the original entities.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
On Jul 29, 4:38 am, Ken Wesson kwess...@gmail.com wrote:
 Yes. In fact this should work with records, too -- all the native
 fields of the record need to be copied for each version, but most will
 typically be pointers and most of those will typically be to existing
 objects. Only the ones that were changed will point to new nested
 objects. So unless your records have huge numbers of predefined fields
 there shouldn't be much time or memory cost in copying them.

Thanks for clearing that up.  I wasn't exactly sure how the memory
model of records worked.  The semantics that records provide seem like
a good fit for applications like this.

 Probably. Any two changes to the tree will cause one to retry in the
 single-ref case, unless you're sure all the changes will commute.
 (Changes to distinct fields where one change doesn't depend on what
 the other changes will commute, and changes to the same field that
 don't matter as to sequencing, such as inc and dec, will commute, but
 otherwise no.)

Yeah, tree updates are pretty much strictly non-commutative.  The
retry overhead will probably grow quite fast with the number of
concurrent updates.


 You will probably want something structured more like a database, with
 tables that are maps and rows that are records or more complex,
 nested structures, such that most concurrent changes will not affect a
 common row. The tables can then be refs of maps of refs, with most
 changes operating on the row-refs. The table refs only need to change
 if rows are inserted or deleted; those could be bottlenecks, though if
 all objects have GUIDs as keys in these tables then row insertions and
 deletions will commute, leaving only the GUID generators for each
 table.

I think I understand the structure here, but just to be clear, you're
proposing flattening
of the tree structure from something like this:

(ref
{
  :contents (some stuff),

  :child-node1 {
:contents (other things)
:child-node1 ...
  },

  :child-node2 {
:contents (thingy1 thingy2)
...
  }
  ...
})

into something like this?

(ref
{
  :some-guid  (ref { could be a map, record, or what have
you
:contents (some stuff)
:child-node1 :another-guid
:child-node2 :and-another-guid
  })

  :another-guid (ref {
:contents (other things)
...
  })

  :and-another-guid (ref {
:contents (thingy1 thingy2)
...
  })
}

Interesting.  I feel like I should have thought of this since I've
done something similar in more object oriented code, but riddled with
lots of fine grained locking.  Here if updating one row references
another
the transaction is going to handle the snapshotting.  I begin to
see...


 Unfortunately, (let [new-id (commute table-x-guid-counter inc)])
 sounds nice but might result in new-id taking on the same value in two
 concurrent transactions, so you'll need to use alter and GUID
 generation may be a bottleneck. But it may be much faster than your
 other, more complex transactions. Be sure to do it separately, e.g.:

 (let [new-id (dosync (alter table-x-guid-counter inc))]
   (dosync
     (commute table-x assoc new-id some-thingy)))

 You can even use atoms for the guid counters instead of refs, and then
 the ! in swap! will remind you to get the ID before entering a
 transaction. Do as much of what's needed to generate some-thingy
 before the transaction, too -- though if it depends on other refs'
 values, and those values should be current rather than it being
 acceptable for them to be values that were there recently but might
 have changed, then some of the work will need to be done in the
 transaction.

  I've considered wrapping each node in
  a ref and performing updates that way, or possibly taking the async
  route and dispatching cals using agents, but my experience there is
  nil.  Any insight there?

 I'd avoid having refs nesting more than two deep. Complex arrangements
 of cross-references among nodes can create a giant headache in
 combination with refs and transactions, so using a database approach
 with table maps, ID token keys, and cross-references to ID tokens
 rather than objects is often sensible. In particular, complex objects
 with nested refs effectively have mutable state inside them, which is
 usually not desired in Clojure. (One can argue, though, that the ref
 object is not different in principle from the ID token and removes a
 layer of indirection, and possibly obviates the need to generate IDs
 from counters that act as bottlenecks -- though, really, the Java new
 operator's allocation of memory now takes the role of ID generator.
 I'm given to understand modern VMs do some clever things to make new
 able to be fairly concurrent, though, such as giving each thread its
 own subset of the eden generation to allocate in.)

Makes sense.  I need to look into generating GUIDs quickly in a
concurrent manner for other parts
of the application anyways.  I'm sure if this becomes the bottleneck
there are workarounds like pooling
that can 

Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Ken Wesson
On Fri, Jul 29, 2011 at 4:33 PM, Trenton Strong
trenton.str...@gmail.com wrote:
 Thanks for clearing that up. ...

 Thanks for taking the time to craft such a thoughtful reply, it is
 really helpful.

You're welcome.

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en