Re: PDX best practices

Darrel Schneider Wed, 27 Jan 2016 10:54:02 -0800

The following is from:
http://geode-docs.cfapps.io/docs/developing/data_serialization/data_serialization_options.html

> Geode serialization (either PDX Serialization or Data Serialization) does
> not support circular object graphs whereas Java serialization does. In
> Geode serialization, if the same object is referenced more than once in
> an object graph, the object is serialized for each reference, and
> deserialization *produces multiple copies* of the object. By contrast in
> this situation, Java serialization serializes the object once and when
> deserializing the object, it produces one instance of the object with
> multiple references.

So even if your graphs do not have cycles, you may get duplication if nodes
in the graph are referenced more than once.
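
To make the difference concrete, here is a minimal sketch that uses only the
JDK. The Mesh and SceneNode classes are hypothetical, and
perReferenceRoundTrip is a hand-written model of the per-reference behavior
described in the quoted docs, not a call into Geode itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical scene-graph types, used only for illustration.
class Mesh implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Mesh(String name) { this.name = name; }
}

class SceneNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final Mesh left;
    final Mesh right;
    SceneNode(Mesh left, Mesh right) { this.left = left; this.right = right; }
}

class SharedReferenceDemo {
    // Round trip through standard Java serialization, which tracks object identity.
    static SceneNode javaRoundTrip(SceneNode node) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SceneNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumed model of per-reference serialization: every reference is copied
    // on its own, so a shared child comes back as two separate objects.
    static SceneNode perReferenceRoundTrip(SceneNode node) {
        return new SceneNode(new Mesh(node.left.name), new Mesh(node.right.name));
    }

    public static void main(String[] args) {
        Mesh shared = new Mesh("wheel");
        SceneNode original = new SceneNode(shared, shared);

        SceneNode viaJava = javaRoundTrip(original);
        SceneNode viaCopy = perReferenceRoundTrip(original);

        System.out.println("Java serialization keeps one instance: "
            + (viaJava.left == viaJava.right));
        System.out.println("Per-reference copy keeps one instance: "
            + (viaCopy.left == viaCopy.right));
    }
}
```

For a scene graph where many nodes point at the same mesh or material, the
second case means both the serialized size and the deserialized heap usage
grow with the number of references, not the number of distinct objects.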

Keep in mind that each value stored in a Geode region is serialized as a
single BLOB and transmitted over the network.

If your large arrays or graphs are something you will be modifying, and you
only need to change a relatively small part of them, look into the delta
propagation feature:
http://geode.docs.pivotal.io/docs/developing/delta_propagation/chapter_overview.html
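
As a rough sketch of the idea behind delta propagation, the example below
sends only the changed elements of an array instead of the whole value. The
DeltaLike interface is a local stand-in that I have shaped after my
understanding of Geode's Delta interface (hasDelta, toDelta, fromDelta); it is
an assumption for illustration, so check the linked docs for the real
interface and its exception types. Slice and DeltaDemo are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for a delta contract (assumed shape, not the Geode type).
interface DeltaLike {
    boolean hasDelta();
    void toDelta(DataOutput out) throws IOException;
    void fromDelta(DataInput in) throws IOException;
}

// Hypothetical value type: one slice of a larger array.
class Slice implements DeltaLike {
    private final float[] values;
    private final List<Integer> dirty = new ArrayList<>();

    Slice(int size) { values = new float[size]; }

    void set(int index, float value) {
        values[index] = value;
        dirty.add(index); // remember what changed since the last delta
    }

    float get(int index) { return values[index]; }

    @Override
    public boolean hasDelta() { return !dirty.isEmpty(); }

    // Write only (index, value) pairs for the changed elements.
    @Override
    public void toDelta(DataOutput out) throws IOException {
        out.writeInt(dirty.size());
        for (int index : dirty) {
            out.writeInt(index);
            out.writeFloat(values[index]);
        }
        dirty.clear();
    }

    // Apply the (index, value) pairs to the local copy.
    @Override
    public void fromDelta(DataInput in) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            int index = in.readInt();
            values[index] = in.readFloat();
        }
    }
}

class DeltaDemo {
    // Ships pending changes from source to replica; returns the bytes sent.
    static int propagate(Slice source, Slice replica) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.toDelta(new DataOutputStream(bytes));
            replica.fromDelta(
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Slice source = new Slice(1_000_000);
        Slice replica = new Slice(1_000_000);
        source.set(42, 3.5f);
        int sent = propagate(source, replica);
        System.out.println("bytes sent: " + sent + ", replica[42] = " + replica.get(42));
    }
}
```

In this sketch a single changed element costs 12 bytes on the wire (a count,
an index, and a float), compared with about 4 MB of raw float data for the
full one-million-element slice.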

For very large objects, you will probably want to keep all access to the
data on the server instead of transmitting the large object back to the
client. You could do this with functions that access those large arrays and
graphs on the server, compute some result there, and send only that result
back to your client. In this case you would want to keep the data
deserialized on the server so you can access it quickly without needing to
deserialize it each time.
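
The pattern of moving the computation to the data can be sketched without any
Geode classes. Everything below is a stand-in: ServerSideStore plays the role
of a server region holding deserialized values, and execute plays the role of
a server-side function call. In a real deployment you would use Geode's
function execution service instead; the names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Stand-in for a server that keeps large values in deserialized form.
class ServerSideStore {
    private final Map<String, double[]> region = new HashMap<>();

    void put(String key, double[] value) { region.put(key, value); }

    // Runs the computation next to the data and returns only the small result,
    // so the large array itself never has to travel to the caller.
    double execute(String key, ToDoubleFunction<double[]> computation) {
        double[] value = region.get(key);
        if (value == null) {
            throw new IllegalArgumentException("no value for key " + key);
        }
        return computation.applyAsDouble(value);
    }
}

class ServerSideDemo {
    // Example computation: the maximum element of a slice.
    static double max(double[] values) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    public static void main(String[] args) {
        ServerSideStore server = new ServerSideStore();
        server.put("slice-0", new double[] {1.0, 7.5, 3.25});

        // Only one double crosses the "network" boundary in this model.
        double result = server.execute("slice-0", ServerSideDemo::max);
        System.out.println("max of slice-0 = " + result);
    }
}
```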

One of the features of PDX is that it allows you to access the fields of an
object without needing that class on the server and without needing to
deserialize that data. But I don't think you need this feature of PDX. Your
data will initially be stored in serialized form in the region, but once you
access a region value on the server (for example, from a function or a cache
listener), it will be kept in deserialized form on the server from then on.

On Tue, Jan 26, 2016 at 6:03 PM, Joseph Winston <[email protected]>
wrote:

> I am looking for a document or hints on best practices when using PDX.
> The two specific use cases that I’m interested in understanding are:
> 1. Large arrays: Currently these data types are kept in shared memory
> segments that are organized using the most common access pattern (for
> example: z fastest, then y, then x). When using PDX, should a single large
> array that normally is on the order of 100s of GB be broken into smaller
> objects, say z slices, to help with loading the data? Are there better ways
> to use PDX for these 3D and higher-dimension arrays?
> 2. Graphs: One common data type is a directed acyclic graph, specifically
> a scene graph, that holds graphical representations of business objects.
> What is the best way to use PDX for large graphs?
>
> Thanks
>
>
