Re: Page De-Serialization and memory

richard emberson Tue, 19 Jul 2011 16:44:30 -0700

Martin,

The reason I was interested in Wicket memory usage is that using
Scala traits, rather than either of the two possible Java approaches,
might be compelling when it comes to memory usage.


First, the two Java approaches: proxy/wrapper object or bundle everything
into the base class.

The proxy/wrapper approach lets one have a single implementation
that can be shared by multiple classes. The downside is that the
proxy/wrapper object requires an additional reference in the
class using it and hence additional memory usage.

The "bundle everything into the base class" approach violates
the OOP 101 dictum about having small objects focused on their
own particular behavior, thus avoiding bloat.
(Not executable Java/Scala code below.)

interface Parent {
  getParent
  setParent
}
// Potentially shared implementation
class ParentProxy implements Parent {
  parent
  getParent = parent
  setParent(parent) = this.parent = parent
}

// Issue: Has additional instance variable: parentProxy
class CompWithProxy with Parent {
  parentProxy = new ParentProxy
  getParent = parentProxy.getParent
  setParent(parent) = parentProxy.setParent(parent)
}

// Issue: Does not share implementation
class CompAllInOne with Parent {
  parent
  getParent = parent
  setParent(parent) = this.parent = parent
}
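
For anyone who wants something that compiles, the two Java approaches above can be sketched roughly as follows. This is only an illustration: the class names mirror the pseudocode, and `Object` is an assumed stand-in for whatever the real parent type would be.

```java
public class ParentApproaches {
    interface Parent {
        Object getParent();
        void setParent(Object parent);
    }

    // Potentially shared implementation.
    static class ParentProxy implements Parent {
        private Object parent;
        public Object getParent() { return parent; }
        public void setParent(Object parent) { this.parent = parent; }
    }

    // Proxy/wrapper: reuses ParentProxy, but every instance carries the
    // extra parentProxy reference plus the proxy object itself.
    static class CompWithProxy implements Parent {
        private final ParentProxy parentProxy = new ParentProxy();
        public Object getParent() { return parentProxy.getParent(); }
        public void setParent(Object parent) { parentProxy.setParent(parent); }
    }

    // All-in-one: a single field and no extra object, but the code is
    // repeated in every class that needs the behavior.
    static class CompAllInOne implements Parent {
        private Object parent;
        public Object getParent() { return parent; }
        public void setParent(Object parent) { this.parent = parent; }
    }

    public static void main(String[] args) {
        Parent withProxy = new CompWithProxy();
        Parent allInOne = new CompAllInOne();
        withProxy.setParent("page");
        allInOne.setParent("page");
        System.out.println(withProxy.getParent() + " " + allInOne.getParent());
    }
}
```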

Wicket has taken the "bundle everything into the base class" approach
in order to lessen memory usage, which is certainly a reasonable Java
approach to the problem.

With Scala one can do the following:

// Shared implementation
trait ParentTrait {
  parent
  getParent = parent
  setParent(parent) = this.parent = parent
}

// Uses implementation
class Comp with ParentTrait

The implementation, ParentTrait, can be used by any
number of classes.
In addition, one can add to a base class any number of
such implementation traits sharing multiple implementations
across multiple classes.

So, can using such an approach result in smaller (less in-memory)
objects in Scala than in Java?

The ParentTrait does not really save very much. I assume
that it's only the Page class and sub-classes that do not have
parent components in Wicket, so the savings per Page component
tree are very small indeed. But there are other behaviors that
can be converted to traits, for example, Models.
Many of the instance variables in the Java Models, which
take memory, can be converted to method return values, which only
add to the size of the class, not to every instance of the class.
Also, with Model traits that use Component self-types, one can
do away with IComponentAssignedModel wrapping and such.

So, how to demonstrate such memory differences? I created
stripped down versions of the Component and Label classes in
both Java and Scala (only ids and Models).
I then created different Model usage scenarios,
with Model objects in Java and traits in Scala, and, finally,
serialized (Java Serialization) the result, comparing the size
of the resulting array of bytes. There are two runs, one with
all Strings being the empty string and the next where all
strings are 10-character strings:

The Java versions (empty string):
Label.Empty               99
Label.ReadOnly           196
Label.ReadWrite          159
Label.Resource           333
Label.Property           223
Label.ComponentProperty  351
Label.CompoundProperty   208

The Scala versions (empty string):
Label.Empty              79
Label.ReadOnly           131
Label.ReadWrite          150
Label.Resource           164
Label.Property           207
Label.ComponentProperty  134
Label.CompoundProperty   184


The Java versions (10-character strings):
Label.Empty              109
Label.ReadOnly           214
Label.ReadWrite          177
Label.Resource           359
Label.Property           241
Label.ComponentProperty  369
Label.CompoundProperty   218


The Scala versions (10-character strings):
Label.Empty               89
Label.ReadOnly           149
Label.ReadWrite          168
Label.Resource           190
Label.Property           225
Label.ComponentProperty  152
Label.CompoundProperty   194

[Note that the Java Label.Empty result is misleading since in Wicket
there is no memory overhead when a Component has no Model.]

While this does indicate that using Model traits with Scala
will result in less memory usage than the comparable Java
approach, Java Serialization adds a whole lot of extra stuff
to the array of bytes that masks the true change in
in-memory usage. With Java Serialization, the class descriptors
of the serialized objects are also added to the byte array, and
it is this that takes up most of the array of bytes.
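
A rough way to see how much of a serialized array is descriptor rather than data is to compare one instance against two instances written to the same stream. Within a single stream, Java Serialization writes the class descriptor once and refers back to it afterwards, so the difference approximates the per-instance cost. The `Tiny` class and the `sizeOf` helper below are hypothetical stand-ins, not the classes used for the measurements above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedSize {
    // Hypothetical stand-in for a stripped-down component; not a Wicket class.
    static class Tiny implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        Tiny(int id) { this.id = id; }
    }

    // Serializes all given objects into ONE stream and returns the byte count.
    static int sizeOf(Object... objects) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (Object o : objects) {
                    out.writeObject(o);
                }
            }
            return bytes.size();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int one = sizeOf(new Tiny(1));
        int two = sizeOf(new Tiny(1), new Tiny(2));
        System.out.println("one instance:       " + one + " bytes");
        System.out.println("two instances:      " + two + " bytes");
        System.out.println("cost of the second: " + (two - one) + " bytes");
    }
}
```

When every object is serialized into its own byte array, as in the measurements above, each array pays for the stream header and the descriptor again.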

Thinking about it, I realized that Java Serialization is rather
a blunt tool when it comes to the requirement of (Scala) Wicket
Page serialization. Java Serialization creates a byte array
that is rather self-contained/self-descriptive in its content.
This is not required for (Scala) Wicket which has very
specific requirements and use-cases.

But first, before I describe what I did, here are the results:
the byte array sizes for the serializer I created just to
show that one can do a lot better than Java Serialization.

The Scala versions (empty string):
Label.Empty                6
Label.ReadOnly             8
Label.ReadWrite            8
Label.Resource            10
Label.Property            13
Label.ComponentProperty    8
Label.CompoundProperty    11

The Scala versions (10-character strings):
Label.Empty                8
Label.ReadOnly            12
Label.ReadWrite           12
Label.Resource            16
Label.Property            17
Label.ComponentProperty   12
Label.CompoundProperty    13

Yes, better by more than a factor of 10. I assume factors
of 10 are compelling.

So, back to the requirements. I spent a couple of days creating
the serializer (currently 3.8Kloc) that focused on what I thought
would be needed by (Scala) Wicket.
The same application using (Scala) Wicket is running on either a
single machine or a group of machines.
The serialized Page system can have:

  In-memory repository
    (single-machine, testing);
  In-memory cache with local disk backstore
    (single-machine, production, re-start) and
  In-memory cache with database backstore used by a number of machines
    (multi-machine, production, fail-over, session-migration, re-start)

  Strings and associated id are cached/backstored where it is the id
    that is used in the serialized array.
  Classes and associated id are cached/backstored where it is the id
    that is used in the serialized array.
  Optimizations allow, for example, the Long value 1L to be serialized
    as 1 byte or (un-optimized) as 9 bytes.
  When using a backstore, a header is prepended to each byte array
    that includes the serializer magic number (2 bytes), serializer
    protocol version (2 bytes?) and application information (version, etc.)
    (2 bytes?).
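
A minimal sketch of how the points in this list could fit together, under stated assumptions: the one-byte/nine-byte split for longs uses a tag byte, which is only one possible scheme, and the magic number `0xCAFE`, the class name `CompactWriter` and all method names are made up for illustration. It is not the serializer described in this message.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

public class CompactWriter {
    private static final int LONG_TAG = 0xFF;   // marks a full 8-byte long

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // String -> id table. In the design described above this table would be
    // cached/backstored and shared, not written into each byte array.
    private final Map<String, Integer> stringIds = new HashMap<>();

    // Header: magic number, protocol version, application info (2 bytes each).
    public void writeHeader(int magic, int protocol, int appInfo) {
        writeShort(magic);
        writeShort(protocol);
        writeShort(appInfo);
    }

    private void writeShort(int value) {
        bytes.write(value >>> 8);   // high byte
        bytes.write(value);         // low byte
    }

    // Values 0..254 take one byte; everything else takes tag + 8 bytes.
    public void writeLong(long value) {
        if (value >= 0 && value < LONG_TAG) {
            bytes.write((int) value);
        } else {
            bytes.write(LONG_TAG);
            for (int shift = 56; shift >= 0; shift -= 8) {
                bytes.write((int) (value >>> shift));
            }
        }
    }

    // Writes only the id of the string, assigning a new id on first sight.
    public void writeString(String s) {
        Integer id = stringIds.get(s);
        if (id == null) {
            id = stringIds.size();
            stringIds.put(s, id);
        }
        writeLong(id.longValue());
    }

    public int size() { return bytes.size(); }

    public static void main(String[] args) {
        CompactWriter w = new CompactWriter();
        w.writeHeader(0xCAFE, 1, 1);
        System.out.println(w.size());   // 6
        w.writeLong(1L);
        System.out.println(w.size());   // 7
        w.writeLong(1L << 40);
        System.out.println(w.size());   // 16
        w.writeString("wicket:id");
        w.writeString("wicket:id");
        System.out.println(w.size());   // 18
    }
}
```

The same id-table idea would apply to classes: the byte array carries a small id and the class name lives once in the cache or backstore.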

In addition, there are two cases where one might be serializing
the same object more than once.

The first case is dealt with by most serializers: an object
appears more than once in the tree of objects being serialized.
Java Serialization deals with this. One must keep track of
the identity of all objects being serialized. Then, if an object
appears for serialization for a second (third, etc.) time, some
sort of reference object and tag is serialized rather than the
object. De-serialization is ... obvious.
I do not know, but I assume that this does not arise in Wicket: the
same Component appearing more than once in the same Page tree of
components. If it does happen, please let me know. If it should
not happen but could, is there some visitor well-formedness traversal
that checks for duplicate object appearances in a given tree?
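
The kind of identity-based check asked about here could look roughly like the sketch below. The `Node` class is a hypothetical minimal tree node, not the Wicket Component API, and whether Wicket already has such a traversal is left open. The point is only that the tracking must use object identity rather than `equals`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {
    // Hypothetical minimal component node; not a Wicket class.
    static class Node {
        final String id;
        final List<Node> children = new ArrayList<>();
        Node(String id) { this.id = id; }
        Node add(Node child) { children.add(child); return this; }
    }

    // Returns the first node reached a second time (by identity), or null
    // if every node appears exactly once in the tree.
    static Node firstDuplicate(Node root) {
        Set<Node> seen =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        return visit(root, seen);
    }

    private static Node visit(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return node;            // already visited: duplicate appearance
        }
        for (Node child : node.children) {
            Node duplicate = visit(child, seen);
            if (duplicate != null) {
                return duplicate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node shared = new Node("label");
        Node page = new Node("page")
            .add(new Node("left").add(shared))
            .add(new Node("right").add(shared));
        Node duplicate = firstDuplicate(page);
        System.out.println(duplicate == null ? "well-formed" : "duplicate: " + duplicate.id);
    }
}
```

A serializer would use the same identity set, but instead of reporting the duplicate it would write a back-reference to the earlier occurrence.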

The second case is one that probably does (or could) occur with
Wicket and that I've never heard of a serializer dealing with, namely,
the same object appears in more than one Page tree, so that knowledge
of what is being serialized is shared across serializations.
For this to work, the
Component (which could be a tree of Components) has to be
immutable, like a Label with a read-only value or read-only Model
(and the Model object is never changed), etc. Here, there can be
a saving if the shared object is serialized in its own backstore
and only its identifier appears in the byte arrays of each Page.
If there were an Immutable interface which could tag immutable
objects, it would be much easier for the serializer to identify
them (well, not just easier, but, rather, plain old possible
versus impossible). Just a last minute thought.
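
To make the idea concrete, here is a small sketch of an `Immutable` marker interface combined with a store that is shared across Page serializations. Everything in it is hypothetical: the `SharedStore` class stands in for the separate backstore, and the string tokens stand in for real serialized bytes.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SharedImmutables {
    // Hypothetical marker interface tagging objects that never change.
    interface Immutable {}

    static final class StaticLabel implements Immutable {
        final String text;
        StaticLabel(String text) { this.text = text; }
    }

    // Stand-in for the separate backstore: each immutable object is stored
    // once and referred to by its id from every Page that uses it.
    static final class SharedStore {
        private final Map<Object, Integer> ids = new IdentityHashMap<>();
        private final List<Object> objects = new ArrayList<>();

        int idOf(Immutable o) {
            Integer id = ids.get(o);
            if (id == null) {
                id = objects.size();
                objects.add(o);
                ids.put(o, id);
            }
            return id;
        }

        int size() { return objects.size(); }
    }

    // "Serializes" a page as tokens: a reference for immutable objects,
    // the inline value for everything else.
    static List<String> write(List<Object> page, SharedStore store) {
        List<String> out = new ArrayList<>();
        for (Object o : page) {
            if (o instanceof Immutable) {
                out.add("ref:" + store.idOf((Immutable) o));
            } else {
                out.add("inline:" + o);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        StaticLabel footer = new StaticLabel("Copyright");
        List<Object> pageOne = new ArrayList<>();
        pageOne.add(footer);
        pageOne.add("user input A");
        List<Object> pageTwo = new ArrayList<>();
        pageTwo.add(footer);
        pageTwo.add("user input B");
        System.out.println(write(pageOne, store));   // [ref:0, inline:user input A]
        System.out.println(write(pageTwo, store));   // [ref:0, inline:user input B]
        System.out.println(store.size());            // 1
    }
}
```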

I've not created a Java version of my serializer. But, since the
Scala version does not use much Scala magic, a port to Java
would not be too hard. I also have some 500 unit tests.

Well, enough for now.

Richard





On 07/10/2011 02:37 AM, Martin Grigorov wrote:

Hi,

About the use cases: my experience is that most of the time the
in-memory pages are used (for each listener callback execution, for
ajax requests,...).
A previous version of a page, or a previous page, is needed when the
user clicks the browser back button. Even in this case most of the time
the in-memory cache is hit. Only when the user goes several pages back
and this page is not in-memory is the disk store used.

So far so good, but...! Even the in-memory store contains serialized
versions of the Page, named SerializedPage. This is a struct which
contains
{
   sessionId: String,
   pageId: int,
   data: byte[]
}
so the Page is serialized back and forth when stored in *any*
IPageStore/IDataStore.

This is the current state in Wicket 1.5.

Pedro and I noticed that the IPageStore impl (DefaultPageStore) can be
improved to work with Page instances but we decided to postpone this
optimization for 1.5.0+.

About new String("someLiteral"): I don't remember seeing this code
lately, either in libraries or in applications. This constructor
should be used only when the developer explicitly wants this string
not to be interned and stored in the PermGen space, i.e. it will be
stored in the heap space.
Your benchmark test tests exactly this - the heap space.
I'll try the app with MemoryMXBean to see whether the non-heap changes
after deserialization.
I'm not very into Java Serialization but indeed it seems the Strings
are deserialized in the heap. But even in this case they go in the
Eden space, i.e. they are reclaimed soon after.

On Sun, Jul 10, 2011 at 2:37 AM, richard emberson
<richard.ember...@gmail.com>  wrote:

If you run the little Java program I included, you will see that
there is an impact - de-serialized objects take more memory.

Richard

On 07/09/2011 05:23 PM, Igor Vaynberg wrote:


string literals are interned by the jvm so they should have a minimal
memory impact.

-igor

On Sat, Jul 9, 2011 at 5:10 PM, richard emberson
<richard.ember...@gmail.com>    wrote:


Martin,

The reason I was interested is that it struck me a couple of
days ago that when each Page (a tree of Components) is created,
many (almost all?) of the non-end-user-generated Strings stored
as instance variables in the tree are shared
between all copies of the Page, but that when such a Page is
serialized to disk and then de-serialized, each String becomes its own
copy unique to that particular Page. This means that if an
appreciable number of Pages in-memory are reanimated Pages, then
there could be a bunch of memory being used for all the String
copies.

In the attached simple Java file (yes, I still write Java when I must)
there are three different ways of creating an array of
Label objects (not Wicket Label) where each Label takes a String:
    new Label(some_string)

The first is to share the same String over all instances of the Label.
    new Label(the_string)
The second is to make a copy of the String when creating each
Label;
    new Label(new String(the_string))
The third is to create a single Label, serialize it to an array of
bytes and then generate the Labels in the array by de-serializing
the byte array for each Label.

Needless to say, the first uses the least memory; the label string
is shared by all Labels, while the second and third approaches
use more memory. Also, if during the de-serialization process the
de-serialized String is replaced with the original instance of the
String, then the third approach uses only as much memory as the
first approach.
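
The attached program is not reproduced in this archive, so the following is only a sketch of the effect described above, not the original file. It assumes one particular way of replacing the de-serialized String with a shared instance, namely `String.intern()` inside a `readResolve` method; the class names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedStrings {
    // Plain label: every de-serialized copy gets its own String instance.
    static class Label implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Label(String text) { this.text = text; }
    }

    // Same data, but de-serialization swaps in the canonical String.
    static class InterningLabel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        InterningLabel(String text) { this.text = text; }
        private Object readResolve() {
            return new InterningLabel(text.intern());
        }
    }

    static byte[] toBytes(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = toBytes(new Label("label text"));
        Label a = (Label) fromBytes(plain);
        Label b = (Label) fromBytes(plain);
        System.out.println(a.text == b.text);   // false: two separate copies

        byte[] interning = toBytes(new InterningLabel("label text"));
        InterningLabel c = (InterningLabel) fromBytes(interning);
        InterningLabel d = (InterningLabel) fromBytes(interning);
        System.out.println(c.text == d.text);   // true: one shared instance
    }
}
```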

No rocket science here, but it does seem to imply that if a
significant number of Pages in-memory are actually reanimated Pages,
then there could be a memory saving by
making de-serialization smarter about possible shared objects.
Even if it is only, say, a 5% saving for only certain Wicket
usage patterns, it might be worth looking into.

Hence, my question to the masters of Wicket and developers whose
application might fit the use-case.

Richard

On 07/09/2011 11:03 AM, Martin Makundi wrote:


Difficult to say ... we have disabled page versioning and we dump
sessions onto disk every 5 minutes to minimize memory hassles.

But I am no master ;)

**
Martin

2011/7/9 richard emberson<richard.ember...@gmail.com>:


This is a question for Wicket masters and those application builders
whose applications match the criteria as specified below.

[In this case, a Wicket master is someone with a knowledge
of how Wicket is being used in a wide spectrum of applications
so that they have a feel for what use-cases exist in the real world.]

Wicket is used in a wide range of applications with a variety of
usage patterns. What I am interested in are those applications where
an appreciable number of the pages in memory are pages that had
previously been serialized and stored to disk and then reanimated,
not found in an in-memory cache and had to be read from disk and
de-serialized back into an in-memory page; which is to say,
applications with an appreciable number of reanimated pages.

Firstly, do such applications exist? These are real-world
applications where a significant number of pages in-memory
are reanimated pages.

For such applications, what percentage of all pages at any
given time are reanimated pages?
Is it, say, a couple of percent? Two or three, in which case it's not
very significant.
Or, is it, say, 50%? Meaning that half of all pages currently in
memory had been serialized to disk, flushed from any in-memory cache
and then, as needed, de-serialized back into a Page.

Thanks

Richard
--
Quis custodiet ipsos custodes

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@wicket.apache.org
For additional commands, e-mail: users-h...@wicket.apache.org

