Re: [uportal-dev] NaturalIdCache annotation

Eric Dalquist Fri, 23 May 2014 16:06:01 -0700

So this was a partially realized feature in Hibernate that I helped them
complete a few years ago. To really understand it you first need a decent
grasp of the multi-layer caching system that Hibernate uses. I'll link a
bunch of blog posts that I would HIGHLY recommend you read, but I'll also
give a short summary.


http://www.javalobby.org/java/forums/t48846.html
http://apmblog.compuware.com/2009/02/16/understanding-caching-in-hibernate-part-two-the-query-cache/
http://learningviacode.blogspot.com/2013/12/update-timestamps-cache-in-action.html
http://tech.puredanger.com/2009/07/10/hibernate-query-cache/

The last one is the most relevant to the problem natural id caching solves,
in that it asks the question "Hibernate query cache considered harmful?"

A few terms:

   - *entity* - a hibernate managed persistent object
   - *id* - the primary key of the object. If you're following good data
   design principles this is an auto-generated number with absolutely zero
   business meaning. It should be treated as a completely opaque identifier
   and EVERY entity stored in the DB should have one.
   - *natural id* - the field or combination of fields that make the object
   unique in your application. For example in uPortal a portlet definition has
   a fname and a user has a username. These are immutable, non-nullable,
   unique fields for those objects.


So here is the short summary of the caching layers. This isn't going to be
100% accurate as I haven't looked at this stuff in a while.

   - *Session Cache (first level)* - Bound to the session (in uPortal this
   is thread/request scoped). It caches fully constructed entities (and all
   their references) that have been "loaded" into the session by primary id.
   - *Second Level Cache* - This is globally shared (lives in ehcache) and
   caches objects by primary id. Hibernate does a bunch of work to keep the
   data in here from getting stale, and this is what the jgroups invalidation
   stuff in uPortal helps with by saying "remove entry 12345 from cache 'foo'"
   when that entry is modified. Note that the data cached in here is in an
   intermediate form to deal with referential freshness. If EntityA contains a
   reference to EntityB, the cached version of EntityA just contains the id of
   EntityB so that when the data is loaded the freshest version of EntityB is
   used.
   - *Query Cache* - This is a global cache of queries keyed off of the
   query string. For example "select person from Person where person.firstName
   = 'Eric'" would be the key, and Hibernate caches the RAW SQL RESULT SET.
   - *Update Timestamp Cache* - Since the query cache is not keyed off of
   any sort of id, Hibernate has to be REALLY careful that it doesn't use it
   and get stale data. To that end this cache tracks the timestamp of the last
   time each table it manages was modified. When Hibernate gets a result set
   from the query cache it checks to see if it is older than the last
   modification, and if it is, the cached result set is ignored as Hibernate
   cannot be sure if the data it contains is still fresh.
   - *Natural ID Cache* - A very simple cache that maps the natural id of
   the entity to the id of the entity. This cache is never invalidated (your
   natural id never changes right?) and provides very fast lookup from natural
   id -> id -> entity without having to worry about the Update Timestamp Cache.
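
To make the Query Cache / Update Timestamp Cache interaction concrete, here
is a minimal plain-Java sketch. It is a toy model, not Hibernate's actual
implementation: the names (QueryCacheModel, tableModified) are made up for
illustration, and a logical counter stands in for real timestamps.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Small demo of the model below; all names here are hypothetical.
class QueryCacheDemo {
    public static void main(String[] args) {
        QueryCacheModel cache = new QueryCacheModel();
        String query = "select u from User u where u.username = 'admin'";
        Set<String> tables = Collections.singleton("UP_USER");

        cache.put(query, Arrays.asList("row-for-admin"));
        System.out.println(cache.get(query, tables)); // cached rows are returned

        cache.tableModified("UP_USER");               // any write to the table
        System.out.println(cache.get(query, tables)); // cached rows are now ignored
    }
}

// Toy model only: not Hibernate's real classes or storage format.
class QueryCacheModel {
    // A cached result plus the logical time at which it was stored.
    private static final class CachedResult {
        final List<String> rows;
        final long cachedAt;
        CachedResult(List<String> rows, long cachedAt) {
            this.rows = rows;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedResult> queryCache = new HashMap<>();
    private final Map<String, Long> updateTimestamps = new HashMap<>();
    private long clock = 0; // logical clock keeps the example deterministic

    void put(String query, List<String> rows) {
        queryCache.put(query, new CachedResult(rows, ++clock));
    }

    // Record that a table was written to (insert, update or delete).
    void tableModified(String table) {
        updateTimestamps.put(table, ++clock);
    }

    // Returns the cached rows, or null if there is no entry or it may be stale.
    List<String> get(String query, Set<String> tables) {
        CachedResult hit = queryCache.get(query);
        if (hit == null) {
            return null;
        }
        for (String table : tables) {
            Long modifiedAt = updateTimestamps.get(table);
            if (modifiedAt != null && modifiedAt > hit.cachedAt) {
                return null; // table changed after the result was cached
            }
        }
        return hit.rows;
    }
}
```

Note that a write to UP_USER throws away the cached result for every query
touching that table, whether or not the row it returned actually changed.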


So to help show how this all works let's talk through a few query/load
scenarios. Like any good software these layer on each other.

   - *load by primary id* - this is nice and easy. You say
   hibernate.load(PortletDefinition.class, 12345); Hibernate looks in the
   Session Cache first for a PortletDefinition with ID 12345. Then it looks in
   the Second Level Cache for a PortletDefinition with ID 12345. Then it does
   a database query for a portlet definition with that primary key value. This
   is the fastest, most efficient way to get at a hibernate managed entity and
   why much of uPortal just passes around primary IDs and ALWAYS goes back to
   the DAO every time the actual object is needed. At worst you get 1 fully
   indexed SQL query for the entire duration of your http request handling,
   which then primes the session cache, second level cache and the natural id
   cache. Realistically you only go down to the Second Level Cache, and every
   lookup after that for the rest of the request just hits the session cache,
   which doesn't even have to be thread safe so it is REALLY fast.
   - *load by natural id* - this is the second best way to load an entity.
   Hibernate looks in the natural id cache for the mapping to the id, and if
   it finds it a simple load(id) can be done. If there is a miss, Hibernate
   does a *much* simpler SQL query to get the id for the natural id and then
   does a load(id). At worst here you get 2 fully indexed SQL queries and
   generally the same cache behavior as load by primary key.
   - *query* - These are VERY hard to cache for entities where the dataset
   changes with any sort of frequency, and realistically very few entity
   types in an application get any value out of a query cache. In this case
   Hibernate generates the canonical SQL and checks the query cache to see if
   an entry exists for that SQL. If one does, it checks that none of the
   tables involved in that SQL query have been modified since the result set
   was cached. Only then can it parse the result set and try to use the
   second level cache to alleviate the SQL result set unmarshalling it now
   has to do.
   - *entity save/update* - So the final part here is the write bit. When
   an entity is created or updated Hibernate has to make sure the caches are
   all still valid. The session cache is easy: there is no concurrent access
   so the new data can just be put in place. The second level cache is also
   easy: just replace the id -> entity mapping with the new entity. The
   natural id cache is even easier: for new entities add a naturalId -> id
   mapping, and updates should never modify the naturalId so there is nothing
   to do. The query cache is a pain: we mark all the affected tables as
   updated and now ALL query results that touch those tables are useless.
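
The load-by-id and load-by-natural-id scenarios above can be sketched in
plain Java with a SQL query counter. This is a toy model under stated
assumptions, not Hibernate's API: PortletRow, LayeredLookupModel and the
"weather" fname are hypothetical, and a HashMap stands in for each cache and
for the database table.

```java
import java.util.HashMap;
import java.util.Map;

// Small demo of the model below; all names here are hypothetical.
class LayeredLookupDemo {
    public static void main(String[] args) {
        LayeredLookupModel model = new LayeredLookupModel();
        model.insertRow(12345L, "weather");

        // First request, cold caches: one query for the id, one for the row.
        Map<Long, PortletRow> request1 = new HashMap<>();
        model.loadByNaturalId(request1, "weather");
        System.out.println("SQL queries after cold lookup: " + model.sqlQueries);

        // Second request, new session cache: served by the shared caches.
        Map<Long, PortletRow> request2 = new HashMap<>();
        model.loadByNaturalId(request2, "weather");
        System.out.println("SQL queries after warm lookup: " + model.sqlQueries);
    }
}

final class PortletRow {
    final long id;      // opaque primary key
    final String fname; // natural id
    PortletRow(long id, String fname) {
        this.id = id;
        this.fname = fname;
    }
}

class LayeredLookupModel {
    private final Map<Long, PortletRow> table = new HashMap<>();            // stand-in for the DB
    private final Map<Long, PortletRow> secondLevelCache = new HashMap<>(); // shared: id -> entity
    private final Map<String, Long> naturalIdCache = new HashMap<>();       // shared: fname -> id
    int sqlQueries = 0;

    void insertRow(long id, String fname) {
        table.put(id, new PortletRow(id, fname));
    }

    // Session cache first, then second level cache, then the "database".
    PortletRow loadById(Map<Long, PortletRow> sessionCache, long id) {
        PortletRow row = sessionCache.get(id);
        if (row != null) {
            return row;
        }
        row = secondLevelCache.get(id);
        if (row == null) {
            sqlQueries++; // primary key lookup
            row = table.get(id);
            if (row == null) {
                return null;
            }
            secondLevelCache.put(id, row);
            naturalIdCache.put(row.fname, id); // loading by id primes this cache too
        }
        sessionCache.put(id, row);
        return row;
    }

    // Resolve natural id -> id (cache, else a small query), then load by id.
    PortletRow loadByNaturalId(Map<Long, PortletRow> sessionCache, String fname) {
        Long id = naturalIdCache.get(fname);
        if (id == null) {
            sqlQueries++; // "select id where fname = ?"
            for (PortletRow candidate : table.values()) {
                if (candidate.fname.equals(fname)) {
                    id = candidate.id;
                }
            }
            if (id == null) {
                return null;
            }
            naturalIdCache.put(fname, id);
        }
        return loadById(sessionCache, id);
    }
}
```

The session cache is passed in as a parameter because it lives and dies with
one request, while the other two maps are shared across requests.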


Ok, that was a lot of explanation about ids, caches, and entity operations.
So let's think about what life was like before @NaturalId. In that world,
every time uPortal asked for a user by username the best we could do
cache-wise was to hit the query cache and hope that nothing had modified
UP_USER since we last asked for that user. Otherwise we were going to run a
SQL query to find them, load a bunch of data that might already be in the
second level cache (or even the session cache), and cache the new query
result only to have a large chance of not using it again. The workaround
some people used for this problem was to overload the natural id and use it
as the id as well. That gets really hard when you start having complex
multi-column identities for entities.
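
For comparison, here is roughly what the annotated version looks like. This
is a mapping sketch only, not uPortal's actual code: ExampleUser,
ExampleUserDao and the EXAMPLE_USER table are hypothetical names, and it
assumes Hibernate 4.1 or later with the javax.persistence annotations and the
second level cache enabled.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.Session;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.annotations.NaturalId;
import org.hibernate.annotations.NaturalIdCache;

// Hypothetical entity used only for illustration.
@Entity
@Table(name = "EXAMPLE_USER")
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // second level cache: id -> entity
@NaturalIdCache                                     // also cache natural id -> id
class ExampleUser {
    @Id
    @GeneratedValue
    private long id;          // opaque primary key, no business meaning

    @NaturalId                // immutable by default
    @Column(nullable = false)
    private String username;

    protected ExampleUser() {
        // no-arg constructor required by Hibernate
    }

    ExampleUser(String username) {
        this.username = username;
    }

    long getId() {
        return id;
    }

    String getUsername() {
        return username;
    }
}

// Hypothetical DAO showing a lookup by natural id instead of a query.
class ExampleUserDao {
    ExampleUser findByUsername(Session session, String username) {
        // Resolves username -> id, then loads the entity by id.
        return (ExampleUser) session.bySimpleNaturalId(ExampleUser.class).load(username);
    }
}
```

The point of going through bySimpleNaturalId rather than an HQL query is that
the lookup no longer depends on the query cache, so writes to other rows in
the table do not throw the cached mapping away.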

So I hope that helps answer the "why does this exist and what does it do"
bits.

One more thing with this. This design is also why uPortal's DAOs are so
defensive about object creation. To help ensure that there is always data
consistency the DAOs require that you provide all of the data needed to
populate the natural id of the entity in the create function. That then
returns an *interface* that has a package private implementation to help
protect the id and natural id from mutation. It also serves to boost
confidence for working in other parts of the code base. I know that if I
have a reference to an IPortletDefinition it has already been persisted, the
id field is populated and I don't have to worry about detached entities or
any of the other dirty little secrets you get from working with an ORM
layer.
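
A minimal plain-Java sketch of that defensive pattern follows. It is not
uPortal's actual DAO: IWidgetDefinition, WidgetDefinitionImpl and
WidgetDefinitionDao are hypothetical names, an in-memory map stands in for
the database, and in a real code base the interface and DAO would be public
while only the implementation stays package private.

```java
import java.util.HashMap;
import java.util.Map;

// Small demo of the pattern below; all names here are hypothetical.
class WidgetDaoDemo {
    public static void main(String[] args) {
        WidgetDefinitionDao dao = new WidgetDefinitionDao();
        IWidgetDefinition def = dao.createWidgetDefinition("weather", "Weather");
        def.setTitle("Local Weather"); // ordinary fields stay mutable
        System.out.println(def.getId() + " " + def.getFname() + " " + def.getTitle());
    }
}

// Callers only ever see this interface: no setters for id or natural id.
interface IWidgetDefinition {
    long getId();
    String getFname();
    String getTitle();
    void setTitle(String title);
}

// Package-private implementation; id and fname are fixed at construction.
class WidgetDefinitionImpl implements IWidgetDefinition {
    private final long id;
    private final String fname;
    private String title;

    WidgetDefinitionImpl(long id, String fname, String title) {
        this.id = id;
        this.fname = fname;
        this.title = title;
    }

    public long getId() { return id; }
    public String getFname() { return fname; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

class WidgetDefinitionDao {
    private final Map<String, IWidgetDefinition> byFname = new HashMap<>();
    private long nextId = 1; // stand-in for a database-generated id

    // The natural id is a required argument, so no instance exists without one.
    IWidgetDefinition createWidgetDefinition(String fname, String title) {
        if (fname == null || fname.isEmpty()) {
            throw new IllegalArgumentException("fname is required");
        }
        if (byFname.containsKey(fname)) {
            throw new IllegalStateException("fname already exists: " + fname);
        }
        IWidgetDefinition def = new WidgetDefinitionImpl(nextId++, fname, title);
        byFname.put(fname, def); // "persisted" before it is handed back
        return def;
    }

    IWidgetDefinition getByFname(String fname) {
        return byFname.get(fname);
    }
}
```

Because the only way to obtain an IWidgetDefinition is through the DAO, any
reference a caller holds already has its id and fname populated.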

Something I tried to reiterate a lot when working on the persistence layer
is that ORMs are NOT magic. They provide some great features such as
database agnostic APIs, very complex caching and data consistency
architectures that provide for performance which would otherwise be very
hard to get to, and they shield developers from the API hell that is JDBC.
That said you need to understand how the ORM layer works and at least at a
high level what is going on. The hibernate documentation is generally very
good and I always found a lot of support in the hibernate-dev IRC room.


On Fri, May 23, 2014 at 3:17 PM, James Wennmacher <[email protected]> wrote:

>  I'd love to get a little more clarity on the @NaturalIdCache annotation
> in terms of what it does, how it aids performance, how it is used when a
> class caches the object itself and the natural id.  When implementing
> https://issues.jasig.org/browse/UP-4108 I found a number of Event
> Aggregation classes and some others (see
> https://github.com/Jasig/uPortal/pull/328) use it (you can also search
> ehcache.xml for NaturalId to get some idea).  I also found we have some
> inconsistencies ( https://issues.jasig.org/browse/UP-4110) I'd like to
> fix once I get a better handle on it.
>
> Thanks,
>
> --
> James Wennmacher - Unicon 480.558.2420
>
> --
>
> You are currently subscribed to [email protected] as: 
> [email protected]
> To unsubscribe, change settings or access archives, see 
> http://www.ja-sig.org/wiki/display/JSG/uportal-dev
>
>
