So this was a partially realized feature in Hibernate that I helped them complete a few years ago. To really understand it, you first need a decent grasp of the multi-layer caching system that Hibernate uses. I'll link a bunch of blogs that I would HIGHLY recommend you read, but I'll also give a short summary.
http://www.javalobby.org/java/forums/t48846.html
http://apmblog.compuware.com/2009/02/16/understanding-caching-in-hibernate-part-two-the-query-cache/
http://learningviacode.blogspot.com/2013/12/update-timestamps-cache-in-action.html
http://tech.puredanger.com/2009/07/10/hibernate-query-cache/

The last one is the most relevant to the problem natural id caching solves, in that it asks the question "Hibernate query cache considered harmful?"

A few terms:
- *entity* - a Hibernate-managed persistent object
- *id* - the primary key of the object. If you're following good data design principles, this is an auto-generated number with absolutely zero business meaning. It should be treated as a completely opaque identifier, and EVERY entity stored in the DB should have one.
- *natural id* - the field or combination of fields that makes the object unique in your application. For example, in uPortal a portlet definition has an fname and a user has a username. These are immutable, non-nullable, unique fields for those objects.

So here's the short summary of the caching layers. This isn't going to be 100% accurate, as I haven't looked at this stuff in a while.
- *Session Cache (first level)* - Bound to the session (in uPortal this is thread/request scoped). It caches fully constructed entities (and all their references) that have been "loaded" into the session, keyed by primary id.
- *Second Level Cache* - This is globally shared (it lives in ehcache) and caches objects by primary id. Hibernate does a bunch of work to keep the data in here from getting stale, and this is what the JGroups invalidation stuff in uPortal helps with, by saying "remove entry 12345 from cache 'foo'" when that entry is modified. Note that the data cached in here is in an intermediate form to deal with referential freshness: if EntityA contains a reference to EntityB, the cached version of EntityA just contains the id of EntityB, so that when the data is loaded the freshest version of EntityB is used.
- *Query Cache* - This is a global cache of query results keyed off of the query string. For example, "select person from Person person where person.firstName = 'Eric'" would be the key, and Hibernate caches the result set itself (for entity queries, essentially the ids of the matching entities rather than the full objects).
- *Update Timestamp Cache* - Since the query cache is not keyed off of any sort of id, Hibernate has to be REALLY careful that it doesn't use it and get stale data. To that end, this cache tracks the timestamp of the last time each table it manages was modified. When Hibernate gets a result set from the query cache, it checks whether that result set is older than the last modification of any table involved; if it is, the cached result set is ignored, as Hibernate cannot be sure the data it contains is still fresh.
- *Natural ID Cache* - A very simple cache that maps the natural id of an entity to its id. This cache is never invalidated (your natural id never changes, right?) and provides very fast natural id -> id -> entity lookups without having to worry about the Update Timestamp Cache.

So to help show how this all works, let's talk through a few query/load scenarios. Like any good software, these build on each other.
- *load by primary id* - This is nice and easy. You say session.load(PortletDefinition.class, 12345L); Hibernate looks in the Session Cache first for a PortletDefinition with ID 12345. If it isn't there, it looks in the Second Level Cache for a PortletDefinition with ID 12345. If that misses too, it does a database query for a portlet definition with that primary key value. This is the fastest, most efficient way to get at a Hibernate-managed entity, and it's why much of uPortal just passes around primary ids and ALWAYS goes back to the DAO every time the actual object is needed. At worst you get 1 fully indexed SQL query for the entire duration of your HTTP request handling, which then primes the session cache, the second level cache and the natural id cache. Realistically you usually only go as far as the Second Level Cache, and every lookup after that for the rest of the request just hits the session cache, which doesn't even have to be thread safe and so is REALLY fast.
- *load by natural id* - This is the second best way to load an entity. Hibernate looks in the natural id cache for the mapping to the id; if it finds it, a simple load(id) can be done. If there is a miss, Hibernate does a *much* simpler SQL query to get the id for the natural id, then does a load(id). At worst here you get 2 fully indexed SQL queries and generally the same cache behavior as load by primary id.
- *query* - These are VERY hard to cache for entities whose data changes with any sort of frequency, and realistically very few entity types in an application get any value out of a query cache. In this case Hibernate generates the canonical SQL and checks the query cache for an entry for that SQL. If one exists, it checks that nothing in any of the tables involved in that query has changed since the result set was cached, and only then can it parse the cached result set, using the second level cache to cut down on the result set unmarshalling it now has to do.
- *entity save/update* - The final part here is the write bit. When an entity is created or updated, Hibernate has to make sure the caches are all still valid. The session cache is easy: there is no concurrent access, so the new data can just be put in place. The second level cache is also easy: just replace the id -> entity mapping with the new entity. The natural id cache is even easier: for new entities add a naturalId -> id mapping, and updates should never modify the naturalId, so there is nothing to do. The query cache is a pain: we mark all the affected tables as updated, and now ALL query results that touch those tables are useless.

Ok, that was a lot of explanation about ids, caches, and entity operations.
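To make the two load paths above concrete, here is a minimal plain-Java sketch. It is a toy model, not Hibernate's API or implementation: `ToyUser`, `ToyDatabase` and `ToySession` are hypothetical names, three `HashMap`s stand in for the session, second level and natural id caches, and every "SQL query" just increments a counter so you can see how many database round trips each path costs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity: id is the opaque primary key, username is the natural id.
class ToyUser {
    final long id;
    final String username;

    ToyUser(long id, String username) {
        this.id = id;
        this.username = username;
    }
}

// Stand-in for the database; every select bumps a counter so round trips can be counted.
class ToyDatabase {
    private final Map<Long, ToyUser> rows = new HashMap<>();
    int sqlQueries = 0;

    void insert(ToyUser user) {
        rows.put(user.id, user);
    }

    // Roughly "select * from UP_USER where id = ?"
    ToyUser selectById(long id) {
        sqlQueries++;
        return rows.get(id);
    }

    // Roughly "select id from UP_USER where username = ?"
    Long selectIdByUsername(String username) {
        sqlQueries++;
        for (ToyUser u : rows.values()) {
            if (u.username.equals(username)) {
                return u.id;
            }
        }
        return null;
    }
}

// One ToySession per request; the two shared maps outlive any single session.
class ToySession {
    private final ToyDatabase db;
    private final Map<Long, ToyUser> secondLevelCache; // shared, keyed by primary id
    private final Map<String, Long> naturalIdCache;    // shared, natural id -> primary id
    private final Map<Long, ToyUser> sessionCache = new HashMap<>(); // per session, not thread safe

    ToySession(ToyDatabase db, Map<Long, ToyUser> secondLevelCache, Map<String, Long> naturalIdCache) {
        this.db = db;
        this.secondLevelCache = secondLevelCache;
        this.naturalIdCache = naturalIdCache;
    }

    // Load by primary id: session cache, then second level cache, then one indexed query.
    // (The real second level cache stores a disassembled form, not the object itself.)
    ToyUser load(long id) {
        ToyUser user = sessionCache.get(id);
        if (user != null) {
            return user;
        }
        user = secondLevelCache.get(id);
        if (user == null) {
            user = db.selectById(id);
            if (user == null) {
                return null;
            }
            secondLevelCache.put(id, user);
            naturalIdCache.put(user.username, id); // a database load also primes the natural id cache
        }
        sessionCache.put(id, user);
        return user;
    }

    // Load by natural id: resolve natural id -> id (cache, else a simple query), then load(id).
    ToyUser loadByNaturalId(String username) {
        Long id = naturalIdCache.get(username);
        if (id == null) {
            id = db.selectIdByUsername(username);
            if (id == null) {
                return null;
            }
            naturalIdCache.put(username, id);
        }
        return load(id);
    }
}
```

On a cold cache, `loadByNaturalId("eric")` costs two queries (natural id -> id, then id -> row). Repeating it in the same session, or in a new session that shares the two global maps, costs none.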
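The query cache and Update Timestamp Cache interaction from the *query* and *entity save/update* items can be modeled the same way. This is again a hypothetical toy (`ToyQueryCache` is a made-up name, not a Hibernate class): a logical counter replaces real timestamps, and the cached "result set" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a query cache guarded by an update timestamp cache.
// A logical clock stands in for real timestamps.
class ToyQueryCache {
    // What gets cached per query string: when it was cached, which tables it read, and the rows.
    private static final class CachedResult {
        final long cachedAt;
        final List<String> tables;
        final List<String> rows;

        CachedResult(long cachedAt, List<String> tables, List<String> rows) {
            this.cachedAt = cachedAt;
            this.tables = new ArrayList<>(tables);
            this.rows = new ArrayList<>(rows);
        }
    }

    private final Map<String, CachedResult> queryCache = new HashMap<>(); // query string -> result
    private final Map<String, Long> updateTimestamps = new HashMap<>();   // table -> last write
    private long clock = 0;

    // Cache the result of running `query`, which read from `tables`.
    void put(String query, List<String> tables, List<String> rows) {
        queryCache.put(query, new CachedResult(++clock, tables, rows));
    }

    // Any insert/update/delete marks the whole table as modified "now".
    void tableModified(String table) {
        updateTimestamps.put(table, ++clock);
    }

    // Returns a copy of the cached rows, or null if there is no entry or any table it read
    // was modified after it was cached (the caller must then run the SQL again).
    List<String> get(String query) {
        CachedResult result = queryCache.get(query);
        if (result == null) {
            return null;
        }
        for (String table : result.tables) {
            Long lastWrite = updateTimestamps.get(table);
            if (lastWrite != null && lastWrite > result.cachedAt) {
                return null;
            }
        }
        return new ArrayList<>(result.rows);
    }
}
```

A write to any row in UP_USER throws away a cached "find eric" result even if eric's own row was never touched, whereas a natural id -> id mapping for eric would still be valid. That asymmetry is the whole argument for the natural id cache.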

So let's think about what life was like before @NaturalId. In that world, every time uPortal asked for a user by username, the best we could do cache-wise was to hit the query cache and hope that nothing had modified UP_USER since we last asked for that user. Otherwise we were going to run a SQL query to find them, load a bunch of data that might already be in the second level cache (or even the session cache), and cache the new query result, only to have a large chance of never using it again. The workaround some people used was to overload the natural id and use it as the id as well, but that gets really hard when you start having complex multi-column identities for entities.

So I hope that helps answer the "why does this exist and what does it do" bits.

One more thing with this: this design is also why uPortal's DAOs are so defensive about object creation. To help ensure that the data is always consistent, the DAOs require that you provide all of the data needed to populate the entity's natural id in the create function. That function then returns an *interface* that has a package private implementation, to help protect the id and natural id from mutation. It also boosts confidence when working in other parts of the code base: I know that if I have a reference to an IPortletDefinition, it has already been persisted, the id field is populated, and I don't have to worry about detached entities or any of the other dirty little secrets you get from working with an ORM layer.

Something I tried to reiterate a lot when working on the persistence layer is that ORMs are NOT magic. They provide some great features: database agnostic APIs, very complex caching and data consistency architectures that deliver performance which would otherwise be very hard to get, and shielding developers from the API hell that is JDBC. That said, you need to understand how the ORM layer works and, at least at a high level, what is going on.
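The defensive DAO design described above can be sketched in plain Java. Everything here is hypothetical and heavily simplified: `IPortletDefinitionSketch`, `PortletDefinitionSketchImpl` and `PortletDefinitionDaoSketch` are made-up names, uPortal's real `IPortletDefinition` and its DAO have many more fields and different signatures, and a `HashMap` stands in for Hibernate and the database. It only illustrates the shape: create demands the natural id, the caller gets back an interface, and the implementation offers no way to change the id or fname.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// What callers get: read-only access to the id and the natural id (fname).
// In a real code base this interface would be public and live in an API package.
interface IPortletDefinitionSketch {
    long getId();
    String getFname();
    String getTitle();
}

// Package-private implementation (no access modifier): no setters for id or fname.
// In a real code base it would sit in the DAO's package; in this single-file sketch
// everything shares one package.
final class PortletDefinitionSketchImpl implements IPortletDefinitionSketch {
    private final long id;
    private final String fname;
    private final String title;

    PortletDefinitionSketchImpl(long id, String fname, String title) {
        this.id = id;
        this.fname = fname;
        this.title = title;
    }

    @Override public long getId() { return id; }
    @Override public String getFname() { return fname; }
    @Override public String getTitle() { return title; }
}

// Hypothetical DAO: create requires the natural id up front and only hands back the
// interface after the entity has been stored and assigned its id.
class PortletDefinitionDaoSketch {
    private final Map<String, PortletDefinitionSketchImpl> byFname = new HashMap<>();
    private long nextId = 1; // stands in for a database-generated primary key

    IPortletDefinitionSketch createPortletDefinition(String fname, String title) {
        Objects.requireNonNull(fname, "fname (the natural id) is required");
        if (byFname.containsKey(fname)) {
            throw new IllegalArgumentException("Portlet definition already exists: " + fname);
        }
        PortletDefinitionSketchImpl created = new PortletDefinitionSketchImpl(nextId++, fname, title);
        byFname.put(fname, created); // "persist" before any caller sees a reference
        return created;
    }

    IPortletDefinitionSketch getPortletDefinitionByFname(String fname) {
        return byFname.get(fname);
    }
}
```

Because the only way to obtain an `IPortletDefinitionSketch` is through the DAO, any reference a caller holds is guaranteed to have an id and a natural id, which is the confidence the paragraph above describes.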

The Hibernate documentation is generally very good, and I always found a lot of support in the hibernate-dev IRC room.

On Fri, May 23, 2014 at 3:17 PM, James Wennmacher <[email protected]> wrote:

> I'd love to get a little more clarity on the @NaturalIdCache annotation
> in terms of what it does, how it aids performance, how it is used when a
> class caches the object itself and the natural id. When implementing
> https://issues.jasig.org/browse/UP-4108 I found a number of Event
> Aggregation classes and some others (see
> https://github.com/Jasig/uPortal/pull/328) use it (you can also search
> ehcache.xml for NaturalId to get some idea). I also found we have some
> inconsistencies ( https://issues.jasig.org/browse/UP-4110) I'd like to
> fix once I get a better handle on it.
>
> Thanks,
>
> --
> James Wennmacher - Unicon 480.558.2420
>
> --
>
> You are currently subscribed to [email protected] as:
> [email protected]
> To unsubscribe, change settings or access archives, see
> http://www.ja-sig.org/wiki/display/JSG/uportal-dev
