+uportal-dev so everyone sees the background on this.
UDP multicast is great ... in theory. In practice, across the complex networks in most data centers it is a nightmare. At UW and the other places I tested we had constant problems with peer discovery, message routing, and other issues. As you said, TCP is a pain because of the discovery issue, but uPortal doesn't have that problem. One of the neat things about jGroups is the ability to write your own "protocol" handlers. uPortal provides a custom implementation of PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/jgroups.xml?source=c#L64>, the jGroups discovery protocol, via DAO_PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/DAO_PING.java>. This handler uses the already-shared uPortal database to coordinate node discovery.

When a uPortal instance starts up (which starts ehcache, which starts jGroups), an instance of DAO_PING is created and start() is called. This schedules a Timer that runs every 60 seconds (configurable in jgroups.xml) and writes the JVM's current physical address (as determined by jGroups; again, configurable if it auto-discovers the wrong one) to the database via the JdbcPingDao<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/JdbcPingDao.java>.

The next thing jGroups needs to do after start is discover peers. To do this it calls fetchClusterMembers on DAO_PING, which uses the JdbcPingDao to get a list of all of the physical addresses that have been written to that table. jGroups then uses that list to join the cluster.

The last part of the process is what the coordinator node does (there is always a coordinator elected in a jGroups cluster). Every time the view (what jGroups calls the list of currently active cluster members) changes, the coordinator purges the database by removing all rows that do not match known members.
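To make the cycle concrete, here is a minimal, self-contained sketch of that coordination pattern. This is not uPortal's actual DAO_PING code; the class and method names are hypothetical, and a ConcurrentHashMap stands in for the shared database table that JdbcPingDao would really read and write.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simulation of the DAO_PING coordination pattern.
// The map stands in for the shared ping table in the uPortal database.
public class PingTableSketch {
    // memberName -> physical address ("host:port")
    private final Map<String, String> table = new ConcurrentHashMap<>();

    // Heartbeat: each node periodically writes its own address
    // (in uPortal, a Timer does this every 60 seconds via JdbcPingDao).
    void writeOwnAddress(String member, String address) {
        table.put(member, address);
    }

    // Discovery: fetch every address currently in the table so a
    // starting node can attempt to join the cluster.
    Set<String> fetchClusterMembers() {
        return Set.copyOf(table.values());
    }

    // Coordinator duty: on a view change, purge rows whose members
    // are no longer part of the current cluster view.
    void purgeDeadMembers(Set<String> currentView) {
        table.keySet().retainAll(currentView);
    }

    public static void main(String[] args) {
        PingTableSketch ping = new PingTableSketch();
        ping.writeOwnAddress("node-a", "10.0.0.1:7800");
        ping.writeOwnAddress("node-b", "10.0.0.2:7800");
        ping.writeOwnAddress("node-c", "10.0.0.3:7800");

        // node-c dies; the next view only contains a and b,
        // so the coordinator drops the stale row.
        ping.purgeDeadMembers(Set.of("node-a", "node-b"));

        System.out.println(ping.fetchClusterMembers().size()); // 2
    }
}
```

The nice property of the pattern is that discovery needs no multicast and no hard-coded peer list: the only shared dependency is the database every node already has.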
This handles pruning the addresses of old/dead instances. The system has worked very well, and effectively anyone running uPortal 4.0.8 or later very likely has a coherent jGroups cluster doing ehcache invalidation with zero extra work: https://issues.jasig.org/browse/UP-3607

You can take a look at which caches uPortal uses jGroups for and how they are configured: https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/ehcache.xml

Note that uPortal does not do true replication anywhere. All of the data cached in uPortal can be retrieved from the database or recalculated very quickly, so the caches are configured to do invalidation-based replication: when a key in a cache is replaced with a new value or deleted, a message is sent to the cluster that causes the other caches to remove that key, so the value is reloaded the next time it is needed.

As for overhead, the recommended approach is to set the replicateAsynchronously flag to true, in which case ehcache batches up replication messages and sends them in the background (very quickly, but still in batches). For what you need with CAS tickets, which I believe are ephemeral, you would need to set replicatePuts=true and replicateUpdatesViaCopy=true to copy the actual data between nodes.

As for performance, you configure replication behavior on a cache-by-cache basis. There are a bunch of caches in uPortal that are not replicated at all, either because the data doesn't change or because it is local to the instance.

Something that might be worth investigating is a way to share the jGroups Channel that gets created for ehcache in uPortal across all of the portlets in Tomcat. I had wanted to look into that but never had time. I doubt it is a simple change, but it could be VERY valuable in providing cache consistency for portlets as well as uPortal.
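As a rough illustration of the two styles, here is a hedged ehcache.xml sketch. The cache names are hypothetical and the sizing attributes are placeholders; the listener properties are the standard ehcache JGroups replicator settings discussed above. Check uPortal's actual ehcache.xml (linked above) for real, tuned examples.

```xml
<!-- Invalidation-style (uPortal's approach): values are NOT copied;
     remote nodes just drop the stale key and reload it on demand. -->
<cache name="exampleInvalidatingCache"
       maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true,
                    replicatePuts=false,
                    replicateUpdates=true,
                    replicateUpdatesViaCopy=false,
                    replicateRemovals=true"/>
</cache>

<!-- Copy-style (what ephemeral data such as CAS tickets would need):
     the actual values are shipped to the other nodes. -->
<cache name="exampleTicketCache"
       maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true,
                    replicatePuts=true,
                    replicateUpdates=true,
                    replicateUpdatesViaCopy=true,
                    replicateRemovals=true"/>
</cache>
```

The key trade-off: invalidation keeps replication traffic tiny (just keys), while copy-style replication is the only option when the data cannot be cheaply reloaded from a shared backing store.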
The general concept I was thinking of was to do the following (large chunk of work):

- Have uPortal initialize jGroups at start time (see the ehcache JGroupsCacheManagerPeerProvider<http://grepcode.com/file/repo1.maven.org/maven2/net.sf.ehcache/ehcache-jgroupsreplication/1.7/net/sf/ehcache/distribution/jgroups/JGroupsCacheManagerPeerProvider.java#130>)
- Have uPortal expose the JChannel as an attribute in the PortletContext each portlet gets access to at init time
  - You probably need a Tomcat context-scoped wrapper around it that hides each context's messages from each other context
- Write a custom ehcache replication service (likely extending the existing jGroups replication service) which has:
  - A Spring listener that gets the PortletContext injected, gets the jGroups channel, and stores it in some context-global location
  - A version of JGroupsCacheManagerPeerProvider that uses the jGroups channel from the global location
  - This should fail nicely, so that if uPortal doesn't provide a JChannel, things just don't get replicated

Hope that is a helpful wall of text :)

On Wed, Feb 5, 2014 at 3:55 PM, James Wennmacher <[email protected]> wrote:

> Hi Eric.
>
> I am starting to configure uPortal for CAS ClearPass for a customer. In
> the CAS documentation (Replicating PGT using
> "proxyGrantingTicketStorageClass" and Distributed Caching in
> https://wiki.jasig.org/display/CASC/Configuring+the+Jasig+CAS+Client+for+Java+in+the+web.xml),
> they reference an example ehcache config
> (https://github.com/mmoayyed/cas/blob/master/cas-server-integration-ehcache/src/test/resources/ehcache-replicated.xml)
> that has an option for
> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory for
> multicast replication. I noticed you added jGroups UDP multicast
> replication into uPortal's ehcache.xml, then changed it to TCP later, which
> has the disadvantage of requiring explicit knowledge of host IP addresses.
> What are the reasons you switched from UDP multicast to TCP? Just looking
> for background and possible suggestions. Also, do you have insight into
> using jGroups vs.
> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory? Why might
> I use one vs. the other? I suspect you might have been down this road
> before ... I haven't started doing research on one approach vs. the other
> yet.
>
> I suspect if I have ehcache replication configured it applies to all
> caches, which will likely be a performance issue. Do you have
> thoughts/experience on that RE UW? I haven't looked yet into having only
> the CAS info replicated. I suspect there is a way to do that.
>
> Thanks,
>
> --
> James Wennmacher - Unicon
> 480.558.2420
