+uportal-dev so everyone sees the background on this.
UDP multicast is great ... in theory. In practice, across the complex networks in most data centers it is a nightmare. At UW and the other places I tested we had constant problems with peer discovery, message routing, and other issues. As you said, TCP is a pain because of the discovery issue, but uPortal doesn't have that problem. One of the neat things about jGroups is the ability to write your own "protocol" handlers. uPortal provides a custom implementation of PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/jgroups.xml?source=c#L64>, the jGroups discovery protocol, via DAO_PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/DAO_PING.java>. This handler uses the already-shared uPortal database to coordinate node discovery.

When a uPortal instance starts up (which starts ehcache, which starts jGroups), an instance of DAO_PING is created and start() is called. This schedules a Timer that runs every 60 seconds (configurable in jgroups.xml) and writes the JVM's current physical address (as determined by jGroups; again, configurable if it auto-discovers the wrong one) to the database via the JdbcPingDao<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/JdbcPingDao.java>.

The next thing jGroups needs to do after start is discover peers. To do this it calls fetchClusterMembers on DAO_PING, which uses the JdbcPingDao to get a list of all of the physical addresses that have been written to that table. jGroups then uses that list to join the cluster.

The last part of the process is what the coordinator node does (there is always a coordinator elected in a jGroups cluster). Every time the view (what jGroups calls the list of currently active cluster members) changes, the coordinator purges the database by removing all rows that do not match known members.
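To make the cycle concrete, here is a minimal, self-contained sketch of that coordination pattern. This is not uPortal's actual DAO_PING code; the class and method names are hypothetical, and a ConcurrentHashMap stands in for the shared database table that JdbcPingDao would really read and write.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simulation of the DAO_PING coordination pattern.
// The map stands in for the shared ping table in the uPortal database.
public class PingTableSketch {
    // memberName -> physical address ("host:port")
    private final Map<String, String> table = new ConcurrentHashMap<>();

    // Heartbeat: each node periodically writes its own address
    // (in uPortal, a Timer does this every 60 seconds via JdbcPingDao).
    void writeOwnAddress(String member, String address) {
        table.put(member, address);
    }

    // Discovery: fetch every address currently in the table so a
    // starting node can attempt to join the cluster.
    Set<String> fetchClusterMembers() {
        return Set.copyOf(table.values());
    }

    // Coordinator duty: on a view change, purge rows whose members
    // are no longer part of the current cluster view.
    void purgeDeadMembers(Set<String> currentView) {
        table.keySet().retainAll(currentView);
    }

    public static void main(String[] args) {
        PingTableSketch ping = new PingTableSketch();
        ping.writeOwnAddress("node-a", "10.0.0.1:7800");
        ping.writeOwnAddress("node-b", "10.0.0.2:7800");
        ping.writeOwnAddress("node-c", "10.0.0.3:7800");

        // node-c dies; the next view only contains a and b,
        // so the coordinator drops the stale row.
        ping.purgeDeadMembers(Set.of("node-a", "node-b"));

        System.out.println(ping.fetchClusterMembers().size()); // 2
    }
}
```

The nice property of the pattern is that discovery needs no multicast and no hard-coded peer list: the only shared dependency is the database every node already has.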
This handles pruning the addresses of old/dead instances. The system has worked very well, and effectively anyone running uPortal 4.0.8 or later very likely has a coherent jGroups cluster doing ehcache invalidation with zero extra work: https://issues.jasig.org/browse/UP-3607

You can take a look at which caches uPortal uses jGroups for and how they are configured: https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/ehcache.xml

Note that uPortal does not do true replication anywhere. All of the data cached in uPortal can be retrieved from the database or recalculated very quickly, so the caches are configured to do invalidation-based replication: when a key in a cache is replaced with a new value or deleted, a message is sent to the cluster that causes the other caches to remove that key, so the value is reloaded the next time it is needed.

As for overhead, the recommended approach is to set the replicateAsynchronously flag to true, in which case ehcache batches up replication messages and sends them in the background (very quickly, but still in batches). For what you need with CAS tickets, which I believe are ephemeral, you would need to set replicatePuts=true and replicateUpdatesViaCopy=true to copy the actual data between nodes.

As for performance, you configure replication behavior on a cache-by-cache basis. There are a bunch of caches in uPortal that are not replicated at all, either because the data doesn't change or because it is local to the instance.

Something that might be worth investigating is a way to share the jGroups Channel that gets created for ehcache in uPortal across all of the portlets in Tomcat. I had wanted to look into that but never had time. I doubt it is a simple change, but it could be VERY valuable in providing cache consistency for portlets as well as uPortal.
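As a rough illustration of the two styles, here is a hedged ehcache.xml sketch. The cache names are hypothetical and the sizing attributes are placeholders; the listener properties are the standard ehcache JGroups replicator settings discussed above. Check uPortal's actual ehcache.xml (linked above) for real, tuned examples.

```xml
<!-- Invalidation-style (uPortal's approach): values are NOT copied;
     remote nodes just drop the stale key and reload it on demand. -->
<cache name="exampleInvalidatingCache"
       maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true,
                    replicatePuts=false,
                    replicateUpdates=true,
                    replicateUpdatesViaCopy=false,
                    replicateRemovals=true"/>
</cache>

<!-- Copy-style (what ephemeral data such as CAS tickets would need):
     the actual values are shipped to the other nodes. -->
<cache name="exampleTicketCache"
       maxElementsInMemory="1000" eternal="false" timeToIdleSeconds="300">
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=true,
                    replicatePuts=true,
                    replicateUpdates=true,
                    replicateUpdatesViaCopy=true,
                    replicateRemovals=true"/>
</cache>
```

The key trade-off: invalidation keeps replication traffic tiny (just keys), while copy-style replication is the only option when the data cannot be cheaply reloaded from a shared backing store.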
The general concept I was thinking of was to do the following (large chunk of work):

- Have uPortal initialize jGroups at start time (see the ehcache JGroupsCacheManagerPeerProvider<http://grepcode.com/file/repo1.maven.org/maven2/net.sf.ehcache/ehcache-jgroupsreplication/1.7/net/sf/ehcache/distribution/jgroups/JGroupsCacheManagerPeerProvider.java#130>)
- Have uPortal expose the JChannel as an attribute in the PortletContext each portlet gets access to at init time
  - You probably need a Tomcat context-scoped wrapper around it that hides each context's messages from each other context
- Write a custom ehcache replication service (likely extending the existing jGroups replication service) which has:
  - A Spring listener that gets the PortletContext injected, gets the jGroups channel, and stores it in some context-global location
  - A version of JGroupsCacheManagerPeerProvider that uses the jGroups channel from the global location
  - This should fail nicely, so that if uPortal doesn't provide a JChannel, things just don't get replicated

Hope that is a helpful wall of text :)

On Wed, Feb 5, 2014 at 3:55 PM, James Wennmacher <[email protected]> wrote:

> Hi Eric.
>
> I am starting to configure uPortal for CAS ClearPass for a customer. In
> the CAS documentation (Replicating PGT using
> "proxyGrantingTicketStorageClass" and Distributed Caching in
> https://wiki.jasig.org/display/CASC/Configuring+the+Jasig+CAS+Client+for+Java+in+the+web.xml),
> they reference an example ehcache config
> (https://github.com/mmoayyed/cas/blob/master/cas-server-integration-ehcache/src/test/resources/ehcache-replicated.xml)
> that has an option for
> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory for
> multicast replication. I noticed you added jGroups UDP multicast
> replication into uPortal's ehcache.xml, then changed it to TCP later, which
> has the disadvantage of requiring explicit knowledge of host IP addresses.
> What are the reasons you switched from UDP multicast to TCP? Just looking
> for background and possible suggestions. Also, do you have insight into
> using jGroups vs.
> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory? Why might
> I use one vs. the other? I suspect you might have been down this road
> before ... I haven't started doing research on one approach vs. the other
> yet.
>
> I suspect if I have ehcache replication configured it applies to all
> caches, which will likely be a performance issue. Do you have
> thoughts/experience on that RE UW? I haven't looked yet into having only
> the CAS info replicated. I suspect there is a way to do that.
>
> Thanks,
>
> --
> James Wennmacher - Unicon
> 480.558.2420
