As an update to the group, there is one minor difference from what Eric suggested that I thought I'd mention to get feedback on (and to share general knowledge): for replication of the CAS Clearpass Proxy-Granting Tickets (PGTs), I'm setting replicateAsynchronously=false to force synchronous replication. Synchronous replication is really just an immediate channel.send() and does not wait for peer acknowledgement (see http://jira.terracotta.org/jira/browse/EHC-874), so I don't think it has a significant performance impact, and there is no chance of deadlock. The goal is to reduce the chance that the uPortal server the user is on attempts to obtain the PGT before it has been replicated there.
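For reference, the kind of ehcache entry I'm talking about looks roughly like this, assuming the jGroups replicator from the ehcache-jgroupsreplication module; the cache name and sizing values here are illustrative, not the exact commented-out configuration I'm adding:

```xml
<!-- Illustrative cache entry for CAS Clearpass proxy-granting tickets.
     Cache name and size/TTL values are made up for this example. -->
<cache name="org.jasig.portal.clearPassProxyGrantingTickets"
       maxElementsInMemory="500"
       eternal="false"
       timeToLiveSeconds="7200">
    <!-- PGTs cannot be rebuilt from the uPortal database, so puts and
         updates must copy the actual data to peers (replicatePuts and
         replicateUpdatesViaCopy), and replicateAsynchronously=false
         makes each put fire an immediate channel.send() instead of
         waiting for the background batch. -->
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
        properties="replicateAsynchronously=false,replicatePuts=true,replicateUpdates=true,replicateUpdatesViaCopy=true,replicateRemovals=true"/>
</cache>
```

The key part is the properties string: everything else is ordinary cache sizing.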
I'm adding commented-out configuration and wiki documentation about clustered CAS Clearpass, so this should be much easier for anyone who wants to do it in the future.

As background for those interested (and I'll add this to the wiki documentation), the way CAS Clearpass works with uPortal is:

1. When uPortal receives the service ticket in the URL from the CAS redirect after the user enters their credentials in CAS, uPortal performs the service ticket validation with CAS to get the userId and verify that authentication succeeded. uPortal always requests the Clearpass PGT in the service ticket validation.

2. If enabled on CAS, CAS will not respond to uPortal until CAS has initiated sending a PGT to uPortal and gotten a response (I'm unsure what happens if there is a failure). At that point, one of the uPortal servers has the PGT, so CAS will reply to the service ticket validation.

3. uPortal then needs access to the PGT if it has to provide the user's password to a portlet. I set replicateAsynchronously=false so the uPortal server that CAS invoked with the PGT will at least fire off the replication request immediately. The expectation and hope is that by the time CAS receives the PGT POST response and responds to the original uPortal server's service ticket validation, the PGT replication packet will have been received and processed by the peer uPortal nodes (or at least the one the user has logged into), and the PGT will be available in ehcache when that server receives the service ticket response from CAS.

I hope all this makes sense. If anyone sees a problem with setting replicateAsynchronously=false, let me know. Thanks!

James Wennmacher - Unicon
480.558.2420

On 02/07/2014 11:00 AM, Eric Dalquist wrote:
> That is correct. You can set those ports if you need to; I believe we
> do at UW to deal with TCP firewalls, but they are not required in most
> cases.
>
> One thing to note is that the important data in that table is stored
> in the BLOBs.
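> Conceptually, the rows look something like this (an illustrative
> sketch, not the actual uPortal DDL; every column name here other than
> PHYSICAL_ADDRESS is made up, and types vary by database):

```sql
-- Illustrative sketch of the jGroups ping table described above.
-- PHYSICAL_ADDRESS holds the Java-serialized jGroups PhysicalAddress;
-- the *_TXT columns stand in for the hypothetical human-readable
-- toString copies added for debugging.
CREATE TABLE UP_JGROUPS_PING (
    MEMBER_ADDRESS_TXT   VARCHAR(255),  -- hypothetical: readable member address
    PHYSICAL_ADDRESS_TXT VARCHAR(255),  -- hypothetical: readable physical address,
                                        -- e.g. fe80:0:0:0:a288:b4ff:febe:ed0%3:43362
    PHYSICAL_ADDRESS     BLOB           -- serialized PhysicalAddress (the data that matters)
);
```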
> jGroups only supports using Java serialization to store their
> PhysicalAddress class. I added the human-readable toString output
> columns so it would be easier to debug.
>
> For debugging/monitoring all of this, take a look at the JMX
> (jConsole) data. The jGroups instance registers itself and exposes a
> ton of useful data via JMX. You can see who all the members of the
> current view are, who the coordinator is, traffic stats, etc.
>
> There is also some vestigial code in
> https://github.com/Jasig/uPortal/tree/8de4d5030be8dbd219b73a28037185e1d2df661d/uportal-war/src/main/java/org/jasig/portal/jgroups/auth
> which I was trying to use to get group auth and encryption "just
> working" as well, but I never had much luck with it. If that could be
> reliably enabled, then the coordinator node would write out a random
> auth/crypto token to the database which the other nodes would then use
> to auth into the group.
>
> On Fri, Feb 7, 2014 at 9:47 AM, James Wennmacher
> <[email protected]> wrote:
>
>     To verify: we typically don't need to set any of the jgroups
>     property values in portlet.properties for a clustered
>     environment... in particular
>     #uPortal.cacheManager.jgroups.fd_sock.start_port=
>     #uPortal.cacheManager.jgroups.tcp.bind_port=
>
>     jGroups will by default pick a random port and communicate the
>     port that node chose to the other nodes in the cluster so the
>     nodes can communicate (replicate cache invalidations or cache
>     data, in addition to discovering new/removed nodes)? I assume
>     that's what is stored in the database (UP_JGROUPS_PING table,
>     PHYSICAL_ADDRESS column - holds a value like
>     fe80:0:0:0:a288:b4ff:febe:ed0%3:43362).
>
>     Thanks,
>
>     James Wennmacher - Unicon
>     480.558.2420
>
>     On 02/05/2014 06:30 PM, Eric Dalquist wrote:
>> +uportal-dev so everyone sees the background on this.
>>
>> UDP multicast is great ... in theory.
>> In practice, across the complex networks in most data centers it is
>> a nightmare. At UW and other places I tested, we had constant
>> problems with peer discovery, message routing, and other issues.
>>
>> As you said, TCP is a pain because of the discovery issue, but
>> uPortal doesn't have that problem. One of the neat things with
>> jGroups is the ability to write your own "protocol" handlers.
>> uPortal provides a custom implementation of PING
>> <https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/jgroups.xml?source=c#L64>,
>> the jGroups discovery protocol, via DAO_PING
>> <https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/DAO_PING.java>.
>> This handler uses the already-shared uPortal database to coordinate
>> node discovery.
>>
>> When a uPortal instance starts up (which starts ehcache, which
>> starts jGroups), an instance of DAO_PING is created and start() is
>> called. This schedules a Timer that runs every 60 seconds
>> (configurable in jgroups.xml) and writes out the JVM's current
>> physical address (as determined by jGroups; again configurable if it
>> auto-discovers the wrong one) to the database via the JdbcPingDao
>> <https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/JdbcPingDao.java>.
>>
>> The next thing jGroups needs to do after start is discover peers. To
>> do this, it calls fetchClusterMembers on DAO_PING, which uses the
>> JdbcPingDao to get a list of all of the physical addresses that have
>> been written to that table. jGroups then uses that list to join the
>> cluster.
>>
>> The last part of the process is what the coordinator node (there is
>> always a coordinator that is elected in a jGroups cluster) does.
>> Every time the view (what jGroups calls the list of currently
>> active cluster members) changes, the coordinator purges the database
>> by removing all rows that do not match known members. This handles
>> pruning the addresses of old/dead instances.
>>
>> This system has worked very well and, effectively, anyone running
>> uPortal 4.0.8 or later very likely has a coherent jGroups cluster
>> doing ehcache invalidation with zero extra work:
>> https://issues.jasig.org/browse/UP-3607
>>
>> You can take a look at which caches uPortal uses jGroups for and how
>> they are configured:
>> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/ehcache.xml
>>
>> Note that uPortal does not do true replication anywhere. All of the
>> data cached in uPortal can be retrieved from the database or
>> recalculated very quickly, so the caches are configured for
>> invalidation-based replication: when a key in a cache is replaced
>> with a new value or deleted, a message is sent to the cluster that
>> results in the other caches removing the key, so that the value is
>> reloaded the next time it is needed.
>>
>> As for overhead, the recommended approach is to have the
>> replicateAsynchronously flag set to true, in which case ehcache
>> batches up replication messages and sends them in the background
>> (very quickly, but still in batches).
>>
>> For what you need with CAS tickets, which I believe are ephemeral,
>> you would need to set replicatePuts=true and
>> replicateUpdatesViaCopy=true to copy the actual data between nodes.
>>
>> As for performance, you configure replication behavior on a
>> cache-by-cache basis. There are a bunch of caches in uPortal that
>> are not replicated at all, either because the data doesn't change or
>> because it is local to the instance.
>>
>> Something that might be worth investigating is a way to share the
>> jGroups Channel that gets created for ehcache in uPortal across all
>> of the portlets in Tomcat.
>> I had wanted to look into that but never had time to. I doubt it is
>> a simple change, but it could be VERY valuable in providing cache
>> consistency for portlets as well as uPortal. The general concept I
>> was thinking of was to do the following (large chunk of work):
>>
>> * Have uPortal initialize jGroups at start time (see the ehcache
>>   JGroupsCacheManagerPeerProvider)
>> * Have uPortal expose the JChannel as an attribute in the
>>   PortletContext each portlet gets access to at init time
>>   o You probably need a Tomcat context scoped wrapper around it
>>     that hides each context's messages from each other context
>> * Write a custom ehcache replication service (that likely extends
>>   the existing jGroups replication service) which has:
>>   o A Spring listener that gets the PortletContext injected in,
>>     gets the jGroups channel, and stores it in some context-global
>>     location
>>   o A version of jGroupsCacheManagerPeerProvider that uses the
>>     jGroups channel from the global location
>>   o This should fail nicely, so that if uPortal doesn't provide a
>>     JChannel things just don't get replicated
>>
>> Hope that is a helpful wall of text :)
>>
>> On Wed, Feb 5, 2014 at 3:55 PM, James Wennmacher
>> <[email protected]> wrote:
>>
>>     Hi Eric.
>>
>>     I am starting to configure uPortal for CAS clearpass for a
>>     customer. In the CAS documentation (Replicating PGT using
>>     "proxyGrantingTicketStorageClass" and Distributed Caching, in
>>     https://wiki.jasig.org/display/CASC/Configuring+the+Jasig+CAS+Client+for+Java+in+the+web.xml),
>>     they reference an example ehcache config
>>     (https://github.com/mmoayyed/cas/blob/master/cas-server-integration-ehcache/src/test/resources/ehcache-replicated.xml)
>>     that has an option for
>>     net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory
>>     for multi-cast replication.
>>     I noticed you added jGroups UDP multicast replication into
>>     uPortal's ehcache.xml, then changed it to TCP later, which has
>>     the disadvantage of requiring explicit knowledge of host IP
>>     addresses.
>>
>>     What are the reasons you switched from UDP multicast to TCP?
>>     Just looking for background and possible suggestions. Also, do
>>     you have insight into using jGroups vs.
>>     net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory?
>>     Why might I use one vs. the other? I suspect you might have been
>>     down this road before ... I haven't started doing research on
>>     one approach vs. the other yet.
>>
>>     I suspect that if I have ehcache replication configured, it
>>     applies to all caches, which will likely be a performance issue.
>>     Do you have thoughts/experience on that re: UW? I haven't looked
>>     yet into having only the CAS info replicated; I suspect there is
>>     a way to do that.
>>
>>     Thanks,
>>
>>     --
>>     James Wennmacher - Unicon
>>     480.558.2420

--

You are currently subscribed to [email protected] as:
[email protected]
To unsubscribe, change settings or access archives, see
http://www.ja-sig.org/wiki/display/JSG/uportal-dev
