Re: [uportal-dev] EhCache and jgroups question

Eric Dalquist Fri, 07 Feb 2014 13:23:24 -0800

Seems reasonable to me. I don't think there is anything inherently bad with
synchronous replication; you just need a good reason, like you have here,
to add in the extra cost of sending the replication data on cache put.
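
To make the trade-off concrete, here is a minimal sketch in plain Java. It is a toy model, not Ehcache or jGroups code: FakeChannel, ReplicatingCache and flushPending are hypothetical names made up for illustration. It only shows that a synchronous put hands the message to the channel before put() returns (without waiting for any peer acknowledgement), while an asynchronous put queues it until the next background flush.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no Ehcache or jGroups classes are used here.
class ReplicationTimingSketch {

    /** Hypothetical stand-in for the cluster channel; records what was sent. */
    static class FakeChannel {
        final List<String> sent = new ArrayList<>();

        void send(String message) {
            sent.add(message);
        }
    }

    /** Hypothetical stand-in for a cache that replicates its puts. */
    static class ReplicatingCache {
        private final FakeChannel channel;
        private final boolean replicateAsynchronously;
        private final List<String> pending = new ArrayList<>();

        ReplicatingCache(FakeChannel channel, boolean replicateAsynchronously) {
            this.channel = channel;
            this.replicateAsynchronously = replicateAsynchronously;
        }

        void put(String key) {
            if (replicateAsynchronously) {
                // Queued: nothing reaches the channel until the next flush.
                pending.add(key);
            } else {
                // Sent before put() returns; no peer acknowledgement is awaited.
                channel.send(key);
            }
        }

        /** In the asynchronous case a background thread would call this periodically. */
        void flushPending() {
            for (String key : pending) {
                channel.send(key);
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        FakeChannel syncChannel = new FakeChannel();
        new ReplicatingCache(syncChannel, false).put("PGT-1");
        System.out.println("sync, sent right after put:  " + syncChannel.sent);

        FakeChannel asyncChannel = new FakeChannel();
        ReplicatingCache asyncCache = new ReplicatingCache(asyncChannel, true);
        asyncCache.put("PGT-1");
        System.out.println("async, sent right after put: " + asyncChannel.sent);
        asyncCache.flushPending();
        System.out.println("async, sent after flush:     " + asyncChannel.sent);
    }
}
```

The extra cost of the synchronous case is that the thread doing the put pays for the send itself, which is why it makes sense to enable it only on the caches that need it.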



On Fri, Feb 7, 2014 at 12:24 PM, James Wennmacher <[email protected]> wrote:

>  As an update to the group, there is one minor difference from what Eric
> suggested that I'd like group feedback on (and want to share as general
> knowledge): for replication of the CAS Clearpass Proxy-Granting Tickets
> (PGTs), I'm setting replicateAsynchronously=false to force synchronous
> replication. That is really just an immediate channel.send() and doesn't
> wait for peer acknowledgement (see
> http://jira.terracotta.org/jira/browse/EHC-874).  I don't think this has a
> significant performance impact, and there is no chance of deadlock. The
> goal is to reduce the chance that the uPortal server the user is on
> attempts to obtain the PGT before it has been replicated to it.
>
> I'm adding commented-out configuration and wiki documentation about
> clustered CAS Clearpass so this should be much easier for anyone who wants
> to do it in the future.
>
> As background for those interested (and I'll add this to the wiki
> documentation), the way CAS Clearpass works with uPortal is:
>
> 1. When uPortal receives the service ticket in the URL from the CAS
> redirect after the user enters their credentials in CAS, uPortal does the
> service ticket validation with CAS to get the userId and verify
> authentication is good.  uPortal always requests the clearpass PGT in the
> service ticket validation.
>
> 2. If enabled on CAS, CAS will not respond to uPortal until CAS initiates
> sending a PGT to uPortal and getting a response (unsure what happens if
> there is a failure).  At that point, one of the uPortal servers has the PGT
> so CAS will reply to the service ticket validation.
>
> 3. uPortal at that point needs access to the PGT if it needs to provide
> the user's password to a portlet.
>
> I set replicateAsynchronously=false so the uPortal server that CAS has
> invoked with the PGT will at least fire off the replication request.  The
> expectation and hope is that by the time CAS receives the PGT POST response
> and responds back to the original uPortal server for the service ticket
> validation, the PGT replication packet will be received and processed by
> peer uPortal nodes (or at least the one the user has logged into) and the
> PGT will be available in ehcache when the uPortal server the user has
> logged into receives the service ticket response from CAS.
>
> I hope all this makes sense.  If anyone sees a problem with setting
> replicateAsynchronously=false let me know.
>
> Thanks!
>
> James Wennmacher - Unicon 480.558.2420
>
> On 02/07/2014 11:00 AM, Eric Dalquist wrote:
>
> That is correct. You can set those ports if you need to, I believe we do
> at UW to deal with TCP firewalls but they are not required in most cases.
>
>  One thing to note is the important data in that table is stored in the
> BLOBs. jGroups only supports using Java serialization to store their
> PhysicalAddress class. I added the human-readable toString output columns
> so it would be easier to debug.
>
>  For debugging/monitoring all of this take a look at JMX (jConsole) data.
> The jGroups instance registers itself and exposes a ton of useful data via
> JMX. You can see who all the members of a current view are, who is the
> coordinator, traffic stats, etc.
>
>
>  There is also some vestigial code in
> https://github.com/Jasig/uPortal/tree/8de4d5030be8dbd219b73a28037185e1d2df661d/uportal-war/src/main/java/org/jasig/portal/jgroups/auth
> which I was trying to use to get group auth and encryption "just working"
> as well, but I never had much luck with it. If that could be reliably
> enabled then the coordinator node would write out a random auth/crypto
> token to the database which the other nodes would then use to auth into the
> group.
>
>
> On Fri, Feb 7, 2014 at 9:47 AM, James Wennmacher
> <[email protected]> wrote:
>
>>  To verify, we typically don't need to set any of the jgroups property
>> values in portlet.properties for a clustered environment... in particular
>> #uPortal.cacheManager.jgroups.fd_sock.start_port=
>> #uPortal.cacheManager.jgroups.tcp.bind_port=
>>
>> jGroups will by default pick a random port and communicate the port that
>> node chose to the other nodes in the cluster so the nodes can communicate
>> (replicate cache invalidations or cache data, in addition to discovery of
>> new/removed nodes)?  I assume that's what is stored in the database
>> (UP_JGROUPS_PING table, PHYSICAL_ADDRESS column - holds a value like
>> fe80:0:0:0:a288:b4ff:febe:ed0%3:43362).
>>
>> Thanks,
>>
>> James Wennmacher - Unicon 480.558.2420
>>
>>   On 02/05/2014 06:30 PM, Eric Dalquist wrote:
>>
>>  +uportal-dev so everyone sees the background on this.
>>
>>
>>  UDP multicast is great ... in theory. In practice across the complex
>> networks in most data centers it is a nightmare. At UW and other places I
>> tested we had constant problems with peer discovery, message routing and
>> other issues.
>>
>>  As you said, TCP is usually a pain because of the discovery issue, but
>> uPortal doesn't have this discovery issue. One of the neat things with
>> jGroups is the ability to write your own "protocol" handlers. uPortal
>> provides a custom implementation of 
>> PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/jgroups.xml?source=c#L64>,
>> the jGroups discovery protocol via 
>> DAO_PING<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/DAO_PING.java>.
>> This handler uses the already shared uPortal database to coordinate node
>> discovery.
>>
>>  When a uPortal instance starts up (which starts ehcache, which starts
>> jgroups) an instance of DAO_PING is created and start() is called. This
>> schedules a Timer that runs every 60 seconds (configurable in jgroups.xml)
>> that writes out the JVM's current physical address (as determined by
>> jGroups, again configurable if it auto-discovers the wrong one) to the
>> database via the
>> JdbcPingDao<https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/java/org/jasig/portal/jgroups/protocols/JdbcPingDao.java>.
>>
>>  The next thing jGroups needs to do after start is discover peers. To do
>> this it calls fetchClusterMembers on DAO_PING, which uses the JdbcPingDao
>> to get a list of all of the physical addresses that have been written to
>> that table. jGroups then uses that list to join the cluster.
>>
>>  The last part of the process is what the coordinator node (there is
>> always a coordinator that is elected in a jGroups cluster) does. Every time
>> the view (what jgroups calls the list of currently active cluster members)
>> changes the coordinator purges the database by removing all rows that do
>> not match known members. This handles pruning addresses of old/dead
>> instances.
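
The three steps above (publish your address, fetch the peer list, let the coordinator purge stale rows) can be sketched with an in-memory stand-in for the shared table. This is an illustration only: PingTable and its method names are hypothetical and are not the actual DAO_PING or JdbcPingDao API, and the addresses are made-up examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: an in-memory map plays the role of the shared database table.
class DbDiscoverySketch {

    /** Hypothetical stand-in for the ping table: node name -> physical address. */
    static class PingTable {
        private final Map<String, String> rows = new LinkedHashMap<>();

        /** Step 1: each node periodically (re)writes its own address. */
        void addAddress(String node, String physicalAddress) {
            rows.put(node, physicalAddress);
        }

        /** Step 2: a starting node reads every known address to find peers. */
        List<String> fetchAddresses() {
            return new ArrayList<>(rows.values());
        }

        /** Step 3: on a view change the coordinator drops rows for non-members. */
        void purgeOtherThan(Set<String> currentMembers) {
            rows.keySet().retainAll(currentMembers);
        }
    }

    public static void main(String[] args) {
        PingTable table = new PingTable();
        table.addAddress("nodeA", "10.0.0.1:7800");
        table.addAddress("nodeB", "10.0.0.2:7800");
        table.addAddress("nodeC", "10.0.0.3:7800");
        System.out.println("peers seen at startup: " + table.fetchAddresses());

        // nodeC dies; the new view contains only nodeA and nodeB.
        table.purgeOtherThan(new HashSet<>(Arrays.asList("nodeA", "nodeB")));
        System.out.println("peers after purge:     " + table.fetchAddresses());
    }
}
```

The point of the design is that the table only has to be approximately right: a stale row costs a failed connection attempt at join time, and the coordinator's purge cleans it up on the next view change.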
>>
>>  This system has worked very well and effectively anyone running uPortal
>> 4.0.8 or later very likely has a coherent jGroups cluster doing ehcache
>> invalidation with zero extra work:
>> https://issues.jasig.org/browse/UP-3607
>>
>>  You can take a look at which caches uPortal uses jGroups for and how
>> they are configured:
>> https://github.com/Jasig/uPortal/blob/master/uportal-war/src/main/resources/properties/ehcache.xml
>>
>>  Note that uPortal does not do true replication anywhere. All of the
>> data cached in uPortal can be retrieved from the database or recalculated
>> very quickly, so the caches are configured to do invalidation-based
>> replication: when a key in a cache is replaced with a new value or
>> deleted, a message is sent to the cluster that causes the other caches to
>> remove the key, so that the value is reloaded the next time it is needed.
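
As a rough illustration of invalidation versus copying, here is a small self-contained sketch. It does not use Ehcache or jGroups; Node, its fields and the replicateViaCopy flag are hypothetical names chosen for this example, and a plain map stands in for the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: direct method calls stand in for cluster messages.
class InvalidationSketch {

    /** Hypothetical cluster node with a local cache and a list of peers. */
    static class Node {
        final Map<String, String> cache = new HashMap<>();
        final List<Node> peers = new ArrayList<>();
        final boolean replicateViaCopy;

        Node(boolean replicateViaCopy) {
            this.replicateViaCopy = replicateViaCopy;
        }

        void put(String key, String value) {
            cache.put(key, value);
            for (Node peer : peers) {
                if (replicateViaCopy) {
                    peer.cache.put(key, value); // true replication: ship the data
                } else {
                    peer.cache.remove(key);     // invalidation: peer reloads on demand
                }
            }
        }

        /** Returns the cached value, loading it from the "database" on a miss. */
        String get(String key, Map<String, String> database) {
            return cache.computeIfAbsent(key, database::get);
        }
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        database.put("layout:42", "v1");

        Node a = new Node(false);
        Node b = new Node(false);
        a.peers.add(b);
        b.peers.add(a);

        System.out.println("b loads:            " + b.get("layout:42", database));

        database.put("layout:42", "v2"); // the writer updates the database first
        a.put("layout:42", "v2");        // then its cache, which invalidates b's entry
        System.out.println("b still has entry?  " + b.cache.containsKey("layout:42"));
        System.out.println("b reloads:          " + b.get("layout:42", database));
    }
}
```

Invalidation only works when every node can rebuild the value on its own, which holds for database-backed data but not for something that exists only in the memory of the node that received it.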
>>
>>  As for overhead, the recommended approach is to have the
>> replicateAsynchronously flag set to true in which case ehcache batches up
>> replication messages and sends them in the background (very quickly but
>> still in batches).
>>
>>
>>
>>  For what you need in CAS tickets which I believe are ephemeral you
>> would need to set replicatePuts=true and replicateUpdatesViaCopy=true to
>> copy the actual data between nodes.
>>
>>
>>  As for performance, you configure replication behavior on a cache by
>> cache basis. There are a bunch of caches in uPortal that are not replicated
>> at all either because the data doesn't change or is local to the instance.
>>
>>
>>
>>  Something that might be worth investigating is a way to share the
>> jGroups Channel that gets created for ehcache in uPortal across all of the
>> portlets in Tomcat. I had wanted to look into that but never had time to. I
>> doubt it is a simple change but could be VERY valuable in providing cache
>> consistency for portlets as well as uPortal. The general concept I was
>> thinking of was to do the following (large chunk of work)
>>
>>    - Have uPortal initialize jGroups at start time (see the ehcache
>>    JGroupsCacheManagerPeerProvider)
>>    - Have uPortal expose the JChannel as an attribute in the
>>    PortletContext each portlet gets access to at init time
>>       - You probably need a tomcat context scoped wrapper around it that
>>       hides each context's messages from each other context
>>    - Write a custom Ehcache replication service (that likely extends the
>>    existing jgroups replication service) which has:
>>       - A spring listener that gets the PortletContext injected in, gets
>>       the jGroups channel and stores it in some context-global location
>>       - A version of jGroupsCacheManagerPeerProvider that uses the
>>       jGroups channel from the global location
>>       - This should fail-nice so that if uPortal doesn't provide a
>>       jChannel things just don't get replicated
>>
>>
>>  Hope that is a helpful wall of text :)
>>
>>
>>
>> On Wed, Feb 5, 2014 at 3:55 PM, James Wennmacher
>> <[email protected]> wrote:
>>
>>>  Hi Eric.
>>>
>>> I am starting to configure uPortal for CAS clearpass for a customer.  In
>>> the CAS documentation (Replicating PGT using
>>> "proxyGrantingTicketStorageClass" and Distributed Caching in
>>> https://wiki.jasig.org/display/CASC/Configuring+the+Jasig+CAS+Client+for+Java+in+the+web.xml),
>>> they reference an example ehcache config (
>>> https://github.com/mmoayyed/cas/blob/master/cas-server-integration-ehcache/src/test/resources/ehcache-replicated.xml)
>>> that has an option for
>>> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory for
>>> multi-cast replication.  I noticed you added jGroups UDP multicast
>>> replication into uPortal's ehcache.xml, then changed it to TCP later, which
>>> has the disadvantage of requiring explicit knowledge of host IP addresses.
>>>
>>> What are the reasons you switched from UDP multicast to TCP?  Just
>>> looking for background and possible suggestions.  Also, do you have
>>> insight into using jGroups vs.
>>> net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory, and why I
>>> might use one vs. the other? I suspect you might have been down this road
>>> before ... I haven't started doing research on one approach vs. the other
>>> yet.
>>>
>>> I suspect if I have ehcache replication configured it applies to all
>>> caches, which will likely be a performance issue.  Do you have
>>> thoughts/experience on that RE UW?  I haven't looked yet into having only
>>> the CAS info replicated.  I suspect there is a way to do that.
>>>
>>> Thanks,
>>>
>>> --
>>> James Wennmacher - Unicon 480.558.2420
>>>
>>>
>>   --
>>
>> You are currently subscribed to [email protected] as: 
>> [email protected]
>> To unsubscribe, change settings or access archives, see 
>> http://www.ja-sig.org/wiki/display/JSG/uportal-dev
>>
>>
