[ http://jira.jboss.com/jira/browse/JBAS-754?page=comments#action_12312673 
]
     
Sacha Labourey commented on JBAS-754:
-------------------------------------

JBossHA node naming is now dissociated from JavaGroups 
naming. The node name can be set explicitly but, by default, a 
name is created at startup by JBoss (localIP:JNDI_PORT as a 
first strategy). This should fix some of the singleton issues 
seen.
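
As a rough illustration of the naming strategy described above, here is a minimal sketch. The class and method names (NodeNameSketch, resolveNodeName, localIp) and the port value 1099 are assumptions made for the example, not the actual JBossHA code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeNameSketch {

    // Hypothetical helper: use the explicitly configured name when there is
    // one, otherwise fall back to "localIP:JNDI_PORT".
    static String resolveNodeName(String explicitName, String localIp, int jndiPort) {
        if (explicitName != null && !explicitName.trim().isEmpty()) {
            return explicitName.trim();
        }
        return localIp + ":" + jndiPort;
    }

    // Best-effort lookup of the local IP; falls back to loopback on failure.
    static String localIp() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "127.0.0.1";
        }
    }

    public static void main(String[] args) {
        // Default strategy: the name is derived from the IP and the JNDI port.
        System.out.println(resolveNodeName(null, localIp(), 1099));
        // Explicit name wins when it is configured.
        System.out.println(resolveNodeName("node-a", localIp(), 1099));
        // Names taken from the transport layer may differ between nodes
        // (hostname on one, IP on another), so plain string matching fails.
        System.out.println("succubus:2821".equals("172.17.66.54:2821"));
    }
}
```

The point of deriving the name in one place is that every node compares the same string for a given member, instead of a hostname on one node and an IP address on another as in the logs quoted below.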

Furthermore, the DRM.add method now makes synchronous 
calls over the cluster to prevent DRM.isMasterReplica from 
considering itself the master because state with other nodes 
is not yet synched. This should solve the remaining singleton issues.
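
To make the race concrete, here is a simplified model of the check, not the real DistributedReplicantManagerImpl. The class MasterReplicaSketch, its fields, and the helper methods addLocal and learnRemote are hypothetical, and the ordering rule (the first member in the view that holds the key is the master) is an assumption for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MasterReplicaSketch {
    private final String me;                 // this node's name
    private final List<String> view;         // ordered cluster membership
    private final Set<String> localReplicants = new HashSet<>();
    // key -> names of remote nodes known to hold a replicant for that key
    private final Map<String, Set<String>> replicants = new HashMap<>();

    MasterReplicaSketch(String me, List<String> view) {
        this.me = me;
        this.view = view;
    }

    // Register a key locally (what add does before remote state is known).
    void addLocal(String key) {
        localReplicants.add(key);
    }

    // Record that a remote node holds the key (state arriving from the cluster).
    void learnRemote(String key, String node) {
        replicants.computeIfAbsent(key, k -> new HashSet<>()).add(node);
    }

    boolean isMasterReplica(String key) {
        if (!localReplicants.contains(key)) return false;
        Set<String> remoteHolders = replicants.get(key);
        // Ambiguous branch: "no remote holder" and "state not yet synched"
        // look identical at this point.
        if (remoteHolders == null) return true;
        for (String node : view) {
            if (remoteHolders.contains(node)) return false; // earlier member holds the key
            if (node.equals(me)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MasterReplicaSketch b = new MasterReplicaSketch("B", Arrays.asList("A", "B"));
        b.addLocal("svc");
        // Listeners fired before remote state arrived: B wrongly sees itself as master.
        System.out.println(b.isMasterReplica("svc"));
        b.learnRemote("svc", "A");
        // Once A's replicant is known, B is no longer master.
        System.out.println(b.isMasterReplica("svc"));
    }
}
```

In this model, making add wait for the remote state before listeners are notified corresponds to calling learnRemote before the first isMasterReplica check, which removes the window in which B answers true.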

As part of these changes, a farming bug has been fixed that 
was causing already-deployed apps to be re-deployed on 
running nodes when a new node was started (seen most 
frequently with at least 3 nodes).

Please TEST this new version and provide feedback if 
something is broken by this change.


> DistributedReplicantManager.isMasterReplica(String) false +
> -----------------------------------------------------------
>
>          Key: JBAS-754
>          URL: http://jira.jboss.com/jira/browse/JBAS-754
>      Project: JBoss Application Server
>         Type: Bug
>   Components: Clustering
>     Versions: JBossAS-3.2.6 Final
>     Reporter: SourceForge User
>     Assignee: Sacha Labourey

>
>
> SourceForge Submitter: slaboure .
> There is a race condition in 
> DistributedReplicantManager.isMasterReplica(String) that 
> shows up when this method is called from within 
> notifyKeyListeners, as shown by this stack trace:
> Thread "main"@65 status: RUNNING
> - isMasterReplica():437, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - isDRMMasterReplica():234, org.jboss.ha.jmx.HAServiceMBeanSupport
> - partitionTopologyChanged():103, org.jboss.ha.singleton.HASingletonSupport
> - replicantsChanged():197, org.jboss.ha.jmx.HAServiceMBeanSupport$1
> - notifyKeyListeners():675, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - add():326, org.jboss.ha.framework.server.DistributedReplicantManagerImpl
> - registerDRMListener():204, org.jboss.ha.jmx.HAServiceMBeanSupport
> - startService():144, org.jboss.ha.jmx.HAServiceMBeanSupport
> This is due to the choice to return true when the key in 
> question is in the localReplicants table but not in the 
> replicants table:
>     public boolean isMasterReplica (String key)
>     {
>        if (!localReplicants.containsKey (key))
>           return false;
>        Vector allNodes = this.partition.getCurrentView ();
>        HashMap repForKey = (HashMap)replicants.get (key);
>        if (repForKey==null)
>           return true; // ????
> This condition is ambiguous: it also holds for a node that 
> has called add but whose state has not yet synched or has 
> failed to synch. Another problem I'm seeing, at least in the 
> context of the singleton service, is that the notion of the 
> master node is unstable. Here is the output from one of 3 
> nodes running the singleton service, starting with the 
> addition of the final node, shown as view 2.
> 15:35:44,637 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 5s:948ms
> 15:36:27,719 INFO  [DefaultPartition] New cluster view: 2 ([lamia:32947, 172.17.66.54:2821, ironmaiden:51770] delta: 1)
> 15:36:27,749 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:37:13,555 INFO  [DefaultPartition] New cluster view (id: 3, delta: -1) : [172.17.66.54:2821, ironmaiden:51770]
> 15:37:13,575 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
> 15:38:13,321 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:38:13,321 INFO  [DefaultPartition] New cluster view (id: 4, delta: 1) : [172.17.66.54:2821, ironmaiden:51770, lamia:32949]
> 15:38:13,331 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:38:13,361 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 15:39:13,447 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:39:13,457 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> With view 3 the original node and singleton is killed and 
> the node to which this console output corresponds 
> (172.17.66.54) is selected as the singleton. When the 
> third node is started again there is some thrashing due 
> to the existing 2 nodes both selecting themselves as the 
> singleton and telling the other to stop, and it appears 
> that no singleton is chosen. The problem seems to be 
> inconsistent matching of member names. One node only 
> knows its IP while the other node knows the hostnames. 
> Here is the console view of the second node showing the 
> hostnames and its thrashing:
> 15:25:21,023 INFO  [Server] JBoss (MX MicroKernel) [3.2.2RC3 (build: CVSTag=Branch_3_2 date=200307312219)] Started in 13s:597ms
> 15:26:05,562 INFO  [DefaultPartition] New cluster view: 3 ([succubus:2821, ironmaiden:51770] delta: -1)
> 15:26:05,573 INFO  [DefaultPartition:ReplicantManager] Dead members: 1
> 15:27:05,506 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:27:05,509 INFO  [DefaultPartition] New cluster view: 4 ([succubus:2821, ironmaiden:51770, lamia:32949] delta: 1)
> 15:27:05,513 INFO  [DefaultPartition:ReplicantManager] Dead members: 0
> 15:27:05,531 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> 15:28:05,520 INFO  [HASingletonMBeanExample] Notified to start as singleton
> 15:28:05,526 INFO  [HASingletonMBeanExample] Notified to stop as singleton
> It's not clear that 
> DistributedReplicantManager.isMasterReplica was designed 
> to be used for the selection of a singleton node, but if it 
> is, the logic needs to be firmed up. If not, the singleton 
> service needs to be built on something else.
> -- 
> xxxxxxxxxxxxxxxxxxxxxxxx
> Scott Stark
> Chief Technology Officer
> JBoss Group, LLC
> xxxxxxxxxxxxxxxxxxxxxxxx

_______________________________________________
JBoss-Development mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jboss-development
