Hi Varun, which Helix version are you using?

Thanks,
Zhen
________________________________
From: Varun Sharma [[email protected]]
Sent: Monday, November 17, 2014 4:10 PM
To: [email protected]
Subject: Helix issue - External View out of sync

Hi,

I am seeing the following issue for many partitions in Helix using a simple 
Online->Offline state model factory. The external view says that the partition 
has been assigned to 3 hosts. However, when I look at the hosts, only 1 of them 
executed the OFFLINE --> ONLINE transition.

On the hosts that did not execute the transition, I see the following:

2014-11-13 09:29:54,394 [pool-3-thread-11] 
(HelixStateTransitionHandler.java:206) WARN  Force CurrentState on Zk to be 
stateModel's CurrentState. partitionKey: 490, currentState: ONLINE, message: 
12690ce8-8098-46b1-a93d-279604f0e3db, {CREATE_TIMESTAMP=1415870993349, 
ClusterEventName=idealStateChange, EXECUTE_START_TIMESTAMP=1415870994382, 
EXE_SESSION_ID=149a14ada0d0013, FROM_STATE=OFFLINE, 
MSG_ID=12690ce8-8098-46b1-a93d-279604f0e3db, MSG_STATE=read, 
MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=490, READ_TIMESTAMP=1415870993787, 
RESOURCE_NAME=$terrapin$data$meta_pin_join$1415866960201, 
SRC_NAME=hdfsterrapin-a-namenode001_9090, SRC_SESSION_ID=147a7beb2dd8ed7, 
STATE_MODEL_DEF=OnlineOffline, STATE_MODEL_FACTORY_NAME=DEFAULT, 
TGT_NAME=hdfsterrapin-a-datanode-ba3ad256, TGT_SESSION_ID=149a14ada0d0013, 
TO_STATE=ONLINE}{}{}

When I grep for the message ID in the controller log, I see the following:


2014-11-14 09:34:56,265 [StatusDumpTimerTask] (ZKPathDataDumpTask.java:155) INFO  {
  "id" : "149a14ada0d0013__$terrapin$data$meta_pin_join$1415866960201",
  "mapFields" : {
    "HELIX_ERROR     20141113-092954.000419 STATE_TRANSITION c1193025-b416-49d7-adc2-10afe2389141" : {
      "AdditionalInfo" : "Message execution failed. msgId: 12690ce8-8098-46b1-a93d-279604f0e3db, errorMsg: org.apache.helix.messaging.handling.HelixStateTransitionHandler$HelixStateMismatchException: Current state of stateModel does not match the fromState in Message, Current State:ONLINE, message expected:OFFLINE, partition: 490, from: hdfsterrapin-a-namenode001_9090, to: hdfsterrapin-a-datanode-ba3ad256",
      "Class" : "class org.apache.helix.messaging.handling.HelixStateTransitionHandler",
      "MSG_ID" : "12690ce8-8098-46b1-a93d-279604f0e3db",
      "Message state" : "READ"
    },


What could be causing this state mismatch? When I restart the node, the error 
disappears (meaning that the node is then able to perform the state 
transition).
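
To make the failure mode concrete, here is a minimal, self-contained Java sketch 
of the kind of check the error message describes: the participant compares the 
state it holds in memory for a partition with the FROM_STATE in the incoming 
message and rejects the transition if they differ. This is an illustration 
only, not Helix's code. The class, method, and exception names are all 
hypothetical, and the in-memory map is an assumed stand-in for the state model's 
current state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Helix.
final class StateMismatchSketch {

    /** Hypothetical stand-in for the mismatch exception named in the log. */
    static final class StateMismatchException extends RuntimeException {
        StateMismatchException(String message) {
            super(message);
        }
    }

    // Assumption: the participant keeps one in-memory state per partition.
    private final Map<String, String> currentStates = new HashMap<>();

    /** Returns the in-memory state; a partition never seen before is OFFLINE. */
    String currentState(String partition) {
        return currentStates.getOrDefault(partition, "OFFLINE");
    }

    /** Applies a transition only if the in-memory state equals fromState. */
    void handleTransition(String partition, String fromState, String toState) {
        String current = currentState(partition);
        if (!current.equals(fromState)) {
            throw new StateMismatchException(
                "Current State:" + current
                    + ", message expected:" + fromState
                    + ", partition: " + partition);
        }
        currentStates.put(partition, toState);
    }

    public static void main(String[] args) {
        StateMismatchSketch participant = new StateMismatchSketch();

        // First OFFLINE -> ONLINE message succeeds.
        participant.handleTransition("490", "OFFLINE", "ONLINE");

        // A second OFFLINE -> ONLINE message for the same partition is
        // rejected, because the in-memory state is already ONLINE.
        try {
            participant.handleTransition("490", "OFFLINE", "ONLINE");
        } catch (StateMismatchException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // A fresh object models a restarted node: its in-memory state is
        // empty again, so the same message is accepted.
        StateMismatchSketch restarted = new StateMismatchSketch();
        restarted.handleTransition("490", "OFFLINE", "ONLINE");
        System.out.println("After restart: " + restarted.currentState("490"));
    }
}
```

In this sketch, a participant whose in-memory state is already ONLINE rejects 
an OFFLINE --> ONLINE message, and a fresh instance accepts it. That matches the 
symptom that the error disappears after a restart, but it does not explain how 
the in-memory state and the controller's view diverged in the first place.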


Thanks

Varun
