Might be some race conditions. Need to double check this.

On Feb 15, 2015 11:38 PM, "Steph Meslin-Weber" <[email protected]> wrote:
> Hi Kishore,
>
> That's right, the node doesn't process any state transitions. They should
> have been logged in the first set of logs had they occurred.
>
> Thanks,
> Steph
>
> On 16 Feb 2015 07:28, "kishore g" <[email protected]> wrote:
>
>> Hi Steph,
>>
>> When the NPE occurs, do you get the state transition callbacks?
>>
>> thanks,
>> Kishore G
>>
>> On Sun, Feb 15, 2015 at 11:23 PM, Steph Meslin-Weber <[email protected]> wrote:
>>
>>> Unfortunately it appears that when the NPE occurs, dropping the
>>> participant no longer cleans up the related INSTANCE node. Perhaps some
>>> state is lost?
>>>
>>> Thanks,
>>> Steph
>>>
>>> On 16 Feb 2015 06:52, "Zhen Zhang" <[email protected]> wrote:
>>>
>>>> I think the NPE is not fatal. It happens when no message handler
>>>> factory is registered for this message type. The message will not be
>>>> removed and will remain in UNREAD state. Later, when the message handler
>>>> factory is registered via
>>>> DefaultMessagingService#registerMessageHandlerFactory, we will send a
>>>> NOP message, which will in turn trigger HelixTaskExecutor to process all
>>>> UNREAD messages. We should definitely fix this by logging a warning
>>>> message instead of throwing an NPE.
>>>>
>>>> Thanks,
>>>> Jason
>>>>
>>>> On Sun, Feb 15, 2015 at 7:30 PM, kishore g <[email protected]> wrote:
>>>>
>>>>> Controller assuming the state transition occurred is even more
>>>>> dangerous.
>>>>>
>>>>> On Sun, Feb 15, 2015 at 7:18 PM, [email protected] <[email protected]> wrote:
>>>>>
>>>>>> In my experience it was fatal. The callback would not be called but
>>>>>> the controller would somehow assume the state transition occurred.
>>>>>>
>>>>>> On Feb 15, 2015 7:13 PM, "kishore g" <[email protected]> wrote:
>>>>>>
>>>>>> > Thanks Vlad. That explains the problem. That also explains how
>>>>>> > adding a sleep of 3 seconds works.
>>>>>> >
>>>>>> > Jason, is this exception fatal? Will the message be processed
>>>>>> > again after the handler is added?
>>>>>> >
>>>>>> > thanks,
>>>>>> > Kishore G
>>>>>> >
>>>>>> > On Sun, Feb 15, 2015 at 6:41 PM, [email protected] <[email protected]> wrote:
>>>>>> >
>>>>>> >> https://issues.apache.org/jira/browse/HELIX-548
>>>>>> >>
>>>>>> >> On Feb 15, 2015 6:38 PM, "kishore g" <[email protected]> wrote:
>>>>>> >>
>>>>>> >> > Hi Vlad,
>>>>>> >> >
>>>>>> >> > Was there any jira associated with it?
>>>>>> >> >
>>>>>> >> > thanks,
>>>>>> >> > Kishore G
>>>>>> >> >
>>>>>> >> > On Sun, Feb 15, 2015 at 4:36 PM, [email protected] <[email protected]> wrote:
>>>>>> >> >
>>>>>> >> >> Looks like the same problem we encountered recently.
>>>>>> >> >>
>>>>>> >> >> Regards,
>>>>>> >> >> Vlad
>>>>>> >> >>
>>>>>> >> >> On Feb 15, 2015 4:35 PM, "kishore g" <[email protected]> wrote:
>>>>>> >> >>
>>>>>> >> >> > Steph described this problem on IRC.
>>>>>> >> >> >
>>>>>> >> >> > He is using 0.7.1. On connecting to the cluster he gets this NPE:
>>>>>> >> >> >
>>>>>> >> >> > http://pastebin.com/YE3fwK5i
>>>>>> >> >> >
>>>>>> >> >> > java.lang.NullPointerException
>>>>>> >> >> >   at org.apache.helix.messaging.handling.HelixTaskExecutor.createMessageHandler(HelixTaskExecutor.java:661)
>>>>>> >> >> >   at org.apache.helix.messaging.handling.HelixTaskExecutor.onMessage(HelixTaskExecutor.java:581)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkCallbackHandler.invoke(ZkCallbackHandler.java:202)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkCallbackHandler.init(ZkCallbackHandler.java:336)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkCallbackHandler.<init>(ZkCallbackHandler.java:130)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixConnection.addListener(ZkHelixConnection.java:533)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixConnection.addMessageListener(ZkHelixConnection.java:267)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixParticipant.setupMsgHandler(ZkHelixParticipant.java:347)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixParticipant.init(ZkHelixParticipant.java:383)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixParticipant.onConnected(ZkHelixParticipant.java:401)
>>>>>> >> >> >   at org.apache.helix.manager.zk.ZkHelixParticipant.start(ZkHelixParticipant.java:428)
>>>>>> >> >> >   at com.example.ProtostuffServerNode.spinUpParticipant(ProtostuffServerNode.java:134)
>>>>>> >> >> >
>>>>>> >> >> > Here is his connection code.
>>>>>> >> >> >
>>>>>> >> >> > http://pastebin.com/QRfVU1tc
>>>>>> >> >> >
>>>>>> >> >> > private static HelixParticipant spinUpParticipant(HelixAdmin admin, ParticipantId participantId) {
>>>>>> >> >> >     LOGGER.info("Starting up " + participantId);
>>>>>> >> >> >     HelixConnection connection = new ZkHelixConnection(ZK_ADDRESS);
>>>>>> >> >> >     connection.connect();
>>>>>> >> >> >     HelixParticipant participant = connection.createParticipant(CLUSTER_ID, participantId);
>>>>>> >> >> >     StateMachineEngine stateMach = participant.getStateMachineEngine();
>>>>>> >> >> >
>>>>>> >> >> >     StateTransitionHandlerFactory<LocalTransitionHandler> transitionHandlerFactory = new OnlineOfflineHandlerFactory();
>>>>>> >> >> >     stateMach.registerStateModelFactory(STATE_MODEL_NAME, transitionHandlerFactory);
>>>>>> >> >> >     participant.start();
>>>>>> >> >> >
>>>>>> >> >> >     admin.enableInstance(CLUSTER_NAME, participantId.toString(), true);
>>>>>> >> >> >
>>>>>> >> >> >     return participant;
>>>>>> >> >> > }
>>>>>> >> >> >
>>>>>> >> >> > Adding a 3s sleep after registerStateModelFactory works. Any
>>>>>> >> >> > idea what is happening?
>>>>>> >> >> >
>>>>>> >> >> > thanks,
>>>>>> >> >> > Kishore G

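For readers following the thread, here is a rough sketch of the behaviour Jason describes: a message whose type has no registered handler factory stays UNREAD with a logged warning instead of an NPE, and is picked up once the factory is registered. This is a toy model only, not Helix code. The names UnreadMessageSketch, Executor, HandlerFactory and Message are hypothetical, and the re-scan inside registerFactory merely stands in for the NOP message mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Toy model of the discussion above; none of these types are Helix classes.
class UnreadMessageSketch {
    enum MessageState { UNREAD, READ }

    static final class Message {
        final String id;
        final String type;
        MessageState state = MessageState.UNREAD;

        Message(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    // Hypothetical stand-in for a message handler factory.
    interface HandlerFactory {
        void handle(Message message);
    }

    static final class Executor {
        private static final Logger LOG = Logger.getLogger(Executor.class.getName());
        private final Map<String, HandlerFactory> factories = new HashMap<>();
        private final List<Message> inbox = new ArrayList<>();

        // A new message arrives; try to process everything that is still UNREAD.
        void deliver(Message message) {
            inbox.add(message);
            processUnread();
        }

        // Registering a factory re-scans the inbox (assumed analogue of the NOP message).
        void registerFactory(String type, HandlerFactory factory) {
            factories.put(type, factory);
            processUnread();
        }

        void processUnread() {
            for (Message message : inbox) {
                if (message.state != MessageState.UNREAD) {
                    continue;
                }
                HandlerFactory factory = factories.get(message.type);
                if (factory == null) {
                    // Warn and leave the message UNREAD rather than dereferencing null.
                    LOG.warning("No handler factory for type " + message.type
                            + "; leaving message " + message.id + " UNREAD");
                    continue;
                }
                factory.handle(message);
                message.state = MessageState.READ;
            }
        }

        long unreadCount() {
            return inbox.stream().filter(m -> m.state == MessageState.UNREAD).count();
        }
    }

    public static void main(String[] args) {
        Executor executor = new Executor();
        executor.deliver(new Message("m1", "STATE_TRANSITION"));
        System.out.println("unread before registration: " + executor.unreadCount());
        executor.registerFactory("STATE_TRANSITION",
                m -> System.out.println("handling " + m.id));
        System.out.println("unread after registration: " + executor.unreadCount());
    }
}
```

The sketch says nothing about the separate concern Vlad and Kishore raise (the controller assuming a transition happened); it only illustrates the "warn and retry later" idea.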