[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-405080002 @nwangtw Do you have any suggestions on how to solve this problem? I am still not sure what the real cause is. Thanks so much. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-404118550 @nwangtw The Aurora scheduler shows that the job runs without problems, and heron-ui also shows that the topology is running successfully. ![image 39](https://user-images.githubusercontent.com/13564504/42565038-eb40fc4c-8534-11e8-9cdf-a4d4e5f4174c.png) ![image 45](https://user-images.githubusercontent.com/13564504/42565054-f137ccca-8534-11e8-8ca7-7ddb1de40898.png) In addition, the data in zookeeper can be obtained successfully in the single-node environment, but not in the cluster environment. Is it possible that, in a cluster environment, the node running **tmaster** cannot get the data from the node running **zookeeper** when they run on different nodes? If so, do you have any idea what could cause this? Thanks.
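One way to test the hypothesis above (that the node running tmaster cannot reach the node running zookeeper) is to check plain TCP reachability from that node. The following is a minimal sketch, assuming the ZooKeeper address `heron01:2181` that appears in the CuratorStateManager log elsewhere in this thread. The class `ZkReachability` and the method `canConnect` are hypothetical names for illustration and are not part of Heron.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ZkReachability {
  /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
  static boolean canConnect(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;
    } catch (IOException e) {
      // Covers connection refused, timeouts, and unresolvable host names.
      return false;
    }
  }

  public static void main(String[] args) {
    // Defaults are assumptions taken from the log line "Closing the CuratorClient to: heron01:2181".
    String host = args.length > 0 ? args[0] : "heron01";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
    System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
  }
}
```

Running this on each cluster node would show whether the port is reachable from there. A `false` result points toward a network, DNS, or firewall problem; a `true` result only shows that the port is open, not that the znode can be read.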
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-403926860

@nwangtw Thanks for your reply. To verify whether the problem is caused by an abnormality in zookeeper, I tested the same code in a single-node `local` environment with `the same zookeeper version`. In the local single-node environment, the physical plan data was obtained **successfully** and the topology was updated successfully. But I don't know why it doesn't work in the Aurora+Mesos cluster environment.

What's more, through testing I found that the following code obtains the data successfully in the single-node environment with zookeeper, but in the cluster environment the `Could not getNodeData` exception occurs. With this code I can get the `ISchedulerStateManager` instance and obtain the `physical plan` from zookeeper on a single node, but it does not work in the Aurora cluster environment.

The code:

`stateManagerAdaptor = Runtime.schedulerStateManagerAdaptor(this.runtime);`

The exception:

```
[2018-07-11 02:15:37 +0800] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Could not getNodeData
Exception in thread "rescheduler" java.lang.NullPointerException
    at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.getPhysicalPlanInfo(AuroraHotEdgeSchedulerWithTxtLog.java:384)
    at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.triggerSchedule(AuroraHotEdgeSchedulerWithTxtLog.java:296)
    at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.access$400(AuroraHotEdgeSchedulerWithTxtLog.java:65)
    at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog$2.run(AuroraHotEdgeSchedulerWithTxtLog.java:257)
    at java.lang.Thread.run(Thread.java:748)
```

So I am confused: what is the difference between getting data from zookeeper in a single-node environment and in the cluster environment?
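The trace above suggests that the failed fetch was only logged as a WARNING and that the caller then dereferenced a missing result, which would explain the `NullPointerException` in `getPhysicalPlanInfo`. The following is a hedged sketch of how such a call could be guarded, using only `java.util.concurrent`. `SafeFetch.await` is a hypothetical helper that stands in for the fetch; it is not Heron's API.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SafeFetch {
  /** Waits for the future and returns an empty Optional instead of null on failure or timeout. */
  static <V> Optional<V> await(Future<V> future, long timeoutMs) {
    try {
      return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
      future.cancel(true);
      return Optional.empty();
    } catch (ExecutionException | TimeoutException e) {
      future.cancel(true);
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Simulated results: one successful fetch and one that fails like the log above.
    CompletableFuture<String> ok = CompletableFuture.completedFuture("pplan-bytes");
    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new RuntimeException("Could not getNodeData"));

    System.out.println("ok present: " + await(ok, 100).isPresent());
    System.out.println("failed present: " + await(failed, 100).isPresent());
  }
}
```

With a guard like this, the scheduler thread can skip the update or log a clear message when the plan is missing, instead of dying with a `NullPointerException`. It does not explain why the fetch fails in the cluster.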
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-403768735

@nwangtw Thanks a lot. Your understanding is correct. I do want to get the physical plan after the topology is submitted. I tried to get the physical plan 3 minutes after the topology had been submitted successfully. At that time, the instances were running normally and were registered in stmgrs(zookeeper). **But running the updateTopology function still shows the same error as before.** I extracted the specific error message, shown below.

```
[2018-07-10 16:52:00 +0800] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Failed to fetch data from path: /heron/pplans/AuroraMonitorSentenceWordCountTopology
[2018-07-10 16:52:00 +0800] [FINEST] org.apache.curator.utils.DefaultTracerDriver: Trace: DeleteBuilderImpl-Foreground - 7 ms
[2018-07-10 16:52:00 +0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181
[2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.framework.imps.CuratorFrameworkImpl: Closing
[2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.CuratorZookeeperClient: Closing
[2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.ConnectionState: Closing
[2018-07-10 16:52:00 +0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
Exception in thread "Thread-4" java.lang.NullPointerException
    at com.twitter.heron.scheduler.UpdateTopologyManager.getTopology(UpdateTopologyManager.java:338)
    at com.twitter.heron.scheduler.UpdateTopologyManager.updateTopology(UpdateTopologyManager.java:147)
    at com.twitter.heron.scheduler.UpdateTopologyManager.updateTopology(UpdateTopologyManager.java:112)
    at zyt.custom.my.scheduler.aurora.AuroraCustomScheduler.onUpdate(AuroraCustomScheduler.java:196)
    at com.twitter.heron.scheduler.client.LibrarySchedulerClient.updateTopology(LibrarySchedulerClient.java:70)
    at zyt.custom.my.scheduler.aurora.AuroraSchedulerController.triggerSchedule(AuroraSchedulerController.java:154)
    at zyt.custom.my.scheduler.aurora.AuroraSchedulerThread.run(AuroraSchedulerThread.java:45)
```

Zookeeper also reports an exception: the client disconnects after the connection is established. Is the failure to get the physical plan caused by this disconnection from zookeeper? What's more, the Heron cluster itself runs normally with this zookeeper, yet this client connection to the same zookeeper fails. I cannot figure out what is going on. The EndOfStreamException of Zookeeper is as follows. Even if I use the default AuroraScheduler, Zookeeper still has the EndOfStreamException. Does this mean that there is a problem with zookeeper? For more information on the Zookeeper EndOfStreamException, see #2955.

```
2018-07-10 16:52:10,133 [myid:] - INFO [SyncThread:0:ZooKeeperServer@687] - Established session 0x164828618e9002c with negotiated timeout 3 for client /218.195.228.28:44428
2018-07-10 16:52:10,137 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473bc zxid:0x1685 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
2018-07-10 16:52:11,139 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473be zxid:0x1686 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
2018-07-10 16:52:12,143 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473c0 zxid:0x1687 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
2018-07-10 16:52:13,209 [myid:] - INFO [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473c2 zxid:0x1688 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
2018-07-10 16:52:14,220 [myid:] - INFO [ProcessThread(sid:0 cport:21
```
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-403420363 I have updated the question in detail. Does anyone have any ideas?
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-403124959

@nwangtw Thanks for your comment. Let me describe my question in detail.

First, I created a CustomScheduler that has been deployed on Heron. In the CustomScheduler, besides the main thread that Heron runs, I start a new thread after Heron creates the job. This new thread is responsible for running my own programs, and the problem happens in this thread.

Second, I can submit topologies and activate them normally using the CustomScheduler. That means my CustomScheduler is deployed correctly and the main thread works. In case you are curious, my submit command is:

`heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/aurora-topolgoies/heron-with-dependencies.jar zyt.custom.topology.aurora.SentenceWordCountTopology SentenceWordCountTopology --deploy-deactivated --verbose`

The problem happens in the new thread mentioned above. In this thread, I want to update a running topology with a new PackingPlan by using UpdateTopologyManager, with the same effect as using the update command. So I create an ISchedulerClient and a new runtime Config, and update the topology by invoking ISchedulerClient.updateTopology. Then this problem happens. I am sorry that I cannot give more information about the WARNING; it is the only information I can find. But here is my code:

```
public void doSchedule(PackingPlan packingPlan) {
  String stateMgrClass = Context.stateManagerClass(this.config);

  // Get a state manager instance.
  IStateManager stateMgr = null;
  try {
    stateMgr = ReflectionUtils.newInstance(stateMgrClass);
    FileUtils.writeToFile(filename, "Create IStateManager object success...");
  } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
    e.printStackTrace();
    // Without a state manager, stateMgr.initialize() below would throw a NullPointerException.
    return;
  }

  try {
    stateMgr.initialize(this.config);
    SchedulerStateManagerAdaptor stateManagerAdaptor =
        new SchedulerStateManagerAdaptor(stateMgr, 5000);

    // The new packing plan (newPackingPlan) is created here. It is omitted.

    PackingPlans.PackingPlan currentPackingPlan = serializer.toProto(packingPlan);
    PackingPlans.PackingPlan proposedPackingPlan = serializer.toProto(newPackingPlan);

    // Build the UpdateTopologyRequest object used to update the topology.
    Scheduler.UpdateTopologyRequest updateTopologyRequest =
        Scheduler.UpdateTopologyRequest.newBuilder()
            .setCurrentPackingPlan(currentPackingPlan)
            .setProposedPackingPlan(proposedPackingPlan)
            .build();

    // Create the runtime config using the state manager adaptor. This adaptor does not
    // include topology information; just add the topology name and the adaptor so that
    // the scheduler client can be built.
    Config primaryRuntime = LauncherUtils.getInstance().createPrimaryRuntime(topology);
    Config newRuntime = Config.newBuilder()
        .putAll(primaryRuntime)
        .put(Key.TOPOLOGY_NAME, Context.topologyName(config))
        .put(Key.SCHEDULER_STATE_MANAGER_ADAPTOR, stateManagerAdaptor)
        .build();

    // Create an ISchedulerClient based on the config.
    ISchedulerClient schedulerClient = getSchedulerClient(newRuntime);

    // In fact, I can't get the physical plan right here when testing. I don't know why.
    // TopologyAPI.Topology topology3 = stateManagerAdaptor.getPhysicalPlan(topologyName).getTopology();

    if (!schedulerClient.updateTopology(updateTopologyRequest)) {
      // Pass the values as format arguments so that a '%' in them cannot break String.format.
      throw new TopologyRuntimeManagementException(String.format(
          "Failed to update %s with Scheduler, updateTopologyRequest=%s",
          topology.getName(), updateTopologyRequest));
    }
  } finally {
    // Close the zookeeper client connection.
    SysUtils.closeIgnoringExceptions(stateMgr);
  }
}
```
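Since the physical plan could not be read inside `doSchedule`, one option while debugging is to retry the read a bounded number of times before attempting the update. The following is a minimal sketch under stated assumptions: `RetryFetch.fetchWithRetry` is a hypothetical helper, and the `Supplier` stands in for whatever call returns the plan or `null`. A retry only helps if the failure is transient; a read that keeps failing after several attempts points to a different cause.

```java
import java.util.Optional;
import java.util.function.Supplier;

class RetryFetch {
  /**
   * Calls fetch up to maxAttempts times, sleeping delayMs between attempts,
   * and returns the first non-null result, or an empty Optional if none arrives.
   */
  static <V> Optional<V> fetchWithRetry(Supplier<V> fetch, int maxAttempts, long delayMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      V value = fetch.get();
      if (value != null) {
        return Optional.of(value);
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // stop retrying if the thread is interrupted
          return Optional.empty();
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Simulated fetch that returns null twice before succeeding.
    int[] calls = {0};
    Optional<String> plan = fetchWithRetry(() -> ++calls[0] < 3 ? null : "plan", 5, 10);
    System.out.println("present: " + plan.isPresent() + ", attempts: " + calls[0]);
  }
}
```

In the scheduler thread, the update request would only be sent when the returned `Optional` is present; otherwise the thread can log the failure and try again on the next scheduling round.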
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster URL: https://github.com/apache/incubator-heron/issues/2952#issuecomment-403100859 Does anyone know how to solve this problem? Or what is the cause of the problem?