[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-15 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-405080002
 
 
   Do you have any suggestions for solving this problem? I am still not sure 
what its real cause is. @nwangtw Thanks so much.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-11 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-404118550
 
 
   @nwangtw The Aurora scheduler shows that the job runs without problems, and 
the heron-ui also shows that the topology can run successfully.
   ![image 39](https://user-images.githubusercontent.com/13564504/42565038-eb40fc4c-8534-11e8-9cdf-a4d4e5f4174c.png)
   ![image 45](https://user-images.githubusercontent.com/13564504/42565054-f137ccca-8534-11e8-8ca7-7ddb1de40898.png)
   In addition, the data in ZooKeeper can be fetched successfully in the 
single-node environment, but the fetch fails in the cluster environment. Is it 
possible that, in a cluster environment, the node running **tmaster** cannot 
get the data from the node running **zookeeper** when they are on different 
nodes? If so, do you have any idea what could cause this? Thanks.
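
   One way to narrow down the question above is to run a plain TCP check from the node that hosts the tmaster against the ZooKeeper address. The following is a minimal sketch, not part of Heron: the class name `ZkReachability` and the method `canConnect` are hypothetical, and the default address `heron01:2181` is taken from the CuratorStateManager log line quoted elsewhere in this thread. A successful connection only shows that the port is reachable; it does not prove that the ZooKeeper session itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP connection to host:port can be opened. */
final class ZkReachability {

    /** Returns true if a TCP connection succeeds within timeoutMillis, false otherwise. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // An unresolvable host name or a refused/timed-out connection
            // both surface as IOException and are reported as "not reachable".
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions based on the log in this thread; override via arguments.
        String host = args.length > 0 ? args[0] : "heron01";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

   Running it on each cluster node (for example `java ZkReachability heron01 2181`) would show whether any node is unable to open a connection to the ZooKeeper port at all.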
   
   




[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-10 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-403926860
 
 
   @nwangtw Thanks for your reply. To check whether the problem is caused by 
ZooKeeper misbehaving, I tested the same code in a single-node `local` 
environment with the same ZooKeeper version. In the local single-node 
environment, the physical plan data is fetched **successfully** and the 
topology is updated successfully. But I don't know why it doesn't work in the 
Aurora+Mesos cluster environment.
   
   What's more, I found by testing that the following code obtains the 
`ISchedulerStateManager` instance and reads the `physical plan` from ZooKeeper 
in the single-node environment, but in the Aurora cluster environment it fails 
with the `Could not getNodeData` exception.
   The code:
   `stateManagerAdaptor = Runtime.schedulerStateManagerAdaptor(this.runtime);`
   The exception:
   ```
   [2018-07-11 02:15:37 +0800] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Could not getNodeData
   Exception in thread "rescheduler" java.lang.NullPointerException
       at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.getPhysicalPlanInfo(AuroraHotEdgeSchedulerWithTxtLog.java:384)
       at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.triggerSchedule(AuroraHotEdgeSchedulerWithTxtLog.java:296)
       at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog.access$400(AuroraHotEdgeSchedulerWithTxtLog.java:65)
       at zyt.custom.my.scheduler.aurora.AuroraHotEdgeSchedulerWithTxtLog$2.run(AuroraHotEdgeSchedulerWithTxtLog.java:257)
       at java.lang.Thread.run(Thread.java:748)
   ```
   So I am confused: what is the difference between fetching data from 
ZooKeeper in a single-node environment and in a cluster environment?
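
   The trace above shows a WARNING from the adaptor followed immediately by a NullPointerException in the caller, which suggests that the failed fetch handed back `null` and the result was then dereferenced. The following is a minimal sketch of guarding such a call, assuming that the fetch signals failure by returning `null`. The class `NullSafeFetch` and the method `fetchWithRetry` are hypothetical names and not part of Heron; the `Supplier` stands in for a call such as `stateManagerAdaptor.getPhysicalPlan(topologyName)`.

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries a fetch that signals failure by returning null. */
final class NullSafeFetch {

    /**
     * Calls fetch up to maxAttempts times, sleeping delayMillis between attempts,
     * and returns the first non-null result. Throws IllegalStateException with a
     * descriptive message if no attempt produced a value.
     */
    static <T> T fetchWithRetry(Supplier<T> fetch, int maxAttempts, long delayMillis, String what) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            T result = fetch.get();
            if (result != null) {
                return result;
            }
            if (attempts < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new IllegalStateException(
            "Could not fetch " + what + " after " + attempts + " attempt(s)");
    }
}
```

   A call such as `fetchWithRetry(() -> stateManagerAdaptor.getPhysicalPlan(topologyName), 5, 2000, "physical plan")` would then fail with a clear message instead of a NullPointerException further down the stack. It does not explain why the fetch fails in the cluster, but it makes the failing step explicit.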
   




[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-10 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-403768735
 
 
   @nwangtw Thanks a lot. Your understanding is correct. I do want to get the 
physical plan after the topology is submitted. I tried to get the physical plan 
3 minutes after the topology had been submitted successfully. At that time, the 
instances were running normally and were registered under stmgrs (ZooKeeper). 
**But running the updateTopology function still shows the same error as 
before.** I extracted the specific error message, shown below.
   ```
   [2018-07-10 16:52:00 +0800] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Failed to fetch data from path: /heron/pplans/AuroraMonitorSentenceWordCountTopology
   [2018-07-10 16:52:00 +0800] [FINEST] org.apache.curator.utils.DefaultTracerDriver: Trace: DeleteBuilderImpl-Foreground - 7 ms
   [2018-07-10 16:52:00 +0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181
   [2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.framework.imps.CuratorFrameworkImpl: Closing
   [2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.CuratorZookeeperClient: Closing
   [2018-07-10 16:52:00 +0800] [FINE] org.apache.curator.ConnectionState: Closing
   [2018-07-10 16:52:00 +0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes
   Exception in thread "Thread-4" java.lang.NullPointerException
       at com.twitter.heron.scheduler.UpdateTopologyManager.getTopology(UpdateTopologyManager.java:338)
       at com.twitter.heron.scheduler.UpdateTopologyManager.updateTopology(UpdateTopologyManager.java:147)
       at com.twitter.heron.scheduler.UpdateTopologyManager.updateTopology(UpdateTopologyManager.java:112)
       at zyt.custom.my.scheduler.aurora.AuroraCustomScheduler.onUpdate(AuroraCustomScheduler.java:196)
       at com.twitter.heron.scheduler.client.LibrarySchedulerClient.updateTopology(LibrarySchedulerClient.java:70)
       at zyt.custom.my.scheduler.aurora.AuroraSchedulerController.triggerSchedule(AuroraSchedulerController.java:154)
       at zyt.custom.my.scheduler.aurora.AuroraSchedulerThread.run(AuroraSchedulerThread.java:45)
   ```
   ZooKeeper also logs an exception saying that the client disconnects after 
the connection is established. Does fetching the physical plan fail because of 
this ZooKeeper disconnection? What's more, the Heron cluster itself runs 
normally on this ZooKeeper, yet this client connection fails. I cannot figure 
out what is going on.
   The ZooKeeper EndOfStreamException is as follows. Even if I use the default 
AuroraScheduler, ZooKeeper still logs the EndOfStreamException. Does this mean 
that there is a problem with ZooKeeper? For more information on the ZooKeeper 
EndOfStreamException, see #2955.
   ```
   2018-07-10 16:52:10,133 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@687] - Established session 0x164828618e9002c with negotiated timeout 3 for client /218.195.228.28:44428
   2018-07-10 16:52:10,137 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473bc zxid:0x1685 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
   2018-07-10 16:52:11,139 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473be zxid:0x1686 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
   2018-07-10 16:52:12,143 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473c0 zxid:0x1687 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
   2018-07-10 16:52:13,209 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@648] - Got user-level KeeperException when processing sessionid:0x164828618e9002c type:create cxid:0x5b4473c2 zxid:0x1688 txntype:-1 reqpath:n/a Error Path:/heron/tmasters/AuroraMonitorSentenceWordCountTopology Error:KeeperErrorCode = NodeExists for /heron/tmasters/AuroraMonitorSentenceWordCountTopology
   2018-07-10 16:52:14,220 [myid:] - INFO  [ProcessThread(sid:0 
cport:21

[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-09 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-403420363
 
 
   I have updated the question in detail. Does anyone have any ideas?




[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-06 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-403124959
 
 
   @nwangtw Thanks for your comment. Let me describe my question in detail. 
First, I created a CustomScheduler that has been deployed on Heron. In the 
CustomScheduler, besides the main thread that Heron runs, I create a new 
thread after Heron has created the job. The new thread is responsible for 
running my programs. The problem occurs in this thread.
   Second, I can submit topologies and activate them normally using the 
CustomScheduler, so my CustomScheduler is deployed correctly and the main 
thread works.
   In case you are curious, my submit command is:
   `heron submit aurora/yitian/devel --config-path ~/.heron/conf 
~/aurora-topolgoies/heron-with-dependencies.jar 
zyt.custom.topology.aurora.SentenceWordCountTopology SentenceWordCountTopology 
--deploy-deactivated --verbose`
   The problem occurs in the new thread mentioned above. In this thread, I 
want to update a running topology by using UpdateTopologyManager with a new 
PackingPlan, with the same effect as updating the topology with the update 
command. So I tried to create an ISchedulerClient and a new runtime Config, and 
to update the topology by invoking the ISchedulerClient.updateTopology 
function. Then this problem happened.
   I'm sorry that I can't give more information about the WARNING; it is the 
only information I could find. But here is my code:
   ```
   public void doSchedule(PackingPlan packingPlan) {
     // get state manager instance
     String stateMgrClass = Context.stateManagerClass(this.config);
     IStateManager stateMgr = null;
     try {
       stateMgr = ReflectionUtils.newInstance(stateMgrClass);
       FileUtils.writeToFile(filename, "Create IStateManager object success...");
     } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
       e.printStackTrace();
     }

     try {
       stateMgr.initialize(this.config);
       SchedulerStateManagerAdaptor stateManagerAdaptor =
           new SchedulerStateManagerAdaptor(stateMgr, 5000);

       // The new packing plan is created here. It is omitted.
       PackingPlans.PackingPlan currentPackingPlan = serializer.toProto(packingPlan);
       PackingPlans.PackingPlan proposedPackingPlan = serializer.toProto(newPackingPlan);

       // Build the UpdateTopologyRequest object used to update the topology.
       Scheduler.UpdateTopologyRequest updateTopologyRequest =
           Scheduler.UpdateTopologyRequest.newBuilder()
               .setCurrentPackingPlan(currentPackingPlan)
               .setProposedPackingPlan(proposedPackingPlan)
               .build();

       // Create the runtime config using the state manager adaptor. This adaptor does
       // not include topology information; just add the topology name and the adaptor
       // to build the scheduler client object.
       Config primaryRuntime = LauncherUtils.getInstance().createPrimaryRuntime(topology);
       Config newRuntime = Config.newBuilder()
           .putAll(primaryRuntime)
           .put(Key.TOPOLOGY_NAME, Context.topologyName(config))
           .put(Key.SCHEDULER_STATE_MANAGER_ADAPTOR, stateManagerAdaptor)
           .build();

       // Create an ISchedulerClient based on the config.
       ISchedulerClient schedulerClient = getSchedulerClient(newRuntime);

       // In fact, testing shows that I can't get the physical plan right here. I don't know why.
       // TopologyAPI.Topology topology3 =
       //     stateManagerAdaptor.getPhysicalPlan(topologyName).getTopology();

       if (!schedulerClient.updateTopology(updateTopologyRequest)) {
         throw new TopologyRuntimeManagementException(String.format(
             "Failed to update " + topology.getName() + " with Scheduler, updateTopologyRequest="
             + updateTopologyRequest));
       }
     } finally {
       // close the zookeeper client connection
       SysUtils.closeIgnoringExceptions(stateMgr);
     }
   }
   ```
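
   In the code above, a failed reflective instantiation is only printed, and execution continues with `stateMgr` still `null`, so the first visible symptom would be a NullPointerException at `stateMgr.initialize(...)`. The following is a minimal sketch of failing fast instead, using only standard Java reflection. The class `FailFastInstantiation` and the method `newInstanceOrThrow` are hypothetical names; it is not a replacement for Heron's `ReflectionUtils`, and it is not claimed to be the cause of the problem reported in this thread.

```java
/** Hypothetical helper: creates an instance by class name or fails with a clear message. */
final class FailFastInstantiation {

    /**
     * Loads className, calls its no-argument constructor, and casts the result to
     * expectedType. Any reflection or cast failure is rethrown as
     * IllegalStateException so the caller never continues with a null instance.
     */
    static <T> T newInstanceOrThrow(String className, Class<T> expectedType) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Cannot create " + expectedType.getSimpleName() + " from class " + className, e);
        }
    }
}
```

   With a helper of this kind, a misconfigured state manager class name surfaces at the instantiation step with the class name in the message, which separates configuration errors from fetch failures against ZooKeeper.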




[GitHub] Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to fetch data from Zookeeper path in Heron Cluster

2018-07-06 Thread GitBox
Yitian-Zhang commented on issue #2952: SchedulerStateManagerAdaptor failed to 
fetch data from Zookeeper path in Heron Cluster
URL: 
https://github.com/apache/incubator-heron/issues/2952#issuecomment-403100859
 
 
   Does anyone know how to solve this problem? Or what is the cause of the 
problem?
   

