Hi Andras,
   In fact, we currently have no way to back up or restore the streaming
metadata related to replica sets, assignments, etc.
   I think this metadata is volatile; for example, the hostname of each worker
may differ between two clusters. But if you find that backup/restore of the
streaming metadata would really be useful, please submit a JIRA.
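
   If you do need to carry this state over by hand in the meantime, a rough,
untested sketch like the one below could dump the znodes from the old
ZooKeeper so that they can be re-created on the new ensemble. The ZooKeeper
address and the root path are only assumptions; please check the metadata
prefix of your own deployment.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class DumpStreamZnodes {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // "OLD-ZK-HOST:2181" is a placeholder for the old cluster's ZooKeeper.
            ZooKeeper zk = new ZooKeeper("OLD-ZK-HOST:2181", 30000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();
            // Assumed location of the stream coordinator state; adjust it to
            // your metadata URL prefix.
            dump(zk, "/kylin/kylin_metadata/stream");
            zk.close();
        }

        // Recursively print every znode path and its data so the tree can be
        // re-created on the new ZooKeeper ensemble.
        private static void dump(ZooKeeper zk, String path) throws Exception {
            byte[] data = zk.getData(path, false, null);
            System.out.println(path + " = " + (data == null ? "" : new String(data, "UTF-8")));
            for (String child : zk.getChildren(path, false)) {
                dump(zk, path + "/" + child);
            }
        }
    }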


-----------------
-----------------
Best wishes to you!
From: Xiaoxiang Yu

At 2019-06-27 17:54:08, "Andras Nagy" <[email protected]> wrote:

OK, this worked, so I could proceed one step further. I disabled all HBase
tables, manually altered them so the coprocessor locations point to the new
HDFS cluster, and re-enabled them. After this, there are no errors in the
RegionServers' logs, and Kylin starts up, so this seems fine. (Interestingly,
DeployCoprocessorCLI did assemble the correct HDFS URL but could not alter the
table definitions, so after running it the table definitions remained
unchanged. This is on HBase version 1.4.9.)
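
For reference, the manual alter boiled down to roughly the sketch below, using
the HBase 1.x Java client (posted here untested; the new namenode URI is a
placeholder, and I'm assuming the jar path is stored in the usual
"coprocessor$1" table attribute):

    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RepointCoprocessor {
        public static void main(String[] args) throws Exception {
            String oldPrefix = "hdfs://ip-172-35-7-125.us-west-2.compute.internal:8020";
            String newPrefix = "hdfs://NEW-NAMENODE:8020"; // placeholder for the new cluster
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                for (HTableDescriptor desc : admin.listTables(Pattern.compile("KYLIN_.*"))) {
                    TableName name = desc.getTableName();
                    String cp = desc.getValue("coprocessor$1"); // assumed attribute key
                    if (cp == null || !cp.contains(oldPrefix)) {
                        continue;
                    }
                    // Disable, rewrite the jar path, push the new descriptor, re-enable.
                    admin.disableTable(name);
                    desc.setValue("coprocessor$1", cp.replace(oldPrefix, newPrefix));
                    admin.modifyTable(name, desc);
                    admin.enableTable(name);
                    System.out.println("Updated coprocessor path for " + name);
                }
            }
        }
    }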


However, when I try to query the existing cubes, I get a failure with a
NullPointerException at 
org.apache.kylin.stream.coordinator.assign.AssignmentsCache.getReplicaSetsByCube(AssignmentsCache.java:61).
 Just quickly looking at it, it seems like these cube assignments come from 
Zookeeper, and I'm missing them. Since I'm now running on a completely new EMR 
cluster (with new Zookeeper), I wonder if there is some persistent state in 
Zookeeper that should also be backed up and restored. 
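
(To check, I plan to look at the new ZooKeeper with something like the short,
untested sketch below; the path is only my guess at where the stream
coordinator keeps the assignment state:)

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class CheckStreamZnode {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // "NEW-ZK-HOST:2181" is a placeholder for the new cluster's ZooKeeper.
            ZooKeeper zk = new ZooKeeper("NEW-ZK-HOST:2181", 30000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();
            String path = "/kylin/kylin_metadata/stream"; // assumed coordinator root
            Stat stat = zk.exists(path, false);
            if (stat == null) {
                System.out.println(path + " does not exist on this ensemble");
            } else {
                System.out.println(path + " children: " + zk.getChildren(path, false));
            }
            zk.close();
        }
    }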


(This deployment used hdfs-working-dir on HDFS, so before terminating the old 
cluster I backed up the hdfs-working-dir and have restored it in the new 
cluster; but nothing from Zookeeper.)


Thanks in advance for any pointers about this.


On Thu, Jun 27, 2019 at 10:30 AM Andras Nagy <[email protected]> 
wrote:

Checked the table definition in HBase, and that's what explicitly references
the coprocessor location on the old cluster. I'll update that and let you know.



On Thu, Jun 27, 2019 at 10:26 AM Andras Nagy <[email protected]> 
wrote:

Actually, as I noticed, it's not the coprocessor that's failing, but HBase
itself when trying to load the coprocessor from HDFS (from a reference
somewhere that still points to the old HDFS namenode).



On Thu, Jun 27, 2019 at 10:19 AM Andras Nagy <[email protected]> 
wrote:

Hi ShaoFeng,


After disabling the "KYLIN_*" tables (but not 'kylin_metadata'), the
RegionServers could indeed start up and the coprocessor refresh succeeded.


But after re-enabling those tables, the issue returns, and the RegionServers
again fail by trying to connect to the old master node. What I noticed now from
the stacktrace is that the coprocessor is actually trying to connect to the old
HDFS namenode on port 8020 (and not to the HBase master).


Best regards,
Andras




On Thu, Jun 27, 2019 at 4:21 AM ShaoFeng Shi <[email protected]> wrote:

I see. Can you try this: disable all "KYLIN_*" tables in the HBase console,
and then see whether the region servers can start.


If they can start, then run the above command to refresh the coprocessor.
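
If it is easier than going table by table in the shell, the same disable step
can also be scripted with the HBase 1.x Java client; a small, untested sketch
(the pattern assumes the default Kylin HTable name prefix):

    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class DisableKylinTables {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                // Disable every table whose name matches the Kylin prefix, so the
                // region servers no longer try to open regions with the coprocessor.
                for (HTableDescriptor desc : admin.listTables(Pattern.compile("KYLIN_.*"))) {
                    TableName name = desc.getTableName();
                    if (admin.isTableEnabled(name)) {
                        admin.disableTable(name);
                        System.out.println("Disabled " + name);
                    }
                }
            }
        }
    }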


Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]









Andras Nagy <[email protected]> 于2019年6月26日周三 下午10:57写道:

Hi ShaoFeng,

Yes, but it fails as well. Actually it fails because the RegionServers are not 
running (as they fail when starting up).
Best regards,
Andras


On Wed, Jun 26, 2019 at 4:42 PM ShaoFeng Shi <[email protected]> wrote:

Hi Andras,


Did you try this? 
https://kylin.apache.org/docs/howto/howto_update_coprocessor.html


Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]









Andras Nagy <[email protected]> 于2019年6月26日周三 下午10:05写道:

Greetings,



I'm testing a setup where HBase is running on AWS EMR and HBase data is stored 
on S3. It's working fine so far, but when I terminate the EMR cluster and 
recreate it with the same S3 location for HBase, HBase won't start up properly. 
Before shutting down, I did execute the disable_all_tables.sh script to flush 
HBase state to S3.

Actually, the issue is that the RegionServers don't start up. Maybe I'm missing
something in the EMR setup rather than in the Kylin setup, but the exceptions I
get in the RegionServers' logs point at Kylin's CubeVisitService coprocessor,
which is
still trying to connect to the old HBase master on the old EMR cluster's master 
node and fails with: "coprocessor.CoprocessorHost: The coprocessor 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService 
threw java.net.NoRouteToHostException: No Route to Host from  
ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020 
failed on socket timeout exception: java.net.NoRouteToHostException: No route 
to host; "


(Here, ip-172-35-7-125 was the old cluster's master node.)

Does anyone have any idea what I'm doing wrong here?
The old HBase master node's address seems to be cached somewhere, and when
starting up HBase on the new cluster with the same S3 location for HFiles, this
old address is still used.
Is there anything specific I have missed to get this scenario to work properly?
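
(One thing I can still check is whether the old address is persisted in the
per-table coprocessor attribute; a rough, untested sketch with the HBase 1.x
client, assuming the usual "coprocessor$1" key:)

    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class PrintCoprocessorPaths {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                for (HTableDescriptor desc : admin.listTables(Pattern.compile("KYLIN_.*"))) {
                    // The value should contain the HDFS path of the coprocessor jar,
                    // the class name and the priority, separated by '|'.
                    String cp = desc.getValue("coprocessor$1");
                    System.out.println(desc.getTableName() + " -> "
                            + (cp != null ? cp : "(no coprocessor attribute)"));
                }
            }
        }
    }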

This is the full stacktrace:

2019-06-26 12:33:53,352 ERROR [RS_OPEN_REGION-ip-172-35-5-11:16020-1] 
coprocessor.CoprocessorHost: The coprocessor 
org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService 
threw java.net.NoRouteToHostException: No Route to Host from  
ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020 
failed on socket timeout exception: java.net.NoRouteToHostException: No route 
to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
java.net.NoRouteToHostException: No Route to Host from  
ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020 
failed on socket timeout exception: java.net.NoRouteToHostException: No route 
to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1493)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy36.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy37.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1466)
at 
org.apache.hadoop.hbase.util.CoprocessorClassLoader.getClassLoader(CoprocessorClassLoader.java:264)
at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:214)
at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:188)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:376)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:238)
at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:802)
at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:710)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:6716)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7020)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6992)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6948)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6899)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:364)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:131)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 43 more



Many thanks,
Andras
