Hello Andras,

From 3.0, Kylin starts to persist some real-time metadata to ZooKeeper; I think it didn't consider such a case (on AWS). We need to provide a guideline on how to back up/restore that part. Thank you for the feedback. Stay tuned.
Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]
Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]

Andras Nagy <[email protected]> wrote on Thu, Jun 27, 2019 at 9:24 PM:

> Hi Xiaoxiang,
>
> >In fact, we currently have no way to back up or restore the streaming
> >metadata related to replica sets/assignments etc.
> >I think this metadata is volatile; for example, the hostname of each
> >worker may be different in two clusters.
>
> Exactly, I agree it makes no sense to persist these. It would make more
> sense to rebuild them on the new cluster, based on the specifics of the
> new cluster.
>
> What I'm looking for is how to ensure that the runtime environments (both
> the Kylin processes and the EMR cluster that is hosting HBase, Spark and
> MapReduce) become stateless, so that if they fail or are destroyed, this
> does not affect the cube data, which is persistent (in this case, in S3),
> and the runtime infrastructure can fail over to new instances.
>
> By hosting the HBase data on S3 this seemed to be possible, as the data
> (cubes) built in a previous HBase environment (EMR) are now available in a
> new HBase (new EMR cluster).
> Still, even though I have the cube data, I can't query it from Kylin,
> because the query layer also relies on this volatile streaming metadata.
> Is this understanding correct?
> If it is, how far do you think Kylin is from being able to support this
> scenario?
>
> Many thanks,
> Andras
>
> On Thu, Jun 27, 2019 at 12:50 PM Xiaoxiang Yu <[email protected]> wrote:
>
>> Hi Andras,
>> In fact, we currently have no way to back up or restore the streaming
>> metadata related to replica sets/assignments etc.
>> I think this metadata is volatile; for example, the hostname of each
>> worker may be different in two clusters. But if you find backup/restore
>> is really useful for streaming metadata, please submit a JIRA.
>>
>> *-----------------*
>> *-----------------*
>> *Best wishes to you!*
>> *From: Xiaoxiang Yu*
>>
>> At 2019-06-27 17:54:08, "Andras Nagy" <[email protected]> wrote:
>>
>> OK, this worked, so I could proceed one step. I disabled all HBase
>> tables, manually altered them so the coprocessor locations point to the
>> new HDFS cluster, and re-enabled them. After this, there are no errors in
>> the RegionServer's logs, and Kylin starts up, so this seems fine.
>> (Interestingly, the DeployCoprocessorCLI did assemble the correct HDFS
>> URL, but could not alter the table definitions, so after running
>> DeployCoprocessorCLI, the table definitions had not changed. This is on
>> HBase version 1.4.9.)
>>
>> However, when I try to query the existing cubes, I get a failure with a
>> NullPointerException at
>> org.apache.kylin.stream.coordinator.assign.AssignmentsCache.getReplicaSetsByCube(AssignmentsCache.java:61).
>> Just quickly looking at it, it seems like these cube assignments come
>> from ZooKeeper, and I'm missing them. Since I'm now running on a
>> completely new EMR cluster (with new ZooKeeper), I wonder if there is
>> some persistent state in ZooKeeper that should also be backed up and
>> restored.
>>
>> (This deployment used hdfs-working-dir on HDFS, so before terminating
>> the old cluster I backed up the hdfs-working-dir and restored it in the
>> new cluster; but nothing from ZooKeeper.)
>>
>> Thanks in advance for any pointers about this.
>>
>> On Thu, Jun 27, 2019 at 10:30 AM Andras Nagy <[email protected]> wrote:
>>
>>> I checked the table definition in HBase, and that's what explicitly
>>> references the coprocessor location on the old cluster. I'll update
>>> that and let you know.
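The manual repair described above (disable each Kylin table, repoint its coprocessor attribute at the new cluster, re-enable it) can be sketched in the HBase shell as follows. This is a hypothetical sketch: the table name, namenode host, jar path and priority are placeholders; copy the actual class name and priority from the table's current `describe` output before altering anything.

```shell
# Placeholder table/host/path values -- run describe 'KYLIN_EXAMPLE1' first
# and reuse the class name and priority from its existing coprocessor entry.
hbase shell <<'EOF'
disable 'KYLIN_EXAMPLE1'
alter 'KYLIN_EXAMPLE1', METHOD => 'table_att', 'coprocessor' => 'hdfs://new-namenode:8020/kylin/coprocessor/kylin-coprocessor.jar|org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService|1001|'
enable 'KYLIN_EXAMPLE1'
EOF
```

The `'coprocessor'` value follows HBase's `path|class|priority|arguments` format; only the path segment needs to change when moving to a new cluster.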
>>>
>>> On Thu, Jun 27, 2019 at 10:26 AM Andras Nagy <[email protected]> wrote:
>>>
>>>> Actually, as I noticed, it's not the coprocessor that's failing, but
>>>> HBase when trying to load the coprocessor itself from HDFS (from a
>>>> reference somewhere that still points to the old HDFS namenode).
>>>>
>>>> On Thu, Jun 27, 2019 at 10:19 AM Andras Nagy <[email protected]> wrote:
>>>>
>>>>> Hi ShaoFeng,
>>>>>
>>>>> After disabling the "KYLIN_*" tables (but not 'kylin_metadata'), the
>>>>> RegionServers could indeed start up and the coprocessor refresh
>>>>> succeeded.
>>>>>
>>>>> But after re-enabling those tables, the issue continues, and again
>>>>> the RegionServers fail by trying to connect to the old master node.
>>>>> What I noticed now from the stack trace is that the coprocessor is
>>>>> actually trying to connect to the old HDFS namenode on port 8020 (and
>>>>> not to the HBase master).
>>>>>
>>>>> Best regards,
>>>>> Andras
>>>>>
>>>>> On Thu, Jun 27, 2019 at 4:21 AM ShaoFeng Shi <[email protected]> wrote:
>>>>>
>>>>>> I see; can you try this: disable all "KYLIN_*" tables in the HBase
>>>>>> console, and then see whether the region servers can start.
>>>>>>
>>>>>> If they can start, then run the above command to refresh the
>>>>>> coprocessor.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Shaofeng Shi 史少锋
>>>>>> Apache Kylin PMC
>>>>>> Email: [email protected]
>>>>>>
>>>>>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>> Join Kylin user mail group: [email protected]
>>>>>> Join Kylin dev mail group: [email protected]
>>>>>>
>>>>>> Andras Nagy <[email protected]> wrote on Wed, Jun 26, 2019 at 10:57 PM:
>>>>>>
>>>>>>> Hi ShaoFeng,
>>>>>>> Yes, but it fails as well. Actually it fails because the
>>>>>>> RegionServers are not running (as they fail when starting up).
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Andras
>>>>>>>
>>>>>>> On Wed, Jun 26, 2019 at 4:42 PM ShaoFeng Shi <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Andras,
>>>>>>>>
>>>>>>>> Did you try this?
>>>>>>>> https://kylin.apache.org/docs/howto/howto_update_coprocessor.html
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>> Apache Kylin PMC
>>>>>>>> Email: [email protected]
>>>>>>>>
>>>>>>>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>>>> Join Kylin user mail group: [email protected]
>>>>>>>> Join Kylin dev mail group: [email protected]
>>>>>>>>
>>>>>>>> Andras Nagy <[email protected]> wrote on Wed, Jun 26, 2019 at 10:05 PM:
>>>>>>>>
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I'm testing a setup where HBase is running on AWS EMR and the
>>>>>>>>> HBase data is stored on S3. It's working fine so far, but when I
>>>>>>>>> terminate the EMR cluster and recreate it with the same S3
>>>>>>>>> location for HBase, HBase won't start up properly. Before
>>>>>>>>> shutting down, I did execute the disable_all_tables.sh script to
>>>>>>>>> flush HBase state to S3.
>>>>>>>>>
>>>>>>>>> Actually, the issue is that the RegionServers don't start up.
>>>>>>>>> Maybe I'm missing something in the EMR setup and not in the Kylin
>>>>>>>>> setup, but the exceptions I get in the RegionServer's log point
>>>>>>>>> at Kylin's CubeVisitService coprocessor, which is still trying to
>>>>>>>>> connect to the old HBase master on the old EMR cluster's master
>>>>>>>>> node and fails with:
>>>>>>>>> "coprocessor.CoprocessorHost: The coprocessor
>>>>>>>>> org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService
>>>>>>>>> threw java.net.NoRouteToHostException: No Route to Host from
>>>>>>>>> ip-172-35-5-11/172.35.5.11 to
>>>>>>>>> ip-172-35-7-125.us-west-2.compute.internal:8020 failed on socket
>>>>>>>>> timeout exception: java.net.NoRouteToHostException: No route to
>>>>>>>>> host;"
>>>>>>>>>
>>>>>>>>> (Here, ip-172-35-7-125 was the old cluster's master node.)
>>>>>>>>>
>>>>>>>>> Does anyone have any idea what I'm doing wrong here?
>>>>>>>>> The HBase master node's address seems to be cached somewhere, and
>>>>>>>>> when starting up HBase on the new cluster with the same S3
>>>>>>>>> location for HFiles, this old address is still used.
>>>>>>>>> Is there anything specific I have missed to get this scenario to
>>>>>>>>> work properly?
>>>>>>>>>
>>>>>>>>> This is the full stack trace:
>>>>>>>>>
>>>>>>>>> 2019-06-26 12:33:53,352 ERROR [RS_OPEN_REGION-ip-172-35-5-11:16020-1] coprocessor.CoprocessorHost: The coprocessor org.apache.kylin.storage.hbase.cube.v2.coprocessor.endpoint.CubeVisitService threw java.net.NoRouteToHostException: No Route to Host from ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
>>>>>>>>> java.net.NoRouteToHostException: No Route to Host from ip-172-35-5-11/172.35.5.11 to ip-172-35-7-125.us-west-2.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>>>>> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>>>>>>>>> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
>>>>>>>>> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1493)
>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1435)
>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1345)
>>>>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>>>>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>>>>>>>>> at com.sun.proxy.$Proxy36.getFileInfo(Unknown Source)
>>>>>>>>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
>>>>>>>>> at com.sun.proxy.$Proxy37.getFileInfo(Unknown Source)
>>>>>>>>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
>>>>>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
>>>>>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
>>>>>>>>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>>>>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1452)
>>>>>>>>> at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1466)
>>>>>>>>> at org.apache.hadoop.hbase.util.CoprocessorClassLoader.getClassLoader(CoprocessorClassLoader.java:264)
>>>>>>>>> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:214)
>>>>>>>>> at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:188)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:376)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:238)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:802)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:710)
>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:6716)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7020)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6992)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6948)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6899)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:364)
>>>>>>>>> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:131)
>>>>>>>>> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>>>> Caused by: java.net.NoRouteToHostException: No route to host
>>>>>>>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>>>>>>>> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>>>>>>>> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>>>>>>>>> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
>>>>>>>>> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
>>>>>>>>> at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
>>>>>>>>> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1381)
>>>>>>>>> ... 43 more
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>> Andras
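As the thread concludes, the streaming replica-set/assignment state lives in ZooKeeper and is lost when the old cluster is torn down. Until an official backup guideline exists, the state can at least be inspected (and manually dumped) before terminating the old EMR cluster. A minimal sketch, assuming ZooKeeper's stock zkCli.sh client and Kylin's default base znode of /kylin; the host and exact child znodes are deployment-specific placeholders:

```shell
# Host and znode paths are assumptions -- check kylin.env.zookeeper-base-path
# in kylin.properties for the actual root used by your deployment.
zkCli.sh -server old-emr-master:2181 <<'EOF'
ls /kylin
get /kylin
EOF
```

Note that, as Xiaoxiang points out, this metadata references hostnames from the old cluster, so even a raw restore into the new ZooKeeper would still require the assignments to be rebuilt for the new workers.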
