Thanks ShaoFeng Shi,

As recommended in the installation tutorial, I use HDFS for the intermediate data storage, so before shutting down the cluster I back up hdfs:///user/kylin to S3 with s3-dist-cp.
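By the way, picking up Roberto's earlier point about restoring the Kylin metadata backup: I am thinking of also taking a dump with Kylin's own metastore tool before stopping the instance, as an extra safety net. This is only a rough sketch of what I have in mind (the S3 prefix for the dump is a placeholder I made up):

# dump the Kylin metadata out of HBase to a local directory
$KYLIN_HOME/bin/metastore.sh backup
# the dump lands under $KYLIN_HOME/meta_backups/; copy it to S3 so it survives the cluster termination
aws s3 cp $KYLIN_HOME/meta_backups/ s3://xxxx-config/metadata/kylin-meta-dump/ --recursive
# if needed, it could later be loaded back with: $KYLIN_HOME/bin/metastore.sh restore <path-to-dump>

Does that sound reasonable, or is the s3-dist-cp copy of /kylin enough on its own?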
I have 2 buckets, and I don't make any modifications to the S3 HBase root directly. These are my buckets:

- Configuration bucket: s3://xxxx-config/metadata/kylin, where I store the contents of hdfs:///user/kylin
- HBase rootdir: s3://xxxx-hbase/storage

When I shut down the cluster I execute these commands in a shutdown script:

#!/bin/bash
# stop kylin
$KYLIN_HOME/bin/kylin.sh stop
# To shut down an Amazon EMR cluster without losing data that hasn't been written to Amazon S3,
# the MemStore cache needs to flush to Amazon S3 to write new store files.
# To do this, you can run a shell script provided on the EMR cluster.
bash /usr/lib/hbase/bin/disable_all_tables.sh
# Before you shutdown/restart the cluster, you must back up the "/kylin" data on HDFS to S3 with S3DistCp,
# or you may lose data and be unable to recover the cluster later.
s3-dist-cp --src=hdfs:///kylin --dest=s3://da-config/metadata/kylin

s3-dist-cp creates a Hadoop job, so it should be tracked by the EMRFS consistent view. So, should I add these commands to my shutdown script?

emrfs delete s3://xxxx-config/metadata/kylin
emrfs import s3://xxxx-config/metadata/kylin
emrfs sync s3://xxxx-config/metadata/kylin
emrfs delete s3://xxxx-hbase/storage
emrfs import s3://xxxx-hbase/storage
emrfs sync s3://xxxx-hbase/storage

Should I do anything with the HBase root directory in S3?

When I start a brand-new cluster, apart from doing:

hadoop fs -mkdir /kylin
s3-dist-cp --src=s3://xxxx-config/metadata/kylin --dest=hdfs:///kylin

do I have to take any other action? (To make this concrete, I have pasted the consolidated shutdown/startup scripts I have in mind at the very bottom of this mail, below the quoted thread.)

Thank you very much for your help.

A final question: is it worth using S3 as the HBase storage for a production environment, or would it be safer to use just HDFS? My plan is to use Hive + Kylin as an EDW.

Moisés Català
Senior Data Engineer
La Cupula Music - Sonosuite
T: +34 93 250 38 05
www.lacupulamusic.com

> On 1 Aug 2018, at 03:10, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> Hi,
>
> Sometimes the EMRFS becomes inconsistent with S3; EMRFS uses a DynamoDB table to cache the object entries and their status. If you or your applications update S3 directly (not via EMRFS), then the entries in EMRFS become inconsistent.
>
> You can refer to this post:
> https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working
>
> In my experience, I did this one or two times:
>
> emrfs delete s3://path
> emrfs import s3://path
> emrfs sync s3://path
>
> The key point is: when using EMRFS, all updates to the bucket should go through EMRFS, not S3 directly. Hope this can help.
>
> 2018-07-30 23:26 GMT+08:00 Moisés Català <moises.cat...@lacupulamusic.com>:
> Thanks for the tips Roberto,
>
> You're right: when I deploy EMR and install Kylin, everything works like a charm; I can even build the sample cube.
>
> I have added the config you suggested about using EMRFS in emrfs-site and I have launched a brand-new cluster.
> I also deployed Kylin and built the cube. Finally, I shut down Kylin and disabled all HBase tables.
>
> Unfortunately, when I launch a new cluster, the HBase master node can't boot; looking at the log, this appears:
>
> 2018-07-30 15:00:31,103 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:02:49,220 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,324 ERROR [ip-172-31-85-0:16000.activeMasterManager] consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,325 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Failed to become active master
> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: 1 items inconsistent (no s3 object for associated metadata item). First object: /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>         at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:749)
>         at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:519)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy30.listStatus(Unknown Source)
>         at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.listStatus(S3NativeFileSystem2.java:206)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1532)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1558)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1603)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1597)
>         at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:347)
>         at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1737)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:377)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:358)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:339)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:59)
>         at org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:526)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
>         at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:725)
>         at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:198)
>         at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1907)
>         at java.lang.Thread.run(Thread.java:748)
> 2018-07-30 15:09:01,326 FATAL [ip-172-31-85-0:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
>
> I have attached the full log to the email.
>
> What am I missing?
>
> Thanks in advance
>
>> On 30 Jul 2018, at 09:02, <roberto.tar...@stratebi.com> wrote:
>>
>> Hi Moisés,
>>
>> If I have understood correctly, you have been able to deploy Kylin on EMR successfully; however, you lose metadata when you terminate the cluster. Is that right?
>>
>> Have you tried restoring the Kylin metadata backup after cluster re-creation? Moreover, do you enable all the HBase tables after cluster re-creation?
>>
>> We successfully deployed Kylin on EMR using S3 as the storage for HBase and Hive, but our configuration differs on 2 points:
>>
>> · We use EMRFS (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html):
>>
>>   {
>>     "Classification": "emrfs-site",
>>     "Properties": {
>>       "fs.s3.consistent.retryPeriodSeconds": "10",
>>       "fs.s3.consistent": "true",
>>       "fs.s3.consistent.retryCount": "5",
>>       "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
>>     },
>>     "Configurations": []
>>   }
>>
>> · We deployed Kylin on an EC2 machine separate from the cluster.
>>
>> I hope this helps you.
>>
>> Roberto Tardío
>>
>> From: Moisés Català [mailto:moises.cat...@lacupulamusic.com]
>> Sent: Saturday, 28 July 2018 16:17
>> To: user@kylin.apache.org
>> Subject: Kylin with S3, cubes tables get in transition when new cluster booted
>>
>> Hi all,
>>
>> I've carefully followed the instructions provided in http://kylin.apache.org/docs23/install/kylin_aws_emr.html
>>
>> My idea is to use S3 as the storage for HBase. I have configured the cluster following the instructions, but the tables that contain the cube definitions stay "in transition" when a new cluster is deployed, and the Kylin metadata seems outdated...
>>
>> These are the steps I follow to create the cluster.
>>
>> Cluster creation command:
>>
>> aws emr create-cluster \
>>   --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia Name=Hive Name=Hbase Name=HCatalog Name=Tez \
>>   --tags 'hive=' 'spark=' 'zeppelin=' \
>>   --ec2-attributes 'file://../config/ec2-attributes.json' \
>>   --release-label emr-5.16.0 \
>>   --log-uri 's3n://sns-da-logs/' \
>>   --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
>>   --configurations 'file://../config/hive-hbase-s3.json' \
>>   --auto-scaling-role EMR_AutoScaling_DefaultRole \
>>   --ebs-root-volume-size 10 \
>>   --service-role EMR_DefaultRole \
>>   --enable-debugging \
>>   --name 'hbase-hive-datawarehouse' \
>>   --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>>   --region us-east-1
>>
>> My configuration hive-hbase-s3.json:
>>
>> [
>>   {
>>     "Classification": "hive-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
>>       "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
>>       "javax.jdo.option.ConnectionPassword": "xxxxx",
>>       "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
>>       "javax.jdo.option.ConnectionUserName": "xxxx"
>>     }
>>   },
>>   {
>>     "Classification": "hbase",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.emr.storageMode": "s3"
>>     }
>>   },
>>   {
>>     "Classification": "hbase-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.rpc.timeout": "3600000",
>>       "hbase.rootdir": "s3://xxxxxx-hbase/"
>>     }
>>   },
>>   {
>>     "Classification": "core-site",
>>     "Properties": {
>>       "io.file.buffer.size": "65536"
>>     }
>>   },
>>   {
>>     "Classification": "mapred-site",
>>     "Properties": {
>>       "mapred.map.tasks.speculative.execution": "false",
>>       "mapred.reduce.tasks.speculative.execution": "false",
>>       "mapreduce.map.speculative": "false",
>>       "mapreduce.reduce.speculative": "false"
>>     }
>>   }
>> ]
>>
>> When I shut down the cluster I perform these commands:
>>
>> ../kylin_home/bin/kylin.sh stop
>>
>> # Before you shutdown/restart the cluster, you must back up the "/kylin" data on HDFS to S3 with S3DistCp
>> aws s3 rm s3://xxxxxx-config/metadata/kylin/*
>> s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
>>
>> bash /usr/lib/hbase/bin/disable_all_tables.sh
>>
>> Please, could you be so kind as to tell me what I am missing?
>>
>> Thanks in advance
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
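P.S. Here are the consolidated scripts I mentioned at the top, just to make the question concrete. The bucket names are the same masked placeholders used above, and the emrfs block is exactly the part I am unsure about.

Shutdown script:

#!/bin/bash
# stop Kylin so nothing writes to HBase/HDFS while the tables are flushed
$KYLIN_HOME/bin/kylin.sh stop
# flush the MemStore and disable all HBase tables so everything is persisted to S3
bash /usr/lib/hbase/bin/disable_all_tables.sh
# back up the Kylin working directory from HDFS to S3
s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxx-config/metadata/kylin
# the lines I am asking about: re-sync the EMRFS metadata in DynamoDB with the real bucket contents
emrfs delete s3://xxxx-config/metadata/kylin
emrfs import s3://xxxx-config/metadata/kylin
emrfs sync s3://xxxx-config/metadata/kylin
emrfs delete s3://xxxx-hbase/storage
emrfs import s3://xxxx-hbase/storage
emrfs sync s3://xxxx-hbase/storage

Startup script for a brand-new cluster:

#!/bin/bash
# recreate the Kylin working directory on HDFS and restore it from the S3 backup
hadoop fs -mkdir /kylin
s3-dist-cp --src=s3://xxxx-config/metadata/kylin --dest=hdfs:///kylin
# re-enable the HBase tables that disable_all_tables.sh disabled at shutdown
# (I would do this from the hbase shell with enable_all), then start Kylin
$KYLIN_HOME/bin/kylin.sh start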