Thanks ShaoFeng Shi,

As recommended in the installation tutorial, I use HDFS for intermediate data 
storage, so before shutting down the cluster I back up hdfs://user/kylin to S3 
with s3-dist-cp.

I have two buckets, and I don’t make any modifications to the S3 HBase root 
directly. These are my buckets:

- Configuration bucket: s3://xxxx-config/metadata/kylin, where I store the 
  contents of hdfs:///user/kylin
- HBase rootdir: s3://xxxx-hbase/storage

When I shut down the cluster, I run these commands in a shutdown script:

#!/bin/bash
# Stop Kylin
$KYLIN_HOME/bin/kylin.sh stop

# To shut down an Amazon EMR cluster without losing data that hasn't been
# written to Amazon S3, the MemStore cache needs to flush to Amazon S3 to
# write new store files. To do this, you can run a shell script provided on
# the EMR cluster.
bash /usr/lib/hbase/bin/disable_all_tables.sh

# Before you shut down/restart the cluster, you must back up the "/kylin" data
# on HDFS to S3 with S3DistCp, or you may lose data and be unable to recover
# the cluster later.
s3-dist-cp --src=hdfs:///kylin --dest=s3://da-config/metadata/kylin


s3-dist-cp runs as a Hadoop job, so its writes should be tracked by the EMRFS 
consistent view.
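
On top of that, I am considering also dumping the Kylin metastore before 
stopping, as an extra safety net. A rough sketch of what I mean (the S3 prefix 
is just an example I made up, not one of my real paths):

# Dump Kylin metadata to the local meta_backups directory
$KYLIN_HOME/bin/metastore.sh backup

# Copy the dump to S3 as well (bucket/prefix are illustrative)
aws s3 cp --recursive $KYLIN_HOME/meta_backups/ s3://xxxx-config/metadata/kylin-meta-backups/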

So, should I add the following emrfs commands to my shutdown script?

emrfs delete s3://xxxx-config/metadata/kylin
emrfs import s3://xxxx-config/metadata/kylin
emrfs sync s3://xxxx-config/metadata/kylin

emrfs delete s3://xxxx-hbase/storage
emrfs import s3://xxxx-hbase/storage
emrfs sync s3://xxxx-hbase/storage

Should I also do something to the HBase root directory in S3?
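
At the very least, I suppose I could check both buckets for inconsistencies 
before terminating the cluster; something along these lines (same placeholder 
bucket names as above):

# Report any mismatch between the EMRFS metadata and what is actually in S3
emrfs diff s3://xxxx-config/metadata/kylin
emrfs diff s3://xxxx-hbase/storage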

When I start a brand-new cluster, apart from doing:

hadoop fs -mkdir /kylin 
s3-dist-cp --src=s3://xxxx-config/metadata/kylin  --dest=hdfs:///kylin 

do I have to take any other action?
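
For reference, the full startup sequence I have in mind looks roughly like 
this (the enable_all step and the piped confirmation are my own assumption, 
not something I found in the tutorial):

hadoop fs -mkdir /kylin
s3-dist-cp --src=s3://xxxx-config/metadata/kylin --dest=hdfs:///kylin

# Re-enable the HBase tables that disable_all_tables.sh disabled at shutdown;
# enable_all asks for a y/n confirmation, hence the piped "y"
echo -e "enable_all '.*'\ny" | hbase shell

# Start Kylin again
$KYLIN_HOME/bin/kylin.sh start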

Thank you very much for your help, 

A final question: is it worth using S3 as HBase storage for a production 
environment, or would it be safer to use just HDFS? My plan is to use Hive + 
Kylin as an EDW.
 



Moisés Català
Senior Data Engineer
La Cupula Music - Sonosuite
T: +34 93 250 38 05
www.lacupulamusic.com


> On 1 Aug 2018, at 3:10, ShaoFeng Shi <shaofeng...@apache.org> wrote:
> 
> Hi,
> 
> Sometimes EMRFS becomes inconsistent with S3; EMRFS uses a DynamoDB table to 
> cache the object entries and their status. If you or your applications update 
> S3 directly (not via EMRFS), then the entries in EMRFS become inconsistent.
> 
> You can refer to this post: 
> https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working
> 
> In my experience, I did this one or two times:
> 
> emrfs delete s3://path
> emrfs import s3://path
> emrfs sync s3://path
> 
> The key point is that, when using EMRFS, all updates to the bucket should go 
> through EMRFS, not directly to S3. Hope this helps.
> 
> 2018-07-30 23:26 GMT+08:00 Moisés Català <moises.cat...@lacupulamusic.com>:
> Thanks for the tips Roberto,
> 
> You’re right: when I deploy EMR and install Kylin, everything works like a 
> charm, and I can even build the sample cube.
> 
> I have added the emrfs-site configuration you suggested for EMRFS and 
> launched a brand-new cluster.
> I also deployed Kylin and built the cube. Finally, I shut down Kylin and 
> disabled all HBase tables.
> 
> Unfortunately, when I launch a new cluster, the HBase master node can’t boot; 
> this appears in the log:
> 
> 2018-07-30 15:00:31,103 ERROR [ip-172-31-85-0:16000.activeMasterManager] 
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item 
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:02:49,220 ERROR [ip-172-31-85-0:16000.activeMasterManager] 
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item 
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,324 ERROR [ip-172-31-85-0:16000.activeMasterManager] 
> consistency.ConsistencyCheckerS3FileSystem: No s3 object for metadata item 
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
> 2018-07-30 15:09:01,325 FATAL [ip-172-31-85-0:16000.activeMasterManager] 
> master.HMaster: Failed to become active master
> com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: 1 
> items inconsistent (no s3 object for associated metadata item). First object: 
> /da-hbase/storage/data/hbase/meta/.tabledesc/.tableinfo.0000000001
>       at 
> com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:749)
>       at 
> com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.listStatus(ConsistencyCheckerS3FileSystem.java:519)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy30.listStatus(Unknown Source)
>       at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.listStatus(S3NativeFileSystem2.java:206)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1532)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1558)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1603)
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1597)
>       at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.listStatus(EmrFileSystem.java:347)
>       at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1737)
>       at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:377)
>       at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:358)
>       at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:339)
>       at 
> org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.needsMigration(FSTableDescriptorMigrationToSubdir.java:59)
>       at 
> org.apache.hadoop.hbase.util.FSTableDescriptorMigrationToSubdir.migrateFSTableDescriptorsIfNecessary(FSTableDescriptorMigrationToSubdir.java:45)
>       at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:526)
>       at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:166)
>       at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:141)
>       at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:725)
>       at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:198)
>       at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1907)
>       at java.lang.Thread.run(Thread.java:748)
> 2018-07-30 15:09:01,326 FATAL [ip-172-31-85-0:16000.activeMasterManager] 
> master.HMaster: Unhandled exception. Starting shutdown.
> 
> I have attached the full log to the email.
> 
> What am I missing???
> 
> Thanks in advance
> 
> 
> 
> 
> 
> 
> 
>> On 30 Jul 2018, at 9:02, <roberto.tar...@stratebi.com> wrote:
>> 
>> Hi Moisés,
>>  
>> If I have understood correctly, you have been able to deploy Kylin on EMR 
>> successfully, but you lose metadata when you terminate the cluster. Is that 
>> right?
>>  
>> Have you tried restoring a Kylin metadata backup after re-creating the 
>> cluster? Also, do you re-enable all HBase tables after re-creating it?
>>  
>> We successfully deployed Kylin on EMR using S3 as storage for HBase and 
>> Hive, but our configuration differs in two points:
>> 
>> - We use EMRFS 
>>   (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html) with 
>>   this emrfs-site classification:
>> 
>>   {
>>     "Classification": "emrfs-site",
>>     "Properties": {
>>       "fs.s3.consistent.retryPeriodSeconds": "10",
>>       "fs.s3.consistent": "true",
>>       "fs.s3.consistent.retryCount": "5",
>>       "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
>>     },
>>     "Configurations": []
>>   }
>> 
>> - We deployed Kylin on an EC2 machine separate from the cluster.
>>  
>> I hope this helps you.
>>  
>> Roberto Tardío
>>  
>> From: Moisés Català [mailto:moises.cat...@lacupulamusic.com]
>> Sent: Saturday, 28 July 2018 16:17
>> To: user@kylin.apache.org
>> Subject: Kylin with S3, cubes tables get in transition when new cluster 
>> booted
>>  
>> Hi all,
>>  
>> I’ve carefully followed the instructions provided in 
>> http://kylin.apache.org/docs23/install/kylin_aws_emr.html
>>  
>> My idea is to use S3 as the storage for HBase. I have configured the cluster 
>> following the instructions, but the tables that contain the cube definitions 
>> stay "in transition" when deploying a new cluster, and the Kylin metadata 
>> seems outdated...
>>  
>> These are the steps I follow to create the cluster
>>  
>> Cluster creation command:
>>  
>> aws emr create-cluster \
>> --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia 
>> Name=Hive Name=Hbase Name=HCatalog Name=Tez \
>> --tags 'hive=' 'spark=' 'zeppelin=' \
>> --ec2-attributes 'file://../config/ec2-attributes.json' \
>> --release-label emr-5.16.0 \
>> --log-uri 's3n://sns-da-logs/' \
>> --instance-groups 'file://../config/instance-hive-datawarehouse.json' \
>> --configurations 'file://../config/hive-hbase-s3.json' \
>> --auto-scaling-role EMR_AutoScaling_DefaultRole \
>> --ebs-root-volume-size 10 \
>> --service-role EMR_DefaultRole \
>> --enable-debugging \
>> --name 'hbase-hive-datawarehouse' \
>> --scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
>> --region us-east-1
>>  
>>  
>> My configuration hive-hbase-s3.json:
>>  
>> [
>>   {
>>     "Classification": "hive-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
>>       "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
>>       "javax.jdo.option.ConnectionPassword": "xxxxx",
>>       "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
>>       "javax.jdo.option.ConnectionUserName": "xxxx"
>>     }
>>   },
>>   {
>>     "Classification": "hbase",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.emr.storageMode": "s3"
>>     }
>>   },
>>   {
>>     "Classification": "hbase-site",
>>     "Configurations": [],
>>     "Properties": {
>>       "hbase.rpc.timeout": "3600000",
>>       "hbase.rootdir": "s3://xxxxxx-hbase/"
>>     }
>>   },
>>   {
>>       "Classification": "core-site",
>>       "Properties": {
>>         "io.file.buffer.size": "65536"
>>       }
>>   },
>>   {
>>       "Classification": "mapred-site",
>>       "Properties": {
>>         "mapred.map.tasks.speculative.execution": "false",
>>         "mapred.reduce.tasks.speculative.execution": "false",
>>         "mapreduce.map.speculative": "false",
>>         "mapreduce.reduce.speculative": "false"
>>  
>>       }
>>   } 
>> ]
>>  
>> When I shut down the cluster I perform these commands:
>>  
>> ../kylin_home/bin/kylin.sh stop
>>  
>>  
>> #Before you shutdown/restart the cluster, you must backup the “/kylin” data 
>> on HDFS to S3 with S3DistCp,
>>   
>> aws s3 rm s3://xxxxxx-config/metadata/kylin/*
>> s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin
>>  
>> bash /usr/lib/hbase/bin/disable_all_tables.sh
>>  
>>  
>> Please, could you be so kind as to point out what I am missing?
>>  
>>  
>> Thanks in advance
> 
> 
> 
> 
> 
> -- 
> Best regards,
> 
> Shaofeng Shi 史少锋
> 
