Hi all,

I have carefully followed the instructions provided in
http://kylin.apache.org/docs23/install/kylin_aws_emr.html

My idea is to use S3 as the storage for HBase. I have configured the cluster
following the instructions, but when I deploy a new cluster, the tables that
contain the cube definitions stay "in transition" and the Kylin metadata seems
outdated.

These are the steps I follow to create the cluster.

Cluster creation command:

aws emr create-cluster \
--applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia \
Name=Hive Name=Hbase Name=HCatalog Name=Tez \
--tags 'hive=' 'spark=' 'zeppelin=' \
--ec2-attributes 'file://../config/ec2-attributes.json' \
--release-label emr-5.16.0 \
--log-uri 's3n://sns-da-logs/' \
--instance-groups 'file://../config/instance-hive-datawarehouse.json' \
--configurations 'file://../config/hive-hbase-s3.json' \
--auto-scaling-role EMR_AutoScaling_DefaultRole \
--ebs-root-volume-size 10 \
--service-role EMR_DefaultRole \
--enable-debugging \
--name 'hbase-hive-datawarehouse' \
--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \
--region us-east-1
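One thing a pre-flight wrapper can rule out before launching: the AWS CLI resolves a relative file://../config/... path against the current working directory, so running the command from another directory fails to load the JSON files. This is only a sketch under that assumption, not part of the Kylin instructions; create_cluster, preflight_files and the DRY_RUN switch are hypothetical names, and the flags are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the create-cluster command from the post.
# The AWS CLI resolves file://<relative path> against the current working
# directory, so each JSON file is checked for readability first.

# Return non-zero (and report on stderr) if any given file is unreadable.
preflight_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: create_cluster CONFIG_DIR   (e.g. ../config)
# With DRY_RUN=1 the command is printed instead of executed.
create_cluster() {
  local cfg_dir="$1"
  preflight_files \
    "$cfg_dir/ec2-attributes.json" \
    "$cfg_dir/instance-hive-datawarehouse.json" \
    "$cfg_dir/hive-hbase-s3.json" || return 1

  local cmd=(aws emr create-cluster
    --applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia
      Name=Hive Name=Hbase Name=HCatalog Name=Tez
    --tags 'hive=' 'spark=' 'zeppelin='
    --ec2-attributes "file://$cfg_dir/ec2-attributes.json"
    --release-label emr-5.16.0
    --log-uri 's3n://sns-da-logs/'
    --instance-groups "file://$cfg_dir/instance-hive-datawarehouse.json"
    --configurations "file://$cfg_dir/hive-hbase-s3.json"
    --auto-scaling-role EMR_AutoScaling_DefaultRole
    --ebs-root-volume-size 10
    --service-role EMR_DefaultRole
    --enable-debugging
    --name 'hbase-hive-datawarehouse'
    --scale-down-behavior TERMINATE_AT_TASK_COMPLETION
    --region us-east-1)

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the assembled command on one line without running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 create_cluster ../config, run from the usual directory, prints the exact command without launching anything.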


My configuration file, hive-hbase-s3.json:

[
  {
    "Classification": "hive-site",
    "Configurations": [],
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://xxxxxxxx-datawarehouse/hive.db",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionPassword": "xxxxx",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://xxxxxx:3306/hive_metastore?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "xxxx"
    }
  },
  {
    "Classification": "hbase",
    "Configurations": [],
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  },
  {
    "Classification": "hbase-site",
    "Configurations": [],
    "Properties": {
      "hbase.rpc.timeout": "3600000",
      "hbase.rootdir": "s3://xxxxxx-hbase/"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "io.file.buffer.size": "65536"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.map.tasks.speculative.execution": "false",
      "mapred.reduce.tasks.speculative.execution": "false",
      "mapreduce.map.speculative": "false",
      "mapreduce.reduce.speculative": "false"
    }
  }
]
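The EMR documentation for HBase on S3 notes that a new cluster reuses the existing HBase data only when it is launched with the same hbase.rootdir as the previous one, and a configurations file that does not parse never gets that far. A rough check, assuming each key and value sit on one line as above: check_hbase_s3_config is a hypothetical helper (a line-based text check, not a JSON parser) that rejects curly quotes, which mail clients and some editors substitute for straight quotes and which are not valid JSON, confirms hbase.emr.storageMode is s3, and prints hbase.rootdir so the value can be compared between the old and the new cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a line-based text check, not a JSON parser.
# Prints the hbase.rootdir value on success; returns 1 on any problem.
check_hbase_s3_config() {
  local file="$1" rootdir

  # Curly quotes are not valid JSON string delimiters; show offending lines.
  if grep -n -e '“' -e '”' "$file" >&2; then
    echo "error: curly quotes found in $file" >&2
    return 1
  fi

  # HBase on EMR must be switched to S3 storage mode.
  if ! grep -q '"hbase.emr.storageMode"[[:space:]]*:[[:space:]]*"s3"' "$file"; then
    echo "error: hbase.emr.storageMode is not s3 in $file" >&2
    return 1
  fi

  # Extract the hbase.rootdir value (assumes key and value share one line).
  rootdir="$(sed -n 's/.*"hbase\.rootdir"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file")"
  if [ -z "$rootdir" ]; then
    echo "error: hbase.rootdir not set in $file" >&2
    return 1
  fi

  printf '%s\n' "$rootdir"
}
```

Running check_hbase_s3_config ../config/hive-hbase-s3.json before each launch would show whether the new cluster points at the same S3 root as the old one.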

When I shut down the cluster, I run these commands:

../kylin_home/bin/kylin.sh stop

# Before you shut down/restart the cluster, you must back up the "/kylin" data
# on HDFS to S3 with S3DistCp:

aws s3 rm s3://xxxxxx-config/metadata/kylin/*
s3-dist-cp --src=hdfs:///kylin --dest=s3://xxxxxx-config/metadata/kylin

bash /usr/lib/hbase/bin/disable_all_tables.sh
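The shutdown steps could be wrapped in one script that stops at the first failing step, so that a failed copy is not silently followed by disabling the tables. This is a sketch, not the documented Kylin procedure: kylin_emr_shutdown and run are hypothetical helpers, the step order mirrors the post, and DRY_RUN=1 only prints each command. One deliberate change: the AWS CLI does not support wildcards in S3 path arguments, so a path ending in /* names a single key rather than everything under the prefix; the sketch clears the old backup with --recursive on the prefix instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the step order follows the post.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: kylin_emr_shutdown KYLIN_HOME S3_BACKUP_PREFIX
kylin_emr_shutdown() {
  local kylin_home="$1" backup="$2"
  # 1. Stop Kylin so it stops writing to its working directory.
  run "$kylin_home/bin/kylin.sh" stop || return 1
  # 2. Clear the previous backup: "aws s3 rm" does not expand "*",
  #    so delete everything under the prefix with --recursive.
  run aws s3 rm "$backup/" --recursive || return 1
  # 3. Copy the Kylin working directory from HDFS to S3.
  run s3-dist-cp --src=hdfs:///kylin --dest="$backup" || return 1
  # 4. EMR script that flushes MemStores to S3 and disables all tables.
  run bash /usr/lib/hbase/bin/disable_all_tables.sh
}
```

For example, DRY_RUN=1 kylin_emr_shutdown ../kylin_home s3://xxxxxx-config/metadata/kylin prints the four commands in order without running them.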


Could you please tell me what I am missing?


Thanks in advance
