Hi Roberto,

No need to set *kylin.storage.hbase.cluster-fs* to the same bucket again.
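
As a rough illustration only, a kylin.properties fragment for this setup (EMR 5.7, working dir on S3) might look like the sketch below. The bucket name is the placeholder used in this thread, and the commented-out lines are assumptions about what can stay unset, so please verify them against your own deployment:

```properties
# kylin.properties (sketch; s3://your-bucket is a placeholder, not a real bucket)

# Working directory on S3, as already configured:
kylin.env.hdfs-working-dir=s3://your-bucket/kylin

# Not needed when it would point at the same bucket, so it can stay unset:
# kylin.storage.hbase.cluster-fs=s3://your-bucket

# Optional: skips the "Redistribute Flat Hive Table" step. It was suggested
# for the EMR 4.x Hive issue and should not be required on EMR 5.7:
# kylin.source.hive.redistribute-flat-table=false
```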
For the stuck job, did you check the YARN RM to see whether there is any indicator?

2017-11-09 17:38 GMT+08:00 Roberto Tardío <roberto.tar...@stratebi.com>:

> Hi,
>
> EMR version is 5.7 and Kylin version is 2.1. We have changed
> kylin.env.hdfs-working-dir to s3://your-bucket/kylin but *we have not
> changed kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could it
> be because we did not change this kylin.storage.hbase.cluster-fs parameter
> to S3?
>
> We have also tried the latest version of Kylin (2.2). In this case, when
> the build job starts, the first step gets stuck with no errors or warnings
> in the log files. Maybe we are doing something wrong. Tomorrow we are going
> to try setting kylin.storage.hbase.cluster-fs to S3.
>
> Other details about our architecture are:
>
>    - Kylin 2.1 (also tried with 2.2) on a separate EC2 machine, with the
>    Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>    - EMR 5.7 cluster (1 master and 4 cores)
>    - HBase on S3
>    - Hive warehouse on S3 and metastore configured on MySQL on the EC2
>    machine (the same one where Kylin runs)
>    - HDFS
>    - S3 with EMRFS
>    - Zookeeper.
>
> I will give you feedback on tomorrow's new tests.
>
> Many thanks ShaoFeng!
>
> On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
>
> Hi Roberto,
>
> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
> problem with "insert overwrite" over S3, which is just what Kylin needs in
> the "redistribute flat hive table" step. You can also skip the
> "redistribute" step by setting
> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties. (On
> EMR 5.7, there is no such issue.)
>
> The second option is to set "kylin.env.hdfs-working-dir" to local HDFS, and
> "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also on S3).
> Kylin will build the cube on HDFS and then output HFile to S3, and finally
> load to HBase on S3.
> This gives better build performance and also keeps the cube data in S3
> for high availability and durability. But if you stop EMR, the
> intermediate cuboid files will be lost, so the segments can no longer be
> merged.
>
> The third option is to use a newer version like EMR 5.7 and use S3 as the
> working dir (and HBase also on S3).
>
> For all the scenarios, please use Kylin v2.2, which includes the fix for
> KYLIN-2788.
>
> 2017-11-09 3:45 GMT+08:00 Roberto Tardío <roberto.tar...@stratebi.com>:
>
>> Hi,
>>
>> We have deployed Kylin on an EC2 machine using an EMR cluster. After adding
>> the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we have
>> successfully built the sample cube. However, Kylin data is stored on the
>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and will be
>> erased if you terminate the cluster (e.g. to save costs, to change the kind
>> of instances, ...), we have to store the data on S3.
>>
>> With this aim we changed the 'kylin.env.hdfs-working-dir' property to S3,
>> like s3://your-bucket/kylin. But after this change, if we try to build the
>> sample cube, the build job starts but gets stuck in step 2, "Redistribute
>> Flat Hive Table". We have checked that this step never starts, and the
>> Kylin logs do not show any error or warning.
>>
>> Do you have any idea how to solve this and make Kylin work with S3?
>>
>> So far the only solution we have found is to copy the HDFS folder to S3
>> before terminating the EMR cluster and copy it back from S3 to HDFS when
>> the cluster is turned on again. However, this is only half a solution,
>> since the HDFS storage of EMR is ephemeral and we do not have as much
>> space available as in S3. Which data does Kylin store in the kylin path?
>> Are the HBase tables stored in this folder?
>>
>> We will appreciate your help,
>>
>> Roberto
>> --
>>
>> *Roberto Tardío Olmos*
>> *Senior Big Data & Business Intelligence Consultant*
>> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>> Landline: 91.788.34.10
>>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>

--
Best regards,

Shaofeng Shi 史少锋