Hi Roberto,

What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
problem with "insert overwrite" over S3, which is exactly what Kylin needs
in the "redistribute flat hive table" step. As a first option, you can skip
the "redistribute" step by setting
"kylin.source.hive.redistribute-flat-table=false" in kylin.properties. (On
EMR 5.7, there is no such issue.)
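
For example, the entry in kylin.properties would look roughly like this (a
minimal sketch; only this one line is needed for the workaround, and the
rest of the file stays as it is):

```properties
# Skip the "Redistribute Flat Hive Table" step, which relies on
# Hive "insert overwrite" and hangs over S3 on EMR 4.x
kylin.source.hive.redistribute-flat-table=false
```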

The second option is to set "kylin.env.hdfs-working-dir" to local HDFS and
"kylin.storage.hbase.cluster-fs" to an S3 bucket (so the HBase data is also
on S3). Kylin will build the cube on HDFS, then output the HFiles to S3, and
finally load them into HBase on S3. This gives better build performance and
also keeps the cube data in S3 for high availability and durability. But if
you stop EMR, the intermediate cuboid files will be lost, which means the
affected segments can no longer be merged.
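
A rough sketch of this setup in kylin.properties, assuming the HDFS path
/kylin from your mail and a placeholder bucket name (both values below are
illustrative assumptions, so please adjust them to your cluster):

```properties
# Working dir on the cluster's local HDFS (path is an assumption)
kylin.env.hdfs-working-dir=hdfs:///kylin

# File system used by HBase, here an S3 bucket (placeholder name)
kylin.storage.hbase.cluster-fs=s3://your-bucket
```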

The third option is to use a newer version such as EMR 5.7 and use S3 as
the working dir (with HBase also on S3).
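
As a sketch, this is the same setting you already tried, with the
placeholder bucket from your mail (replace it with your real bucket):

```properties
# Working dir directly on S3 (placeholder bucket name)
kylin.env.hdfs-working-dir=s3://your-bucket/kylin
```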

For all of these scenarios, please use Kylin v2.2, which includes the fix
for KYLIN-2788.





2017-11-09 3:45 GMT+08:00 Roberto Tardío <roberto.tar...@stratebi.com>:

> Hi,
>
> We have deployed Kylin on an EC2 machine using an EMR cluster. After adding
> the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we have
> successfully built the sample cube. However, Kylin data is stored on the
> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and is erased
> if you terminate the cluster (e.g. to save costs or to change the instance
> type), we have to store the data on S3.
>
> With this aim we changed the 'kylin.env.hdfs-working-dir' property to S3,
> like s3://your-bucket/kylin. But after this change, if we try to build the
> sample cube, the build job starts but gets stuck in step 2 "Redistribute
> Flat Hive Table". We have checked that this step never starts, and the
> Kylin logs do not show any error or warning.
>
> Do you have any idea how to solve this and make Kylin work with S3?
>
> So far the only solution we have found is to copy the HDFS folder to S3
> before terminating the EMR cluster and copy it back from S3 to HDFS when
> the cluster is turned on again. However, this is only half a solution,
> since the HDFS storage of EMR is ephemeral and we do not have as much
> space available as in S3. Which data does Kylin store on the kylin path?
> Are the HBase tables stored in this folder?
>
> We would appreciate your help,
>
> Roberto
> --
>
> *Roberto Tardío Olmos*
> *Senior Big Data & Business Intelligence Consultant*
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
> Fijo: 91.788.34.10
>



-- 
Best regards,

Shaofeng Shi 史少锋
