Hi Roberto,

No need to set kylin.storage.hbase.cluster-fs to the same bucket again.

For the stuck job, did you check the YARN ResourceManager to see whether it
shows any indication of the cause?


2017-11-09 17:38 GMT+08:00 Roberto Tardío <roberto.tar...@stratebi.com>:

> Hi,
>
> EMR version is 5.7 and Kylin version is 2.1. We have changed
> kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but we have not
> changed kylin.storage.hbase.cluster-fs to the same S3 bucket. Could the
> problem be that we did not point kylin.storage.hbase.cluster-fs at S3?
>
> We have also tried the latest version of Kylin (2.2). In this case, when
> the build job starts, the first step gets stuck with no errors or warnings
> in the log files. Maybe we are doing something wrong. Tomorrow we are going
> to try setting kylin.storage.hbase.cluster-fs to S3.
>
> Other details about our architecture are:
>
>    - Kylin 2.1 (also tried with 2.2) on a separate EC2 machine, with
>    Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>    - EMR 5.7 cluster (1 master and 4 cores)
>       - HBase on S3
>       - Hive warehouse on S3 and metastore configured on MySQL on the EC2
>       machine (the same one where Kylin runs)
>       - HDFS
>       - S3 with EMRFS
>       - ZooKeeper
>
> I will give you feedback on tomorrow's new tests.
>
> Many thanks ShaoFeng!
>
> El 09/11/2017 a las 1:12, ShaoFeng Shi escribió:
>
> Hi Roberto,
>
> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
> problem with "insert overwrite" over S3, which is exactly what Kylin needs
> in the "redistribute flat hive table" step. You can skip the
> "redistribute" step by setting
> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties. (On
> EMR 5.7, there is no such issue.)
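>
> As a minimal sketch, the setting mentioned above would appear in
> kylin.properties as the single line below. The property name and value are
> the ones quoted in this message; nothing else about the file is assumed.
>
> ```properties
> # Skip the "redistribute flat hive table" build step
> # (property name and value as quoted in this message)
> kylin.source.hive.redistribute-flat-table=false
> ```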
>
> The second option is to set "kylin.env.hdfs-working-dir" to local HDFS and
> "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also on S3).
> Kylin will build the cube on HDFS, then output the HFiles to S3, and finally
> load them into HBase on S3. This gives better build performance and also
> keeps the cube data in S3 for high availability and durability. But if you
> stop EMR, the intermediate cuboid files will be lost, which means the
> segments can no longer be merged.
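>
> A sketch of this second option in kylin.properties might look like the
> following. The /kylin path is the HDFS working dir reported elsewhere in
> this thread, and s3://your-bucket is a placeholder. The exact URI form each
> property expects is an assumption here and should be checked against your
> Kylin version.
>
> ```properties
> # Working dir on the cluster's local (ephemeral) HDFS;
> # /kylin is the path reported in this thread
> kylin.env.hdfs-working-dir=/kylin
> # File system where the HFiles are written for HBase on S3;
> # the bucket name is a placeholder
> kylin.storage.hbase.cluster-fs=s3://your-bucket
> ```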
>
> The third option is to use a newer version like EMR 5.7 and use S3 as the
> working dir (with HBase also on S3).
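>
> For this third option, a sketch of the corresponding kylin.properties line,
> using the placeholder bucket from this thread (putting HBase itself on S3
> is configured outside kylin.properties and is not shown):
>
> ```properties
> # Working dir on S3; the bucket name is a placeholder
> kylin.env.hdfs-working-dir=s3://your-bucket/kylin
> ```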
>
> For all the scenarios, please use Kylin v2.2, which includes the fix for
> KYLIN-2788.
>
>
>
>
>
> 2017-11-09 3:45 GMT+08:00 Roberto Tardío <roberto.tar...@stratebi.com>:
>
>> Hi,
>>
>> We have deployed Kylin on an EC2 machine using an EMR cluster. After adding
>> the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we
>> successfully built the sample cube. However, Kylin data is stored on the
>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and is erased
>> if you terminate the cluster (e.g. to save costs or to change the instance
>> types), we have to store the data on S3.
>>
>> To this end, we changed the 'kylin.env.hdfs-working-dir' property to an S3
>> path, like s3://your-bucket/kylin. But after this change, if we try to
>> build the sample cube, the build job starts but gets stuck in step 2,
>> "Redistribute Flat Hive Table". We have checked that this step never
>> starts, and the Kylin logs do not show any errors or warnings.
>>
>> Do you have any idea how to solve this and make Kylin work with S3?
>>
>> So far the only solution we have found is to copy the HDFS folder to S3
>> before terminating the EMR cluster and copy it back from S3 to HDFS when
>> the cluster is started again. However, this is only half a solution, since
>> the HDFS storage of EMR is ephemeral and we do not have as much space
>> available as in S3. What data does Kylin store under the kylin path? Are
>> the HBase tables stored in this folder?
>>
>> We would appreciate your help,
>>
>> Roberto
>> --
>>
>> *Roberto Tardío Olmos*
>> *Senior Big Data & Business Intelligence Consultant*
>> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>> Fijo: 91.788.34.10
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
> --
>
> *Roberto Tardío Olmos*
> *Senior Big Data & Business Intelligence Consultant*
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
> Fijo: 91.788.34.10
>



-- 
Best regards,

Shaofeng Shi 史少锋
