Hi,
With Kylin 2.1, the YARN RM shows that one job for step 1 finished
successfully, but no job appears when step 2 gets stuck. When we use HDFS
as the working dir, this step works fine and launches a Tez job on the YARN RM
that finishes successfully (as does the rest of the sample cube build process).
With Kylin 2.2, the YARN RM does not show any MR job when step 1 gets stuck.
However, we are going to repeat the test; maybe when changing the Kylin
version from 2.1 to 2.2 we forgot to clean some metadata, coprocessor,...
On 09/11/2017 at 11:10, ShaoFeng Shi wrote:
Hi Robert,
No need to set *kylin.storage.hbase.cluster-fs* to the same bucket again.
For the stuck job, did you check YARN RM to see whether there is any
indicator?
2017-11-09 17:38 GMT+08:00 Roberto Tardío <[email protected]>:
Hi,
EMR version is 5.7 and Kylin version is 2.1. We have changed
kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but *we have
not changed kylin.storage.hbase.cluster-fs to the same S3
bucket*. Could the problem be that we did not set the
*kylin.storage.hbase.cluster-fs* parameter to S3?
We have also tried the latest version of Kylin (2.2). In this
case, when the build job starts, the first step gets stuck with no
errors or warnings in the log files. Maybe we are doing something wrong.
Tomorrow we are going to try setting
*kylin.storage.hbase.cluster-fs* to S3.
Other details about our architecture are:
* Kylin 2.1 (also tried with 2.2) on a separate EC2 machine,
  with Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
* EMR 5.7 cluster (1 master and 4 core nodes)
  o HBase on S3
  o Hive warehouse on S3 and metastore configured on MySQL on
    the EC2 machine (the same one where Kylin runs)
  o HDFS
  o S3 with EMRFS
  o Zookeeper.
I will give you feedback on tomorrow's new tests.
Many thanks ShaoFeng!
On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
Hi Roberto,
What's your EMR version? I know that in the 4.x versions, EMR's Hive
has a problem with "insert overwrite" over S3, which is exactly what
Kylin needs in the "redistribute flat hive table" step. As a first
option, you can skip the "redistribute" step by setting
"kylin.source.hive.redistribute-flat-table=false" in
kylin.properties. (On EMR 5.7, there is no such issue).
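As a minimal sketch, that setting would appear in kylin.properties as shown here (where the file lives depends on your installation, which this thread does not state):

```properties
# Skip the "Redistribute Flat Hive Table" build step
kylin.source.hive.redistribute-flat-table=false
```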
The second option is to set "kylin.env.hdfs-working-dir" to local
HDFS and "kylin.storage.hbase.cluster-fs" to an S3 bucket (with HBase
data also on S3). Kylin will build the cube on HDFS, then
output the HFiles to S3, and finally load them into HBase on S3. This
gives better build performance and also keeps the cube data in S3 for
high availability and durability. But if you stop EMR, the
intermediate cuboid files will be lost, which means the segments
can no longer be merged.
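For illustration, the second option might look roughly like this in kylin.properties. The /kylin path is the HDFS location mentioned elsewhere in this thread, and s3://your-bucket is a placeholder; the exact value format expected for cluster-fs is an assumption here.

```properties
# Keep the working dir (intermediate build files) on the cluster's ephemeral HDFS
kylin.env.hdfs-working-dir=/kylin
# Write HFiles to the S3 bucket that backs HBase (placeholder bucket name)
kylin.storage.hbase.cluster-fs=s3://your-bucket
```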
The third option is to use a newer version like EMR 5.7 and use S3
as the working dir (with HBase also on S3).
For all the scenarios, please use Kylin v2.2, which includes the
fix for KYLIN-2788.
2017-11-09 3:45 GMT+08:00 Roberto Tardío <[email protected]>:
Hi,
We have deployed Kylin on an EC2 machine using an EMR cluster.
After adding the "hbase.zookeeper.quorum" property to
kylin_job_conf.xml, we successfully built the sample cube.
However, Kylin data is stored on the HDFS path /kylin. Because
HDFS is ephemeral storage on EMR and is erased when you
terminate the cluster (e.g. to save costs, to change
the instance types,...), we have to store the data on S3.
With this aim we changed the 'kylin.env.hdfs-working-dir'
property to S3, like s3://your-bucket/kylin. But after this
change, if we try to build the sample cube, the build job starts
but gets stuck in step 2, "Redistribute Flat Hive Table".
We have checked that this step never starts, and the Kylin logs do
not show any errors or warnings.
Do you have any idea how to solve this and make
Kylin work with S3?
So far the only solution we have found is to copy the HDFS
folder to S3 before terminating the EMR cluster and copy it back
from S3 to HDFS when the cluster is turned on again. However, this is
only half a solution, since the HDFS storage of EMR is ephemeral and we
do not have as much space available there as in S3. Which data
does Kylin store in the kylin path? Are the HBase tables stored in this
folder?
We would appreciate your help,
Roberto
--
*Roberto Tardío Olmos*
/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16.28020 Madrid
Fijo: 91.788.34.10
--
Best regards,
Shaofeng Shi 史少锋