Thanks Roberto; I will also try that tomorrow or this weekend. I had planned to draft a document for EMR, and it's time to do that now.

2017-11-09 19:54 GMT+08:00 Roberto Tardío <[email protected]>:

> Hi,
>
> With Kylin 2.1, the YARN RM shows that one job for Step 1 finished
> successfully, but there is no job when Step 2 gets stuck. When we use HDFS
> as the working dir, this step works fine and launches a Tez job on the YARN
> RM that finishes successfully (as does the rest of the sample cube build
> process).
>
> With Kylin 2.2, the YARN RM does not show any MR job when Step 1 gets stuck.
>
> However, we are going to run the test again; maybe when changing the Kylin
> version from 2.1 to 2.2 we forgot to clean some metadata, coprocessor, ...
>
> On 09/11/2017 at 11:10, ShaoFeng Shi wrote:
>
> Hi Robert,
>
> No need to set kylin.storage.hbase.cluster-fs to the same bucket again.
>
> For the stuck job, did you check YARN RM to see whether there is any
> indicator?
>
> 2017-11-09 17:38 GMT+08:00 Roberto Tardío <[email protected]>:
>
>> Hi,
>>
>> The EMR version is 5.7 and the Kylin version is 2.1. We have changed
>> kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but we have not
>> changed kylin.storage.hbase.cluster-fs to the same S3 bucket. Could the
>> problem be that we did not change the kylin.storage.hbase.cluster-fs
>> parameter to S3?
>>
>> We have also tried the latest version of Kylin (2.2). In this case, when
>> the build job starts, the first step gets stuck with no errors or warnings
>> in the log files. Maybe we are doing something wrong. Tomorrow we are going
>> to try setting kylin.storage.hbase.cluster-fs to S3.
>>
>> Other details about our architecture:
>>
>> - Kylin 2.1 (also tried with 2.2) on a separate EC2 machine, with
>>   Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>> - EMR 5.7 cluster (1 master and 4 cores)
>> - HBase on S3
>> - Hive warehouse on S3 and metastore configured on MySQL on the
>>   EC2 machine (the same one where Kylin runs)
>> - HDFS
>> - S3 with EMRFS
>> - Zookeeper.
>>
>> I will give you feedback on tomorrow's new tests.
>>
>> Many thanks ShaoFeng!
>>
>> On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
>>
>> Hi Roberto,
>>
>> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
>> problem with "insert overwrite" over S3, which is exactly what Kylin needs
>> in the "redistribute flat hive table" step. You can also skip the
>> "redistribute" step by setting
>> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties.
>> (On EMR 5.7, there is no such issue.)
>>
>> The second option is to set "kylin.env.hdfs-working-dir" to local HDFS,
>> and "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also on
>> S3). Kylin will build the cube on HDFS, then output the HFiles to S3, and
>> finally load them into HBase on S3. This gives better build performance
>> and also keeps the cube data in S3 for high availability and durability.
>> But if you stop EMR, the intermediate cuboid files will be lost, which
>> means the segments can no longer be merged.
>>
>> The third option is to use a newer version like EMR 5.7 and use S3 as the
>> working dir (with HBase also on S3).
>>
>> For all the scenarios, please use Kylin v2.2, which includes the fix for
>> KYLIN-2788.
>>
>> 2017-11-09 3:45 GMT+08:00 Roberto Tardío <[email protected]>:
>>
>>> Hi,
>>>
>>> We have deployed Kylin on an EC2 machine using an EMR cluster. After
>>> adding the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we
>>> successfully built the sample cube. However, Kylin data is stored on the
>>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and is erased
>>> when you terminate the cluster (e.g. to save costs, to change the
>>> instance types, ...), we have to store the data on S3.
>>>
>>> To this end we changed the 'kylin.env.hdfs-working-dir' property to S3,
>>> like s3://your-bucket/kylin. But after this change, if we try to build
>>> the sample cube, the build job starts but gets stuck in step 2,
>>> "Redistribute Flat Hive Table". We have checked that this step never
>>> starts, and the Kylin logs do not show any error or warning.
>>>
>>> Do you have any idea how to solve this and make Kylin work with S3?
>>>
>>> So far the only solution we have found is to copy the HDFS folder to S3
>>> before terminating the EMR cluster and copy it back from S3 to HDFS when
>>> the cluster is turned on again. However, this is only half a solution,
>>> since the HDFS storage of EMR is ephemeral and we do not have as much
>>> space available there as in S3. Which data does Kylin store under the
>>> kylin path? Are the HBase tables stored in this folder?
>>>
>>> We would appreciate your help,
>>>
>>> Roberto
>>> --
>>>
>>> Roberto Tardío Olmos
>>> Senior Big Data & Business Intelligence Consultant
>>> Avenida de Brasil, 17, Planta 16.28020 Madrid
>>> Landline: 91.788.34.10
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋

-- 
Best regards,

Shaofeng Shi 史少锋
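
For reference, the three options discussed in this thread can be summarized as kylin.properties overrides. The following is only a minimal sketch: the property names are the ones quoted in the thread, but the function name `kylin_overrides`, the option labels, and the default bucket and HDFS path are placeholders chosen for illustration, not anything Kylin itself defines.

```python
# Sketch only: property names are taken from the thread above; the option
# labels, bucket name, and HDFS path are illustrative placeholders.

def kylin_overrides(option, bucket="s3://your-bucket", hdfs_dir="/kylin"):
    """Return kylin.properties overrides for one of the options in the thread."""
    if option == "skip-redistribute":
        # Option 1: keep S3 as the working dir, but skip the
        # "Redistribute Flat Hive Table" step (relevant for EMR 4.x Hive).
        return {
            "kylin.env.hdfs-working-dir": f"{bucket}/kylin",
            "kylin.source.hive.redistribute-flat-table": "false",
        }
    if option == "hdfs-build-s3-hbase":
        # Option 2: build on local HDFS, write HFiles to HBase on S3.
        return {
            "kylin.env.hdfs-working-dir": hdfs_dir,
            "kylin.storage.hbase.cluster-fs": bucket,
        }
    if option == "all-s3":
        # Option 3 (newer EMR such as 5.7): S3 as the working dir; per the
        # thread, cluster-fs does not need to be set to the same bucket again.
        return {"kylin.env.hdfs-working-dir": f"{bucket}/kylin"}
    raise ValueError(f"unknown option: {option}")


def render(props):
    """Format a dict of overrides as kylin.properties lines."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(render(kylin_overrides("hdfs-build-s3-hbase")))
```

Whichever option is chosen, the thread's advice to use Kylin v2.2 (for the KYLIN-2788 fix) still applies; the sketch does not check versions.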
