Re: About degenerate dimensions on Kylin cubes

2017-11-09 Thread Billy Liu
Hi Roberto,

Degenerate dimensions on the fact table are not supported, I think. There are
only two types of dimensions: "normal" and "derived". All "normal" dimensions
will be precalculated into the cube, so they affect the construction cost and
the query latency. If a column in the fact table does not need to be a
dimension, you can define it as an "extended column". An "extended column"
will not be precalculated.
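To illustrate the cost described above (a sketch only, not Kylin code): every "normal" dimension participates in cuboid enumeration, so the upper bound on precalculated cuboids doubles with each one. The function name below is hypothetical, for illustration.

```python
# Illustration only: upper bound on the number of cuboids Kylin may
# precalculate when a cube has `normal_dims` "normal" dimensions (2^N).
def max_cuboid_count(normal_dims: int) -> int:
    return 2 ** normal_dims

# Making a degenerate fact-table column a "normal" dimension doubles the
# bound, which is why an "extended column" (not precalculated) is cheaper:
print(max_cuboid_count(10))  # 1024
print(max_cuboid_count(11))  # 2048
```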

2017-11-04 18:04 GMT+08:00 Roberto Tardío :

> Hi,
>
> I have a question about how Kylin computes degenerate dimensions, i.e.,
> dimensions on the fact table that do not need a dimension lookup table. These
> dimensions are "Normal" by default but, what is the cost of adding
> them? I guess they are not used in the cuboid concept because
> they are naturally combined in the fact table, so the questions are:
>
>- Do they add appreciable complexity to the construction process of
>the cube?
>- Do they affect the query latency over cube built in any way?
>
> Thanks in advance!
> --
>
> *Roberto Tardío Olmos*
> *Senior Big Data & Business Intelligence Consultant*
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
> Fijo: 91.788.34.10
>


Re: Separate ZooKeeper nodes when deploying a standalone HBase cluster

2017-11-09 Thread Li Yang
Sorry for the late reply. Was very occupied recently.

> Is it OK to deploy a standalone HBase cluster with a separate ZooKeeper,
different from the main cluster?
> Does this imply that the main cluster & the HBase cluster should share the
same ZK nodes?
I looked again; my previous answer confused you, sorry for that. I thought
you were asking about using 2 HBase clusters, but the question was actually
about a read/write separation deployment.

Yes, Kylin can work with 2 clusters. One is called the read cluster; it hosts
HBase and provides query horsepower. The other is called the write cluster (the
main cluster in your question) and is responsible for cube building. By
default, Kylin uses the ZooKeeper of the HBase cluster for its job
coordination.

When building a cube, the write cluster (or main cluster) will write to the
HBase cluster to create the HBase table and bulk-load data. The
kylin.env.hdfs-working-dir should be on the write cluster by design.

In the step "Create HTable", Kylin writes a partition file based on which a
new HTable is created. That must be the write operation you observed.
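A minimal kylin.properties sketch of this read/write split, using the property named above plus kylin.storage.hbase.cluster-fs from the sibling "EMR and S3" thread; the cluster URIs are hypothetical placeholders, so verify them against your own deployment:

```properties
# Hypothetical sketch -- verify property names against your Kylin version.
# Build (write) cluster: flat tables and intermediate cuboid files live here.
kylin.env.hdfs-working-dir=hdfs://write-cluster/kylin
# Query (read) cluster: HFiles are output to the HBase cluster's filesystem.
kylin.storage.hbase.cluster-fs=hdfs://read-cluster
```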

Cheers
Yang


On Mon, Oct 30, 2017 at 8:23 PM, Yuxiang Mai  wrote:

> Hi, Li Yang
>
> Thanks for your reply.
>
> Is it OK to deploy a standalone HBase cluster with a separate ZooKeeper
> different from the main cluster?
> No. Kylin only works with 1 HBase and its related ZooKeeper.
>
> Does this imply that the main cluster & the HBase cluster should share the
> same ZK nodes?
>
> And I have one more question about kylin.env.hdfs-working-dir: should the
> HDFS working dir be placed on the main cluster or the HBase cluster?
>
> Because during a cube build, after "Extract Fact Table Distinct Columns"
> & "Save Cuboid Statistics", the step "Create HTable" seems stuck,
> with no response for a long time;
> in kylin.log, it seems stuck on this job:
>
> 2017-10-30 20:16:46,730 INFO  [Job e82dca5a-93c6-47ca-a707-674372708b5f-193]
> common.HadoopShellExecutable:59 :  -cubename 123 -segmentid
> 6223ddc9-ac80-4a10-b3c8-33165fe8be4c -partitions hdfs://maincluster/
> kylinworkingdir/kylin_metadata/kylin-e82dca5a-93c6-
> 47ca-a707-674372708b5f/123/rowkey_stats/part-r-0 -statisticsenabled
> true
>
>  In this step, it seems to be generating the HBase table in the HDFS working
> dir. Does this mean the HDFS working dir is on the HBase cluster, not the
> main cluster?
>
> Thanks a lot
>
> Yuxiang MAI
>
>
>
> On Sun, Oct 29, 2017 at 6:41 PM, Li Yang  wrote:
>
>> > Is it OK to deploy a standalone HBase cluster with a separate ZooKeeper
>> different from the main cluster?
>> No. Kylin only works with 1 HBase and its related ZooKeeper.
>>
>> > How does Kylin get the YARN config when submitting a job?
>> Kylin takes the Hadoop config from the classpath, and most of the classpath
>> comes from the HBase shell.
>>
>> On Wed, Oct 25, 2017 at 4:33 PM, Yuxiang Mai 
>> wrote:
>>
>>> Hi, experts
>>>
>>> We are now deploying a standalone HBase cluster outside the Hadoop cluster
>>> to improve query performance.
>>> http://kylin.apache.org/blog/2016/06/10/standalone-hbase-cluster/
>>>
>>> The new HBase cluster uses separate ZooKeeper nodes from the main
>>> cluster. The Kylin server can access the HBase, Hadoop & Hive resources.
>>> But in this configuration, the cube build failed in the first step:
>>>
>>> There are 3 hive commands in the first step:
>>> DROP TABLE IF EXISTS kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc;
>>>
>>> CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc
>>> ...
>>> INSERT OVERWRITE TABLE kylin_intermediate_test1_ba3c5
>>> 910_ff7d_4669_b28a_4ec2736d60dc SELECT
>>> ..
>>>
>>>
>>> Drop & create table are OK, but it failed on "insert overwrite" with the
>>> following exception.
>>>
>>>
>>> FAILED: IllegalArgumentException java.net.UnknownHostException:
>>> maincluster
>>>
>>> at org.apache.kylin.common.util.CliCommandExecutor.execute(CliC
>>> ommandExecutor.java:92)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.createF
>>> latHiveTable(CreateFlatHiveTableStep.java:52)
>>> at org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(
>>> CreateFlatHiveTableStep.java:70)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(Ab
>>> stractExecutable.java:124)
>>> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWo
>>> rk(DefaultChainedExecutable.java:64)
>>> at org.apache.kylin.job.execution.AbstractExecutable.execute(Ab
>>> stractExecutable.java:124)
>>> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRun
>>> ner.run(DefaultScheduler.java:142)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>> Executor.java:1145)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>> lExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> It seems the MR job failed to be submitted to YARN. In our debugging, it
>>> seems the job was not submitted to the main cluster.
>>> So my question is:
>>> 1. Is it OK 

Re: QUESTIONS ABOUT BUILD CUBE BY RESTFUL API & SECURITY WITH LDAP

2017-11-09 Thread ShaoFeng Shi
Hi Wei, welcome to the Kylin family!

For question 1, it seems you're sending the request to "/kylin/api/Cubes",
while the expected path is "/kylin/api/cubes"; the 'c' should be in lower
case. Please change that and try again.
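A small sketch of the corrected request path (the host, cube name, and helper function below are placeholders for illustration). Note that the documented endpoint is also spelled "rebuild", so the trailing "rebulid" segment in the original command may need fixing as well.

```python
# Kylin's REST routes are case-sensitive: the path segment must be "cubes",
# not "Cubes". The helper name is hypothetical, for illustration only.
def rebuild_url(host: str, cube_name: str) -> str:
    return f"{host}/kylin/api/cubes/{cube_name}/rebuild"

# The curl call from the question would then PUT to:
print(rebuild_url("http://ip:7070", "ZENGLIANG_ZDRY"))
# http://ip:7070/kylin/api/cubes/ZENGLIANG_ZDRY/rebuild
```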

For question 2, has the user "ADMIN" been added to the "KYLIN-ADMIN" group? It
should be in the member list of that group; please double-check. The logs from
when the user logs in can tell more details.

2017-11-10 9:12 GMT+08:00 李巍 :

> Hi shaofeng & all:(后面有中文)
>
> I use Kylin 2.2.0-SNAPSHOT, which I installed from source from GitHub, but
> I have met two problems:
>
> 1. When I build a cube via the RESTful API, with the command below:
>
> curl -b $KYLIN_HOME/path/to/cookiefile.txt -X PUT -H 'Content-Type:
> application/json' -d '{"startTime":'145160610',
> "endTime":'148322850', "buildType":"BUILD"}' http://ip:7070/kylin/api/
> cubes/ZENGLIANG_ZDRY/rebulid
> 
> nothing happened, so I checked the logs; the error is below:
> No mapping for HTTP request with URL [/kylin/api/Cubes/ZENGLIANG_
> ZDRY/rebulid]
>
> I'm sure about the cube name, because all the other APIs respond SUCCESS.
>
> 2. When I use LDAP security, the user ADMIN can't create a new project, and
> the button for DataSource load is missing.
> kylin.properties is as follows:
>
> ## Spring security profile, options: testing, ldap, saml
>
> ## with "testing" profile, user can use pre-defined name/pwd like
> KYLIN/ADMIN to login
>
> kylin.security.profile=ldap
>
> #
>
> ## Default roles and admin roles in LDAP, for ldap and saml
>
> kylin.security.acl.default-role=ROLE_ANALYST,ROLE_MODELER
>
> kylin.security.acl.admin-role=ROLE_KYLIN_ADMIN
>
> #
>
> ## LDAP authentication configuration
>
> kylin.security.ldap.connection-server=ldap://ip:389
>
> kylin.security.ldap.connection-username=OU=FHZZ,DC=kylin,DC=com
>
> kylin.security.ldap.connection-password=uIS3e+hZQiYh4kFrsyjekA==
>
> #
>
> ## LDAP user account directory;
>
> kylin.security.ldap.user-search-base=OU=USER,DC=kylin,DC=com
>
> kylin.security.ldap.user-search-pattern=(&(cn={0}))
>
> kylin.security.ldap.user-group-search-base=OU=ROLE,DC=kylin,DC=com
>
> #
>
> ## LDAP service account directory
>
> #kylin.security.ldap.service-search-base=
>
> #kylin.security.ldap.service-search-pattern=
>
> #kylin.security.ldap.service-group-search-base=
>
> ldap tree:
>
> THANKS!
> Shaofeng:
> Hello!
> I have been using Kylin for three weeks; thank you and the team for your
> contributions! I installed from source from GitHub, because the latest source
> provides the option to configure the HBase table namespace (exactly what I
> need). Then I ran into problems when using the RESTful API for (incremental)
> cube builds and LDAP user authorization:
> 1. Following the official documentation, all the other RESTful API commands
> run successfully, except the (incremental) cube build, which fails with an
> "HTTP URL request not found" error; the command is above.
> 2. Using LDAP for user permission management, I can log in with multiple
> users and grant project permissions, but the three buttons next to the
> DataSource table-import page on the web UI have disappeared, and the ADMIN
> user cannot create a new project either; it reports "no access". The
> configuration file and LDAP tree are above.
>
> Thanks!
>
>


-- 
Best regards,

Shaofeng Shi 史少锋




QUESTIONS ABOUT BUILD CUBE BY RESTFUL API & SECURITY WITH LDAP

2017-11-09 Thread 李巍
Hi shaofeng & all:(后面有中文)

I use Kylin 2.2.0-SNAPSHOT, which I installed from source from GitHub, but I
have met two problems:


1. When I build a cube via the RESTful API, with the command below:

curl -b $KYLIN_HOME/path/to/cookiefile.txt -X PUT -H 'Content-Type: 
application/json' -d '{"startTime":'145160610', "endTime":'148322850', 
"buildType":"BUILD"}' http://ip:7070/kylin/api/cubes/ZENGLIANG_ZDRY/rebulid

nothing happened, so I checked the logs; the error is below:
No mapping for HTTP request with URL [/kylin/api/Cubes/ZENGLIANG_ZDRY/rebulid]



I'm sure about the cube name, because all the other APIs respond SUCCESS.


2. When I use LDAP security, the user ADMIN can't create a new project, and
the button for DataSource load is missing.

kylin.properties is as follows:

## Spring security profile, options: testing, ldap, saml
 
## with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN 
to login
 
kylin.security.profile=ldap
 
#
 
## Default roles and admin roles in LDAP, for ldap and saml
 
kylin.security.acl.default-role=ROLE_ANALYST,ROLE_MODELER
 
kylin.security.acl.admin-role=ROLE_KYLIN_ADMIN
 
#
 
## LDAP authentication configuration
 
kylin.security.ldap.connection-server=ldap://ip:389
 
kylin.security.ldap.connection-username=OU=FHZZ,DC=kylin,DC=com
 
kylin.security.ldap.connection-password=uIS3e+hZQiYh4kFrsyjekA==
 
#
 
## LDAP user account directory;
 
kylin.security.ldap.user-search-base=OU=USER,DC=kylin,DC=com
 
kylin.security.ldap.user-search-pattern=(&(cn={0}))
 
kylin.security.ldap.user-group-search-base=OU=ROLE,DC=kylin,DC=com
 
#
 
## LDAP service account directory
 
#kylin.security.ldap.service-search-base=
 
#kylin.security.ldap.service-search-pattern=
 
#kylin.security.ldap.service-group-search-base=

ldap tree:



THANKS!
Shaofeng:
Hello!
I have been using Kylin for three weeks; thank you and the team for your
contributions! I installed from source from GitHub, because the latest source
provides the option to configure the HBase table namespace (exactly what I
need). Then I ran into problems when using the RESTful API for (incremental)
cube builds and LDAP user authorization:
1. Following the official documentation, all the other RESTful API commands
run successfully, except the (incremental) cube build, which fails with an
"HTTP URL request not found" error; the command is above.
2. Using LDAP for user permission management, I can log in with multiple
users and grant project permissions, but the three buttons next to the
DataSource table-import page on the web UI have disappeared, and the ADMIN
user cannot create a new project either; it reports "no access". The
configuration file and LDAP tree are above.


Thanks!



Re: Issues with Kylin with EMR and S3

2017-11-09 Thread ShaoFeng Shi
Thanks Roberto;

I will also try that tomorrow or this weekend. I had planned to draft a
document for EMR; it's time to do that now.

2017-11-09 19:54 GMT+08:00 Roberto Tardío :

> Hi,
>
> With Kylin 2.1, YARN RM shows one job for Step 1, finished
> successfully. But there is no job when Step 2 gets stuck. When we use HDFS as
> the working dir, this step works fine and launches a Tez job on YARN RM that
> finishes with success (and so does the whole sample cube build process).
>
> With Kylin 2.2, YARN RM does not show any MR job when Step 1 gets stuck.
>
> However, we are going to run the test again; maybe when changing the Kylin
> version from 2.1 to 2.2 we forgot to clean some metadata, coprocessors, ...
>
> On 09/11/2017 at 11:10, ShaoFeng Shi wrote:
>
> Hi Robert,
>
> No need to set kylin.storage.hbase.cluster-fs to the same bucket again.
>
> For the stuck job, did you check YARN RM to see whether there is any
> indicator?
>
>
> 2017-11-09 17:38 GMT+08:00 Roberto Tardío :
>
>> Hi,
>>
>> EMR version is 5.7 and the Kylin version is 2.1. We have changed
>> kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but *we have not
>> changed **kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could
>> it be because we did not change this *kylin.storage.hbase.cluster-fs
>> *parameter to S3?
>>
>> We have also tried the latest version of Kylin (2.2). In this case
>> the build job starts but the first step gets stuck with no errors or
>> warnings in the log files. Maybe we are doing something wrong. We are going
>> to try setting *kylin.storage.hbase.cluster-fs *to S3 tomorrow.
>>
>> Other details about our architecture:
>>
>>- Kylin 2.1 (also tried with 2.2) on a separated ec2 machine, with
>>Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>>- EMR 5.7 cluster (1 master and 4 cores)
>>- HBase on S3
>>   - Hive warehouse on S3 and metastore configured on MySQL in the
>>   ec2 machine (the same where Kylin runs)
>>   - HDFS
>>   - S3 with EMRFS
>>   - Zookeeper.
>>
>> I will give you feedback about tomorrow new tests.
>>
>> Many thanks ShaoFeng!
>>
>> On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
>>
>> Hi Roberto,
>>
>> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
>> problem with "insert overwrite" over S3, which is just what Kylin needs in
>> the "redistribute flat hive table" step. You can also skip the
>> "redistribute" step by setting
>> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties.
>> (On EMR 5.7, there is no such issue.)
>>
>> The second option is: set "kylin.env.hdfs-working-dir" to local HDFS,
>> and "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also on
>> S3). Kylin will build the cube on HDFS, then output HFiles to S3, and
>> finally load them into HBase on S3. This will gain better build performance
>> and also keep cube data in S3 for high availability and durability. But if
>> you stop EMR, the intermediate cuboid files will be lost, which means the
>> segments couldn't be merged.
>>
>> The third option is to use a newer version like EMR 5.7, and use S3 as the
>> working dir (with HBase also on S3).
>>
>> For all the scenarios, please use Kylin v2.2, which includes the fix of
>> KYLIN-2788.
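The second option quoted above can be sketched in kylin.properties like this (the property names are the ones cited in this thread; the paths and bucket are placeholders, so verify them against your own deployment):

```properties
# "Option 2" sketch (hypothetical values): build on local HDFS, cube storage on S3.
kylin.env.hdfs-working-dir=hdfs:///kylin
kylin.storage.hbase.cluster-fs=s3://your-bucket
# Optional workaround for EMR 4.x Hive's "insert overwrite" problem over S3:
kylin.source.hive.redistribute-flat-table=false
```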
>>
>>
>>
>>
>>
>> 2017-11-09 3:45 GMT+08:00 Roberto Tardío :
>>
>>> Hi,
>>>
>>> We have deployed Kylin on an ec2 machine using an EMR cluster. After adding
>>> the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we have
>>> successfully built the sample cube. However, Kylin data is stored on the
>>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and will be
>>> erased if you terminate the cluster (e.g. to save costs, to change the kind
>>> of instances, ...), we have to store the data on S3.
>>>
>>> With this aim we changed the 'kylin.env.hdfs-working-dir' property to S3,
>>> like s3://your-bucket/kylin. But after this change, if we try to build the
>>> sample cube, the build job starts but gets stuck in step 2 "Redistribute
>>> Flat Hive Table". We have checked that this step never starts, and the
>>> Kylin logs do not show any error or warning.
>>>
>>> Do you have any idea how to solve this and make Kylin work with S3?
>>>
>>> So far the only solution we have found is to copy the HDFS folder to S3
>>> before terminating the EMR cluster and copy it back from S3 to HDFS when it
>>> is turned on. However, this is only half a solution, since the HDFS storage
>>> of EMR is ephemeral and we do not have as much space available as in S3.
>>> Which data does Kylin store under the /kylin path? Are the HBase tables
>>> stored in this folder?
>>>
>>> We would appreciate your help,
>>>
>>> Roberto
>>> --
>>>
>>> *Roberto Tardío Olmos*
>>> *Senior Big Data & Business Intelligence Consultant*
>>> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>>> Fijo: 91.788.34.10
>>>
>>
>>
>>
>> --

Re: Kylin 2.1.0 new features than old versions.

2017-11-09 Thread prasanna lakshmi
OK, thank you for your suggestion. Please let us know once the code is
merged.


Re: Kylin 2.1.0 new features than old versions.

2017-11-09 Thread Billy Liu
I would suggest waiting a few days. I know the bug has been fixed recently,
but the code has not been merged into master yet.

2017-11-09 14:29 GMT+08:00 Prasanna :

> Hi all,
>
>
>
> Currently I am using Kylin 1.6.0. If anybody is using the latest Kylin 2.x
> version, can you please tell me what new features are available compared to
> the old versions? In Kylin 1.6.0 I am facing a problem of holes between
> segments while merging. Will this problem be solved in the new versions?
> Will I be able to merge segments with holes as well? Please advise me on
> this.
>


Re: Issues with Kylin with EMR and S3

2017-11-09 Thread Roberto Tardío

Hi,

With Kylin 2.1, YARN RM shows one job for Step 1, finished
successfully. But there is no job when Step 2 gets stuck. When we use HDFS
as the working dir, this step works fine and launches a Tez job on YARN RM
that finishes with success (and so does the whole sample cube build process).


With Kylin 2.2, YARN RM does not show any MR job when Step 1 gets stuck.

However, we are going to run the test again; maybe when changing the Kylin
version from 2.1 to 2.2 we forgot to clean some metadata, coprocessors, ...



On 09/11/2017 at 11:10, ShaoFeng Shi wrote:

Hi Robert,

No need to set kylin.storage.hbase.cluster-fs to the same bucket again.

For the stuck job, did you check YARN RM to see whether there is any
indicator?



2017-11-09 17:38 GMT+08:00 Roberto Tardío :


Hi,

EMR version is 5.7 and the Kylin version is 2.1. We have changed
kylin.env.hdfs-working-dir to s3://your-bucket/kylin but we have
not changed kylin.storage.hbase.cluster-fs to the same S3
bucket. Could it be because we did not change this
kylin.storage.hbase.cluster-fs parameter to S3?

We have also tried the latest version of Kylin (2.2). In this
case the build job starts but the first step gets stuck with no
errors or warnings in the log files. Maybe we are doing something
wrong. We are going to try setting kylin.storage.hbase.cluster-fs
to S3 tomorrow.

Other details about our architecture:

  * Kylin 2.1 (also tried with 2.2) on a separate ec2 machine,
    with Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
  * EMR 5.7 cluster (1 master and 4 cores)
      o HBase on S3
      o Hive warehouse on S3 and metastore configured on MySQL in
        the ec2 machine (the same where Kylin runs)
      o HDFS
      o S3 with EMRFS
      o ZooKeeper.

I will give you feedback about tomorrow's new tests.

Many thanks ShaoFeng!


On 09/11/2017 at 1:12, ShaoFeng Shi wrote:

Hi Roberto,

What's your EMR version? I know that in the 4.x versions, EMR's Hive
has a problem with "insert overwrite" over S3, which is just what
Kylin needs in the "redistribute flat hive table" step. You can
also skip the "redistribute" step by setting
"kylin.source.hive.redistribute-flat-table=false" in
kylin.properties. (On EMR 5.7, there is no such issue.)

The second option is: set "kylin.env.hdfs-working-dir" to local
HDFS, and "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase
data also on S3). Kylin will build the cube on HDFS, then
output HFiles to S3, and finally load them into HBase on S3. This
will gain better build performance and also keep cube data in S3 for
high availability and durability. But if you stop EMR, the
intermediate cuboid files will be lost, which means the segments
couldn't be merged.

The third option is to use a newer version like EMR 5.7, and use S3
as the working dir (with HBase also on S3).

For all the scenarios, please use Kylin v2.2, which includes the fix
of KYLIN-2788.




2017-11-09 3:45 GMT+08:00 Roberto Tardío :

Hi,

We have deployed Kylin on an ec2 machine using an EMR cluster.
After adding the "hbase.zookeeper.quorum" property to
kylin_job_conf.xml, we have successfully built the sample cube.
However, Kylin data is stored on the HDFS path /kylin. Because
HDFS is ephemeral storage on EMR and will be erased if you
terminate the cluster (e.g. to save costs, to change the kind of
instances, ...), we have to store the data on S3.

With this aim we changed the 'kylin.env.hdfs-working-dir'
property to S3, like s3://your-bucket/kylin. But after this
change, if we try to build the sample cube, the build job starts
but gets stuck in step 2 "Redistribute Flat Hive Table".
We have checked that this step never starts, and the Kylin logs do
not show any error or warning.

Do you have any idea how to solve this and make Kylin work
with S3?

So far the only solution we have found is to copy the HDFS
folder to S3 before terminating the EMR cluster and copy it back
from S3 to HDFS when it is turned on. However, this is only half a
solution, since the HDFS storage of EMR is ephemeral and we
do not have as much space available as in S3. Which data
does Kylin store under the /kylin path? Are the HBase tables
stored in this folder?

We would appreciate your help,

Roberto

-- 

*Roberto Tardío Olmos*
/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Fijo: 91.788.34.10




-- 
Best regards,



Re: Issues with Kylin with EMR and S3

2017-11-09 Thread ShaoFeng Shi
Hi Robert,

No need to set kylin.storage.hbase.cluster-fs to the same bucket again.

For the stuck job, did you check YARN RM to see whether there is any
indicator?


2017-11-09 17:38 GMT+08:00 Roberto Tardío :

> Hi,
>
> EMR version is 5.7 and the Kylin version is 2.1. We have changed
> kylin.env.hdfs-working-dir to s3://your-bucket/kylin, but *we have not
> changed **kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could it
> be because we did not change this *kylin.storage.hbase.cluster-fs *parameter
> to S3?
>
> We have also tried the latest version of Kylin (2.2). In this case the build
> job starts but the first step gets stuck with no errors or warnings in the
> log files. Maybe we are doing something wrong. We are going to try
> setting *kylin.storage.hbase.cluster-fs *to S3 tomorrow.
>
> Other details about our architecture:
>
>- Kylin 2.1 (also tried with 2.2) on a separated ec2 machine, with
>Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
>- EMR 5.7 cluster (1 master and 4 cores)
>- HBase on S3
>   - Hive warehouse on S3 and metastore configured on MySQL in the ec2
>   machine (the same where Kylin runs)
>   - HDFS
>   - S3 with EMRFS
>   - Zookeeper.
>
> I will give you feedback about tomorrow new tests.
>
> Many thanks ShaoFeng!
>
> On 09/11/2017 at 1:12, ShaoFeng Shi wrote:
>
> Hi Roberto,
>
> What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
> problem with "insert overwrite" over S3, which is just what Kylin needs in
> the "redistribute flat hive table" step. You can also skip the
> "redistribute" step by setting
> "kylin.source.hive.redistribute-flat-table=false" in kylin.properties.
> (On EMR 5.7, there is no such issue.)
>
> The second option is: set "kylin.env.hdfs-working-dir" to local HDFS, and
> "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also on S3).
> Kylin will build the cube on HDFS, then output HFiles to S3, and finally
> load them into HBase on S3. This will gain better build performance and also
> keep cube data in S3 for high availability and durability. But if you
> stop EMR, the intermediate cuboid files will be lost, which means the
> segments couldn't be merged.
>
> The third option is to use a newer version like EMR 5.7, and use S3 as the
> working dir (with HBase also on S3).
>
> For all the scenarios, please use Kylin v2.2, which includes the fix of
> KYLIN-2788.
>
>
>
>
>
> 2017-11-09 3:45 GMT+08:00 Roberto Tardío :
>
>> Hi,
>>
>> We have deployed Kylin on an ec2 machine using an EMR cluster. After adding
>> the "hbase.zookeeper.quorum" property to kylin_job_conf.xml, we have
>> successfully built the sample cube. However, Kylin data is stored on the
>> HDFS path /kylin. Because HDFS is ephemeral storage on EMR and will be
>> erased if you terminate the cluster (e.g. to save costs, to change the kind
>> of instances, ...), we have to store the data on S3.
>>
>> With this aim we changed the 'kylin.env.hdfs-working-dir' property to S3,
>> like s3://your-bucket/kylin. But after this change, if we try to build the
>> sample cube, the build job starts but gets stuck in step 2 "Redistribute
>> Flat Hive Table". We have checked that this step never starts, and the
>> Kylin logs do not show any error or warning.
>>
>> Do you have any idea how to solve this and make Kylin work with S3?
>>
>> So far the only solution we have found is to copy the HDFS folder to S3
>> before terminating the EMR cluster and copy it back from S3 to HDFS when it
>> is turned on. However, this is only half a solution, since the HDFS storage
>> of EMR is ephemeral and we do not have as much space available as in S3.
>> Which data does Kylin store under the /kylin path? Are the HBase tables
>> stored in this folder?
>>
>> We would appreciate your help,
>>
>> Roberto
>> --
>>
>> *Roberto Tardío Olmos*
>> *Senior Big Data & Business Intelligence Consultant*
>> Avenida de Brasil, 17, Planta 16. 28020 Madrid
>> Fijo: 91.788.34.10
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
> --
>
> *Roberto Tardío Olmos*
> *Senior Big Data & Business Intelligence Consultant*
> Avenida de Brasil, 17, Planta 16. 28020 Madrid
> Fijo: 91.788.34.10
>



-- 
Best regards,

Shaofeng Shi 史少锋


Re: Issues with Kylin with EMR and S3

2017-11-09 Thread Roberto Tardío

Hi,

EMR version is 5.7 and the Kylin version is 2.1. We have changed
kylin.env.hdfs-working-dir to s3://your-bucket/kylin but *we have not
changed **kylin.storage.hbase.cluster-fs to the same S3 bucket*. Could
it be because we did not change this *kylin.storage.hbase.cluster-fs
*parameter to S3?


We have also tried the latest version of Kylin (2.2). In this case the
build job starts but the first step gets stuck with no errors or warnings
in the log files. Maybe we are doing something wrong. We are going to try
setting *kylin.storage.hbase.cluster-fs *to S3 tomorrow.

Other details about our architecture:

 * Kylin 2.1 (also tried with 2.2) on a separate ec2 machine, with
   Hadoop CLI for EMR and access to HDFS (EMR ephemeral) and S3.
 * EMR 5.7 cluster (1 master and 4 cores)
     o HBase on S3
     o Hive warehouse on S3 and metastore configured on MySQL in the
       ec2 machine (the same where Kylin runs)
     o HDFS
     o S3 with EMRFS
     o ZooKeeper.

I will give you feedback about tomorrow new tests.

Many thanks ShaoFeng!


On 09/11/2017 at 1:12, ShaoFeng Shi wrote:

Hi Roberto,

What's your EMR version? I know that in the 4.x versions, EMR's Hive has a
problem with "insert overwrite" over S3, which is just what Kylin needs
in the "redistribute flat hive table" step. You can also skip the
"redistribute" step by setting
"kylin.source.hive.redistribute-flat-table=false" in kylin.properties.
(On EMR 5.7, there is no such issue.)


The second option is: set "kylin.env.hdfs-working-dir" to local HDFS,
and "kylin.storage.hbase.cluster-fs" to an S3 bucket (HBase data also
on S3). Kylin will build the cube on HDFS, then output HFiles to S3,
and finally load them into HBase on S3. This will gain better build
performance and also keep cube data in S3 for high availability and
durability. But if you stop EMR, the intermediate cuboid files will be
lost, which means the segments couldn't be merged.


The third option is to use a newer version like EMR 5.7, and use S3 as
the working dir (with HBase also on S3).


For all the scenarios, please use Kylin v2.2, which includes the fix
of KYLIN-2788.





2017-11-09 3:45 GMT+08:00 Roberto Tardío :


Hi,

We have deployed Kylin on an ec2 machine using an EMR cluster. After
adding the "hbase.zookeeper.quorum" property to
kylin_job_conf.xml, we have successfully built the sample cube.
However, Kylin data is stored on the HDFS path /kylin. Because HDFS
is ephemeral storage on EMR and will be erased if you terminate
the cluster (e.g. to save costs, to change the kind of
instances, ...), we have to store the data on S3.

With this aim we changed the 'kylin.env.hdfs-working-dir' property to
S3, like s3://your-bucket/kylin. But after this change, if we try
to build the sample cube, the build job starts but gets stuck in
step 2 "Redistribute Flat Hive Table". We have checked that this
step never starts, and the Kylin logs do not show any error or warning.

Do you have any idea how to solve this and make Kylin work with S3?

So far the only solution we have found is to copy the HDFS folder
to S3 before terminating the EMR cluster and copy it back from S3 to
HDFS when it is turned on. However, this is only half a solution, since
the HDFS storage of EMR is ephemeral and we do not have as much space
available as in S3. Which data does Kylin store under the /kylin path?
Are the HBase tables stored in this folder?

We would appreciate your help,

Roberto

-- 

*Roberto Tardío Olmos*
/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Fijo: 91.788.34.10




--
Best regards,

Shaofeng Shi 史少锋



--

*Roberto Tardío Olmos*

/Senior Big Data & Business Intelligence Consultant/
Avenida de Brasil, 17, Planta 16. 28020 Madrid
Fijo: 91.788.34.10