Hive table design (multiple fact tables or rolled up)

2017-03-08 Thread Sonny Heer
Hi I'm somewhat new to Kylin. we have a relational db schema imported into hive as is at the moment. The schema is highly normalized with lots of tables. I can see this database having multiple fact tables or a handful of fact tables. In Kylin I see when creating a model (star) you have the opt

Re: Hive table design (multiple fact tables or rolled up)

2017-03-08 Thread Sonny Heer
from Kylin perspective which is better. have one-to-many in a single table or some normalized form? On Wed, Mar 8, 2017 at 4:24 PM, Billy Liu wrote: > please check star schema first: https://en.wikipedia.org/wiki/Star_schema > > 2017-03-08 12:48 GMT-08:00 Sonny Heer : > >> Hi I

Re: Hive table design (multiple fact tables or rolled up)

2017-03-08 Thread Sonny Heer
to clarify in my use case the data can be organized to either have a couple fact tables or a large single one. Queries are open ended at this point. queries may cross facts or may not. On Wed, Mar 8, 2017 at 5:13 PM, Sonny Heer wrote: > Let me put it another way. assume a SALES table an

Re: Hive table design (multiple fact tables or rolled up)

2017-03-09 Thread Sonny Heer
2.x, it supports multiple Fact tables in one Cube; You don't > need to create an additional view or flat table, just use the original names. > > 2017-03-09 9:45 GMT+08:00 Sonny Heer : > >> to clarify in my use case the data can be organized to either have a >> couple fact ta

Re: Hive table design (multiple fact tables or rolled up)

2017-03-13 Thread Sonny Heer
How about Hive view fed into kylin vs materialized table...performance impact? On Fri, Mar 10, 2017 at 1:34 AM, ShaoFeng Shi wrote: > One Cube is one topic, all activities like build and ACL are managed by > cube; one query should only hit one cube; cross-cube queries aren't > recommended. > -

Re: Hive table design (multiple fact tables or rolled up)

2017-03-15 Thread Sonny Heer
wrote: > View might be slower, but it is flexible. This is a tradeoff. > > 2017-03-14 4:15 GMT+08:00 Sonny Heer : > >> How about Hive view fed into kylin vs materialized table...performance >> impact? >> >> On Fri, Mar 10, 2017 at 1:34 AM, ShaoFeng Shi >> wrot

Re: Hive table design (multiple fact tables or rolled up)

2017-03-16 Thread Sonny Heer
o update existing cube since the underlying data > changed, please click "Refresh" on the Cube. > > 2017-03-15 13:23 GMT-07:00 Sonny Heer : > >> Thanks ShaoFeng, >> >> views is what we are doing now. How does kylin handle >> new/updated/deleted records?

Drill integration ?

2017-03-16 Thread Sonny Heer
Is there any support for Apache Drill within Kylin? Any tickets for work done around this yet? Thanks

Re: Drill integration ?

2017-03-16 Thread Sonny Heer
tical queries) with more coverage (drill).(e.g. cases of no realization found or cross cube join)... On Thu, Mar 16, 2017 at 12:20 PM, Billy Liu wrote: > Hi Heer, > > May I ask more about this proposal, the benefits or use case? > > 2017-03-16 7:58 GMT-07:00 Sonny Heer : >

Create Intermediate Flat Hive Table slow

2017-05-19 Thread Sonny Heer
Kylin users, We use hive views to feed kylin. This worked fine until we added more tables to the hive join. Now Kylin never finishes from Step#1. I understand a really large view will take time, but which properties should I look at in order to provide more resources for this step? --

Error using GlobalDictionary

2017-06-03 Thread Sonny Heer
Kylin version 1.6.0 Our data has High cardinality columns that require count distinct measures. Therefore using GlobalDictionary. The .index file exists on HDFS, but kylin errors out with FileNotFound exception (see below). RowKey is set to "dict". Any ideas if this is a known issue or somethi

Re: Error using GlobalDictionary

2017-06-04 Thread Sonny Heer
Any ideas on this? Not sure, but it appears the dictionary is looking on local FS vs HDFS? ...what am I missing here? On Sat, Jun 3, 2017 at 9:05 PM, Sonny Heer wrote: > Kylin version 1.6.0 > > Our data has High cardinality columns that require count distinct > measures. The

Re: Error using GlobalDictionary

2017-06-04 Thread Sonny Heer
of HDFS. You need > to check whether the proper core-site.xml (with hdfs as default file system) > is used by kylin. Or you can upgrade to kylin 2.0 to see whether it works. > > 2017-06-05 11:25 GMT+08:00 Sonny Heer : > >> Any ideas on this? Not sure, but appears the dictionary i
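The core-site.xml check suggested above can be sketched as a small script. This is only an illustration, not part of Kylin: the helper names `default_fs` and `uses_hdfs` and the sample host `namenode.example` are made up for the example. It assumes the standard Hadoop property name `fs.defaultFS` (with the older `fs.default.name` as a fallback) and simply reports which default file system a given core-site.xml declares.

```python
import xml.etree.ElementTree as ET


def default_fs(core_site_xml):
    """Return the default file system declared in core-site.xml text, or None.

    Looks for fs.defaultFS first, then the legacy fs.default.name.
    """
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props.get("fs.defaultFS") or props.get("fs.default.name")


def uses_hdfs(core_site_xml):
    """True only if the declared default file system is an hdfs:// URI."""
    fs = default_fs(core_site_xml)
    return fs is not None and fs.startswith("hdfs://")


# Hypothetical sample content; a real check would read the core-site.xml
# that is actually on Kylin's classpath.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example:8020</value>
  </property>
</configuration>"""

print(default_fs(SAMPLE), uses_hdfs(SAMPLE))
```

If the property is missing, the function returns None; in that situation Hadoop falls back to the local file system, which would match the symptom of the dictionary being looked up on local FS instead of HDFS.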

Re: Error using GlobalDictionary

2017-06-04 Thread Sonny Heer
overs/adds core-site to the classpath. I'm a little surprised no one has run into this yet...am i missing something? On Sun, Jun 4, 2017 at 10:28 PM, ShaoFeng Shi wrote: > @kangkaisen, kaisen, any idea about this error? > > 2017-06-05 13:00 GMT+08:00 Sonny Heer : > >> wher

Re: Error using GlobalDictionary

2017-06-05 Thread Sonny Heer
Does KYLIN-2192 fix this? Anyone run into this? On Sun, Jun 4, 2017 at 10:33 PM, Sonny Heer wrote: > Looks like a new hadoop conf is initialized and the hadoop FileSystem > object is used after that: > > Configuration conf = new Configuration(); > > (FileSystem.get(file

Re: Error using GlobalDictionary

2017-06-05 Thread Sonny Heer
t > find problem. In Meituan.com, there are many Cubes using the > GlobalDictionary to implement the precise distinct count, and they run well. I > still suggest you check the environment configurations. > > 2017-06-05 21:31 GMT+08:00 Sonny Heer : > >> Does KYLIN-2192 fix this? An

Re: Error using GlobalDictionary

2017-06-05 Thread Sonny Heer
is used when running Kylin. You can check whether > your *-site.xml are in the classpath and in a front position. > > 2017-06-05 22:18 GMT+08:00 Sonny Heer : > >> okay - checking. Is he running on HDP 2.4 and 1.6.0 version of kylin? >> Where does kylin add core-site to CP? in

Re: Error using GlobalDictionary

2017-06-05 Thread Sonny Heer
:26 AM, ShaoFeng Shi wrote: > Hi Sonny, I need more info: > 1) where you see this error trace, in kylin.log or in Mapreduce's log? > 2) what's the configuration of "kylin.hdfs.working.dir" in > conf/kylin.properties? > > 2017-06-05 23:58 GMT+08:00 Sonny Heer

AppendTrieDictionary with GlobalDictionary 1.6

2017-06-21 Thread Sonny Heer
After finally getting the global dictionary to work with building the cube there are now exceptions during query. ERROR in query: "AppendTrieDictionary can't retrive value from id" Here is where it ends up in the code::: -> @Override final protected T getValueFromIdImpl(int id) {

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-21 Thread Sonny Heer
ionary, as it can only encode a > String to an integer, it doesn't support decoding the String from an integer. > The main usage for GlobalDictionary is the precise Count Distinct, as > bitmap only accepts integer as input, so Kylin uses the GD to do the > conversion. > > 2017
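The encode-only behaviour described above can be illustrated with a toy sketch. This is not Kylin's AppendTrieDictionary; the class `OneWayDictionary` and the function `precise_count_distinct` are hypothetical names for the example. It assumes only what the reply states: strings are mapped to integers, the integers are fed to a bitmap, and there is no way back from id to string.

```python
class OneWayDictionary:
    """Toy append-only dictionary: each new string gets the next integer id.

    It deliberately has no decode method, mirroring the point that the
    global dictionary only encodes values.
    """

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


def precise_count_distinct(values, dictionary):
    """Count distinct values exactly by setting one bit per encoded id."""
    bitmap = 0  # a Python int used as a simple bitmap
    for value in values:
        bitmap |= 1 << dictionary.encode(value)
    return bin(bitmap).count("1")


gd = OneWayDictionary()
print(precise_count_distinct(["u1", "u2", "u1", "u3"], gd))
```

Because the dictionary cannot turn an id back into a string, a column encoded this way can be counted but not returned as a dimension value, which is consistent with the query error reported in this thread.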

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-21 Thread Sonny Heer
how best kylin can handle this? should I remove it as GD and add as dim & fix length? On Wed, Jun 21, 2017 at 10:33 PM, Sonny Heer wrote: > Hi, > > No, not as a dimension. Only for Count distinct measures. > > > On Wed, Jun 21, 2017 at 10:25 PM, ShaoFeng Shi > wrot

Usage of aggregation groups

2017-06-22 Thread Sonny Heer
Hi users, I need some clarification on how to properly use aggregation groups. Assume I have report page 1 which has filters A, B, C, D. When user is in page 2, these filters are passed along to (drilldown). Page 2 has other filterable fields (1,2,3), but each is independently connected only to

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-22 Thread Sonny Heer
ct size will exceed the Java heap size. In this case, please use > fixed_length encoding; If that column is integer or long type, you can use > "integer" encoding. In the meanwhile, keep using GD for the count distinct > measure. > > 2017-06-22 13:37 GMT+08:00 Sonny Heer : >

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-22 Thread Sonny Heer
; as the encoding in the dimension, > and leave blank for the global dictionary. > > 2017-06-23 6:30 GMT+08:00 Sonny Heer : > >> Thanks ShaoFeng. >> >> so to clarify. for UHC dimension. It is integer. So i can set encoding >> to integer and then also include it

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-22 Thread Sonny Heer
"JSON(Cube)" tab. > > 2017-06-23 8:48 GMT+08:00 Sonny Heer : > >> The column has count distinct measure as well. so it still doesn't need >> GD? i tried, but appears it ran out of memory. >> >> On Thu, Jun 22, 2017 at 5:36 PM, ShaoFeng Shi >>

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-22 Thread Sonny Heer
"last_modified_time":1498183660303},"dictionary_class":null,"cardinality":0} On Thu, Jun 22, 2017 at 10:47 PM, ShaoFeng Shi wrote: > Seems Kylin still trying to build dictionary for the UHC dimension. Could > you double check the dimension encoding

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-22 Thread Sonny Heer
It's a dimension and count distinct measure. No GD On Thu, Jun 22, 2017 at 11:27 PM ShaoFeng Shi wrote: > Does the "USER_ID" column appear in other measures? > > 2017-06-23 13:57 GMT+08:00 Sonny Heer : > >> It is set to this: >> >

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-23 Thread Sonny Heer
tail metadata for troubleshooting, that is important for > analysis; otherwise we can only guess, but there are many possibilities > that could cause a problem... > > 2017-06-23 14:31 GMT+08:00 Sonny Heer : > >> It's a dimension and count distinct measure. No GD >> >>

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-23 Thread Sonny Heer
Another question. Is there any way to set properties per step in cube building? On Fri, Jun 23, 2017 at 6:56 AM, Sonny Heer wrote: > Yeah...it is. making it fix length doesn't require dict. I thought it > was int in hive, but yah its bigint. It got past that, but is now stuck

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-23 Thread Sonny Heer
If it is fix length dimension and also a count distinct measure. the hive type is bigint. then should that be building a dictionary or not? On Fri, Jun 23, 2017 at 11:08 AM, Sonny Heer wrote: > Another question. Is there any way to set properties per step in cube > building? > >

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-23 Thread Sonny Heer
12 PM, Sonny Heer wrote: > If it is fix length dimension and also a count distinct measure. the hive > type is bigint. then should that be building a dictionary or not? > > On Fri, Jun 23, 2017 at 11:08 AM, Sonny Heer wrote: > >> Another question. Is there any way to set p

Re: AppendTrieDictionary with GlobalDictionary 1.6

2017-06-26 Thread Sonny Heer
nct >> measure. then it will fall into the second for loop and it is a column >> that needs a dictionary because its bitmap (getcolumnsNeedDictionary >> method). >> >> >> It is still running out of java heap. where is this actually running? >> is it on the

Re: Usage of aggregation groups

2017-07-03 Thread Sonny Heer
deed. > > On Thu, Jun 22, 2017 at 10:49 PM, Sonny Heer wrote: > >> Hi users, >> >> I need some clarification on how to properly use aggregation groups. >> >> Assume I have report page 1 which has filters A, B, C, D. When user is >> in page 2, these filter

Move Kylin cube from one instance to another

2017-08-28 Thread Sonny Heer
What's the best way to move a kylin cube from environment1 to environment2? e.g. Hbase1 -> Hbase2. I know HBase has tools for hbase part, more interested in kylin. Also I'm not talking about the cube desc, etc. I understand Kylin has the rest call for loading cube desc. Looking to avoid building the c

Kylin acl - ldap

2017-09-16 Thread Sonny Heer
Kylin version is 1.6. Is there a way to give full access to a project? Currently we are able to give access to a project via ROLE in ldap, but that doesn't allow user to sync/load hive tables (the blue buttons are missing). Also unable to edit model. In order to give that permission we have to

Re: Kylin acl - ldap

2017-09-18 Thread Sonny Heer
have system wide impact. > > Once KYLIN-2717 <https://issues.apache.org/jira/browse/KYLIN-2717> is > done, tables are isolated by project, we will be ready to grant table > permissions to project level admin. > > On Sun, Sep 17, 2017 at 6:23 AM, Sonny Heer wrote: > >>

Re: Kylin acl - ldap

2017-11-15 Thread Sonny Heer
Any updates on this issue? Has it been fixed in later versions? we are on 1.6 On Fri, Sep 22, 2017 at 5:34 PM, Li Yang wrote: > The JIRA is good. Thanks Sonny! > > On Tue, Sep 19, 2017 at 8:46 AM, Sonny Heer wrote: > >> Here is the JIRA: https://issues.apache.org/jira

incorrect cardinality

2017-12-06 Thread Sonny Heer
We have a table in hive which has a gender column (char(1)). The group by shows the following: M 8946041 8 9 F 14215364 215400 Kylin shows: 10 GENDER char(1) 274693 Looking at the HiveColumnCardinalityJob code I don't see anything obviously wrong. Any idea why that value is wrong in th

Different backend storage (columnar)

2017-12-08 Thread Sonny Heer
We have been using Kylin 1.6 for a while now. One of the problems we continue to run into is having to maintain HBase backend. Typically regionservers go down for different reasons. We prefer to move to a columnar storage backend. I heard there was a version of kylin that replaced HBase? Any updates on

Re: Different backend storage (columnar)

2017-12-08 Thread Sonny Heer
he vendor > products. > > 2017-12-09 5:43 GMT+08:00 Sonny Heer : > >> We have been using Kylin 1.6 for a while now. One of the problems we continue to >> run into is having to maintain HBase backend. Typically regionservers go >> down for different reasons. >> >> We pr

Extract Fact Table Distinct Columns Step

2017-12-19 Thread Sonny Heer
can someone explain what step 3 does? specifically how it relates dimensions, measures, and row keys. our input fact table is about 234 million records and this step is taking forever. we have 450gb memory with 25 slots per node, which is about 225 concurrently running slots, and it's still taking

EMR to Persistent cluster

2017-12-19 Thread Sonny Heer
Has anyone used kylin in EMR and push data to S3 and finally down to persistent cluster (e.g. ec2/ambari/HDFS)? How would kylin map HBase tables to kylin project/cube? Thanks

Re: Extract Fact Table Distinct Columns Step

2017-12-19 Thread Sonny Heer
reducers; Do all mappers/reducers take a similar > time, or did some specific ones take much longer than others? > > Furthermore, for a deep dive, please provide the cube definition; We need to > know the dimension number, aggregation groups, encodings method as well as > other possible facto

Multiple versions on same cluster

2018-01-12 Thread Sonny Heer
Kylin users, Is it possible to run new version of kylin 2.x along side old version 1.6.x ? Thanks

Re: Multiple versions on same cluster

2018-01-12 Thread Sonny Heer
ot of ram memory available on the machine. > > Regards, > > El 12/01/2018 a las 18:10, Sonny Heer escribió: > > Kylin users, > > Is it possible to run new version of kylin 2.x along side old version > 1.6.x ? > > Thanks > > > -- > > *Roberto Tardío Olm

Re: running spark on kylin 2.2

2018-02-28 Thread Sonny Heer
> > > 2018-02-28 22:53 GMT+08:00 Sonny Heer : > > Anyone know what I need to set in order for spark-submit to use the HDP > > version of spark and not the internal one? > > > > currently i see: > > > > export HADOOP_CONF_DIR=/ebs/kylin/hadoop-conf &am

Re: running spark on kylin 2.2

2018-02-28 Thread Sonny Heer
I don't see spark-libs.jar under $KYLIN_HOME/spark/jars per this doc: http://kylin.apache.org/docs21/tutorial/cube_spark.html On Wed, Feb 28, 2018 at 10:30 AM, Sonny Heer wrote: > Hi Billy > Looks like the current error is this: > > Error: Could not find o

Re: running spark on kylin 2.2

2018-02-28 Thread Sonny Heer
text.table(Ljava/lang/String;)Lorg/apache/spark/sql/Dataset; at org.apache.kylin.engine.spark.SparkCubingByLayer.execute(SparkCubingByLayer.java:167 It appears it only supports spark 2.x? Please advise what we can do to make this work on HDP 2.4... Thanks On Wed, Feb 28, 2018 at 2:07 PM, Sonny H

Re: running spark on kylin 2.2

2018-02-28 Thread Sonny Heer
> >> Please use vendor's forum. >> >> Thanks >> >> Original message >> From: Sonny Heer >> Date: 2/28/18 2:35 PM (GMT-08:00) >> To: user@kylin.apache.org >> Subject: Re: running spark on kylin 2.2 >> >&g

#3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Sonny Heer
Step 3 isn't using our full cluster. How can i increase the mappers/reducers to use all the slots? Any config to look at in kylin? Thanks

Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Sonny Heer
YARN is properly configured. we use many other m/r and spark programs that utilize the full slots. It's only when building cubes. On Wed, Mar 14, 2018 at 9:46 AM, Alberto Ramón wrote: > You need check your yarn configuration first > > On Wed, 14 Mar 2018, 14:58 Sonny Heer, wr

Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Sonny Heer
8 YARN nodes with 11 slots each. each slot is configured to ~2gb. Step #3 in Kylin is launching 19 mappers and 5 reducers. 5 reducers when there are 88 slots. btw: kylin version is 1.6 On Wed, Mar 14, 2018 at 9:48 AM, Sonny Heer wrote: > YARN is properly configured. we use many other
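The numbers quoted above can be worked through in a few lines. This is plain arithmetic on the figures from the message (8 nodes, 11 slots each, 19 mappers, 5 reducers); the variable and function names are made up for the example.

```python
nodes = 8
slots_per_node = 11
total_slots = nodes * slots_per_node  # 8 * 11 = 88 slots in the cluster


def utilization(tasks, slots):
    """Fraction of available slots occupied by the given number of tasks."""
    return tasks / slots


mapper_util = utilization(19, total_slots)   # step 3 launched 19 mappers
reducer_util = utilization(5, total_slots)   # and 5 reducers

print(f"total slots: {total_slots}")
print(f"mapper phase uses {mapper_util:.0%} of the slots")
print(f"reducer phase uses {reducer_util:.0%} of the slots")
```

Even in the mapper phase, fewer than a quarter of the slots are in use, which is the gap the question is about.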

Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Sonny Heer
n 14 March 2018 at 16:54, Sonny Heer wrote: > >> 8 YARN nodes with 11 slots each. each slot is configured to ~2gb. Step >> #3 in Kylin is launching 19 mappers and 5 reducers. 5 reducers when there >> are 88 slots. >> >> btw: kylin version is 1.6 >> >&g

fast cubing 1.6?

2018-03-27 Thread Sonny Heer
Any reason why fast cubing was removed in 1.6? I see the following public enum AlgorithmEnum { LAYER, INMEM }

error 9.75 to precisely

2018-04-07 Thread Sonny Heer
Does changing only the count distinct measure from HLL to precisely make the query slower? With HLL some of our queries were sub-second, but after moving to precisely - the same queries are slow. Is this expected? How to fix? Thanks

job failures

2018-04-11 Thread Sonny Heer
we have a daily job that builds cubes. It works fine for some number of days, but at times fails with this: Error Log: org.apache.kylin.job.exception.ExecuteException: java.lang.IllegalStateException: Overwriting conflict /execute_output/a1052507-e3bc-4302-ac73-bbc169a597ff-07, expect old TS 152

Re: job failures

2018-04-12 Thread Sonny Heer
3 nodes are set to "query" and 1 is set to "all". kylin 1.6 kylin.server.mode=query kylin.server.mode=all On Wed, Apr 11, 2018 at 10:13 PM, ShaoFeng Shi wrote: > Do you have multiple Kylin "job" instances? > > 2018-04-12 12:35 GMT+08:00 Sonny Heer

kylin 1.6 alongside 2.0

2018-04-13 Thread Sonny Heer
The latest version of kylin has various properties to override the prefix for hbase table name and zookeeper locations. Our stack is on spark 1.6 - will installing Kylin 2.0 on the same cluster as 1.6 cause any issues (HBase metadata / tables etc.)? Note: those prefix properties do not appear to

Re: kylin 1.6 alongside 2.0

2018-04-13 Thread Sonny Heer
Yes we currently have 1.6.0 Hbase1x. Would like to test things on same cluster with 2.0 On Fri, Apr 13, 2018 at 9:02 AM Ted Yu wrote: > bq. Our stack is on spark 1.6 > > Did you mean that you have Kylin 1.6 in your cluster ? > > Cheers > > On Fri, Apr 13, 2018 at 8:48

kylin metadata on single server

2018-04-17 Thread Sonny Heer
Not sure if this is normal or not, but I see kylin metadata is on a single region server (DN & RS on node). if this datanode goes down... it appears kylin isn't able to pull jobs for monitor or complete jobs? hbase requests: kylin_metadata,,1467291011009.2cc83fc3fb51700a8a9884e5c5401e20. 553455

Re: kylin metadata on single server

2018-04-17 Thread Sonny Heer
OK it does move to another RegionServer. we're doing more testing, but it appears DN that hosts the kylin_metadata goes down sometimes. Sometimes the same job succeeds... On Tue, Apr 17, 2018 at 10:36 AM, Sonny Heer wrote: > Not sure if this is normal or not, but I see kylin metadata

Re: kylin metadata on single server

2018-04-17 Thread Sonny Heer
d=713832, waitTime=60001, operationTimeout=6 expired. at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70) at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204) ... 22 more On Tue, Apr 17, 2018 at 10:44 AM, Sonny Heer wrote: > OK it does move to another RegionServer. we're doing

multiple EMRs sync

2018-08-02 Thread Sonny Heer
Is it possible in the new version of kylin to have multiple EMR clusters with Kylin installed on master node but talking to the same S3 location. e.g. one Write EMR cluster and one Read EMR cluster ?

Re: multiple EMRs sync

2018-08-03 Thread Sonny Heer
e approach, but that is prone to errors as emr libs have to be copied around.. ref: https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/ Anyone else have experience or can share their use case on emr? Thanks! On Thu, Aug 2, 2018 at 2:32 PM Sonny Heer wr

Re: multiple EMRs sync

2018-08-05 Thread Sonny Heer
: > Hi Sonny, > > EMR HBase read replica is a great feature, but we didn't try it. Are you > going to use this feature? or just want to deploy Kylin as a cluster? > > If putting Kylin metadata to RDS, can it be easier for you? > > 2018-08-04 0:05 GMT+08:00 Sonny Heer : > &g

Re: multiple EMRs sync

2018-08-06 Thread Sonny Heer
ot tested with EMR, but I > think they are similar. > > > 2018-08-06 10:55 GMT+08:00 Sonny Heer : > >> Yea that would be great if Kylin can have a centralized metastore in RDS. >> >> The big problem for us now is this: >> >> 2 emr clusters each running kylin on

Re: multiple EMRs sync

2018-08-06 Thread Sonny Heer
ShaoFeng, Is Strikingly open to sharing their work? It appears our use case is similar and would love to see what work they have matches ours. On Mon, Aug 6, 2018 at 7:01 AM Sonny Heer wrote: > Does that require a HA cluster & kylin installed on its own instance? EMR > doesn

Re: multiple EMRs sync

2018-08-07 Thread Sonny Heer
; > I'm afraid we don't have a video (even if there is one, it will be in Chinese > which I think won't be helpful). Our docker file hasn't been open sourced yet. I > will follow the progress and notify you if there is any news. > > On Aug 7, 2018, 11:12 PM +0800, Sonny

Kylin metadata on different store

2018-08-09 Thread Sonny Heer
Has anyone done any work around moving kylin metadata off of hbase? We'd like to utilize EMR hbase read replica option with kylin, but kylin writes to hbase even from query nodes. Thoughts?

Re: Kylin metadata on different store

2018-08-09 Thread Sonny Heer
On Thu, Aug 9, 2018 at 7:12 PM ShaoFeng Shi wrote: > Hi Sonny, > > We have a JDBC metadata store for Kylin (support MySQL/SQLServer); I think > that can address your problem. If the community has the need, we can > opensource it into Kylin. > > 2018-08-10 7:21 GMT+08:00 So

EMR autoscale kylin metadata issues

2018-08-28 Thread Sonny Heer
while using EMR and autoscaling during multiple cube builds. Some builds intermittently fail with the following exception and halt that cube build. Typically restarting the build completes successfully. --- 5d388edae3b13f9f1d6a709720cc5378. is closing at org.apache.hadoop.hbase.reg

Re: EMR autoscale kylin metadata issues

2018-08-28 Thread Sonny Heer
n.run(BlockingRpcConnection.java:334) ... 1 more On Tue, Aug 28, 2018 at 7:59 AM Sonny Heer wrote: > while using EMR and autoscaling during multiple cube builds. > > Some builds intermittently fail with the following exception and halt that > cube build. Typically restarting th

Spark cubing on EMR

2018-08-28 Thread Sonny Heer
Unable to build cube at step "#6 Step Name: Build Cube with Spark" Looks to be a classpath issue with spark not able to find some amazon emr libs. when i look in spark defaults /etc/spark/conf i do see the classpath being set correctly. any ideas? - Exception in thread "main" java

Re: Spark cubing on EMR

2018-08-28 Thread Sonny Heer
) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) On Tue, Aug 28, 2018 at 8:11 AM Sonny Heer wrote: > Unable

Re: Spark cubing on EMR

2018-08-28 Thread Sonny Heer
our reference. As > EMR version keeps changing, there might be other cases. > > Please let me know if it works. I can add this piece to the documentation > once it is verified. > > 2018-08-29 6:04 GMT+08:00 Sonny Heer : > >> After fixing the above issue by updating spark_hom

Re: Spark cubing on EMR

2018-08-29 Thread Sonny Heer
-configure.html Thanks On Tue, Aug 28, 2018 at 8:17 PM Sonny Heer wrote: > yeah seems that way. I did copy over the spark-defaults.conf from EMR to > KYLIN_HOME/spark/conf > > e.g. > > spark.driver.extraClassPath > :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/u

Re: Spark cubing on EMR

2018-08-30 Thread Sonny Heer
his work has not been kicked off.) > > For the Spark cubing optimization, I uploaded the slides we presented at the Kylin > Meetup @Shanghai, hope it is helpful to you: > https://www.slideshare.net/ShiShaoFeng1/spark-tunning-in-apache-kylin > > 2018-08-30 13:39 GMT+08:00 Sonny Heer :