Syntax error in Kylin - even though query works in Hive

2016-09-06 Thread Something Something
Here's a query that works in Hive, but gives a syntax error in Kylin. Any
idea why?

select part_dt, kylin_sales.LEAF_CATEG_ID, LSTG_FORMAT_NAME, LSTG_SITE_ID,
META_CATEG_NAME, sum(price), max(price), min(price) from kylin_sales INNER
JOIN DEFAULT.KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS ON
KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND
KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID group by
part_dt, kylin_sales.LEAF_CATEG_ID, LSTG_FORMAT_NAME, LSTG_SITE_ID,
META_CATEG_NAME order by part_dt, kylin_sales.LEAF_CATEG_ID,
LSTG_FORMAT_NAME, LSTG_SITE_ID, META_CATEG_NAME;


Error message:

Encountered "DEFAULT" at line 1, column 156. Was expecting one of:
 ...  ...  ...
 ...  ... "LATERAL"
... "(" ... "UNNEST" ... "TABLE" ...
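
One possible reading of this error, offered as an assumption rather than something confirmed in this thread: Kylin parses SQL with Apache Calcite, and DEFAULT is a reserved word there, so an unquoted DEFAULT schema name is rejected at the point where a table reference is expected. The Python sketch below only checks that column 156 of the one-line query is the DEFAULT token and builds a candidate rewrite with the schema name double-quoted. `token_at` is a hypothetical helper, and the rewritten query has not been run against Kylin.

```python
import re

# The query from the message, collapsed onto one line with single spaces.
# That the parser received it in this form ("line 1") is an assumption.
QUERY = " ".join("""
select part_dt, kylin_sales.LEAF_CATEG_ID, LSTG_FORMAT_NAME, LSTG_SITE_ID,
META_CATEG_NAME, sum(price), max(price), min(price) from kylin_sales INNER
JOIN DEFAULT.KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS ON
KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND
KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID group by
part_dt, kylin_sales.LEAF_CATEG_ID, LSTG_FORMAT_NAME, LSTG_SITE_ID,
META_CATEG_NAME order by part_dt, kylin_sales.LEAF_CATEG_ID,
LSTG_FORMAT_NAME, LSTG_SITE_ID, META_CATEG_NAME;
""".split())


def token_at(sql, column):
    """Return the word that starts at a 1-based column, or None."""
    match = re.compile(r"\w+").match(sql, column - 1)
    return match.group(0) if match else None


# Candidate rewrite: double-quote the schema name so that it is read as an
# identifier instead of a keyword.
FIXED = QUERY.replace("JOIN DEFAULT.", 'JOIN "DEFAULT".')

print(token_at(QUERY, 156))
print(FIXED)
```

Dropping the schema prefix altogether (joining on KYLIN_CATEGORY_GROUPINGS alone) may be another option, again untested here.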


Re: SEVERE logs when installing Kylin-Hbase1.x on a HDP cluster

2016-09-06 Thread udana pathirana
Greetings,

Could someone give me some tips on resolving this issue please.

1) Do I have to fix the warning about the capacity-scheduler JARs? (According
to our Hadoop admins, this folder does not exist on any of the nodes. We
are running Hortonworks 2.4.2.)
 "WARNING: Failed to process JAR
 [jar:file:/home/udana/hdp_c5000/hadoop-2.7.1.2.4.2.0-258/contrib/capacity-scheduler/*.jar!/]"

2) How can I get detailed logs for these errors for further investigation?
3) I can successfully access the 'hadoop', 'hbase' and 'hive' commands from the
client node. Do I need any other special settings to install Kylin?
4) Our cluster is secured. Do I have to set any special Kerberos-related
settings for Kylin? (I have keytab files and krb5.conf for the Hadoop client
node.)

Best regards,


On Tue, Sep 6, 2016 at 10:53 AM, udana pathirana 
wrote:

> Greetings,
>
> I am trying to install "apache-kylin-1.5.3-HBase1.x-bin" on one of the
> client nodes (edge node).
> Our Hadoop cluster is HDP 2.4.2.
> HBase version is 1.x running on Slider.
> I can successfully access the hive, hbase and hadoop commands from the
> client node.
>
> When I try to start Kylin I get the following SEVERE log messages, and
> cannot see any UI when I access port 7070.
> Why am I getting these SEVERE logs?
> How can I see more detailed error logs? (I cannot find useful logs in the
> tomcat/logs folder.)
>
>
>
> --
>
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
> MaxPermSize=128M; support was removed in 8.0
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/home/udana/hdp_
> c5000/hbase-1.1.2.2.3.4.7-4/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/udana/hdp_
> c5000/hadoop-2.7.1.2.4.2.0-258/share/hadoop/common/lib/
> slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/udana/hdp_
> c5000/tez-0.7.0.2.4.2.0-258/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/udana/hdp_
> c5000/spark-2.10-1.6.1.2.4.2.0-258/lib/spark-examples-1.6.
> 1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/home/udana/hdp_
> c5000/spark-2.10-1.6.1.2.4.2.0-258/lib/spark-assembly-1.6.
> 1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/org/slf4j/impl/
> StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> usage: java org.apache.catalina.startup.Catalina [ -config {pathname} ] [
> -nonaming ]  { -help | start | stop }
> Sep 06, 2016 1:47:29 AM org.apache.catalina.core.AprLifecycleListener
> lifecycleEvent
> INFO: The APR based Apache Tomcat Native library which allows optimal
> performance in production environments was not found on the
> java.library.path: /home/udana/hdp_c5000/hadoop-
> 2.7.1.2.4.2.0-258/lib/native
> Sep 06, 2016 1:47:30 AM org.apache.coyote.AbstractProtocol init
> INFO: Initializing ProtocolHandler ["http-bio-7070"]
> Sep 06, 2016 1:47:30 AM org.apache.coyote.AbstractProtocol init
> INFO: Initializing ProtocolHandler ["ajp-bio-9009"]
> Sep 06, 2016 1:47:30 AM org.apache.catalina.startup.Catalina load
> INFO: Initialization processed in 545 ms
> Sep 06, 2016 1:47:30 AM org.apache.catalina.core.StandardService
> startInternal
> INFO: Starting service Catalina
> Sep 06, 2016 1:47:30 AM org.apache.catalina.core.StandardEngine
> startInternal
> INFO: Starting Servlet Engine: Apache Tomcat/7.0.69
> Sep 06, 2016 1:47:30 AM org.apache.catalina.startup.HostConfig deployWAR
> INFO: Deploying web application archive /home/udana/apache-kylin-1.5.
> 3-HBase1.x-bin/tomcat/webapps/kylin.war
> Sep 06, 2016 1:47:30 AM org.apache.tomcat.util.scan.StandardJarScanner
> scan
> WARNING: Failed to scan [file:/home/udana/hdp_c5000/
> hadoop-2.7.1.2.4.2.0-258/contrib/capacity-scheduler/*.jar] from
> classloader hierarchy
> java.io.FileNotFoundException: /home/udana/hdp_c5000/hadoop-
> 2.7.1.2.4.2.0-258/contrib/capacity-scheduler/*.jar (No such file or
> directory)
> at java.util.zip.ZipFile.open(Native Method)
> at java.util.zip.ZipFile.<init>(ZipFile.java:219)
> at java.util.zip.ZipFile.<init>(ZipFile.java:149)
> at java.util.jar.JarFile.<init>(JarFile.java:166)
> at java.util.jar.JarFile.<init>(JarFile.java:103)
> at sun.net.www.protocol.jar.URLJarFile.<init>(URLJarFile.java:93)
> at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:69)
> at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:99)
> at sun.net.www.protocol.jar.JarURLConnection.connect(
> JarURLConnection.java:122)
> at sun.net.www.protocol.jar.JarURLConnection.getJarFile(
> JarURLConnection.java:89)
> at org.apache.tomcat.util.scan.FileUrlJar.<init>(FileUrlJar.java:41)
> at org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:34)
> at 

RE: Empty Cube

2016-09-06 Thread Niasari, Mehrdad
The range is up to 2015 and that’s not the problem.

The only query that doesn’t return a zero/empty result is one on table 
kylin_cal_dt. When I delete this table from Hive, the result is still not empty 
and Kylin doesn’t complain! Where does Kylin get the data for this table when 
it’s no longer in Hive?
Also, I checked the cube itself in HBase and it’s empty. Any other suggestions? 
Could it be caused by some wrong configuration in connecting Kylin to 
Hive/HBase? If that’s the case, why can it create the cube successfully?

Also, I have a question regarding the uuid. What is it for?
I’m asking because I don’t fully understand why you have used a fixed 
value in the sample cube JSON files (for example, for kylin_sales_cube the uuid is 
“2fbca32a-a33e-4b69-83dd-0bb8b1f8c53b”). How come it doesn’t conflict if you 
run the sample cube multiple times?

Thanks,
Mehrdad

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: 2016, September, 01 9:01 PM
To: user
Subject: Re: Empty Cube

OK. The table "kylin_sales" contains data for 2012 and 2013, so you need to make 
sure the build range covers these years. Please check; I guess it is some 
minor mistake. BTW, an empty cube will return zero since it doesn't have data.

2016-09-02 1:34 GMT+08:00 Niasari, Mehrdad:
Thanks for the reply.

There is no error in the log. I also checked the Hive tables and they are not 
empty; I can run any query on them using Hue and they work well.

Also, the size of the cube in the cubes section is 0.00KB! But even with an empty 
cube the result of queries on the tables should not be empty/zero, right?
Any other suggestions on how to debug it?

Thanks,
Mehrdad

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: 2016, August, 31 9:04 PM
To: user
Subject: Re: Empty Cube

Hi Mehrdad,

When you run the query, is there any error reported in the logs/kylin.log?

If there is no error, please check whether the source Hive table has data. You 
can do this easily in Hive by running the HQL "select * from default.kylin_sales 
limit 100". It is very likely that "sample.sh" failed to upload the file; if 
that is the case, you may need to manually import the data files 
(sample_cube/data) into Hive.

2016-09-01 4:06 GMT+08:00 Niasari, Mehrdad:
Hi All,

When I run the sample cube of Kylin the result is empty, i.e. the cube is 
built successfully and its status is ready, but nothing is in the cube and every 
query returns zero.
Do you have any suggestions as to what could cause the problem?
How should I debug it?

Thanks,
Mehrdad



___

If you received this email in error, please advise the sender (by return email 
or otherwise) immediately. You have consented to receive the attached 
electronically at the above-noted email address; please retain a copy of this 
confirmation for future reference.




--
Best regards,

Shaofeng Shi


--
Best regards,

Shaofeng Shi



Re: Query worked in Hive but not in Kylin

2016-09-06 Thread ShaoFeng Shi
Roberto is correct: although "LEAF_CATEG_ID" was not explicitly defined as
a dimension, it was added as one, since the other "derived" dimensions
need to be derived from it on the fly (it is part of the PK).

Thank you, Roberto!

2016-09-06 22:22 GMT+08:00 Roberto Tardío Olmos :

> Hi,
>
> I think this is because in kylin_sales_cube you define a derived dimension
> from DEFAULT.KYLIN_CATEGORY_GROUPINGS. Because of this, the derived
> dimension columns (USER_DEFINED_FIELD1, USER_DEFINED_FIELD3, UPD_DATE,
> UPD_USER) are not included in the HBase MOLAP cube. They are derived "on the
> fly" (I have my doubts about the inner workings) from the FKs, LEAF_CATEG_ID
> and SITE_ID, that you have defined in the Kylin data model. These columns,
> LEAF_CATEG_ID and SITE_ID, are actually included in the MOLAP cube. Thus, you
> can use them as dimensions in your queries.
>
> Regards,
>
>
>
> On 06/09/2016 at 9:50, Something Something wrote:
>
> When I ran this:
>
> select part_dt, LEAF_CATEG_ID, SELLER_ID, sum(price), max(price),
> min(price) from kylin_sales group by part_dt, LEAF_CATEG_ID, SELLER_ID
> order by part_dt, LEAF_CATEG_ID, SELLER_ID;
>
> It worked in Hive but not in Kylin. After debugging I realized it was
> because 'SELLER_ID' is not a Dimension so I tried this & it worked:
>
> select part_dt, LEAF_CATEG_ID, LSTG_FORMAT_NAME, sum(price), max(price),
> min(price) from kylin_sales group by part_dt, LEAF_CATEG_ID,
> LSTG_FORMAT_NAME order by part_dt, LEAF_CATEG_ID, LSTG_FORMAT_NAME;
>
> 'cause LSTG_FORMAT_NAME is a Dimension.
>
>
>
> Which begs the question: even though LEAF_CATEG_ID is NOT a dimension, how
> come Kylin didn't complain? Clearly, I am not understanding something basic.
> Sorry for the dumb question, but please help. Thanks.
>
>
> --
> *Roberto Tardío Olmos*
> *Big Data & Business Intelligence Consultant*
>
> Avenida de Brasil, 17, Planta 16. 28020 Madrid Fijo: 91.788.34.10
>



-- 
Best regards,

Shaofeng Shi


Re: Query worked in Hive but not in Kylin

2016-09-06 Thread Roberto Tardío Olmos

Hi,

I think this is because in kylin_sales_cube you define a derived dimension 
from DEFAULT.KYLIN_CATEGORY_GROUPINGS. Because of this, the derived 
dimension columns (USER_DEFINED_FIELD1, USER_DEFINED_FIELD3, UPD_DATE, 
UPD_USER) are not included in the HBase MOLAP cube. They are derived "on 
the fly" (I have my doubts about the inner workings) from the FKs, 
LEAF_CATEG_ID and SITE_ID, that you have defined in the Kylin data model. 
These columns, LEAF_CATEG_ID and SITE_ID, are actually included in the 
MOLAP cube. Thus, you can use them as dimensions in your queries.
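
To make the idea concrete, here is a toy Python sketch, not Kylin's actual code: the cube holds aggregates keyed by the foreign keys, and a derived column such as UPD_DATE is resolved from a copy of the lookup table when the query runs. That Kylin keeps such a copy is an assumption here, `group_by_derived` is a hypothetical helper, and all rows and values are made up for illustration.

```python
from collections import defaultdict

# Pre-aggregated "cube" rows keyed by the foreign keys stored in the cube:
# (LEAF_CATEG_ID, LSTG_SITE_ID) -> sum(price). Values are made up.
CUBE = {
    (1001, 0): 50.0,
    (1002, 0): 30.0,
    (1003, 3): 20.0,
}

# Copy of the lookup table KYLIN_CATEGORY_GROUPINGS keyed by its PK
# (LEAF_CATEG_ID, SITE_ID). Only one derived column is shown; values are made up.
LOOKUP = {
    (1001, 0): {"UPD_DATE": "2012-06-14"},
    (1002, 0): {"UPD_DATE": "2012-06-14"},
    (1003, 3): {"UPD_DATE": "2012-09-11"},
}


def group_by_derived(column):
    """Aggregate sum(price) by a derived column resolved at query time."""
    totals = defaultdict(float)
    for foreign_key, price_sum in CUBE.items():
        derived_value = LOOKUP[foreign_key][column]
        totals[derived_value] += price_sum
    return dict(totals)


print(group_by_derived("UPD_DATE"))
```

In this picture the FK columns must be present in the cube for the lookup to work, which would explain why they can be queried like ordinary dimensions.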


Regards,



On 06/09/2016 at 9:50, Something Something wrote:

When I ran this:

select part_dt, LEAF_CATEG_ID, SELLER_ID, sum(price), max(price), 
min(price) from kylin_sales group by part_dt, LEAF_CATEG_ID, SELLER_ID 
order by part_dt, LEAF_CATEG_ID, SELLER_ID;


It worked in Hive but not in Kylin. After debugging I realized it was 
because 'SELLER_ID' is not a Dimension so I tried this & it worked:


select part_dt, LEAF_CATEG_ID, LSTG_FORMAT_NAME, sum(price), 
max(price), min(price) from kylin_sales group by part_dt, 
LEAF_CATEG_ID, LSTG_FORMAT_NAME order by part_dt, LEAF_CATEG_ID, 
LSTG_FORMAT_NAME;


'cause LSTG_FORMAT_NAME is a Dimension.



Which begs the question: even though LEAF_CATEG_ID is NOT a dimension, 
how come Kylin didn't complain? Clearly, I am not understanding 
something basic. Sorry for the dumb question, but please help. Thanks.


--
*Roberto Tardío Olmos*
/Big Data & Business Intelligence Consultant/

Avenida de Brasil, 17, Planta 16.

28020 Madrid

Fijo: 91.788.34.10



Re: Documentation on how data is stored

2016-09-06 Thread Alberto Ramón
I don't have more info about this.

But see KYLIN-1453 (v1.5.2).
Sharding must be a great feature (and it affects the row key composition):
 - before: used a hash of the key
 - now: uses a hash of a column

To be honest, I have many doubts too :)

2016-09-06 9:44 GMT+02:00 Something Something :

> Hmm... that's a good start... but is there more info available somewhere?
> Can you direct me to that PPT? Thanks.
>
> On Tue, Sep 6, 2016 at 12:16 AM, Alberto Ramón 
> wrote:
>
>> I have this picture: (I found this info in a PPT)
>>
>> [image: Inline images 1]
>>
>> Remember that you can encode a dimension by dictionary or fixed length.
>>
>>
>> 2016-09-06 1:57 GMT+02:00 Something Something :
>>
>>> Hello,
>>>
>>> Is there any documentation available on how Kylin stores data on HBase?
>>> For example, I am trying to understand how data is stored on HBase when I
>>> run bin/sample.sh to create the "learn_kylin" project.
>>>
>>> I looked at the HBase table for the Cube. It has 2 column families but I
>>> don't understand what goes where in this table after Cube is built.
>>>
>>> I set up 'remote debugging' to debug the code, but the QueryService code
>>> seems to be out of sync between the binary release
>>> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-HBase1.x-bin.tar.gz)
>>> and the source code
>>> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-src.tar.gz)
>>>
>>> I will keep debugging but if any documentation about "how data is
>>> stored" (UML diagram or something) is available, please share.
>>>
>>> Thanks.
>>>
>>
>>
>


Query worked in Hive but not in Kylin

2016-09-06 Thread Something Something
When I ran this:

select part_dt, LEAF_CATEG_ID, SELLER_ID, sum(price), max(price),
min(price) from kylin_sales group by part_dt, LEAF_CATEG_ID, SELLER_ID
order by part_dt, LEAF_CATEG_ID, SELLER_ID;

It worked in Hive but not in Kylin. After debugging I realized it was
because 'SELLER_ID' is not a Dimension so I tried this & it worked:

select part_dt, LEAF_CATEG_ID, LSTG_FORMAT_NAME, sum(price), max(price),
min(price) from kylin_sales group by part_dt, LEAF_CATEG_ID,
LSTG_FORMAT_NAME order by part_dt, LEAF_CATEG_ID, LSTG_FORMAT_NAME;

'cause LSTG_FORMAT_NAME is a Dimension.



Which begs the question: even though LEAF_CATEG_ID is NOT a dimension, how
come Kylin didn't complain? Clearly, I am not understanding something basic.
Sorry for the dumb question, but please help. Thanks.


Re: Documentation on how data is stored

2016-09-06 Thread Alberto Ramón
I have this picture: (I found this info in a PPT)

[image: Inline images 1]

Remember that you can encode a dimension by dictionary or fixed length.
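
As a rough illustration of the two encodings mentioned here, the Python sketch below (a toy, not Kylin's implementation) encodes a dimension value either through a dictionary that maps each distinct value to a small integer ID, or by truncating or padding it to a fixed number of bytes. The function names, the 1-byte ID width, the 8-byte fixed length and the sample values are all assumptions made for illustration.

```python
def build_dictionary(values):
    """Map each distinct value to an integer ID, assigned in sorted order."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def encode_dict(value, dictionary, id_bytes=1):
    """Encode a value as its dictionary ID in a fixed number of bytes."""
    return dictionary[value].to_bytes(id_bytes, "big")


def encode_fixed(value, length):
    """Encode a value as UTF-8, truncated or zero-padded to `length` bytes."""
    raw = value.encode("utf-8")[:length]
    return raw.ljust(length, b"\x00")


# Made-up sample values for one dimension column.
formats = ["Auction", "FP-GTC", "ABIN", "Auction"]
dictionary = build_dictionary(formats)

# Encode the same value both ways and compare the sizes.
key_dict = encode_dict("FP-GTC", dictionary)
key_fixed = encode_fixed("FP-GTC", 8)
print(len(key_dict), len(key_fixed))
```

The trade-off this toy shows is size against bookkeeping: the dictionary form is shorter but needs the mapping to decode, while the fixed-length form is self-contained but wider and lossy for values longer than the chosen length.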


2016-09-06 1:57 GMT+02:00 Something Something :

> Hello,
>
> Is there any documentation available on how Kylin stores data on HBase?
> For example, I am trying to understand how data is stored on HBase when I
> run bin/sample.sh to create the "learn_kylin" project.
>
> I looked at the HBase table for the Cube. It has 2 column families but I
> don't understand what goes where in this table after Cube is built.
>
> I set up 'remote debugging' to debug the code, but the QueryService code
> seems to be out of sync between the binary release
> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-HBase1.x-bin.tar.gz)
> and the source code
> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-src.tar.gz)
>
> I will keep debugging but if any documentation about "how data is stored"
> (UML diagram or something) is available, please share.
>
> Thanks.
>