Re: usage of Web inteface Kylin an performances

Ge Silas Tue, 20 Feb 2018 05:05:08 -0800

Hi Jean-Luc,

There should not be any need to specify the cube name when sending the query 
through Insight interface. Can you provide more information on how you did your 
query?


For MDX, you may want to check http://dekarlab.de/wp/?p=363

Thanks,
Silas

On 19 Feb 2018, at 7:22 PM, BELLIER Jean-luc 
<jean-luc.bell...@rte-france.com<mailto:jean-luc.bell...@rte-france.com>> wrote:

Hello Shaofeng,

Thank you for this response.

I would like to clarify some things about the ‘Insight’ part of the Kylin Web 
interface.
It indicates the name of the cube when a query is launched, but my assumption 
is that the Hive tables are directly requested. So I do not know where the cube 
enters into account.
Does this query tool support MDX ? I am not sure, so how can we query the cube 
elements ?
If we can, can you send me a few examples of MDX queries ?
My assumption is that we need an external tool such as XMLA or other BI tools.

Thank you in advance for your help.

Have a good day.
Best regards,
Jean-Luc.


De : ShaoFeng Shi [mailto:shaofeng...@apache.org]
Envoyé : dimanche 18 février 2018 03:23
À : user <user@kylin.apache.org<mailto:user@kylin.apache.org>>
Objet : Re: usage of Web inteface Kylin an performances

Hi Jean-luc,

Most of the Kylin developers are in the new year holiday, so there might be 
some delay. Here are some comments from my side:

1.  I presume that the whole .json files are stored, is it right ?
yes
2. Do these kinds of tables contain the cube data ?
yes; cube are stored in HBase with "KYLINL_" as prefix
3. So I am wondering if it is the good method
the "compression" in Tomcat/conf/server.xml has nothing with cube build. To 
enable compression for cube, you need to configure that in your Hadoop 
configurations like mapred-site.xml, hbase-site.xml or 
kylin/conf/kylin_job_conf.xml.
4. How is it possible to optimize cube size to keep good performance ?
https://kylin.apache.org/docs21/howto/howto_optimize_cubes.html
5.  Is it through the ‘rowkeys’ in the advanced settings when you build the 
cube ?
yes, exactly; putting the most used filtering column to the heading position on 
the rowkey can get better performance.
6. What shall we put exactly in the ‘Rowkeys’ section ?
All dimensions (excluding 'derived' dimensions) need be on rowkey; If you see 
too many columns in the agg. group, remove some dimensions from your cube.
7.  Are the aggregation groups used for speed of the queries.
The agg. group is used to optimize the dimension combinations. For a N 
dimension cube, by default it will have 2^N combinations (we called cuboid). If 
you can divide N dimensions to several groups, the combination numbers can be 
greatly reduced, so the cube build will be much easier and taking much less 
space. How to define the agg. group? You can do that with your business query 
patterns.



2018-02-14 1:49 GMT+08:00 BELLIER Jean-luc 
<jean-luc.bell...@rte-france.com<mailto:jean-luc.bell...@rte-france.com>>:
Hello,

I have several questions on Kylin, especially about performances and how to 
manage them. I would like to understand precisely how it works to see if I can 
use it in my business context.

I come from the relational database world, so as far as I understand on OLAP, 
the searches are performed on the values of primary keys in dimensions. These 
subsets are then joined to get the corresponding lines on the facts table. As 
the dimensions tables are much smaller than the facts table, the queries run 
faster


1.       Questions on performances

•         the raw data are stored in Hive, and the models and structures 
(cubes) are stored in HBase; I presume that the whole .json files are stored, 
is it right ?

•         Where are the cube results stores (I mean after a build, a refresh or 
an append action). Is it also in HBase ? I can see in HBase tables like 
"KYLIN_FF46WDAAGH". Do these kinds of tables contain the cube data ?

•         I noticed that when I build the ‘sample_cube’, the volume of data was 
very important compared to the size of the original files. Is there a way to 
reduce it (I saw a attribute in the $KYLIN_HOME/tomcat/conf/server.xml file, 
called ‘compression’ for the connector on port 7070, but I do not know if it is 
related to the cube size). I tried to change this parameter to ‘yes’, but I 
noticed a huge increase of the duration of cube generation. So I am wondering 
if it is the good method.

•         How is it possible to optimize cube size to keep good performance ?

•         In Hive, putting indexes is not recommended. So how Kylin is ensuring 
good performance when querying high volumes of data  ? Is it through the 
‘rowkeys’ in the advanced settings when you build the cube ?

Or is the answer elsewhere ?


2.       Questions on cube building

•         By the way, the ‘Advanced settings’ step is still unclear for me. I 
tried to build a cube from scratch using the tables provided in the sample 
project. But I do not know very much what to put in this section.

•         My goal is to define groups of data on YEAR_BEG_DT, 
QTR_BEG_DT,MONTH_BEG_DT.

•         I do not understand very well why the aggregation group contains so 
many columns. I tried to remove as many as possible, but when I tried to set up 
the joins, but some fields were missing so the saving of the cube failed.

•         What shall we put exactly in the ‘Rowkeys’ section ? I understand 
that this is used to define data encoding (for speed access ? ).Am I right ?

•         Are the aggregation groups used for speed of the queries. I assume it 
is the case, because it represents the most commonly used associations of 
columns for the cube.


Thank you in advance for your help.

Best regards,
Jean-Luc.







"Ce message est destiné exclusivement aux personnes ou entités auxquelles il 
est adressé et peut contenir des informations privilégiées ou confidentielles. 
Si vous avez reçu ce document par erreur, merci de nous l'indiquer par retour, 
de ne pas le transmettre et de procéder à sa destruction.

This message is solely intended for the use of the individual or entity to 
which it is addressed and may contain information that is privileged or 
confidential. If you have received this communication by error, please notify 
us immediately by electronic mail, do not disclose it and delete the original 
message."



--
Best regards,

Shaofeng Shi 史少锋



"Ce message est destiné exclusivement aux personnes ou entités auxquelles il 
est adressé et peut contenir des informations privilégiées ou confidentielles. 
Si vous avez reçu ce document par erreur, merci de nous l'indiquer par retour, 
de ne pas le transmettre et de procéder à sa destruction.

This message is solely intended for the use of the individual or entity to 
which it is addressed and may contain information that is privileged or 
confidential. If you have received this communication by error, please notify 
us immediately by electronic mail, do not disclose it and delete the original 
message."

Re: usage of Web inteface Kylin an performances

Reply via email to