Re: usage of Web inteface Kylin an performances

Luke Han Wed, 21 Feb 2018 06:30:53 -0800

There's no build in XMLA/MDX services today, you could leverage 3rd party
solution, please search Kylin Mondrian for example.


Thanks.


Best Regards!
---------------------

Luke Han

On Tue, Feb 20, 2018 at 10:01 PM, Ge Silas <[email protected]> wrote:

> Hi Jean-luc,
>
> You should use Insight interface just like what you do in Hive. If the
> query can be answered by pre-calculated cube, cube will answer. Otherwise
> you can configure query pushdown http://kylin.apache.
> org/docs21/tutorial/query_pushdown.html for the queries which cannot be
> answered by cube.
>
> I personally have not tried XMLA and Mondrian setup yet but you may want
> to check more in the Tomcat logs or else.
>
> Thanks,
> Silas
>
> On 20 Feb 2018, at 9:47 PM, BELLIER Jean-luc <jean-luc.bellier@rte-france.
> com> wrote:
>
> Hello Silas,
>
> I did a query using the ‘SQL’ tab defining the cube, and customized it to
> add filters. What I noticed is that the interface displays the name of the
> cube, so I presume the tool uses this cube to get the result, but I do not
> understand very much how.
> Here is an example of query. I used also this query directly in Hive, so
> my question is : does the web interface query the cube (i.e. the structure
> stored in HBase) or the tables of the model (stored in Hive) ?
>
> *SELECT*
> *KYLIN_SALES.TRANS_ID as KYLIN_SALES_TRANS_ID*
> *,KYLIN_SALES.PART_DT AS KYLIN_SALES_PART_DT*
> *,KYLIN_SALES.LEAF_CATEG_ID as KYLIN_SALES_LEAF_CATEG_ID*
> *,KYLIN_SALES.LSTG_SITE_ID as KYLIN_SALES_LSTG_SITE_ID*
> *,KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME as
> KYLIN_CATEGORY_GROUPINGS_META_CATEG_NAME*
> *,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME as
> KYLIN_CATEGORY_GROUPINGS_CATEG_LVL2_NAME*
> *,KYLIN_CATEGORY_GROUPINGS.CATEG_LVL3_NAME as
> KYLIN_CATEGORY_GROUPINGS_CATEG_LVL3_NAME*
> *,KYLIN_SALES.LSTG_FORMAT_NAME as KYLIN_SALES_LSTG_FORMAT_NAME*
> *,KYLIN_SALES.SELLER_ID as KYLIN_SALES_SELLER_ID*
> *,KYLIN_SALES.BUYER_ID as KYLIN_SALES_BUYER_ID*
> *,BUYER_ACCOUNT.ACCOUNT_BUYER_LEVEL as BUYER_ACCOUNT_ACCOUNT_BUYER_LEVEL*
> *,SELLER_ACCOUNT.ACCOUNT_SELLER_LEVEL as
> SELLER_ACCOUNT_ACCOUNT_SELLER_LEVEL*
> *,BUYER_ACCOUNT.ACCOUNT_COUNTRY as BUYER_ACCOUNT_ACCOUNT_COUNTRY*
> *,SELLER_ACCOUNT.ACCOUNT_COUNTRY as SELLER_ACCOUNT_ACCOUNT_COUNTRY*
> *,BUYER_COUNTRY.NAME <http://BUYER_COUNTRY.NAME> as BUYER_COUNTRY_NAME*
> *,SELLER_COUNTRY.NAME <http://SELLER_COUNTRY.NAME> as SELLER_COUNTRY_NAME*
> *,KYLIN_SALES.OPS_USER_ID as KYLIN_SALES_OPS_USER_ID*
> *,KYLIN_SALES.OPS_REGION as KYLIN_SALES_OPS_REGION*
> *,KYLIN_SALES.PRICE as KYLIN_SALES_PRICE*
> *FROM KYLIN_SALES as KYLIN_SALES*
> *INNER JOIN KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS*
> *ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND
> KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID*
> *INNER JOIN KYLIN_ACCOUNT as BUYER_ACCOUNT*
> *ON KYLIN_SALES.BUYER_ID = BUYER_ACCOUNT.ACCOUNT_ID*
> *INNER JOIN KYLIN_ACCOUNT as SELLER_ACCOUNT*
> *ON KYLIN_SALES.SELLER_ID = SELLER_ACCOUNT.ACCOUNT_ID*
> *INNER JOIN KYLIN_COUNTRY as BUYER_COUNTRY*
> *ON BUYER_ACCOUNT.ACCOUNT_COUNTRY = BUYER_COUNTRY.COUNTRY*
> *INNER JOIN KYLIN_COUNTRY as SELLER_COUNTRY*
> *ON SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY*
> *WHERE BUYER_COUNTRY.COUNTRY in ('FR','BE','DE')*
> *AND KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME = 'Computers/Tablets &
> Networking'*
>
>
> I wondered also how to query the dimensions and the measures defined in
> the cube at step ‘Advanced settings’ : can it be done on the Kylin
> interface or through another driver. You mail suggested to used XMondrian.
> I tried it on my personal PC, on a VM.  This WM was not installed by me but
> by a teacher for a course about Big Data Programming. So I have not
> modified this part. I have configuration problems at step 2 :
> ·         At step 1, I got a Mondrian.war file, which produced a
> ‘xmondrian’ folder under ‘tomcat’ folder, which is located under
> $KYLIN_HOME folder
> ·         I started tomcat from my tomcat folder (using the ‘tomcat’)
> folder of my VM. The system tells me that tomcat service has started
> ·         I try then to reach http://localhost:8080/xmondrian/xmla. But I
> receive a message : “unable to connect”. How should I parametrize to make
> it work ?
> ·         Same question about the parametrization if I want to use the
> XMLA driver in Excel on my computer, and use my Kylin cubes.
>
> Thank you in advance for your help. Have a good day.
>
> Best regards,
> Jean-Luc
>
>
> *De :* Ge Silas [mailto:[email protected] <[email protected]>]
> *Envoyé :* mardi 20 février 2018 14:04
> *À :* [email protected]
> *Objet :* Re: usage of Web inteface Kylin an performances
>
> Hi Jean-Luc,
>
> There should not be any need to specify the cube name when sending the
> query through Insight interface. Can you provide more information on how
> you did your query?
>
> For MDX, you may want to check http://dekarlab.de/wp/?p=363
>
> Thanks,
> Silas
>
> On 19 Feb 2018, at 7:22 PM, BELLIER Jean-luc <jean-luc.bellier@rte-france.
> com> wrote:
>
> Hello Shaofeng,
>
> Thank you for this response.
>
> I would like to clarify some things about the ‘Insight’ part of the Kylin
> Web interface.
> It indicates the name of the cube when a query is launched, but my
> assumption is that the Hive tables are directly requested. So I do not know
> where the cube enters into account.
> Does this query tool support MDX ? I am not sure, so how can we query the
> cube elements ?
> If we can, can you send me a few examples of MDX queries ?
> My assumption is that we need an external tool such as XMLA or other BI
> tools.
>
> Thank you in advance for your help.
>
> Have a good day.
> Best regards,
> Jean-Luc.
>
>
> *De :* ShaoFeng Shi [mailto:[email protected]
> <[email protected]>]
> *Envoyé :* dimanche 18 février 2018 03:23
> *À :* user <[email protected]>
> *Objet :* Re: usage of Web inteface Kylin an performances
>
> Hi Jean-luc,
>
> Most of the Kylin developers are in the new year holiday, so there might
> be some delay. Here are some comments from my side:
>
> 1.  I presume that the whole .json files are stored, is it right ?
> yes
> 2. Do these kinds of tables contain the cube data ?
> yes; cube are stored in HBase with "KYLINL_" as prefix
> 3. So I am wondering if it is the good method
> the "compression" in Tomcat/conf/server.xml has nothing with cube build.
> To enable compression for cube, you need to configure that in your Hadoop
> configurations like mapred-site.xml, hbase-site.xml or
> kylin/conf/kylin_job_conf.xml.
> 4. How is it possible to optimize cube size to keep good performance ?
> https://kylin.apache.org/docs21/howto/howto_optimize_cubes.html
> 5.  Is it through the ‘rowkeys’ in the advanced settings when you build
> the cube ?
> yes, exactly; putting the most used filtering column to the heading
> position on the rowkey can get better performance.
> 6. What shall we put exactly in the ‘Rowkeys’ section ?
> All dimensions (excluding 'derived' dimensions) need be on rowkey; If you
> see too many columns in the agg. group, remove some dimensions from your
> cube.
> 7.  Are the aggregation groups used for speed of the queries.
> The agg. group is used to optimize the dimension combinations. For a N
> dimension cube, by default it will have 2^N combinations (we called
> cuboid). If you can divide N dimensions to several groups, the combination
> numbers can be greatly reduced, so the cube build will be much easier and
> taking much less space. How to define the agg. group? You can do that with
> your business query patterns.
>
>
>
> 2018-02-14 1:49 GMT+08:00 BELLIER Jean-luc <jean-luc.bellier@rte-france.
> com>:
>
> Hello,
>
> I have several questions on Kylin, especially about performances and how
> to manage them. I would like to understand precisely how it works to see if
> I can use it in my business context.
>
> I come from the relational database world, so as far as I understand on
> OLAP, the searches are performed on the values of primary keys in
> dimensions. These subsets are then joined to get the corresponding lines on
> the facts table. As the dimensions tables are much smaller than the facts
> table, the queries run faster
>
>
> *1.*       *Questions on performances*
>
> ·         the raw data are stored in Hive, and the models and structures
> (cubes) are stored in HBase; I presume that the whole .json files are
> stored, is it right ?
>
> ·         Where are the cube results stores (I mean after a build, a
> refresh or an append action). Is it also in HBase ? I can see in HBase
> tables like "KYLIN_FF46WDAAGH". Do these kinds of tables contain the cube
> data ?
>
> ·         I noticed that when I build the ‘sample_cube’, the volume of
> data was very important compared to the size of the original files. Is
> there a way to reduce it (I saw a attribute in the 
> $KYLIN_HOME/tomcat/conf/server.xml
> file, called ‘compression’ for the connector on port 7070, but I do not
> know if it is related to the cube size). I tried to change this parameter
> to ‘yes’, but I noticed a huge increase of the duration of cube generation.
> So I am wondering if it is the good method.
>
> ·         How is it possible to optimize cube size to keep good
> performance ?
>
> ·         In Hive, putting indexes is not recommended. So how Kylin is
> ensuring good performance when querying high volumes of data  ? Is it
> through the ‘rowkeys’ in the advanced settings when you build the cube ?
> Or is the answer elsewhere ?
>
>
> *2.*       *Questions on cube building*
>
> ·         By the way, the ‘Advanced settings’ step is still unclear for
> me. I tried to build a cube from scratch using the tables provided in the
> sample project. But I do not know very much what to put in this section.
>
> ·         My goal is to define groups of data on YEAR_BEG_DT,
> QTR_BEG_DT,MONTH_BEG_DT.
>
> ·         I do not understand very well why the aggregation group
> contains so many columns. I tried to remove as many as possible, but when I
> tried to set up the joins, but some fields were missing so the saving of
> the cube failed.
>
> ·         What shall we put exactly in the ‘Rowkeys’ section ? I
> understand that this is used to define data encoding (for speed access ?
> ).Am I right ?
>
> ·         Are the aggregation groups used for speed of the queries. I
> assume it is the case, because it represents the most commonly used
> associations of columns for the cube.
>
> Thank you in advance for your help.
>
> Best regards,
> Jean-Luc.
>
>
>
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not disclose it and delete the
> original message."
>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not disclose it and delete the
> original message."
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not disclose it and delete the
> original message."
>
>

Re: usage of Web inteface Kylin an performances

Reply via email to