Hello, I have several questions about Kylin, especially about performance and how to manage it. I would like to understand precisely how it works to see whether I can use it in my business context.
I come from the relational database world. As far as I understand OLAP, searches are performed on the primary key values in the dimensions. These subsets are then joined to get the corresponding rows in the fact table. As the dimension tables are much smaller than the fact table, the queries run faster.

1. Questions on performance

* The raw data are stored in Hive, and the models and structures (cubes) are stored in HBase. I presume that the whole .json files are stored there; is that right?
* Where are the cube results stored (I mean after a build, a refresh or an append action)? Is it also in HBase? I can see tables in HBase with names like "KYLIN_FF46WDAAGH". Do these tables contain the cube data?
* I noticed that when I built the 'sample_cube', the volume of data was very large compared to the size of the original files. Is there a way to reduce it? I saw an attribute called 'compression' in the $KYLIN_HOME/tomcat/conf/server.xml file, for the connector on port 7070, but I do not know whether it is related to the cube size. I tried to change this parameter to 'yes', but I noticed a huge increase in cube generation time, so I am wondering whether this is the right method.
* How can the cube size be optimized while keeping good performance?
* In Hive, creating indexes is not recommended. So how does Kylin ensure good performance when querying high volumes of data? Is it through the 'rowkeys' in the advanced settings when you build the cube? Or is the answer elsewhere?

2. Questions on cube building

* The 'Advanced settings' step is still unclear to me. I tried to build a cube from scratch using the tables provided in the sample project, but I do not really know what to put in this section.
* My goal is to define groups of data on YEAR_BEG_DT, QTR_BEG_DT, MONTH_BEG_DT.
* I do not understand very well why the aggregation group contains so many columns. I tried to remove as many as possible, but when I set up the joins some fields were missing, so saving the cube failed.
* What exactly should we put in the 'Rowkeys' section? I understand that it is used to define data encoding (for access speed?). Am I right?
* Are the aggregation groups used to speed up queries? I assume so, because they represent the most commonly used combinations of columns for the cube.

Thank you in advance for your help.

Best regards,
Jean-Luc.