Questions about 'RAW' measure

BELLIER Jean-luc Wed, 28 Feb 2018 08:45:12 -0800

Hello

I discovered that there is a RAW measure that returns raw data instead of 
aggregated data (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/).


My assumption is that these raw data are stored in HBase, as the aggregated 
data are, i.e. they are duplicated from Hive into HBase.
So my question is: are there limitations on the data volume? My fact tables 
contain billions of rows and we need to get detailed information from them. 
What are the restrictions, and what are the benefits compared with querying 
the data directly in Hive?

I have another question: I tried creating a model directly from a fact table 
containing raw data, as a feasibility test that avoids transformations (the 
table is a CSV file provided by an external team). As a first step, I wanted 
to avoid creating files for the corresponding dimensions and generating a 
"clean" fact table with foreign keys that reference the primary keys of the 
dimension tables.
The model was created successfully.
However, the cube build failed at the first step, and I got this message:

INFO  : Query ID = hive_20180228120101_6990f9d4-182d-4dd9-b319-fce02caf75ef
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
ERROR : Job hasn't been submitted after 61s. Aborting it.

How can I proceed to avoid this? Are there Kylin parameters (or others) to 
adjust?

Thank you in advance for your help. Have a good day.
Best regards,
Jean-Luc
