Re: Get daily average for periodic readings

2018-03-01 Thread deva namaste
Thanks Alberto.  So you would recommend creating one record per day in the
fact table? That is, instead of 6 records per year, create 365 records with
the differences in values spread between them, so I can slice the data from
the dimension by week, month, year, etc.  But I was more worried about the
amount of data that will be stored in the cube's fact table: for 10 million
items, we are talking about 10 million x 365 = 3,650 million rows.  Do you
think performance will be impacted? Or is there another method where I can
put only 6 records per item in the fact table (10 million x 6 = 60 million
rows) and then use some SQL for better performance? Thanks

On Thu, Mar 1, 2018 at 11:36 AM, Alberto Ramón 
wrote:

> You can't partition your cube per week.  It must be per yyyy-mm-dd.
>
> You can run your own test: calculate with year as a dimension and the year
> value as the sum of its days.
>
> On 1 Mar 2018 3:50 p.m., "deva namaste"  wrote:
>
>> Hi Alberto,
>>
>> When I was saying 6 vs. 365, that is for one item; for 20 million items it
>> will multiply by a lot.  Do you think it won't make much difference?
>> Also, what is YY-MM-WW, so I can understand you? Basically I need the same
>> avg() for week, month, year, etc.
>>
>> Thanks
>> Deva
>>
>> On Thu, Mar 1, 2018 at 8:42 AM, Alberto Ramón 
>> wrote:
>>
>>> - 95% of the response time is latency (i.e. there is no difference
>>> between summing one int or 365 of them; I thought the same when I started
>>> with Kylin)
>>> - A YY-MM-WW format is not implemented, but it would be nice if you could
>>> contribute it
>>>
>>> Alb
>>>
>>> On 28 February 2018 at 22:59, deva namaste  wrote:
>>>
 I was thinking of saving only 6 records in Kylin instead of splitting
 them into daily averages outside and adding 365 records for each item.  Is
 there any way I can achieve this at the SQL level in Kylin, or by changing
 the model to accommodate the above? Please advise. Thanks

 On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón <
 a.ramonporto...@gmail.com> wrote:

> Sounds like:
> - your minimum query granularity is weeks, so your fact table needs to be
> at week grain (or finer, like days)
> - you will need to expand your current fact table to weeks (or days), for
> example using a Hive view
> - as an extra: Kylin can't use week-based partition column formats; the
> minimum is days
>
> Alb
>
> On 28 February 2018 at 21:51, deva namaste  wrote:
>
>> Hello,
>>
>> How would I calculate the value for a week when I have bi-monthly readings?
>>
>> e.g. here is what my data looks like:
>>
>> Date   -  Value
>> 01/18/2017 -  100
>> 03/27/2017 -  130  (68 Days)
>> 05/17/2017 -  102  (51 Days)
>>
>> I need the average value per week, as below. Let's consider the period
>> between 03/27 and 05/17: the total number of days in it is 51, so the
>> daily average would be 102/51 = 2.04
>>
>> Week4 (starting March 26, #days = 4) = (4 x 2.04) = 8.16
>> Week1 (starting Apr 2, #days = 7) = 14.28
>> Week2 (starting Apr 9, #days = 7) = 14.28
>> Week3 (starting Apr 16, #days = 7) = 14.28
>> Week4 (starting Apr 23, #days = 7) = 14.28
>> Week5 (starting Apr 30, #days = 7) = 14.28
>> Week1 (starting May 7, #days = 7) = 14.28
>> Week2 (starting May 14, #days = 4) = 8.16
>>
>> But as you can see, the period from 01/18 to 03/27 has 68 days, so its
>> daily average would be 130/68 = 1.91
>>
>> So really, to get a complete first week I need 3 days from the 130 reading
>> and 4 days from the 102 reading.
>>
>> So the real total for that first week would be:
>> Week4 (starting March 26, #days = 4 + 3 = 7) = (4 x 2.04 = 8.16) +
>> (3 x 1.91 = 5.73) = 13.89
>>
>> How would I achieve this in Kylin? Is there a function or another method I
>> can use?
>> With only 6 records per year, I don't want to populate daily records.
>> Thanks
>> Deva
>>
>>
>>
>

>>>
>>
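The proration worked through in the quoted thread (and Alberto's advice to expand the fact table to daily grain, e.g. via a Hive view) can be sketched in Python. This is a minimal illustration, not Kylin code, and it uses ISO week boundaries, so the edge weeks may split slightly differently from the hand calculation in the thread:

```python
from datetime import date, timedelta

# Periodic readings: each value covers the days since the previous reading.
readings = [
    (date(2017, 1, 18), 100.0),
    (date(2017, 3, 27), 130.0),   # covers the 68 days after 01/18
    (date(2017, 5, 17), 102.0),   # covers the 51 days after 03/27
]

def daily_rows(readings):
    """Spread each reading evenly over the days it covers: one row per day."""
    rows = []
    for (prev_day, _), (cur_day, value) in zip(readings, readings[1:]):
        span = (cur_day - prev_day).days      # 68, then 51
        per_day = value / span                # e.g. 130 / 68 ~ 1.91
        for i in range(1, span + 1):
            rows.append((prev_day + timedelta(days=i), per_day))
    return rows

rows = daily_rows(readings)

# Aggregate the daily rows to ISO weeks; month and year roll-ups work the same.
weekly = {}
for day, value in rows:
    year, week, _ = day.isocalendar()
    weekly[(year, week)] = weekly.get((year, week), 0.0) + value
```

Loading the expanded daily rows as the fact table is what the thread's sizing discussion refers to: 10 million items at daily grain is roughly 3.65 billion rows per year.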


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
You can't partition your cube per week.  It must be per yyyy-mm-dd.

You can run your own test: calculate with year as a dimension and the year
value as the sum of its days.
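Alberto's suggested sanity test — treat the year as a dimension and recompute it as the sum of its days — amounts to checking that the expanded daily figures re-aggregate to the original totals. A minimal sketch using the example numbers from this thread:

```python
# Per-period readings from the example and the number of days each covers.
values = [130.0, 102.0]
spans = [68, 51]

# Expand: every day in a period carries value / span.
daily_rates = [v / s for v, s in zip(values, spans)]

# Re-aggregate "year as sum of days" and compare with the original total.
reconstructed = sum(rate * span for rate, span in zip(daily_rates, spans))
print(reconstructed)   # 232.0, the sum of the readings
```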


Re: Get daily average for periodic readings

2018-03-01 Thread deva namaste
Hi Alberto,

When I was saying 6 vs. 365, that is for one item; for 20 million items it
will multiply by a lot.  Do you think it won't make much difference?
Also, what is YY-MM-WW, so I can understand you? Basically I need the same
avg() for week, month, year, etc.

Thanks
Deva


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
- 95% of the response time is latency (i.e. there is no difference between
summing one int or 365 of them; I thought the same when I started with Kylin)
- A YY-MM-WW format is not implemented, but it would be nice if you could
contribute it

Alb


Re: Questions about 'RAW' measure

2018-03-01 Thread Alberto Ramón
Yes, that mailing-list thread: see KYLIN-3062 (v2.3), a proposal to disable
RAW from the UI.

Nowadays you can't control whether or not the flat-table creation step is
executed; there is a proposal for that, KYLIN-2532 (v2.1).




Re: about kylin Sample problem

2018-03-01 Thread Ge Silas
Hi,

The log contains the complete Hive command that was run. It looks like that
step failed. You can find that Hive command and try running it yourself in the
Hive CLI to see if it reports any errors.

Thanks,
Silas
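As a rough illustration of Silas's suggestion: the step log quotes the HiveQL it ran, and pulling it out lets you replay it in the Hive CLI. The log layout below is hypothetical (it varies by Kylin version); the point is only the extraction idea:

```python
import re

# Hypothetical excerpt of a "Create Intermediate Flat Hive Table" step log;
# the real layout depends on your Kylin version.
log_excerpt = '''
hive -e "USE default;
DROP TABLE IF EXISTS kylin_intermediate_kylin_sales_cube;
" --hiveconf hive.exec.compress.output=true --hiveconf mapreduce.job.split.metainfo.maxsize=-1
'''

# Grab everything between `hive -e "` and the closing quote: that is the
# HiveQL you can paste into the Hive CLI to reproduce the error.
match = re.search(r'hive -e "(.*?)"', log_excerpt, re.DOTALL)
if match:
    print(match.group(1).strip())
```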

On 1 Mar 2018, at 9:43 AM, nitc_...@qq.com wrote:




nitc_...@qq.com

From: nitc_...@qq.com
Sent: 2018-02-26 10:02
To: user
Subject: Re: about kylin Sample problem
“Kylin provides a script for you to create a sample Cube; the script will also 
create five sample hive tables:

  1.  Run ${KYLIN_HOME}/bin/sample.sh ; Restart kylin server to flush the 
caches;
  2.  Logon Kylin web with default user ADMIN/KYLIN, select project 
“learn_kylin” in the project dropdown list (left upper corner);
  3.  Select the sample cube “kylin_sales_cube”, click “Actions” -> “Build”, 
pick up a date later than 2014-01-01 (to cover all 10,000 sample records);”

Doing this step logs the following error.
The environment is Cloudera QuickStart VM 5.12 on VMware, with 4 CPUs and
8 GB of memory allocated.


N SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY
WHERE (KYLIN_SALES.PART_DT >= '2014-01-01' AND KYLIN_SALES.PART_DT < 
'2018-01-01')
;

" --hiveconf hive.merge.mapredfiles=false --hiveconf hive.merge.mapfiles=false 
--hiveconf hive.stats.autogather=true --hiveconf 
hive.auto.convert.join.noconditionaltask=true --hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask.size=1 --hiveconf 
hive.auto.convert.join=true --hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1
at 
org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:92)
at 
org.apache.kylin.source.hive.CreateFlatHiveTableStep.createFlatHiveTable(CreateFlatHiveTableStep.java:53)
at 
org.apache.kylin.source.hive.CreateFlatHiveTableStep.doWork(CreateFlatHiveTableStep.java:71)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at 
org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at 
org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at 
org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2018-02-07 06:24:00,234 INFO  [Scheduler 1859819760 Job 
90d1f92f-5797-4222-bef0-409e4534690d-139] execution.ExecutableManager:421 : job 
id:90d1f92f-5797-4222-bef0-409e4534690d-00 from RUNNING to ERROR
2018-02-07 06:24:00,318 INFO  [Scheduler 1859819760 Job 
90d1f92f-5797-4222-bef0-409e4534690d-139] execution.ExecutableManager:421 : job 
id:90d1f92f-5797-4222-bef0-409e4534690d from RUNNING to ERROR
2018-02-07 06:24:00,318 DEBUG [Scheduler 1859819760 Job 
90d1f92f-5797-4222-bef0-409e4534690d-139] execution.AbstractExecutable:259 : no 
need to send email, user list is empty


nitc_...@qq.com


RE: Questions about 'RAW' measure

2018-03-01 Thread BELLIER Jean-luc
Hello Alberto,

Thank you for your answer. I will look further into this error in the cube
build.

Concerning the RAW measure, are you referring to this discussion?
I can still see this option in the measures section on Kylin 2.2, which is
why it caught my attention.
Does it mean that to access raw data, we first need to use an aggregated
measure? My end users mainly use raw data (e.g. for slicing), so I want to be
sure about that.

What about building cubes using only a fact table with all the data inside?
Is that a workable approach (in terms of storage space and efficiency), or is
it preferable to use separate dimension tables, and why?

Thank you in advance for your help.
Have a good day.

Best regards,
Jean-Luc.

From: Alberto Ramón [mailto:a.ramonporto...@gmail.com]
Sent: Wednesday, 28 February 2018 19:04
To: user 
Subject: Re: Questions about 'RAW' measure

Hello
- The RAW measure is deprecated. You will find the thread on this mailing list.
- "Job hasn't been submitted after" sounds like a configuration problem with
your YARN; please search for it on Google and review your CPU and RAM resources

On 28 February 2018 at 16:44, BELLIER Jean-luc 
> wrote:
Hello

I discovered that there was a RAW measure to get raw data instead of 
aggregated data (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/)

My assumption is that these raw data are stored in HBase, as aggregated data
are, i.e. the data are duplicated from Hive into HBase.
So my question is: are there limitations on the data volume? My fact tables
contain billions of rows and we need to get detailed information from them. So
what are the restrictions, and also the benefits, compared to querying the
data directly in Hive?

I have another question: I tested creating a model directly from a fact table
containing raw data, in order to test feasibility and avoid transformations
(the table is a CSV file provided by an external team). As a first step I
wanted to avoid creating files for the corresponding dimensions and generating
a “clean” fact table with foreign keys corresponding to the primary keys of
dimension tables.
The creation of the model was OK.
However, the cube generation failed at the first step, and I got this message:

INFO  : Query ID = hive_20180228120101_6990f9d4-182d-4dd9-b319-fce02caf75ef
INFO  : Total jobs = 3
INFO  : Launching Job 1 out of 3
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=
INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
ERROR : Job hasn't been submitted after 61s. Aborting it.

How can I avoid this? Are there Kylin (or other) parameters to adjust?

Thank you in advance for your help. Have a good day.
Best regards,
Jean-Luc





This message is solely intended for the use of the individual or entity to 
which it is addressed and may contain information that is privileged or 
confidential. If you have received this communication by error, please notify 
us immediately by electronic mail, do not disclose it and delete the original 
message."


