Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón
If your time column is at hour or day granularity, this use case is a good
fit for Apache Kylin.
If your column is a raw timestamp, it is not the best scenario for Apache
Kylin.

This means that, in the best scenario, Grafana will show values grouped by
hour.
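
To make the point about granularity concrete, here is a minimal sketch of the kind of pre-aggregation meant above: raw timestamps are truncated to the hour and the values are summed per bucket. The function name and the sample rows are hypothetical and only illustrate the idea; this is not how Kylin or Grafana implement it.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_hour(rows):
    """Sum values per hour bucket.

    rows: iterable of (timestamp, value) pairs, where timestamp is a datetime.
    Returns a dict mapping the start of each hour to the summed value.
    """
    totals = defaultdict(float)
    for ts, value in rows:
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += value
    return dict(totals)


# Hypothetical sample data: two readings in the 13:00 hour, one in 14:00.
rows = [
    (datetime(2018, 10, 16, 13, 5), 2.0),
    (datetime(2018, 10, 16, 13, 40), 3.0),
    (datetime(2018, 10, 16, 14, 10), 1.5),
]
hourly = aggregate_by_hour(rows)
for bucket in sorted(hourly):
    print(bucket.isoformat(), hourly[bucket])
```

With an hour column like this as a dimension, a dashboard query groups by a small number of buckets instead of by every distinct timestamp.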

On Tue, 16 Oct 2018 at 13:20, 潘博存  wrote:

>
> 1. Grafana is time-based and needs to wrap the time columns, but that
> doesn't mean that Grafana's data sources are all time-series databases, just
> as Grafana supports MySQL and SQL Server.
> 2. In our business scenario, we put more emphasis on Grafana's external
> presentation capabilities, and in terms of timelines we use our business
> dates by day, hour, etc.
>
>
>
> So I think Grafana + Kylin is another form of presentation besides Saiku,
> Tableau, and so on. In fact, we're trying to put Saiku into Grafana as a
> layout plug-in for data presentation
>
>
>
>
>
> ----------
> From: Alberto Ramón
> Sent: Tuesday, 16 October 2018 17:31
> To: user
> Cc: 潘博存; dev
> Subject: Re: Kylin as a data source supported by Grafana
>
> I checked this possibility some time ago (2-3 years)
> Grafana is focused on time series (one column must be a timestamp)
> Working with raw timestamps doesn't make sense in Apache Kylin, because you
> are not aggregating
>
> On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:
> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
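> On Tue, 16 Oct 2018 at 11:27, 潘博存 wrote:
>
> Hi all,
> For big data visualization, Grafana is one of our important presentation
> tools, and Kylin's fast queries make it a powerful tool for big data
> querying. We would like to display Kylin data in Grafana. Is anyone already
> using it this way? Is there a grafana-kylin plugin that can be used
> directly? Currently Grafana has no Kylin plugin.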
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
>


Re: Kylin as a data source supported by Grafana

2018-10-16 Thread Alberto Ramón
I checked this possibility some time ago (2-3 years)
Grafana is focused on time series (one column must be a timestamp)
Working with raw timestamps doesn't make sense in Apache Kylin, because you
are not aggregating

On Tue, 16 Oct 2018 at 06:04, ShaoFeng Shi  wrote:

> Good question, let me translate it to English:
>
> Grafana is one of our important data visualization tools; Kylin is a
> powerful tool for big data query, we want to display Kylin data on Grafana,
> is there anyone already running this solution? Is there a grafana-kylin
> plugin that can be used directly? Currently, Grafana doesn't provide a
> plugin for Kylin.
>
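> On Tue, 16 Oct 2018 at 11:27, 潘博存 wrote:
>
>>
>> Hi all,
>> For big data visualization, Grafana is one of our important presentation
>> tools, and Kylin's fast queries make it a powerful tool for big data
>> querying. We would like to display Kylin data in Grafana. Is anyone already
>> using it this way? Is there a grafana-kylin plugin that can be used
>> directly? Currently Grafana has no Kylin plugin.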
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Can I build an hierarchy aggregation group with joint dimensions

2018-09-20 Thread Alberto Ramón
https://issues.apache.org/jira/browse/KYLIN-2149

On Thu, 20 Sep 2018 at 05:52, you Zhuang  wrote:

> Example: (aid,aname),(bid,bname),(cid,cname).  The three joint dimensions
> are also hierarchical .


Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Alberto Ramón
You can monitor YARN during step 3
In any case, step 3 samples the fact table to estimate the number of keys for
each dimension
If this step takes a long time, you will need to review your cube design

Alb
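
As a rough illustration of what "number of keys for each dim" means, the sketch below counts distinct values per column over a handful of rows. It is only a toy model with hypothetical column names; Kylin runs this step as a distributed job with its own implementation, which is not reproduced here.

```python
def distinct_counts(rows, columns):
    """Count distinct values per column.

    rows: iterable of dicts (one per fact-table row).
    columns: names of the dimension columns to inspect.
    """
    seen = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {col: len(values) for col, values in seen.items()}


# Hypothetical sample of a fact table.
sample_rows = [
    {"country": "ES", "city": "Madrid", "user_id": 1},
    {"country": "ES", "city": "Vigo", "user_id": 2},
    {"country": "FR", "city": "Paris", "user_id": 3},
    {"country": "ES", "city": "Madrid", "user_id": 4},
]
counts = distinct_counts(sample_rows, ["country", "city", "user_id"])
print(counts)
```

A column whose distinct count grows with the number of rows (like user_id here) is the kind of high-cardinality dimension that tends to make this step slow and is worth reviewing in the cube design.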

On 14 March 2018 at 16:54, Sonny Heer <sonnyh...@gmail.com> wrote:

> 8 YARN nodes with 11 slots each.  each slot is configured to ~2gb.  Step
> #3 in Kylin is launching 19 mappers and 5 reducers.  5 reducers when there
> are 88 slots.
>
> btw: kylin version is 1.6
>
> On Wed, Mar 14, 2018 at 9:48 AM, Sonny Heer <sonnyh...@gmail.com> wrote:
>
>> YARN is properly configured.  we use many other m/r and spark programs
>> that utilize the full slots.  It's only when building cubes.
>>
>> On Wed, Mar 14, 2018 at 9:46 AM, Alberto Ramón <a.ramonporto...@gmail.com
>> > wrote:
>>
>>> You need to check your YARN configuration first
>>>
>>> On Wed, 14 Mar 2018, 14:58 Sonny Heer, <sonnyh...@gmail.com> wrote:
>>>
>>>> Step 3 isn't using our full cluster.  How can i increase the
>>>> mappers/reducers to use all the slots?  Any config to look at in kylin?
>>>>
>>>> Thanks
>>>>
>>>
>>
>


Re: #3 Step Name: Extract Fact Table Distinct Columns (slow)

2018-03-14 Thread Alberto Ramón
You need to check your YARN configuration first

On Wed, 14 Mar 2018, 14:58 Sonny Heer,  wrote:

> Step 3 isn't using our full cluster.  How can i increase the
> mappers/reducers to use all the slots?  Any config to look at in kylin?
>
> Thanks
>


Re: RAW Measure kylin 2.3

2018-03-06 Thread Alberto Ramón
From this mailing list: Questions about 'RAW' measure

MailList: Kylin 3062 (v2.3) proposes to disable RAW in the UI

On 6 Mar 2018 1:38 p.m., "deva namaste"  wrote:

> Hello,
>
> I do not see RAW measure after I upgraded to kylin version 2.3.
>
> Any other alternative measure we should use to show the raw data as is?
> (Instead of RAW measure, any other alternative which can be used?)
>
> Thanks
> Deva
>


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
You can't partition your cube by week. The partition must be by day
(yyyy-mm-dd)

You can run your own test: compare a query that uses year as a dimension
with one that computes the year as a sum of days

On 1 Mar 2018 3:50 p.m., "deva namaste" <ohd...@gmail.com> wrote:

> Hi Alberto,
>
> when I was saying 6 vs 365 its for one item. for 20 Million items it will
> multiply by a lot.  Do you think it wont make much differnce?
> Also what is  YY-MM-WW ? so I can explain you? Basically I need same
> avg() for week, month, year, etc.
>
> Thanks
> Deva
>
> On Thu, Mar 1, 2018 at 8:42 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> - About 95% of the response time is latency (so there is no real
>> difference between summing one int or 365; I thought the same as you when
>> I started with Kylin)
>> - The YY-MM-WW format is not implemented, but it would be nice if you
>> could contribute it
>>
>> Alb
>>
>> On 28 February 2018 at 22:59, deva namaste <ohd...@gmail.com> wrote:
>>
>>> I was thinking of saving only 6 records in kylin instead of splitting
>>> them outside in daily avg and adding 365 records for each item.  So is
>>> there anyway I can achieve using sql level in kylin or have changes to
>>> model to accomodate above change? Please advice. Thanks
>>>
>>> On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón <
>>> a.ramonporto...@gmail.com> wrote:
>>>
>>>> Sounds like:
>>>> - your minimum query granularity is weeks, so your fact table needs to
>>>> be at week granularity (or finer, like days)
>>>> - you will need to expand your current fact table to weeks (or finer,
>>>> days). For example, use a Hive view
>>>> - as an extra: Kylin can't use week-based partition column formats; the
>>>> minimum is days
>>>>
>>>> Alb
>>>>
>>>> On 28 February 2018 at 21:51, deva namaste <ohd...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> How would I calculate value for a week while I have bi-monthly values.
>>>>>
>>>>> e.g. Here is my data looks like -
>>>>>
>>>>> Date   -  Value
>>>>> 01/18/2017 -  100
>>>>> 03/27/2017 -  130  (68 Days)
>>>>> 05/17/2017 -  102  (51 Days)
>>>>>
>>>>> I need average value per week, as below. Lets consider between 03/27
>>>>> and 05/17. So total days between period are 51. so Daily average would be
>>>>> 102/51= 2.04
>>>>>
>>>>> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
>>>>> Week1 (Starting Apr 2, #days = 7) = 14.28
>>>>> Week2 (starting Apr 9, #days = 7)= 14.28
>>>>> Week3 (starting Apr 16, #days = 7)= 14.28
>>>>> Week4 (starting Apr 23, #days = 7)= 14.28
>>>>> week5 (Starting Apr 30, #days =7)= 14.28
>>>>> week1 (starting May 7, #days = 7)= 14.28
>>>>> Week2 (starting May 14, #days = 4)= 8.16
>>>>>
>>>>> But as you see that period from 01/18 to 03/27, have 68 days and daily
>>>>> average would be 130/68=1.91
>>>>>
>>>>> So really to get complete week I need 3 days from 130 value and 4 days
>>>>> from 102 value.
>>>>>
>>>>> So real total for that first week would be -
>>>>> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
>>>>> 13.89
>>>>>
>>>>> How would I achieve this in Kylin? Any function? or other method I can
>>>>> use?
>>>>> Just for 6 records for year, I dont want to populate daily records.
>>>>> Thanks
>>>>> Deva
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Get daily average for periodic readings

2018-03-01 Thread Alberto Ramón
- About 95% of the response time is latency (so there is no real difference
between summing one int or 365; I thought the same as you when I started
with Kylin)
- The YY-MM-WW format is not implemented, but it would be nice if you could
contribute it

Alb

On 28 February 2018 at 22:59, deva namaste <ohd...@gmail.com> wrote:

> I was thinking of saving only 6 records in kylin instead of splitting them
> outside in daily avg and adding 365 records for each item.  So is there
> anyway I can achieve using sql level in kylin or have changes to model to
> accomodate above change? Please advice. Thanks
>
> On Wed, Feb 28, 2018 at 5:51 PM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> Sounds like:
>> - your minimum query granularity is weeks, so your fact table needs to be
>> at week granularity (or finer, like days)
>> - you will need to expand your current fact table to weeks (or finer, days).
>> For example, use a Hive view
>> - as an extra: Kylin can't use week-based partition column formats; the
>> minimum is days
>>
>> Alb
>>
>> On 28 February 2018 at 21:51, deva namaste <ohd...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> How would I calculate value for a week while I have bi-monthly values.
>>>
>>> e.g. Here is my data looks like -
>>>
>>> Date   -  Value
>>> 01/18/2017 -  100
>>> 03/27/2017 -  130  (68 Days)
>>> 05/17/2017 -  102  (51 Days)
>>>
>>> I need average value per week, as below. Lets consider between 03/27 and
>>> 05/17. So total days between period are 51. so Daily average would be
>>> 102/51= 2.04
>>>
>>> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
>>> Week1 (Starting Apr 2, #days = 7) = 14.28
>>> Week2 (starting Apr 9, #days = 7)= 14.28
>>> Week3 (starting Apr 16, #days = 7)= 14.28
>>> Week4 (starting Apr 23, #days = 7)= 14.28
>>> week5 (Starting Apr 30, #days =7)= 14.28
>>> week1 (starting May 7, #days = 7)= 14.28
>>> Week2 (starting May 14, #days = 4)= 8.16
>>>
>>> But as you see that period from 01/18 to 03/27, have 68 days and daily
>>> average would be 130/68=1.91
>>>
>>> So really to get complete week I need 3 days from 130 value and 4 days
>>> from 102 value.
>>>
>>> So real total for that first week would be -
>>> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
>>> 13.89
>>>
>>> How would I achieve this in Kylin? Any function? or other method I can
>>> use?
>>> Just for 6 records for year, I dont want to populate daily records.
>>> Thanks
>>> Deva
>>>
>>>
>>>
>>
>


Re: Questions about 'RAW' measure

2018-03-01 Thread Alberto Ramón
MailList
<http://apache-kylin.74782.x6.nabble.com/Discuss-Disable-hide-RAW-measure-in-Kylin-web-GUI-tp6636.html>:
Kylin 3062 <https://issues.apache.org/jira/browse/KYLIN-3062> (v2.3) proposes
to disable RAW in the UI.

Currently you can't control whether the flat table is created or not;
there is a proposal for that, Kylin 2532
<https://issues.apache.org/jira/browse/KYLIN-2532?focusedCommentId=15956535=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15956535>
(v2.1).



On 1 March 2018 at 08:30, BELLIER Jean-luc <jean-luc.bell...@rte-france.com>
wrote:

> Hello Alberto,
>
>
>
> Thank you for your answer. I will look further for this mistake on the
> cube building.
>
>
>
> Concerning the RAW measure, are you referring to this discussion  ?
>
> I still can see this option on measures section on Kylin 2.2, that is why
> it kept my attention.
>
> Does it mean that to access raw data, we need to first use an aggregated
> measure ? My final users mainly use raw data (e.g. slicing), so I want to
> be sure on that.
>
>
>
> What about building cubes using only a table of facts with all the data
> inside ? Is it a conceivable way of doing (in terms of space storage,
> efficiency) or is it preferable to use separate tables foe dimensions and
> why ?
>
>
>
> Thank you in advance for your help.
>
> Have a good day.
>
>
>
> Best regards,
>
> Jean-Luc.
>
>
>
> *De :* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Envoyé :* mercredi 28 février 2018 19:04
> *À :* user <user@kylin.apache.org>
> *Objet :* Re: Questions about 'RAW' measure
>
>
>
> Hello
>
> - The RAW measure is deprecated. You will find the thread in this mailing
> list
> - "Job hasn't been submitted after" sounds like a configuration problem with
> your YARN; please search for it on Google and review your CPU and RAM
> resources
>
>
>
> On 28 February 2018 at 16:44, BELLIER Jean-luc <
> jean-luc.bell...@rte-france.com> wrote:
>
> Hello
>
>
>
> I discovered that there was a RAW measure to get raw data instead of
> aggregated data
> (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/)
>
>
>
> My assumption is that these raw data are stored in HBase, as aggregated
> data are, i.e. these data are duplicated from Hive into HBase.
>
> So my question is : are there limitations on the data volume ? My fact
> tables contain billions of rows and we need to get detailed information
> from them. So what are the restrictions, and also the benefits related to
> querying directly the data into Hive ?
>
>
>
> I have another question : I tested the way to create a model directly from
> a  facts table containing raw data, in order to make a test of feasibility
> and avoid transformations (the table is a CSV file provided by an external
> team). I wanted in a first step to avoid creating files for the
> corresponding dimensions a generate a “clean” facts table having foreign
> keys corresponding to  the primary keys of dimension tables.
>
> The creation of the model was OK.
>
> However the cube generation failed at first step, and I got this message :
>
>
>
> INFO  : Query ID = hive_20180228120101_6990f9d4-182d-4dd9-b319-fce02caf75ef
>
> INFO  : Total jobs = 3
>
> INFO  : Launching Job 1 out of 3
>
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
>
> INFO  : In order to change the average load for a reducer (in bytes):
>
> INFO  :   set hive.exec.reducers.bytes.per.reducer=
>
> INFO  : In order to limit the maximum number of reducers:
>
> INFO  :   set hive.exec.reducers.max=
>
> INFO  : In order to set a constant number of reducers:
>
> INFO  :   set mapreduce.job.reduces=
>
> INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
>
> ERROR : Job hasn't been submitted after 61s. Aborting it.
>
>
>
> How could I process to avoid this. Are there kylin parameters (or other)
> to adjust ?
>
>
>
> Thank you in advance for your help. Have a good day.
>
> Best regards,
>
> Jean-Luc
>
>
>
>
>
>
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not 

Re: Get daily average for periodic readings

2018-02-28 Thread Alberto Ramón
Sounds like:
- your minimum query granularity is weeks, so your fact table needs to be at
week granularity (or finer, like days)
- you will need to expand your current fact table to weeks (or finer, days).
For example, use a Hive view
- as an extra: Kylin can't use week-based partition column formats; the
minimum is days

Alb
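
A minimal sketch of the expansion suggested above, done in plain Python rather than in a Hive view: each periodic reading is spread evenly over the days since the previous reading, and the daily values are then summed per week. Two assumptions are mine, not from the thread: a reading covers the days after the previous reading up to and including its own date, and weeks start on Sunday. Rates are computed exactly, so the weekly figures can differ slightly from the rounded numbers in the quoted message below.

```python
from collections import defaultdict
from datetime import date, timedelta


def expand_to_daily(readings):
    """Spread each reading evenly over the days since the previous reading.

    readings: list of (date, value) pairs sorted by date. The first reading
    only marks the start of the first period, so its value is not spread.
    Returns a dict mapping each day to its share of the period's value.
    """
    daily = {}
    for (prev_day, _), (day, value) in zip(readings, readings[1:]):
        n_days = (day - prev_day).days
        rate = value / n_days
        for offset in range(1, n_days + 1):
            daily[prev_day + timedelta(days=offset)] = rate
    return daily


def weekly_totals(daily):
    """Sum daily values per week, with weeks starting on Sunday."""
    totals = defaultdict(float)
    for day, value in daily.items():
        # date.weekday() is 0 for Monday and 6 for Sunday.
        week_start = day - timedelta(days=(day.weekday() + 1) % 7)
        totals[week_start] += value
    return dict(totals)


# The three readings from the original question.
readings = [
    (date(2017, 1, 18), 100),
    (date(2017, 3, 27), 130),
    (date(2017, 5, 17), 102),
]
daily = expand_to_daily(readings)
weeks = weekly_totals(daily)
for week_start in sorted(weeks):
    print(week_start, round(weeks[week_start], 2))
```

The daily rows are what would be loaded into the fact table; week, month and year totals then become ordinary SUM queries over a day-level partition column.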

On 28 February 2018 at 21:51, deva namaste  wrote:

> Hello,
>
> How would I calculate value for a week while I have bi-monthly values.
>
> e.g. Here is my data looks like -
>
> Date   -  Value
> 01/18/2017 -  100
> 03/27/2017 -  130  (68 Days)
> 05/17/2017 -  102  (51 Days)
>
> I need average value per week, as below. Lets consider between 03/27 and
> 05/17. So total days between period are 51. so Daily average would be
> 102/51= 2.04
>
> Week4 (Starting March 26, #days = 4) = (4 x 2.04) = 8.16
> Week1 (Starting Apr 2, #days = 7) = 14.28
> Week2 (starting Apr 9, #days = 7)= 14.28
> Week3 (starting Apr 16, #days = 7)= 14.28
> Week4 (starting Apr 23, #days = 7)= 14.28
> week5 (Starting Apr 30, #days =7)= 14.28
> week1 (starting May 7, #days = 7)= 14.28
> Week2 (starting May 14, #days = 4)= 8.16
>
> But as you see that period from 01/18 to 03/27, have 68 days and daily
> average would be 130/68=1.91
>
> So really to get complete week I need 3 days from 130 value and 4 days
> from 102 value.
>
> So real total for that first week would be -
> Week4 (Starting March 26, #days = 4) = (4x2.04=8.16) + (3x1.91=5.73) =
> 13.89
>
> How would I achieve this in Kylin? Any function? or other method I can
> use?
> Just for 6 records for year, I dont want to populate daily records.
> Thanks
> Deva
>
>
>


Re: Questions about 'RAW' measure

2018-02-28 Thread Alberto Ramón
Hello

- The RAW measure is deprecated. You will find the thread in this mailing list
- "Job hasn't been submitted after" sounds like a configuration problem with
your YARN; please search for it on Google and review your CPU and RAM
resources

On 28 February 2018 at 16:44, BELLIER Jean-luc <
jean-luc.bell...@rte-france.com> wrote:

> Hello
>
>
>
> I discovered that there was a RAW measure to get raw data instead of
> aggregated data
> (http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/)
>
>
>
> My assumption is that these raw data are stored in HBase, as aggregated
> data are, i.e. these data are duplicated from Hive into HBase.
>
> So my question is : are there limitations on the data volume ? My fact
> tables contain billions of rows and we need to get detailed information
> from them. So what are the restrictions, and also the benefits related to
> querying directly the data into Hive ?
>
>
>
> I have another question : I tested the way to create a model directly from
> a  facts table containing raw data, in order to make a test of feasibility
> and avoid transformations (the table is a CSV file provided by an external
> team). I wanted in a first step to avoid creating files for the
> corresponding dimensions a generate a “clean” facts table having foreign
> keys corresponding to  the primary keys of dimension tables.
>
> The creation of the model was OK.
>
> However the cube generation failed at first step, and I got this message :
>
>
>
> INFO  : Query ID = hive_20180228120101_6990f9d4-182d-4dd9-b319-fce02caf75ef
>
> INFO  : Total jobs = 3
>
> INFO  : Launching Job 1 out of 3
>
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
>
> INFO  : In order to change the average load for a reducer (in bytes):
>
> INFO  :   set hive.exec.reducers.bytes.per.reducer=
>
> INFO  : In order to limit the maximum number of reducers:
>
> INFO  :   set hive.exec.reducers.max=
>
> INFO  : In order to set a constant number of reducers:
>
> INFO  :   set mapreduce.job.reduces=
>
> INFO  : Starting Spark Job = 3556ecc6-2609-4085-bcca-b1b81fa9855c
>
> ERROR : Job hasn't been submitted after 61s. Aborting it.
>
>
>
> How could I process to avoid this. Are there kylin parameters (or other)
> to adjust ?
>
>
>
> Thank you in advance for your help. Have a good day.
>
> Best regards,
>
> Jean-Luc
>
>
>
>
>
>
>
>
> "Ce message est destiné exclusivement aux personnes ou entités auxquelles
> il est adressé et peut contenir des informations privilégiées ou
> confidentielles. Si vous avez reçu ce document par erreur, merci de nous
> l'indiquer par retour, de ne pas le transmettre et de procéder à sa
> destruction.
>
> This message is solely intended for the use of the individual or entity to
> which it is addressed and may contain information that is privileged or
> confidential. If you have received this communication by error, please
> notify us immediately by electronic mail, do not disclose it and delete the
> original message."
>


RE: Optimize Cube Build process

2018-02-01 Thread Alberto Ramón
How many processes are you running in parallel in the cube build step?

On 1 Feb 2018 7:39 a.m., "Kumar, Manoj H" <manoj.h.ku...@jpmorgan.com>
wrote:

> We have 15 nodes & each nodes have 8 cores. RAM= 256 MB.
>
>
>
> I don’t think memory is issue here.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Sent:* Thursday, February 01, 2018 1:33 AM
> *To:* user <user@kylin.apache.org>
> *Subject:* Re: Optimize Cube Build process
>
>
>
> How many nodes do you have?
>
> How much RAM and how many CPUs do you have per node?
>
>
>
> On 31 January 2018 at 05:07, Kumar, Manoj H <manoj.h.ku...@jpmorgan.com>
> wrote:
>
> It has close to 68 mapper & reducers 500.. It keeps running on this. Pls.
> advise.
>
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Wednesday, January 31, 2018 9:24 AM
> *To:* 'user@kylin.apache.org' <user@kylin.apache.org>
> *Subject:* Optimize Cube Build process
>
>
>
> Hi Folks – I have close to 33 million of fact data to be processed, Data
> is having lot of unique/Distinct values such Loan_unique_code,
> Facility_code,card_id such.. Dimension looks up are made of these.
>
>
>
> Fact table – 33 millions
>
> Looks up tables having to 3 to 4 millions
>
> Cube build type I have chosen – inmem
>
> Engine – Mapreduce
>
>
>
> Cube build step is taking 90 minutes which is seems to be high. What I can
> do in order to minimize build time? What Parameter I should tweak so that
> Build time gets reduced. Thanks.
>
>
>
>
>
> I have Followed the same steps as given below but it doesn’t help in this
> case
>
>
>
> http://kylin.apache.org/docs21/howto/howto_optimize_build.html
>
>
>
>
>
> Regards,
>
> Manoj
>
>
>
> This message is confidential and subject to terms at: http://
> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
> privilege, viruses and monitoring of electronic messages. If you are not
> the intended recipient, please delete this message and notify the sender
> immediately. Any unauthorized use is strictly prohibited.
>
>


Re: Optimize Cube Build process

2018-01-31 Thread Alberto Ramón
How many nodes do you have?
How much RAM and how many CPUs do you have per node?

On 31 January 2018 at 05:07, Kumar, Manoj H 
wrote:

> It has close to 68 mapper & reducers 500.. It keeps running on this. Pls.
> advise.
>
>
>
> Regards,
>
> Manoj
>
>
>
> *From:* Kumar, Manoj H
> *Sent:* Wednesday, January 31, 2018 9:24 AM
> *To:* 'user@kylin.apache.org' 
> *Subject:* Optimize Cube Build process
>
>
>
> Hi Folks – I have close to 33 million of fact data to be processed, Data
> is having lot of unique/Distinct values such Loan_unique_code,
> Facility_code,card_id such.. Dimension looks up are made of these.
>
>
>
> Fact table – 33 millions
>
> Looks up tables having to 3 to 4 millions
>
> Cube build type I have chosen – inmem
>
> Engine – Mapreduce
>
>
>
> Cube build step is taking 90 minutes which is seems to be high. What I can
> do in order to minimize build time? What Parameter I should tweak so that
> Build time gets reduced. Thanks.
>
>
>
>
>
> I have Followed the same steps as given below but it doesn’t help in this
> case
>
>
>
> http://kylin.apache.org/docs21/howto/howto_optimize_build.html
>
>
>
>
>
> Regards,
>
> Manoj
>
>
>
> This message is confidential and subject to terms at: http://
> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
> privilege, viruses and monitoring of electronic messages. If you are not
> the intended recipient, please delete this message and notify the sender
> immediately. Any unauthorized use is strictly prohibited.
>


Re: segment size estimate when merging

2018-01-27 Thread Alberto Ramón
Could this be related? KYLIN-2779; this JIRA makes a lot of sense
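
For reference, the adjustment described in the quoted message below (estimated 10 GB, actual 5 GB, so divide the next estimate by 2) can be sketched as follows. This is only an illustration of the arithmetic; the function names are hypothetical and this is not Kylin's code.

```python
GB = 1024 ** 3


def correction_ratio(estimated_bytes, actual_bytes):
    """Ratio of actual to estimated size for already-built segments."""
    if estimated_bytes <= 0:
        raise ValueError("estimated size must be positive")
    return actual_bytes / estimated_bytes


def adjusted_estimate(new_estimate_bytes, built_segments):
    """Scale a new estimate by how far off earlier estimates were.

    built_segments: list of (estimated_bytes, actual_bytes) pairs for
    segments that already exist.
    """
    total_estimated = sum(est for est, _ in built_segments)
    total_actual = sum(act for _, act in built_segments)
    return new_estimate_bytes * correction_ratio(total_estimated, total_actual)


# Segment A was estimated at 10 GB but is really 5 GB, so a new 8 GB
# estimate is scaled down by the same factor.
print(adjusted_estimate(8 * GB, [(10 * GB, 5 * GB)]) / GB)
```

Using the totals over all existing segments, rather than a single segment, keeps one unusual segment from dominating the correction.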

On 24 January 2018 at 13:43, ShaoFeng Shi  wrote:

> Hi Qilong,
>
> If seg A's estimation size is 10 GB, but real size is 5 GB; then when
> merge or build another segment, we can adjust the estimated size by divide
> by 2. Then it should be closer with real size.
>
> 2018-01-24 9:49 GMT+08:00 苏启龙 :
>
>> Many thanks shaofeng! We’ll check more on these parameters to see how to
>> make it better.
>>
>> BTW, what do u mean by the last line? I mean by which way I can introduce
>> the actual size to help Kylin to adjust the estimation? Currently I can
>> only use the max-regions parameter manually, but this is not convenient for
>> auto-merging.
>>
>> QIlong
>>
>> From: ShaoFeng Shi
>> Reply-To: "user@kylin.apache.org"
>> Date: Tuesday, 23 January 2018 21:49
>>
>> To: user
>> Cc: 林豪 (linhao), Technology Product Center
>> Subject: Re: segment size estimate when merging
>>
>> Hi Qilong,
>>
>> Does your cube have count-distinct or Top-N measure?
>>
>> If you observed that there are too many or too small hbase regions, you
>> can adjust some parameters:
>>
>> kylin.cube.size-estimate-ratio=0.25
>> kylin.cube.size-estimate-countdistinct-ratio=0.05
>>
>> The default ratio for common case is 0.25, you can set it to smaller if
>> the estimated size is bigger than actual size. These two parameters can be
>> set at Cube level.
>>
>> A better way is when doing merge, using the actual size of existing
>> segments to adjust the estimated size, then get a closer result.
>>
>> 2018-01-23 14:47 GMT+08:00 苏启龙 :
>>
>>> Hi shaofeng,
>>>
>>> Yes, it’s usually smaller than the sum of the segments, but usually only by
>>> a small amount compared with the total size.
>>>
>>> But the statistics-based estimate usually ends up N times larger than
>>> the actual size, which results in a huge waste of HBase regions.
>>>
>>>
>>> 1. Do you have any data about the deviation of the two approaches? I mean,
>>>    which one is generally closer?
>>> 2. Is there any improvement plan for this in the roadmap? Or any
>>>    consideration of giving users more options to select their own
>>>    estimation algorithm?
>>>
>>>
>>> Thanks
>>>
>>> Qilong
>>>
>>> From: ShaoFeng Shi
>>> Reply-To: "user@kylin.apache.org"
>>> Date: Tuesday, 23 January 2018 09:43
>>> To: user
>>> Cc: 林豪 (linhao), Technology Product Center
>>> Subject: Re: segment size estimate when merging
>>>
>>> Hi Qilong,
>>>
>>> When merging segments, the dimension-measure values (k-v) will be
>>> re-orged and the same key will be merged, so the merged size is not simply
>>> a sum of each segment; usually, it is smaller than before.
>>>
>>> Always using the statistics to estimate the size is for consistency. Of
>>> course, there is room to improve the estimation accuracy.
>>>
>>>
>>>
>>> 2018-01-22 16:54 GMT+08:00 苏启龙 :
>>>

>>>> Hi,
>>>>
>>>> We have some unclear points about the segment size estimate when
>>>> merging multiple segments.
>>>>
>>>> We find that the segment merge job still uses
>>>> CubeStatsReader::getCuboidSizeMap to estimate the total size of the
>>>> merged segment. From our understanding, when building a new segment, it
>>>> is fine for Kylin to estimate the total size this way, since there is no
>>>> other information to turn to. But when merging, we could sum the table
>>>> sizes of the segments to be merged, which should be more accurate.
>>>>
>>>> So what is the reasoning behind this design?
>>>>
>>>>
>>>>
>>>> Su Qilong

>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: MDX queries on kylin cubes.

2018-01-24 Thread Alberto Ramón
You can find this question in the older mailing list archives.

Keep in mind that Apache Kylin was designed to use SQL as its query language
(it uses Apache Calcite for this)

Either way, if you want to use MDX:

Excel using Mondrian

Mondrian

On 24 January 2018 at 20:02, Db-Blog  wrote:

> Hi Prasanna & Team,
> Can you please suggest if you were able to access kylin cube using MDX
> queries?
>
> Thanks,
> Saurabh
>
> Sent from my iPhone, please avoid typos.
>
> On 17-Jan-2018, at 10:06 AM, Prasanna 
> wrote:
>
> Hi all,
>
>
>
>   I am using kylin 2.2.0 version. Present I am using only sql type queries
> on kylin cubes like select with aggregation functions. I would like to use
> MDX queries on cubes. If anybody is using please can you guide me, any
> document is available regarding of this.
>
>
>
>
>
> Thanks,
>
> Prasanna.P
>
>


Re: #20 Step Name: Load HFile to HBase Table failed

2018-01-22 Thread Alberto Ramón
You created the HFiles, but Kylin doesn't have permission to execute
CompleteBulkLoad
It's a typical issue on this mailing list; check the permissions of the user
that starts the Kylin service

On 22 January 2018 at 09:14, Neters  wrote:

> Hello guys:
>
> I have some problem when the program access to  #20 Step Name: Load HFile
> to HBase Table ;
> and the log displays that:
>
> Could you please advice me some solution to check it out?
>
> The detail kylin.log is attached,please check it.
>
> Thank you
>
> Best Regards
>


Re: Re: Kylin front-end business query problem

2018-01-21 Thread Alberto Ramón
Could you check these notes, too:

To use kylin.query.timeout-seconds you will need Kylin 2.0
https://issues.apache.org/jira/browse/KYLIN-2847 (v2.4)
https://issues.apache.org/jira/browse/KYLIN-3157 (open)

2018-01-17 5:15 GMT+00:00 杨浩 :

> The "kylin.query.timeout-seconds" setting may help to stop long-running
> queries
>
> On 29 Dec 2017 at 15:13, chenping...@keruyun.com wrote:
>
>> Thank you very much for your prompt reply
>>
>> --
>>
>> 陈平 (Chen Ping), DBA Engineer
>>
>>
>>
>> 成都时时客科技有限责任公司
>>
>> Address: 3/F, Building 1, No. 1268 Tianfu Avenue, High-tech Zone, Chengdu
>>
>> Postal code: 610041
>>
>> Mobile: 15108456581
>>
>> Online: QQ 625852056
>>
>> Website: www.keruyun.com
>>
>> Customer service: 4006-315-666
>>
>>
>>
>>
>> *From:* Joanna He
>> *Sent:* 2017-12-29 15:05
>> *To:* user
>> *Subject:* Re: Kylin front-end business query problem
>> Translation: Hello my question is when there are multiple queries running
>> , how can I know what query is currently running. And how can I kill
>> the long-running query?
>>
>> Answer:
>> You can view your currently running query in logs/kylin.property under
>> your kylin installation directory.
>> There is no way to kill single query in kylin at the moment, the only way
>> to stop the query is to stop and start the kylin server.
>>
>>
>>
>> 2017-12-29 14:59 GMT+08:00 chenping...@keruyun.com <
>> chenping...@keruyun.com>:
>>
>>>
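>>> Hello everyone, I have run into a fairly big problem: many queries arrive
>>> from the front end at the same time. I would like to know how to see which
>>> queries the current Kylin instance is running, and how to kill queries
>>> that have been running for a long time.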
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>


Re: flat table stored as parquet

2017-12-29 Thread Alberto Ramón
Could you check this: Kylin 3070 (v2.3)

On 29 December 2017 at 22:22, Ruslan Dautkhanov 
wrote:

> Is there is a knob I can set to tell Kylin to create flat table
> as Parquet and not as default 'text' serialization?
> I mean that "flat" Hive table that Kylin creates when it builds a cube.
>
>
> Thanks!
> Ruslan Dautkhanov
>
>


Doubt Kylin 2363

2017-12-09 Thread Alberto Ramón
KYLIN-3067 

If you set dim_cap=3 and the dimensions are A, B, C, D,
and you launch a query . . . GROUP BY A, B, C, D, how is this query resolved?
Is the base cuboid A, B, C, D?
Are the internal cuboids deleted after use?

BR, Alb


Re: availableVirtualCores

2017-11-29 Thread Alberto Ramón
yes, sorry:

When you execute: ${KYLIN_HOME}/bin/check-env.sh

it creates a file: ${KYLIN_HOME}/logs/cluster.info with this text:
availableMB=40460          <- correct
availableVirtualCores=3    <- NOT correct

which is used by check-spark.sh in these lines:
saveFileName=${KYLIN_HOME}/logs/cluster.info
yarn_available_cores=`getValueByKey availableVirtualCores ${saveFileName}`
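
For reference, the getValueByKey lookup above amounts to reading a key=value
file. A minimal Python sketch, assuming cluster.info holds one key=value pair
per line as shown; the function name read_cluster_info is hypothetical and is
not part of Kylin:

```python
def read_cluster_info(text):
    """Parse key=value lines (as in logs/cluster.info) into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Sample content copied from the file described above.
sample = "availableMB=40460\navailableVirtualCores=3\n"
info = read_cluster_info(sample)
print(info["availableVirtualCores"])  # the value check-spark.sh would pick up
```

So whatever check-env.sh writes into the file is what the Spark check sees;
the open question is where that 3 comes from.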

On 28 November 2017 at 01:36, Li Yang <liy...@apache.org> wrote:

> Where do you see -- Cluster.info: 'availableVirtualCores=3'??
>
> Cannot recognize it.
>
> On Sat, Nov 25, 2017 at 4:29 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> Hello
>>
>> From Ambari, the number of virtual cores is 4:
>> [image: Inline images 1]
>>
>> But in the file Cluster.info: 'availableVirtualCores=3'
>>
>> (RAM is correct)
>>
>> I don't know from where Kylin read this config
>>
>
>


availableVirtualCores

2017-11-24 Thread Alberto Ramón
Hello

From Ambari, the number of virtual cores is 4:
[image: Inline images 1]

But in the file Cluster.info: 'availableVirtualCores=3'

(RAM is correct)

I don't know where Kylin reads this value from


Re: Can hierarchyDims contain jointDims

2017-11-17 Thread Alberto Ramón
https://issues.apache.org/jira/browse/KYLIN-2149

Check this link: you need to choose between using one or the other.
Sometimes it would be great to use both together.
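
To make the constraint concrete: the validation quoted below rejects an
aggregation group when a dimension appears both in a hierarchy and in a joint.
A rough Python sketch of that check, mirroring the CollectionUtils.containsAny
test in the quoted Java; the function names and the sample dimension names are
made up for illustration:

```python
def overlapping_dims(hierarchy_dims, joint_dims):
    """Return dimensions used in both a hierarchy and a joint, sorted."""
    hierarchy = {d for group in hierarchy_dims for d in group}
    joint = {d for group in joint_dims for d in group}
    return sorted(hierarchy & joint)


def validate_aggregation_group(index, hierarchy_dims, joint_dims):
    """Raise if any hierarchy dimension is also part of a joint."""
    overlap = overlapping_dims(hierarchy_dims, joint_dims)
    if overlap:
        raise ValueError(
            f"Aggregation group {index} hierarchy dimensions overlap "
            f"with joint dimensions: {overlap}"
        )


# Hypothetical dimension names, for illustration only.
ok = overlapping_dims([["COUNTRY", "CITY"]], [["USER_ID", "DEVICE"]])
bad = overlapping_dims([["COUNTRY", "CITY"]], [["CITY", "DEVICE"]])
print(ok, bad)
```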

On 17 November 2017 at 06:43, doom <43535...@qq.com> wrote:

> So what's the second code segment mean in AggregationGroup build step?
> is it means replace the hierarchy dim with the joint dims witch contain it?
>
>
> -- 原始邮件 --
> *发件人:* "ShaoFeng Shi";;
> *发送时间:* 2017年11月17日(星期五) 下午2:02
> *收件人:* "user";
> *主题:* Re: Can hierarchyDims contain jointDims
>
> Joint could not be used in the hierarchy.
>
> Joint means treating multiple dimensions as one: they either all appeared,
> either all not; It is a conflict with hierarchy.
>
> 2017-11-16 21:29 GMT+08:00 doom <43535...@qq.com>:
>
>> HI ALL:
>> I read the src code of kylin 2.2, and find:
>>
>> In class CubeDes, if hierarchyDims contain jointDims will throw exception.
>> public void validateAggregationGroups() {
>> ...
>> if (CollectionUtils.containsAny(hierarchyDims, jointDims)) {
>> logger.error("Aggregation group " + index + " hierarchy
>> dimensions overlap with joint dimensions");
>> throw new IllegalStateException(
>> "Aggregation group " + index + " hierarchy
>> dimensions overlap with joint dimensions: "
>> + 
>> ensureOrder(CollectionUtils.intersection(hierarchyDims,
>> jointDims)));
>> }
>>
>> But in class AggregationGroup will replace the hierarchy dim with the
>> joint dims witch contain it.
>> private void buildHierarchyMasks(RowKeyDesc rowKeyDesc) {
>> .
>> for (int i = 0; i < hierarchy_dims.length; i++) {
>> TblColRef hColumn = cubeDesc.getModel().findColumn
>> (hierarchy_dims[i]);
>> Integer index = rowKeyDesc.getColumnBitIndex(hColumn);
>> long bit = 1L << index;
>>
>> // combine joint as logic dim
>> if (dim2JointMap.get(bit) != null) {
>> bit = dim2JointMap.get(bit);
>> }
>>
>> mask.fullMask |= bit;
>> allMaskList.add(mask.fullMask);
>> dimList.add(bit);
>> }
>> }
>>
>> do i understand in a wrong way?
>>
>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Kylin and SuperSet

2017-09-12 Thread Alberto Ramón
Hi

Will there be official support for Apache Kylin in Apache SuperSet?


Re: Some questions about Kylin2.0

2017-06-13 Thread Alberto Ramón
Q1: See KYLIN-2633. The current version of Spark
is 1.6.3 (in Kylin 2.0.0)

On 13 June 2017 at 04:41, lxw  wrote:

> Hi,All :
>
>I have some questions about Kylin2.0, and my environment:
> hadoop-2.6.0-cdh5.8.3
> hbase-1.2.0-cdh5.8.3
> apache-kylin-2.0.0-bin-cdh57
> spark-2.1.0-bin-hadoop2.6
>
> *Q1: Kylin2.0 not support Spark2.0?*
>
>  find-spark-dependency.sh:
>  spark_dependency=`find -L $spark_home -name
> 'spark-assembly-[a-z0-9A-Z\.-]*.jar' 
>
> *Q2: I want to use Kylin2.0 without Spark Cubing, but failed.*
>
>  kylin.sh:
>  function retrieveDependency() {
>  #retrive $hive_dependency and $hbase_dependency
>  source ${dir}/find-hive-dependency.sh
>  source ${dir}/find-hbase-dependency.sh
>  source ${dir}/find-hadoop-conf-dir.sh
>  source ${dir}/find-kafka-dependency.sh
>  source ${dir}/find-spark-dependency.sh
>
>  If not found spark dependencies, Kylin can not start :
>
>  [hadoop@hadoop10 bin]$ ./kylin.sh start
>  Retrieving hadoop conf dir...
>  KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>  Retrieving hive dependency...
>  Retrieving hbase dependency...
>  Retrieving hadoop conf dir...
>  Retrieving kafka dependency...
>  Retrieving Spark dependency...
>  *spark assembly lib not found.*
>
>  after modify kylin.sh “**source ${dir}/find-spark-dependency.sh”,
> Kylin start success ..
>
> *Q3: Abount kylin_hadoop_conf_dir ?*
>
>  I make some soft link under $KYLIN_HOME/hadoop-conf
> (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),
>  and set 
> "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf",
> when I execute ./check-env.sh,
>
>  *[hadoop@hadoop10 bin]$ ./check-env.sh *
> * Retrieving hadoop conf dir...*
> */home/bigdata/kylin/current/hadoop-conf is override as the
> kylin_hadoop_conf_dir*
> *KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current*
> *-mkdir: java.net.UnknownHostException: cdh5*
> *Usage: hadoop fs [generic options] -mkdir [-p]  ...*
> *Failed to create /kylin20. Please make sure the user has right to
> access /kylin20*
>
> My HDFS with HA, fs.defaultFS is "cdh5",when I don't set
> "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, 
> HBASE_CONF_DIR
> from envionment variables (/etc/profile), it was correct.
>
>
> Best Regards!
> lxw
>


Re: Why in the Convert Cuboid Data to HFile step to start too many maps and reduces

2017-05-27 Thread Alberto Ramón
This sounds like a YARN configuration problem.
Parallelism is good :) and not all map/reduce tasks are executed at the same time.
Check some settings such as:

   - yarn.nodemanager.resource.memory-mb (per node)
   - yarn.nodemanager.resource.cpu-vcores (per node)

This can help you get started:
https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html

If your cluster is very small, a block size of 256 MB may be too big; you
can try 128 MB.
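
As a back-of-the-envelope check, the number of task containers a node can run
at once is bounded by both NodeManager settings above. A small Python sketch,
assuming every task asks for the same fixed memory and vcores; all numbers are
invented examples, not recommendations, and whether vcores are enforced at all
depends on the scheduler's resource calculator:

```python
def max_containers_per_node(node_memory_mb, node_vcores,
                            task_memory_mb, task_vcores=1):
    """Upper bound on concurrent task containers on one NodeManager."""
    by_memory = node_memory_mb // task_memory_mb
    by_cores = node_vcores // task_vcores
    return min(by_memory, by_cores)


# Example: yarn.nodemanager.resource.memory-mb=16384,
# yarn.nodemanager.resource.cpu-vcores=8, 2048 MB requested per task.
per_node = max_containers_per_node(16384, 8, 2048)
print(per_node)      # containers per node
print(per_node * 3)  # whole cluster, assuming 3 worker nodes
```

With 500+ maps and 1.4k+ reduces, only this many run at a time; the rest wait
in the queue.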

On 27 May 2017 at 08:49, jianhui.yi  wrote:

> My model have 7 tables,a cube have 15 dimensions, in the “Convert Cuboid
> Data to HFile” step to start too many maps and reduces(maps 500+,reduces
> 1.4k+),This step expend all resources of the small cluster.
>
> I set these parameters in the cluster:
>
> dfs.block.size=256M
>
> hive.exec.reducers.bytes.per.reducer=1073741824
>
> hive.merge.mapfiles=true
>
> hive.merge.mapredfiles=true
>
> hive.merge.size.per.task=256M
>
>
>
> kylin_hive_conf.xml this file uses the default settings
>
> Where can I turning performance optimization?
>
> Thanks.
>


Re: Cannot install Kylin on CDH 5.11 CentOS 7

2017-05-22 Thread Alberto Ramón
The recommended Java version (I don't know if it is mandatory) is 1.7

https://kylin.apache.org/docs20/install/hadoop_env.html


On 22 May 2017 at 22:59, Szalai Gergely  wrote:

> Hi All,
>
> We have a blocking issue by installing Kylin on CDH 5.11. By executing
> such lines on CentOS 7 we always getting empty strings.
>
> Could you please advise?
>
> bash $KYLIN_HOME/bin/get-properties.sh kylin.env.hdfs-working-dir
>
> KYLIN_HOME points to the right location, it also gives empty when we call
> directly bash get-properties.sh kylin.env.hdfs-working-dir
>
> Could it be a JAVA issue? we have only 1.6 installed.
>
> Many thanks in advance.
> Regards
>
>
>
>
> ​Kérjük, gondoljon a környezetére, mielőtt kinyomtatja ezt a levelet!
> Please think of environment before printing this e-mail!
>


Re: How to apply historical Updates to existing cube data

2017-05-11 Thread Alberto Ramón
Q1- Check this previous mailing-list thread about late data:
http://apache-kylin.74782.x6.nabble.com/Reloading-data-td5669.html

You will only need to recalculate the segments involved
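
To illustrate "segments involved": a minimal Python sketch, assuming each cube
segment covers a half-open time range [start, end) and the late data falls in
a known date range. The function name and the monthly segments are
hypothetical:

```python
from datetime import date


def segments_to_refresh(segments, late_start, late_end):
    """Return segments whose [start, end) range overlaps the late data."""
    return [
        (start, end)
        for start, end in segments
        if start < late_end and late_start < end
    ]


# Hypothetical monthly segments of a cube.
segments = [
    (date(2017, 1, 1), date(2017, 2, 1)),
    (date(2017, 2, 1), date(2017, 3, 1)),
    (date(2017, 3, 1), date(2017, 4, 1)),
]
# Late records dated from 20 Feb up to (not including) 5 Mar.
hit = segments_to_refresh(segments, date(2017, 2, 20), date(2017, 3, 5))
print(hit)
```

Here only the February and March segments would need a refresh; January stays
untouched.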

Q2- Check sharding (https://issues.apache.org/jira/browse/KYLIN-1453)
  Partitioning by a time column is not recommended (it will create a hotspot in
HBase)



On 11 May 2017 at 19:43, Nirav Patel  wrote:

> Hi,
>
> Correct me if I am wrong but currently you can not update existing kylin
> cube without refreshing entire cube. Does it mean if I am pulling new data
> from hive based on lets say customerId, Timestamp for which I already built
> cube before I have to rebuild entire cube from scratch? Or can I say
> refresh between startTime and endTime which will update cube data for that
> timeframe only.
>
> Also Hive data can be partitioned by any keys(columns) not just timestamp.
> so why not allow kylin cube updates based on any arbitrary partition
> strategy that user have defined on their hive table?
> e.g. update part of the cube based on timestamp, customerid, batchid etc.
>
> Thanks,
> Nirav
>
>
>
> [image: What's New with Xactly] 
>
>   [image: LinkedIn]
>   [image: Twitter]
>   [image: Facebook]
>   [image: YouTube]
> 


Re: 答复: kylin nonsupport Multi-value dimensions?

2017-05-10 Thread Alberto Ramón
You can convert this dimension to a string and check the performance using LIKE filters

Or, with Hive, duplicate the rows in the fact table: one row for each dimension value

Another, more complex solution would be to extend the dictionary encoding of the
dimension to understand multiple values

No more ideas :)
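
A small sketch of the second idea (one fact row per dimension value), written
in Python only to show the shape of the data. It assumes the salespeople of an
order are stored as a comma-separated string; the column names are
hypothetical:

```python
def explode_multi_value(rows, column, separator=","):
    """Yield one copy of each row per value found in `column`."""
    for row in rows:
        for value in row[column].split(separator):
            exploded = dict(row)
            exploded[column] = value.strip()
            yield exploded


# Hypothetical order with two salespeople.
orders = [{"order_id": 1, "amount": 100, "sales_id": "28,29"}]
flat = list(explode_multi_value(orders, "sales_id"))
print(flat)
```

Keep in mind that measures are duplicated too: SUM(amount) over the exploded
rows counts the order once per salesperson. In Hive the same reshaping is
usually done with LATERAL VIEW explode(split(...)).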


On 10 May 2017 8:51 a.m., "jianhui.yi" <jianhui...@zhiyoubao.com> wrote:

Sorry, I wrote it wrongly; this problem is about a multi-value dimension.

Example: I have a fact table named fact_order and a dimension table named
dim_sales

In the fact_order table, one order can have multiple salespeople.

When I join fact_order with dim_sales, it reports this error: Dup key found.

How can I solve it?



*发件人:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
*发送时间:* 2017年5月10日 15:29
*收件人:* user <user@kylin.apache.org>
*主题:* Re: kylin nonsupport Multi-value dimensions?



Hi,

Not all hive types are supported

Check these lines:
https://github.com/apache/kylin/blob/5d4982e247a2172d97d44c85309cef4b3dbfce09/core-metadata/src/main/java/org/apache/kylin/dimension/DimensionEncodingFactory.java#L76



On 10 May 2017 at 08:10, jianhui.yi <jianhui...@zhiyoubao.com> wrote:

I encountered a problem with a multi-dimensional dimension and tried to solve
it with a bridge table, but when building a cube it reports an error:

java.lang.IllegalStateException: The table: DIM_XXX Dup key found, key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]
 at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
 at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
 at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
 at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
 at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
 at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:644)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:98)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:54)
 at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

result code:2


Re: kylin nonsupport Multi-value dimensions?

2017-05-10 Thread Alberto Ramón
Hi,
Not all hive types are supported

Check these lines:
https://github.com/apache/kylin/blob/5d4982e247a2172d97d44c85309cef4b3dbfce09/core-metadata/src/main/java/org/apache/kylin/dimension/DimensionEncodingFactory.java#L76

On 10 May 2017 at 08:10, jianhui.yi  wrote:

> I encountered a problem with a multi-dimensional dimension and tried to solve
> it with a bridge table, but when building a cube it reports an error:
>
> java.lang.IllegalStateException: The table: DIM_XXX Dup key found, key=[1446], value1=[1446,29,1,1], value2=[1446,28,0,0]
>  at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
>  at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
>  at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
>  at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
>  at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
>  at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:644)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:98)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:54)
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:142)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:745)
>
> result code:2
>
>
>
>
>
>
>


Re: [Announce] New Apache Kylin committer Zhixiong Chen

2017-04-29 Thread Alberto Ramón
Congratulations to Roger Shi and Zhixiong!! (and to the dev team for the upcoming 2.0
version)

If you are ever near London or Spain, let me know; having a beer will be
necessary :)

2017-04-29 12:47 GMT+01:00 Dong Li :

> Welcome!
>
> Thanks,
> Dong Li
>
>  Original Message
> *Sender:* Li Yang
> *Recipient:* user
> *Cc:* dev; Apache Kylin PMC;
> chen
> *Date:* Saturday, Apr 29, 2017 19:13
> *Subject:* Re: [Announce] New Apache Kylin committer Zhixiong Chen
>
> Welcome Zhixiong!
>
> Yang
>
> On Sat, Apr 29, 2017 at 6:07 PM, Luke Han  wrote:
>
>> On behalf of the Apache Kylin PMC, I am very pleased to announce
>> that Zhixiong Chen has accepted the PMC's invitation to become a
>> committer on the project.
>>
>> We appreciate all of Zhixiong's generous contributions about many bug
>> fixes, patches, helped many users. We are so glad to have him to be
>> our new committer and looking forward to his continued involvement.
>>
>> Congratulations and Welcome, Zhixiong!
>>
>
>


Re: The coprocessor thread stopped itself due to scan timeout or scan threshold

2017-03-18 Thread Alberto Ramón
For the new version, check this:
https://issues.apache.org/jira/browse/KYLIN-2438

But keep in mind that these limits exist to protect the HBase coprocessor, and
if your query is too slow, perhaps you need to redesign the cube.

BR

2017-03-18 8:13 GMT+00:00 java_prog...@aliyun.com :

> Hi,
>when  I execute a query , there is an error shows below.
>
> Error while executing SQL "select t.hotel_id_m,t.live_dt,
> d.day_of_week,sum(rns) from tableT t join tableB d on t.live_dt = d.daY_no
> group by t.hotel_id_m,t.live_dt, d.day_of_week LIMIT 5":  for Query 553d8027-b97f-4e86-9aad-47bb0053b6ee GTScanRequest 1c96c729>The
> coprocessor thread stopped itself due to scan timeout or scan
> threshold(check region server log), failing current query..
>
>  I try to set kylin.query.coprocessor.mem.gb, kylin.query.mem.budget as
> bigger as it can be. but it did not work. If I set a small LIMIT number
> like 2 ,it work well.
>  Coulld you tell me what I can do if I want to using limit 5 or Is
> there any other way to let me get final result.
>
>
> Best regards,
>
> --
> java_prog...@aliyun.com
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
Do you have the Resource Manager on a dedicated node (without containers or a
Node Manager)?

2017-02-13 17:38 GMT+01:00 不清 <452652...@qq.com>:

> I check the configure in CM。
>
> Java Heap Size of ResourceManager in Bytes =1536 MiB
> Container Memory Minimum =1GiB
>
> Container Memory Increment =512MiB
>
> Container Memory Maximum =8GiB
>
> -- 原始邮件 --
> *发件人:* "Alberto Ramón";<a.ramonporto...@gmail.com>;
> *发送时间:* 2017年2月14日(星期二) 凌晨0:34
> *收件人:* "user"<user@kylin.apache.org>;
> *主题:* Re: kylin job stop accidentally and can resume success!
>
> check this
> <https://www.mapr.com/blog/best-practices-yarn-resource-management>:
> "Basically, it means RM can only allocate memory to containers in
> increments of .  . . "
>
> TIP: is your RM in a work node? If this is true, this can be the problem
> (Its good idea put yarn master, RM, in a dedicated node)
>
>
> 2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:
>
>> how can i get this heap size?
>>
>>
>> -- 原始邮件 --
>> *发件人:* "Alberto Ramón";<a.ramonporto...@gmail.com>;
>> *发送时间:* 2017年2月14日(星期二) 凌晨0:17
>> *收件人:* "user"<user@kylin.apache.org>;
>> *主题:* Re: kylin job stop accidentally and can resume success!
>>
>> Sounds like a problem of Resource Manager (RM) of YARN, check the Heap
>> size for RM
>> Kylin loose connectivity whit RM
>>
>> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>>
>>> hello,kylin community!
>>>
>>> sometimes my jobs stop accidenttly.It is can stop by any step.
>>>
>>> kylin log is like :
>>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>>> hbase.HBaseResourceStore:262 : Update row 
>>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 0 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 1 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 :
>>> Retrying connect to server: jxhdp1datanode29/10.180.212.61:50504.
>>> Already tried 2 time(s); retry policy is 
>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>>> sleepTime=1000 MILLISECONDS)
>>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>>> mapred.ClientServiceDelegate:273 : Application state is completed.
>>> FinalApplicationStatus=KILLED. Redirecting to job history server
>>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>>
>>> CM log is like:
>>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>>> User Name: tmn
>>> Queue: root.tmn
>>> State: KILLED
>>> Uberized: false
>>> Submitted: Sun Feb 12 19:19:24 CST 2017
>>> Started: Sun Feb 12 19:19:38 CST 2017
>>> Finished: Sun Feb 12 20:30:13 CST 2017
>>> Elapsed: 1hrs, 10mins, 35sec
>>> Diagnostics:
>>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>>> 10.180.212.38
>>> Job received Kill while in RUNNING state.
>>> Average Map Time 24mins, 48sec
>>>
>>> mapreduce job log
>>> Task KILL is received. Killing attempt!
>>>
>>> and when this happened ,by resume job,the job can resume success! I mean
>>>  it is not stop by error!
>>>
>>> what's the problem?
>>>
>>> My hadoop cluster is very busy,this situation happens very often.
>>>
>>> can I set retry time and retry  Interval?
>>>
>>
>>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
check this
<https://www.mapr.com/blog/best-practices-yarn-resource-management>:
"Basically, it means RM can only allocate memory to containers in
increments of .  . . "

TIP: is your RM on a worker node? If so, this may be the problem
(it's a good idea to put the YARN master, the RM, on a dedicated node)
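
To show what "increments" means in that quote, here is a simplified Python
model of how a container memory request gets rounded. It assumes the scheduler
raises the request to the minimum and then rounds it up to a multiple of the
increment; real schedulers differ in the details, and the numbers are example
values only:

```python
def normalize_request(requested_mb, minimum_mb, increment_mb, maximum_mb):
    """Round a container memory request up to the scheduler's granularity."""
    if requested_mb > maximum_mb:
        raise ValueError("request exceeds the container maximum")
    size = max(requested_mb, minimum_mb)
    # Round up to the next multiple of the increment (ceiling division).
    steps = -(-size // increment_mb)
    return steps * increment_mb


# Example: minimum 1 GiB, increment 512 MiB, maximum 8 GiB.
print(normalize_request(1300, 1024, 512, 8192))  # 1536
print(normalize_request(300, 1024, 512, 8192))   # 1024
```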


2017-02-13 17:19 GMT+01:00 不清 <452652...@qq.com>:

> how can i get this heap size?
>
>
> -- 原始邮件 ------
> *发件人:* "Alberto Ramón";<a.ramonporto...@gmail.com>;
> *发送时间:* 2017年2月14日(星期二) 凌晨0:17
> *收件人:* "user"<user@kylin.apache.org>;
> *主题:* Re: kylin job stop accidentally and can resume success!
>
> Sounds like a problem of Resource Manager (RM) of YARN, check the Heap
> size for RM
> Kylin loose connectivity whit RM
>
> 2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:
>
>> hello,kylin community!
>>
>> sometimes my jobs stop accidenttly.It is can stop by any step.
>>
>> kylin log is like :
>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>> hbase.HBaseResourceStore:262 : Update row 
>> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>> mapred.ClientServiceDelegate:273 : Application state is completed.
>> FinalApplicationStatus=KILLED. Redirecting to job history server
>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>
>> CM log is like:
>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>> User Name: tmn
>> Queue: root.tmn
>> State: KILLED
>> Uberized: false
>> Submitted: Sun Feb 12 19:19:24 CST 2017
>> Started: Sun Feb 12 19:19:38 CST 2017
>> Finished: Sun Feb 12 20:30:13 CST 2017
>> Elapsed: 1hrs, 10mins, 35sec
>> Diagnostics:
>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>> 10.180.212.38
>> Job received Kill while in RUNNING state.
>> Average Map Time 24mins, 48sec
>>
>> mapreduce job log
>> Task KILL is received. Killing attempt!
>>
>> and when this happened ,by resume job,the job can resume success! I mean
>>  it is not stop by error!
>>
>> what's the problem?
>>
>> My hadoop cluster is very busy,this situation happens very often.
>>
>> can I set retry time and retry  Interval?
>>
>
>


Re: kylin job stop accidentally and can resume success!

2017-02-13 Thread Alberto Ramón
This sounds like a problem with the YARN Resource Manager (RM); check the heap
size of the RM.
Kylin loses connectivity with the RM.

2017-02-13 17:00 GMT+01:00 不清 <452652...@qq.com>:

> hello,kylin community!
>
> sometimes my jobs stop accidenttly.It is can stop by any step.
>
> kylin log is like :
> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
> hbase.HBaseResourceStore:262 : Update row 
> /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
> sleepTime=1000 MILLISECONDS)
> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
> mapred.ClientServiceDelegate:273 : Application state is completed.
> FinalApplicationStatus=KILLED. Redirecting to job history server
> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>
> CM log is like:
> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
> User Name: tmn
> Queue: root.tmn
> State: KILLED
> Uberized: false
> Submitted: Sun Feb 12 19:19:24 CST 2017
> Started: Sun Feb 12 19:19:38 CST 2017
> Finished: Sun Feb 12 20:30:13 CST 2017
> Elapsed: 1hrs, 10mins, 35sec
> Diagnostics:
> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
> 10.180.212.38
> Job received Kill while in RUNNING state.
> Average Map Time 24mins, 48sec
>
> mapreduce job log
> Task KILL is received. Killing attempt!
>
> and when this happened ,by resume job,the job can resume success! I mean
>  it is not stop by error!
>
> what's the problem?
>
> My hadoop cluster is very busy,this situation happens very often.
>
> can I set retry time and retry  Interval?
>


Re: 求助有一个超大维度

2017-02-13 Thread Alberto Ramón
For B: it's a Java option (. . . java.opts)
  Check that your JVM isn't very old; there are a lot of GC optimizations
in recent versions of Java 8

TIP 1: Check whether you can reduce the dimensionality of the cube or use AGG to make
the build process lighter
You can take some ideas from this
<https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance>

TIP 2: Solve problem A first, because if you enlarge the heap, problem B will get
worse

2017-02-13 10:16 GMT+01:00 不清 <452652...@qq.com>:

> thanks for reply!
>
> For error A, I can set these parameters in kylin.
>
> But for error B,should I fix this problem for whole hadoop cluster?  Can
> you speak the parameter fix in detail?
>
> This really helped us a lot!
>
>
> ------ 原始邮件 --
> *发件人:* "Alberto Ramón";<a.ramonporto...@gmail.com>;
> *发送时间:* 2017年2月13日(星期一) 下午3:58
> *收件人:* "user"<user@kylin.apache.org>;
> *主题:* Re: 求助有一个超大维度
>
> Hello 不清
>
>
> From your errors: "Failed to build cube in mapper " &
> A- "java.lang.OutOfMemoryError: Java heap space at java" &
> B- "java.lang.OutOfMemoryError: GC overhead limit"
>
> For error A: check overriding these parameters from Kylin:
>
>    kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g
>    kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
>
> For error B (this is more complicated):
>
>    Check that you are using Java 8 or higher
>    Try with -XX:+UseG1GC
>    Explanation: https://wiki.apache.org/solr/ShawnHeisey
>
> Yes, using an Integer dictionary is the best option
>
>
>
> 2017-02-13 3:53 GMT+01:00 不清 <452652...@qq.com>:
>
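>> Hello Kylin community!
>>
>> A mobile phone number is used as a dimension; its distinct count is between
>> 5 million and 15 million.
>> I am using integer encoding with the length set to 8. The test data set is
>> about 100 million rows. Is something wrong with my settings?
>>
>> Does Kylin need any special settings for an ultra-high-cardinality dimension?
>>
>> The Kylin version I am using is 1.6.
>>
>> Thanks
>>
>> The failing step is "build cube".
>> The map tasks take a very long time and finally fail with this error: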
>> Error: java.io.IOException: Failed to build cube in mapper 36 at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:145)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148) at
>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
>> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
>> java.security.AccessController.doPrivileged(Native Method) at
>> javax.security.auth.Subject.doAs(Subject.java:415) at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused
>> by: java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> java.io.IOException: java.io.IOException: java.lang.RuntimeException:
>> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
>> java.util.concurrent.FutureTask.report(FutureTask.java:122) at
>> java.util.concurrent.FutureTask.get(FutureTask.java:188) at
>> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143)
>> ... 8 more Caused by: java.lang.RuntimeException: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:84) at java.util.concurrent.Executors
>> $RunnableAdapter.call(Executors.java:471) at
>> java.util.concurrent.FutureTask.run(FutureTask.java:262) at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.build(DoggedCubeBuilder.java:128) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder.build(DoggedCubeBuilder.java:75) at
>> org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(
>> AbstractInMemCubeBuilder.java:82) ... 5 more Caused by:
>> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
>> java.lang.OutOfMemoryError: Java heap space at
>> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnc
>> e.abort(DoggedCubeBuilder.java:196) at org.apache.kylin.cube.inmemcub
>> ing.DoggedCubeBuilder$BuildOnce.checkException(DoggedCubeBuilder.java:169)
>> at 

Re: 求助有一个超大维度

2017-02-12 Thread Alberto Ramón
Hello 不清


From your errors: "Failed to build cube in mapper " &
A- "java.lang.OutOfMemoryError: Java heap space at java" &
B- "java.lang.OutOfMemoryError: GC overhead limit"

For error A: check overriding these parameters from Kylin:

   kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g
   kylin.job.mr.config.override.mapreduce.map.memory.mb=8500

For error B (this is more complicated):

   Check that you are using Java 8 or higher
   Try with -XX:+UseG1GC
   Explanation: https://wiki.apache.org/solr/ShawnHeisey

Yes, using an Integer dictionary is the best option
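
A note on the two values for error A: the JVM heap (-Xmx) has to fit inside
the YARN container (memory.mb). A quick Python sketch to check the gap between
them; the helper names are hypothetical and the parser only understands plain
-Xmx<N>g or -Xmx<N>m:

```python
import re


def xmx_to_mb(java_opts):
    """Extract the -Xmx heap size from a java.opts string, in MB."""
    match = re.search(r"-Xmx(\d+)([gGmM])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting found")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def headroom_mb(java_opts, container_mb):
    """Container memory left over for non-heap use (stacks, native buffers)."""
    return container_mb - xmx_to_mb(java_opts)


print(headroom_mb("-Xmx8g", 8500))
```

-Xmx8g is 8192 MB, so 8500 leaves about 308 MB for everything outside the
heap. If YARN kills the task for exceeding its container limit, a larger gap
may be needed.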



2017-02-13 3:53 GMT+01:00 不清 <452652...@qq.com>:

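> Hello Kylin community!
>
> A mobile phone number is used as a dimension; its distinct count is between
> 5 million and 15 million.
> I am using integer encoding with the length set to 8. The test data set is
> about 100 million rows. Is something wrong with my settings?
>
> Does Kylin need any special settings for an ultra-high-cardinality dimension?
>
> The Kylin version I am using is 1.6.
>
> Thanks
>
> The failing step is "build cube".
> The map tasks take a very long time and finally fail with this error: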
> Error: java.io.IOException: Failed to build cube in mapper 36 at
> org.apache.kylin.engine.mr.steps.InMemCuboidMapper.
> cleanup(InMemCuboidMapper.java:145) at 
> org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) at
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:415) at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1642) at org.apache.hadoop.mapred.
> YarnChild.main(YarnChild.java:163) Caused by: 
> java.util.concurrent.ExecutionException:
> java.lang.RuntimeException: java.io.IOException: java.io.IOException:
> java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at java.util.concurrent.
> FutureTask.report(FutureTask.java:122) at java.util.concurrent.
> FutureTask.get(FutureTask.java:188) at org.apache.kylin.engine.mr.
> steps.InMemCuboidMapper.cleanup(InMemCuboidMapper.java:143) ... 8 more
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:84)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException:
> java.io.IOException: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:128)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder.
> build(DoggedCubeBuilder.java:75) at org.apache.kylin.cube.inmemcubing.
> AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:82) ... 5
> more Caused by: java.io.IOException: java.lang.RuntimeException:
> java.io.IOException: java.lang.OutOfMemoryError: Java heap space at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.abort(DoggedCubeBuilder.java:196)
> at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$
> BuildOnce.checkException(DoggedCubeBuilder.java:169) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:116)
> ... 7 more Caused by: java.lang.RuntimeException: java.io.IOException:
> java.lang.OutOfMemoryError: Java heap space at org.apache.kylin.cube.
> inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:289)
> Caused by: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.throwExceptionIfAny(InMemCubeBuilder.java:226)
> at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.
> build(InMemCubeBuilder.java:186) at org.apache.kylin.cube.
> inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:137) at
> org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:284)
> Caused by: java.lang.OutOfMemoryError: Java heap space at
> java.math.BigInteger.(BigInteger.java:973) at
> java.math.BigInteger.valueOf(BigInteger.java:957) at
> java.math.BigDecimal.inflate(BigDecimal.java:3519) at
> java.math.BigDecimal.unscaledValue(BigDecimal.java:2205) at
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:56)
> at 
> org.apache.kylin.metadata.datatype.BigDecimalSerializer.serialize(BigDecimalSerializer.java:33)
> at org.apache.kylin.measure.MeasureCodec.encode(MeasureCodec.java:76) at
> org.apache.kylin.measure.BufferedMeasureCodec.encode(BufferedMeasureCodec.java:93)
> at org.apache.kylin.gridtable.GTAggregateScanner$AggregationCache$
> 

Re: create dictionary error

2017-02-10 Thread Alberto Ramón
Hi, I am moving this thread to the user mailing list.

SALE_ORD_ID is not a dimension of the cube, but is it a PK-FK? I think yes  :)
Are you using derived dimensions in this table?

See this,
the 2GB limit is hardcoded; I think increasing Xmx won't solve your case.
The check says the dictionary exceeds the limit declared as " final int _2GB = 20;",
can you check if this is true?
Can you review the statistics for this column?
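A back-of-the-envelope estimate can show whether the limit is plausibly reached. The sketch below multiplies the reported cardinality by an average value length; the length of 14 bytes is purely hypothetical, the 2 GB constant is assumed to mean 2 * 1024**3 bytes, and a trie dictionary shares prefixes, so this is only a rough upper bound.

```python
TWO_GB = 2 * 1024 ** 3  # assumption: binary gigabytes; the constant in Kylin's source may differ


def raw_dictionary_bytes(cardinality: int, avg_value_len: int) -> int:
    """Upper-bound estimate: every distinct value stored in full, no prefix sharing."""
    return cardinality * avg_value_len


cardinality = 157_644_463  # cardinality of SALE_ORD_ID, taken from the error report
avg_len = 14               # hypothetical average length of the value as a string
estimate = raw_dictionary_bytes(cardinality, avg_len)
print(estimate, estimate > TWO_GB)
```

If the real average length is much shorter, the estimate drops below the limit, which is why checking the column statistics is worthwhile.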







2017-02-10 6:29 GMT+01:00 仇同心 :

> Hi,all
>
>  The build fails at Step Name: Build Dimension Dictionary:
>
>
>
> java.lang.RuntimeException: Failed to create dictionary on
> DMT.DMT_KYLIN_JDMALL_ORDR_DTL_I_D.SALE_ORD_ID
>
>  at org.apache.kylin.dict.DictionaryManager.buildDictionary(
> DictionaryManager.java:325)
>
>  at org.apache.kylin.cube.CubeManager.buildDictionary(
> CubeManager.java:185)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:50)
>
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.
> processSegment(DictionaryGeneratorCLI.java:41)
>
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(
> CreateDictionaryJob.java:56)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.
> doWork(HadoopShellExecutable.java:63)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.
> doWork(DefaultChainedExecutable.java:57)
>
>  at org.apache.kylin.job.execution.AbstractExecutable.
> execute(AbstractExecutable.java:113)
>
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$
> JobRunner.run(DefaultScheduler.java:136)
>
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>  at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.RuntimeException: Too big dictionary, dictionary
> cannot be bigger than 2GB
>
>  at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(
> TrieDictionaryBuilder.java:421)
>
>  at org.apache.kylin.dict.TrieDictionaryBuilder.build(
> TrieDictionaryBuilder.java:408)
>
>  at org.apache.kylin.dict.DictionaryGenerator$
> StringDictBuilder.build(DictionaryGenerator.java:165)
>
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:81)
>
>  at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(
> DictionaryGenerator.java:73)
>
>  at org.apache.kylin.dict.DictionaryManager.buildDictionary(
> DictionaryManager.java:321)
>
>  ... 14 more
>
>
>
>   The cardinality of “SALE_ORD_ID” is 157644463, but this column was not
> selected as a dimension.
>
>
>
>   In addition, I am confused: is the dictionary built from the full data
> set, or only from the data in the selected time range?
>
>
>
>
>
> Thank you~
>
>
>
>
>
>
>
>
>
>
>


Re: New document: "How to optimize cube build"

2017-01-25 Thread Alberto Ramón
Be careful about partitioning by "FLIGHTDATE".

From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance

*"Option 1: Use id_date as partition column on Hive table. This have a big
problem: the Hive metastore is meant for few hundred of partitions not
thousand (Hive 9452 there is an idea to solve this isn’t in progress)"*

In Hive 2.0 there will be a preview (for testing only) that addresses this.

2017-01-25 9:46 GMT+01:00 ShaoFeng Shi :

> Hello,
>
> A new document is added for the practices of cube build. Any suggestion or
> comment is welcomed. We can update the doc later with feedbacks;
>
> Here is the link:
> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Jekyll

2017-01-23 Thread Alberto Ramón
The cause of the error was that I had two versions of "jekyll-multiple-languages" installed.



2017-01-23 20:26 GMT+01:00 Alberto Ramón <a.ramonporto...@gmail.com>:

> I'm trying to add new doc to apache kylin
>
>
> jekyll 2.5.3 | Error:  undefined method `post_read' for class
> `Jekyll::Document'
>
> And this is true:
> https://github.com/jekyll/jekyll/blob/master/lib/jekyll/document.rb
>
> I used:
>
>   git init
>   git clone -b document --single-branch git://git.apache.org/kylin.git
>   cd  … website
>   jekyll server
>
> install:
>
> gem uninstall --all
>
> sudo gem install jekyll --version "=2.5.3"
>
>   sudo gem install bundler
>
>   sudo gem install jekyll-multiple-languages kramdown rouge
>
>
>
> versions:
>   ruby 2.3.1p112
>   jekyll 2.5.3
>
>


Jekyll

2017-01-23 Thread Alberto Ramón
I'm trying to add new doc to apache kylin


jekyll 2.5.3 | Error:  undefined method `post_read' for class
`Jekyll::Document'

And this is true:
https://github.com/jekyll/jekyll/blob/master/lib/jekyll/document.rb

I used:

  git init
  git clone -b document --single-branch git://git.apache.org/kylin.git
  cd  … website
  jekyll server

install:

gem uninstall --all

sudo gem install jekyll --version "=2.5.3"

  sudo gem install bundler

  sudo gem install jekyll-multiple-languages kramdown rouge



versions:
  ruby 2.3.1p112
  jekyll 2.5.3


Re: Kylin and BI Tools

2017-01-18 Thread Alberto Ramón
Hello,
https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain

*Changes*:
- Fixed Carabel to Caravel
- Added Zeppelin Reference
- Added Apache Flink

Thanks for all !!

2017-01-17 9:28 GMT+01:00 Alberto Ramón <a.ramonporto...@gmail.com>:

> Thanks Anton
> I will complete/fix my report with your suggestions.
>
> 2017-01-17 3:56 GMT+01:00 Anton Bubna-Litic <Anton.Bubna-Litic@quantium.
> com.au>:
>
>> I have successfully used Zeppelin’s Kylin interpreter with Kylin 1.6 to
>> run sql queries. It was very straight forward to set up and run commands.
>>
>>
>> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
>> *Sent:* Tuesday, 17 January 2017 01:49
>> *To:* user <user@kylin.apache.org>
>> *Subject:* Re: Kylin and BI Tools
>>
>>
>>
>> Somebody has been tested this with last versions of Kylin?:
>> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>>
>> If this work OK with Kylin 1.6 or 2.0, I can put a reference directly
>>
>>
>>
>> 2017-01-16 15:31 GMT+01:00 Billy Liu <billy...@apache.org>:
>>
>> I have interest on Zeppelin also, please refer to
>> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>> first.
>>
>>
>>
>> 2017-01-16 19:14 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>
>> yes,
>>  - I will fix "Carabel" to "Caravel". (It is a shame that this project
>> is not updated, because the quality of the graphics are very good)
>>
>>  - Document about  Kylin and Zeppelink will be interesting, I have this
>> in my ToDo list
>>
>>  - More suggestions? bugs ?
>>
>>
>>
>> Thanks !!
>>
>>
>>
>> 2017-01-16 9:37 GMT+01:00 Jian Zhong <zhongj...@apache.org>:
>>
>> very good document.
>>
>>
>>
>> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón <a.ramonporto...@gmail.com>
>> wrote:
>>
>> Happy 2017   :)
>>
>> I updated Kylin & BI tools with new notes:
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> 2016-09-28 1:30 GMT+02:00 Li Yang <liy...@apache.org>:
>>
>> Base on the great work, we could create more How-To page to add to Kylin
>> document section.
>>
>> Yang
>>
>>
>>
>> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han <luke...@gmail.com> wrote:
>>
>> Very nice, thanks Alberto
>>
>>
>>
>>
>> Best Regards!
>> -
>>
>> Luke Han
>>
>>
>>
>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>> liuyiming@gmail.com> wrote:
>>
>> So cool, impressive. Thank you, Alberto.
>>
>>
>>
>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>
>> Hello
>>
>> This is the end of all my previous articles, about Kylin and differents
>> tools
>> With some successful and some failures   :)
>>
>>
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> If you have any comment / improvement, feel free to indicate me the
>> changes
>>
>> A lot of thanks to the "Kylin Team", Alb
>>
>>
>>
>>
>>
>> --
>>
>> With Warm regards
>>
>> Yiming Liu (刘一鸣)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>


Re: Problem with limit and joint aggregation

2017-01-17 Thread Alberto Ramón
Joint should be used for:
 - Grouping dimensions with *very* low cardinality. Example: IdCurrency (most
bank transactions use < 10 currencies)
 - Columns with the same cardinality: Country_ID and Country_txt

Check Kylin's TopN feature to precalculate SUM with ORDER BY.
You can also allocate more memory to the Kylin instance (for the ORDER BY process).
Please read the links I shared with you in the other question; they contain some useful
tips and examples.
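To make the effect of a joint concrete, here is a minimal sketch of how the number of cuboids shrinks when a group of dimensions is treated as one. It is a simplified model that ignores mandatory dimensions, hierarchies and multiple aggregation groups; the function name is hypothetical.

```python
from typing import Sequence


def cuboid_count(n_dims: int, joint_group_sizes: Sequence[int] = ()) -> int:
    """Count cuboids when every joint group behaves like a single dimension."""
    grouped = sum(joint_group_sizes)
    if grouped > n_dims:
        raise ValueError("joint groups contain more dimensions than the cube has")
    # Each joint group collapses into one effective dimension.
    effective = n_dims - grouped + len(joint_group_sizes)
    return 2 ** effective


print(cuboid_count(10))       # no joint: 1024 cuboids
print(cuboid_count(10, [3]))  # A, B, C as one joint: 256 cuboids
```

The trade-off is that a query grouping by only part of a joint has to be answered from a cuboid that contains the whole joint, so joints fit best when the columns are always queried together.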

2017-01-17 12:37 GMT+01:00 Phong Pham :

> Hi all,
> I defined some dimensions, for example: A,B,C as a joint aggregation.
> When i executed query:
>
> SELECT A,B,C, SUM(metrics) as metrics
> FROM table1
> WHERE DateStats <= x and DateStats >= x
> GROUP BY A,B,C
> LIMIT 250
>
> The query is very fast, but the Metrics value (from SUM(metrics)) only sums data
> within the limit (250 rows). If I use ORDER BY, the results are correct but
> performance is very bad (if Total Scan Count is over 2-3 million).
> Please explain this problem to me.
>
> Thanks.
>


Re: Kylin and BI Tools

2017-01-17 Thread Alberto Ramón
Thanks Anton
I will complete/fix my report with your suggestions.

2017-01-17 3:56 GMT+01:00 Anton Bubna-Litic <
anton.bubna-li...@quantium.com.au>:

> I have successfully used Zeppelin’s Kylin interpreter with Kylin 1.6 to
> run sql queries. It was very straight forward to set up and run commands.
>
>
> *From:* Alberto Ramón [mailto:a.ramonporto...@gmail.com]
> *Sent:* Tuesday, 17 January 2017 01:49
> *To:* user <user@kylin.apache.org>
> *Subject:* Re: Kylin and BI Tools
>
>
>
> Somebody has been tested this with last versions of Kylin?:
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
>
> If this work OK with Kylin 1.6 or 2.0, I can put a reference directly
>
>
>
> 2017-01-16 15:31 GMT+01:00 Billy Liu <billy...@apache.org>:
>
> I have interest on Zeppelin also, please refer to
> http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/interpreter/kylin.html
> first.
>
>
>
> 2017-01-16 19:14 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
> yes,
>  - I will fix "Carabel" to "Caravel". (It is a shame that this project is
> not updated, because the quality of the graphics are very good)
>
>  - Document about  Kylin and Zeppelink will be interesting, I have this in
> my ToDo list
>
>  - More suggestions? bugs ?
>
>
>
> Thanks !!
>
>
>
> 2017-01-16 9:37 GMT+01:00 Jian Zhong <zhongj...@apache.org>:
>
> very good document.
>
>
>
> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>
>
>
> Thanks
>
>
>
> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
> Happy 2017   :)
>
> I updated Kylin & BI tools with new notes:
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
>
> 2016-09-28 1:30 GMT+02:00 Li Yang <liy...@apache.org>:
>
> Base on the great work, we could create more How-To page to add to Kylin
> document section.
>
> Yang
>
>
>
> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han <luke...@gmail.com> wrote:
>
> Very nice, thanks Alberto
>
>
>
>
> Best Regards!
> -
>
> Luke Han
>
>
>
> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
> liuyiming@gmail.com> wrote:
>
> So cool, impressive. Thank you, Alberto.
>
>
>
> 2016-09-19 21:42 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
> Hello
>
> This is the end of all my previous articles, about Kylin and differents
> tools
> With some successful and some failures   :)
>
>
> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>
>
>
> If you have any comment / improvement, feel free to indicate me the changes
>
> A lot of thanks to the "Kylin Team", Alb
>
>
>
>
>
> --
>
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re:

2017-01-17 Thread Alberto Ramón
Did you compress the output cube? This is very important (see the last link).

About ORDER BY:
  - Check whether TopN can solve your problem:
 http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/
  - Try reordering the RowKey to put the ORDER BY columns in the first positions
  - Try aggregation groups: make a "sub-cube" with fewer dimensions
 http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/
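For intuition about what an approximate TopN measure precomputes, the following is a minimal sketch of the Space-Saving counting technique. It is an illustration only, not Kylin's implementation; the function name and the sample data are made up.

```python
def space_saving_topn(stream, capacity):
    """Approximate per-key sums using at most `capacity` counters.

    `stream` yields (key, value) pairs with non-negative values.
    """
    counters = {}
    for key, value in stream:
        if key in counters:
            counters[key] += value
        elif len(counters) < capacity:
            counters[key] = value
        else:
            # Evict the smallest counter; the new key inherits its count,
            # so a count may overestimate but never underestimates the true sum.
            victim = min(counters, key=counters.get)
            counters[key] = counters.pop(victim) + value
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)


sample = [("a", 10), ("b", 5), ("a", 7), ("c", 1)]
print(space_saving_topn(sample, capacity=2))
```

With enough counters the result is exact; with fewer, the largest keys stay accurate while small keys may be overestimated. Because the ranking is stored at build time, an ORDER BY on the summed measure does not need to scan and sort millions of rows at query time.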

2017-01-17 7:50 GMT+01:00 Phong Pham <phongpham1...@gmail.com>:

> Hi Alberto,
> After trying your suggestions, our queries improved a lot.
> Thanks a lot.
> However, we have a problem with ORDER BY. When we use ORDER BY with
> a large data set (for example, with a long date-range filter), performance is
> very slow.
> Result:
> *User: ADMIN*
> *Success: true*
> *Duration: 23.311*
> *Project: metrixa_global_database_new*
> *Realization Names: [account_global_convtrack_summary_daily_by_location]*
> *Cuboid Ids: [135]*
> *Total scan count: 2595584*
> *Result row count: 250*
> *Accept Partial: true*
> *Is Partial Result: false*
> *Hit Exception Cache: false*
> *Storage cache used: false*
> *Message: null*
>
> ORDER BY performance goes down when Total Scan Count is big. How can I
> improve this?
> Thanks
>
>
> 2017-01-16 18:45 GMT+07:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> Hi Phong, I'm not an expert but I have some suggestions:
>>
>> - All dimensions are using Dict: you can change many of them to Integer (fixed length)
>> - Re-ordering the row key is a good idea. I always try to make the first fields of the
>> key fixed length. Putting the mandatory dimension first is a good idea
>> - See hierarchy optimizations, which will be very interesting for you:
>> Country, Region, City, Site. Perhaps Company and Account can also be
>> included (I don't know your data)
>> - If you use left join, the first step of building the cube (flat table) will
>> be slower
>> - Check whether your ORC input table is compressed
>> - Try to use derived dimensions for very low cardinality columns, perhaps:
>> TypeID, NetworkID, LanguajeID, IsMovileDevice.
>> I understand that Affiliated, Account, Company, ... will grow in
>> the future, because you are working with test data?
>>
>> Check these references:
>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201611.mbox/%3Ctencent_F5A1E061EFFB778CC5BF9909%40qq.com%3E
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201607.mbox/%3C004201d1d4ef%240151b7e0%2403f527a0%24%40fishbowl.com%3E
>> http://mail-archives.apache.org/mod_mbox/kylin-user/201612.mbox/%3CCAEcyM171RGhk0QoXJUjjZJeSxXwgUGu0vO%2B_T71KXMU1k00L%2Bg%40mail.gmail.com%3E
>> Check this tuning example: https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>
>> BR, Alb
>>
>>
>> 2017-01-16 3:47 GMT+01:00 Phong Pham <phongpham1...@gmail.com>:
>>
>>> Hi all,
>>>* We still meet problems with query performance. Here is the cube
>>> info of one cube*:
>>> {
>>>  "uuid": "6b2f4643-72a3-4a51-b9f2-47aa8e1322a5",
>>>  "last_modified": 1484533219336,
>>>  "version": "1.6.0",
>>>  "name": "account_global_convtrack_summary_daily_test",
>>>  "owner": "ADMIN",
>>>  "descriptor": "account_global_convtrack_summary_daily_test",
>>>  "cost": 50,
>>>  "status": "READY",
>>>  "segments": [
>>> {
>>>  "uuid": "85fa970e-6808-47c8-ae35-45d1975bb3bc",
>>>  "name": "2016010100_2016122600",
>>>  "storage_location_identifier": "KYLIN_7E4KIJ3YGX",
>>>  "date_range_start": 145160640,
>>>  "date_range_end": 148271040,
>>>  "source_offset_start": 0,
>>>  "source_offset_end": 0,
>>>  "status": "READY",
>>>  "size_kb": 9758001,
>>>  "input_records": 8109122,
>>>  "input_records_size": 102078756,
>>>  "last_build_time": 1484533219335,
>>>  "last_build_job_id": "a4f67403-17cb-4474-84d1-21ad64ed17a8",
>>>  "create_time_utc": 1484527504660,
>>>  "cuboid_shard_nums": {},
>>>  "total_shards": 4,
>>>  "blackout_cuboids": [],
>>>  "binary_signature": null,
>

Re: Kylin and BI Tools

2017-01-16 Thread Alberto Ramón
Yes,
 - I will fix "Carabel" to "Caravel". (It is a shame that this project is
not updated, because the quality of its graphics is very good.)
 - A document about Kylin and Zeppelin would be interesting; I have this on
my to-do list.
 - Any more suggestions or bugs?

Thanks !!

2017-01-16 9:37 GMT+01:00 Jian Zhong <zhongj...@apache.org>:

> very good document.
>
> I see "Kylin Carabel" section, maybe need to update to "Kylin Caravel"
>
> Thanks
>
> On Sun, Jan 1, 2017 at 6:53 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> Happy 2017   :)
>>
>> I updated Kylin & BI tools with new notes:
>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>
>>
>>
>> 2016-09-28 1:30 GMT+02:00 Li Yang <liy...@apache.org>:
>>
>>> Base on the great work, we could create more How-To page to add to Kylin
>>> document section.
>>>
>>> Yang
>>>
>>> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han <luke...@gmail.com> wrote:
>>>
>>>> Very nice, thanks Alberto
>>>>
>>>>
>>>> Best Regards!
>>>> -
>>>>
>>>> Luke Han
>>>>
>>>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>>>> liuyiming@gmail.com> wrote:
>>>>
>>>>> So cool, impressive. Thank you, Alberto.
>>>>>
>>>>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> This is the end of all my previous articles, about Kylin and
>>>>>> differents tools
>>>>>> With some successful and some failures   :)
>>>>>>
>>>>>>
>>>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>>>
>>>>>>
>>>>>>
>>>>>> If you have any comment / improvement, feel free to indicate me the
>>>>>> changes
>>>>>> A lot of thanks to the "Kylin Team", Alb
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> With Warm regards
>>>>>
>>>>> Yiming Liu (刘一鸣)
>>>>>
>>>>
>>>>
>>>
>>
>


Re: Kylin and BI Tools

2016-12-31 Thread Alberto Ramón
Happy 2017   :)

I updated Kylin & BI tools with new notes:
https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain



2016-09-28 1:30 GMT+02:00 Li Yang <liy...@apache.org>:

> Base on the great work, we could create more How-To page to add to Kylin
> document section.
>
> Yang
>
> On Tue, Sep 20, 2016 at 9:03 AM, Luke Han <luke...@gmail.com> wrote:
>
>> Very nice, thanks Alberto
>>
>>
>> Best Regards!
>> -
>>
>> Luke Han
>>
>> On Mon, Sep 19, 2016 at 10:21 PM, Billy(Yiming) Liu <
>> liuyiming@gmail.com> wrote:
>>
>>> So cool, impressive. Thank you, Alberto.
>>>
>>> 2016-09-19 21:42 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>
>>>> Hello
>>>>
>>>> This is the end of all my previous articles, about Kylin and differents
>>>> tools
>>>> With some successful and some failures   :)
>>>>
>>>>
>>>> https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain
>>>>
>>>>
>>>>
>>>> If you have any comment / improvement, feel free to indicate me the
>>>> changes
>>>> A lot of thanks to the "Kylin Team", Alb
>>>>
>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>


Re: kylin query with case return error result

2016-12-27 Thread Alberto Ramón
See the Calcite syntax.

I think Agg(DISTINCT CASE ...) isn't allowed.
You can try Agg(DISTINCT value) instead.
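For reference, the two queries from the report are expected to return the same number under standard SQL semantics, because COUNT(DISTINCT ...) skips the NULLs produced by a CASE without ELSE. The sketch below checks this with SQLite rather than Kylin, using made-up rows, so it shows the expected result only and says nothing about how Kylin evaluates the first form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fly (loginkey TEXT, pagefiltername TEXT, mmdd TEXT)")
# Hypothetical sample rows, not data from the original report.
conn.executemany(
    "INSERT INTO fly VALUES (?, ?, ?)",
    [
        ("u1", "homepage", "20161222"),
        ("u1", "homepage", "20161222"),
        ("u2", "homepage", "20161222"),
        ("u3", "search", "20161222"),
        ("u4", "homepage", "20161221"),
    ],
)

query1 = """
    SELECT COUNT(DISTINCT CASE WHEN pagefiltername IN ('homepage')
                               THEN t.loginkey END) AS homepageuv
    FROM fly t
    WHERE mmdd = '20161222'
"""
query2 = """
    SELECT COUNT(DISTINCT t.loginkey) AS homepageuv
    FROM fly t
    WHERE mmdd = '20161222' AND pagefiltername IN ('homepage')
"""
result1 = conn.execute(query1).fetchone()[0]
result2 = conn.execute(query2).fetchone()[0]
print(result1, result2)  # 2 2
```

If the first form gives a different number in Kylin, rewriting it in the second form with the condition in the WHERE clause is the safer option.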

2016-12-27 9:41 GMT+01:00 Billy Liu :

> When you talk about mismatch result, you'd better provide the sample data
> and actual result. Otherwise, nobody could reproduce your issue easily.
>
> 2016-12-27 16:13 GMT+08:00 徐 鹏 :
>
>> HI all:
>> Query1:
>> SELECT  COUNT(DISTINCT CASE WHEN pagefiltername IN
>> ('homepage') THEN t.loginkey END) AS homepageuv
>> FROM fly t
>> WHERE mmdd='20161222'
>> Query2:
>> SELECT COUNT(DISTINCT t.loginkey ) AS homepageuv FROM fly
>> t WHERE mmdd='20161222' and pagefiltername IN ('homepage') ;
>>
>> expected :Query1=Query2
>> actual:Query1 !=Query2
>>
>> What’s wrong?
>>
>>
>> Regards,
>> Peng Xu
>> xupeng1...@outlook.com
>>
>>
>>
>>
>>
>>
>>
>
>
>


Re: ArrayIndexOutOfBoundsException: -1

2016-12-26 Thread Alberto Ramón
(Merry Christmas)

I found the error:
 *You can't have a column with the same name (cod_producto) in both the Dim table and the
Fact table*  ==> ERROR: java.lang.ArrayIndexOutOfBoundsException: -1
  (If you don't use this Dim in the cube, there is no problem)
  Should I open a JIRA?


I also discovered:
  In the data model, you can define the same column from the Fact table as both a Dim and
a Measure
  Is this the desired behavior?
  Should I open a JIRA?
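Until this is handled by Kylin itself, a collision of this kind can be spotted before building. The sketch below compares column names case-insensitively; the helper name and the fact-table columns are hypothetical, while the dimension columns are those of DIM_PRODUCTOS from the cube definition.

```python
def shared_columns(fact_columns, dim_columns):
    """Return column names that appear in both tables, ignoring case."""
    fact_upper = {name.upper() for name in fact_columns}
    dim_upper = {name.upper() for name in dim_columns}
    return sorted(fact_upper & dim_upper)


dim_productos = [
    "ID_PRODUCTO", "COD_PRODUCTO", "PRODUCTO_DESC", "CURRENCY",
    "ISIN", "ID_TIPO_PRODUCTO", "TIPO_PRODUCTO_DESC",
]
# Hypothetical fact-table columns, for illustration only.
fact_table = ["id_producto_fk", "cod_producto", "amount"]

print(shared_columns(fact_table, dim_productos))  # ['COD_PRODUCTO']
```

Renaming or aliasing the colliding column in one of the Hive tables (or in a view on top of it) is one way to work around the error.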



2016-12-23 0:44 GMT+01:00 Alberto Ramón <a.ramonporto...@gmail.com>:

>
> Error on, Extract Fact Table Distinct Columns
>
>
>
>
> *   Insane record: [1, 0600-160077, FVP  DAFUTURO - ESTABLE, COP, 11, 11,
> Tipo de producto 11, 16.94579786]
> java.lang.ArrayIndexOutOfBoundsException: -1
>   at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:140)*
>
>
>
> I see an extra column in the record; my DIM has 7 columns:
>
> *Original CSV: 7 columns*
> [image: Inline image 3]
>
>
> *On Hive: 7 columns*
> [image: Inline image 1]
>
>
>
> *On DM: 7 columns*
> [image: Inline image 2]
>
>
>
> *On Cube: 7 columns*
>
>   "dimensions": [
> {
>   "name": "ID_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ID_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "COD_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "COD_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "PRODUCTO_DESC",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "PRODUCTO_DESC",
>   "derived": null
> },
> {
>   "name": "CURRECY",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "CURRENCY",
>   "derived": null
> },
> {
>   "name": "ISIN",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ISIN",
>   "derived": null
> },
> {
>   "name": "ID_TIPO_PRODUCTO",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "ID_TIPO_PRODUCTO",
>   "derived": null
> },
> {
>   "name": "TIPO_PRODUCTO_DESC",
>   "table": "HERR_POSITIONS.DIM_PRODUCTOS",
>   "column": "TIPO_PRODUCTO_DESC",
>   "derived": null
> }
>   ],
>
>


ArrayIndexOutOfBoundsException: -1

2016-12-22 Thread Alberto Ramón
Error in step: Extract Fact Table Distinct Columns




*   Insane record: [1, 0600-160077, FVP  DAFUTURO - ESTABLE, COP, 11,
11, Tipo de producto 11, 16.94579786]
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.kylin.engine.mr.steps.FactDistinctHiveColumnsMapper.map(FactDistinctHiveColumnsMapper.java:140)*



I see an extra column in the record; my DIM has 7 columns:

*Original CSV: 7 columns*
[image: Inline image 3]


*On Hive: 7 columns*
[image: Inline image 1]



*On DM: 7 columns*
[image: Inline image 2]



*On Cube: 7 columns*

  "dimensions": [
    {
      "name": "ID_PRODUCTO",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "ID_PRODUCTO",
      "derived": null
    },
    {
      "name": "COD_PRODUCTO",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "COD_PRODUCTO",
      "derived": null
    },
    {
      "name": "PRODUCTO_DESC",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "PRODUCTO_DESC",
      "derived": null
    },
    {
      "name": "CURRECY",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "CURRENCY",
      "derived": null
    },
    {
      "name": "ISIN",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "ISIN",
      "derived": null
    },
    {
      "name": "ID_TIPO_PRODUCTO",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "ID_TIPO_PRODUCTO",
      "derived": null
    },
    {
      "name": "TIPO_PRODUCTO_DESC",
      "table": "HERR_POSITIONS.DIM_PRODUCTOS",
      "column": "TIPO_PRODUCTO_DESC",
      "derived": null
    }
  ],


Re: Joint and Order in RowKey

2016-12-21 Thread Alberto Ramón
Yes, but my understanding is that if (ID, TXT) form a joint dimension, they
should appear together as one dimension in the drag-and-drop list.

2016-12-21 11:24 GMT+01:00 Li Yang <liy...@apache.org>:

> Maybe I didn't get the question. But the order of rowkey is adjustable by
> drag then move up and down...
>
> On Tue, Dec 20, 2016 at 2:46 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> If we have these columns:
>> [image: Inline image 1]
>>
>> With these joints:
>> [image: Inline image 3]
>>
>> *Why can't I order these columns individually?*  (Text, Id) must now be
>> a tuple
>> [image: Inline image 4]
>>
>> (I accept suggestions about the order; anyo = year)
>>
>
>


Re: if can add where clause to a measure?

2016-12-20 Thread Alberto Ramón
I have never used it, but KYLIN-976
may be useful for you.

2016-12-21 8:14 GMT+01:00 ZhouJie :

> hi, everyone
> i want to know if kylin can filter a column which has been measured, as
> follows:
> select sum(price) from hotprice_copy1 where price > 100.0 and price <5000.0
>
> thanks
> joe
>


Re: How to workaround with columns with NULL value?

2016-12-20 Thread Alberto Ramón
About the first point: in KYLIN-2049 there is a comment from Shaofeng Shi.
2016-12-21 6:32 GMT+01:00 Da Tong :

> Hi, all
>
> I am using kylin 1.6.0. I have met three problem:
>
> 1. in one of my Metrics, some of the values are NULL, when I tried to
> calculate the average of the column, the COUNT function will not filter out
> NULL value, which means the average result is biased. One solution I found
> is using another column to mark whether the value is NULL or not, but there
> are hundreds of columns like this. I don't think adding another hundreds of
> mark column as dimensions is a good way. Any suggestions about this
> situation?
>
> 2. I need to do filter using WHERE clause in some metrics columns, such as
> count rows that having value of one field over 100. It seems that I have to
> add new columns such as A_FIELD_OVER_100 to achieve this. But what if the
> *100* is a variable? User of our system need to filter out result based on
> metrics value, should I add metrics into dimensions? Is this requirement an
> uncommon case?
>
> 3. It seems that querying all-null columns issue is fixed in this issue
>  (Kylin 1527). But I
> still got NullPointerError from RawMesureType.valueOf method. I just want
> to make sure that Kylin support columns with all null values, right?
>
> Any suggestion is welcome. Thank you.
> --
> TONG, Da / 佟达
>


Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-19 Thread Alberto Ramón
Another idea:
Could it be a permissions problem? The user that runs Kylin may not be able to read
data generated by YARN.
Check whether the Kylin user can read your folder  /young/kylin_test/
Which Hadoop user is running Kylin?

(No more ideas, good luck)

2016-12-20 7:51 GMT+01:00 雨日听风 <491245...@qq.com>:

> Thank you!
> We checked the yarn and hard disk. But not found any error. Hard disk
> space and memory and so on is working well.
> Last time its error code was "unknownhost clusterB",now in new server env
> it cant find clusterB(hbase only). but cant find rowCount file.
> ===
> the follow command runs ok:
> hdfs dfs -mkdir /young/kylin_test/kylin_metadata_nokia/
> kylin-678c15ba-5375-4f80-831e-1ae0af8ed576/row_count/tmp
> And "ls" cant find file "00_0"  which it said "file does not exist".
>
> -- Original Message --
> *From:* "Alberto Ramón";<a.ramonporto...@gmail.com>;
> *Sent:* Monday, 19 December 2016, 9:13 PM
> *To:* "user"<user@kylin.apache.org>;
> *Subject:* Re: Error when #2 Step: Redistribute Flat Hive Table - File does
> not exist
>
> I think I had this error last night  :)
> (go to YARN to find the detailed error and search for it on the internet)
> In my case the free space was less than 10% of the hard disk. Please check this.
>
> On 19/12/2016 11:35, "雨日听风" <491245...@qq.com> wrote:
>
>> When I build a cube in kylin1.6, I get error in step #2: Redistribute
>> Flat Hive Table
>>
>> Please help! Thank you very much!
>>
>> env: kylin1.6 is in a independent server, and have 2 other server
>> cluster: clusterA(hive only) and clusterB(hbase only).
>> Error is:
>>
>> 2016-12-19 10:28:00,641 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Compute row count of flat hive table,
>> cmd:
>> 2016-12-19 10:28:00,642 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : hive -e "USE boco;
>> SET dfs.replication=2;
>> SET hive.exec.compress.output=true;
>> SET hive.auto.convert.join.noconditionaltask=true;
>> SET hive.auto.convert.join.noconditionaltask.size=1;
>> SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
>> SET mapreduce.job.split.metainfo.maxsize=-1;
>> SET mapreduce.job.queuename=young;
>> SET tez.queue.name=young;
>>
>> set hive.exec.compress.output=false;
>>
>> set hive.exec.compress.output=false;
>> INSERT OVERWRITE DIRECTORY '/young/kylin_test/kylin_metad
>> ata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count' SELECT
>> count(*) FROM kylin_intermediate_hbase_in_testCluster_CUBE_f9468805_eabf_
>> 4b54_bf2b_182e4c86214a;
>>
>> "
>> 2016-12-19 10:28:03,277 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : WARNING: Use "yarn jar" to launch YARN
>> applications.
>> 2016-12-19 10:28:04,444 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:04,445 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Logging initialized using
>> configuration in file:/etc/hive/conf/hive-log4j.properties
>> 2016-12-19 10:28:14,700 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : OK
>> 2016-12-19 10:28:14,703 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Time taken: 0.935 seconds
>> 2016-12-19 10:28:15,559 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Query ID =
>> young_20161219102814_a7104fd4-ba83-47fc-ac0b-0c9bef4e1969
>> 2016-12-19 10:28:15,560 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Total jobs = 1
>> 2016-12-19 10:28:15,575 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Launching Job 1 out of 1
>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Status: Running (Executing on YARN
>> cluster with App id application_1473415773736_1063281)
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 :
>> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: -/- Reducer 2: 0/1
>> 2016-12-19 10:28:23,307 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>> 2016-12-19 10:28:26,363 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
>> 2016-12-19 10:28:26,567 INFO  [pool-8-thread-7]
>> execution.AbstractExecutable:36 : Map 1: 0(+1)/2 Reducer 2: 0/1

Joint and Order in RowKey

2016-12-19 Thread Alberto Ramón
If we have these columns:
[image: Imágenes integradas 1]

With these joints:
[image: Imágenes integradas 3]

*Why can't I order these columns individually?* (Text, Id) must now be a
tuple
[image: Imágenes integradas 4]

(I welcome suggestions about the order; anyo = year)


Re: Error when #2 Step: Redistribute Flat Hive Table - File does not exist

2016-12-19 Thread Alberto Ramón
I think I had this error last night :)
(Go to YARN to find the detailed error, then search for it on the internet.)
In my case, free space was below 10% of the hard disk. Please check this.

On 19/12/2016 11:35, "雨日听风" <491245...@qq.com> wrote:

> When I build a cube in kylin1.6, I get error in step #2: Redistribute Flat
> Hive Table
>
> Please help! Thank you very much!
>
> env: kylin1.6 is in a independent server, and have 2 other server cluster:
> clusterA(hive only) and clusterB(hbase only).
> Error is:
>
> 2016-12-19 10:28:00,641 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Compute row count of flat hive table,
> cmd:
> 2016-12-19 10:28:00,642 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : hive -e "USE boco;
> SET dfs.replication=2;
> SET hive.exec.compress.output=true;
> SET hive.auto.convert.join.noconditionaltask=true;
> SET hive.auto.convert.join.noconditionaltask.size=1;
> SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
> SET mapreduce.job.split.metainfo.maxsize=-1;
> SET mapreduce.job.queuename=young;
> SET tez.queue.name=young;
>
> set hive.exec.compress.output=false;
>
> set hive.exec.compress.output=false;
> INSERT OVERWRITE DIRECTORY '/young/kylin_test/kylin_
> metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count'
> SELECT count(*) FROM kylin_intermediate_hbase_in_
> testCluster_CUBE_f9468805_eabf_4b54_bf2b_182e4c86214a;
>
> "
> 2016-12-19 10:28:03,277 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : WARNING: Use "yarn jar" to launch YARN
> applications.
> 2016-12-19 10:28:04,444 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:04,445 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Logging initialized using configuration
> in file:/etc/hive/conf/hive-log4j.properties
> 2016-12-19 10:28:14,700 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : OK
> 2016-12-19 10:28:14,703 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Time taken: 0.935 seconds
> 2016-12-19 10:28:15,559 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Query ID =
> young_20161219102814_a7104fd4-ba83-47fc-ac0b-0c9bef4e1969
> 2016-12-19 10:28:15,560 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Total jobs = 1
> 2016-12-19 10:28:15,575 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Launching Job 1 out of 1
> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:22,842 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Status: Running (Executing on YARN
> cluster with App id application_1473415773736_1063281)
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 :
> 2016-12-19 10:28:23,104 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: -/- Reducer 2: 0/1
> 2016-12-19 10:28:23,307 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,363 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,567 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+1)/2 Reducer 2: 0/1
> 2016-12-19 10:28:26,596 INFO  [pool-7-thread-1]
> threadpool.DefaultScheduler:118 : Job Fetcher: 1 should running, 1 actual
> running, 0 ready, 0 already succeed, 3 error, 1 discarded, 0 others
> 2016-12-19 10:28:26,769 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+2)/2 Reducer 2: 0/1
> 2016-12-19 10:28:29,810 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 0(+2)/2 Reducer 2: 0/1
> 2016-12-19 10:28:30,217 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 1(+1)/2 Reducer 2: 0(+1)/1
> 2016-12-19 10:28:30,826 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 2/2 Reducer 2: 0(+1)/1
> 2016-12-19 10:28:31,232 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Map 1: 2/2 Reducer 2: 1/1
> 2016-12-19 10:28:31,319 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Moving data to: /young/kylin_test/kylin_
> metadata_test/kylin-678266c0-ba0e-48b4-bdb5-6e578320375a/row_count
> 2016-12-19 10:28:31,406 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : OK
> 2016-12-19 10:28:31,454 INFO  [pool-8-thread-7]
> execution.AbstractExecutable:36 : Time taken: 16.701 seconds
> 2016-12-19 10:28:35,074 ERROR [pool-8-thread-7]
> execution.AbstractExecutable:357 : job:678266c0-ba0e-48b4-bdb5-6e578320375a-01
> execute finished with exception
> java.io.FileNotFoundException: File does not exist:
> /young/kylin_test/kylin_metadata_test/kylin-678266c0-
> ba0e-48b4-bdb5-6e578320375a/row_count/00_0
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
> INodeFile.java:71)
>  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
> INodeFile.java:61)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> 

Relocate SQL Query ?

2016-12-12 Thread Alberto Ramón
Hello

In my view, the SQL tab would fit better under Data Model than under
Cube.

Cube tabs:

[image: Imágenes integradas 1]

Data Model Tabs:
[image: Imágenes integradas 2]


Re: Cut Size

2016-12-12 Thread Alberto Ramón
"It will do a cap": I don't know what "cap" means here :)

Then what is the function of "kylin.storage.hbase.hfile-size-gb=2"?

2016-12-12 2:58 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> when you have hfile-size-gb, you re-split HFile using max-region-count and
> region-cut-gb ?
>
> --> Yes; Kylin will estimate the total size, then divide by
> "region-cut-gb" to get the region number; If the region number exceeds
> "max-region-count", it will do a cap.
>
> Medium , small, . ..  is deprecated (KYLIN-1669
> <https://issues.apache.org/jira/browse/KYLIN-1669>)?
> --> Yes, that marker has been removed; Will use same split configuration
> for all cubes; If user want to customize, he can overwrite the config
> values at cube level.
>
> 2016-12-08 21:27 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> I'm reading this MailList
>> <http://apache-kylin.74782.x6.nabble.com/Update-default-config-for-sandbox-environment-td6561.html>
>> and have some doubts (Example
>> <https://github.com/apache/kylin/blob/master/examples/test_case_data/sandbox/kylin.properties#L99>
>> ):
>>
>> region-cut-gb
>> max-region-count
>> hfile-size-gb
>>
>> when you have hfile-size-gb, you re-split HFile using max-region-count
>> and region-cut-gb ?? or is for normal ingest, Kylin 1323?
>>
>> Medium , small, . ..  is deprecated (KYLIN-1669
>> <https://issues.apache.org/jira/browse/KYLIN-1669>)? "# E.g, for cube
>> whose capacity be marked as "SMALL", split region per 10GB by default"
>> (From Example)
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
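
The split logic ShaoFeng describes in the quoted reply (estimate the total size, divide by region-cut-gb, cap at max-region-count) can be sketched in a few lines. This is only an illustration, not Kylin's actual code: the function name, the rounding up, and the sample numbers are assumptions; only the divide-then-cap behaviour comes from the thread.

```python
import math


def estimate_region_count(total_size_gb, region_cut_gb, max_region_count):
    """Hypothetical helper: divide the estimated cube size by the region
    cut size, then cap the result at the configured maximum."""
    # Rounding up (and using at least one region) is an assumption;
    # the thread only says "divide".
    regions = max(1, math.ceil(total_size_gb / region_cut_gb))
    # "It will do a cap": never exceed max-region-count.
    return min(regions, max_region_count)


# Sample values are made up for illustration.
print(estimate_region_count(120, 5, 500))   # 24 regions, below the cap
print(estimate_region_count(5000, 5, 500))  # 1000 regions wanted, capped to 500
```

In this reading, "cap" simply means an upper limit: a very large cube still gets at most max-region-count regions.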


Use derived or Joint

2016-12-08 Thread Alberto Ramón
Typical case 1:

IDDate      Month_ID  Month_Txt  DayWeek_ID  DayWeek_Txt  Year
2016-03-01  3         March      2           Wednesday    2016
2016-03-02  3         March      3           Thursday     2016
2016-03-02  3         March      4           Friday       2016

IDDate is PK of Dim table and Unique


SOL 1: Uses Hierarchy and Derived from non PK column


Column       Hierarchy    Type
Month_ID     Hierarchy 2  Normal 1
Month_Txt                 Derived 1
DayWeek_ID   Hierarchy 3  Normal 2
DayWeek_Txt               Derived 2
Year         Hierarchy 1  Normal 3

Hierarchy: Year > Month > Day

The text columns are derived from the IDs (for month and day of week)

PB1: KYLIN-444

PB2: I don't know how to create a derived column from a non-PK column with
the current UI (KYLIN-1313, v1.5.2; KYLIN-1786, v1.5.3)



SOL 2:

Column       Hierarchy    Type
Month_ID     Hierarchy 2  Joint 1
Month_Txt                 Joint 1
DayWeek_ID   Hierarchy 3  Joint 2
DayWeek_Txt               Joint 2
Year         Hierarchy 1  Normal 3


Is SOL 2 the best solution?



Typical case 2:

I see the same scenario many times (derived columns with a 1:1 relation):

Product_ID (PK), Product_TXT, TypeProduct_ID, TypeProduct_TXT, Country_TXT,
Country_ID

Optimizing queries by product / category / country is mandatory.

Perhaps Country (lower cardinality) is a good candidate for a joint.

I don't want to put Product_TXT in a joint, because it is a long text and can
affect the HBase rowkey, but I need queries like ... where product_TXT =
"iRobot Roomba 650 Robotic Vacuum Cleaner"

Suggestions?


Re: Consulting "EXTENDED_COLUMN"

2016-12-02 Thread Alberto Ramón
Yes, I will accept this overhead in the rowkey.

2016-12-02 9:58 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:

> Using Joint Dimension for your 1:1 relation is the right design.
>
> 2016-12-02 0:21 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> Nice Liu
>>
>> We have some cases like
>> DayWeekTXT , DayWeekID
>> MonthTXT, MonthID
>>
>> Small proposal:
>> It would be interesting to be able to create a Derived dimension with a 1:1
>> relation that supports filters and Group by.
>>
>> 2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:
>>
>>> The cost of joint dimension compared with extended column is you have
>>> more columns in the HBase rowkey. It may harm the query performance. But
>>> most time, joint dimension is still recommended, since the normal dimension
>>> column supports much more functions than extended column, such as count(*).
>>>
>>> 2016-12-01 17:07 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>
>>>> Hello
>>>> I was preparing a email with related doubts:
>>>>
>>>> Some times we have derived dimensions with relation 1:1, examples:
>>>> WeekDayID & WeekDayTxt
>>>> MonthID & WeekTxt
>>>>
>>>> SOL1: Derived.  ID as Host and Txt Extended
>>>> PB: You can't filter / Group by Txt
>>>>
>>>> SOL2: Joint. Define tuples of ID & TXT
>>>> Some PB/limitation?  (I need test this option)
>>>>
>>>> 2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:
>>>>
>>>>> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only
>>>>> used for representation, but not filtering or grouping which is  done by
>>>>> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension, it works like a
>>>>> key/value map against the HOST_COLUMN.
>>>>>
>>>>> If the value in EXTENDED_COLUMN is not long, you could just define two
>>>>> dimensions with joint dimension setting, it has almost the same 
>>>>> performance
>>>>> impact with EXTENDED_COLUMN which reduces one dimension, but better
>>>>> understanding.
>>>>>
>>>>> 2016-11-30 19:00 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>>>
>>>>>> This will help you
>>>>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>>>>>
>>>>>> The idea is always, How I can reduce the number of Dimension ?
>>>>>> If you reduce Dim, the time / resources to build the cube and final
>>>>>> size of
>>>>>> it decrease --> Its good
>>>>>>
>>>>>> An example can be DIM_Persons: Id_Person , Name, Surname, Address,
>>>>>> .
>>>>>>Id_Person can be HostColumn
>>>>>> and other columns can be calculated from ID --> are Extended
>>>>>> Column
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2016-11-30 11:35 GMT+01:00 仇同心 <qiutong...@jd.com>:
>>>>>>
>>>>>> > Hi ,all
>>>>>> > I don’t understand the usage scenarios of  EXTENDED_COLUMN,although
>>>>>> I saw
>>>>>> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
>>>>>> > What,s the means about parameters of “Host Column” and “Extended
>>>>>> Column”?
>>>>>> > Why use this expression,and what aspects of optimization that this
>>>>>> > expression solved?
>>>>>> > Can be combined with a SQL statement to explain?
>>>>>> >
>>>>>> >
>>>>>> > Thanks~
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> With Warm regards
>>>>>
>>>>> Yiming Liu (刘一鸣)
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
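
Yiming's description of EXTENDED_COLUMN as "a key/value map against the HOST_COLUMN" can be illustrated with a small sketch. It is not how Kylin stores anything internally; the table contents and the helper name are hypothetical. It only shows the idea from the thread: grouping happens on the host column, and the extended value is looked up afterwards for display.

```python
from collections import defaultdict

# Hypothetical dimension data: host column (ID) -> extended column (text).
WEEKDAY_TXT = {1: "Monday", 2: "Tuesday", 3: "Wednesday"}

# Hypothetical fact rows: (weekday_id, amount).
FACTS = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 4.0)]


def sum_by_host_with_extended(facts, extended):
    """Group by the host column only, then attach the extended value."""
    totals = defaultdict(float)
    for host_id, amount in facts:
        totals[host_id] += amount  # grouping uses the host column
    # The extended column is only looked up for presentation.
    return [(host_id, extended[host_id], total)
            for host_id, total in sorted(totals.items())]


for row in sum_by_host_with_extended(FACTS, WEEKDAY_TXT):
    print(row)
```

Because the text never takes part in the grouping key, filtering or grouping by the text itself is not available in this model, which matches the limitation discussed in the thread.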


Re: corrupt metastore

2016-12-01 Thread Alberto Ramón
Yes,
I had this type of problem; I needed to use
  hdfs fsck
  hbase hbck
and that solved all the problems --> perhaps some data has been lost

The next steps will be:
-  check Kylin's metadata
-  check consistency between the metadata and Kylin's tables


But I don't know if there are tools/commands to do this.
I looked at the metastore.sh script, but I can't find this functionality.



2016-12-02 2:46 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Alberto, It looks like the HBase service is in trouble, please check it
> firstly;
>
> 2016-12-02 8:03 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> I had some problems with corrupt data on HDFS and Meta HDFS
>> Now all services started OK
>>
>> *No query can be executed on any cube*
>> *Error while executing SQL "select part_dt, sum(price) as total_selled,
>> count(distinct seller_id) as sellers from kylin_sales group by part_dt
>> order by part_dt LIMIT 5":
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>> attempts=5, exceptions: Fri Dec 02 07:31:07 GMT+08:00 2016,
>> org.apache.hadoop.hbase.client.RpcRetryingCaller@6cb60fb6,
>> com.google.protobuf.InvalidProtocolBufferException:
>> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag
>> had invalid wire type. at
>> com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
>> at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom*
>>
>>
>> *I tried to rebuild cube, but:*
>>
>>
>>
>>
>> *Could not read JSON: Can not construct instance of long from String
>> value '2000-12-07 06:30:00': not a valid Long value at [Source:
>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>> 21] (through reference chain:
>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]); nested
>> exception is com.fasterxml.jackson.databind.exc.InvalidFormatException: Can
>> not construct instance of long from String value '2000-12-07 06:30:00': not
>> a valid Long value at [Source:
>> org.apache.catalina.connector.CoyoteInputStream@6fcdf2de; line: 1, column:
>> 21] (through reference chain:
>> org.apache.kylin.rest.request.JobBuildRequest["startTime"]*
>>
>> *Any ideas? I'm trying metastore.sh; is there a check tool?*
>> 2016-12-01 16:21:34,162 ERROR [pool-7-thread-1] dao.ExecutableDao:148 :
>> error get all Jobs:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
>> attempts=6, exceptions:
>> Fri Dec 02 05:21:34 GMT+08:00 2016, null, java.net.SocketTimeoutException:
>> callTimeout=6, callDuration=122823: row '/execute/' on table
>> 'kylin_metadata' at region=kylin_metadata,,1477759808710.faab4c9
>> 88f06f17d9e903068db5b3b81., hostname=amb0.mycorp.kom,60020,1480614855596,
>> seqNum=1664
>>
>> at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepl
>> icas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:262)
>> at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.c
>> all(ScannerCallableWithReplicas.java:199)
>>
>> Caused by: java.net.SocketTimeoutException: callTimeout=6,
>> callDuration=122823: row '/execute/' on table 'kylin_metadata' at
>> region=kylin_metadata,,1477759808710.faab4c988f06f17d9e903068db5b3b81.
>>
>> *(re-deploy all isn't a problem, is only for knowledge)*
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
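
The second error in the quoted message says that "startTime" in JobBuildRequest could not be read as a long, which suggests the build request expects a numeric timestamp rather than a date string. A minimal sketch of the conversion follows, assuming the value is meant as epoch milliseconds and the date is in UTC (both are assumptions, and the helper name is hypothetical):

```python
from datetime import datetime, timezone


def to_epoch_millis(text):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM:SS' as UTC and return
    milliseconds since 1970-01-01."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# The date string from the error message above.
print(to_epoch_millis("2000-12-07 06:30:00"))  # 976170600000
```

This does not address the HBase errors, which ShaoFeng suggests checking first; it only covers the date-parsing failure.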


Re: Consulting "EXTENDED_COLUMN"

2016-12-01 Thread Alberto Ramón
Nice Liu

We have some cases like
DayWeekTXT , DayWeekID
MonthTXT, MonthID

Small proposal:
It would be interesting to be able to create a Derived dimension with a 1:1
relation that supports filters and Group by.

2016-12-01 11:55 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:

> The cost of joint dimension compared with extended column is you have more
> columns in the HBase rowkey. It may harm the query performance. But most
> time, joint dimension is still recommended, since the normal dimension
> column supports much more functions than extended column, such as count(*).
>
> 2016-12-01 17:07 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> Hello
>> I was preparing a email with related doubts:
>>
>> Some times we have derived dimensions with relation 1:1, examples:
>> WeekDayID & WeekDayTxt
>> MonthID & WeekTxt
>>
>> SOL1: Derived.  ID as Host and Txt Extended
>> PB: You can't filter / Group by Txt
>>
>> SOL2: Joint. Define tuples of ID & TXT
>> Some PB/limitation?  (I need test this option)
>>
>> 2016-12-01 0:35 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:
>>
>>> Thanks, Alberto. The explanation is accurate. EXTENDED_COLUMN is only
>>> used for representation, but not filtering or grouping which is  done by
>>> HOST_COLUMN. So EXTENDED_COLUMN is not a dimension, it works like a
>>> key/value map against the HOST_COLUMN.
>>>
>>> If the value in EXTENDED_COLUMN is not long, you could just define two
>>> dimensions with joint dimension setting, it has almost the same performance
>>> impact with EXTENDED_COLUMN which reduces one dimension, but better
>>> understanding.
>>>
>>> 2016-11-30 19:00 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>
>>>> This will help you
>>>> http://kylin.apache.org/docs/howto/howto_optimize_cubes.html
>>>>
>>>> The idea is always, How I can reduce the number of Dimension ?
>>>> If you reduce Dim, the time / resources to build the cube and final
>>>> size of
>>>> it decrease --> Its good
>>>>
>>>> An example can be DIM_Persons: Id_Person , Name, Surname, Address, .
>>>>Id_Person can be HostColumn
>>>> and other columns can be calculated from ID --> are Extended Column
>>>>
>>>>
>>>>
>>>>
>>>> 2016-11-30 11:35 GMT+01:00 仇同心 <qiutong...@jd.com>:
>>>>
>>>> > Hi ,all
>>>> > I don’t understand the usage scenarios of  EXTENDED_COLUMN,although I
>>>> saw
>>>> > this article “https://issues.apache.org/jira/browse/KYLIN-1313”.
>>>> > What,s the means about parameters of “Host Column” and “Extended
>>>> Column”?
>>>> > Why use this expression,and what aspects of optimization that this
>>>> > expression solved?
>>>> > Can be combined with a SQL statement to explain?
>>>> >
>>>> >
>>>> > Thanks~
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> With Warm regards
>>>
>>> Yiming Liu (刘一鸣)
>>>
>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>
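
The advice in this thread to reduce the number of dimensions, and to treat ID/TXT pairs as a joint, can be made concrete with a rough count. The sketch below assumes a cube with no other pruning (no hierarchies, mandatory dimensions, or extra aggregation groups), where N independent dimensions give up to 2^N cuboids and each joint group counts as a single unit. The function name is hypothetical.

```python
def cuboid_count(normal_dims, joint_groups=0):
    """Upper bound on cuboids: 2**n, with each joint group counted once."""
    return 2 ** (normal_dims + joint_groups)


# Four separate dimensions: DayWeekID, DayWeekTXT, MonthID, MonthTXT.
print(cuboid_count(4))                  # 16
# Two joints: (DayWeekID, DayWeekTXT) and (MonthID, MonthTXT).
print(cuboid_count(0, joint_groups=2))  # 4
```

The joint keeps both columns usable for filtering and grouping, at the cost of extra rowkey columns mentioned above, while cutting the number of combinations to build.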


Re: User MailList

2016-12-01 Thread Alberto Ramón
Nice!!
It will be very helpful for finding similar problems

2016-12-01 13:31 GMT+01:00 Luke Han <luke...@gmail.com>:

> already working on that
>
> On Thu, Dec 1, 2016 at 5:15 PM +0800, "Alberto Ramón" <
> a.ramonporto...@gmail.com> wrote:
>
> Small Proposal:
>>
>> Dev mailList is in Nabble (more practical than mail-archives.apache.org:
>> You can find by txt, see pictures and more readable)
>>
>> Is it possible make the same with UserList ?
>>
>> (nowadays, a lot of user's doubts are in Dev MailList or in both)
>>
>


Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
The issue occurs if you have a "dirty system" with the previous version
(ODBC 1.5) installed.

Tested on:
- Win7 Ultimate (new system) and Tableau 10 ==> OK
- Win7 Ultimate (new system) and PowerBI 2.4 ==> error when previewing the
Kylin Category table (the other two work OK)
[image: Imágenes integradas 1]


Loading the data fails for all tables:
[image: Imágenes integradas 2]

I attached two logs:
Kylin driver log: no errors found in it
PowerBI log: contains the error

Note: the error is different from the one with ODBC Driver 1.5

2016-11-28 15:54 GMT+01:00 Dong Li <lid...@apache.org>:

> Hello Alberto,
>
> Thanks very much for your feedback.
>
> There're some packaging issues in previous build. We've uploaded a new
> build.
> Please go to the download page and find the latest ODBC Driver 1.6. Thanks.
>
> Thanks,
> Dong Li
>
> 2016-11-28 22:28 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> Nice ¡¡
>> Tell me and I will re-check in my Windows
>>
>> (I tried to install C++ 2013, c++2015 and c++2015 update 4,  With the
>> same negative result)
>>
>> 2016-11-28 15:24 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:
>>
>>> I see; We will check and upload a new build soon. will update here once
>>> finished.
>>>
>>> 2016-11-28 22:17 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>
>>>> x64, all system are 64 bits (win 7 ultimate, and Win Server 2008 R2)
>>>>
>>>> 2016-11-28 14:58 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:
>>>>
>>>>> Hi Alberto, Kylin ODBC zip has two exe files; which one are you
>>>>> installing, the x86 one or x64 one?
>>>>>
>>>>> 2016-11-28 21:51 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>>>
>>>>>> More Info:
>>>>>>
>>>>>> - Same error in Win Srv 2008R2
>>>>>> - I start ODBC config using: C:\Windows\System32\odbcad32.exe  (to
>>>>>> be sure start ODBC 64 bits version)
>>>>>>
>>>>>> 2016-11-28 14:34 GMT+01:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> I'm try to test New Kylin ODBC Driver 1.6,
>>>>>>> When I try to create New ODBC, I have this error
>>>>>>> [image: Imágenes integradas 1]
>>>>>>>
>>>>>>> (I tested In two Win7 SP1, I didn't have problems with 1.5)
>>>>>>>
>>>>>>> The dependencies of Ms Visual C++ are the same than old version?
>>>>>>> (C++ 2012)
>>>>>>>
>>>>>>> Also saw the Version identified hasn't been changed: (but is a minor
>>>>>>> problem)
>>>>>>> [image: Imágenes integradas 2]
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi 史少锋
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
Log start: Mon Nov 28 21:24:11 2016

[INFO ][2016-11-28.21:24:11]SQLFreeStmt called, 496478464 with option 0
[INFO ][2016-11-28.21:24:11]
[INFO ][2016-11-28.21:24:11]start exec the query: 
[INFO ][2016-11-28.21:24:11]select "PART_DT",
"LEAF_CATEG_ID",
"LSTG_SITE_ID",
"LSTG_FORMAT_NAME",
"PRICE",
"SELLER_ID"
from "DEFAULT"."KYLIN_SALES"
[INFO ][2016-11-28.21:24:11]SQLFreeStmt called, 494018288 with option 0
[INFO ][2016-11-28.21:24:11]
[INFO ][2016-11-28.21:24:11]start exec the query: 
[INFO ][2016-11-28.21:24:11]select "USER_DEFINED_FIELD1",
"USER_DEFINED_FIELD3",
"UPD_DATE",
"UPD_USER",
"LEAF_CATEG_ID",
"SITE_ID",
"META_CATEG_NAME",
"CATEG_LVL2_NAME",
"CATEG_LVL3_NAME"
from "DEFAULT"."KYLIN_CATEGORY_GROUPINGS"
Log start: Mon Nov 28 21:24:11 2016

[INFO ][2016-11-28.21:24:11]Successfully done executing the query
[INFO ][2016-11-28.21:24:11]SQLFreeHandle called, Handle Type: 3, Handle: 494018288
[INFO ][2016-11-28.21:24:11]SQLDisconnect called
[INFO ][2016-11-28.21

Re: ODBC 1.6

2016-11-28 Thread Alberto Ramón
x64, all system are 64 bits (win 7 ultimate, and Win Server 2008 R2)

2016-11-28 14:58 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Alberto, Kylin ODBC zip has two exe files; which one are you
> installing, the x86 one or x64 one?
>
> 2016-11-28 21:51 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> More Info:
>>
>> - Same error in Win Srv 2008R2
>> - I start ODBC config using: C:\Windows\System32\odbcad32.exe  (to be
>> sure start ODBC 64 bits version)
>>
>> 2016-11-28 14:34 GMT+01:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>
>>> Hello
>>>
>>> I'm try to test New Kylin ODBC Driver 1.6,
>>> When I try to create New ODBC, I have this error
>>> [image: Imágenes integradas 1]
>>>
>>> (I tested In two Win7 SP1, I didn't have problems with 1.5)
>>>
>>> The dependencies of Ms Visual C++ are the same than old version? (C++
>>> 2012)
>>>
>>> Also saw the Version identified hasn't been changed: (but is a minor
>>> problem)
>>> [image: Imágenes integradas 2]
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


ODBC 1.6

2016-11-28 Thread Alberto Ramón
Hello

I'm try to test New Kylin ODBC Driver 1.6,
When I try to create New ODBC, I have this error
[image: Imágenes integradas 1]

(I tested In two Win7 SP1, I didn't have problems with 1.5)

The dependencies of Ms Visual C++ are the same than old version? (C++ 2012)

Also saw the Version identified hasn't been changed: (but is a minor
problem)
[image: Imágenes integradas 2]


Re: Codec of decimal (10,6)

2016-11-26 Thread Alberto Ramón
@ShaoFeng, you're right.
Queries return the same results in Hive and Kylin.

*I only have a problem with MAX* (it doesn't work, but SUM, MIN and AVG work
OK) (apache-kylin-1.6.0-SNAPSHOT-bin RC1)

SELECT
SUM (FACT_VALORACIONES.VALORACION) as SUM_Valoracion
,MIN (FACT_VALORACIONES.VALORACION) as MIN_Valoracion
--,MAX (FACT_VALORACIONES.VALORACION) as MAX_Valoracion
,AVG (FACT_VALORACIONES.VALORACION) as AVG_Valoracion

FROM HERR_BANK.FACT_VALORACIONES as FACT_VALORACIONES
INNER JOIN HERR_BANK.DIM_FECHAS as DIM_FECHAS
ON FACT_VALORACIONES.IDFECHAVALORACION = DIM_FECHAS.IDFECHA
Group by DIM_FECHAS.ANYO


ERROR:
 *Can't find any realization. Please confirm with providers*. SQL digest:
fact table HERR_BANK.FACT_VALORACIONES,group by
[HERR_BANK.DIM_FECHAS.ANYO],filter on [],with aggregates[FunctionDesc
[expression=SUM, parameter=ParameterDesc [type=column, value=VALORACION,
nextParam=null], returnType=null], FunctionDesc [expression=COUNT,
parameter=ParameterDesc [type=column, value=VALORACION, nextParam=null],
returnType=null], FunctionDesc [expression=MIN, parameter=ParameterDesc
[type=column, value=VALORACION, nextParam=null], returnType=null],
FunctionDesc [expression=MAX, parameter=ParameterDesc [type=column,
value=VALORACION, nextParam=null], returnType=null]].


The result must be:
[image: Imágenes integradas 1]

Measure definition:
[image: Imágenes integradas 2]

2016-11-26 8:20 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Alberto,
>
> Users need to be aware that a cube holds only aggregated data, not raw data.
> At the very beginning Kylin threw an error on queries like "select *"; but to
> provide a better user experience (and to support some BI tools which need
> to load a subset of the data to warm up), Kylin answers such queries from the
> base cuboid (group by all dimensions). The measure column value will be the
> aggregated value, so users cannot directly compare the "select *" result
> from a cube with the source data. If you're comparing aggregated
> queries, I believe the results are exactly the same.
>
>
>
> 2016-11-26 4:39 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> I have a super-Fact Table with 5 rows
>> [image: Imágenes integradas 2]
>>
>>
>> A- Data in CSV == Hive (OK)
>> B- Select * from Fact, in Kylin some values are different
>>
>> The value 9942758, has been  transformed in 10937033.8 !!!
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
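
ShaoFeng's point that "select *" is answered from the base cuboid (group by all dimensions) can be sketched as follows. The rows are made up and are not the data from this thread; the sketch only shows why a measure value returned by the cube can differ from any single raw value when several raw rows share the same dimension values.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (date_id, value).
RAW_ROWS = [("2016-01-01", 100.0), ("2016-01-01", 50.5), ("2016-01-02", 7.0)]


def base_cuboid(rows):
    """Aggregate SUM(value) grouped by every dimension (here only date_id)."""
    totals = defaultdict(float)
    for date_id, value in rows:
        totals[date_id] += value
    return sorted(totals.items())


# Two raw rows for 2016-01-01 collapse into one aggregated row.
print(base_cuboid(RAW_ROWS))
```

Comparing aggregated queries (SUM, MIN, AVG with a GROUP BY) between Hive and Kylin is therefore the meaningful check, as the reply at the top of this message confirms.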


Re: Testing 1.6

2016-11-26 Thread Alberto Ramón
Haha, thanks
I will test it right away

2016-11-26 13:02 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> @Yang, I didn't see the function of "auto refresh" on 1.6.0; also not
> found in JIRA; are you sure it has been implemented?
>
> @Alberto, the upgrade guide for 1.6.0 has been updated in
> https://kylin.apache.org/docs16/howto/howto_upgrade.html , FYI
>
> 2016-11-26 19:36 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> I re-tested the auto-refreshing job progress, and it does not work in my case.
>> I used Firefox and Chromium on Ubuntu 16.04.
>> It isn't important for me, because the refresh button works OK.
>> [image: Imágenes integradas 1]
>>
>> 2016-11-18 4:30 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:
>>
>>> Sure, I will add a section in the "How to upgrade" page for v1.6.0
>>>
>>> 2016-11-18 11:21 GMT+08:00 Li Yang <liy...@apache.org>:
>>>
>>>> The auto refreshing job progress is a new feature in 1.6. Earlier
>>>> version won't auto refresh. Maybe wipe out browser cache and try again?
>>>>
>>>> The 1.6 metadata is compatible with previous version. The upgrade shall
>>>> be pretty straightforward. But you are right. It deserves a document.
>>>>
>>>> @Shaofeng, consider an upgrade guide next time.
>>>>
>>>> Cheers
>>>> Yang
>>>>
>>>> On Wed, Nov 16, 2016 at 3:02 AM, Alberto Ramón <
>>>> a.ramonporto...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> - While a cube is building, the web page does not auto-refresh; was
>>>>> that also the behavior of older versions? (I use Chrome)
>>>>>
>>>>> - There is no documentation about migrating from an older version
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Codec of decimal (10,6)

2016-11-25 Thread Alberto Ramón
I have a super-Fact Table with 5 rows
[image: Imágenes integradas 2]


A- Data in CSV == Hive (OK)
B- Select * from Fact: in Kylin some values are different

The value 9942758 has been transformed into 10937033.8!


Re: Release apache-kylin-1.6.0 (RC2)

2016-11-24 Thread Alberto Ramón
Left: result of *my ./build/script/package.sh (OK)*
Right: apache-kylin-1.6.0-SNAPSHOT-bin

Hmm... the results are not the same.
Do I need to move files manually to generate a bin.tar.gz?

[image: Imágenes integradas 1]


[image: Imágenes integradas 2]

2016-11-24 3:43 GMT+01:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Alberto, thanks for the question; the "/build" folder was excluded by
> the assembly tool by mistake, I think (the name is too common). I created a
> JIRA (KYLIN-2229) for solving this. For now please get the "/build" folder
> from Kylin's git repository. Please note that to build a binary package you
> need to install maven, npm and grunt first.
>
> 2016-11-24 5:39 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> My vote = null (I'm a rookie)
>>
>> 1 - Using: https://dist.apache.org/repos/dist/dev/kylin/apache-kylin-1.6.0-rc2/
>> 2 - mvn clean install -DskipTests  -->  OK on Ubuntu 16.04
>>
>> But how do I create the binary package?
>> http://kylin.apache.org/development/howto_package.html: I don't have a
>> /build/ folder.
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Testing 1.6

2016-11-15 Thread Alberto Ramón
Hi

- While a cube is building, the web page does not auto-refresh; was that
also the behavior of older versions? (I use Chrome)

- There is no documentation about migrating from an older version


IN_THRESHOLD

2016-11-15 Thread Alberto Ramón
About KYLIN-2193:
What is the purpose of
org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD ?
:)
(When is it used?)


Re: Reply: Most Used BI Tools

2016-11-07 Thread Alberto Ramón
Hello

I'm trying to document the integration of Kylin with BI tools (I will add
more info in the coming weeks)
(https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain)

In summary:

- Microsoft PowerBI: (Bug KYLIN-2121) Does not work
- Qlik: Very partial support (not recommended for production)
- Tableau: Some small issues, ready for production
- Hue: Partial support, only works with table output (no graphics), no more
than 1000 records, no autocomplete support
- SQuirreL: (not a BI tool) Works fine
- Flink: (not a BI tool) Works fine (needs v1.2, currently under
development)
- Sisense: Fails
- Kylin Caravel: Fails, and is currently not maintained


Alb


2016-11-07 9:20 GMT+01:00 仇同心 <qiutong...@jd.com>:

> Can you give a detailed introduction of the BI tools?
>
>
>
> *From:* hongbin ma [mailto:mahong...@apache.org]
> *Sent:* November 7, 2016 13:39
> *To:* user.kylin
> *Subject:* Re: Most Used BI Tools
>
>
>
> nice stuff!
>
>
>
> On Mon, Nov 7, 2016 at 1:09 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
> fyi:
>
> https://image-store.slidesharecdn.com/0d3c9706-b8a6-4716-afc1-26beb82c8704-large.png
>
>
>
>
>
> --
>
> Regards,
>
>
> *Bin Mahone | **马洪宾*
>


Most Used BI Tools

2016-11-06 Thread Alberto Ramón
fyi:

https://image-store.slidesharecdn.com/0d3c9706-b8a6-4716-afc1-26beb82c8704-large.png


Re: Kylin Dependencies

2016-11-02 Thread Alberto Ramón
Yes, I have tested (and use) this and other previous versions

BUT the image has:
  -  more than 1000 processes
  -  more than 3 GB

This is OK (very OK) for testing / development / PoC

But for production (Docker recommendations):
  -  Ideally 1 process (5-10 can be acceptable)
  -  < 100 MB (200 - 300 MB can be acceptable)

The target is: create a (minimal) Kylin Docker image *without installing* Hive,
YARN, HDFS, or HBase

2016-11-02 15:52 GMT+01:00 Billy(Yiming) Liu <liuyiming@gmail.com>:

> Here is a quick start for running Kylin on docker,
> https://github.com/kyligence/kylin-docker
>
> From the docker file, you could find the kylin dependencies.
>
> 2016-11-02 22:46 GMT+08:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>
>> With configs ... I can try it (Will be an interesting exercise for
>> me)
>> But libraries, ...
>>These libraries can be static compiled on Kylin?
>> Any Idea / solution about how to solve all dependecies with out
>> install HDFS, Yarn, Hive, HBase in this minimal Linux... ?
>>
>> the idea is make "minimal linux + Kylin" "to docker it"
>> (The result must be few MB, < 150 MB)
>>
>>
>> 2016-11-02 14:20 GMT+01:00 Li Yang <liy...@apache.org>:
>>
>>> Kylin needs Hadoop client library and configs, including hdfs, yarn,
>>> hive, hbase.
>>>
>>> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón <
>>> a.ramonporto...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Target:
>>>>   All Kylin docker are VERY heavy !! (GB and hundred of process) --->
>>>> That Is Good for Develop / testing , but BAD Idea for production
>>>> I'm trying to install Kylin on minimal linux, ideally Alpine or similar
>>>>
>>>> I have:
>>>> -  a clean install of linux (minimal Centos for example) , without
>>>> Hadoop, and and install Kylin from binary
>>>>  - use remote HBase & Hive
>>>>
>>>>
>>>> Which dependencies of Kylin I Will need on my Centos / Alpine?
>>>>
>>>> BR, Alb
>>>>
>>>
>>>
>>
>
>
> --
> With Warm regards
>
> Yiming Liu (刘一鸣)
>


Re: Kylin Dependencies

2016-11-02 Thread Alberto Ramón
With configs ... I can try it (it will be an interesting exercise for me)
But libraries, ...
   Can these libraries be statically compiled into Kylin?
Any idea / solution for how to resolve all dependencies without installing
HDFS, YARN, Hive, HBase on this minimal Linux... ?

The idea is to make "minimal Linux + Kylin" and "dockerize it"
(The result must be a few MB, < 150 MB)


2016-11-02 14:20 GMT+01:00 Li Yang <liy...@apache.org>:

> Kylin needs Hadoop client library and configs, including hdfs, yarn, hive,
> hbase.
>
> On Sun, Oct 30, 2016 at 1:42 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> Hi
>>
>> Target:
>>   All Kylin docker are VERY heavy !! (GB and hundred of process) --->
>> That Is Good for Develop / testing , but BAD Idea for production
>> I'm trying to install Kylin on minimal linux, ideally Alpine or similar
>>
>> I have:
>> -  a clean install of linux (minimal Centos for example) , without Hadoop,
>> and and install Kylin from binary
>>  - use remote HBase & Hive
>>
>>
>> Which dependencies of Kylin I Will need on my Centos / Alpine?
>>
>> BR, Alb
>>
>
>


Re: Kylin Version on UI

2016-11-02 Thread Alberto Ramón
Haha, thanks!!

+1  ;)

2016-11-02 14:22 GMT+01:00 Li Yang <liy...@apache.org>:

> Yeah, there is a JIRA: KYLIN-1850
> <https://issues.apache.org/jira/browse/KYLIN-1850>
>
> Should be easy to do. Perhaps too easy to get attention.  :-)
>
> On Sun, Oct 30, 2016 at 1:53 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> (Honestly, I'm using  old Kylin version, I don't know if somebody do it
>> in new version)
>>
>> Similar to Kylin 2096 <https://issues.apache.org/jira/browse/KYLIN-2096>
>> Can be useful, put "Kylin version" on UI
>> Exists ?
>>
>>
>> <https://issues.apache.org/jira/browse/KYLIN-2096>
>>
>>
>> <https://issues.apache.org/jira/browse/KYLIN-2096>
>>
>>
>>
>


Kylin Version on UI

2016-10-29 Thread Alberto Ramón
(Honestly, I'm using an old Kylin version; I don't know if somebody has
already done this in a newer version)

Similar to KYLIN-2096:
It could be useful to show the "Kylin version" on the UI.
Does this exist?


Kylin Dependencies

2016-10-29 Thread Alberto Ramón
Hi

Target:
  All Kylin Docker images are VERY heavy!! (GBs and hundreds of processes) --->
That is good for development / testing, but a BAD idea for production
I'm trying to install Kylin on a minimal Linux, ideally Alpine or similar

I have:
 - a clean install of Linux (minimal CentOS for example), without Hadoop,
and Kylin installed from the binary
 - use remote HBase & Hive


Which dependencies of Kylin will I need on my CentOS / Alpine?

BR, Alb


Read Apache Kylin from Apache Flink

2016-10-18 Thread Alberto Ramón
Hello

I made a small contribution / manual about:
*"How to Read Apache Kylin Data from Apache Flink with Scala"*




For any suggestions, feel free to contact me

Thanks, Alberto


Re: Question about future Kylin features and roapmap.

2016-10-04 Thread Alberto Ramón
On a true Snowflake schema:
 - It may or may not be a limitation of the BI program; I never tested it on
Tableau
 - Both queries must work, but I don't know if with Snowflake the
optimization "Dynamic Partition Pruning (HIVE-7826)" is translated to HBase
queries (I don't have this knowledge, sorry)

2016-10-04 8:38 GMT+02:00 Ashika Umanga Umagiliya <umanga@gmail.com>:

> Sorry,
> I meant "Snowflake schema" not "Star Schema" . Star Schema is already
> supported.
>
> On Tue, Oct 4, 2016 at 3:24 PM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> 1 - I think, no
>> 2 - There isnt MDX support (Kylin 1525
>> <https://issues.apache.org/jira/browse/KYLIN-1525> & Kylin 776
>> <https://issues.apache.org/jira/browse/KYLIN-776>):For now you can
>> use mondrian <http://community.pentaho.com/projects/mondrian/> see this
>> example
>> <https://translate.google.com/translate?act=url=1=en=UTF8=_t=translate.google.com=auto=en=http://lxw1234.com/archives/2016/05/647.htm>
>> 3 - Why not ¿?
>> 4 -  How to Bi-Tools
>> <http://kylin.apache.org/docs15/gettingstarted/best_practices.html>  BI
>> tools (like Tableau) see tables, but queries are resolved using the Cube
>> (in HBase). The BI tool must interpret what is a fact table and what is a
>> Dim table. Tableau can do it (manual1
>> <https://kylin.apache.org/docs15/tutorial/tableau_91.html>and manual2
>> <https://github.com/albertoRamon/Kylin/tree/master/KylinWithTableau>) if
>> you define a data model. Other programs
>> <https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain> have
>> some integration issues :(
>>
>> 2016-10-04 2:53 GMT+02:00 Ashika Umanga Umagiliya <umanga@gmail.com>:
>>
>>> Greetings ,
>>>
>>> We managed to finish a PoC on using Kylin as an enterprise-level BI
>>> tool. Our primary goal is to streamline KPI management in the
>>> organisation.
>>> However, we came up against the following limitations and want to know
>>> whether they are on your roadmap
>>>
>>> 1) Will there be a Mac ODBC driver available? (We use a couple of BI tools
>>> in different BUs, including Tableau, PowerBI, Excel, Domo, etc., on
>>> different platforms)
>>> 2) Support MDX
>>> 3) Star Schema (I assume the answer for this is No)
>>> 4) Presentation layer for BI tools. (For example, when we connect to a
>>> cube in Tableau, it looks as if it's connected to a table. There's no
>>> grouping of Dimensions and Measures, grouping under folders, etc.)
>>>
>>
>>
>
>
> --
> Umanga
> http://jp.linkedin.com/in/umanga
> http://umanga.ifreepages.com
>


Re: Question about future Kylin features and roapmap.

2016-10-04 Thread Alberto Ramón
1 - I think, no
2 - There isn't MDX support (Kylin 1525 & Kylin 776): for now you can use
mondrian, see this example
3 - Why not?
4 - How to BI tools: BI tools (like Tableau) see tables, but queries are
resolved using the Cube (in HBase). The BI tool must interpret what is a fact
table and what is a dim table. Tableau can do it (manual1 and manual2) if
you define a data model. Other programs have some integration issues :(

2016-10-04 2:53 GMT+02:00 Ashika Umanga Umagiliya :

> Greetings ,
>
> We managed to finish a PoC on using Kylin as an enterprise-level BI tool. Our
> primary goal is to streamline KPI management in the organisation.
> However, we came up against the following limitations and want to know whether
> they are on your roadmap
>
> 1) Will there be a Mac ODBC driver available? (We use a couple of BI tools
> in different BUs, including Tableau, PowerBI, Excel, Domo, etc., on different
> platforms)
> 2) Support MDX
> 3) Star Schema (I assume the answer for this is No)
> 4) Presentation layer for BI tools. (For example, when we connect to a cube
> in Tableau, it looks as if it's connected to a table. There's no grouping of
> Dimensions and Measures, grouping under folders, etc.)
>


Re: support count or count distinct measure in multiple column topN

2016-09-24 Thread Alberto Ramón
KYLIN-1377 

2016-09-24 9:25 GMT+02:00 ShaoFeng Shi :

> It is possible, but with limited resources we may not be able to implement
> it soon. If you'd like to contribute, that would be nice.
>
> 2016-09-19 9:43 GMT+08:00 赵天烁 :
>
>> I have a cube in which I need to compute PV and UV at the same time within
>> each cuboid, which contains ultra-high cardinality. It would be cool if I
>> could create a topN measure to cover both PV and UV, but right now topN only
>> supports ORDER|SUM by column, so if I create a PV topN measure, then UV is
>> restricted because I can create neither a topN count distinct measure
>> nor a normal count distinct in that same cube.
>> I wonder if it is possible to support count, count distinct or other agg
>> methods in multiple-column topN?
>>
>> --
>>
>> 赵天烁
>>
>> Kevin Zhao
>>
>> *zhaotians...@meizu.com *
>>
>>
>>
>> 珠海市魅族科技有限公司
>>
>> MEIZU Technology Co., Ltd.
>>
>> 广东省珠海市科技创新海岸魅族科技楼
>>
>> MEIZU Tech Bldg., Technology & Innovation Coast
>>
>> Zhuhai, 519085, Guangdong, China
>>
>> meizu.com
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>


Re: Error while building cube from stream

2016-09-20 Thread Alberto Ramón
I don't know, but can you check this change? KYLIN-1744 in v1.3


2016-09-20 14:50 GMT+02:00 Tony Lee :

> Hi,
>
> I was building a cube from a stream as the document
> (http://kylin.apache.org/docs15/tutorial/cube_streaming.html) says.
>
> I was using 1.5.3, and I encountered this error. Same error on 1.5.4.
> Everything is fine on 1.5.2.1.
>
> Any idea how to solve this?
>
>
> 2016-09-20 20:31:51,520 INFO  [main KafkaStreamingInput:129]: finish to
> get streaming batch, total message count:30
> 2016-09-20 20:31:51,532 DEBUG [main CubeManager:855]: Reloaded new cube:
> STREAMING_CUBE with reference beingCUBE[name=STREAMING_CUBE] having 1
> segments:KYLIN_2822I1W3CX
> 2016-09-20 20:31:51,536 INFO  [main CubeManager:314]: Updating cube
> instance 'STREAMING_CUBE'
> 2016-09-20 20:31:51,538 WARN  [main StreamingCLI:127]: invalid
> args:streaming start STREAMING_CUBE 147437454_147437460 -start
> 147437454 -end 147437460 -cube STREAMING_CUBE
> 2016-09-20 20:31:51,539 ERROR [main StreamingCLI:103]: error start
> streaming
> java.lang.IllegalStateException: Segments overlap:
> STREAMING_CUBE[FULL_BUILD] and STREAMING_CUBE[FULL_BUILD]
> at org.apache.kylin.cube.CubeValidator.validate(CubeValidator.java:85)
> at org.apache.kylin.cube.CubeManager.updateCubeWithRetry(
> CubeManager.java:358)
> at org.apache.kylin.cube.CubeManager.updateCube(CubeManager.java:301)
> at org.apache.kylin.cube.CubeManager.appendSegment(CubeManager.java:441)
> at org.apache.kylin.engine.streaming.cube.StreamingCubeBuilder.
> createBuildable(StreamingCubeBuilder.java:118)
> at org.apache.kylin.engine.streaming.OneOffStreamingBuilder$1.run(
> OneOffStreamingBuilder.java:76)
> at org.apache.kylin.engine.streaming.cli.StreamingCLI.
> startOneOffCubeStreaming(StreamingCLI.java:123)
> at org.apache.kylin.engine.streaming.cli.StreamingCLI.
> main(StreamingCLI.java:97)
> 2016-09-20 20:31:51,543 INFO  [Thread-0 ConnectionManager$
> HConnectionImplementation:1678]: Closing zookeeper
> sessionid=0x35708fbc2740013
> 2016-09-20 20:31:51,549 INFO  [Thread-0 ZooKeeper:684]: Session:
> 0x35708fbc2740013 closed
> 2016-09-20 20:31:51,549 INFO  [main-EventThread ClientCnxn:512]:
> EventThread shut down
>
>


Kylin and BI Tools

2016-09-19 Thread Alberto Ramón
Hello

This is the conclusion of all my previous articles about Kylin and different
tools,
with some successes and some failures   :)


https://github.com/albertoRamon/Kylin/tree/master/KylinWithMain



If you have any comments / improvements, feel free to point out the changes to me.
Many thanks to the "Kylin Team", Alb


Re: Use Apache HUE with Kylin

2016-09-18 Thread Alberto Ramón
Yes,
they have it on the roadmap that each data source can use a particular
command to get metadata info

But a lot of tools use: *show databases, show tables, ... select * from
[nom_TB]* (which generates an error in Kylin)

If there is any improvement in the HUE integration, I will post it in this thread




2016-09-18 13:29 GMT+02:00 Luke Han <luke...@gmail.com>:

> I see, maybe you should add your comments there to ask help from Hue
> community
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sat, Sep 17, 2016 at 1:20 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> See this: HUE-4011 <https://issues.cloudera.org/browse/HUE-4011>
>>
>> I have similar metadata problems, with other vendors (See my comment1
>> <http://apache-kylin.74782.x6.nabble.com/Query-Metadata-td5700.html> and
>> coment2
>> <http://mail-archives.apache.org/mod_mbox/kylin-user/201609.mbox/%3CCAEcyM14%2BCKz5j7gJEMUVYCtdHFcmuWJKRMQPbisDQhW3G5Mc9Q%40mail.gmail.com%3E>
>> )
>>
>> 2016-09-16 18:00 GMT+02:00 Luke Han <luke...@gmail.com>:
>>
>>> Very nice!
>>>
>>> What's metadata bug?
>>>
>>>
>>> Best Regards!
>>> -
>>>
>>> Luke Han
>>>
>>> On Thu, Sep 8, 2016 at 3:51 PM, Alberto Ramón <a.ramonporto...@gmail.com
>>> > wrote:
>>>
>>>> From Hue MailList Link
>>>> <https://groups.google.com/a/cloudera.org/forum/#%21topic/hue-user/zhPzJth3h3s>
>>>> (Romain Rigaux) :
>>>>
>>>> Great guide!
>>>>
>>>> I updated to refer to it:
>>>> http://gethue.com/sql-editor/
>>>> http://gethue.com/custom-sql-query-editors/
>>>>
>>>> Thanks!
>>>>
>>>> PS: hope the metadata bug gets picked up at some point ;)
>>>>
>>>>
>>>> BR, Alberto Ramón
>>>>
>>>> 2016-08-30 11:09 GMT+02:00 Luke Han <luke...@gmail.com>:
>>>>
>>>>> Thanks Alberto, your contribution will benefit many people:)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Best Regards!
>>>>> -
>>>>>
>>>>> Luke Han
>>>>>
>>>>> On Mon, Aug 29, 2016 at 11:38 PM, Alberto Ramón <
>>>>> a.ramonporto...@gmail.com> wrote:
>>>>>
>>>>>> @Luke Han:  fell free to add where you want
>>>>>> (I think the integration with HUE, will be very interested due to
>>>>>> Notebooks, The people use it a lot in reports of BI or ML, with MarkDown 
>>>>>> )
>>>>>>
>>>>>> About Kylin + Hue:
>>>>>>   - Hue 3.10 & 3.11, have the same issues. But in the next version
>>>>>> ¿3.12? Will solve some bug, and the integration will be better and easier
>>>>>>   (Hue 3.10 is in Cloudera 5.8, and 3.11 have less than 10 days old
>>>>>> ¡¡, --> It's a very new feature)
>>>>>>   - Hue Hue 3228 <https://issues.cloudera.org/browse/HUE-3228>, can
>>>>>> be VERY interested but is only in the road map
>>>>>>
>>>>>> About Kylin + Other programs:
>>>>>>   - I'm testing with other programs, when I have final results, I
>>>>>> will put on GitHub and MailList
>>>>>>
>>>>>> BR, Alberto
>>>>>>
>>>>>> 2016-08-29 14:46 GMT+02:00 Luke Han <luke...@gmail.com>:
>>>>>>
>>>>>>> Hi Alberto,
>>>>>>> Would you mind to share more to create one how to wiki and
>>>>>>> submit patch to doc branch? I think this will help a lot for users who 
>>>>>>> want
>>>>>>> leverage Hue with Kylin.
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Luke
>>>>>>>
>>>>>>>
>>>>>>> Best Regards!
>>>>>>> -
>>>>>>>
>>>>>>> Luke Han
>>>>>>>
>>>>>>> On Sun, Aug 28, 2016 at 12:11 PM, hongbin ma <mahong...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> ​thanks Alberto!​
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Use Apache HUE with Kylin

2016-09-16 Thread Alberto Ramón
See this: HUE-4011 <https://issues.cloudera.org/browse/HUE-4011>

I have similar metadata problems with other vendors (see my comment1
<http://apache-kylin.74782.x6.nabble.com/Query-Metadata-td5700.html> and
comment2
<http://mail-archives.apache.org/mod_mbox/kylin-user/201609.mbox/%3CCAEcyM14%2BCKz5j7gJEMUVYCtdHFcmuWJKRMQPbisDQhW3G5Mc9Q%40mail.gmail.com%3E>
)

2016-09-16 18:00 GMT+02:00 Luke Han <luke...@gmail.com>:

> Very nice!
>
> What's metadata bug?
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Thu, Sep 8, 2016 at 3:51 PM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> From Hue MailList Link
>> <https://groups.google.com/a/cloudera.org/forum/#%21topic/hue-user/zhPzJth3h3s>
>> (Romain Rigaux) :
>>
>> Great guide!
>>
>> I updated to refer to it:
>> http://gethue.com/sql-editor/
>> http://gethue.com/custom-sql-query-editors/
>>
>> Thanks!
>>
>> PS: hope the metadata bug gets picked up at some point ;)
>>
>>
>> BR, Alberto Ramón
>>
>> 2016-08-30 11:09 GMT+02:00 Luke Han <luke...@gmail.com>:
>>
>>> Thanks Alberto, your contribution will benefit many people:)
>>>
>>>
>>>
>>>
>>> Best Regards!
>>> -
>>>
>>> Luke Han
>>>
>>> On Mon, Aug 29, 2016 at 11:38 PM, Alberto Ramón <
>>> a.ramonporto...@gmail.com> wrote:
>>>
>>>> @Luke Han:  fell free to add where you want
>>>> (I think the integration with HUE, will be very interested due to
>>>> Notebooks, The people use it a lot in reports of BI or ML, with MarkDown )
>>>>
>>>> About Kylin + Hue:
>>>>   - Hue 3.10 & 3.11, have the same issues. But in the next version
>>>> ¿3.12? Will solve some bug, and the integration will be better and easier
>>>>   (Hue 3.10 is in Cloudera 5.8, and 3.11 have less than 10 days old ¡¡,
>>>> --> It's a very new feature)
>>>>   - Hue Hue 3228 <https://issues.cloudera.org/browse/HUE-3228>, can be
>>>> VERY interested but is only in the road map
>>>>
>>>> About Kylin + Other programs:
>>>>   - I'm testing with other programs, when I have final results, I will
>>>> put on GitHub and MailList
>>>>
>>>> BR, Alberto
>>>>
>>>> 2016-08-29 14:46 GMT+02:00 Luke Han <luke...@gmail.com>:
>>>>
>>>>> Hi Alberto,
>>>>> Would you mind to share more to create one how to wiki and submit
>>>>> patch to doc branch? I think this will help a lot for users who want
>>>>> leverage Hue with Kylin.
>>>>>
>>>>> Thanks.
>>>>> Luke
>>>>>
>>>>>
>>>>> Best Regards!
>>>>> -
>>>>>
>>>>> Luke Han
>>>>>
>>>>> On Sun, Aug 28, 2016 at 12:11 PM, hongbin ma <mahong...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> ​thanks Alberto!​
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> *Bin Mahone | 马洪宾*
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Kylin ODBC

2016-09-14 Thread Alberto Ramón
The cube is the Kylin example; the fact table has 9,800 rows, and the response
time in the Kylin UI is 0.3 seconds,
and CATEGORY_GROUPINGS has 144 rows

==> I think it isn't a performance problem

*It sounds like a cast problem, an int overflow, or nulls*
But I can't tell from any log which row / column has the problematic value
  :(

2016-09-14 8:08 GMT+02:00 Abhilash L L <abhil...@infoworks.io>:

> Hello Alberto,
>
> When the request is made via the Kylin UI, it adds an 'accept partial'
> flag, which is similar to a 'limit' in sql world.
>
> When the query is being sent over odbc, kylin server is expected to
> send all the records as the output
>
>  Please check the thread titled 'kylin cube performance' for more
> information.
>
> Regards,
> Abhilash
>
> On Wed, Sep 14, 2016 at 10:48 AM, Alberto Ramón <a.ramonporto...@gmail.com
> > wrote:
>
>> I found this
>> From Kylin ODBC LOG:
>>
>> *[INFO ][2016-09-14.06:48:47]select "PART_DT", *
>> *"LEAF_CATEG_ID", *
>> *"LSTG_SITE_ID", *
>> *"LSTG_FORMAT_NAME", *
>> *"PRICE", *
>> *"SELLER_ID" *
>> *from "DEFAULT"."KYLIN_SALES"*
>> *[ERROR][2016-09-14.06:49:53]The REST query request failed, the error
>> message is: Error while executing SQL "select "PART_DT",
>> "LEAF_CATEG_ID",  "LSTG_SITE_ID",      "LSTG_FORMAT_NAME",
>> "PRICE",  "SELLER_ID"  from "DEFAULT"."KYLIN_SALES"": Timeout visiting
>> cube!*
>>
>> This querie *from Microsoft, never finish* or kill my HBase, but from
>> Kylin UI works fine < 0.5 sec
>>
>> 2016-09-14 6:49 GMT+02:00 Alberto Ramón <a.ramonporto...@gmail.com>:
>>
>>> Hello
>>> I tested New Microsoft Power BI with Kylin
>>> [image: inline image 1]
>>> With KYLIN_CAL_DT the connector can read data:
>>> [image: inline image 2]
>>>
>>>
>>> But with KYLIN GROUPING AND SALES:
>>> And I have this error
>>>
>>> Unexpected error: *Value was either too large or too small for an
>>> Int32.*
>>> Details:
>>> Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was
>>> either too large or too small for an Int32. --->
>>> Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either
>>> too large or too small for an Int32. ---> 
>>> Microsoft.Mashup.Evaluator.Interface.ErrorException:
>>> Value was either too large or too small for an Int32. --->
>>> Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either
>>> too large or too small for an Int32. ---> 
>>> Microsoft.Mashup.Evaluator.Interface.ErrorException:
>>> Value was either too large or too small for an Int32. --->
>>> System.OverflowException: Value was either too large or too small for an
>>> Int32. ---> System.OverflowException: Value was either too large or too
>>> small for an Int32.
>>>at System.Convert.ToInt32(Int64 value)
>>>at Microsoft.Mashup.Engine1.Library.Odbc.OdbcDataReaderExtensio
>>> ns.GetNullableInt32(IDataReader reader, Int32 ordinal)
>>>
>>>
>>> Some idea?
>>> In true, I have also problems of connectivity using:Qlik and Sisense
>>> Software
>>>
>>
>>
>


Kylin ODBC

2016-09-13 Thread Alberto Ramón
Hello
I tested the new Microsoft Power BI with Kylin
[image: inline image 1]
With KYLIN_CAL_DT the connector can read data:
[image: inline image 2]


But with KYLIN GROUPING and SALES
I get this error:

Unexpected error: *Value was either too large or too small for an Int32.*
Details:
Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either
too large or too small for an Int32. --->
Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either too
large or too small for an Int32. --->
Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either too
large or too small for an Int32. --->
Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either too
large or too small for an Int32. --->
Microsoft.Mashup.Evaluator.Interface.ErrorException: Value was either too
large or too small for an Int32. ---> System.OverflowException: Value was
either too large or too small for an Int32. ---> System.OverflowException:
Value was either too large or too small for an Int32.
   at System.Convert.ToInt32(Int64 value)
   at Microsoft.Mashup.Engine1.Library.Odbc.OdbcDataReaderExtensions.GetNullableInt32(IDataReader reader, Int32 ordinal)
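
The trace ends in System.Convert.ToInt32(Int64), which throws an OverflowException when a 64-bit value falls outside the signed 32-bit range. Which field triggers it is not known from the trace. As a rough aid for hunting candidates, here is a minimal Python sketch of the same range check; the function names and the sample values are hypothetical and not taken from KYLIN_SALES:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def find_overflows(values):
    """Return the non-null values that an Int64 -> Int32 conversion would reject."""
    return [v for v in values if v is not None and not fits_int32(v)]


# Hypothetical sample values, for illustration only.
sample = [10000001, None, 2147483647, 2147483648, -2147483649]
print(find_overflows(sample))
```

The check could be run over values exported from the table, or over metadata fields such as reported column sizes, since the failing call reads a nullable Int32 from the ODBC reader.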


Any ideas?
In fact, I also have connectivity problems using Qlik and Sisense
software


Re: Use Apache HUE with Kylin

2016-09-08 Thread Alberto Ramón
From Hue MailList Link
<https://groups.google.com/a/cloudera.org/forum/#%21topic/hue-user/zhPzJth3h3s>
(Romain Rigaux) :

Great guide!

I updated to refer to it:
http://gethue.com/sql-editor/
http://gethue.com/custom-sql-query-editors/

Thanks!

PS: hope the metadata bug gets picked up at some point ;)


BR, Alberto Ramón

2016-08-30 11:09 GMT+02:00 Luke Han <luke...@gmail.com>:

> Thanks Alberto, your contribution will benefit many people:)
>
>
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Mon, Aug 29, 2016 at 11:38 PM, Alberto Ramón <a.ramonporto...@gmail.com
> > wrote:
>
>> @Luke Han:  fell free to add where you want
>> (I think the integration with HUE, will be very interested due to
>> Notebooks, The people use it a lot in reports of BI or ML, with MarkDown )
>>
>> About Kylin + Hue:
>>   - Hue 3.10 & 3.11, have the same issues. But in the next version
>> ¿3.12? Will solve some bug, and the integration will be better and easier
>>   (Hue 3.10 is in Cloudera 5.8, and 3.11 have less than 10 days old ¡¡,
>> --> It's a very new feature)
>>   - Hue Hue 3228 <https://issues.cloudera.org/browse/HUE-3228>, can be
>> VERY interested but is only in the road map
>>
>> About Kylin + Other programs:
>>   - I'm testing with other programs, when I have final results, I will
>> put on GitHub and MailList
>>
>> BR, Alberto
>>
>> 2016-08-29 14:46 GMT+02:00 Luke Han <luke...@gmail.com>:
>>
>>> Hi Alberto,
>>> Would you mind to share more to create one how to wiki and submit
>>> patch to doc branch? I think this will help a lot for users who want
>>> leverage Hue with Kylin.
>>>
>>> Thanks.
>>> Luke
>>>
>>>
>>> Best Regards!
>>> -
>>>
>>> Luke Han
>>>
>>> On Sun, Aug 28, 2016 at 12:11 PM, hongbin ma <mahong...@apache.org>
>>> wrote:
>>>
>>>> ​thanks Alberto!​
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> *Bin Mahone | 马洪宾*
>>>>
>>>
>>>
>>
>


Re: Documentation on how data is stored

2016-09-06 Thread Alberto Ramón
I don't have more info about this

But KYLIN-1453 <https://issues.apache.org/jira/browse/KYLIN-1453> (v1.5.2),
sharding, must be a great feature (and it affects the key composition)
 - before: used hash of key
 - now: uses hash of column

In fact, I have too many doubts  :)

2016-09-06 9:44 GMT+02:00 Something Something <mailinglist...@gmail.com>:

> Hmm... that's a good start... but is there more info available somewhere?
> Can you direct me to that PPT? Thanks.
>
> On Tue, Sep 6, 2016 at 12:16 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> I have this picture: (I found this info in a PPT)
>>
>> [image: inline image 1]
>>
>> Remember that you can encode dimensions by dictionary or fixed length
>>
>>
>> 2016-09-06 1:57 GMT+02:00 Something Something <mailinglist...@gmail.com>:
>>
>>> Hello,
>>>
>>> Is there any documentation available on how Kylin stores data on HBase?
>>> For example, I am trying to understand how data is stored on HBase when I
>>> run bin/sample.sh to create the "learn_kylin" project.
>>>
>>> I looked at the HBase table for the Cube. It has 2 column families but I
>>> don't understand what goes where in this table after Cube is built.
>>>
>>> I set up 'remote debugging' to debug the code, but the QueryService code
>>> seems to be off between the binary release
>>> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-HBase1.x-bin.tar.gz)
>>> and the source code
>>> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-src.tar.gz)
>>>
>>> I will keep debugging but if any documentation about "how data is
>>> stored" (UML diagram or something) is available, please share.
>>>
>>> Thanks.
>>>
>>
>>
>


Re: Documentation on how data is stored

2016-09-06 Thread Alberto Ramón
I have this picture: (I found this info in a PPT)

[image: inline image 1]

Remember that you can encode dimensions by dictionary or fixed length


2016-09-06 1:57 GMT+02:00 Something Something :

> Hello,
>
> Is there any documentation available on how Kylin stores data on HBase?
> For example, I am trying to understand how data is stored on HBase when I
> run bin/sample.sh to create the "learn_kylin" project.
>
> I looked at the HBase table for the Cube. It has 2 column families but I
> don't understand what goes where in this table after Cube is built.
>
> I set up 'remote debugging' to debug the code, but the QueryService code
> seems to be off between the binary release
> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-HBase1.x-bin.tar.gz)
> and the source code
> (http://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-1.5.3/apache-kylin-1.5.3-src.tar.gz)
>
> I will keep debugging but if any documentation about "how data is stored"
> (UML diagram or something) is available, please share.
>
> Thanks.
>


Re: Metadata II

2016-09-05 Thread Alberto Ramón
Yes, this is the idea

1. Sisense recognizes the ODBC driver: yes

2. By default it uses queries like this: select * from [default].[KYLIN_CAL_DT]
This returns an error in the Kylin Web UI (you can copy and paste to check)

Now try this: "select * from KYLIN_CAL_DT"; it will work in the Kylin UI

But other combinations won't work:
  select * from [default].[KYLIN_CAL_DT]
  select * from [KYLIN_CAL_DT]
  select * from default.KYLIN_CAL_DT
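
Based only on the observations above (the bare table name works; the bracketed and schema-qualified forms fail), one possible client-side workaround is to rewrite the table reference before the query reaches Kylin. The following Python sketch is an illustration under that assumption, not something Sisense or Kylin provides; the helper name `to_bare_table_ref` is hypothetical, and the regex is deliberately naive (it handles only simple FROM clauses, not joins, aliases with brackets, or quoted strings):

```python
import re

# After the FROM keyword: an optional schema qualifier and a table name,
# each optionally wrapped in [square brackets].
_FROM_REF = re.compile(
    r"(?i)(\bfrom\s+)(?:\[?\w+\]?\s*\.\s*)?\[?(\w+)\]?"
)


def to_bare_table_ref(sql: str) -> str:
    """Rewrite 'from [schema].[table]' style references to 'from table'."""
    # Keep the original FROM keyword (group 1) and only the table name (group 2).
    return _FROM_REF.sub(lambda m: m.group(1) + m.group(2), sql)


for query in (
    "select * from [default].[KYLIN_CAL_DT]",
    "select * from [KYLIN_CAL_DT]",
    "select * from default.KYLIN_CAL_DT",
    "select * from KYLIN_CAL_DT",
):
    print(to_bare_table_ref(query))
```

All four inputs are reduced to the one form reported as working in the Kylin UI.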

BR

2016-09-05 3:24 GMT+02:00 hongbin ma <mahong...@apache.org>:

> ​is sisense using ODBC to connect with Kylin?
> will the query work in kylin's web GUI?​
>
> On Mon, Sep 5, 2016 at 12:45 AM, Alberto Ramón <a.ramonporto...@gmail.com>
> wrote:
>
>> Hello
>>
>> I'm testing Sisense Software <https://www.sisense.com/>, and saw similar
>> integration-problem like with other vendors with "get Metadata"
>>
>> This don't work: (Picture optional)
>> [image: inline image 1]
>>
>>
>> select * from KYLIN_CAL_DT ==> OK
>> select * from [KYLIN_CAL_DT]   ==> Error: Encountered "[" at line 1,
>> column 15. Was expecting one of
>> select * from learn_kylin.KYLIN_CAL_DT  ==> Error:
>> select * from default.KYLIN_CAL_DT ==> Error
>>
>>
>> Also this
>> <http://mail-archives.apache.org/mod_mbox/kylin-dev/201608.mbox/%3CCAEcyM16z2fURyB-VWXXG8dxkX9THAE55meS_%2B3FA%3DMxoc-5a6g%40mail.gmail.com%3E>,
>> can be interesting
>>
>> BR, Alberto
>>
>>
>>
>>
>>
>>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
>


Metadata II

2016-09-04 Thread Alberto Ramón
Hello

I'm testing Sisense software <https://www.sisense.com/>, and saw an
integration problem with "get Metadata" similar to the one with other vendors

This doesn't work: (Picture optional)
[image: inline image 1]


select * from KYLIN_CAL_DT ==> OK
select * from [KYLIN_CAL_DT]   ==> Error: Encountered "[" at line 1, column 15. Was expecting one of
select * from learn_kylin.KYLIN_CAL_DT  ==> Error:
select * from default.KYLIN_CAL_DT ==> Error


This
<http://mail-archives.apache.org/mod_mbox/kylin-dev/201608.mbox/%3CCAEcyM16z2fURyB-VWXXG8dxkX9THAE55meS_%2B3FA%3DMxoc-5a6g%40mail.gmail.com%3E>
may also be interesting

BR, Alberto


Re: Use Apache HUE with Kylin

2016-08-29 Thread Alberto Ramón
@Luke Han:  feel free to add it wherever you want
(I think the integration with HUE will be very interesting because of
Notebooks; people use them a lot in BI or ML reports, with Markdown)

About Kylin + Hue:
  - Hue 3.10 & 3.11 have the same issues. But the next version (3.12?)
will fix some bugs, and the integration will be better and easier
  (Hue 3.10 is in Cloudera 5.8, and 3.11 is less than 10 days old!! -->
it's a very new feature)
  - HUE-3228 could be VERY interesting, but it is only on the roadmap

About Kylin + other programs:
  - I'm testing other programs; when I have final results, I will post them
on GitHub and the mailing list

BR, Alberto

2016-08-29 14:46 GMT+02:00 Luke Han :

> Hi Alberto,
> Would you mind sharing more, to create a how-to wiki page and submit a
> patch to the doc branch? I think this will help a lot for users who want
> to leverage Hue with Kylin.
>
> Thanks.
> Luke
>
>
> Best Regards!
> -
>
> Luke Han
>
> On Sun, Aug 28, 2016 at 12:11 PM, hongbin ma  wrote:
>
>> Thanks, Alberto!
>>
>>
>>
>>
>> --
>> Regards,
>>
>> *Bin Mahone | 马洪宾*
>>
>
>


Use Apache HUE with Kylin

2016-08-27 Thread Alberto Ramón
Hello,

I wrote a short manual on how to query Kylin from Hue:

https://github.com/albertoRamon/Kylin/tree/master/KylinWithHue

For any suggestions or bugs, *feel free to contact me*.

Thanks, Alberto


Re: Re: does kylin support top-N on a count or count distinct measure?

2016-08-09 Thread Alberto Ramón
Hi,

Top-N is useful for a 'Top 10', but it would also be useful to know the sum
of 'the others' (= the sum of everything ranked below the top 10).

Example:
  In a shop, the top 10 sellers sold $1.2M.
  How much did 'the others' sell? Is $1.2M a lot compared with the others?

I know that this is not easy to implement, but if somebody has an idea ...
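
To make the question concrete, here is a small sketch in plain Python with made-up numbers. It is not Kylin code and says nothing about how a Top-N measure is stored; it only shows the arithmetic: 'the others' is the grand total minus the sum of the top N. The function name `top_n_with_others` and the sample data are hypothetical.

```python
from collections import Counter


def top_n_with_others(sales, n):
    """Return (top-n list of (seller, total), total of all remaining sellers)."""
    totals = Counter()
    for seller, amount in sales:
        totals[seller] += amount
    top = totals.most_common(n)  # sorted by total, descending
    others = sum(totals.values()) - sum(amount for _, amount in top)
    return top, others


if __name__ == "__main__":
    # Made-up sample data: (seller, amount sold)
    sales = [("a", 5), ("b", 3), ("c", 1), ("a", 2), ("d", 1)]
    top, others = top_n_with_others(sales, 2)
    print(top, others)
```

Note that this needs the grand total, so in a cube it would mean keeping a plain SUM measure next to the Top-N measure.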

2016-08-09 6:19 GMT+02:00 ShaoFeng Shi <shaofeng...@apache.org>:

> Hi Tiansheng,
>
> The less post-aggregation, the better the query performance. So for a
> specific query, if the "single group-by column topN" needs further
> aggregation to get the final result but the "multiple group-by column
> topN" doesn't, then the latter would have better performance.
>
> I haven't compared them; this is just my two cents. You are welcome to
> run a benchmark and share it with the community :-)
>
> 2016-08-09 11:54 GMT+08:00 张天生 <zhtsh.lic...@gmail.com>:
>
>> I have a question: does a multiple-column group-by perform better than a
>> single-column group-by in a topN measure? As far as I know, both can
>> aggregate over the other dimensions.
>> Is there any performance optimization for multiple-column group-by in a
>> topN measure?
>>
>> On Mon, Aug 8, 2016 at 6:20 PM, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>
>>> Alberto is correct; SUM(1) and multiple columns are implemented in Kylin
>>> core, but you can't define them from the UI; you need to edit the
>>> metadata manually for that.
>>>
>>> 2016-08-08 18:02 GMT+08:00 赵天烁 <zhaotians...@meizu.com>:
>>>
>>>> ok,I'll have a try
>>>>
>>>> --
>>>>
>>>> 赵天烁
>>>>
>>>> Kevin Zhao
>>>>
>>>> *zhaotians...@meizu.com <zhaotians...@meizu.com>*
>>>>
>>>>
>>>>
>>>> 珠海市魅族科技有限公司
>>>>
>>>> MEIZU Technology Co., Ltd.
>>>>
>>>> 广东省珠海市科技创新海岸魅族科技楼
>>>>
>>>> MEIZU Tech Bldg., Technology & Innovation Coast
>>>>
>>>> Zhuhai, 519085, Guangdong, China
>>>>
>>>> meizu.com
>>>>
>>>>
>>>> *From:* Alberto Ramón <a.ramonporto...@gmail.com>
>>>> *Date:* 2016-08-08 17:59
>>>> *To:* user@kylin.apache.org
>>>> *CC:* ShaoFeng Shi <shaofeng...@apache.org>
>>>> *Subject:* Re: Re: does kylin support top-N on a count or count
>>>> distinct measure?
>>>> In theory, in v1.5.3 you can group by 'n' columns:
>>>> https://issues.apache.org/jira/browse/KYLIN-1693
>>>>
>>>> I haven't tested 1.5.3 yet, and I don't know whether it has been
>>>> implemented in the Kylin UI; perhaps you can add these columns to the
>>>> JSON manually  :)
>>>>
>>>> BR, Alberto
>>>>
>>>> 2016-08-08 11:37 GMT+02:00 赵天烁 <zhaotians...@meizu.com>:
>>>>
>>>>> SUM(1)? You mean just leave the ORDER|SUM by Column empty? Then another
>>>>> problem is that I can't configure more than one group-by column for it;
>>>>> how can I work around that?
>>>>>
>>>>> --
>>>>>
>>>>> 赵天烁
>>>>>
>>>>> Kevin Zhao
>>>>>
>>>>> *zhaotians...@meizu.com <zhaotians...@meizu.com>*
>>>>>
>>>>>
>>>>>
>>>>> 珠海市魅族科技有限公司
>>>>>
>>>>> MEIZU Technology Co., Ltd.
>>>>>
>>>>> 广东省珠海市科技创新海岸魅族科技楼
>>>>>
>>>>> MEIZU Tech Bldg., Technology & Innovation Coast
>>>>>
>>>>> Zhuhai, 519085, Guangdong, China
>>>>>
>>>>> meizu.com
>>>>>
>>>>>
>>>>> *From:* ShaoFeng Shi <shaofeng...@apache.org>
>>>>> *Date:* 2016-08-08 11:32
>>>>> *To:* user <user@kylin.apache.org>
>>>>> *Subject:* Re: does kylin support top-N on a count or count distinct
>>>>> measure?
>>>>> For sorting on count, you can use SUM(1) as the expression;
>>>>>
>>>>> For sorting on other measure, it is on roadmap:
>>>>> https://issues.apache.org/jira/browse/KYLIN-1377
>>>>>
>>>>> We welcome the community to contribute on such enhancements, anyone
>>>>> want to have a try?
>>>>>
>>>>> 2016-08-05 15:24 GMT+08:00 赵天烁 <zhaotians...@meizu.com>:
>>>>>
>>>>>> Right now a top-N measure needs a sum column to be specified;
>>>>>> does Kylin support top-N on a count or count distinct measure?
>>>>>>
>>>>>> --
>>>>>>
>>>>>> 赵天烁
>>>>>>
>>>>>> Kevin Zhao
>>>>>>
>>>>>> *zhaotians...@meizu.com <zhaotians...@meizu.com>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> 珠海市魅族科技有限公司
>>>>>>
>>>>>> MEIZU Technology Co., Ltd.
>>>>>>
>>>>>> 广东省珠海市科技创新海岸魅族科技楼
>>>>>>
>>>>>> MEIZU Tech Bldg., Technology & Innovation Coast
>>>>>>
>>>>>> Zhuhai, 519085, Guangdong, China
>>>>>>
>>>>>> meizu.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>> Shaofeng Shi
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi
>>>
>>>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
>


Re: Kylin Deploy in cluster mode failed!

2016-08-06 Thread Alberto Ramón
Hi

Q1: The docs say "Notice that only one server can run the job engine (“all”
mode or “job” mode)".
--> How many 'all' servers can you have?
Q2: What is the difference between Query + Engine and Engine alone?
Q3: Is this load-balanced automatically?

BR, Alberto


2016-08-03 4:14 GMT+02:00 张天生 :

> Hi ShaoFeng:
>
> Yes, when i deployed all nodes as "all" mode, it worked fine.
>
> On Tue, Aug 2, 2016 at 10:54 PM, ShaoFeng Shi wrote:
>
>> Hi Tiansheng,
>>
>> Thanks for reporting this. In 1.5.3 there was some code refactoring, but
>> the cluster deployment method didn't change: only one node should be
>> "all" and the others "query". The error message "scheduler has not been
>> started" on the query node is a defect (I will create a JIRA for it), but
>> it doesn't affect functionality.
>>
>> For the second case, making all nodes "all", there should be only 1 node
>> running the job engine, with the others saying "fail to acquire lock,
>> scheduler has not been started"; is that true in your case? ZooKeeper is
>> used as a lock to prevent two job engines from being started.
>>
>> 2016-08-02 20:43 GMT+08:00 张天生 :
>>
>>> Btw, i deployed the newest version kylin 1.5.3 version.
>>>
>>> On Tue, Aug 2, 2016 at 8:41 PM, 张天生 wrote:
>>>
>>>> I deployed Kylin in cluster mode, with 1 "all" server and 2 "query"
>>>> servers. When I started a query server, the following error log
>>>> appeared:
>>>>
>>>> 2016-08-02 20:26:55,855 INFO  [Thread-11] threadpool.DefaultScheduler:178 : server mode: query, no need to run job scheduler
>>>> 2016-08-02 20:26:55,856 ERROR [Thread-11] controller.JobController:91 : scheduler has not been started
>>>> 2016-08-02 20:26:56,856 ERROR [Thread-11] controller.JobController:91 : scheduler has not been started
>>>> 2016-08-02 20:26:57,856 ERROR [Thread-11] controller.JobController:91 : scheduler has not been started
>>>> 2016-08-02 20:26:58,856 ERROR [Thread-11] controller.JobController:91 : scheduler has not been started
>>>> 2016-08-02 20:26:59,857 ERROR [Thread-11] controller.JobController:91 : scheduler has not been started
>>>>
>>>> However, when I deployed all 3 servers with kylin.server.mode set to
>>>> "all", it worked fine. But the official documentation says that only one
>>>> server can run the job engine (“all” mode or “job” mode) and the others
>>>> must all be in “query” mode. I don't know why; can someone tell me the
>>>> reason?


>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi
>>
>>


Re: Kylin auto refresh is not working

2016-07-28 Thread Alberto Ramón
Hi

Interesting:
https://issues.apache.org/jira/browse/KYLIN-1892

Thanks, Alberto

2016-07-28 8:28 GMT+02:00 ShaoFeng Shi :

> Yes, you need to trigger the build from outside. The simplest way is to
> use Linux cron + a curl command; as that is quite easy to achieve, Kylin
> didn't reinvent the wheel.
>
> This page describes the refresh settings; please check it if you
> haven't:
> https://kylin.apache.org/docs15/tutorial/create_cube.html
> Please come back if it doesn't answer your question.
>
> 2016-07-25 13:03 GMT+08:00 Karthigeyan K :
>
>> Thanks for the reply, ShaoFeng. So do we always need to trigger the cube
>> refresh externally, or is there a way for Kylin to do it automatically?
>> I am confused by the cube refresh settings. Could you enlighten me
>> about them?
>>
>> On Fri, Jul 22, 2016 at 4:08 PM, ShaoFeng Shi 
>> wrote:
>>
>>> Kylin exposes RESTful APIs for user to trigger the cube
>>> build/merge/refresh; you can easily integrate it with your scheduling tools
>>> or workflow system, please check
>>> https://kylin.apache.org/docs15/howto/howto_use_restapi.html#build-cube
>>>
>>>
>>>
>>> 2016-07-22 14:31 GMT+08:00 Karthigeyan K :
>>>
>>>> Hi,
>>>> I am trying to use incremental cube building.
>>>> I have a Hive table partitioned by date. I created and built a Kylin
>>>> cube for it.
>>>> I attached an image of my cube refresh settings. I am trying to
>>>> auto merge new data records every 30 minutes.
>>>> I added a new partition (date=2016-07-21) to the Hive table with 1 more
>>>> records. When I do a count(*) in Hive, the added rows are reflected in
>>>> the result.
>>>> But the Kylin cube is not updated even after many hours.
>>>> When I manually click Refresh on the cube, it updates, but I would like
>>>> this to happen automatically.
>>>> I don't know what I am missing here; your kind help is appreciated.
>>>>
>>>>
>>>> [image: Inline image 2]
>>>>
>>>> Thanks,
>>>> Karthigeyan.


>>>
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi
>>>
>>>
>>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
>
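
The "trigger the build from outside" approach described in this thread can also be sketched in Python instead of curl. The sketch only builds the HTTP request and does not send it. The endpoint path (`/kylin/api/cubes/<cube>/rebuild`), the body fields (`startTime`, `endTime`, `buildType`) and the use of basic authentication are assumptions based on the REST API page linked in the thread; please check them against your Kylin version. The host, port, cube name and credentials are placeholders, and `build_rebuild_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request


def build_rebuild_request(base_url, cube, start_ms, end_ms, user, password,
                          build_type="BUILD"):
    """Build (but do not send) a PUT request asking Kylin to build a segment.

    start_ms / end_ms are epoch milliseconds bounding the new segment.
    The path and body field names are assumptions; verify them in the docs.
    """
    body = json.dumps({
        "startTime": start_ms,
        "endTime": end_ms,
        "buildType": build_type,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/kylin/api/cubes/{cube}/rebuild",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json;charset=UTF-8",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own server, cube and account.
    req = build_rebuild_request("http://localhost:7070", "my_cube",
                                0, 86400000, "user", "secret")
    print(req.get_method(), req.full_url)
```

Sending it is a call to `urllib.request.urlopen(req)`; a cron entry can then run the script on whatever schedule you need.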


Save querie issue

2016-06-23 Thread Alberto Ramón
Hello

I'm testing Kylin 1.5.2.1: Insight > Save Query isn't working (it does
nothing, no errors).
Can somebody check this issue on their system?


BR. Alberto