Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-28 Thread Nam Đỗ Duy
Thank you very much Xiaoxiang

I will try your suggestion soon.

My presentation went quite well, and we decided to use Kylin in a dev
environment before using it in production.

Please continue to help us master it.

Thank you again


Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-28 Thread Xiaoxiang Yu
Sorry for my incorrect answers before. Let me make it right.

Today I tried again and reproduced the issue you reported.
The Kylin query engine may not read new files because old metadata is
cached and not invalidated.
It is a known issue with a proper solution: call a REST API to refresh
the metadata cache:
https://kylin.apache.org/5.0/docs/restapi/query_api#Refresh-cached-data

Here is a sample call on my side:
curl -X PUT --user ADMIN:KYLIN \
  -H "Content-Type: application/json;charset=utf-8" \
  -d '{ "tables": ["DATABASE_NAME.TABLE_NAME"]}' \
  http://localhost:7070/kylin/api/tables/single_catalog_cache

It is caused by a Spark feature (introduced in 3.1.0) that caches
HDFS file lists in the Spark driver
(https://spark.apache.org/docs/latest/sql-ref-syntax-aux-cache-refresh-table.html).
Its configuration entry is spark.sql.metadataCacheTTLSeconds.
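
If the refresh has to run after every hourly load, the sample call can be
wrapped in a small shell function. This is a minimal sketch, not an official
Kylin script: the function names and the KYLIN_URL / KYLIN_AUTH variables are
assumptions made for illustration, and the endpoint and default credentials
are simply the ones from the sample call.

```shell
#!/usr/bin/env bash
# Sketch only. KYLIN_URL and KYLIN_AUTH are assumed variable names; their
# defaults mirror the sample curl call and should be overridden elsewhere.
KYLIN_URL="${KYLIN_URL:-http://localhost:7070}"
KYLIN_AUTH="${KYLIN_AUTH:-ADMIN:KYLIN}"

# Build the JSON body { "tables": ["DB.TABLE", ...] } from the arguments.
catalog_cache_payload() {
  local body="" t
  for t in "$@"; do
    body="${body:+$body, }\"$t\""
  done
  printf '{ "tables": [%s] }' "$body"
}

# Send the PUT request that asks Kylin to refresh the cached metadata
# for the given tables (each given as DATABASE_NAME.TABLE_NAME).
refresh_catalog_cache() {
  curl -sS -X PUT --user "$KYLIN_AUTH" \
    -H "Content-Type: application/json;charset=utf-8" \
    -d "$(catalog_cache_payload "$@")" \
    "$KYLIN_URL/kylin/api/tables/single_catalog_cache"
}
```

Calling `refresh_catalog_cache MY_DB.MY_TABLE` (a hypothetical table name)
right after the load job finishes should let the next push-down query see
the new files.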


With warm regard
Xiaoxiang Yu




Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-22 Thread Xiaoxiang Yu
It is a good question, I can share some articles with you.

1. How to build a metric repository by Kylin to share among data teams (DA,
DS, AI), is that the usage of measure in Kylin?

I think a metric repository (or metrics store) is exactly what Kylin can
help with. For example, Beike (ke.com) created an indicator/metrics platform
whose backend is Kylin. They built a metrics store on top of Kylin.

The architecture looks like this:
https://mmbiz.qpic.cn/mmbiz_png/9xAoGyC249Kd9icMaNT1Gs7AlDAZic7PScYNCOkSQF8PqbuSLicoxhdk4w3kJtC0bms4FzW6iby08bNiaVsUzUkBPmg/640?wx_fmt=png=5_lazy=1_co=1


Here is a technical article about it, written in Chinese (I am sorry it
is not translated):
 https://mp.weixin.qq.com/s/hsGjuaYfEfParcgTimBLnw


2. How to use Kylin for the Customer segmentation of Marketing dept?

Here are some articles (sorry again that these are not translated):
https://kylin.apache.org/blog/2016/11/28/intersect-count/
https://zhuanlan.zhihu.com/p/100131550
https://cn.kyligence.io/blog/kylin-chinagreentown-user-portrait-2/
https://cn.kyligence.io/blog/apache-kylin-count-distinct-application-in-user-behavior-analysis/
https://www.infoq.cn/article/xZYe1DUopNA9CzLwau3O

You can send your presentation material to me if you are willing to share.


With warm regard
Xiaoxiang Yu




Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-22 Thread Nam Đỗ Duy
Thank you Xiaoxiang. My presentation about Kylin to the management is
tomorrow at noon, so I am putting this issue on hold to focus on the
following questions. Can you please advise:

1. How to build a metric repository by Kylin to share among data teams (DA,
DS, AI), is that the usage of measure in Kylin?
2. How to use Kylin for the Customer segmentation of Marketing dept?



Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-21 Thread Xiaoxiang Yu
Before you try again, you can use spark-sql/spark-shell to check whether the
data was loaded into your table successfully (or whether your data was copied
to the right place).
The following shows how to start spark-shell in a container:

export HADOOP_CONF_DIR=/opt/hadoop-3.2.1/etc/hadoop

cd /home/kylin/apache-kylin-5.0.0-beta-bin/spark

bin/spark-shell --executor-cores 1 --num-executors 1 --master yarn


The result from spark-sql/spark-shell should be the same as what you
saw on the Kylin Insight page. If the same query returns different results,
which should not happen, please let me know.
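
The same check can also be run non-interactively with spark-sql. This is a
minimal sketch, assuming the bundled Spark directory contains bin/spark-sql
and using the paths from this message as defaults; count_query and
spark_count are illustrative helper names, not part of Kylin or Spark.

```shell
#!/usr/bin/env bash
# Defaults are the paths from this message; override them for other setups.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/opt/hadoop-3.2.1/etc/hadoop}"
SPARK_HOME="${SPARK_HOME:-/home/kylin/apache-kylin-5.0.0-beta-bin/spark}"
export HADOOP_CONF_DIR

# Build the row-count query for a table given as DATABASE.TABLE.
count_query() {
  printf 'select count(*) from %s' "$1"
}

# Run the count through spark-sql; -e executes one statement and exits.
spark_count() {
  "$SPARK_HOME/bin/spark-sql" --master yarn \
    --executor-cores 1 --num-executors 1 \
    -e "$(count_query "$1")"
}
```

For example, `spark_count SSB.DATES` prints the row count that Spark itself
sees, which can then be compared with the number shown on the Insight page.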

Hope you can fix your problem soon.


With warm regard
Xiaoxiang Yu




Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-21 Thread Nam Đỗ Duy
Thank you Xiaoxiang, I tried it on my side and it worked for the ssb database
but it didn't work for my own database.

It only works if I restart Kylin, so I guess some configuration is missing
on my end.

Thank you very much anyway and will update next time.

Have a good day.

On Fri, Nov 17, 2023 at 5:34 PM Xiaoxiang Yu  wrote:

> I did an easy test to verify if kylin has any bugs for the push down
> function. And the push
> down function works as expected without any mistakes. So I'm 99% certain
> that
> your step "I loaded the incremental data into Hive already" does not work.
>
> Here are my steps(you can reproduce in a fresh Kylin5 docker container in
> one minute) :
>
> 1. Query `select count(*) from SSB.DATES` in project ssb without building
> any index.
> Query result(Answered By: HIVE) is :   2556
>
> 2. Duplicate the file of table `ssb.dates` by following command:
> hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv
> /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv
>
> 3. Re-query `select count(*) from SSB.DATES` in project ssb
> Query result(Answered By: HIVE) is :  5112
>
> So, it is clear that the second query incremental data can be found by the
> Kylin query engine.
>
> Finally, to make good use of Kylin in real use cases, good knowledge of
> Apache Spark
> and Apache Hadoop is a must-to-have.
>
> 
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Fri, Nov 17, 2023 at 5:52 PM Nam Đỗ Duy wrote:
>
> > Have a nice weekend Xiaoxiang, and thank you for helping me to become a
> > Kylin fan
> >
> > You are right, I am not familiar enough with Kylin and have little
> > background in the Hadoop system, so I will double check carefully
> > before future questions. However I did understand the following mechanism
> > in quotes.
> >
> > quoted
> >
> > If incremental data is not loaded into Kylin, Kylin can still answer such
> > queries by reading the original hive table, but the query is not
> > accelerated.
> >
> > If incremental data is loaded into Kylin, Kylin can answer queries by
> > reading the special Index/Cuboid files, and the query will be accelerated.
> >
> > end
> >
> > To explain my previous question, my steps were as follows:
> >
> > 1. I turned off this configuration kylin.query.cache-enabled (set = false)
> > 2. Restart Kylin
> > 3. I loaded the incremental data into Hive already
> > 4. Turn on Pushdown option to query Hive not model
> > 5. In Kylin Insights window, I still cannot get the incremental data
> > (which has been in Hive already)
> >
> > That was the reason why I asked you: can I get the incremental result with
> > the above 5 steps (without model and index), or do I need to create a
> > model, index and segment, and then get the incremental result by creating
> > a new segment according to the incremental data?
> >
> > Hope you get my point or I will explain more
> >
> > Thank you very much again
> >
> >
> > On Fri, 17 Nov 2023 at 16:00 Xiaoxiang Yu wrote:
> >
> > > Unfortunately, I guess you are not asking good questions.
> > > If the answer to a question can be found on the Internet,
> > > it is not recommended to ask it on the mailing list. I guess you
> > > didn't know how Kylin works, so you need to search for documents
> > > or some tutorials.
> > >
> > > What does 'get the incremental data from Hive into Kylin' mean? Kylin
> > > fully relies on Apache Spark for execution.
> > >
> > > If incremental data is not loaded into Kylin, Kylin can still answer
> > > such queries by reading the original hive table, but the query is not
> > > accelerated.
> > >
> > > If incremental data is loaded into Kylin, Kylin can answer queries by
> > > reading the special Index/Cuboid files, and the query will be
> > > accelerated.
> > >
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > > On Fri, Nov 17, 2023 at 4:36 PM Nam Đỗ Duy wrote:
> > >
> > > > Hi Xiaoxiang,
> > > >
> > > > Do I really need to create a model in order to get the incremental
> > > > data from Hive into Kylin?
> > > >
> > > > Can I query the incremental data of a pure dim/fact table without a
> > > > model?
> > > >
> > > > Thank you very much
> > > >
> > > > On Fri, Nov 17, 2023 at 9:05 AM Xiaoxiang Yu wrote:
> > > >
> > > > > I am not really sure, but I think it is the query cache that keeps
> > > > > your query result unchanged.
> > > > >
> > > > > The config entry is kylin.query.cache-enabled, which is turned on
> > > > > by default.
> > > > > The doc link is
> > > > > https://kylin.apache.org/5.0/docs/configuration/query_cache
> > > > >
> > > > > --
> > > > >
> > > > > Best wishes to you !
> > > > > From :Xiaoxiang Yu
> > > > >
> > > > > At 2023-11-17 09:48:55, "Nam Đỗ Duy" wrote:
> > > > > >Hello Team, hello Xiaoxiang, can you please help me with this

Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-17 Thread Xiaoxiang Yu
I did a simple test to check whether Kylin has any bugs in the push-down
function, and the push-down function works as expected. So I'm 99% certain
that your step "I loaded the incremental data into Hive already" did not work.

Here are my steps (you can reproduce them in a fresh Kylin 5 docker container
in one minute):

1. Query `select count(*) from SSB.DATES` in project ssb without building
any index.
Query result (Answered By: HIVE) is: 2556

2. Duplicate the file of table `ssb.dates` with the following command:
hadoop fs -cp /user/hive/warehouse/ssb.db/dates/SSB.DATES.csv \
  /user/hive/warehouse/ssb.db/dates/SSB.DATES-2.csv

3. Re-query `select count(*) from SSB.DATES` in project ssb.
Query result (Answered By: HIVE) is: 5112

So it is clear that the second query finds the incremental data through the
Kylin query engine.
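
The three steps can be sketched as a pair of shell helpers. This is only an
illustration: duplicate_table_file and check_increment are hypothetical names,
and the two counts still have to be read from the Kylin Insight page (or any
other client) by hand.

```shell
#!/usr/bin/env bash
# Step 2: copy an existing data file inside the table directory so the
# table gains rows. Both arguments are HDFS paths.
duplicate_table_file() {
  local src="$1" dst="$2"
  hadoop fs -cp "$src" "$dst"
}

# Steps 1 and 3: compare the row counts taken before and after the copy.
check_increment() {
  local before="$1" after="$2"
  if [ "$after" -gt "$before" ]; then
    echo "new rows visible: $((after - before))"
  else
    echo "no new rows visible"
  fi
}
```

With the numbers from this test, `check_increment 2556 5112` reports 2556 new
rows, which matches the size of the duplicated file.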

Finally, to make good use of Kylin in real use cases, good knowledge of
Apache Spark and Apache Hadoop is a must-have.


With warm regard
Xiaoxiang Yu



Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-17 Thread Nam Đỗ Duy
Have a nice weekend, Xiaoxiang, and thank you for helping me become a
Kylin fan.

You are right: I am not familiar enough with Kylin and have little
background in the Hadoop ecosystem, so I will double-check carefully before
asking future questions. However, I did understand the mechanism quoted
below.

quoted

If incremental data is not loaded into Kylin, Kylin can still answer such
queries by
reading the original hive table, but the query is not accelerated.

If incremental data is loaded into Kylin, Kylin can answer queries by
reading the special Index/Cuboid files, and the query will be accelerated.

end

Let me explain my previous question. My steps were as follows:

1. I turned off the query cache (kylin.query.cache-enabled=false)
2. I restarted Kylin
3. I loaded the incremental data into Hive already
4. I turned on the Pushdown option to query Hive rather than a model
5. In the Kylin Insights window, I still could not get the incremental data
(which was already in Hive)

That is why I asked: can I get the incremental result with the 5 steps
above (without a model or index), or do I need to create a model, index,
and segment, and then build a new segment for the incremental data?

I hope that makes my point clear; if not, I will explain further.

Thank you very much again




Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-17 Thread Xiaoxiang Yu
Unfortunately, I don't think these are good questions.
If the answer to a question can be found by searching the Internet,
it is better not to ask it on the mailing list. I guess you
don't yet know how Kylin works, so you should look for documentation
or tutorials first.

What does 'get the incremental data from Hive into Kylin' mean? Kylin
relies entirely on Apache Spark for execution.

If incremental data is not loaded into Kylin, Kylin can still answer such
queries by reading the original Hive table, but the query is not
accelerated.

If incremental data is loaded into Kylin, Kylin can answer queries by
reading the special Index/Cuboid files, and the query will be accelerated.



With warm regard
Xiaoxiang Yu





Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-17 Thread Nam Đỗ Duy
Hi Xiaoxiang,

Do I really need to create a model in order to get the incremental data
from Hive into Kylin?

Can I query the incremental data of a pure dim/fact table without a model?

Thank you very much



Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-16 Thread Xiaoxiang Yu
I am not entirely sure, but I think the query cache is what keeps your
query result unchanged.


The config entry is kylin.query.cache-enabled, which is turned on by default.
The documentation is at https://kylin.apache.org/5.0/docs/configuration/query_cache
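
To see what a given installation actually has configured, a small helper
like the one below can read the entry from the properties file. This is a
minimal sketch under assumptions: the function name get_prop is
hypothetical, and the file location $KYLIN_HOME/conf/kylin.properties is
assumed rather than confirmed in this thread. An empty result means the key
is not set in that file, so the default applies.

```shell
# Hypothetical helper: print the last value assigned to a key in a
# Java-style properties file. Prints nothing when the key is absent.
#   $1 = property key, $2 = path to the properties file
get_prop() {
  awk -v key="$1" '
    {
      k = $0
      sub(/=.*/, "", k)                  # text before the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim whitespace around the key
      if (k == key && index($0, "=") > 0) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
        last = v
        found = 1
      }
    }
    END { if (found) print last }
  ' "$2"
}

# Usage (assumed config location):
#   get_prop kylin.query.cache-enabled "$KYLIN_HOME/conf/kylin.properties"
```

Commented-out lines are ignored because their key text starts with '#', and
a later assignment wins over an earlier one in the same file.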




--

Best wishes to you ! 
From :Xiaoxiang Yu







Re: How to reflect last hour data into Hive and Kylin Insights query window

2023-11-16 Thread Nam Đỗ Duy
Hello Team, hello Xiaoxiang, can you please help me with this urgent
issue...

(This is a public mailing list, so in general I leave your name out of the
greeting in the first email of a thread, but in fact most of the time it is
Xiaoxiang who actively answers my questions. Thank you very much.)

On Thu, Nov 16, 2023 at 2:59 PM Nam Đỗ Duy  wrote:

> Dear Dev Team, please kindly advise on this scenario.
>
> 1. I have a fact table, and when I query it in the Kylin Insights window I
> get 5 million rows.
>
> 2. Then I use the following command to load X rows (the last hour of data)
> from parquet into the Hive table:
>
> LOAD DATA LOCAL INPATH
> '/opt/LastHour/factUserEventDF_2023_11_16.parquet/14' INTO TABLE
> factUserEvent;
>
> 3. Then I query it again in the Kylin Insights window, but it still returns
> the previous count (5 million rows), without the X rows of last-hour data
> that I loaded from parquet into Hive in step 2.
>
> Can you advise how to get the table refreshed and updated?
>
> Thank you very much
>