RE: Segment overlap issues when refreshing a cube segment

2018-08-28 Thread roberto.tardio
Many thanks for the advice. Do you plan to release version 2.4.1 in September?

 

 

From: ShaoFeng Shi [mailto:shaofeng...@apache.org] 
Sent: Friday, 24 August 2018 5:01
To: user 
Subject: Re: Segment overlap issues when refreshing a cube segment

 

Kylin 2.4.1 will be kicked off soon; it includes several bugfixes on top of 2.4.0. 
You can upgrade directly to 2.4.1.

 

2018-08-23 19:07 GMT+08:00 <roberto.tar...@stratebi.com>:

Shaofeng, thank you very much for the help. I will share if version 2.4 solves 
our segment overlap issues.

Roberto Tardío Olmos

Head of Big Data Analytics

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

  http://bigdata.stratebi.com/ 

 

  http://www.stratebi.com 

 

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: Wednesday, 22 August 2018 15:21
To: user <user@kylin.apache.org>
Subject: Re: Segment overlap issues when refreshing a cube segment

 

Yes, KYLIN-3311 introduces a lock mechanism when updating metadata, which prevents 
concurrent updates on the same entity, so that two duplicated segments cannot be 
inserted. 

 

2018-08-22 19:52 GMT+08:00 <roberto.tar...@stratebi.com>:

Many thanks ShaoFeng!

 

Yes, I could see the new button because we also have a test installation of 
Kylin 2.4, and we are thinking about updating Kylin in the production 
environment from 2.2 to 2.4.

 

As for the API, I was aware of its existence. However, in the case I'm describing 
we can't delete the segment, because it sits in the middle of other segments, 
in NEW state and duplicated. That's why I mentioned the JIRA issue 
https://issues.apache.org/jira/browse/KYLIN-3311. Do you think that the 
error causing the segment overlap is related to this JIRA?

 

Kind Regards,

 

Roberto

 

From: ShaoFeng Shi [mailto:shaofeng...@apache.org]
Sent: Tuesday, 21 August 2018 14:54
To: user <user@kylin.apache.org>
Subject: Re: Segment overlap issues when refreshing a cube segment

 

Hi Roberto, in the latest Kylin (which I'm running), the web GUI has a 
"Delete Segment" button, which allows you to delete a segment easily. 

 

You can try the REST API to delete a segment; this API has existed for a 
long time: 
https://kylin.apache.org/docs/howto/howto_use_restapi.html#delete-segment
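As a concrete sketch of that endpoint: the host, the default ADMIN:KYLIN credentials, and the cube/segment names below are illustrative (the segment name is taken from the log later in this thread, and its exact format may differ in your installation). The block only prints the call instead of sending it to a live server:

```shell
# Hypothetical values; adjust host, credentials, cube and segment name.
KYLIN_HOST="http://kylin-host:7070"
CUBE="plataforma_cubo_mes_v2"
SEGMENT="2018080100_2018090100"

# DELETE /kylin/api/cubes/{cube}/segs/{segment} drops a single segment.
URL="${KYLIN_HOST}/kylin/api/cubes/${CUBE}/segs/${SEGMENT}"
echo "curl -X DELETE -u ADMIN:KYLIN $URL"
```

Run the printed curl command against your own Kylin instance once you have verified the segment name on the cube's Segments tab.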

 


 

2018-08-21 6:15 GMT+08:00 <roberto.tar...@stratebi.com>:

Hi, 

 

I am using Kylin 2.2, where I have several OLAP cubes to which I am constantly 
adding and refreshing segments. I am facing a problem that has generated 
two segments for the same period of time, and therefore a 
"segments overlap" error. Kylin is also deployed as a cluster, with one machine for 
cube building and one for querying. Here is the log of the query machine that 
seems to reflect the error (the cube segment build job completed 
successfully):

 

2018-08-16 09:19:35,833 INFO  [http-bio-7070-exec-93] cube.CubeManager:799 : 
Reloaded cube trafico_cubo_v8 being CUBE[name=trafico_cubo_v8] having 20 
segments

2018-08-16 09:19:35,833 DEBUG [http-bio-7070-exec-93] cachesync.Broadcaster:256 
: Broadcasting UPDATE, project_data, My_Cube_Trafico

2018-08-16 09:19:35,834 INFO  [http-bio-7070-exec-93] service.CacheService:143 
: removeOLAPDataSource is called for project My_Cube_Trafico

2018-08-16 09:19:35,834 INFO  [http-bio-7070-exec-93] service.CacheService:125 
: cleaning cache for project My_Cube_Trafico (currently remove all entries)

2018-08-16 09:19:35,834 DEBUG [http-bio-7070-exec-93] cachesync.Broadcaster:290 
: Done broadcasting UPDATE, project_data, My_Cube_Trafico

2018-08-16 09:19:35,834 DEBUG [http-bio-7070-exec-93] cachesync.Broadcaster:290 
: Done broadcasting UPDATE, cube, trafico_cubo_v8

2018-08-16 09:19:36,817 INFO  [http-bio-7070-exec-104] cube.CubeManager:358 : 
Updating cube instance 'plataforma_cubo_mes_v2'

2018-08-16 09:19:36,817 ERROR [http-bio-7070-exec-104] 
controller.CubeController:337 : Segments overlap: 
plataforma_cubo_mes_v2[2018080100_2018090100] and 
plataforma_cubo_mes_v2[2018080100_2018090100]

2018-08-16 09:19:36,818 ERROR [http-bio-7070-exec-104] 
controller.BasicController:57 : 

2018-08-16 09:19:37,264 DEBUG [http-bio-7070-exec-93] cachesync.Broadcaster:256 
: Broadcasting UPDATE, cube, trafico_cubo_v8

2018-08-16 09:19:37,265 INFO  [http-bio-7070-exec-93] cube.CubeManager:799 : 
Reloaded cube trafico_cubo_v8 being CUBE[name=trafico_cubo_v8] having 20 
segments

 

Could this error be related to the JIRA entry 
https://issues.apache.org/jira/browse/KYLIN-3311? If so, I see that it 
has been fixed in version 2.4 of Kylin.

 

Also, to fix the error, we had to stop Kylin, remove the duplicate segment from 
the metadata, restart Kylin and then refresh the segment. 
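For later readers, the manual fix described above can be outlined as follows. All paths are hypothetical; `metastore.sh backup`/`restore` are Kylin's standard metadata tools, and the backup copy is what you edit to drop the duplicated NEW segment. The block prints the steps as a checklist rather than executing them:

```shell
KYLIN_HOME=${KYLIN_HOME:-/usr/local/kylin}   # assumed install location

cat <<EOF
$KYLIN_HOME/bin/kylin.sh stop
$KYLIN_HOME/bin/metastore.sh backup            # snapshot current metadata first
# edit the backup copy to remove the duplicated NEW segment entry, then:
$KYLIN_HOME/bin/metastore.sh restore <path-to-edited-backup>
$KYLIN_HOME/bin/kylin.sh start
EOF
```

Run the printed commands one by one on the Kylin node; keep the untouched backup around in case the restore goes wrong.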

RE: Segment overlap issues when refreshing a cube segment

2018-08-23 Thread roberto.tardio
Shaofeng, thank you very much for the help. I will share if version 2.4 solves 
our segment overlap issues.

Roberto Tardío Olmos

Head of Big Data Analytics

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

  http://bigdata.stratebi.com/ 

 

  http://www.stratebi.com 

 

RE: Segment overlap issues when refreshing a cube segment

2018-08-22 Thread roberto.tardio
Many thanks ShaoFeng!

 

Yes, I could see the new button because we also have a test installation of 
Kylin 2.4, and we are thinking about updating Kylin in the production 
environment from 2.2 to 2.4.

 

As for the API, I was aware of its existence. However, in the case I'm describing 
we can't delete the segment, because it sits in the middle of other segments, 
in NEW state and duplicated. That's why I mentioned the JIRA issue 
https://issues.apache.org/jira/browse/KYLIN-3311. Do you think that the 
error causing the segment overlap is related to this JIRA?

 

Kind Regards,

 

Roberto

 

Segment overlap issues when refreshing a cube segment

2018-08-20 Thread roberto.tardio
Hi, 

 

I am using Kylin 2.2, where I have several OLAP cubes to which I am
constantly adding and refreshing segments. I am facing a problem that
has generated two segments for the same period of time, and therefore a
"segments overlap" error. Kylin is also deployed as a cluster, with
one machine for cube building and one for querying. Here is the log of the
query machine that seems to reflect the error (the cube segment build job
completed successfully):

 

2018-08-16 09:19:35,833 INFO  [http-bio-7070-exec-93] cube.CubeManager:799 :
Reloaded cube trafico_cubo_v8 being CUBE[name=trafico_cubo_v8] having 20
segments

2018-08-16 09:19:35,833 DEBUG [http-bio-7070-exec-93]
cachesync.Broadcaster:256 : Broadcasting UPDATE, project_data,
My_Cube_Trafico

2018-08-16 09:19:35,834 INFO  [http-bio-7070-exec-93]
service.CacheService:143 : removeOLAPDataSource is called for project
My_Cube_Trafico

2018-08-16 09:19:35,834 INFO  [http-bio-7070-exec-93]
service.CacheService:125 : cleaning cache for project My_Cube_Trafico
(currently remove all entries)

2018-08-16 09:19:35,834 DEBUG [http-bio-7070-exec-93]
cachesync.Broadcaster:290 : Done broadcasting UPDATE, project_data,
My_Cube_Trafico

2018-08-16 09:19:35,834 DEBUG [http-bio-7070-exec-93]
cachesync.Broadcaster:290 : Done broadcasting UPDATE, cube, trafico_cubo_v8

2018-08-16 09:19:36,817 INFO  [http-bio-7070-exec-104] cube.CubeManager:358
: Updating cube instance 'plataforma_cubo_mes_v2'

2018-08-16 09:19:36,817 ERROR [http-bio-7070-exec-104]
controller.CubeController:337 : Segments overlap:
plataforma_cubo_mes_v2[2018080100_2018090100] and
plataforma_cubo_mes_v2[2018080100_2018090100]

2018-08-16 09:19:36,818 ERROR [http-bio-7070-exec-104]
controller.BasicController:57 : 

2018-08-16 09:19:37,264 DEBUG [http-bio-7070-exec-93]
cachesync.Broadcaster:256 : Broadcasting UPDATE, cube, trafico_cubo_v8

2018-08-16 09:19:37,265 INFO  [http-bio-7070-exec-93] cube.CubeManager:799 :
Reloaded cube trafico_cubo_v8 being CUBE[name=trafico_cubo_v8] having 20
segments

 

Could this error be related to the JIRA entry
https://issues.apache.org/jira/browse/KYLIN-3311? If so, I see that
it has been fixed in version 2.4 of Kylin.

 

Also, to fix the error, we had to stop Kylin, remove the duplicate segment
from the metadata, restart Kylin, and then refresh the segment. As far as I
can read in https://issues.apache.org/jira/browse/KYLIN-2849, from
version 2.3 onwards you can delete a segment in any position from the
UI or the Kylin API after disabling the cube. Is that correct? This would
help us mitigate the problem more quickly.

 

Thanks in advance!

Roberto Tardío Olmos

Head of Big Data Analytics

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com   

 



RE: Kylin with S3, cubes tables get in transition when new cluster booted

2018-07-30 Thread roberto.tardio
Hi Moisés,

 

If I have understood correctly, you have been able to deploy Kylin on EMR 
successfully. However, you lose metadata when you terminate the cluster. Is 
that right? 

 

Have you tried restoring a Kylin metadata backup after cluster re-creation? 
Also, do you enable all HBase tables after cluster re-creation? 

 

We successfully deployed Kylin on EMR using S3 as storage for HBase and Hive, 
but our configuration differs in two points:

· We use EMRFS 
(https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html), 
configured with:

{
  "Classification": "emrfs-site",
  "Properties": {
    "fs.s3.consistent.retryPeriodSeconds": "10",
    "fs.s3.consistent": "true",
    "fs.s3.consistent.retryCount": "5",
    "fs.s3.consistent.metadata.tableName": "EmrFSMetadata"
  },
  "Configurations": []
}

· We deployed Kylin on an EC2 machine separated from the cluster.

 

I hope this helps you.

 

Roberto Tardío

 

From: Moisés Català [mailto:moises.cat...@lacupulamusic.com] 
Sent: Saturday, 28 July 2018 16:17
To: user@kylin.apache.org
Subject: Kylin with S3, cubes tables get in transition when new cluster booted

 

Hi all,

 

I’ve carefully followed the instructions provided in 
http://kylin.apache.org/docs23/install/kylin_aws_emr.html 

 

My idea is to use S3 as the storage for HBase. I have configured the cluster 
following the instructions, but the tables that contain the cube definitions 
stay "in transition" when a new cluster is deployed, and the Kylin metadata 
seems outdated...

 

These are the steps I follow to create the cluster

 

Cluster creation command:

 

aws emr create-cluster \

--applications Name=Hadoop Name=Hue Name=Spark Name=Zeppelin Name=Ganglia 
Name=Hive Name=Hbase Name=HCatalog Name=Tez \

--tags 'hive=' 'spark=' 'zeppelin=' \

--ec2-attributes 'file://../config/ec2-attributes.json' \

--release-label emr-5.16.0 \

--log-uri 's3n://sns-da-logs/' \

--instance-groups 'file://../config/instance-hive-datawarehouse.json' \

--configurations 'file://../config/hive-hbase-s3.json' \

--auto-scaling-role EMR_AutoScaling_DefaultRole \

--ebs-root-volume-size 10 \

--service-role EMR_DefaultRole \

--enable-debugging \

--name 'hbase-hive-datawarehouse' \

--scale-down-behavior TERMINATE_AT_TASK_COMPLETION \

--region us-east-1

 

 

My configuration hive-hbase-s3.json:

 

[

  {

"Classification": "hive-site",

"Configurations": [],

"Properties": {

  "hive.metastore.warehouse.dir": "s3://-datawarehouse/hive.db",

  "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",

  "javax.jdo.option.ConnectionPassword": "x",

  "javax.jdo.option.ConnectionURL": "jdbc:mysql://xx:3306/hive_metastore?createDatabaseIfNotExist=true",

  "javax.jdo.option.ConnectionUserName": ""

}

  },

  {

"Classification": "hbase",

"Configurations": [],

"Properties": {

  "hbase.emr.storageMode": "s3"

}

  },

  {

"Classification": "hbase-site",

"Configurations": [],

"Properties": {

  "hbase.rpc.timeout": "360",

  "hbase.rootdir": "s3://xx-hbase/"

}

  },

  {

  "Classification": "core-site",

  "Properties": {

"io.file.buffer.size": "65536"

  }

  },

  {

  "Classification": "mapred-site",

  "Properties": {

"mapred.map.tasks.speculative.execution": "false",

"mapred.reduce.tasks.speculative.execution": "false",

"mapreduce.map.speculative": "false",

"mapreduce.reduce.speculative": "false"

 

  }

  } 

]

 

When I shut down the cluster I perform these commands:

 

../kylin_home/bin/kylin.sh stop

 

 

#Before you shutdown/restart the cluster, you must back up the "/kylin" data on 
HDFS to S3 with S3DistCp:

  

aws s3 rm --recursive s3://xx-config/metadata/kylin

s3-dist-cp --src=hdfs:///kylin --dest=s3://xx-config/metadata/kylin

 

bash /usr/lib/hbase/bin/disable_all_tables.sh
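A hedged sketch of the inverse procedure on the new cluster may help here. The bucket names mirror the masked ones above, and `enable_all` in the hbase shell is the assumed counterpart of disable_all_tables.sh; the block prints the steps as a checklist rather than executing them:

```shell
cat <<'EOF'
# copy the /kylin working data back from S3 into the new cluster's HDFS
s3-dist-cp --src=s3://xx-config/metadata/kylin --dest=hdfs:///kylin
# re-enable the HBase tables (answer 'y' when prompted):
#   hbase shell> enable_all '.*'
# then start Kylin again
$KYLIN_HOME/bin/kylin.sh start
EOF
```

If the cube tables still sit "in transition" after enabling, it is worth checking the HBase master UI for stuck regions before starting Kylin.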

 

 

Could you please point out what I am missing?

 

 

Thanks in advance

 



Cube moved from one project to another after editing it

2018-05-09 Thread roberto.tardio
Hi,

 

Something very strange happened to us today. We were editing the Configuration
Overwrites (changing the YARN queue) of a cube. After we saved the updated
definition, Kylin moved the cube from its current project to another existing
project. After that we could not use or build the cube. We performed a
metastore backup to check the project metadata: the cube realization
had indeed been moved from the right project to another existing project.
 

To solve it, we had to edit the HBase metadata directly to move the cube back
to the right project. Another possibility we considered was to edit
the local copy of the metadata and reload the metastore. We also reviewed
the Kylin log and saw nothing unusual. Has anyone
experienced this issue?

 

We are using Kylin 2.3.1, updated from 2.2 last week; with 2.2 we did not
experience this issue.

 

Thanks!

 

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



RE: Doubts about the hdfs working dir

2018-05-09 Thread roberto.tardio
Hi ShaoFeng,

 

A very clear explanation; now I understand the use of the Kylin working dir. Thanks!

 

From: ShaoFeng Shi [mailto:shaofeng...@apache.org] 
Sent: Wednesday, 9 May 2018 3:39
To: user 
Subject: Re: Doubts about the hdfs working dir

 

Hi Roberto,

 

The data in the hdfs-working-dir includes intermediate files (which will be 
garbage-collected) and cuboid data (which won't be). The cuboid data is kept for 
future segment merges, as Kylin cannot merge from HBase. If you're sure those 
segments won't be merged, you can move them to other storage.

 

Please pay attention to the "resources" sub-folder under hdfs-working-dir, 
which persists some big metadata files like dictionary and snapshots. They 
shouldn't be moved.
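For reference, the periodic cleanup Roberto mentions is Kylin's documented StorageCleanupJob. A minimal sketch, assuming KYLIN_HOME points at the install directory (in Kylin 2.x the entry point is `org.apache.kylin.tool.StorageCleanupJob`; older releases used `org.apache.kylin.storage.hbase.util.StorageCleanupJob`). The block prints the command rather than running it:

```shell
KYLIN_HOME=${KYLIN_HOME:-/usr/local/kylin}   # assumed install location
# Without "--delete true" the job only lists what would be removed.
CMD="$KYLIN_HOME/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete true"
echo "$CMD"   # run the printed command on the Kylin node
```

Running it first without `--delete true` is a safe way to preview which intermediate files would be removed from the working dir.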

 

 

 






 

-- 

Best regards,

 

Shaofeng Shi 史少锋

 



Doubts about the hdfs working dir

2018-05-08 Thread roberto.tardio
Hi,

 

I have some doubts about the use of kylin.env.hdfs-working-dir. I understand
the working dir is needed to store data about RUNNING or STOPPED jobs. However,
is it necessary to keep data from finished jobs? Although we often
execute the Kylin storage cleanup command, our working dir folder is now about
300 GB, which looks like a lot of data for historical jobs:

 

1.   Can we delete old data manually? I tried stopping all jobs and
Kylin, then changing the working dir; after the change Kylin worked well
for new jobs. Is there any problem with performing this deletion manually? I
did not experience any issues after changing the working dir.

2.   Does Kylin 2.3.1 have any advantages over Kylin 2.2 regarding working
dir cleaning? 

 

Kind Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



RE: Disable automatic cube enabling after building

2018-05-08 Thread roberto.tardio
Hi,

 

There is a new patch to configure cube auto-enabling. We have compiled it and 
used it with Kylin 2.3.1 successfully.

 

https://issues.apache.org/jira/browse/KYLIN-3366

 

Many thanks to kylin team!
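One plausible workflow for trying such a patch against a release (the tag and patch file name are illustrative; the patch itself is attached to the JIRA). The block prints the steps as a checklist rather than executing them:

```shell
cat <<'EOF'
git clone https://github.com/apache/kylin.git && cd kylin
git checkout kylin-2.3.1                 # build against the release tag
git apply KYLIN-3366.patch               # patch file downloaded from the JIRA
mvn clean package -DskipTests            # rebuild the binaries
EOF
```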

 



RE: Increase of time between steps after Kylin 2.3.1 update

2018-05-07 Thread roberto.tardio
Checking the log, I verify that more time is spent between one step and
another in our Kylin 2.3.1 installation than when we used Kylin 2.2. In
addition, jobs stay in PENDING status longer in the job monitor.

 

Kylin 2.2

 

2018-05-04 14:25:40,797 INFO  [Scheduler 542798956 Job
cb843ecc-5062-49ec-a881-215097e4501a-518] execution.ExecutableManager:421 :
job id:cb843ecc-5062-49ec-a881-215097e4501a from RUNNING to READY

2018-05-04 14:25:45,496 INFO  [Scheduler 542798956 Job
cb843ecc-5062-49ec-a881-215097e4501a-518] execution.AbstractExecutable:111 :
Executing AbstractExecutable (BUILD CUBE - Captacion_Resumido_Cubo_v3 -
2018050400_2018050500 - GMT+02:00 2018-05-04 14:21:34)

 

 

Kylin 2.3.1

 

2018-05-07 15:25:26,338 INFO  [Scheduler 1942482192 Job
6d616841-1c4d-46ca-a1b9-a18047fbada5-1006] execution.ExecutableManager:411 :
job id:6d616841-1c4d-46ca-a1b9-a18047fbada5 from RUNNING to READY

2018-05-07 15:25:48,365 INFO  [Scheduler 1942482192 Job
6d616841-1c4d-46ca-a1b9-a18047fbada5-1006] execution.AbstractExecutable:147
: Executing AbstractExecutable (BUILD CUBE - Captacion_Resumido_Cubo_v3 -
2018050700_2018050800 - GMT+02:00 2018-05-07 15:19:41)

 

Regards,

 

From: roberto.tar...@stratebi.com [mailto:roberto.tar...@stratebi.com] 
Sent: Monday, 7 May 2018 14:34
To: user@kylin.apache.org
Subject: Increase of time between steps after Kylin 2.3.1 update

 

Hi,

 

Some days ago we updated Kylin 2.2 to Kylin 2.3.1. The update process
completed successfully. However, we have experienced an increase in the time
spent between steps during builds. For example, the time
between step 7 "Build N-Dimension Cuboid: level 1" and "... level 2" is now about
30 seconds, a lot of time considering that in these steps no
processing is executed, since we use the "layer" algorithm (Screenshot 1).
This time is much longer than with Kylin 2.2, where the time between
these skipped steps was about 5 seconds (Screenshot 2).

 

Have you experienced someone a similar behavior? What could be the cause?

 

Environment:

· EMR 5.7 (S3 storage for Hive and HBase)

· Kylin 2.3.1 (updated from 2.2 last Friday). Installed on separated
EC2 instance.

 

Thanks in advance,

 

Screenshot 1: [image not preserved in the archive]

Screenshot 2: [image not preserved in the archive]

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

  http://bigdata.stratebi.com/ 

 

  http://www.stratebi.com 

 



RE: Disable automatic cube enabling after building

2018-05-03 Thread roberto.tardio
Thanks Li Yang. I have created a JIRA for this new feature / improvement.

 

https://issues.apache.org/jira/browse/KYLIN-3366

 

Regards,

 

From: Li Yang [mailto:liy...@apache.org] 
Sent: Tuesday, 1 May 2018 0:44
To: user@kylin.apache.org
Subject: Re: Disable automatic cube enabling after building

 

Not right now, but it is easy to do. You can open a JIRA to kick off the dev work.

Thanks

Yang

 

On Tue, Apr 24, 2018 at 9:25 PM, <roberto.tar...@stratebi.com> wrote:

Hi,

 

Is it possible to disable automatic cube enabling after a build? For 
example, this could be useful when we need to create a new version of a cube but 
do not want it to be queried until the new cube has been properly tested.

 

Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 

 



RE: Hybrid Cubes Document

2018-04-27 Thread roberto.tardio
You're welcome, ShaoFeng.

Thanks for sharing your experiences; the application case of the hybrid model 
that you describe is also very useful. 

Regards,

 

From: ShaoFeng Shi [mailto:shaofeng...@apache.org] 
Sent: Wednesday, 25 April 2018 2:34
To: user 
Subject: Re: Hybrid Cubes Document

 

Many thanks to Roberto for the comments and suggestions! The feedback is 
helpful for us in keeping this feature; I had thought it had no external users.

 

For Manoj's question "Then Cube2 will be sufficient to get all queries. Why do we 
need to combine?":

 

We had a customer case where they deleted the source data after building it into 
the cube (as the source data size was huge). Later, when they added a new 
dimension/measure, it was impossible to go back and rebuild the historical data 
into the new cube. So they use the hybrid model to combine the history cube and 
the new cube; then, at least for the common dimensions/measures, they can get an 
overall view.

 

 

2018-04-24 21:19 GMT+08:00 <roberto.tar...@stratebi.com>:

Hi Kumar,

 

“Then Cube2 will be sufficient to get all query. Why do we need to combine?”

If you combine Cube 1 and Cube 2 using a hybrid model, queries that contain only 
common columns (dimensions or measures) will be routed to the hybrid cube, thus 
joining data from Cube 1 and Cube 2. For example, this can be useful to avoid 
stopping service while we implement a new version of the cube. 

 

It is a concept that is difficult to explain. I recommend you run a test, for 
example with the learn_kylin sample project, adding a new measure to the fact table. 

 

Regards,

Roberto

From: Kumar, Manoj H [mailto:manoj.h.ku...@jpmorgan.com]
Sent: Tuesday, 24 April 2018 10:21
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Thanks for the explanation.

 

#1 – Understood clearly: whenever there is a change in structure. 

#2 – Still not clear how the hybrid model can be used, as we need to define 10 
dimensions in Cube_2 (5 dimensions from Cube1). Then Cube2 would be sufficient to 
serve all queries. Why do we need to combine?

 

 

Regards,

Manoj

 

From: roberto.tar...@stratebi.com [mailto:roberto.tar...@stratebi.com] 
Sent: Tuesday, April 24, 2018 1:07 PM
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Hello Kumar,

 

I try to answer your questions:

 

1.   It can be useful in scenarios where i) we need to change cube definition 
optimizations (e.g. aggregation groups, mandatory dimensions, rowkeys, ...) or 
ii) we need to add columns or measures to a cube (the case I analyzed in the 
document). Because we cannot modify an existing cube without purging its data, 
creating a new cube is the only way to make such changes. Since rebuilding a 
cube over historical data can be a heavy process (time + resources), we can 
instead create a Hybrid Model over the old and new cubes and start building the 
new cube from the last segment (e.g. date) of the old one. Once the Hybrid 
Model is defined, queries that reference only the common columns (dimensions or 
measures) will be routed to the hybrid, thus joining data from the old and new 
cubes. However, a query that includes one or more of the new cube's columns 
will be routed only to the new cube. In my opinion, this is a disadvantage of 
the hybrid model, because if we want to run queries that include the new 
dimensions, we have to rebuild all the historical data into the new cube.

2.   In your case, Cube 1 would be the old cube (historical data) and Cube 2 
the new cube (new data and columns). As there are no common columns between 
them, queries will be routed to only one of the cubes, never to the hybrid. To 
apply the hybrid model you must define Cube 2 with 10 dimensions (the 5 from 
Cube 1 plus the new ones).

a.   I do not know if this is your case, but if you are doing this to split 
cube construction (e.g. to optimize building time), you are better off using 
the Aggregation Group concept.

Regards,

Roberto

From: Kumar, Manoj H [mailto:manoj.h.ku...@jpmorgan.com] 
Sent: lunes, 23 de abril de 2018 9:39
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Thanks. Whenever we need to change the cube dimensions – to add a new 
dimension/column – would this Hybrid Model be efficient? Since it doesn't 
require rebuilding the previous cube, it would save building time.

 

Also, another use case: can we use the Hybrid Cube Model for the purpose below?

 

-  Cube 1- 5 Dimension

-  Cube 2 – Another 5 Dimension

 

Hybrid Cube – Cube 1 + Cube 2 – Become 10 dimension

 

The hybrid cube is exposed to the BI tool – can the user see 10 dimensions from 
the combination of both cubes? Is that statement right?

 

Regards,

Manoj

 

From: Billy Liu [mailto:billy...@apache.org]

Disable automatic cube enabling after building

2018-04-24 Thread roberto.tardio
Hi,

 

Is it possible to disable automatic cube enabling after a build process?
For example, this could be useful if we need to create a new version of a
cube but do not want it to be queried until the new cube has been properly
tested.
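For reference, a sketch of one possible workaround (not an official "keep disabled" option): once the build job finishes, a script can immediately disable the cube through the REST API, then re-enable it after testing. The host, credentials and cube name below are placeholders, and the /disable and /enable endpoints should be verified against the REST API docs for your Kylin version.

```shell
# Build the disable/enable URLs; Kylin toggles cube status via PUT.
# All names below are placeholders (assumption: a 2.x REST layout).
KYLIN_HOST="http://localhost:7070"
CUBE="my_new_cube_v2"

DISABLE_URL="${KYLIN_HOST}/kylin/api/cubes/${CUBE}/disable"
ENABLE_URL="${KYLIN_HOST}/kylin/api/cubes/${CUBE}/enable"
echo "$DISABLE_URL"
# curl -X PUT --user ADMIN:PASSWORD "$DISABLE_URL"   # uncomment to run
# ...test the new cube, then re-enable it:
# curl -X PUT --user ADMIN:PASSWORD "$ENABLE_URL"
```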

 

Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



RE: Hybrid Cubes Document

2018-04-24 Thread roberto.tardio
Hi Kumar,

 

“Then Cube2 will be sufficient to get all query. Why do we need to combine?”

If you combine Cube 1 and Cube 2 using a Hybrid Model, queries that reference 
only the common columns (dimensions or measures) will be routed to the hybrid, 
thus joining data from Cube 1 and Cube 2. For example, this can be useful to 
avoid interrupting service when we roll out a new version of a cube. 

It is a difficult concept to explain. I recommend you run a test, for example 
with the learn_kylin sample project, adding a new measure to the fact table. 

 

Regards,

Roberto

From: Kumar, Manoj H [mailto:manoj.h.ku...@jpmorgan.com] 
Sent: martes, 24 de abril de 2018 10:21
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Thanks for the explanation.

 

#1 – Understood clearly whenever there is a change in structure. 

#2 – Still not clear how the Hybrid model can be used, as we need to define 10 
dimensions in Cube 2 (5 dimensions from Cube 1). Then Cube 2 will be sufficient 
to answer all queries. Why do we need to combine?

 

 

Regards,

Manoj

 

From: roberto.tar...@stratebi.com [mailto:roberto.tar...@stratebi.com] 
Sent: Tuesday, April 24, 2018 1:07 PM
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Hello Kumar,

 

I try to answer your questions:

 

1.   It can be useful in scenarios where i) we need to change cube definition 
optimizations (e.g. aggregation groups, mandatory dimensions, rowkeys, ...) or 
ii) we need to add columns or measures to a cube (the case I analyzed in the 
document). Because we cannot modify an existing cube without purging its data, 
creating a new cube is the only way to make such changes. Since rebuilding a 
cube over historical data can be a heavy process (time + resources), we can 
instead create a Hybrid Model over the old and new cubes and start building the 
new cube from the last segment (e.g. date) of the old one. Once the Hybrid 
Model is defined, queries that reference only the common columns (dimensions or 
measures) will be routed to the hybrid, thus joining data from the old and new 
cubes. However, a query that includes one or more of the new cube's columns 
will be routed only to the new cube. In my opinion, this is a disadvantage of 
the hybrid model, because if we want to run queries that include the new 
dimensions, we have to rebuild all the historical data into the new cube.

2.   In your case, Cube 1 would be the old cube (historical data) and Cube 2 
the new cube (new data and columns). As there are no common columns between 
them, queries will be routed to only one of the cubes, never to the hybrid. To 
apply the hybrid model you must define Cube 2 with 10 dimensions (the 5 from 
Cube 1 plus the new ones).

a.   I do not know if this is your case, but if you are doing this to split 
cube construction (e.g. to optimize building time), you are better off using 
the Aggregation Group concept.

Regards,

Roberto

From: Kumar, Manoj H [mailto:manoj.h.ku...@jpmorgan.com] 
Sent: lunes, 23 de abril de 2018 9:39
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Thanks. Whenever we need to change the cube dimensions – to add a new 
dimension/column – would this Hybrid Model be efficient? Since it doesn't 
require rebuilding the previous cube, it would save building time.

 

Also, another use case: can we use the Hybrid Cube Model for the purpose below?

 

-  Cube 1- 5 Dimension

-  Cube 2 – Another 5 Dimension

 

Hybrid Cube – Cube 1 + Cube 2 – Become 10 dimension

 

The hybrid cube is exposed to the BI tool – can the user see 10 dimensions from 
the combination of both cubes? Is that statement right?

 

Regards,

Manoj

 

From: Billy Liu [mailto:billy...@apache.org] 
Sent: Monday, April 23, 2018 8:48 AM
To: user <user@kylin.apache.org>
Subject: Re: Hybrid Cubes Document

 

Hello Roberto,

 

Thanks for this sharing. Would you like to publish it on the Kylin website? 





With Warm regards

Billy Liu

 

2018-04-20 1:18 GMT+08:00 <roberto.tar...@stratebi.com>:

Hi,

 

In recent days I have been researching the use of hybrid cubes (hybrid model). 
However, I only found this document, 
http://kylin.apache.org/blog/2015/09/25/hybrid-model/, published on Sep 25, 
2015. For that reason, I wrote a short guide that aims to explain its use, 
possible use cases, and current limitations. I share the document through the 
following link:

 

https://drive.google.com/open?id=1qbvB1iONBcFMFE__SuF0ayq_l1_0vwXN 

 

Please do not hesitate to correct me if you see something wrong. I have found 
this feature very interesting for mitigating the issues related to rebuilding 
the entire cube when we need to modify its definition. However, the hybrid 
model only combines the data from two cubes if a query 

RE: Hybrid Cubes Document

2018-04-24 Thread roberto.tardio
Hello Kumar,

 

I try to answer your questions:

 

1.   It can be useful in scenarios where i) we need to change cube definition 
optimizations (e.g. aggregation groups, mandatory dimensions, rowkeys, ...) or 
ii) we need to add columns or measures to a cube (the case I analyzed in the 
document). Because we cannot modify an existing cube without purging its data, 
creating a new cube is the only way to make such changes. Since rebuilding a 
cube over historical data can be a heavy process (time + resources), we can 
instead create a Hybrid Model over the old and new cubes and start building the 
new cube from the last segment (e.g. date) of the old one. Once the Hybrid 
Model is defined, queries that reference only the common columns (dimensions or 
measures) will be routed to the hybrid, thus joining data from the old and new 
cubes. However, a query that includes one or more of the new cube's columns 
will be routed only to the new cube. In my opinion, this is a disadvantage of 
the hybrid model, because if we want to run queries that include the new 
dimensions, we have to rebuild all the historical data into the new cube.

2.   In your case, Cube 1 would be the old cube (historical data) and Cube 2 
the new cube (new data and columns). As there are no common columns between 
them, queries will be routed to only one of the cubes, never to the hybrid. To 
apply the hybrid model you must define Cube 2 with 10 dimensions (the 5 from 
Cube 1 plus the new ones).

a.   I do not know if this is your case, but if you are doing this to split 
cube construction (e.g. to optimize building time), you are better off using 
the Aggregation Group concept.
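For completeness, a sketch of how a hybrid over an old and a new cube is created from the command line. The class path and argument order below are taken from the 2015 hybrid-model blog post referenced in this thread and may differ in your Kylin version, so check the CLI's usage output before running; all names are placeholders (learn_kylin/kylin_sales_model mirror Kylin's sample project).

```shell
# HybridCLI sketch -- verify class path and arguments on your version.
ACTION="create"                            # create | update | delete
HYBRID="sales_hybrid"                      # hypothetical hybrid name
PROJECT="learn_kylin"
MODEL="kylin_sales_model"
CUBES="kylin_sales_cube,kylin_sales_cube_v2"   # old cube + new cube

CMD="\$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hybrid.HybridCLI $ACTION $HYBRID $PROJECT $MODEL $CUBES"
echo "$CMD"    # print the command instead of running it directly
```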

Regards,

Roberto

From: Kumar, Manoj H [mailto:manoj.h.ku...@jpmorgan.com] 
Sent: lunes, 23 de abril de 2018 9:39
To: user@kylin.apache.org
Subject: RE: Hybrid Cubes Document

 

Thanks. Whenever we need to change the cube dimensions – to add a new 
dimension/column – would this Hybrid Model be efficient? Since it doesn't 
require rebuilding the previous cube, it would save building time.

 

Also, another use case: can we use the Hybrid Cube Model for the purpose below?

 

-  Cube 1- 5 Dimension

-  Cube 2 – Another 5 Dimension

 

Hybrid Cube – Cube 1 + Cube 2 – Become 10 dimension

 

The hybrid cube is exposed to the BI tool – can the user see 10 dimensions from 
the combination of both cubes? Is that statement right?

 

Regards,

Manoj

 

From: Billy Liu [mailto:billy...@apache.org] 
Sent: Monday, April 23, 2018 8:48 AM
To: user <user@kylin.apache.org>
Subject: Re: Hybrid Cubes Document

 

Hello Roberto,

 

Thanks for this sharing. Would you like to publish it on the Kylin website? 





With Warm regards

Billy Liu

 

2018-04-20 1:18 GMT+08:00 <roberto.tar...@stratebi.com>:

Hi,

 

In recent days I have been researching the use of hybrid cubes (hybrid model). 
However, I only found this document, 
http://kylin.apache.org/blog/2015/09/25/hybrid-model/, published on Sep 25, 
2015. For that reason, I wrote a short guide that aims to explain its use, 
possible use cases, and current limitations. I share the document through the 
following link:

 

https://drive.google.com/open?id=1qbvB1iONBcFMFE__SuF0ayq_l1_0vwXN 

 

Please do not hesitate to correct me if you see something wrong. I have found 
this feature very interesting for mitigating the issues related to rebuilding 
the entire cube when we need to modify its definition. However, the hybrid 
model only combines the data from two cubes if a query uses only the common 
columns of these two cubes. I have analyzed this drawback in the document.

 

I appreciate the help of the Kylin community and team; I hope this document 
helps.

 

Best Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16. 28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 

 

This message is confidential and subject to terms at: 
http://www.jpmorgan.com/emaildisclaimer, including on confidentiality, legal 
privilege, viruses and monitoring of electronic messages. If you are not the 
intended recipient, please delete this message and notify the sender 
immediately. Any unauthorized use is strictly prohibited.



RE: Hybrid Cubes Document

2018-04-24 Thread roberto.tardio
Hi Billy,

 

Yes, of course. You can publish or edit it if you consider appropriate.

 

Kind Regards,

 

Roberto Tardío

From: Billy Liu [mailto:billy...@apache.org] 
Sent: lunes, 23 de abril de 2018 5:18
To: user 
Subject: Re: Hybrid Cubes Document

 

Hello Roberto,

 

Thanks for this sharing. Would you like to publish it on the Kylin website? 





With Warm regards

Billy Liu

 

2018-04-20 1:18 GMT+08:00 <roberto.tar...@stratebi.com>:

Hi,

 

In recent days I have been researching the use of hybrid cubes (hybrid model). 
However, I only found this document, 
http://kylin.apache.org/blog/2015/09/25/hybrid-model/, published on Sep 25, 
2015. For that reason, I wrote a short guide that aims to explain its use, 
possible use cases, and current limitations. I share the document through the 
following link:

 

https://drive.google.com/open?id=1qbvB1iONBcFMFE__SuF0ayq_l1_0vwXN 

 

Please do not hesitate to correct me if you see something wrong. I have found 
this feature very interesting for mitigating the issues related to rebuilding 
the entire cube when we need to modify its definition. However, the hybrid 
model only combines the data from two cubes if a query uses only the common 
columns of these two cubes. I have analyzed this drawback in the document.

 

I appreciate the help of the Kylin community and team; I hope this document 
helps.

 

Best Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16. 28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 

 



Cube migration between clusters

2018-04-20 Thread roberto.tardio
Hi,

 

Is there a tool for migrating Kylin cubes between two Hadoop/Kylin clusters?
It would be interesting to build cubes in a development environment and then
migrate them into a production environment, in order to avoid consuming
resources on the production cluster.
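For reference, Kylin does ship a migration tool, CubeMigrationCLI. The argument order below follows the usage string of recent 2.x releases, but it should be verified against your own installation's usage output; host names and cube/project names are placeholders.

```shell
# CubeMigrationCLI sketch -- arguments are (roughly): source Kylin,
# destination Kylin, cube name, project, copy ACL, purge-and-disable
# source, overwrite if exists, really execute. Verify on your version.
SRC="dev-kylin-host:7070"     # source (dev) Kylin instance
DST="prod-kylin-host:7070"    # destination (prod) Kylin instance
CMD="\$KYLIN_HOME/bin/kylin.sh org.apache.kylin.tool.CubeMigrationCLI $SRC $DST my_cube my_project true false false true"
echo "$CMD"    # print the command instead of running it directly
```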

 

Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



RE: Dict value is too long

2018-04-20 Thread roberto.tardio
Hi,

 

It looks like there is a high-cardinality (HC) dimension that does not fit in a
dictionary. Please check whether you have an HC dimension and try changing its
rowkey encoding from dictionary (dict) to another encoding, such as fixed
length. Have you tried this already?
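As a concrete illustration of that advice (a sketch only: the column name and length are placeholders, and the exact encoding syntax should be checked against the cube design docs for your version), the rowkey entry in the cube's JSON descriptor would change from a dictionary encoding to a fixed-length one:

```json
{
  "rowkey": {
    "rowkey_columns": [
      { "column": "LONG_TEXT_COL", "encoding": "fixed_length:100" }
    ]
  }
}
```

Note that a fixed-length encoding truncates values beyond the configured length, so it trades precision for avoiding oversized dictionary entries.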

 

Regards,

 

 

 

From: Lu Zhe [mailto:tclu...@hotmail.com] 
Sent: martes, 10 de abril de 2018 9:37
To: user@kylin.apache.org
Subject: 答复: Dict value is too long

 

Sorry, let me fix the formatting.

 

java.lang.IllegalStateException: maxValueLength is negative (-26578). Dict value is too long, whose length is larger than 32767
 at org.apache.kylin.dict.TrieDictionary.init(TrieDictionary.java:104)
 at org.apache.kylin.dict.TrieDictionary.readFields(TrieDictionary.java:339)
 at org.apache.kylin.dict.lookup.SnapshotTable.readData(SnapshotTable.java:250)
 at org.apache.kylin.dict.lookup.SnapshotTableSerializer.deserialize(SnapshotTableSerializer.java:74)
 at org.apache.kylin.dict.lookup.SnapshotTableSerializer.deserialize(SnapshotTableSerializer.java:48)
 at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:171)
 at org.apache.kylin.dict.lookup.SnapshotManager.load(SnapshotManager.java:196)
 at org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:178)
 at org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:140)
 at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:120)
 at org.apache.kylin.cube.CubeManager$DictionaryAssist.buildSnapshotTable(CubeManager.java:1055)
 at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:971)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
 at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:71)
 at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:97)
 at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:67)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:300)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

result code:2

 

Originally I suspected a field value was too long, because that was what
happened last time, and after I limited the field's length it worked.
Now I'm hitting this problem again and don't really understand it.

 

From: Lu Zhe <tclu...@hotmail.com>
Sent: 2018-04-10 15:29
To: user@kylin.apache.org
Subject: Dict value is too long 

 

 Kylin 2.3.1





Error during the Build Dimension Dictionary step:

java.lang.IllegalStateException: maxValueLength is negative (-26578). Dict value is too long, whose length is larger than 32767
 at org.apache.kylin.dict.TrieDictionary.init(TrieDictionary.java:104)
 at org.apache.kylin.dict.TrieDictionary.readFields(TrieDictionary.java:339)
 at org.apache.kylin.dict.lookup.SnapshotTable.readData(SnapshotTable.java:250)
 at org.apache.kylin.dict.lookup.SnapshotTableSerializer.deserialize(SnapshotTableSerializer.java:74)
 at org.apache.kylin.dict.lookup.SnapshotTableSerializer.deserialize(SnapshotTableSerializer.java:48)
 at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:171)
 at org.apache.kylin.dict.lookup.SnapshotManager.load(SnapshotManager.java:196)
 at org.apache.kylin.dict.lookup.SnapshotManager.checkDupByContent(SnapshotManager.java:178)
 at org.apache.kylin.dict.lookup.SnapshotManager.trySaveNewSnapshot(SnapshotManager.java:140)
 at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:120)
 at org.apache.kylin.cube.CubeManager$DictionaryAssist.buildSnapshotTable(CubeManager.java:1055)
 at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:971)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
 at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
 at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:71)
 at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:97)
 at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)

Hybrid Cubes Document

2018-04-19 Thread roberto.tardio
Hi,

 

In recent days I have been researching the use of hybrid cubes (hybrid model).
However, I only found this document,
http://kylin.apache.org/blog/2015/09/25/hybrid-model/, published on Sep 25,
2015. For that reason, I wrote a short guide that aims to explain its use,
possible use cases, and current limitations. I share the document through the
following link:

 

https://drive.google.com/open?id=1qbvB1iONBcFMFE__SuF0ayq_l1_0vwXN 

 

Please do not hesitate to correct me if you see something wrong. I have
found this feature very interesting for mitigating the issues related to
rebuilding the entire cube when we need to modify its definition.
However, the hybrid model only combines the data from two cubes if a query
uses only the common columns of these two cubes. I have analyzed this
drawback in the document.

 

I appreciate the help of the Kylin community and team; I hope this document
helps.

 

Best Regards,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



Modify an existing cube

2018-04-04 Thread roberto.tardio
Hi,

 

Is it possible to modify a cube that is already built, such as adding new
metrics or dimensions? Is it possible to rename cube columns? I have not seen
anything about these common needs in the current Kylin documentation. I tried
to add a new metric, but Kylin does not allow saving the new cube definition
unless I purge the cube first.

 

Otherwise, could it be a good technique to create a new cube with the new
metrics and dimensions and apply the hybrid model, so that the new cube is
used when a SQL query requires it?

 

Thanks in advance,

 

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



Dimension update issue

2018-04-02 Thread roberto.tardio
Hi,

 

I have some questions about how Kylin refreshes dimension tables. If you
delete some rows of a dimension from the Hive table that are not referenced
by any fact in the fact table, will those values be deleted from the dimension
table on the next cube segment build? Do we have to refresh old segments?

 

In our case we added, by mistake, a lot of empty future days to a Date
dimension (80 years). Then we built some days of the cube. However, as this
big Date dimension leads to performance issues, we deleted most of the future
years from the Date dimension. But after building a new segment of the cube,
the rows deleted from the Hive table had not been deleted from the Kylin cube
table. How can we refresh the dimension table in Kylin properly?

 

Thanks in advance!

 

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



Job list API REST

2018-03-21 Thread roberto.tardio
Hi,

 

I am using Kylin 2.2. In the Kylin online docs I saw a RESTful API to get the
job list for a Kylin cube in a project. The syntax of this API is unclear in
the documentation, but I can deduce the following:

 

curl -L --request "GET" --user "ADMIN:K3LEN" http://ec2-342-233-999-25.ee-west-1.compute.amazonaws.com:7070/kylin/api/jobs/Cube_Name/Project/0/100/100/0

 

However, this command does not return any result. Could someone explain the
correct syntax?

 

I successfully tried the following command to get a single job's status by
providing the job id. However, I need to get the full job list and statuses.

 

curl -L --request "GET" --user "ADMIN:K3LEN" http://ec2-342-233-999-25.ee-west-1.compute.amazonaws.com:7070/kylin/api/jobs/2f9d3bc9-56fd-44f1-8fca-6a6257c34343
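For what it's worth, the jobs endpoint is described in Kylin's REST API docs with query parameters rather than path segments. A hedged sketch follows; the host, credentials and cube/project names are placeholders, and the parameter names (in particular timeFilter, where 4 is believed to mean "all time") should be verified against the docs for your version.

```shell
# Assumption: /kylin/api/jobs accepts cubeName, projectName, offset,
# limit and timeFilter as query parameters (timeFilter=4 = all time).
HOST="http://localhost:7070"
JOBS_URL="${HOST}/kylin/api/jobs?cubeName=Cube_Name&projectName=Project&offset=0&limit=100&timeFilter=4"
echo "$JOBS_URL"
# curl -L --user "ADMIN:PASSWORD" "$JOBS_URL"   # uncomment to run
```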

Best Regards!

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com 

 



YARN Queues and Kylin

2018-02-23 Thread roberto.tardio
Hi,

 

Is there any way to configure one cube so that it is built using a chosen
YARN queue?

 

The only way I see is to add the following property to
kylin_job_conf_inmem.xml and kylin_job_conf.xml, but this affects all cubes
and not just one, as I need.

 



<property>
  <name>mapreduce.job.queuename</name>
  <value>Kylin</value>
</property>
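A per-cube alternative worth checking on your Kylin version (an assumption, not something confirmed in this thread): the cube designer's "Configuration Overrides" accept Hadoop job properties prefixed with kylin.engine.mr.config-override., which then apply only to that cube's build jobs. For example (queue name is a placeholder):

```properties
# Set in the cube's Configuration Overrides, not in kylin_job_conf.xml
kylin.engine.mr.config-override.mapreduce.job.queuename=kylin_queue
```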



 

Thanks in advance,

Roberto Tardío Olmos

Senior Big Data & Business Intelligence Consultant

Avenida de Brasil, 17, Planta 16.28020 Madrid

Fijo: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com