[jira] [Commented] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete

2020-03-02 Thread Vsevolod Ostapenko (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17049369#comment-17049369
 ] 

Vsevolod Ostapenko commented on KYLIN-4341:
---

@[~wangrupeng] 

I have to respectfully disagree with the naive explanation that this is not a 
bug.
I did not request segment merging in the cube configuration; therefore, the 
files are expected to be removed.

Removing files manually is not a constructive proposition. Any manual 
management of intermediate files is an operational burden. Plus, it's not 
properly documented.
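For anyone stuck doing that cleanup by hand in the meantime, here is a rough sketch (my own, not an official Kylin tool) of picking the leftover paths out of a recursive listing of the job working directory; the /cuboid/ and /rowkey_stats/ patterns are taken from the sample listing in this issue, and the selected paths would then be handed to "hadoop fs -rm -r":

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only (not an official Kylin tool): pick the leftover
// intermediate outputs out of a recursive listing of the job working
// directory, so they can be handed to "hadoop fs -rm -r".
class LeftoverSelector {
    // The cuboid/ and rowkey_stats/ path segments match the leftover
    // MR outputs shown in the issue's sample listing.
    static List<String> leftovers(List<String> paths) {
        return paths.stream()
                .filter(p -> p.contains("/cuboid/") || p.contains("/rowkey_stats/"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "/user/kylin/job-x/cube_1/cuboid/level_1_cuboid/part-r-0",
                "/user/kylin/job-x/cube_1/rowkey_stats/part-r-0_hfile",
                "/user/kylin/job-x/cube_1/hfile/part-r-0");
        // Only the first two are leftovers; the hfile output must be kept.
        System.out.println(leftovers(listing));
    }
}
```

This is exactly the kind of path-pattern knowledge that should live in Kylin's own cleanup step rather than in user scripts.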

> by-level cuboid intermediate files are left behind and not cleaned up after 
> job is complete
> ---
>
> Key: KYLIN-4341
> URL: https://issues.apache.org/jira/browse/KYLIN-4341
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v2.6.4
> Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
>Reporter: Vsevolod Ostapenko
>Assignee: wangrupeng
>Priority: Major
>
> Setup: MR as a cube build engine and by-level cube build strategy (auto 
> picked).
> Upon completion of a cube segment build job, a number of intermediate files 
> are still left behind, namely the output of the MR jobs that produce the base 
> cuboid and the subsequent level cuboids, as well as rowkey_stats from the 
> hfile creation step.
> The files in question consume about the same amount of space in HDFS as the 
> final hfile.
> This leads to wasted space in HDFS that is not released for as long as the 
> corresponding cube segment is online. The only point at which the leaked 
> space is released is when the segment is taken offline and cleaned up as part 
> of segment retention.
> Sample output is as follows.
> {quote}$ hadoop fs -ls -R 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-2
> -rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-3
> -rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-4
> -rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-5
> -rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-6
> -rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-7
> drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
> -rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
> -rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-0
> -rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 
> /user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-1
> -rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 
> 

[jira] [Created] (KYLIN-4350) Pushdown improperly rewrites the query causing it to fail

2020-01-17 Thread Vsevolod Ostapenko (Jira)
Vsevolod Ostapenko created KYLIN-4350:
-

 Summary: Pushdown improperly rewrites the query causing it to fail
 Key: KYLIN-4350
 URL: https://issues.apache.org/jira/browse/KYLIN-4350
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.6.4
 Environment: HDP 2.6.5, Kylin 2.6.4, CentOS 7.6
Reporter: Vsevolod Ostapenko


A query that uses a WITH clause and is subject to pushdown to Hive (or Impala) 
for execution is incorrectly rewritten before being submitted to the execution 
engine. Table aliases are prefixed with the database name, which makes the 
query invalid.

Sample log excerpts are below:

 
{quote}2020-01-17 12:12:21,997 INFO [Query 
e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:404 : The 
original query: with
t as
(
SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID "ZETTICSDW_A_VL_HOURLY_V_IMSIID",
 ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID 
"ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID",
 count(*) cnt
FROM ZETTICSDW.A_VL_HOURLY_V
WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117')
 AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10')
 AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10')))
GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, 
ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID
)
select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID,
 count(*) "vl_aggs_model___CD_MEDIA_GAP_CALL_ID"
*from t*
group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID
ORDER BY "vl_aggs_model___CD_MEDIA_GAP_CALL_ID" desc
LIMIT 500



2020-01-17 12:12:22,073 INFO [Query e844b846-c589-4729-5a04-483f6d73c834-31163] 
adhocquery.AbstractPushdownRunner:37 : the query is converted to with
t as
(
SELECT ZETTICSDW.A_VL_HOURLY_V.IMSIID `ZETTICSDW_A_VL_HOURLY_V_IMSIID`,
 ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID 
`ZETTICSDW_A_VL_HOURLY_V_MEDIA_GAP_CALL_ID`,
 count(*) cnt
FROM ZETTICSDW.A_VL_HOURLY_V
WHERE ((ZETTICSDW.A_VL_HOURLY_V.THEDATE = '20200117')
 AND ((ZETTICSDW.A_VL_HOURLY_V.THEHOUR >= '10')
 AND (ZETTICSDW.A_VL_HOURLY_V.THEHOUR <= '10')))
GROUP BY ZETTICSDW.A_VL_HOURLY_V.IMSIID, 
ZETTICSDW.A_VL_HOURLY_V.MEDIA_GAP_CALL_ID
)
select t.ZETTICSDW_A_VL_HOURLY_V_IMSIID,
 count(*) `vl_aggs_model___CD_MEDIA_GAP_CALL_ID`
*{color:#FF}from ZETTICSDW.t{color}*
group by t.ZETTICSDW_A_VL_HOURLY_V_IMSIID
ORDER BY `vl_aggs_model___CD_MEDIA_GAP_CALL_ID` desc
LIMIT 500 after applying converter 
org.apache.kylin.source.adhocquery.HivePushDownConverter
2020-01-17 12:12:22,108 ERROR [Query 
e844b846-c589-4729-5a04-483f6d73c834-31163] service.QueryService:989 : pushdown 
engine failed current query too
org.apache.hive.service.cli.HiveSQLException: AnalysisException: Could not 
resolve table reference: '*zetticsdw.t*'
{quote}
The pushdown query should be submitted to the query engine exactly as written 
by the user. As a best effort, the Kylin pushdown executor should issue a 
"use <database>" statement over the same JDBC connection right before 
submitting the query.
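A minimal sketch of that best-effort behavior (the class and method names here are mine, not actual Kylin APIs): select the database on the same JDBC session, then submit the user's query untouched.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the suggested behavior; PushdownUseSketch and its
// method names are illustrative, not actual Kylin APIs.
class PushdownUseSketch {
    // Build the USE statement for the target schema; backticks guard
    // identifiers in Hive/Impala.
    static String useStatement(String schema) {
        return "USE `" + schema + "`";
    }

    // Issue USE and then the user's query on the SAME JDBC connection, so
    // unqualified table references resolve in the right database without
    // rewriting the query text at all.
    static void submit(Connection conn, String schema, String originalQuery)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(useStatement(schema));
            st.execute(originalQuery);
        }
    }
}
```

Because both statements run on one connection, the session's current database carries over from the USE to the query, which is the whole point of avoiding the rewrite.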



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KYLIN-4341) by-level cuboid intermediate files are left behind and not cleaned up after job is complete

2020-01-15 Thread Vsevolod Ostapenko (Jira)
Vsevolod Ostapenko created KYLIN-4341:
-

 Summary: by-level cuboid intermediate files are left behind and 
not cleaned up after job is complete
 Key: KYLIN-4341
 URL: https://issues.apache.org/jira/browse/KYLIN-4341
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v2.6.4
 Environment: Kylin 2.6.4, CentOS 7.6, HDP 2.6.5
Reporter: Vsevolod Ostapenko


Setup: MR as a cube build engine and by-level cube build strategy (auto picked).
Upon completion of a cube segment build job, a number of intermediate files are 
still left behind, namely the output of the MR jobs that produce the base 
cuboid and the subsequent level cuboids, as well as rowkey_stats from the hfile 
creation step.
The files in question consume about the same amount of space in HDFS as the 
final hfile.
This leads to wasted space in HDFS that is not released for as long as the 
corresponding cube segment is online. The only point at which the leaked space 
is released is when the segment is taken offline and cleaned up as part of 
segment retention.

Sample output is as follows.
{quote}$ hadoop fs -ls -R 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:44 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:26 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 51570048 2020-01-07 04:35 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-0
-rw-r--r-- 2 kylin hdfs 51477377 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-1
-rw-r--r-- 2 kylin hdfs 51615162 2020-01-07 04:35 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-2
-rw-r--r-- 2 kylin hdfs 51591031 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-3
-rw-r--r-- 2 kylin hdfs 51648914 2020-01-07 04:35 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-4
-rw-r--r-- 2 kylin hdfs 51532761 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-5
-rw-r--r-- 2 kylin hdfs 51455652 2020-01-07 04:35 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-6
-rw-r--r-- 2 kylin hdfs 51552752 2020-01-07 04:36 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_1_cuboid/part-r-7
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid
-rw-r--r-- 2 kylin hdfs 0 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/_SUCCESS
-rw-r--r-- 2 kylin hdfs 16293012 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-0
-rw-r--r-- 2 kylin hdfs 16283730 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-1
-rw-r--r-- 2 kylin hdfs 16288965 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-2
-rw-r--r-- 2 kylin hdfs 16270572 2020-01-07 04:25 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/cuboid/level_base_cuboid/part-r-3
drwxr-xr-x - kylin hdfs 0 2020-01-07 04:23 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats
-rw-r--r-- 3 kylin hdfs 155 2020-01-07 04:23 
/user/kylin/26x/test_kylin_26x_metadata/kylin-d2cff21e-9f59-6ac8-df34-656a3479433f/vl_aggs_cube_89/rowkey_stats/part-r-0_hfile
{quote}
 

Removing the job metadata (using metastore.sh clean --jobThreshold Ndays) does 
not help: information about the job is removed, but the leftover files in HDFS 
are not.

[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-10-04 Thread Vsevolod Ostapenko (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944773#comment-16944773
 ] 

Vsevolod Ostapenko commented on KYLIN-3628:
---

On the subject of checking whether the lookup table is snapshotted as part of 
the cube: the CubeDesc class already has a method findDimensionByTable(String 
lookupTableName). So the CubeManager.checkContainsSnapshotTable() method can be 
replaced with one call to findDimensionByTable (which will reduce code 
duplication and fix the regression introduced by the prior version of the fix). 
Still, the try/catch block in findLatestSnapshot() needs to be removed.
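The suggested simplification, sketched against a minimal stand-in class (the real CubeDesc is far richer; only the findDimensionByTable name is taken from the actual API mentioned above):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Minimal stand-in for CubeDesc, only to illustrate the suggested
// simplification; the real Kylin class differs, and only the
// findDimensionByTable name is taken from the actual API.
class CubeDescStub {
    private final Map<String, Object> dimensionsByTable = new HashMap<>();

    void addDimension(String lookupTableName, Object dimension) {
        dimensionsByTable.put(lookupTableName.toUpperCase(Locale.ROOT), dimension);
    }

    Object findDimensionByTable(String lookupTableName) {
        return dimensionsByTable.get(lookupTableName.toUpperCase(Locale.ROOT));
    }

    // Proposed replacement for CubeManager.checkContainsSnapshotTable():
    // a cube snapshots a lookup table iff one of its dimensions comes from
    // that table, so a single null check suffices.
    boolean containsSnapshotTable(String lookupTableName) {
        return findDimensionByTable(lookupTableName) != null;
    }
}
```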

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.





[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-09-26 Thread Vsevolod Ostapenko (Jira)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reopened KYLIN-3628:
---

The most recent change will silently swallow legitimate exceptions in 
findLatestSnapshot and may result in an incorrect cube instance being returned, 
effectively re-introducing the original bug for cases where an exception 
occurs.

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.





[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-09-26 Thread Vsevolod Ostapenko (Jira)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938797#comment-16938797
 ] 

Vsevolod Ostapenko commented on KYLIN-3628:
---

[~hit_lacus]

The latest change effectively reintroduces the issue that the prior series of 
changes was supposed to fix.
I don't see a good reason for wrapping this in try/catch, silently swallowing 
_any_ exception, and then returning the current cube even when it does not 
contain the lookup table snapshot.

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v3.0.0-alpha2, v2.6.4
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.





[jira] [Commented] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error

2019-07-29 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895586#comment-16895586
 ] 

Vsevolod Ostapenko commented on KYLIN-4107:
---

[~codingforfun]

I see that in your code fix you are batching the Hive drop table commands so 
that the bash command line does not become too long. That should work; still, 
instead of using a hard-coded value of 20 drop statements per batch, it would 
be a bit cleaner to add a config parameter with 20 as the default value.
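The batching plus a configurable size could look roughly like this (the property key below is invented for illustration; Kylin's real configuration mechanism differs):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggestion: batch DROP TABLE statements with a configurable
// batch size. The property key below is invented for illustration; Kylin's
// real configuration mechanism differs.
class DropTableBatcher {
    static final int DEFAULT_BATCH_SIZE = 20;

    static int configuredBatchSize() {
        // Hypothetical key, read from a system property for this sketch.
        return Integer.getInteger("kylin.storage.clean.hive-drop-batch-size",
                DEFAULT_BATCH_SIZE);
    }

    // Split the drops into fixed-size batches so that no single shell
    // invocation exceeds the kernel's argument-length limit.
    static List<String> toBatches(List<String> tables, int batchSize) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int inBatch = 0;
        for (String table : tables) {
            current.append("DROP TABLE IF EXISTS ").append(table).append("; ");
            if (++inBatch == batchSize) {
                batches.add(current.toString().trim());
                current.setLength(0);
                inBatch = 0;
            }
        }
        if (current.length() > 0) {
            batches.add(current.toString().trim()); // final partial batch
        }
        return batches;
    }
}
```

Each returned batch is short enough to pass to a single shell or beeline invocation, which sidesteps the "Argument list too long" failure regardless of how many tables accumulated.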

My 2 cents,
Vsevolod.

> StorageCleanupJob fails to delete Hive tables with "Argument list too long" 
> error
> -
>
> Key: KYLIN-4107
> URL: https://issues.apache.org/jira/browse/KYLIN-4107
> Project: Kylin
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: v2.6.2
> Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3
>Reporter: Vsevolod Ostapenko
>Assignee: weibin0516
>Priority: Major
> Fix For: v3.0.0-beta
>
>
> On a system with multiple Kylin developers who experiment with cube design 
> and frequently (re)build and drop cube segments, intermediate Hive tables and 
> leftover HBase tables accumulate very quickly.
> After a certain point, storage cleanup cannot be executed using the suggested 
> method:
> {{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete 
> true}}
> Apparently, the storage cleanup job creates a single shell command to drop 
> all Hive tables, which fails to execute because the command line is simply 
> too long. For example:
> {quote}
> 2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during 
> deleting Hive tables
> java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list 
> too long
>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>  at 
> org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133)
>  at 
> org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89)
>  at 
> org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83)
>  at 
> org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409)
>  at 
> org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375)
>  at 
> org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278)
>  at 
> org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151)
>  at 
> org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145)
>  at 
> org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
>  at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27)
> Caused by: java.io.IOException: error=7, Argument list too long
>  at java.lang.UNIXProcess.forkAndExec(Native Method)
>  at java.lang.UNIXProcess.(UNIXProcess.java:247)
>  at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>  at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>  ... 10 more 
> {quote}
> Instead of composing one long command, the storage cleanup job needs to 
> generate a script and feed it into the beeline or hive CLI.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KYLIN-4107) StorageCleanupJob fails to delete Hive tables with "Argument list too long" error

2019-07-23 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-4107:
-

 Summary: StorageCleanupJob fails to delete Hive tables with 
"Argument list too long" error
 Key: KYLIN-4107
 URL: https://issues.apache.org/jira/browse/KYLIN-4107
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v2.6.2
 Environment: CentOS 7.6, HDP 2.6.5, Kylin 2.6.3
Reporter: Vsevolod Ostapenko


On a system with multiple Kylin developers who experiment with cube design and 
frequently (re)build and drop cube segments, intermediate Hive tables and 
leftover HBase tables accumulate very quickly.

After a certain point, storage cleanup cannot be executed using the suggested 
method:
{{${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.StorageCleanupJob --delete 
true}}

Apparently, the storage cleanup job creates a single shell command to drop all 
Hive tables, which fails to execute because the command line is simply too 
long. For example:
{quote}
2019-07-23 17:47:31,611 ERROR [main] job.StorageCleanupJob:377 : Error during 
deleting Hive tables
java.io.IOException: Cannot run program "/bin/bash": error=7, Argument list too 
long
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
 at 
org.apache.kylin.common.util.CliCommandExecutor.runNativeCommand(CliCommandExecutor.java:133)
 at 
org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:89)
 at 
org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:83)
 at 
org.apache.kylin.rest.job.StorageCleanupJob.deleteHiveTables(StorageCleanupJob.java:409)
 at 
org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTableInternal(StorageCleanupJob.java:375)
 at 
org.apache.kylin.rest.job.StorageCleanupJob.cleanUnusedIntermediateHiveTable(StorageCleanupJob.java:278)
 at 
org.apache.kylin.rest.job.StorageCleanupJob.cleanup(StorageCleanupJob.java:151)
 at 
org.apache.kylin.rest.job.StorageCleanupJob.execute(StorageCleanupJob.java:145)
 at 
org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
 at org.apache.kylin.tool.StorageCleanupJob.main(StorageCleanupJob.java:27)
Caused by: java.io.IOException: error=7, Argument list too long
 at java.lang.UNIXProcess.forkAndExec(Native Method)
 at java.lang.UNIXProcess.(UNIXProcess.java:247)
 at java.lang.ProcessImpl.start(ProcessImpl.java:134)
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
 ... 10 more 
{quote}
Instead of composing one long command, the storage cleanup job needs to 
generate a script and feed it into the beeline or hive CLI.





[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-07-02 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877286#comment-16877286
 ] 

Vsevolod Ostapenko commented on KYLIN-3628:
---

[~hit_lacus], I built a local 2.6.2 with your changes applied on top and the 
"derived" column check removed.
It seems to work as intended.

One suggestion though: the debug message about overriding the cube selection in 
findLatestSnapshot() should be logged only when the cube selection was actually 
overridden. For example:


{code:java}
if (!cube.equals(cubeInstance)) {
  logger.debug("Picked cube {} over {} as it provides a more recent snapshot of 
the lookup table {}", cube, cubeInstance, lookupTableName);
}
{code}

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-07-02 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876282#comment-16876282
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 7/2/19 7:50 PM:
---

Hi XiaoXiang, thank you for a quick turnaround with a fix.
 I looked at the code change, and the following check seems too restrictive: 
{code:java}
if (dimensionDesc.isDerived() && 
dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is snapshotted as part of a cube 
if any dimension (derived or otherwise) is supplied by that table, so the 
dimension doesn't have to be derived.
In fact, my test cubes have no derived dimensions at all (the "derived" 
properties for all the lookup tables are set to null), yet that does not 
prevent executing "select * from lookupTable" against such a cube on a 
non-patched 2.6.2 system.

To summarize, I believe the "dimensionDesc.isDerived()" call should be removed 
from the expression above.

 

Edit: I stand corrected; for the lookup table, only the columns that are normal 
or derived dimensions are snapshotted (not the entire table). Still, that 
doesn't change the stance that the "derived" check is not needed in the 
expression above.


was (Author: seva_ostapenko):
Hi XiaoXiang, thank you for a quick turnaround with a fix.
I looked at the code change, and the following check seems too restrictive: 
{code:java}
if (dimensionDesc.isDerived() && 
dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is snapshotted as part of a cube 
if any dimension (derived or otherwise) is supplied by that table, so the 
dimension doesn't have to be derived.
In fact, my test cubes have no derived dimensions at all (the "derived" 
properties for all the lookup tables are set to null), yet that does not 
prevent executing "select * from lookupTable" against such a cube on a 
non-patched 2.6.2 system.

To summarize, I believe the "dimensionDesc.isDerived()" call should be removed 
from the expression above.

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.





[jira] [Commented] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-07-01 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876282#comment-16876282
 ] 

Vsevolod Ostapenko commented on KYLIN-3628:
---

Hi XiaoXiang, thank you for a quick turnaround with a fix.
I looked at the code change, and the following check seems too restrictive: 
{code:java}
if (dimensionDesc.isDerived() && 
dimensionDesc.getTable().equalsIgnoreCase(lookupTbl)) {
{code}
Per my internal tests, the entire lookup table is snapshotted as part of a cube 
if any dimension (derived or otherwise) is supplied by that table, so the 
dimension doesn't have to be derived.
In fact, my test cubes have no derived dimensions at all (the "derived" 
properties for all the lookup tables are set to null), yet that does not 
prevent executing "select * from lookupTable" against such a cube on a 
non-patched 2.6.2 system.

To summarize, I believe the "dimensionDesc.isDerived()" call should be removed 
from the expression above.
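A tiny self-contained illustration of the check with the isDerived() guard dropped (DimensionDescStub is a stand-in, not the real Kylin DimensionDesc):

```java
// Tiny self-contained illustration of the check with the isDerived() guard
// removed; DimensionDescStub is a stand-in, not the real Kylin DimensionDesc.
class DimensionDescStub {
    private final String table;
    private final boolean derived;

    DimensionDescStub(String table, boolean derived) {
        this.table = table;
        this.derived = derived;
    }

    String getTable() { return table; }
    boolean isDerived() { return derived; }

    // Suggested form: a dimension supplied by the lookup table implies a
    // snapshot, whether the dimension is derived or not.
    static boolean suppliesSnapshot(DimensionDescStub d, String lookupTbl) {
        return d.getTable().equalsIgnoreCase(lookupTbl);
    }
}
```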

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin randomly selects a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly 
> built, some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot or always use the earliest snapshot. We believe the "latest" version 
> is better.





[jira] [Comment Edited] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874997#comment-16874997
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3628 at 6/28/19 7:32 PM:


This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 
does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overrides the cube choice 
correctly made earlier by RealizationChooser. The bug is that 
LookupTableEnumerator looks for the latest snapshot across all the realizations 
of all the cubes in the model, instead of using the one that was already 
correctly chosen. That leads to random behavior and unexpected failures.

To address both the original issue (where the lookup table is snapshotted in 
multiple cubes and a suitable cube is picked without considering segment build 
times) and the regression introduced by the change, 
CubeManager.findLatestSnapshot needs to check whether the lookup table is 
actually snapshotted as part of the cube realization. That way, when there is a 
mix of cubes that capture the lookup table and cubes that don't, only the ones 
that do are ranked by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel().isLookupTable() operates at the model level, across all the cubes, 
while the check needs to be scoped to the current cube only.
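The difference can be illustrated with minimal stand-ins mirroring the C1/C2 example above (real Kylin types are far richer; this only contrasts the model-wide check with a cube-scoped one):

```java
import java.util.HashSet;
import java.util.Set;

// Stand-ins mirroring the C1/C2 example above, only to contrast the
// model-wide check with a cube-scoped one; real Kylin types differ.
class ModelStub {
    final Set<String> lookupTables = new HashSet<>(); // every lookup in the model

    boolean isLookupTable(String table) {
        return lookupTables.contains(table); // answers for the whole model
    }
}

class CubeStub {
    final ModelStub model;
    final Set<String> snapshottedLookups = new HashSet<>(); // what THIS cube snapshots

    CubeStub(ModelStub model) {
        this.model = model;
    }

    // Buggy shape: true for any cube of the model, even one without the snapshot.
    boolean modelLevelCheck(String table) {
        return model.isLookupTable(table);
    }

    // Suggested shape: scoped to the current cube only.
    boolean cubeLevelCheck(String table) {
        return snapshottedLookups.contains(table);
    }
}
```

With these stubs, a cube that never snapshotted L1 still passes the model-level check, which is exactly how the wrong cube ends up answering the query.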

 


was (Author: seva_ostapenko):
This code change introduces a nasty bug, where Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 
does not. C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overrides the cube choice 
correctly made earlier by RealizationChooser. The bug is that 
LookupTableEnumerator looks for the latest snapshot across all the realizations 
of all the cubes in the model, instead of using the one that was already 
correctly chosen. That leads to random behavior and unexpected failures.

To address both the original issue (where the lookup table is snapshotted in 
multiple cubes and a suitable cube is picked without considering segment build 
times) and the regression introduced by the change, 
CubeManager.findLatestSnapshot needs to check whether the lookup table is 
actually snapshotted as part of the cube realization. That way, when there is a 
mix of cubes that capture the lookup table and cubes that don't, only the ones 
that do are ranked by build time.

The affected file is CubeManager.java. The bug is in this check:
{code:java}
if (realization.getModel().isLookupTable(lookupTableName)) {
{code}
getModel() operates across all cubes, while the check needs to be scoped to the 
current cube only.

 

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin will randomly select a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly built, 
> some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot, or always use the earliest snapshot. We believe the "latest" version is better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KYLIN-3628) Query with lookup table always use latest snapshot

2019-06-28 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reopened KYLIN-3628:
---

This code change introduces a nasty bug: Kylin will pick a random cube to 
answer a query that goes against a lookup table.
For example, model X has two cubes, C1 and C2. C1 uses lookup table L1 and C2 
does not, and C2 has more recently built segments than C1.
A query "select * from L1" will fail with an error stating that C2 does not 
contain L1.

Code analysis indicates that LookupTableEnumerator overwrites the prior cube 
choice correctly made by RealizationChooser. The bug is that 
LookupTableEnumerator finds the latest snapshot across all the realizations of 
all the cubes in the model, rather than using the one that was already 
correctly chosen.
That leads to random behavior and unexpected failures.

Affected code is in LookupTableEnumerator.java.
{code:java}
if (olapContext.realization instanceof CubeInstance) {
    cube = (CubeInstance) olapContext.realization;
    ProjectInstance project = cube.getProjectInstance();
    List<RealizationEntry> realizationEntries = project.getRealizationEntries();
    String lookupTableName = olapContext.firstTableScan.getTableName();
    CubeManager cubeMgr = CubeManager.getInstance(cube.getConfig());
    cube = cubeMgr.findLatestSnapshot(realizationEntries, lookupTableName);
    olapContext.realization = cube;
}
{code}
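One possible remedy can be sketched in simplified form (a hedged illustration only: `Cube` and `chooseRealization` are hypothetical stand-ins, not Kylin classes): respect the cube already chosen by RealizationChooser and only fall back to snapshot ranking when that cube lacks the lookup table.

```java
import java.util.*;

public class LookupOverwriteGuardSketch {
    // Minimal stand-in for a cube: name, snapshotted lookup tables, build time.
    record Cube(String name, Set<String> snapshots, long buildTime) {}

    // Illustrative guard: keep the cube RealizationChooser already picked if it
    // snapshots the lookup table; otherwise pick the latest cube that does.
    static Cube chooseRealization(Cube chosen, List<Cube> candidates, String lookup) {
        if (chosen.snapshots().contains(lookup)) {
            return chosen;
        }
        return candidates.stream()
                .filter(c -> c.snapshots().contains(lookup))
                .max(Comparator.comparingLong(Cube::buildTime))
                .orElseThrow(() -> new IllegalStateException(
                        "no cube snapshots lookup table " + lookup));
    }

    public static void main(String[] args) {
        Cube c1 = new Cube("C1", Set.of("L1"), 100L);
        Cube c2 = new Cube("C2", Set.of(), 200L);
        // C1 was chosen and has L1, so the newer-but-snapshotless C2 must not win.
        System.out.println(chooseRealization(c1, List.of(c1, c2), "L1").name()); // prints C1
    }
}
```

The guard makes the overwrite a no-op in the common case, which is exactly the scenario the regression breaks today.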

> Query with lookup table always use latest snapshot
> --
>
> Key: KYLIN-3628
> URL: https://issues.apache.org/jira/browse/KYLIN-3628
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Na Zhai
>Assignee: Na Zhai
>Priority: Major
> Fix For: v2.6.0
>
>
> If a user queries a lookup table, Kylin will randomly select a Cube (which has 
> the snapshot of this lookup table) to answer it. This causes uncertainty when 
> there are multiple cubes (sharing the same lookup): some cubes are newly built, 
> some not. If Kylin picks an old cube, the query result is old.
> To remove this uncertainty, for such queries, either always use the latest 
> snapshot, or always use the earliest snapshot. We believe the "latest" version is better.





[jira] [Commented] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end

2019-04-23 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824564#comment-16824564
 ] 

Vsevolod Ostapenko commented on KYLIN-3842:
---

I created a patch (attached) that should address both the original concern and 
the regression introduced by the prior bug fix attempt.
Please review, and either comment or approve.

> kylinProperties.js Unable to get the public configuration of the first line 
> in the front end
> 
>
> Key: KYLIN-3842
> URL: https://issues.apache.org/jira/browse/KYLIN-3842
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.5.2
>Reporter: Yuzhang QIU
>Assignee: Yuzhang QIU
>Priority: Minor
> Fix For: v2.6.2
>
> Attachments: KYLIN-3842.master.001.patch
>
>
> Hi dear team:
>   I'm developing an OLAP platform based on Kylin 2.5.2. During my work, I found 
> that kylinProperties.js:37 (getProperty(name)) can't get the property on the 
> first line of '_config', which is initialized through /admin/public_config. 
>   For example, the public config is 
> 'kylin.restclient.connection.default-max-per-route=20\nkylin.restclient.connection.max-total=200\nkylin.engine.default=2\nkylin.storage.default=2\nkylin.web.hive-limit=20\nkylin.web.help.length=4\n'. I expected to get 20 
> but got '' when looking up the key 
> 'kylin.restclient.connection.default-max-per-route'. This problem is caused by 
> 'var keyIndex = _config.indexOf('\n' + name + '=');' (at kylinProperties.js:37) 
> returning -1 for names that are not preceded by a '\n' (i.e. the one on the 
> first line).
>   Then I debugged AdminService.java and KylinConfig.java and found that 
> KylinConfig.java:517 (around this line, in method 
> exportToString(Collection<String> propertyKeys)) builds the public config 
> string with a '\n' after each property, which causes the first property 
> to have no '\n' before it.
>   That is what I found; it will cause problems for developers.
>   What do you think? 
> Best regards,
>  yuzhang





[jira] [Updated] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end

2019-04-23 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3842:
--
Attachment: KYLIN-3842.master.001.patch

> kylinProperties.js Unable to get the public configuration of the first line 
> in the front end
> 
>
> Key: KYLIN-3842
> URL: https://issues.apache.org/jira/browse/KYLIN-3842
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.5.2
>Reporter: Yuzhang QIU
>Assignee: Yuzhang QIU
>Priority: Minor
> Fix For: v2.6.2
>
> Attachments: KYLIN-3842.master.001.patch
>
>
> Hi dear team:
>   I'm developing an OLAP platform based on Kylin 2.5.2. During my work, I found 
> that kylinProperties.js:37 (getProperty(name)) can't get the property on the 
> first line of '_config', which is initialized through /admin/public_config. 
>   For example, the public config is 
> 'kylin.restclient.connection.default-max-per-route=20\nkylin.restclient.connection.max-total=200\nkylin.engine.default=2\nkylin.storage.default=2\nkylin.web.hive-limit=20\nkylin.web.help.length=4\n'. I expected to get 20 
> but got '' when looking up the key 
> 'kylin.restclient.connection.default-max-per-route'. This problem is caused by 
> 'var keyIndex = _config.indexOf('\n' + name + '=');' (at kylinProperties.js:37) 
> returning -1 for names that are not preceded by a '\n' (i.e. the one on the 
> first line).
>   Then I debugged AdminService.java and KylinConfig.java and found that 
> KylinConfig.java:517 (around this line, in method 
> exportToString(Collection<String> propertyKeys)) builds the public config 
> string with a '\n' after each property, which causes the first property 
> to have no '\n' before it.
>   That is what I found; it will cause problems for developers.
>   What do you think? 
> Best regards,
>  yuzhang





[jira] [Reopened] (KYLIN-3842) kylinProperties.js Unable to get the public configuration of the first line in the front end

2019-04-09 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reopened KYLIN-3842:
---

_config is one long string that is searched using a simple indexOf (instead of 
a regex).
The recent change introduces a regression where partial matches will be falsely 
picked up.

For example, while searching for property XYZ in the following case, the wrong 
property assignment will be picked up:
{quote}{{# XYZ=foo}}
abcXYZ=bar
XYZ=expected_value{quote}
A trivial fix for the issue with the very first property (in a property file 
that doesn't start with a comment) is to prepend "\n" to _config upon 
initialization, if the first character of _config is not "\n".

 

> kylinProperties.js Unable to get the public configuration of the first line 
> in the front end
> 
>
> Key: KYLIN-3842
> URL: https://issues.apache.org/jira/browse/KYLIN-3842
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.5.2
>Reporter: Yuzhang QIU
>Assignee: Yuzhang QIU
>Priority: Minor
> Fix For: v2.6.2
>
>
> Hi dear team:
>   I'm developing an OLAP platform based on Kylin 2.5.2. During my work, I found 
> that kylinProperties.js:37 (getProperty(name)) can't get the property on the 
> first line of '_config', which is initialized through /admin/public_config. 
>   For example, the public config is 
> 'kylin.restclient.connection.default-max-per-route=20\nkylin.restclient.connection.max-total=200\nkylin.engine.default=2\nkylin.storage.default=2\nkylin.web.hive-limit=20\nkylin.web.help.length=4\n'. I expected to get 20 
> but got '' when looking up the key 
> 'kylin.restclient.connection.default-max-per-route'. This problem is caused by 
> 'var keyIndex = _config.indexOf('\n' + name + '=');' (at kylinProperties.js:37) 
> returning -1 for names that are not preceded by a '\n' (i.e. the one on the 
> first line).
>   Then I debugged AdminService.java and KylinConfig.java and found that 
> KylinConfig.java:517 (around this line, in method 
> exportToString(Collection<String> propertyKeys)) builds the public config 
> string with a '\n' after each property, which causes the first property 
> to have no '\n' before it.
>   That is what I found; it will cause problems for developers.
>   What do you think? 
> Best regards,
>  yuzhang





[jira] [Commented] (KYLIN-3322) TopN requires a SUM to work

2019-02-22 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775591#comment-16775591
 ] 

Vsevolod Ostapenko commented on KYLIN-3322:
---

It's also reported as KYLIN-3687. Having the documentation updated is good, but 
it's better to prevent creation of incomplete TopN cube definitions through the 
UI and via API calls.

> TopN requires a SUM to work
> ---
>
> Key: KYLIN-3322
> URL: https://issues.apache.org/jira/browse/KYLIN-3322
> Project: Kylin
>  Issue Type: Bug
>  Components: Measure - TopN
>Reporter: liyang
>Assignee: Na Zhai
>Priority: Major
>
> Currently if user creates a measure of TopN seller by sum of price, it is 
> required that user also creates a measure of SUM(price). Otherwise, NPE will 
> be thrown at query time.





[jira] [Commented] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-12-04 Thread Vsevolod Ostapenko (JIRA)


[ 
https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709162#comment-16709162
 ] 

Vsevolod Ostapenko commented on KYLIN-3686:
---

Hi Chao, 
the "kylin.storage.default" parameter is not set in kylin.properties in our 
environment, so it defaults to ID_HBASE, as I understand.
As far as I can see, the fix for KYLIN-3636 only changes the cube default to 
ID_SHARDED_HBASE.
However, it does not address the misalignment and lack of safety checks between 
the cube storage type and the implied prerequisites of the Top_N metric. It is 
still possible to load a cube with ID_HBASE from JSON, define a Top_N metric, 
and get a failing cube build with no clear explanation of the failure reason.

Thanks,
Vsevolod.

> Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the 
> Web UI defaults to ID_HBASE and provides no safeguards against storage type 
> mismatch
> -
>
> Key: KYLIN-3686
> URL: https://issues.apache.org/jira/browse/KYLIN-3686
> Project: Kylin
>  Issue Type: Improvement
>  Components: Measure - TopN, Metadata, Web 
>Affects Versions: v2.5.0
> Environment: HDP 2.5.6, Kylin 2.5
>Reporter: Vsevolod Ostapenko
>Assignee: Shaofeng SHI
>Priority: Major
> Fix For: v2.6.0
>
>
> When new cube is defined via Kylin 2.5 UI, the default cube storage type is 
> set to 0 (ID_HBASE).
>  Top_N metric support is currently hard coded to expect cube storage type 2 
> (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
> "sharded HBASE".
>  UI provides no safeguards either to prevent a user from defining a cube with 
> Top_N metric that would blow up on the cube building stage with a perplexing 
> stack trace like the following:
> {quote}2018-10-22 16:15:50,388 ERROR [main] 
> org.apache.kylin.engine.mr.KylinMapper:
>  java.lang.ArrayIndexOutOfBoundsException
>  at java.lang.System.arraycopy(Native Method)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
>  at 
> org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
>  at 
> org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
>  at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> {quote}
> Please, either:
> – modify the Top_N code to support all cube storage types (not only 
> ID_SHARDED_HBASE),
>  or 
>  – modify the Top_N code to perform an explicit check of the cube storage type 
> and raise a descriptive exception when the cube storage is not the expected 
> one. Plus, update the UI to prevent the user from creating cube definitions 
> that are incompatible with the storage type required by the Top_N measure.
> PS: NDCuboidBuilder.java contains the following line:
> {quote}int offset = RowConstants.ROWKEY_SHARDID_LEN + 
> RowConstants.ROWKEY_CUBOIDID_LEN; // skip shard and cuboidId{quote}
> If the cube storage type is not ID_SHARDED_HBASE, the offset is calculated 
> incorrectly, which leads to an ArrayIndexOutOfBounds exception.





[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-11-13 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3686:
--
Description: 
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
 java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
Please, either:

– modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE),
 or 
 – modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the cube storage is not the expected 
one. Plus, update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.

PS: NDCuboidBuilder.java contains the following line:
{quote}int offset = RowConstants.ROWKEY_SHARDID_LEN + 
RowConstants.ROWKEY_CUBOIDID_LEN; // skip shard and cuboidId{quote}
If the cube storage type is not ID_SHARDED_HBASE, the offset is calculated 
incorrectly, which leads to an ArrayIndexOutOfBounds exception.
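The suggested fail-fast check can be sketched as follows (a hedged illustration: the constant values are assumptions based on this report, and the real definitions live in Kylin's storage and rowkey classes, which may differ):

```java
public class TopNStorageGuardSketch {
    // Values assumed for illustration; not verified against the Kylin source.
    static final int ID_HBASE = 0;
    static final int ID_SHARDED_HBASE = 2;
    static final int ROWKEY_SHARDID_LEN = 2;   // assumed: shard id stored as a short
    static final int ROWKEY_CUBOIDID_LEN = 8;  // assumed: cuboid id stored as a long

    // Proposed explicit check: raise a descriptive exception up front instead of
    // letting the offset math overrun the rowkey buffer deep inside a mapper.
    static int headerOffset(int storageType) {
        if (storageType != ID_SHARDED_HBASE) {
            throw new IllegalStateException("Top_N requires storage type "
                    + ID_SHARDED_HBASE + " (ID_SHARDED_HBASE), got " + storageType);
        }
        return ROWKEY_SHARDID_LEN + ROWKEY_CUBOIDID_LEN; // skip shard and cuboidId
    }

    public static void main(String[] args) {
        System.out.println(headerOffset(ID_SHARDED_HBASE)); // prints 10
    }
}
```

A cube stored as ID_HBASE would then fail with a readable message at build time rather than with an ArrayIndexOutOfBoundsException from System.arraycopy.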

  was:
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
 java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
Please, either:

-- modify Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE),
 or 
-- modify Top_N code to perform explicit check for cube storage type and raise 
descriptive exception, when cube storage is not the one that is expected. Plus 
update the UI to prevent the user from creating cube definitions that are 
incompatible with the storage type compatible with Top_N measure


> Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the 
> Web UI defaults to ID_HBASE and provides no safeguards against storage type 
> mismatch
> -
>
> Key: KYLIN-3686
> URL: https://issues.apache.org/jira/browse/KYLIN-3686
> Project: Kylin
>  Issue Type: Improvement
>  Components: Measure - TopN, Metadata, Web 
>Affects Versions: v2.5.0
> Environment: HDP 2.5.6, Kylin 2.5
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> When new cube is defined via Kylin 2.5 UI, the default cube storage type is 
> set to 0 (ID_HBASE).
>  Top_N metric support is currently hard coded to expect cube storage type 2 
> (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
> "sharded HBASE".
>  UI provides no safeguards either to prevent a user from defining a cube with 
> Top_N metric that would blow up on the cube building stage with a perplexing 
> stack trace like the following:
> {quote}2018-10-22 16:15:50,388 ERROR [main] 
> org.apache.kylin.engine.mr.KylinMapper:
>  java.lang.ArrayIndexOutOfBoundsException
>  at java.lang.System.arraycopy(Native Method)
>  at 
> 

[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-11-13 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3686:
--
Description: 
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
 java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
Please, either:

-- modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
-- modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.

  was:
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
 java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
 

Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.


> Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the 
> Web UI defaults to ID_HBASE and provides no safeguards against storage type 
> mismatch
> -
>
> Key: KYLIN-3686
> URL: https://issues.apache.org/jira/browse/KYLIN-3686
> Project: Kylin
>  Issue Type: Improvement
>  Components: Measure - TopN, Metadata, Web 
>Affects Versions: v2.5.0
> Environment: HDP 2.5.6, Kylin 2.5
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> When new cube is defined via Kylin 2.5 UI, the default cube storage type is 
> set to 0 (ID_HBASE).
>  Top_N metric support is currently hard coded to expect cube storage type 2 
> (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
> "sharded HBASE".
>  UI provides no safeguards either to prevent a user from defining a cube with 
> Top_N metric that would blow up on the cube building stage with a perplexing 
> stack trace like the following:
> {quote}2018-10-22 16:15:50,388 ERROR [main] 
> org.apache.kylin.engine.mr.KylinMapper:
>  java.lang.ArrayIndexOutOfBoundsException
>  at java.lang.System.arraycopy(Native Method)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
>  at 
> org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
>  at 
> 

[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-11-13 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3686:
--
Description: 
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
 java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
 

Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.

  was:
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.


> Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the 
> Web UI defaults to ID_HBASE and provides no safeguards against storage type 
> mismatch
> -
>
> Key: KYLIN-3686
> URL: https://issues.apache.org/jira/browse/KYLIN-3686
> Project: Kylin
>  Issue Type: Improvement
>  Components: Measure - TopN, Metadata, Web 
>Affects Versions: v2.5.0
> Environment: HDP 2.5.6, Kylin 2.5
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> When new cube is defined via Kylin 2.5 UI, the default cube storage type is 
> set to 0 (ID_HBASE).
>  Top_N metric support is currently hard coded to expect cube storage type 2 
> (ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
> "sharded HBASE".
>  UI provides no safeguards either to prevent a user from defining a cube with 
> Top_N metric that would blow up on the cube building stage with a perplexing 
> stack trace like the following:
> {quote}2018-10-22 16:15:50,388 ERROR [main] 
> org.apache.kylin.engine.mr.KylinMapper:
>  java.lang.ArrayIndexOutOfBoundsException
>  at java.lang.System.arraycopy(Native Method)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
>  at 
> org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
>  at 
> org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
>  at 
> 

[jira] [Updated] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-11-13 Thread Vsevolod Ostapenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/KYLIN-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3686:
--
Description: 
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
 Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
 UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:
{quote}2018-10-22 16:15:50,388 ERROR [main] 
org.apache.kylin.engine.mr.KylinMapper:
java.lang.ArrayIndexOutOfBoundsException
 at java.lang.System.arraycopy(Native Method)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKeyInternal(NDCuboidBuilder.java:106)
 at 
org.apache.kylin.engine.mr.common.NDCuboidBuilder.buildKey(NDCuboidBuilder.java:71)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:112)
 at 
org.apache.kylin.engine.mr.steps.NDCuboidMapper.doMap(NDCuboidMapper.java:47)
 at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
{quote}
Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.

  was:
When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:


{quote}2018-11-08 08:35:45,413 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.IllegalArgumentException: Can't read 
partitions file at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:701) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) Caused by: 
java.io.IOException: wrong key class: 
org.apache.kylin.storage.hbase.steps.RowKeyWritable is not class 
org.apache.hadoop.hbase.io.ImmutableBytesWritable at 
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2332) at 
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384) at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306)
 at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
 ... 10 more
{quote}
Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.


> Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the 
> Web UI defaults to ID_HBASE and provides no safeguards against storage type 
> mismatch
> -
>
> Key: KYLIN-3686
> URL: https://issues.apache.org/jira/browse/KYLIN-3686
> Project: Kylin
>  Issue Type: Improvement
>  Components: Measure - TopN, Metadata, Web 
>Affects Versions: v2.5.0
> Environment: HDP 2.5.6, Kylin 2.5
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> When new cube is defined via Kylin 2.5 UI, the default cube storage type is 
> set to 0 (ID_HBASE).
>  Top_N metric support 

[jira] [Created] (KYLIN-3687) Top_N measure requires related SUM() measure to be defined as part of the cube to work, but Web UI allows creation of the cube that has Top_N measure only, resulting in N

2018-11-13 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3687:
-

 Summary: Top_N measure requires related SUM() measure to be 
defined as part of the cube to work, but Web UI allows creation of the cube 
that has Top_N measure only, resulting in NPE at query time
 Key: KYLIN-3687
 URL: https://issues.apache.org/jira/browse/KYLIN-3687
 Project: Kylin
  Issue Type: Improvement
  Components: Measure - TopN, Metadata, Web 
Affects Versions: v2.5.0
 Environment: HDP 2.5.6, Kylin 2.5
Reporter: Vsevolod Ostapenko


Web UI allows defining a cube with a Top_N measure without defining the related 
SUM() measure. E.g., a variation of the kylin_sales_cube can be successfully 
defined via the UI with just TOP_SELLER, without actually defining the GVM_SUM 
measure.

Such a cube builds just fine, but at query time an NPE similar to the 
following is thrown:
{quote}Caused by: java.lang.NullPointerException
 at 
org.apache.kylin.query.relnode.OLAPAggregateRel.rewriteAggregateCall(OLAPAggregateRel.java:561)
 at 
org.apache.kylin.query.relnode.OLAPAggregateRel.implementRewrite(OLAPAggregateRel.java:419)
 at 
org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174)
 at 
org.apache.kylin.query.relnode.OLAPSortRel.implementRewrite(OLAPSortRel.java:86)
 at 
org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174)
 at 
org.apache.kylin.query.relnode.OLAPLimitRel.implementRewrite(OLAPLimitRel.java:109)
 at 
org.apache.kylin.query.relnode.OLAPRel$RewriteImplementor.visitChild(OLAPRel.java:174)
 at 
org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:100)
 at 
org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:108)
 at 
org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
 at 
org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1281)
 at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:331)
 at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:228)
{quote}
There need to be checks in the UI and in the Top_N query processing code to 
ensure that all required measures are defined (as Top_N actually depends on 
another measure to function properly), and to inform the user that the Top_N 
definition is incomplete and the cube definition is invalid.
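The missing validation could look something like the sketch below. This is an illustration under assumptions: the measure expressions are modeled as plain strings, and the helper name is hypothetical, not part of Kylin's metadata API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative pre-save check: every TOP_N measure must be accompanied by a
// SUM measure over the same column, otherwise query rewriting hits an NPE.
public class TopNDependencyCheck {
    // measures maps measure name -> expression like "SUM(PRICE)" or "TOP_N(PRICE)"
    static List<String> missingSumsFor(Map<String, String> measures) {
        Set<String> sumCols = new HashSet<>();
        for (String expr : measures.values())
            if (expr.startsWith("SUM("))
                sumCols.add(expr.substring(4, expr.length() - 1));
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : measures.entrySet())
            if (e.getValue().startsWith("TOP_N(")) {
                String col = e.getValue().substring(6, e.getValue().length() - 1);
                if (!sumCols.contains(col))
                    missing.add(e.getKey());   // TOP_N without its companion SUM
            }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("TOP_SELLER", "TOP_N(PRICE)");
        // no SUM(PRICE) measure defined, so TOP_SELLER is flagged
        System.out.println(missingSumsFor(m)); // prints [TOP_SELLER]
    }
}
```

A non-empty result would let the UI reject the cube definition with a clear message instead of failing at query time.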



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KYLIN-3686) Top_N metric code requires cube storage type to be ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no safeguards against storage type mismatch

2018-11-13 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3686:
-

 Summary: Top_N metric code requires cube storage type to be 
ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no 
safeguards against storage type mismatch
 Key: KYLIN-3686
 URL: https://issues.apache.org/jira/browse/KYLIN-3686
 Project: Kylin
  Issue Type: Improvement
  Components: Measure - TopN, Metadata, Web 
Affects Versions: v2.5.0
 Environment: HDP 2.5.6, Kylin 2.5
Reporter: Vsevolod Ostapenko


When new cube is defined via Kylin 2.5 UI, the default cube storage type is set 
to 0 (ID_HBASE).
Top_N metric support is currently hard coded to expect cube storage type 2 
(ID_SHARDED_HBASE), and it *_does not_* check if the cube storage type is the 
"sharded HBASE".
UI provides no safeguards either to prevent a user from defining a cube with 
Top_N metric that would blow up on the cube building stage with a perplexing 
stack trace like the following:


{quote}2018-11-08 08:35:45,413 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.IllegalArgumentException: Can't read 
partitions file at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:701) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) Caused by: 
java.io.IOException: wrong key class: 
org.apache.kylin.storage.hbase.steps.RowKeyWritable is not class 
org.apache.hadoop.hbase.io.ImmutableBytesWritable at 
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2332) at 
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384) at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306)
 at 
org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
 ... 10 more
{quote}
Please, either:

* modify the Top_N code to support all cube storage types (not only 
ID_SHARDED_HBASE), or 
* modify the Top_N code to perform an explicit check of the cube storage type 
and raise a descriptive exception when the storage type is not the expected 
one. Also update the UI to prevent the user from creating cube definitions 
that are incompatible with the storage type required by the Top_N measure.





[jira] [Created] (KYLIN-3670) Misspelled constant DEFAUL_JOB_CONF_SUFFIX

2018-11-06 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3670:
-

 Summary: Misspelled constant DEFAUL_JOB_CONF_SUFFIX
 Key: KYLIN-3670
 URL: https://issues.apache.org/jira/browse/KYLIN-3670
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v2.5.0
 Environment: HDP 2.5.6, Kylin 2.5, CentOS 7.2
Reporter: Vsevolod Ostapenko


One of the JobEngineConfig constants is misspelled. 
It's defined as DEFAUL_JOB_CONF_SUFFIX, while it should be 
DEFAUL*T*_JOB_CONF_SUFFIX.
 





[jira] [Updated] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube

2018-02-16 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3258:
--
Environment: HDP 2.5.6, Kylin 2.2

> No check for duplicate cube name when creating a hybrid cube
> 
>
> Key: KYLIN-3258
> URL: https://issues.apache.org/jira/browse/KYLIN-3258
> Project: Kylin
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Priority: Minor
>
> When loading hybrid cube definitions via REST API, there is no check for 
> duplicate cube names in the list. If, due to a user error or an incorrectly 
> generated list of cubes from an external application/script, the same cube 
> name is listed more than once, the new or updated hybrid cube will contain 
> the same cube listed multiple times.
> It does not seem to cause any immediate issues with querying, but it's just 
> not right. The REST API should throw an exception when the same cube name is 
> listed multiple times.





[jira] [Created] (KYLIN-3259) When a cube is deleted, remove it from the hybrid cube definition

2018-02-16 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3259:
-

 Summary: When a cube is deleted, remove it from the hybrid cube 
definition
 Key: KYLIN-3259
 URL: https://issues.apache.org/jira/browse/KYLIN-3259
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2
Reporter: Vsevolod Ostapenko


When a cube is deleted, its references are not automatically removed from 
existing hybrid cube definitions. That can lead to errors down the road if a 
user (or application) retrieves the list of cubes via a REST API call and 
later tries to update the hybrid cube.





[jira] [Created] (KYLIN-3258) No check for duplicate cube name when creating a hybrid cube

2018-02-16 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3258:
-

 Summary: No check for duplicate cube name when creating a hybrid 
cube
 Key: KYLIN-3258
 URL: https://issues.apache.org/jira/browse/KYLIN-3258
 Project: Kylin
  Issue Type: Bug
  Components: Metadata
Affects Versions: v2.2.0
Reporter: Vsevolod Ostapenko


When loading hybrid cube definitions via REST API, there is no check for 
duplicate cube names in the list. If, due to a user error or an incorrectly 
generated list of cubes from an external application/script, the same cube 
name is listed more than once, the new or updated hybrid cube will contain the 
same cube listed multiple times.

It does not seem to cause any immediate issues with querying, but it's just 
not right. The REST API should throw an exception when the same cube name is 
listed multiple times.
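The requested check is small; a sketch follows. The class and method names are illustrative (a real fix would live in Kylin's hybrid cube REST/service layer), and the report's behavior of rejecting the whole request is assumed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of the duplicate-name check the report asks for: reject a
// hybrid cube definition whose cube list repeats a name.
public class HybridCubeListCheck {
    static void checkNoDuplicates(List<String> cubeNames) {
        Set<String> seen = new HashSet<>();
        for (String name : cubeNames)
            if (!seen.add(name))   // add() returns false on a repeat
                throw new IllegalArgumentException("Cube '" + name
                        + "' is listed more than once in the hybrid cube definition");
    }

    public static void main(String[] args) {
        checkNoDuplicates(Arrays.asList("cube_a", "cube_b")); // fine
        try {
            checkNoDuplicates(Arrays.asList("cube_a", "cube_a"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```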





[jira] [Commented] (KYLIN-3256) Filter of dates do not work

2018-02-16 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367947#comment-16367947
 ] 

Vsevolod Ostapenko commented on KYLIN-3256:
---

There is already an open bug for error in generated code, see KYLIN-3126

> Filter of dates do not work
> ---
>
> Key: KYLIN-3256
> URL: https://issues.apache.org/jira/browse/KYLIN-3256
> Project: Kylin
>  Issue Type: Bug
>Affects Versions: v2.2.0
>Reporter: Jean-Luc BELLIER
>Priority: Major
>
> Hello,
>  
> I am wondering how to filter date columns with Kylin.
> I am working with the sample cube of the learn_kylin project. I have slightly 
> modified the cube to add a few more columns, but that is all.
> In the advanced section, I put KYLIN_SALES.PART_DT in the 'Rowkeys' section, 
> defined as 'date' type.
>  
> I would like to add a filter like 'WHERE KYLIN_SALES.DT_PART = '2012-06-24'
> but the Kylin interface gives me a mistake : 'error while compiling generated 
> Java code'
> This works fine with hive console.
> I also tried with TO_DATE('2012-06-24').
> Using "WHERE KYLIN_SALES.DT_PART BETWEEN '2012-06-24'  AND '2012-06-25'", it 
> works fine.
>  
> Are  there limitations or internal transformations on the 'date' type in 
> Kylin ?
>  
> Thank you for your help. Have a good day.
>  
> Best regards;
> Jean-Luc.





[jira] [Created] (KYLIN-3253) Enabling DEBUG in kylin-server-log4j.properties results in NPE in Calcite layer during query execution

2018-02-12 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3253:
-

 Summary: Enabling DEBUG in kylin-server-log4j.properties results 
in NPE in Calcite layer during query execution
 Key: KYLIN-3253
 URL: https://issues.apache.org/jira/browse/KYLIN-3253
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2
Reporter: Vsevolod Ostapenko


If the log4j root logger is set to DEBUG level in kylin-server-log4j.properties, 
any attempt to run a query after that results in a failure with an NPE being 
triggered in the Calcite layer (see stack trace below).
The issue was fixed in Calcite 1.14 as 
https://issues.apache.org/jira/browse/CALCITE-1859
It's a one-line change to 
core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java

Since Kylin packages its own fork of Calcite from 
[http://repository.kyligence.io|http://repository.kyligence.io/], the fix needs 
to be ported to 1.13.0-kylin-r-SNAPSHOT.jar by someone who has access to 
this forked repo.


{quote}at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
 at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
 at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
 at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
 at org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
 at org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
 at org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
 at org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
 at org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
 at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
 at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
 at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
 at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
 at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
 at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
 at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
 at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
 at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:650)
 at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
 at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
 at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
 at 

[jira] [Reopened] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-09 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reopened KYLIN-3223:
---

Reopened as a revised version of the fix is available.

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Major
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, 
> KYLIN-3223.master.001.patch
>
>
> Calling the REST API to get the list of hybrid cubes returns a stack trace 
> with an NPE.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If the project parameter is specified without a value, the call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> A quick look at HybridService.java suggests that there is a bug in the code: 
> the very first line tries to check ACLs on the project using the project 
> name, which is NULL when the project parameter is not specified as part 
> of the URL.
>  If the parameter is specified without a value, the ACL check is not 
> performed, so it's another bug, as the list of projects is retrieved without 
> read permission checking.





[jira] [Commented] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-09 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358628#comment-16358628
 ] 

Vsevolod Ostapenko commented on KYLIN-3223:
---

[~yimingliu], I created a revised version of the fix that uses the updated 
ACL-checking API provided by KYLIN-3239 (Refactor the ACL code about 
checkPermission and hasPermission).
Please review and provide feedback.
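For illustration, a minimal null-safe sketch of the check discussed in this ticket. The class and method names below are simplified stand-ins (not the actual Kylin patch or API): when a project is named, its ACL is checked first; when the parameter is null or empty, the full list is filtered by per-project read permission instead of passing null through to the ACL layer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HybridAclSketch {
    // Stand-in for the per-project read-permission check.
    static boolean hasReadPermission(String project) {
        return "learn_kylin".equals(project);
    }

    // If a project is named, check its ACL up front; if the parameter is
    // null or empty, fall back to listing only readable projects instead
    // of dereferencing the null name (the source of the NPE).
    static List<String> listHybrids(String project, List<String> projects) {
        if (project != null && !project.isEmpty()) {
            if (!hasReadPermission(project)) {
                throw new SecurityException("no read permission on " + project);
            }
            return Arrays.asList(project);
        }
        List<String> readable = new ArrayList<>();
        for (String p : projects) {
            if (hasReadPermission(p)) {
                readable.add(p);
            }
        }
        return readable;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("learn_kylin", "secret_project");
        // A null project no longer NPEs; it yields the ACL-filtered listing.
        System.out.println(listHybrids(null, all)); // [learn_kylin]
    }
}
```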

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Major
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, 
> KYLIN-3223.master.001.patch
>
>
> Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
> exception.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If a parameter project without a value is specified, call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> Quick look at the HybridService.java suggests that there is a bug in the 
> code, where the very first line tries to check ACLs on the project using the 
> project name, which is NULL, when project parameter is not specified as part 
> of the URL.
>  If parameter is specified without a value, ACL check is not performed, so 
> it's another bug, as the list of projects is retrieved without read 
> permission checking.





[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-09 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3223:
--
Attachment: KYLIN-3223.master.001.patch

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Major
> Fix For: v2.3.0
>
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch, 
> KYLIN-3223.master.001.patch
>
>
> Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
> exception.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If a parameter project without a value is specified, call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> Quick look at the HybridService.java suggests that there is a bug in the 
> code, where the very first line tries to check ACLs on the project using the 
> project name, which is NULL, when project parameter is not specified as part 
> of the URL.
>  If parameter is specified without a value, ACL check is not performed, so 
> it's another bug, as the list of projects is retrieved without read 
> permission checking.





[jira] [Updated] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3249:
--
Description: 
Hybrid cubes are assigned a default priority lower than that of regular cubes, 
which leads to incorrect selection of a hybrid cube even when a regular 
non-hybridized cube with a lower cost is available.

For example, a model has a wide cube with a full set of metrics and a narrower 
cube with top-N entries for a subset of metrics.

If the wide cube is hybridized (due to a new metric addition), but the top-N 
cube remains unchanged and non-hybridized, the top-N cube will no longer be 
queried, causing query performance degradation.

The issue can be tracked to 
query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid 
cubes are assigned priority 0, while regular cubes are assigned priority 1.
 This unconditional priority assignment is incorrect, as it only holds for 
cases when there is only one cube "flavor" in the model or when all the cubes 
of various "flavors" are hybridized at the same time.

The simplest fix is to make the hybrid priority the same as that of a regular 
cube.
 Plus, as an enhancement to the cube selection algorithm, a new rule can be 
implemented that filters out regular candidate cubes that are included in 
candidate hybrid cubes.

  was:
Hybrid cubes are assigned default priority lower than regular cubes, which 
leads to incorrect selection of a hybrid cube while a regular non-hybridized 
cube with lower cost is available.

For example, model has a wide cube with full set of metrics and narrower cube 
with top-N entries for a subset of metrics.

If wide cube is hybridized (due to a new metric addition), but top-N cube 
remains unchanged and non-hybridized, top-N cube will be no longer queried, 
causing query performance degradation.

The issue can be tracked to the 
query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid 
cubes are assigned priority 0, while regular cubes are assigned priority of 1.
 This unconditional priority assignment is incorrect as it only holds for cases 
when there is only one cube "type" in the model or when all the cubes are 
hybridized at the same time.

Simplest fix is to have hybrid priority to be the same as of a regular cube.
 Plus, as an enhancement to the cube selection algorithm a new rule can be 
implemented that will filter out regular candidate cubes that are included into 
candidate hybrid cubes.


> Default hybrid cube priority should be the same as of a regular cube
> 
>
> Key: KYLIN-3249
> URL: https://issues.apache.org/jira/browse/KYLIN-3249
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> Hybrid cubes are assigned default priority lower than regular cubes, which 
> leads to incorrect selection of a hybrid cube while a regular non-hybridized 
> cube with lower cost is available.
> For example, model has a wide cube with full set of metrics and narrower cube 
> with top-N entries for a subset of metrics.
> If wide cube is hybridized (due to a new metric addition), but top-N cube 
> remains unchanged and non-hybridized, top-N cube will be no longer queried, 
> causing query performance degradation.
> The issue can be tracked to the 
> query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where 
> hybrid cubes are assigned priority 0, while regular cubes are assigned 
> priority of 1.
>  This unconditional priority assignment is incorrect as it only holds for 
> cases when there is only one cube "flavor" in the model or when all the cubes 
> of various "flavors" are hybridized at the same time.
> Simplest fix is to have hybrid priority to be the same as of a regular cube.
>  Plus, as an enhancement to the cube selection algorithm a new rule can be 
> implemented that will filter out regular candidate cubes that are included 
> into candidate hybrid cubes.





[jira] [Created] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube

2018-02-08 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3249:
-

 Summary: Default hybrid cube priority should be the same as of a 
regular cube
 Key: KYLIN-3249
 URL: https://issues.apache.org/jira/browse/KYLIN-3249
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2
Reporter: Vsevolod Ostapenko


Hybrid cubes are assigned a default priority lower than that of regular cubes, 
which leads to incorrect selection of a hybrid cube even when a regular 
non-hybridized cube with a lower cost is available.

For example, a model has a wide cube with a full set of metrics and a narrower 
cube with top-N entries for a subset of metrics.

If the wide cube is hybridized (due to a new metric addition), but the top-N 
cube remains unchanged and non-hybridized, the top-N cube will no longer be 
queried, causing query performance degradation.

The issue can be tracked to 
query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid 
cubes are assigned priority 0, while regular cubes are assigned priority 1.
This unconditional priority assignment is incorrect, as it only holds for cases 
when there is only one cube "type" in the model or when all the cubes are 
hybridized at the same time.

The simplest fix is to make the hybrid priority the same as that of a regular 
cube.
Plus, as an enhancement to the cube selection algorithm, a new rule can be 
implemented that filters out regular candidate cubes that are included in 
candidate hybrid cubes.
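A minimal sketch of the selection behavior described above. The priority values (hybrid = 0, regular cube = 1) come from Candidate.java as quoted in the ticket; the `pick` helper, the cost numbers, and the realization names are illustrative assumptions, not Kylin's actual routing code:

```java
import java.util.HashMap;
import java.util.Map;

public class CandidatePrioritySketch {
    // Current behavior: hybrid = 0 always beats cube = 1, regardless of cost.
    static final Map<String, Integer> CURRENT = new HashMap<>();
    // Proposed fix: hybrid and cube share the same priority, so the
    // lower-cost realization wins the tie-break.
    static final Map<String, Integer> PROPOSED = new HashMap<>();
    static {
        CURRENT.put("HYBRID", 0);
        CURRENT.put("CUBE", 1);
        PROPOSED.put("HYBRID", 0);
        PROPOSED.put("CUBE", 0);
    }

    // Pick a winner: lower priority first, then lower cost on a tie.
    static String pick(Map<String, Integer> prio, int hybridCost, int cubeCost) {
        int cmp = prio.get("HYBRID") - prio.get("CUBE");
        if (cmp != 0) return cmp < 0 ? "HYBRID" : "CUBE";
        return hybridCost <= cubeCost ? "HYBRID" : "CUBE";
    }

    public static void main(String[] args) {
        // Wide hybridized cube (cost 100) vs. the cheap top-N cube (cost 10):
        // today the hybrid always wins; with equal priorities the cheaper
        // top-N cube is selected again.
        System.out.println(pick(CURRENT, 100, 10));  // HYBRID
        System.out.println(pick(PROPOSED, 100, 10)); // CUBE
    }
}
```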





[jira] [Updated] (KYLIN-3249) Default hybrid cube priority should be the same as of a regular cube

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3249:
--
Description: 
Hybrid cubes are assigned a default priority lower than that of regular cubes, 
which leads to incorrect selection of a hybrid cube even when a regular 
non-hybridized cube with a lower cost is available.

For example, a model has a wide cube with a full set of metrics and a narrower 
cube with top-N entries for a subset of metrics.

If the wide cube is hybridized (due to a new metric addition), but the top-N 
cube remains unchanged and non-hybridized, the top-N cube will no longer be 
queried, causing query performance degradation.

The issue can be tracked to 
query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid 
cubes are assigned priority 0, while regular cubes are assigned priority 1.
 This unconditional priority assignment is incorrect, as it only holds for 
cases when there is only one cube "type" in the model or when all the cubes 
are hybridized at the same time.

The simplest fix is to make the hybrid priority the same as that of a regular 
cube.
 Plus, as an enhancement to the cube selection algorithm, a new rule can be 
implemented that filters out regular candidate cubes that are included in 
candidate hybrid cubes.

  was:
Hybrid cubes are assigned default priority lower than regular cubes, which 
leads to incorrect selection of a hybrid cube while a regular non-hybridized 
cube with lower cost is available.

For example, model has a wide cube with full set of metrics and narrower cube 
with top-N entries for a subset of metrics.

If wide cube is hybridized (due to a new metric addition), but top-N cube 
remains unchanged and non-hybridized, top-N cube will be no longer queried, 
'causing query performance degradation.

The issue can be tracked to the 
query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where hybrid 
cubes are assigned priority 0, while regular cubes are assigned priority of 1.
This unconditional priority assignment is incorrect as it only holds for cases 
when there is only one cube "type" in the model or when all the cubes are 
hybridized at the same time.

Simplest fix is to have hybrid priority to be the same as of a regular cube.
Plus, as an enhancement to the cube selection algorithm a new rule can be 
implemented that will filter out regular candidate cubes that are included into 
candidate hybrid cubes.


> Default hybrid cube priority should be the same as of a regular cube
> 
>
> Key: KYLIN-3249
> URL: https://issues.apache.org/jira/browse/KYLIN-3249
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Priority: Major
>
> Hybrid cubes are assigned default priority lower than regular cubes, which 
> leads to incorrect selection of a hybrid cube while a regular non-hybridized 
> cube with lower cost is available.
> For example, model has a wide cube with full set of metrics and narrower cube 
> with top-N entries for a subset of metrics.
> If wide cube is hybridized (due to a new metric addition), but top-N cube 
> remains unchanged and non-hybridized, top-N cube will be no longer queried, 
> causing query performance degradation.
> The issue can be tracked to the 
> query/src/main/java/org/apache/kylin/query/routing/Candidate.java, where 
> hybrid cubes are assigned priority 0, while regular cubes are assigned 
> priority of 1.
>  This unconditional priority assignment is incorrect as it only holds for 
> cases when there is only one cube "type" in the model or when all the cubes 
> are hybridized at the same time.
> Simplest fix is to have hybrid priority to be the same as of a regular cube.
>  Plus, as an enhancement to the cube selection algorithm a new rule can be 
> implemented that will filter out regular candidate cubes that are included 
> into candidate hybrid cubes.





[jira] [Assigned] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko reassigned KYLIN-3223:
-

Assignee: Vsevolod Ostapenko  (was: nichunen)

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Major
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch
>
>
> Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
> exception.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If a parameter project without a value is specified, call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> Quick look at the HybridService.java suggests that there is a bug in the 
> code, where the very first line tries to check ACLs on the project using the 
> project name, which is NULL, when project parameter is not specified as part 
> of the URL.
>  If parameter is specified without a value, ACL check is not performed, so 
> it's another bug, as the list of projects is retrieved without read 
> permission checking.





[jira] [Commented] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357639#comment-16357639
 ] 

Vsevolod Ostapenko commented on KYLIN-3223:
---

[~yimingliu], I attached the proposed patch for the NPE and the missing 
read-access check on projects when the project is either not specified or empty.
Please review, or have someone look at the changes, and provide feedback.

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: nichunen
>Priority: Major
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch
>
>
> Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
> exception.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If a parameter project without a value is specified, call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> Quick look at the HybridService.java suggests that there is a bug in the 
> code, where the very first line tries to check ACLs on the project using the 
> project name, which is NULL, when project parameter is not specified as part 
> of the URL.
>  If parameter is specified without a value, ACL check is not performed, so 
> it's another bug, as the list of projects is retrieved without read 
> permission checking.





[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3223:
--
Attachment: 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch

> Query for the list of hybrid cubes results in NPE
> -
>
> Key: KYLIN-3223
> URL: https://issues.apache.org/jira/browse/KYLIN-3223
> Project: Kylin
>  Issue Type: Bug
>  Components: REST Service
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2
>Reporter: Vsevolod Ostapenko
>Assignee: nichunen
>Priority: Major
> Attachments: 
> 0001-KYLIN-3223-Query-for-the-list-of-hybrid-cubes-result.patch
>
>
> Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
> exception.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
> [http://localhost:7070/kylin/api/hybrids]
>  {quote}
>  
> If a parameter project without a value is specified, call succeeds. E.g.
> {quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
> [http://localhost:7070/kylin/api/hybrids?project]
> {quote}
> Quick look at the HybridService.java suggests that there is a bug in the 
> code, where the very first line tries to check ACLs on the project using the 
> project name, which is NULL, when project parameter is not specified as part 
> of the URL.
>  If parameter is specified without a value, ACL check is not performed, so 
> it's another bug, as the list of projects is retrieved without read 
> permission checking.





[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2018-02-08 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357607#comment-16357607
 ] 

Vsevolod Ostapenko commented on KYLIN-3139:
---

[~liyang], [~yimingliu]
guys, could we make a decision on this one? It's a trivial change, but it has 
been hanging around for more than a month now.

> Failure in map-reduce job due to undefined hdp.version variable when using 
> HDP stack and remote HBase cluster
> -
>
> Key: KYLIN-3139
> URL: https://issues.apache.org/jira/browse/KYLIN-3139
> Project: Kylin
>  Issue Type: Bug
>  Components: Others
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
> with Hive only, remote HBase cluster for data storage
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: hdp
> Attachments: KYLIN-3139.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running on top of HDP stack and using a setup where Hive and HBase run 
> in different clusters cube build/refresh fails on the step "Extract Fact 
> Table Distinct Columns" with the error
> {quote}java.lang.IllegalArgumentException: Unable to parse 
> '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a 
> URI, check the setting for mapreduce.application.framework.path{quote}
> Based on existing JIRA discussions in Ambari project, it's responsibility of 
> a service to set hdp.version Java property. When HBase is not installed as a 
> service in a cluster where Kylin server is running, hbase launcher (invoked 
> by kylin.sh) does not set this property (presumably because HBase in that 
> case is just a client and not a service).
> The only suitable workaround found so far is to set property as part of the 
> conf/setenv.sh script.
> In order to avoid hard coding of the HDP version info, suggested change to 
> setenv.sh will attempt to detect HDP version at run-time. It should work for 
> all released HDP version from 2.2.x to 2.6.x
> In addition to that, it will also try to locate and set Java native library 
> path, when running on top of HDP.





[jira] [Updated] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-07 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3223:
--
Description: 
Calling the REST API to get the list of hybrid cubes returns a stack trace with 
an NPE exception.
{quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {}  
[http://localhost:7070/kylin/api/hybrids]
 {quote}
 

If the project parameter is specified without a value, the call succeeds. E.g.
{quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
[http://localhost:7070/kylin/api/hybrids?project]
{quote}
A quick look at HybridService.java suggests that there is a bug in the code: 
the very first line tries to check ACLs on the project using the project name, 
which is NULL when the project parameter is not specified as part of the URL.
 If the parameter is specified without a value, the ACL check is not performed, 
which is another bug, as the list of projects is retrieved without a read 
permission check.

  was:
Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
exception.
{quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
[http://localhost:7070/kylin/api/hybrids]

{"code":"999","data":null,"msg":null,"stacktrace":"java.lang.NullPointerException
	at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
	at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
	at org.apache.kylin.metadata.cachesync.SingleValueCache.get(SingleValueCache.java:85)
	at org.apache.kylin.metadata.project.ProjectManager.getProject(ProjectManager.java:172)
	at org.apache.kylin.rest.util.AclEvaluate.getProjectInstance(AclEvaluate.java:39)
	at org.apache.kylin.rest.util.AclEvaluate.checkProjectReadPermission(AclEvaluate.java:61)
	at org.apache.kylin.rest.service.HybridService.listHybrids(HybridService.java:115)
	at org.apache.kylin.rest.controller.HybridController.list(HybridController.java:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at ...

[jira] [Created] (KYLIN-3223) Query for the list of hybrid cubes results in NPE

2018-02-01 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3223:
-

 Summary: Query for the list of hybrid cubes results in NPE
 Key: KYLIN-3223
 URL: https://issues.apache.org/jira/browse/KYLIN-3223
 Project: Kylin
  Issue Type: Bug
  Components: REST Service
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2
Reporter: Vsevolod Ostapenko
Assignee: luguosheng


Calling REST API to get the list of hybrid cubes returns stack trace with NPE 
exception.
{quote}curl -u ADMIN:KYLIN -X GET -H 'Content-Type: application/json'  -d {} 
[http://localhost:7070/kylin/api/hybrids]

{"code":"999","data":null,"msg":null,"stacktrace":"java.lang.NullPointerException
	at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:778)
	at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1546)
	at org.apache.kylin.metadata.cachesync.SingleValueCache.get(SingleValueCache.java:85)
	at org.apache.kylin.metadata.project.ProjectManager.getProject(ProjectManager.java:172)
	at org.apache.kylin.rest.util.AclEvaluate.getProjectInstance(AclEvaluate.java:39)
	at org.apache.kylin.rest.util.AclEvaluate.checkProjectReadPermission(AclEvaluate.java:61)
	at org.apache.kylin.rest.service.HybridService.listHybrids(HybridService.java:115)
	at org.apache.kylin.rest.controller.HybridController.list(HybridController.java:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
	at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
	at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	at ...

[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump con

2018-01-29 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344161#comment-16344161
 ] 

Vsevolod Ostapenko commented on KYLIN-3122:
---

I think I found one place in the code that is at least partially responsible 
for the observed behavior.
The convertFilterColumnsAndConstants method in GTUtil.java rewrites the 
statement filter after static values in the WHERE clause are checked against 
the trie dictionary.

There seem to be multiple issues with this approach:
1) Filtering on the partitioning key is treated the same as filtering on a 
non-partitioning column, which is incorrect, as the presence or absence of a 
lower or upper range bound for the partitioning column in a specific segment's 
dictionary provides no guarantee that this segment is or is not a candidate 
for further scanning.
2) As a side effect of #1, it looks like after the first candidate segment is 
hit (the lower-bound date-time value is found in the dictionary), the filter 
is modified in place (rewritten) to exclude the upper-bound condition (if the 
upper-bound condition is not found in the segment, which is always the case 
in our scenario).

 Partitioning keys require special handling, they need to be checked against 
segment range meta-data, and excluded from dictionary-based checks.
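A minimal sketch of the special handling proposed above, assuming a hypothetical Segment class with [start, end) range metadata and fused date-time keys (none of these names are Kylin's actual API): pruning against segment range metadata honors both the lower and the upper bound, so the 12-segment test from this ticket keeps exactly 3 candidates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: prune segments by comparing the query's half-open time
// range against each segment's [start, end) range metadata, instead of probing
// a per-segment dictionary for the bound values.
public class SegmentPruningSketch {

    static final class Segment {
        final long start; // inclusive fused key, e.g. 201711200100
        final long end;   // exclusive fused key
        Segment(long start, long end) { this.start = start; this.end = end; }
        // Two half-open ranges overlap iff each starts before the other ends.
        boolean overlaps(long qStart, long qEnd) {
            return start < qEnd && qStart < end;
        }
    }

    static List<Segment> prune(List<Segment> segments, long qStart, long qEnd) {
        List<Segment> candidates = new ArrayList<>();
        for (Segment s : segments) {
            if (s.overlaps(qStart, qEnd)) candidates.add(s);
        }
        return candidates;
    }

    public static void main(String[] args) {
        // 12 hourly segments for 2017-11-20, hours 00..11
        List<Segment> segments = new ArrayList<>();
        for (long h = 0; h < 12; h++) {
            segments.add(new Segment(201711200000L + h * 100, 201711200000L + (h + 1) * 100));
        }
        // where a.time_key >= '201711200100' and a.time_key < '201711200400'
        List<Segment> hit = prune(segments, 201711200100L, 201711200400L);
        // Both bounds are honored: only hours 01, 02 and 03 survive.
        if (hit.size() != 3) throw new AssertionError("expected 3, got " + hit.size());
        System.out.println("segments to scan: " + hit.size());
    }
}
```

With a dictionary-based check, an upper bound absent from a segment's dictionary can be silently dropped; the range comparison above cannot lose a bound.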

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Attachments: partition_elimination_bug_single_column_test.log
>
>
> Current algorithm of cube segment elimination seems to be rather inefficient.
>  We are using a model where cubes are partitioned by date and time:
>  "partition_desc":
> { "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
> "A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
> "MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
> "partition_condition_builder": 
> "org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
>  }
> ,
> Cubes contain partitions for multiple days and 24 hours for each day. Each 
> cube segment corresponds to just one hour.
> When a query is issued where both date and hour are specified using an 
> equality condition (e.g. thedate = '20171011' and thehour = '10'), Kylin 
> sequentially iterates over all the cube segments (hundreds of them) only to 
> skip all except the one that needs to be scanned (which can be observed by 
> looking in the logs).
>  The expectation is that Kylin would use the existing info on the partitioning 
> columns (date and time) and the known hierarchical relation between date and 
> time to locate the required partition much more efficiently than a linear scan 
> through all the cube partitions.
> Now, if the filtering condition is on a range of hours, the behavior of the 
> partition pruning and scanning becomes illogical, which suggests bugs 
> in the logic.
> If the filtering condition is on a specific date and a closed-open range of 
> hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in 
> addition to sequentially scanning all the cube partitions (as described 
> above), Kylin will scan HBase tables for all the hours from the specified 
> starting hour until the last hour of the day (e.g. from hour 10 to 24, 
> instead of just hour 10).
>  As a result the query will run much longer than necessary, and might run out 
> of memory, causing a JVM heap dump and a Kylin server crash.
> If the filtering condition is on a specific date but the hour interval is 
> specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
> thehour <= '10'), Kylin will scan HBase tables for all the later dates and 
> hours (e.g. from hour 10 until the most recent hour on the most recent day, 
> which can be hundreds of tables and thousands of regions).
>  As a result query execution time will increase dramatically and in most 
> cases the Kylin server will be terminated with an OOM error and a JVM heap dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-29 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343677#comment-16343677
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3122 at 1/29/18 5:31 PM:


[~Shaofengshi], we tried using a single partitioning column with date and time 
fused together.
 The result is still not satisfactory, as cube segments are not properly 
eliminated even in this case.

In our test we had a table with 12 hourly partitions defined for hours 00 to 
11. A test query with the condition _*where a.time_key >= '201711200100' and 
a.time_key < '201711200400'*_ filters out only the very first segment, for 
hour 00, and then proceeds to scan all the remaining 11 segments, instead of 
the expected 3 segments (for hours 01, 02 and 03).
 It looks very much like a bug where, as soon as the lower bound condition is 
satisfied, the upper bound condition is no longer checked.

I'm attaching a log excerpt to illustrate the behavior described above.

[^partition_elimination_bug_single_column_test.log]



> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Attachments: partition_elimination_bug_single_column_test.log
>
>

[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-29 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343677#comment-16343677
 ] 

Vsevolod Ostapenko commented on KYLIN-3122:
---

[~Shaofengshi], we tried using a single partitioning column with date and time 
fused together.
The result is still not satisfactory, as cube segments are not properly 
eliminated even in this case.

In our test we had a table with 12 hourly partitions defined for hours 00 to 
11. A query with the condition _*where a.time_key >= '201711200100' and 
a.time_key < '201711200400'*_ filters out only the very first segment, for 
hour 00, then proceeds to scan all the remaining 11 segments, instead of the 
expected 3 segments (hours 01, 02 and 03).
It looks very much like a bug where, as soon as the lower bound condition is 
satisfied, the upper bound condition is no longer checked.

I'm attaching a log excerpt to illustrate the behavior described above.

[^partition_elimination_bug_single_column_test.log]

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Attachments: partition_elimination_bug_single_column_test.log
>
>





[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-29 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Attachment: partition_elimination_bug_single_column_test.log

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
> Attachments: partition_elimination_bug_single_column_test.log
>
>





[jira] [Commented] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-25 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339446#comment-16339446
 ] 

Vsevolod Ostapenko commented on KYLIN-3122:
---

[~Shaofengshi] or [~yimingliu], could one of you please assign this bug for 
proper investigation? This issue is derailing our product development plans 
and is a showstopper for any real production deployment.

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
>





[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-25 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Priority: Critical  (was: Major)

> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump conditions
> ---
>
> Key: KYLIN-3122
> URL: https://issues.apache.org/jira/browse/KYLIN-3122
> Project: Kylin
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: hongbin ma
>Priority: Critical
>





[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2018-01-25 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
 We are using a model where cubes are partitioned by date and time:
 "partition_desc":

{ "partition_date_column": "A_VL_HOURLY_V.THEDATE", "partition_time_column": 
"A_VL_HOURLY_V.THEHOUR", "partition_date_start": 0, "partition_date_format": 
"MMdd", "partition_time_format": "HH", "partition_type": "APPEND", 
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
 }

,

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both date and hour are specified using an equality 
condition (e.g. thedate = '20171011' and thehour = '10'), Kylin sequentially 
iterates over all the cube segments (hundreds of them) only to skip all 
except the one that needs to be scanned (which can be observed by looking 
in the logs).
 The expectation is that Kylin would use the existing info on the partitioning 
columns (date and time) and the known hierarchical relation between date and 
time to locate the required partition much more efficiently than a linear scan 
through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of the 
partition pruning and scanning becomes illogical, which suggests bugs in the logic.

If the filtering condition is on a specific date and a closed-open range of hours 
(e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), in addition 
to sequentially scanning all the cube partitions (as described above), Kylin will 
scan HBase tables for all the hours from the specified starting hour until 
the last hour of the day (e.g. from hour 10 to 24, instead of just hour 10).
 As a result the query will run much longer than necessary, and might run out of 
memory, causing a JVM heap dump and a Kylin server crash.

If the filtering condition is on a specific date but the hour interval is 
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and 
thehour <= '10'), Kylin will scan HBase tables for all the later dates and 
hours (e.g. from hour 10 until the most recent hour on the most recent day, 
which can be hundreds of tables and thousands of regions).
 As a result query execution time will increase dramatically and in most cases 
the Kylin server will be terminated with an OOM error and a JVM heap dump.

  was:
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
 "partition_date_column": "A_VL_HOURLY_V.THEDATE",
 "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
 "partition_date_start": 0,
 "partition_date_format": "MMdd",
 "partition_time_format": "HH",
 "partition_type": "APPEND",
 "partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
},

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both date and hour are specified using equality 
condition (e.g. thedate = '20171011' and thehour = '10') Kylin sequentially 
integrates over all the segment cubes (hundreds of them) only to skip all 
except for the one that needs to be scanned (which can be observed by looking 
in the logs).
The expectation is that Kylin would use existing info on the partitioning 
columns (date and time) and known hierarchical relations between date and time 
to locate required partition much more efficiently that linear scan through all 
the cube partitions.

Now, if filtering condition is on the range of hours, behavior of the partition 
pruning and scanning becomes not very logical, which suggests bugs in the logic.

If filtering condition is on specific date and closed-open range of hours (e.g. 
thedate = '20171011' and thehour >= '10' and thehour < '11'), in addition to 
sequentially scanning all the cube partitions (as described above), Kylin will 
scan HBase tables for all the hours from the specified starting hour and till 
the last hour of the day (e.g. from hour 10 to 24, instead of just hour 10).
As the result query will run much longer that necessary, and might run out of 
memory, causing JVM heap dump and Kylin server crash.


If filtering condition is on specific date by hour interval is specified as 
open-closed (e.g. thedate = '20171011' and thehour > '09' and thehour <= '10'), 
Kylin will scan all HBase tables for all the later dates and hours (e.g. from 
hour 10 and till the most recent hour on the most recent day, which can be 
hundreds of tables and thousands of regions).
As the result query execution will dramatically increase and in most cases 
Kylin server will be terminated with OOM error and JVM heap dump.

[jira] [Created] (KYLIN-3186) Add support for partitioning columns that combine date and time (e.g. YYYYMMDDHHMISS)

2018-01-19 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3186:
-

 Summary: Add support for partitioning columns that combine date 
and time (e.g. YYYYMMDDHHMISS)
 Key: KYLIN-3186
 URL: https://issues.apache.org/jira/browse/KYLIN-3186
 Project: Kylin
  Issue Type: Improvement
  Components: General
Affects Versions: v2.2.0
Reporter: Vsevolod Ostapenko


In a multitude of existing enterprise applications, partitioning is done on a 
single column that fuses date and time into a single value (string, integer or 
big integer). Typical formats are YYYYMMDDHHMM or YYYYMMDDHHMMSS (e.g. 
201801181621 and 20180118154734).
Such a representation is human-readable and provides natural sorting of the 
date/time values.

The lack of support for such a date/time representation requires ugly 
workarounds, like creating views that split date and time into separate 
columns, or copying data into tables with a different partitioning scheme, 
none of which is a particularly good solution.
Moreover, the views approach on Hive causes severe performance issues, due to 
the inability of the Hive optimizer to correctly analyze the filtering 
conditions auto-generated by Kylin during the flat table build step.
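The natural-sorting property mentioned above can be sketched as follows (illustrative key values only, not tied to any Kylin API): fixed-width, zero-padded fused keys sort chronologically under a plain lexicographic string sort, so no date/time parsing is needed for range comparisons.

```java
import java.util.Arrays;

// Fixed-width, zero-padded YYYYMMDDHHMMSS strings compare lexicographically in
// the same order as the instants they encode.
public class FusedKeySortSketch {
    public static void main(String[] args) {
        String[] keys = { "20180118154734", "20171120010000", "20180118154733" };
        String[] sorted = keys.clone();
        Arrays.sort(sorted); // plain lexicographic sort, no date parsing
        // Chronological order falls out of the string order.
        String[] expected = { "20171120010000", "20180118154733", "20180118154734" };
        if (!Arrays.equals(sorted, expected)) throw new AssertionError(Arrays.toString(sorted));
        System.out.println("chronological: " + Arrays.toString(sorted));
    }
}
```

This only holds because every key has the same width and is zero-padded; mixing 12-digit and 14-digit keys would break the property.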





[jira] [Created] (KYLIN-3185) Change handling of new metrics in the hybrid cube scenario to return NULL values for segments built before metrics were added

2018-01-19 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3185:
-

 Summary: Change handling of new metrics in the hybrid cube 
scenario to return NULL values for segments built before metrics were added
 Key: KYLIN-3185
 URL: https://issues.apache.org/jira/browse/KYLIN-3185
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0
Reporter: Vsevolod Ostapenko
Assignee: liyang


Currently, when a hybrid cube is defined and a new metric is added, cube 
segments that were created before the metric was introduced are not consulted 
if a query contains this new metric.

As a result, even data for metrics that existed and were computed is not 
returned.

A better behavior would be to find the intersection between the metrics 
present in a segment and the metrics requested by the query, return all the 
available metrics for that segment, and inject NULL values for the ones that 
did not exist at the time the cube segment was populated.
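A minimal sketch of the proposed behavior, using hypothetical names (answer, GMV, ITEM_COUNT) that are not Kylin's actual API: intersect the metrics a segment stores with the metrics the query requests, and inject NULL for metrics the segment predates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a segment answers a query with its stored metric values
// and NULL placeholders for metrics added to the cube after it was built.
public class MetricBackfillSketch {

    static List<Object> answer(Map<String, Object> segmentMetrics, List<String> requested) {
        List<Object> row = new ArrayList<>();
        for (String m : requested) {
            // Stored metrics keep their value; missing ones become NULL
            // instead of the whole segment being skipped.
            row.add(segmentMetrics.getOrDefault(m, null));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> oldSegment = new HashMap<>();
        oldSegment.put("GMV", 100L); // metric that existed when the segment was built
        List<String> requested = Arrays.asList("GMV", "ITEM_COUNT"); // ITEM_COUNT added later
        List<Object> row = answer(oldSegment, requested);
        if (!(row.get(0).equals(100L) && row.get(1) == null)) throw new AssertionError(row.toString());
        System.out.println(row); // old data is still returned, the new metric is NULL
    }
}
```

The key design point is that a missing metric degrades to NULL per column rather than disqualifying the segment entirely.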





[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2018-01-05 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313524#comment-16313524
 ] 

Vsevolod Ostapenko commented on KYLIN-3139:
---

[~Shaofengshi], I'm not sure who to ask to review my proposed changes to this 
JIRA. Perhaps you could have a look or direct it to the correct reviewer? 
Thanks in advance.

> Failure in map-reduce job due to undefined hdp.version variable when using 
> HDP stack and remote HBase cluster
> -
>
> Key: KYLIN-3139
> URL: https://issues.apache.org/jira/browse/KYLIN-3139
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
> with Hive only, remote HBase cluster for data storage
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: hdp
> Attachments: KYLIN-3139.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running on top of the HDP stack in a setup where Hive and HBase run 
> in different clusters, cube build/refresh fails on the step "Extract Fact 
> Table Distinct Columns" with the error
> {quote}java.lang.IllegalArgumentException: Unable to parse 
> '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a 
> URI, check the setting for mapreduce.application.framework.path{quote}
> Based on existing JIRA discussions in the Ambari project, it is the 
> responsibility of a service to set the hdp.version Java property. When HBase 
> is not installed as a service in the cluster where the Kylin server is 
> running, the hbase launcher (invoked by kylin.sh) does not set this property 
> (presumably because HBase in that case is just a client and not a service).
> The only suitable workaround found so far is to set the property as part of 
> the conf/setenv.sh script.
> In order to avoid hard-coding the HDP version info, the suggested change to 
> setenv.sh will attempt to detect the HDP version at run-time. It should work 
> for all released HDP versions from 2.2.x to 2.6.x.
> In addition, it will also try to locate and set the Java native library 
> path when running on top of HDP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge

2018-01-05 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313505#comment-16313505
 ] 

Vsevolod Ostapenko commented on KYLIN-3069:
---

[~Zhixiong Chen], could you please review the changes and commit into the 
master? Patch looks fine to me (in case anyone is waiting for my feedback).

> Add proper time zone support to the WebUI instead of GMT/PST kludge
> ---
>
> Key: KYLIN-3069
> URL: https://issues.apache.org/jira/browse/KYLIN-3069
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.3, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: peng.jianhua
>Priority: Minor
> Attachments: 
> 0001-KYLIN-3069-Add-proper-time-zone-support-to-the-WebUI.patch, Screen Shot 
> 2017-12-05 at 10.01.39 PM.png, kylin_pic1.png, kylin_pic2.png, kylin_pic3.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Time zone handling logic in the WebUI is a kludge: it is coded to parse only 
> "GMT-N" time zone specifications and defaults to PST if parsing is not 
> successful (kylin/webapp/app/js/filters/filter.js).
> Integrating moment and moment-timezone (http://momentjs.com/timezone/docs/) 
> into the product would allow correct time zone handling.
> For users who reside in geographical locations that observe daylight saving 
> time, the GMT-N format is very inconvenient, and the info reported by the UI 
> in various places is perplexing.
> Needless to say, the GMT moniker itself is long deprecated.





[jira] [Updated] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2018-01-03 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3139:
--
Description: 
When running on top of the HDP stack in a setup where Hive and HBase run in 
different clusters, cube build/refresh fails on the step "Extract Fact Table 
Distinct Columns" with the error
{quote}java.lang.IllegalArgumentException: Unable to parse 
'/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, 
check the setting for mapreduce.application.framework.path{quote}

Based on existing JIRA discussions in the Ambari project, it is the 
responsibility of a service to set the hdp.version Java property. When HBase 
is not installed as a service in the cluster where the Kylin server is 
running, the hbase launcher (invoked by kylin.sh) does not set this property 
(presumably because HBase in that case is just a client and not a service).
The only suitable workaround found so far is to set the property as part of 
the conf/setenv.sh script.

In order to avoid hard-coding the HDP version info, the suggested change to 
setenv.sh will attempt to detect the HDP version at run-time. It should work 
for all released HDP versions from 2.2.x to 2.6.x.
In addition, it will also try to locate and set the Java native library 
path when running on top of HDP.

  was:
When running on top of the HDP stack in a setup where Hive and HBase run in 
different clusters, cube build/refresh fails on the step "Extract Fact Table 
Distinct Columns" with the error
{quote}java.lang.IllegalArgumentException: Unable to parse 
'/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, 
check the setting for mapreduce.application.framework.path{quote}

Based on existing JIRA discussions in the Ambari project, it is the 
responsibility of a service to set the hdp.version Java property. When HBase 
is not installed as a service in a cluster, the hbase launcher does not set 
this property (presumably because HBase in that case is just a client and not 
a service).
The only suitable workaround found so far is to set the property as part of 
the conf/setenv.sh script.

In order to avoid hard-coding the HDP version info, the suggested change to 
setenv.sh will attempt to detect the HDP version at run-time. It should work 
for all released HDP versions from 2.2.x to 2.6.x.
In addition, it will also try to locate and set the Java native library 
path when running on top of HDP.


> Failure in map-reduce job due to undefined hdp.version variable when using 
> HDP stack and remote HBase cluster
> -
>
> Key: KYLIN-3139
> URL: https://issues.apache.org/jira/browse/KYLIN-3139
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
> with Hive only, remote HBase cluster for data storage
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: hdp
> Attachments: KYLIN-3139.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running on top of the HDP stack in a setup where Hive and HBase run 
> in different clusters, cube build/refresh fails on the step "Extract Fact 
> Table Distinct Columns" with the error
> {quote}java.lang.IllegalArgumentException: Unable to parse 
> '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a 
> URI, check the setting for mapreduce.application.framework.path{quote}
> Based on existing JIRA discussions in the Ambari project, it is the 
> responsibility of a service to set the hdp.version Java property. When HBase 
> is not installed as a service in the cluster where the Kylin server is 
> running, the hbase launcher (invoked by kylin.sh) does not set this property 
> (presumably because HBase in that case is just a client and not a service).
> The only suitable workaround found so far is to set the property as part of 
> the conf/setenv.sh script.
> In order to avoid hard-coding the HDP version info, the suggested change to 
> setenv.sh will attempt to detect the HDP version at run-time. It should work 
> for all released HDP versions from 2.2.x to 2.6.x.
> In addition, it will also try to locate and set the Java native library 
> path when running on top of HDP.





[jira] [Commented] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2017-12-28 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305767#comment-16305767
 ] 

Vsevolod Ostapenko commented on KYLIN-3139:
---

The proposed version of the patch is attached; please review and provide your 
feedback (or commit it, if it looks OK).

> Failure in map-reduce job due to undefined hdp.version variable when using 
> HDP stack and remote HBase cluster
> -
>
> Key: KYLIN-3139
> URL: https://issues.apache.org/jira/browse/KYLIN-3139
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
> with Hive only, remote HBase cluster for data storage
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: hdp
> Attachments: KYLIN-3139.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running on top of the HDP stack in a setup where Hive and HBase run 
> in different clusters, cube build/refresh fails on the step "Extract Fact 
> Table Distinct Columns" with the error
> {quote}java.lang.IllegalArgumentException: Unable to parse 
> '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a 
> URI, check the setting for mapreduce.application.framework.path{quote}
> Based on existing JIRA discussions in the Ambari project, it is the 
> responsibility of a service to set the hdp.version Java property. When HBase 
> is not installed as a service in a cluster, the hbase launcher does not set 
> this property (presumably because HBase in that case is just a client and 
> not a service).
> The only suitable workaround found so far is to set the property as part of 
> the conf/setenv.sh script.
> In order to avoid hard-coding the HDP version info, the suggested change to 
> setenv.sh will attempt to detect the HDP version at run-time. It should work 
> for all released HDP versions from 2.2.x to 2.6.x.
> In addition, it will also try to locate and set the Java native library 
> path when running on top of HDP.





[jira] [Updated] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2017-12-28 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3139:
--
Attachment: KYLIN-3139.master.001.patch

> Failure in map-reduce job due to undefined hdp.version variable when using 
> HDP stack and remote HBase cluster
> -
>
> Key: KYLIN-3139
> URL: https://issues.apache.org/jira/browse/KYLIN-3139
> Project: Kylin
>  Issue Type: Bug
>  Components: General
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
> with Hive only, remote HBase cluster for data storage
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: hdp
> Attachments: KYLIN-3139.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When running on top of the HDP stack in a setup where Hive and HBase run 
> in different clusters, cube build/refresh fails on the step "Extract Fact 
> Table Distinct Columns" with the error
> {quote}java.lang.IllegalArgumentException: Unable to parse 
> '/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a 
> URI, check the setting for mapreduce.application.framework.path{quote}
> Based on existing JIRA discussions in the Ambari project, it is the 
> responsibility of a service to set the hdp.version Java property. When HBase 
> is not installed as a service in a cluster, the hbase launcher does not set 
> this property (presumably because HBase in that case is just a client and 
> not a service).
> The only suitable workaround found so far is to set the property as part of 
> the conf/setenv.sh script.
> In order to avoid hard-coding the HDP version info, the suggested change to 
> setenv.sh will attempt to detect the HDP version at run-time. It should work 
> for all released HDP versions from 2.2.x to 2.6.x.
> In addition, it will also try to locate and set the Java native library 
> path when running on top of HDP.





[jira] [Created] (KYLIN-3139) Failure in map-reduce job due to undefined hdp.version variable when using HDP stack and remote HBase cluster

2017-12-28 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3139:
-

 Summary: Failure in map-reduce job due to undefined hdp.version 
variable when using HDP stack and remote HBase cluster
 Key: KYLIN-3139
 URL: https://issues.apache.org/jira/browse/KYLIN-3139
 Project: Kylin
  Issue Type: Bug
  Components: General
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, two cluster setup, Kylin 2.2.0 in a cluster 
with Hive only, remote HBase cluster for data storage
Reporter: Vsevolod Ostapenko
Assignee: Vsevolod Ostapenko
Priority: Minor


When running on top of the HDP stack in a setup where Hive and HBase run in 
different clusters, cube build/refresh fails on the step "Extract Fact Table 
Distinct Columns" with the error
{quote}java.lang.IllegalArgumentException: Unable to parse 
'/hdp/apps/$\{hdp.version\}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, 
check the setting for mapreduce.application.framework.path{quote}

Based on existing JIRA discussions in the Ambari project, it is the 
responsibility of a service to set the hdp.version Java property. When HBase 
is not installed as a service in a cluster, the hbase launcher does not set 
this property (presumably because HBase in that case is just a client and not 
a service).
The only suitable workaround found so far is to set the property as part of 
the conf/setenv.sh script.

In order to avoid hard-coding the HDP version info, the suggested change to 
setenv.sh will attempt to detect the HDP version at run-time. It should work 
for all released HDP versions from 2.2.x to 2.6.x.
In addition, it will also try to locate and set the Java native library 
path when running on top of HDP.
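The run-time detection described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not the attached patch: the /usr/hdp directory layout is assumed from standard HDP 2.2.x-2.6.x installs, and the detect_hdp_version helper and the KYLIN_EXTRA_START_OPTS wiring are illustrative names only.

```shell
# Hypothetical sketch of run-time HDP version detection for conf/setenv.sh.
# Assumption: HDP installs its stacks under /usr/hdp as version-named
# directories (e.g. 2.5.6.0-40) next to a "current" symlink.
detect_hdp_version() {
    hdp_root="${1:-/usr/hdp}"
    # Pick the first entry that looks like a version number,
    # skipping the "current" symlink.
    ls "$hdp_root" 2>/dev/null | grep -E '^[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Illustrative wiring (variable name is an assumption, not the actual patch):
hdp_version="$(detect_hdp_version)"
if [ -n "$hdp_version" ]; then
    KYLIN_EXTRA_START_OPTS="-Dhdp.version=${hdp_version}"
    # Also pick up the HDP native libraries, if present.
    native_dir="/usr/hdp/${hdp_version}/hadoop/lib/native"
    if [ -d "$native_dir" ]; then
        KYLIN_EXTRA_START_OPTS="${KYLIN_EXTRA_START_OPTS} -Djava.library.path=${native_dir}"
    fi
    export KYLIN_EXTRA_START_OPTS
fi
```

On a node without an HDP install the helper simply returns nothing and the options are left untouched, so a sketch like this is safe to source on non-HDP machines as well.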





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.
!https://issues.apache.org/jira/secure/attachment/12903336/Screen%20Shot%202017-12-21%20at%207.49.46%20PM.png!
Similar behavior can be observed even with a single cube that has a rather 
long name.

  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.



> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> !https://issues.apache.org/jira/secure/attachment/12903336/Screen%20Shot%202017-12-21%20at%207.49.46%20PM.png!
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.
[^Screen Shot 2017-12-21 at 7.49.46 PM.png]
Similar behavior can be observed even with a single cube that has a rather 
long name.


  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.



> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> [^Screen Shot 2017-12-21 at 7.49.46 PM.png]
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.


  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.
[^Screen Shot 2017-12-21 at 7.49.46 PM.png]
Similar behavior can be observed even with a single cube that has a rather 
long name.



> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.

  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

!Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.


> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> 
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.


  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

Similar behavior can be observed even with a single cube that has a rather 
long name.


> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> 
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

!Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.

  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

!attachment-name.jpg|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.


> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> !Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail!
> 
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Updated] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3127:
--
Description: 
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.

!attachment-name.jpg|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.

  was:
When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.
!Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.


> In the Insights tab, results section, make the list of Cubes hit by the query 
> either scrollable or multiline
> 
>
> Key: KYLIN-3127
> URL: https://issues.apache.org/jira/browse/KYLIN-3127
> Project: Kylin
>  Issue Type: Improvement
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Zhixiong Chen
>Priority: Minor
> Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png
>
>
> When a query hits multiple cubes or the same cube multiple times, the list of 
> cubes is truncated, as it is a single-line, non-scrollable element on the 
> page. Please refer to the enclosed screenshot.
> !attachment-name.jpg|thumbnail!
> 
> Similar behavior can be observed even with a single cube that has a rather 
> long name.





[jira] [Created] (KYLIN-3127) In the Insights tab, results section, make the list of Cubes hit by the query either scrollable or multiline

2017-12-21 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3127:
-

 Summary: In the Insights tab, results section, make the list of 
Cubes hit by the query either scrollable or multiline
 Key: KYLIN-3127
 URL: https://issues.apache.org/jira/browse/KYLIN-3127
 Project: Kylin
  Issue Type: Improvement
  Components: Web 
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0
Reporter: Vsevolod Ostapenko
Assignee: Zhixiong Chen
Priority: Minor
 Attachments: Screen Shot 2017-12-21 at 7.49.46 PM.png

When a query hits multiple cubes or the same cube multiple times, the list of 
cubes is truncated, as it is a single-line, non-scrollable element on the 
page. Please refer to the enclosed screenshot.
!Screen Shot 2017-12-21 at 7.49.46 PM.png|thumbnail!

Similar behavior can be observed even with a single cube that has a rather 
long name.





[jira] [Updated] (KYLIN-3126) Query fails with "Error while compiling generated Java code" when equality condition is used, and works when equivalent IN clause is specified

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3126:
--
Description: 
The following query fails with "Error while compiling generated Java code" 
when an equality condition is used {{(d0.year_beg_dt = '2012-01-01')}}, and 
works when the equivalent IN clause is used {{(d0.year_beg_dt in ('2012-01-01'))}}:

{code:sql}
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt = '2012-01-01'  -- blows up
-- d0.year_beg_dt in ('2012-01-01') -- works
and
d2.country in ('US', 'JP')
 group by
d2.country
{code}

  was:
The following query fails with "Error while compiling generated Java code" 
when an equality condition is used (d0.year_beg_dt = '2012-01-01') and works 
when the equivalent IN clause is used (d0.year_beg_dt in ('2012-01-01')):

 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt = '2012-01-01'  -- blows up
-- d0.year_beg_dt in ('2012-01-01') -- works
and
d2.country in ('US', 'JP')
 group by
d2.country


> Query fails with "Error while compiling generated Java code" when equality 
> condition is used, and works when equivalent IN clause is specified
> --
>
> Key: KYLIN-3126
> URL: https://issues.apache.org/jira/browse/KYLIN-3126
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0, sample cube
>Reporter: Vsevolod Ostapenko
>Assignee: liyang
>
> The following query fails with "Error while compiling generated Java code" 
> when an equality condition is used {{(d0.year_beg_dt = '2012-01-01')}}, and 
> works when the equivalent IN clause is used {{(d0.year_beg_dt in ('2012-01-01'))}}:
> {code:sql}
>  select
> d2.country,
> count(f.item_count) items_ttl
>  from
> kylin_sales f
>  join
> kylin_cal_dt d0
>  on
> f.part_dt = d0.cal_dt
>  join 
> kylin_account d1
>  on
> f.buyer_id = d1.account_id
>  join
> kylin_country d2
>  on
> d1.account_country = d2.country
>  where
> d0.year_beg_dt = '2012-01-01'  -- blows up
> -- d0.year_beg_dt in ('2012-01-01') -- works
> and
> d2.country in ('US', 'JP')
>  group by
> d2.country
> {code}





[jira] [Comment Edited] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300778#comment-16300778
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3121 at 12/22/17 12:57 AM:
--

[~yimingliu], sure, here is an equivalent query against the sample cube. It 
fails with exactly the same errors.
{code:sql}
with
t1
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'US'
 group by
d2.country
)
,
t2
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'JP'
 group by
d2.country
)
,
t3
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country in ('US', 'JP')
 group by
d2.country
)
select
   t3.country,
   t2.items_ttl,
   t3.items_ttl,
   -- 1 * t1.items_ttl   expr1, -- works
   1.0 * t1.items_ttl expr1, -- blows up, "null while executing SQL"
   -- 1 * NULLIF(t2.items_ttl, 0)   expr2  -- works
   1.0 * NULLIF(t2.items_ttl, 0) expr2  -- blows up, no error message, just "Failed"
from
   t3
left outer join
   t1
on
   t3.country = t1.country
left outer join
   t2
on
   t3.country = t2.country
{code}
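For reference, standard SQL three-valued logic says the NULLs introduced by the left outer joins should simply propagate through the floating-point expressions, yielding NULL rather than a runtime NPE. A small illustration of the expected behavior (SQLite, with hypothetical minimal tables shaped like t1/t3 above):

```python
import sqlite3

# The NULL produced by the LEFT OUTER JOIN propagates through the
# floating-point multiplication and comes back as NULL (Python None),
# instead of crashing the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t3 (country TEXT)")
conn.execute("CREATE TABLE t1 (country TEXT, items_ttl INTEGER)")
conn.executemany("INSERT INTO t3 VALUES (?)", [("US",), ("JP",)])
conn.execute("INSERT INTO t1 VALUES ('US', 7)")  # no 'JP' row -> NULL after join
rows = conn.execute(
    """
    SELECT t3.country, 1.0 * t1.items_ttl
    FROM t3 LEFT OUTER JOIN t1 ON t3.country = t1.country
    ORDER BY t3.country
    """
).fetchall()
assert rows == [("JP", None), ("US", 7.0)]
```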


was (Author: seva_ostapenko):
[~yimingliu], sure, here is an equivalent query against the sample cube. It 
fails with exactly the same errors.

with
t1
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'US'
 group by
d2.country
)
,
t2
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'JP'
 group by
d2.country
)
,
t3
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country in ('US', 'JP')
 group by
d2.country
)
select
   t3.country,
   t2.items_ttl,
   t3.items_ttl,
   -- 1 * t1.items_ttl   expr1, -- works
   1.0 * t1.items_ttl expr1, -- blows up, "null while executing SQL"
   -- 1 * NULLIF(t2.items_ttl, 0)   expr2  -- works
   1.0 * NULLIF(t2.items_ttl, 0) expr2  -- blows up, no error message, just "Failed"
from
   t3
left outer join
   t1
on
   t3.country = t1.country
left outer join
   t2
on
   t3.country = t2.country

> NPE while executing a query with two left outer joins and floating point 
> expressions on nullable fields
> ---
>
> Key: KYLIN-3121
> URL: https://issues.apache.org/jira/browse/KYLIN-3121
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: liyang
>
> Queries that include two (or more) left outer joins and contain floating 
> point expressions that operate on the fields that contain integer NULL values 
> (due to left outer join) fail in-flight with NullPointerExceptions.
> As an example, the following query generates 

[jira] [Commented] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields

2017-12-21 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300778#comment-16300778
 ] 

Vsevolod Ostapenko commented on KYLIN-3121:
---

[~yimingliu], sure, here is an equivalent query against the sample cube. It 
fails with exactly the same errors.

with
t1
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'US'
 group by
d2.country
)
,
t2
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country = 'JP'
 group by
d2.country
)
,
t3
as
(
 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt in ('2012-01-01')
and
d2.country in ('US', 'JP')
 group by
d2.country
)
select
   t3.country,
   t2.items_ttl,
   t3.items_ttl,
   -- 1 * t1.items_ttl   expr1, -- works
   1.0 * t1.items_ttl expr1, -- blows up, "null while executing SQL"
   -- 1 * NULLIF(t2.items_ttl, 0)   expr2  -- works
   1.0 * NULLIF(t2.items_ttl, 0) expr2  -- blows up, no error message, just "Failed"
from
   t3
left outer join
   t1
on
   t3.country = t1.country
left outer join
   t2
on
   t3.country = t2.country

> NPE while executing a query with two left outer joins and floating point 
> expressions on nullable fields
> ---
>
> Key: KYLIN-3121
> URL: https://issues.apache.org/jira/browse/KYLIN-3121
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: liyang
>
> Queries that include two (or more) left outer joins and contain floating 
> point expressions that operate on the fields that contain integer NULL values 
> (due to left outer join) fail in-flight with NullPointerExceptions.
> As an example, the following query generates NPE on either of the two 
> expressions:
> * 100.0 * t2.media_gap_call_count
> * 1.0 * NULLIF(t1.active_call_count, 0)
> with
> t1
> as
> (
>  select
> d1.cell_name,
> count(distinct a1.call_id) as active_call_count
>  from
> zetticsdw.a_vl_hourly_v a1
>  inner join
> zetticsdw.d_cell_v d1
>  on
> a1.cell_key = d1.cell_key
>  where
> d1.region_3 = 'Mumbai'
> and
> a1.thedate = '20171011'
> and
> a1.thehour = '00'
> and
> a1.active_call_flg = 1
> group by
> d1.cell_name
> ),
> t2
> as
> (
>  select
> d1.cell_name,
> count(distinct a1.call_id) as media_gap_call_count
>  from
> zetticsdw.a_vl_hourly_v a1
>  inner join
> zetticsdw.d_cell_v d1
>  on
> a1.cell_key = d1.cell_key
>  where
> d1.region_3 = 'Mumbai'
> and
> a1.thedate='20171011'
> and
> a1.thehour = '00'
> and
> a1.media_gap_call_flg = 1
> group by
> d1.cell_name
> )
> ,
> t3
> as
> (
>  select
> d1.cell_name,
> sum(a1.ow_call_flg)   one_way_call_count,
> sum(a1.succ_call_flg) successfull_call_count
>  from
> zetticsdw.a_vl_hourly_v a1
>  inner join
> zetticsdw.d_cell_v d1
>  on
> a1.cell_key = d1.cell_key
>  where
> d1.region_3 = 'Mumbai'
> and
> a1.thedate='20171011'
> and
> a1.thehour = '00'
> group by
> d1.cell_name
> )
> select
>t3.cell_name,
>t1.active_call_count,
>t2.media_gap_call_count,
>t3.one_way_call_count,
>t3.successfull_call_count,
> -- 100 * t2.media_gap_call_count nom, -- works
> -- 1 * NULLIF(t1.active_call_count, 0) denom -- works
> 100.0 * t2.media_gap_call_count nom, -- fails, NPE of one kind
> 1.0 * NULLIF(t1.active_call_count, 0) denom

[jira] [Created] (KYLIN-3126) Query fails with "Error while compiling generated Java code" when equality condition is used, and works when equivalent IN clause is specified

2017-12-21 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3126:
-

 Summary: Query fails with "Error while compiling generated Java 
code" when equality condition is used, and works when equivalent IN clause is 
specified
 Key: KYLIN-3126
 URL: https://issues.apache.org/jira/browse/KYLIN-3126
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0, sample cube
Reporter: Vsevolod Ostapenko
Assignee: liyang


The following query fails with "Error while compiling generated Java code", 
when equality condition is used (d0.year_beg_dt = '2012-01-01') and works when 
IN clause is used (d0.year_beg_dt in ('2012-01-01'))

 select
d2.country,
count(f.item_count) items_ttl
 from
kylin_sales f
 join
kylin_cal_dt d0
 on
f.part_dt = d0.cal_dt
 join 
kylin_account d1
 on
f.buyer_id = d1.account_id
 join
kylin_country d2
 on
d1.account_country = d2.country
 where
d0.year_beg_dt = '2012-01-01'  -- blows up
-- d0.year_beg_dt in ('2012-01-01') -- works
and
d2.country in ('US', 'JP')
 group by
d2.country





[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
 "partition_date_column": "A_VL_HOURLY_V.THEDATE",
 "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
 "partition_date_start": 0,
 "partition_date_format": "yyyyMMdd",
 "partition_time_format": "HH",
 "partition_type": "APPEND",
 "partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
},

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.
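For reference, the segment pruning the report expects for a closed-open hour filter can be sketched as a toy model over string-typed (thedate, thehour) keys, matching the partition columns above (illustrative only; this is not Kylin's pruning code):

```python
# One segment per (date, hour); hours are zero-padded strings '00'..'23',
# so lexicographic comparison matches numeric order.
segments = [(d, f"{h:02d}") for d in ("20171010", "20171011") for h in range(24)]

def prune(segments, date, hour_lo, hour_hi):
    """Segments a closed-open filter hour_lo <= thehour < hour_hi should hit."""
    return [s for s in segments if s[0] == date and hour_lo <= s[1] < hour_hi]

# thedate = '20171011' and thehour >= '10' and thehour < '11'
hit = prune(segments, "20171011", "10", "11")
assert hit == [("20171011", "10")]  # exactly one segment needs scanning
```

Under this model the filter in the report maps to a single segment; scanning hours 10 through 24 (or every later date) is strictly more work than the predicate requires.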

  was:
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
bq. "partition_desc": {
bq. "partition_date_column": "A_VL_HOURLY_V.THEDATE",
bq. "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
bq. "partition_date_start": 0,
bq. "partition_date_format": "yyyyMMdd",
bq. "partition_time_format": "HH",
bq. "partition_type": "APPEND",
bq. "partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
bq. },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
bq. "partition_desc": {
bq. "partition_date_column": "A_VL_HOURLY_V.THEDATE",
bq. "partition_time_column": "A_VL_HOURLY_V.THEHOUR",
bq. "partition_date_start": 0,
bq. "partition_date_format": "yyyyMMdd",
bq. "partition_time_format": "HH",
bq. "partition_type": "APPEND",
bq. "partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
bq. },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

  was:
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
{{"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },}}

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
{{"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },}}

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

  was:
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '10'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the filtering condition is on a specific date and a closed-open range of
hours (e.g. thedate = '20171011' and thehour >= '10' and thehour < '11'), then
in addition to sequentially scanning all the cube partitions (as described
above), Kylin will scan the HBase tables for all the hours from the specified
starting hour till the last hour of the day (e.g. hours 10 through 24, instead
of just hour 10).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the filtering condition is on a specific date but the hour interval is
specified as open-closed (e.g. thedate = '20171011' and thehour > '09' and
thehour <= '10'), Kylin will scan the HBase tables for all the later dates and
hours (e.g. from hour 10 till the most recent hour on the most recent day, which
can be hundreds of r).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

  was:
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '00'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the condition is on a specific date and a closed-open range of hours (e.g.
thedate = '20171011' and thehour >= '10' and thehour < '11'), then in addition
to sequentially scanning all the cube partitions (as described above), Kylin
will scan the HBase regions for all the hours from the starting hour till the
last hour of the day (e.g. hours 10 through 24).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the condition is on a specific date but the hour interval is specified as
open-closed (e.g. thedate = '20171011' and thehour > '10' and thehour <= '11'),
Kylin will scan all the HBase regions for all the later dates and hours (e.g.
from hour 10 till the most recent hour on the most recent day).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.


> Partition elimination algorithm seems to be 

[jira] [Updated] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump condi

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3122:
--
Description: 
Current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "yyyyMMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both the date and the hour are specified using
equality conditions (e.g. thedate = '20171011' and thehour = '00'), Kylin
sequentially iterates over all the cube segments (hundreds of them) only to skip
all except the one that actually needs to be scanned (which can be observed by
looking at the logs).
The expectation is that Kylin would use the existing information on the
partitioning columns (date and time) and the known hierarchical relation between
date and time to locate the required partition much more efficiently than a
linear scan through all the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of
partition pruning and scanning becomes illogical, which suggests bugs in the
logic.

If the condition is on a specific date and a closed-open range of hours (e.g.
thedate = '20171011' and thehour >= '10' and thehour < '11'), then in addition
to sequentially scanning all the cube partitions (as described above), Kylin
will scan the HBase regions for all the hours from the starting hour till the
last hour of the day (e.g. hours 10 through 24).
As a result, the query will run much longer than necessary and might run out of
memory, causing a JVM heap dump and a Kylin server crash.


If the condition is on a specific date but the hour interval is specified as
open-closed (e.g. thedate = '20171011' and thehour > '10' and thehour <= '11'),
Kylin will scan all the HBase regions for all the later dates and hours (e.g.
from hour 10 till the most recent hour on the most recent day).
As a result, query execution time will increase dramatically, and in most cases
the Kylin server will be terminated with an OOM error and a JVM heap dump.

  was:
The current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "MMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both date and hour are specified using an equality 
condition (e.g. thedate = '20171011' and thehour = '00'), Kylin sequentially 
iterates over all the cube segments (hundreds of them) only to skip all 
except the one that needs to be scanned (which can be observed by looking 
in the logs).
The expectation is that Kylin would use the existing info on the partitioning 
columns (date and time) and the known hierarchical relation between date and time 
to locate the required partition much more efficiently than a linear scan through all 
the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of partition 
pruning and scanning becomes illogical, which suggests bugs in the logic.

If the condition is on a specific date and a closed-open range of hours (e.g. thedate = 
'20171011' and thehour >= '10' and thehour < '11'), then in addition to sequentially 
scanning all the cube partitions (as described above), Kylin will scan HBase 
regions for all the hours from the starting hour until the last hour of the 
day (e.g. from hour 10 to 24).
As a result, the query will run much longer than necessary and might run out of 
memory.


If the condition is on a specific date but the hour interval is specified as open-closed 
(e.g. thedate = '20171011' and thehour > '10' and thehour <= '11'), Kylin will 
scan all HBase regions for all the later dates and hours (e.g. from hour 10 
until the most recent hour of the most recent day).
As a result, query execution time will increase dramatically, and in most cases the 
Kylin server will be terminated with an OOM error and a JVM heap dump.


> Partition elimination algorithm seems to be inefficient and have serious 
> issues with handling date/time ranges, can lead to very slow queries and 
> OOM/Java heap dump 

[jira] [Created] (KYLIN-3122) Partition elimination algorithm seems to be inefficient and have serious issues with handling date/time ranges, can lead to very slow queries and OOM/Java heap dump conditions

2017-12-20 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3122:
-

 Summary: Partition elimination algorithm seems to be inefficient 
and have serious issues with handling date/time ranges, can lead to very slow 
queries and OOM/Java heap dump conditions
 Key: KYLIN-3122
 URL: https://issues.apache.org/jira/browse/KYLIN-3122
 Project: Kylin
  Issue Type: Bug
  Components: Storage - HBase
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0
Reporter: Vsevolod Ostapenko
Assignee: hongbin ma


The current algorithm of cube segment elimination seems to be rather inefficient.
We are using a model where cubes are partitioned by date and time:
"partition_desc": {
"partition_date_column": "A_VL_HOURLY_V.THEDATE",
"partition_time_column": "A_VL_HOURLY_V.THEHOUR",
"partition_date_start": 0,
"partition_date_format": "MMdd",
"partition_time_format": "HH",
"partition_type": "APPEND",
"partition_condition_builder": 
"org.apache.kylin.metadata.model.PartitionDesc$DefaultPartitionConditionBuilder"
  },

Cubes contain partitions for multiple days and 24 hours for each day. Each cube 
segment corresponds to just one hour.

When a query is issued where both date and hour are specified using an equality 
condition (e.g. thedate = '20171011' and thehour = '00'), Kylin sequentially 
iterates over all the cube segments (hundreds of them) only to skip all 
except the one that needs to be scanned (which can be observed by looking 
in the logs).
The expectation is that Kylin would use the existing info on the partitioning 
columns (date and time) and the known hierarchical relation between date and time 
to locate the required partition much more efficiently than a linear scan through all 
the cube partitions.

Now, if the filtering condition is on a range of hours, the behavior of partition 
pruning and scanning becomes illogical, which suggests bugs in the logic.

If the condition is on a specific date and a closed-open range of hours (e.g. thedate = 
'20171011' and thehour >= '10' and thehour < '11'), then in addition to sequentially 
scanning all the cube partitions (as described above), Kylin will scan HBase 
regions for all the hours from the starting hour until the last hour of the 
day (e.g. from hour 10 to 24).
As a result, the query will run much longer than necessary and might run out of 
memory.


If the condition is on a specific date but the hour interval is specified as open-closed 
(e.g. thedate = '20171011' and thehour > '10' and thehour <= '11'), Kylin will 
scan all HBase regions for all the later dates and hours (e.g. from hour 10 
until the most recent hour of the most recent day).
As a result, query execution time will increase dramatically, and in most cases the 
Kylin server will be terminated with an OOM error and a JVM heap dump.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields

2017-12-20 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3121:
--
Description: 
Queries that include two (or more) left outer joins and contain floating point 
expressions that operate on the fields that contain integer NULL values (due to 
left outer join) fail in-flight with NullPointerExceptions.

As an example, the following query generates NPE on either of the two 
expressions:
* 100.0 * t2.media_gap_call_count
* 1.0 * NULLIF(t1.active_call_count, 0)

with
t1
as
(
 select
d1.cell_name,
count(distinct a1.call_id) as active_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate = '20171011'
and
a1.thehour = '00'
and
a1.active_call_flg = 1
group by
d1.cell_name
),
t2
as
(
 select
d1.cell_name,
count(distinct a1.call_id) as media_gap_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate='20171011'
and
a1.thehour = '00'
and
a1.media_gap_call_flg = 1
group by
d1.cell_name
)
,
t3
as
(
 select
d1.cell_name,
sum(a1.ow_call_flg)   one_way_call_count,
sum(a1.succ_call_flg) successfull_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate='20171011'
and
a1.thehour = '00'
group by
d1.cell_name
)
select
   t3.cell_name,
   t1.active_call_count,
   t2.media_gap_call_count,
   t3.one_way_call_count,
   t3.successfull_call_count,
   -- 100 * t2.media_gap_call_count nom,   -- works
   -- 1 * NULLIF(t1.active_call_count, 0) denom-- works
   100.0 * t2.media_gap_call_count nom, -- fails, 
NPE of one kind
   1.0 * NULLIF(t1.active_call_count, 0) denom  -- fails, 
NPE of different kind
   -- 100.0 * COALESCE(t2.media_gap_call_count, 0) nom,-- works
   -- 1.0 * CAST(NULLIF(t1.active_call_count, 0) as DOUBLE) denom  -- works
from
   t3
left outer join
   t1
on
   t3.cell_name = t1.cell_name
left outer join
   t2
on
   t3.cell_name = t2.cell_name
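The failure mode and the workarounds flagged in the query comments can be mimicked in a short sketch. This is an illustrative analogy only, with Python's None standing in for SQL NULL and a hypothetical coalesce helper standing in for SQL COALESCE; it is not Kylin code.

```python
# Illustrative analogy only: None plays the role of SQL NULL, and the
# coalesce() helper is a stand-in for SQL COALESCE. Multiplying a
# missing value by a float fails unless the NULL is handled first,
# mirroring the COALESCE/CAST workarounds marked "-- works" above.
def coalesce(value, default):
    return default if value is None else value

media_gap_call_count = None  # row has no match after the left outer join

# Direct multiplication blows up, like the expressions marked "-- fails":
try:
    nom = 100.0 * media_gap_call_count
except TypeError:
    nom = None

# Handling the NULL first makes the expression total:
nom_safe = 100.0 * coalesce(media_gap_call_count, 0)
```

The analogous fix in the SQL above is COALESCE(t2.media_gap_call_count, 0) before the floating-point multiplication, which the original reporter confirmed works.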

In the first case (multiplication of an integer field with a NULL value and a 
double), the Kylin log contains a stack trace similar to the following:
null
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
at 
org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
at 
org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
at 
org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
at 
org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
at 
org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
at 
org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
at sun.reflect.GeneratedMethodAccessor545.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at 

[jira] [Created] (KYLIN-3121) NPE while executing a query with two left outer joins and floating point expressions on nullable fields

2017-12-20 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3121:
-

 Summary: NPE while executing a query with two left outer joins and 
floating point expressions on nullable fields
 Key: KYLIN-3121
 URL: https://issues.apache.org/jira/browse/KYLIN-3121
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0
Reporter: Vsevolod Ostapenko
Assignee: liyang


Queries that include two (or more) left outer joins and contain floating point 
expressions that operate on the fields that contain integer NULL values (due to 
left outer join) fail in-flight with NullPointerExceptions.

As an example, the following query generates NPE on either of the two 
expressions:
* 100.0 * t2.media_gap_call_count
* 1.0 * NULLIF(t1.active_call_count, 0)

{{with
t1
as
(
 select
d1.cell_name,
count(distinct a1.call_id) as active_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate = '20171011'
and
a1.thehour = '00'
and
a1.active_call_flg = 1
group by
d1.cell_name
),
t2
as
(
 select
d1.cell_name,
count(distinct a1.call_id) as media_gap_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate='20171011'
and
a1.thehour = '00'
and
a1.media_gap_call_flg = 1
group by
d1.cell_name
)
,
t3
as
(
 select
d1.cell_name,
sum(a1.ow_call_flg)   one_way_call_count,
sum(a1.succ_call_flg) successfull_call_count
 from
zetticsdw.a_vl_hourly_v a1
 inner join
zetticsdw.d_cell_v d1
 on
a1.cell_key = d1.cell_key
 where
d1.region_3 = 'Mumbai'
and
a1.thedate='20171011'
and
a1.thehour = '00'
group by
d1.cell_name
)
select
   t3.cell_name,
   t1.active_call_count,
   t2.media_gap_call_count,
   t3.one_way_call_count,
   t3.successfull_call_count,
   -- 100 * t2.media_gap_call_count nom,   -- works
   -- 1 * NULLIF(t1.active_call_count, 0) denom-- works
   100.0 * t2.media_gap_call_count nom, -- fails, 
NPE of one kind
   1.0 * NULLIF(t1.active_call_count, 0) denom  -- fails, 
NPE of different kind
   -- 100.0 * COALESCE(t2.media_gap_call_count, 0) nom,-- works
   -- 1.0 * CAST(NULLIF(t1.active_call_count, 0) as DOUBLE) denom  -- works
from
   t3
left outer join
   t1
on
   t3.cell_name = t1.cell_name
left outer join
   t2
on
   t3.cell_name = t2.cell_name}}

In the first case (multiplication of an integer NULL and a double), the Kylin log 
contains a stack trace similar to the following:
null
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:156)
at 
org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
at 
org.apache.kylin.rest.service.QueryService.execute(QueryService.java:834)
at 
org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:561)
at 
org.apache.kylin.rest.service.QueryService.query(QueryService.java:181)
at 
org.apache.kylin.rest.service.QueryService.doQueryWithCache(QueryService.java:415)
at 
org.apache.kylin.rest.controller.QueryController.query(QueryController.java:78)
at sun.reflect.GeneratedMethodAccessor545.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at 
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at 
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at 
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at 
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at 

[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-19 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297537#comment-16297537
 ] 

Vsevolod Ostapenko commented on KYLIN-3114:
---

I modified kylinConfig.isInitialized() method to use angular.isString(), which 
is a more appropriate check. Updated patch is attached.

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-19 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3114:
--
Attachment: (was: KYLIN-3114.master.001.patch)

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-19 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3114:
--
Attachment: KYLIN-3114.master.002.patch

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch, KYLIN-3114.master.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Comment Edited] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-19 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297455#comment-16297455
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3114 at 12/19/17 9:31 PM:
-

Hi [~Shaofengshi],
They are completely different things.
kylin.web.query-timeout is used to set the "timeout" property on the REST API query 
action of the AngularJS QueryService controller 
(https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is 
enforced by the AngularJS framework. It's measured in milliseconds. Up to this 
point it was hardcoded to be 300_000 milliseconds (5 minutes).

kylin.query.timeout-seconds, despite its name, is not a query timeout at all, 
but a "soft" limit on how long query results can be fetched from a storage 
provider. It's measured in seconds, and it's enforced in 
SequentialCubeTupleIterator.java (btw, the check only happens on the .next() 
iterator call, so technically a query may never return and this limit will never 
be enforced).
It defaults to 0 (zero), which indicates that there is no time limit 
(technically it's Integer.MAX_VALUE/1000 seconds).

Just to summarize, those settings are completely different and apply to 
different parts of Kylin. Mine is for the Web UI, and the other one is for the 
Kylin back-end.
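The "soft" limit semantics described above (checked only on .next(), so a source that never yields is never interrupted) can be sketched as follows. Names here are illustrative, not Kylin's actual classes.

```python
import time

# Hedged sketch of a "soft" fetch limit: the deadline is tested only
# when the consumer asks for the next row, so a source that blocks
# forever without yielding is never interrupted by this check.
class SoftDeadlineIterator:
    def __init__(self, source, timeout_seconds):
        self.source = iter(source)
        # 0 means "no limit", mirroring kylin.query.timeout-seconds
        self.deadline = (time.monotonic() + timeout_seconds
                         if timeout_seconds > 0 else float("inf"))

    def __iter__(self):
        return self

    def __next__(self):
        # The check lives here, on each .next() call only.
        if time.monotonic() > self.deadline:
            raise RuntimeError("result fetch exceeded soft time limit")
        return next(self.source)

rows = list(SoftDeadlineIterator([1, 2, 3], timeout_seconds=0))
```

This also shows why the limit is "soft": enforcement depends entirely on rows continuing to arrive, exactly as noted for SequentialCubeTupleIterator.java.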


was (Author: seva_ostapenko):
They are completely different things.
kylin.web.query-timeout is used to set the "timeout" property on the REST API query 
action of the AngularJS QueryService controller 
(https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is 
enforced by the AngularJS framework. It's measured in milliseconds. Up to this 
point it was hardcoded to be 300_000 milliseconds (5 minutes).

kylin.query.timeout-seconds, despite its name, is not a query timeout at all, 
but a "soft" limit on how long query results can be fetched from a storage 
provider. It's measured in seconds, and it's enforced in 
SequentialCubeTupleIterator.java (btw, the check only happens on the .next() 
iterator call, so technically a query may never return and this limit will never 
be enforced).
It defaults to 0 (zero), which indicates that there is no time limit 
(technically it's Integer.MAX_VALUE/1000 seconds).

Just to summarize, those settings are completely different and apply to 
different parts of Kylin. Mine is for the Web UI, and the other one is for the 
Kylin back-end.

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-19 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297455#comment-16297455
 ] 

Vsevolod Ostapenko commented on KYLIN-3114:
---

They are completely different things.
kylin.web.query-timeout is used to set the "timeout" property on the REST API query 
action of the AngularJS QueryService controller 
(https://docs.angularjs.org/api/ngResource/service/$resource). This timeout is 
enforced by the AngularJS framework. It's measured in milliseconds. Up to this 
point it was hardcoded to be 300_000 milliseconds (5 minutes).

kylin.query.timeout-seconds, despite its name, is not a query timeout at all, 
but a "soft" limit on how long query results can be fetched from a storage 
provider. It's measured in seconds, and it's enforced in 
SequentialCubeTupleIterator.java (btw, the check only happens on the .next() 
iterator call, so technically a query may never return and this limit will never 
be enforced).
It defaults to 0 (zero), which indicates that there is no time limit 
(technically it's Integer.MAX_VALUE/1000 seconds).

Just to summarize, those settings are completely different and apply to 
different parts of Kylin. Mine is for the Web UI, and the other one is for the 
Kylin back-end.

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Commented] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."

2017-12-18 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295769#comment-16295769
 ] 

Vsevolod Ostapenko commented on KYLIN-3104:
---

I would also suggest changing the error message to something like "Failed to 
retrieve information about a slow-running query" as the current message is too 
generic.
Btw, the fix looks good to me, as I did exactly the same code change on my 
local copy of 2.2.x to get around this annoyance.


> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."
> --
>
> Key: KYLIN-3104
> URL: https://issues.apache.org/jira/browse/KYLIN-3104
> Project: Kylin
>  Issue Type: Bug
>  Components: General, Web 
>Affects Versions: v2.3.0
>Reporter: peng.jianhua
>Assignee: peng.jianhua
> Attachments: 
> 0001-KYLIN-3104-When-the-user-log-out-from-Monitor-page-a.patch, 
> alert_dialog_will_pop_up_when_log_out_from_Monitor_page.PNG
>
>
> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."





[jira] [Issue Comment Deleted] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."

2017-12-18 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3104:
--
Comment: was deleted

(was: I would also suggest changing the error message to something like "Failed 
to retrieve information about a slow-running query" as the current message is 
too generic.
Btw, the fix looks good to me, as I did exactly the same code change on my 
local copy of 2.2.x to get around this annoyance.
)

> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."
> --
>
> Key: KYLIN-3104
> URL: https://issues.apache.org/jira/browse/KYLIN-3104
> Project: Kylin
>  Issue Type: Bug
>  Components: General, Web 
>Affects Versions: v2.3.0
>Reporter: peng.jianhua
>Assignee: peng.jianhua
> Attachments: 
> 0001-KYLIN-3104-When-the-user-log-out-from-Monitor-page-a.patch, 
> alert_dialog_will_pop_up_when_log_out_from_Monitor_page.PNG
>
>
> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."





[jira] [Commented] (KYLIN-3104) When the user log out from "Monitor" page, an alert dialog will pop up warning "Failed to load query."

2017-12-18 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295766#comment-16295766
 ] 

Vsevolod Ostapenko commented on KYLIN-3104:
---

I would also suggest changing the error message to something like "Failed to 
retrieve information about a slow-running query" as the current message is too 
generic.
Btw, the fix looks good to me, as I did exactly the same code change on my 
local copy of 2.2.x to get around this annoyance.


> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."
> --
>
> Key: KYLIN-3104
> URL: https://issues.apache.org/jira/browse/KYLIN-3104
> Project: Kylin
>  Issue Type: Bug
>  Components: General, Web 
>Affects Versions: v2.3.0
>Reporter: peng.jianhua
>Assignee: peng.jianhua
> Attachments: 
> 0001-KYLIN-3104-When-the-user-log-out-from-Monitor-page-a.patch, 
> alert_dialog_will_pop_up_when_log_out_from_Monitor_page.PNG
>
>
> When the user log out from "Monitor" page, an alert dialog will pop up 
> warning "Failed to load query."





[jira] [Comment Edited] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-18 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295202#comment-16295202
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3114 at 12/18/17 4:26 PM:
-

[~Shaofengshi], could you please look at the changes or perhaps ask the right 
person to do that and provide feedback?
Thanks in advance,
Vsevolod.


was (Author: seva_ostapenko):
[~Shaofengshi], could you please look at the changes or perhaps the right 
person to do that and provide feedback?
Thanks in advance,
Vsevolod.

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-18 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295202#comment-16295202
 ] 

Vsevolod Ostapenko commented on KYLIN-3114:
---

[~Shaofengshi], could you please look at the changes or perhaps ask the right 
person to do that and provide feedback?
Thanks in advance,
Vsevolod.

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard codes timeout for the queries submitted via Web UI to 
> be 300_000 milliseconds.
> Depending on the situation, the default value can be either too large, or too 
> small, especially when query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Commented] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge

2017-12-15 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293357#comment-16293357
 ] 

Vsevolod Ostapenko commented on KYLIN-3069:
---

[~peng.jianhua], I believe that instead of using
time = moment(item).tz(timezone).format(format) + " (" + timezone + ")";

it should be 
time = moment(item).tz(timezone).format(format + " z");

or formats should include the short time zone name element, e.g.
format = "YYYY-MM-DD HH:mm:ss z";

> Add proper time zone support to the WebUI instead of GMT/PST kludge
> ---
>
> Key: KYLIN-3069
> URL: https://issues.apache.org/jira/browse/KYLIN-3069
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.3, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: peng.jianhua
>Priority: Minor
> Attachments: 
> 0001-KYLIN-3069-Add-proper-time-zone-support-to-the-WebUI.patch, Screen Shot 
> 2017-12-05 at 10.01.39 PM.png, kylin_pic1.png, kylin_pic2.png, kylin_pic3.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Time zone handling logic in the WebUI is a kludge, coded to parse only 
> "GMT-N" time zone specifications and defaulting to PST, if parsing is not 
> successful (kylin/webapp/app/js/filters/filter.js)
> Integrating moment and moment time zone (http://momentjs.com/timezone/docs/) 
> into the product, would allow correct time zone handling.
> For the users who happen to reside in the geographical locations that do 
> observe daylight saving time, usage of the GMT-N format is very inconvenient 
> and info reported by the UI in various places is perplexing.
> Needless to say that the GMT moniker itself is long deprecated.





[jira] [Commented] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-15 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293300#comment-16293300
 ] 

Vsevolod Ostapenko commented on KYLIN-3114:
---

I attached the patch for the proposed enhancement. Tested it internally and it 
seems to work as expected. 
Properties will be reloaded by QueryService only if kylinConfig has not yet been 
initialized (the use case for that is when a user hits refresh while on the 
"Insights" tab or navigates directly to :7070/kylin/query URL in 
their browser).

Please review and provide feedback.
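For illustration, the intended usage in kylin.properties would look something like this (the key name below is hypothetical; the actual property name is defined in the attached patch):

```properties
# Hypothetical key shown for illustration only -- see the attached patch
# for the actual property name. Overrides the 300_000 ms default that is
# currently hard-coded in query.js.
kylin.web.query-timeout=300000
```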

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard-codes the timeout for queries submitted via the Web UI 
> at 300_000 milliseconds.
> Depending on the situation, the default value can be either too large or too 
> small, especially when a query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Updated] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-15 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3114:
--
Attachment: KYLIN-3114.master.001.patch

> Make timeout for the queries submitted through the Web UI configurable
> --
>
> Key: KYLIN-3114
> URL: https://issues.apache.org/jira/browse/KYLIN-3114
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
> Fix For: v2.3.0
>
> Attachments: KYLIN-3114.master.001.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Currently query.js hard-codes the timeout for queries submitted via the Web UI 
> at 300_000 milliseconds.
> Depending on the situation, the default value can be either too large or too 
> small, especially when a query does not hit any cube and is passed through to 
> Hive or Impala.
> Query timeout should be made configurable via kylin.properties.





[jira] [Created] (KYLIN-3114) Make timeout for the queries submitted through the Web UI configurable

2017-12-15 Thread Vsevolod Ostapenko (JIRA)
Vsevolod Ostapenko created KYLIN-3114:
-

 Summary: Make timeout for the queries submitted through the Web UI 
configurable
 Key: KYLIN-3114
 URL: https://issues.apache.org/jira/browse/KYLIN-3114
 Project: Kylin
  Issue Type: Bug
  Components: Web 
Affects Versions: v2.2.0
 Environment: HDP 2.5.6, Kylin 2.2.0
Reporter: Vsevolod Ostapenko
Assignee: Vsevolod Ostapenko
Priority: Minor
 Fix For: v2.3.0


Currently query.js hard-codes the timeout for queries submitted via the Web UI 
at 300_000 milliseconds.
Depending on the situation, the default value can be either too large or too 
small, especially when a query does not hit any cube and is passed through to 
Hive or Impala.
Query timeout should be made configurable via kylin.properties.






[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format

2017-12-08 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16284311#comment-16284311
 ] 

Vsevolod Ostapenko commented on KYLIN-3070:
---

[~yimingliu] or [~Shaofengshi], could one of you review my changes and 
provide feedback or, if the changes are OK, commit them to master?

> Add a config property for flat table storage format
> ---
>
> Key: KYLIN-3070
> URL: https://issues.apache.org/jira/browse/KYLIN-3070
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: newbie
> Attachments: KYLIN-3070.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Flat table storage format is currently hard-coded as SEQUENCEFILE in the 
> core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java
> That prevents using Impala as a SQL engine while using beeline CLI (via 
> custom JDBC URL), as Impala cannot write sequence files.
> Adding a parameter to kylin.properties to override the default setting would 
> address the issue.
> Removing the hard-coded value for the storage format might be a good idea in 
> and of itself.





[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format

2017-12-06 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281336#comment-16281336
 ] 

Vsevolod Ostapenko commented on KYLIN-3070:
---

The patch file is attached, please review. Let me know if you have any questions 
or comments.

> Add a config property for flat table storage format
> ---
>
> Key: KYLIN-3070
> URL: https://issues.apache.org/jira/browse/KYLIN-3070
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: newbie
> Attachments: KYLIN-3070.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Flat table storage format is currently hard-coded as SEQUENCEFILE in the 
> core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java
> That prevents using Impala as a SQL engine while using beeline CLI (via 
> custom JDBC URL), as Impala cannot write sequence files.
> Adding a parameter to kylin.properties to override the default setting would 
> address the issue.
> Removing the hard-coded value for the storage format might be a good idea in 
> and of itself.





[jira] [Updated] (KYLIN-3070) Add a config property for flat table storage format

2017-12-06 Thread Vsevolod Ostapenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vsevolod Ostapenko updated KYLIN-3070:
--
Attachment: KYLIN-3070.master.001.patch

> Add a config property for flat table storage format
> ---
>
> Key: KYLIN-3070
> URL: https://issues.apache.org/jira/browse/KYLIN-3070
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Vsevolod Ostapenko
>Priority: Minor
>  Labels: newbie
> Attachments: KYLIN-3070.master.001.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Flat table storage format is currently hard-coded as SEQUENCEFILE in the 
> core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java
> That prevents using Impala as a SQL engine while using beeline CLI (via 
> custom JDBC URL), as Impala cannot write sequence files.
> Adding a parameter to kylin.properties to override the default setting would 
> address the issue.
> Removing the hard-coded value for the storage format might be a good idea in 
> and of itself.





[jira] [Commented] (KYLIN-3070) Add a config property for flat table storage format

2017-12-06 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281003#comment-16281003
 ] 

Vsevolod Ostapenko commented on KYLIN-3070:
---

I made a fix and tested it on my copy of the master branch.
My version of the fix introduces two new parameters in kylin.properties:
* kylin.source.hive.flat-table-storage-format, which defaults to SEQUENCEFILE
* kylin.source.hive.flat-table-field-delimiter, which defaults to \u001F (Unit 
separator, the same default field separator that Hive uses)

I tested my changes internally and confirmed that they are working as expected.
Btw, while making the change I found a problem with the existing handling of the 
TEXTFILE field separator - namely, the value was always fetched from 
kylin.source.jdbc.field-delimiter (apparently a kludge), which technically has 
no direct relation to the flat table, so introducing 
kylin.source.hive.flat-table-field-delimiter seems warranted.
If you don't have changes ready, please reassign this JIRA ticket to me.
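For reference, the two properties described above would appear in kylin.properties as follows (the values shown are the defaults stated above; the TEXTFILE override is an illustrative example for the Impala use case):

```properties
# Defaults introduced by the patch
kylin.source.hive.flat-table-storage-format=SEQUENCEFILE
kylin.source.hive.flat-table-field-delimiter=\u001F

# Illustrative override, e.g. when Impala is used to write the flat table:
# kylin.source.hive.flat-table-storage-format=TEXTFILE
```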

> Add a config property for flat table storage format
> ---
>
> Key: KYLIN-3070
> URL: https://issues.apache.org/jira/browse/KYLIN-3070
> Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>Affects Versions: v2.2.0
> Environment: HDP 2.5.6, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: Rong H
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Flat table storage format is currently hard-coded as SEQUENCEFILE in the 
> core-job/src/main/java/org/apache/kylin/job/JoinedFlatTable.java
> That prevents using Impala as a SQL engine while using beeline CLI (via 
> custom JDBC URL), as Impala cannot write sequence files.
> Adding a parameter to kylin.properties to override the default setting would 
> address the issue.
> Removing the hard-coded value for the storage format might be a good idea in 
> and of itself.





[jira] [Comment Edited] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge

2017-12-05 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279656#comment-16279656
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3069 at 12/6/17 4:45 AM:


Hi [~peng.jianhua], here is my use case.
I have kylin.web.timezone set to America/New_York in my kylin.properties.
The time zone is a perfectly valid canonical time zone name. The JVM has no issues 
recognizing it as such. As a result, all times formatted in Java on the server 
carry the correct short time zone moniker (EST) - note the job names in the attached 
screenshot.
!https://issues.apache.org/jira/secure/attachment/12900799/Screen%20Shot%202017-12-05%20at%2010.01.39%20PM.png!
At the same time, since Web UI code does not handle time zone names correctly, 
UI defaults to using PST when formatting time values - again this can be seen 
in the same screenshot in the "Last Modified Time" column.
My expectation is that once moment/moment time zone are integrated, canonical 
time zone names will be recognized properly and the correct 3-letter abbreviated 
time zone name will be used when formatting time values.
So, once the issue is corrected, "Last Modified Time" will show times in the EST 
time zone.

I suppose that after reading and checking the time zone settings, the Web UI 
should internally carry around an object with at least three attributes: the 
original tz name specified in kylin.properties, the 3-letter abbreviated tz 
name, and the tz offset from UTC (the last two retrieved by calling moment time 
zone functions).

Moreover, if the time zone name happens to be incorrect (or not yet supported 
by moment time zone), the Web UI code should default to UTC instead of PST. 
Also, since GMT has been deprecated, all references to GMT (if any are left 
after integrating moment time zone support) should be replaced with UTC.
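To make the suggestion concrete, here is a minimal sketch of such a descriptor. The shape and helper name are hypothetical; with moment-timezone loaded, the last two fields would be derived via `moment.tz(name).zoneAbbr()` and `moment.tz(name).utcOffset()`.

```javascript
// Hypothetical shape of the time-zone descriptor the Web UI could carry
// around. With moment-timezone loaded, abbr and utcOffset would come from
// moment.tz(name).zoneAbbr() and moment.tz(name).utcOffset().
function makeZoneDescriptor(name, abbr, utcOffsetMinutes) {
  return {
    name: name,                  // canonical name from kylin.properties
    abbr: abbr,                  // 3-letter abbreviation, e.g. "EST"
    utcOffset: utcOffsetMinutes  // offset from UTC, in minutes
  };
}

var nyWinter = makeZoneDescriptor('America/New_York', 'EST', -300);
console.log(nyWinter.abbr + ' (UTC' + (nyWinter.utcOffset / 60) + ')');
```

Carrying all three attributes at once lets the formatting code pick whichever representation a given UI element needs without re-resolving the zone.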


was (Author: seva_ostapenko):
Hi [~peng.jianhua], here is my use case.
I have kylin.web.timezone set to America/New_York in my kylin.properties.
The time zone is a perfectly valid canonical time zone name. The JVM has no issues 
recognizing it as such. As a result, all times formatted in Java on the server 
carry the correct short time zone moniker (EST) - note the job names in the attached 
screenshot.
!https://issues.apache.org/jira/secure/attachment/12900799/Screen%20Shot%202017-12-05%20at%2010.01.39%20PM.png!
At the same time, since Web UI code does not handle time zone names correctly, 
UI defaults to using PST when formatting time values - again this can be seen 
in the same screenshot in the "Last Modified Time" column.
My expectation is that when moment/moment time zone are integrated, canonical 
time zone names will be recognized properly and correct 3-letter time zone 
abbreviated name would be used while formatting time values.
I suppose internally UI should carry around an object with at least three 
attributes - original tz name specified in the kylin.properties, 3-letter 
abbreviated tz name and tz offset from UTC.

Moreover, if the time zone name is incorrect, the Web UI should default to UTC 
instead of PST. Also, since GMT has been deprecated, all references to GMT (if 
any are left after integrating moment time zone support) should be replaced 
with UTC.

> Add proper time zone support to the WebUI instead of GMT/PST kludge
> ---
>
> Key: KYLIN-3069
> URL: https://issues.apache.org/jira/browse/KYLIN-3069
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.3, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: peng.jianhua
>Priority: Minor
> Attachments: Screen Shot 2017-12-05 at 10.01.39 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Time zone handling logic in the WebUI is a kludge, coded to parse only 
> "GMT-N" time zone specifications and defaulting to PST, if parsing is not 
> successful (kylin/webapp/app/js/filters/filter.js)
> Integrating moment and moment time zone (http://momentjs.com/timezone/docs/) 
> into the product, would allow correct time zone handling.
> For the users who happen to reside in the geographical locations that do 
> observe daylight saving time, usage of the GMT-N format is very inconvenient 
> and info reported by the UI in various places is perplexing.
> Needless to say that the GMT moniker itself is long deprecated.





[jira] [Comment Edited] (KYLIN-3069) Add proper time zone support to the WebUI instead of GMT/PST kludge

2017-12-05 Thread Vsevolod Ostapenko (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279656#comment-16279656
 ] 

Vsevolod Ostapenko edited comment on KYLIN-3069 at 12/6/17 4:38 AM:


Hi [~peng.jianhua], here is my use case.
I have kylin.web.timezone set to America/New_York in my kylin.properties.
The time zone is a perfectly valid canonical time zone name. The JVM has no issues 
recognizing it as such. As a result, all times formatted in Java on the server 
carry the correct short time zone moniker (EST) - note the job names in the attached 
screenshot.!Screen Shot 2017-12-05 at 10.01.39 PM.png|thumbnail!
At the same time, since Web UI code does not handle time zone names correctly, 
UI defaults to using PST when formatting time values - again this can be seen 
in the same screenshot in the "Last Modified Time" column.
My expectation is that when moment/moment time zone are integrated, canonical 
time zone names will be recognized properly and correct 3-letter time zone 
abbreviated name would be used while formatting time values.
I suppose internally UI should carry around an object with at least three 
attributes - original tz name specified in the kylin.properties, 3-letter 
abbreviated tz name and tz offset from UTC.

Moreover, if the time zone name is incorrect, the Web UI should default to UTC 
instead of PST. Also, since GMT has been deprecated, all references to GMT (if 
any are left after integrating moment time zone support) should be replaced 
with UTC.


was (Author: seva_ostapenko):
Hi [~peng.jianhua], here is my use case.
I have kylin.web.timezone set to America/New_York in my kylin.properties.
The time zone is a perfectly valid canonical time zone name. The JVM has no issues 
recognizing it as such. As a result, all times formatted in Java on the server 
carry the correct short time zone moniker (EST) - note the job names in the attached 
screenshot.
At the same time, since Web UI code does not handle time zone names correctly, 
UI defaults to using PST when formatting time values - again this can be seen 
in the same screenshot in the "Last Modified Time" column.
My expectation is that when moment/moment time zone are integrated, canonical 
time zone names will be recognized properly and correct 3-letter time zone 
abbreviated name would be used while formatting time values.
I suppose internally UI should carry around an object with at least three 
attributes - original tz name specified in the kylin.properties, 3-letter 
abbreviated tz name and tz offset from UTC.

Moreover, if the time zone name is incorrect, the Web UI should default to UTC 
instead of PST. Also, since GMT has been deprecated, all references to GMT (if 
any are left after integrating moment time zone support) should be replaced 
with UTC.

> Add proper time zone support to the WebUI instead of GMT/PST kludge
> ---
>
> Key: KYLIN-3069
> URL: https://issues.apache.org/jira/browse/KYLIN-3069
> Project: Kylin
>  Issue Type: Bug
>  Components: Web 
>Affects Versions: v2.2.0
> Environment: HDP 2.5.3, Kylin 2.2.0
>Reporter: Vsevolod Ostapenko
>Assignee: peng.jianhua
>Priority: Minor
> Attachments: Screen Shot 2017-12-05 at 10.01.39 PM.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Time zone handling logic in the WebUI is a kludge, coded to parse only 
> "GMT-N" time zone specifications and defaulting to PST, if parsing is not 
> successful (kylin/webapp/app/js/filters/filter.js)
> Integrating moment and moment time zone (http://momentjs.com/timezone/docs/) 
> into the product, would allow correct time zone handling.
> For the users who happen to reside in the geographical locations that do 
> observe daylight saving time, usage of the GMT-N format is very inconvenient 
> and info reported by the UI in various places is perplexing.
> Needless to say that the GMT moniker itself is long deprecated.




