[jira] [Created] (KYLIN-2192) More Robust Global Dictionary
Yerui Sun created KYLIN-2192:

Summary: More Robust Global Dictionary
Key: KYLIN-2192
URL: https://issues.apache.org/jira/browse/KYLIN-2192
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Affects Versions: v1.5.4.1
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.6.0

The global dictionary was released over two months ago, and I've received some feedback and bug reports. Here's a patch to make the global dictionary more robust, including some functional improvements:

* Break through the 255-byte limit on value length; a value length under 8K is still recommended, to avoid stack overflow errors;
* Fix the 'Value not exists' or stack overflow error when the dict size is larger than 1GB; the root cause is similar to KYLIN-1834. A check tool is also provided to check whether existing dict data is corrupted;
* Support parallel dictionary building in one job server, used for parallel segment building.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates
Yerui Sun created KYLIN-2088:

Summary: Support intersect count for calculation of retention or conversion rates
Key: KYLIN-2088
URL: https://issues.apache.org/jira/browse/KYLIN-2088
Project: Kylin
Issue Type: New Feature
Components: Query Engine
Reporter: Yerui Sun
Assignee: Yerui Sun

Retention and conversion rates are very important in data analysis. Such a rate can be calculated from two datasets that correspond to two different values of one dimension. For example, suppose we have a count distinct measure, such as uv (a dataset of uuid), and one dimension, such as date; the retention of uv between '20161015' and '20161016' is then the intersection of the two uv datasets.

Fortunately, Kylin already implements such a dataset, as a bitmap, for precise count distinct. Only a UDAF is needed to calculate the intersection of two or more bitmaps. I'll work on this and post a patch later.
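The intersection idea described above can be sketched roughly as follows. This is only an illustration: java.util.BitSet stands in for Kylin's bitmap, the ids are assumed to be dictionary-encoded integers, and the class and method names are hypothetical rather than the proposed UDAF.

```java
import java.util.BitSet;

// Sketch only: BitSet stands in for Kylin's bitmap; all names are hypothetical.
class IntersectCountSketch {
    // Builds a bitmap from (assumed) dictionary-encoded user ids.
    static BitSet bitmapOf(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Counts the ids that are present in every given bitmap.
    static int intersectCount(BitSet... bitmaps) {
        if (bitmaps.length == 0) {
            return 0;
        }
        BitSet result = (BitSet) bitmaps[0].clone(); // keep the inputs unchanged
        for (int i = 1; i < bitmaps.length; i++) {
            result.and(bitmaps[i]);
        }
        return result.cardinality();
    }

    public static void main(String[] args) {
        BitSet day1 = bitmapOf(1, 2, 3, 4); // uv on '20161015'
        BitSet day2 = bitmapOf(3, 4, 5);    // uv on '20161016'
        int retained = intersectCount(day1, day2);
        System.out.println("retained uv = " + retained);
        System.out.println("retention rate = " + (double) retained / day1.cardinality());
    }
}
```

Dividing the intersect count by the cardinality of the first day's bitmap gives the retention rate for that pair of dates.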
[jira] [Created] (KYLIN-1934) 'Value not exist' During Cube Merging Caused by Empty Dict
Yerui Sun created KYLIN-1934:

Summary: 'Value not exist' During Cube Merging Caused by Empty Dict
Key: KYLIN-1934
URL: https://issues.apache.org/jira/browse/KYLIN-1934
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v1.5.4
Reporter: Yerui Sun
Assignee: Yerui Sun
Priority: Critical
Fix For: v1.5.4

When cubes are merged, a new dictionary is created that consists of all values in the old dictionaries. The values in the old dicts are enumerated by MultipleDictionaryValueEnumerator. However, if the first dict is empty, Enumerator.moveNext() returns false immediately and ignores all values in the other dicts, so the new dict is also empty. The cube merge then fails because the new dict contains no values.

Not sure whether this issue is related to KYLIN-1834 or not.
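A minimal sketch of the intended behaviour, assuming plain lists in place of dictionaries: moveNext() keeps advancing past empty dictionaries instead of giving up on the first one. The class below is hypothetical and is not Kylin's MultipleDictionaryValueEnumerator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Sketch only: lists of strings stand in for dictionaries; names are hypothetical.
class MultiDictEnumeratorSketch {
    private final Iterator<List<String>> dicts;
    private Iterator<String> current = Collections.emptyIterator();
    private String value;

    MultiDictEnumeratorSketch(List<List<String>> dictionaries) {
        this.dicts = dictionaries.iterator();
    }

    // Advances to the next value, skipping any dictionary that is empty.
    boolean moveNext() {
        while (!current.hasNext()) {
            if (!dicts.hasNext()) {
                return false; // every dictionary has been exhausted
            }
            current = dicts.next().iterator();
        }
        value = current.next();
        return true;
    }

    String current() {
        return value;
    }

    // Collects all values in order, for demonstration.
    static List<String> drain(List<List<String>> dictionaries) {
        MultiDictEnumeratorSketch e = new MultiDictEnumeratorSketch(dictionaries);
        List<String> all = new ArrayList<>();
        while (e.moveNext()) {
            all.add(e.current());
        }
        return all;
    }
}
```

The loop in moveNext() is the point of the sketch: an empty first (or middle) dictionary only causes a move to the next one, and false is returned only when no dictionaries remain.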
[jira] [Created] (KYLIN-1910) Support Separate HBase Cluster with NN HA and Kerberos Authentication
Yerui Sun created KYLIN-1910:

Summary: Support Separate HBase Cluster with NN HA and Kerberos Authentication
Key: KYLIN-1910
URL: https://issues.apache.org/jira/browse/KYLIN-1910
Project: Kylin
Issue Type: Improvement
Components: Job Engine
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun

Since KYLIN-957, we've supported separate HBase cluster deployment. However, an HBase cluster with NameNode HA and Kerberos authentication is not fully supported.

The key point is that the cube building MR job needs to access two HDFS clusters: the main cluster, which stores the source data, and the HBase cluster, which stores the cube result data. The MR task can't access the HBase cluster through an HA-qualified path, like hdfs://hbase-cluster:8020/path, because it can't find the dfs.nameservices related configs of the HBase cluster. To solve this problem, we need to read the HBase cluster's HA configs and set them into the job conf.

Another point is about authentication tokens. During job submission, the Resource Manager tries to renew all tokens, to ensure they can be kept alive. Renewing the HDFS token of the HBase cluster causes an exception, because the RM can't find the HBase cluster's HA configs. This problem can be resolved with YARN-3021 support.
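The first point, reading the HBase cluster's HA configs and setting them into the job conf, might look roughly like this. It is a sketch under assumptions: plain maps stand in for Hadoop Configuration objects, the host names are made up, and the set of keys copied (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*, dfs.client.failover.proxy.provider.*) is the usual HDFS HA client set rather than anything confirmed by the patch.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: maps stand in for Hadoop Configuration; names and hosts are hypothetical.
class HaConfMergeSketch {
    // Copies the HA client settings of the HBase cluster into the job conf.
    static void mergeHaConf(Map<String, String> hbaseConf, Map<String, String> jobConf) {
        String remote = hbaseConf.get("dfs.nameservices");
        if (remote == null || remote.trim().isEmpty()) {
            return; // no nameservices defined, nothing to merge
        }
        Set<String> nameservices = new LinkedHashSet<>();
        String local = jobConf.get("dfs.nameservices");
        if (local != null && !local.trim().isEmpty()) {
            for (String ns : local.split(",")) {
                nameservices.add(ns.trim());
            }
        }
        for (String raw : remote.split(",")) {
            String ns = raw.trim();
            nameservices.add(ns);
            copy(hbaseConf, jobConf, "dfs.client.failover.proxy.provider." + ns);
            copy(hbaseConf, jobConf, "dfs.ha.namenodes." + ns);
            String namenodes = hbaseConf.get("dfs.ha.namenodes." + ns);
            if (namenodes == null) {
                continue;
            }
            for (String nn : namenodes.split(",")) {
                copy(hbaseConf, jobConf, "dfs.namenode.rpc-address." + ns + "." + nn.trim());
            }
        }
        // The job must know both clusters' nameservices.
        jobConf.put("dfs.nameservices", String.join(",", nameservices));
    }

    private static void copy(Map<String, String> from, Map<String, String> to, String key) {
        String value = from.get(key);
        if (value != null) {
            to.put(key, value);
        }
    }

    // Small demonstration with made-up cluster names and hosts.
    static Map<String, String> demo() {
        Map<String, String> hbaseConf = new LinkedHashMap<>();
        hbaseConf.put("dfs.nameservices", "hbase-cluster");
        hbaseConf.put("dfs.ha.namenodes.hbase-cluster", "nn1,nn2");
        hbaseConf.put("dfs.namenode.rpc-address.hbase-cluster.nn1", "nn1.example.com:8020");
        hbaseConf.put("dfs.namenode.rpc-address.hbase-cluster.nn2", "nn2.example.com:8020");
        hbaseConf.put("dfs.client.failover.proxy.provider.hbase-cluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("dfs.nameservices", "main-cluster");
        mergeHaConf(hbaseConf, jobConf);
        return jobConf;
    }
}
```

The token renewal problem is a separate matter and is not covered by this sketch.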
[jira] [Created] (KYLIN-1905) Wrong Default Date in Cube Build Web UI
Yerui Sun created KYLIN-1905:

Summary: Wrong Default Date in Cube Build Web UI
Key: KYLIN-1905
URL: https://issues.apache.org/jira/browse/KYLIN-1905
Project: Kylin
Issue Type: Bug
Affects Versions: v1.5.2
Reporter: Yerui Sun
Attachments: screenshot-1.png

When building a cube from the Web UI, there's a confirm dialog to select the start/end date. However, the default date is one month later than the current date. For example, when today is in 2016-07, the default date shows '2016-Aug'.
[jira] [Created] (KYLIN-1894) GlobalDictionary may corrupt when server suddenly crash
Yerui Sun created KYLIN-1894:

Summary: GlobalDictionary may corrupt when server suddenly crash
Key: KYLIN-1894
URL: https://issues.apache.org/jira/browse/KYLIN-1894
Project: Kylin
Issue Type: Improvement
Components: Metadata
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.5.3

Global Dictionary stores its data on HDFS directly, and overwrites the data file in place when it is updated. If the server crashes suddenly while writing the file, the data file may be corrupted and can't be recovered.

To resolve this problem, copy the data file into a tmp directory and copy it back after the file is updated successfully. I'll post a patch later with this solution.
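The general idea, never writing over the live file directly, can be sketched as below. This is an assumption-laden illustration: it uses the local filesystem through java.nio.file instead of HDFS, writes the new content to a sibling temp file, and then moves it over the target; the class name and the ".tmp" suffix are hypothetical, and the actual patch may do this differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: local files stand in for HDFS; names are hypothetical.
class SafeOverwriteSketch {
    // Writes the new content to a temp file first, then replaces the target.
    static void safeOverwrite(Path target, byte[] newContent) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        // A crash during this write damages only the temp file, not the target.
        Files.write(tmp, newContent);
        // Swap the finished file into place.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Overwrites a file twice in a fresh temp directory and returns its final content.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dict-sketch");
            Path data = dir.resolve("dict.data");
            safeOverwrite(data, "v1".getBytes(StandardCharsets.UTF_8));
            safeOverwrite(data, "v2".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(data), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether the final swap is atomic depends on the filesystem; on HDFS a rename would play the role of Files.move here.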
[jira] [Created] (KYLIN-1803) ExtendedColumn Measure Encoding with Non-ascii Characters
Yerui Sun created KYLIN-1803:

Summary: ExtendedColumn Measure Encoding with Non-ascii Characters
Key: KYLIN-1803
URL: https://issues.apache.org/jira/browse/KYLIN-1803
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v1.5.2, v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.5.3

The ExtendedColumn measure ingests data by converting a String to a byte array. The current conversion can't deal with non-ASCII characters properly. For example, the Chinese characters '北京' were converted to '??' rather than to a UTF-8 byte array.
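The symptom can be reproduced in a few lines. In this sketch US-ASCII is used only to reproduce the '??' output; which charset the ExtendedColumn code actually used is not stated in the report, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: shows lossy vs. UTF-8 encoding of the same string.
class Utf8EncodingSketch {
    // Encodes with a charset that cannot represent Chinese characters.
    static byte[] encodeLossy(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Encodes with an explicit UTF-8 charset, independent of the platform default.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String city = "\u5317\u4eac"; // the two characters from the report, written as escapes
        System.out.println(new String(encodeLossy(city), StandardCharsets.US_ASCII));
        System.out.println(encodeUtf8(city).length + " bytes in UTF-8");
        System.out.println(decodeUtf8(encodeUtf8(city)).equals(city));
    }
}
```

Passing the charset explicitly on both the encoding and decoding side keeps the round trip lossless regardless of the JVM's default encoding.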
[jira] [Created] (KYLIN-1775) Add Cube Migrate Support for Global Dictionary
Yerui Sun created KYLIN-1775:

Summary: Add Cube Migrate Support for Global Dictionary
Key: KYLIN-1775
URL: https://issues.apache.org/jira/browse/KYLIN-1775
Project: Kylin
Issue Type: Improvement
Components: Metadata
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun

Since KYLIN-1705, we've introduced the global dictionary. The global dictionary serializes dict data into HDFS storage directly, instead of saving it in the HBase resource store. However, when a cube is migrated from one metadata to another, the global dict data isn't copied to the new metadata.
[jira] [Created] (KYLIN-1762) Query threw NPE with 3 or more join conditions
Yerui Sun created KYLIN-1762:

Summary: Query threw NPE with 3 or more join conditions
Key: KYLIN-1762
URL: https://issues.apache.org/jira/browse/KYLIN-1762
Project: Kylin
Issue Type: Bug
Components: Query Engine
Affects Versions: v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun

Here's an example to reproduce the error with the Kylin sample data:
{code}
select t1.leaf_categ_id, max_price, min_price, sum_price
from (select leaf_categ_id, sum(price) as sum_price from kylin_sales group by leaf_categ_id) t1
join (select leaf_categ_id, max(price) as max_price from kylin_sales group by leaf_categ_id) t2
  on t1.leaf_categ_id = t2.leaf_categ_id
join (select leaf_categ_id, min(price) as min_price from kylin_sales group by leaf_categ_id) t3
  on t1.leaf_categ_id = t3.leaf_categ_id
order by t1.leaf_categ_id
{code}

And here's the error stack:
{code}
Caused by: java.lang.NullPointerException: null
	at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:104)
	at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
	at org.apache.kylin.query.relnode.OLAPSortRel.implementOLAP(OLAPSortRel.java:68)
	at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
	at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:69)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
	at org.apache.calcite.adapter.enumerable.EnumerableSort.implement(EnumerableSort.java:70)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:102)
	at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
	at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1171)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:297)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:196)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:721)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:588)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:558)
	at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
	at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:573)
	at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
	at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
{code}

In OLAPJoinRel.implementOLAP, a context is allocated only by the root join node, but is deleted in every join node, including child join nodes. In other words, the number of context allocations doesn't match the number of deletions. As a result, the parent node of the join gets an empty context and throws an NPE.
[jira] [Created] (KYLIN-1732) Support Window Function
Yerui Sun created KYLIN-1732:

Summary: Support Window Function
Key: KYLIN-1732
URL: https://issues.apache.org/jira/browse/KYLIN-1732
Project: Kylin
Issue Type: Bug
Components: Query Engine
Affects Versions: v1.5.2
Reporter: Yerui Sun
Assignee: liyang

Kylin doesn't support window functions yet. Here's a test query:
{code}
select lstg_format_name, count(*) over(partition by lstg_format_name) from kylin_sales
{code}

The query threw an exception; here are the error log and the stack trace:
{code}
Error while executing SQL "select lstg_format_name, count(*) over(partition by lstg_format_name) from kylin_sales LIMIT 5": cannot translate call COUNT() OVER (PARTITION BY $t3 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
{code}
{code}
Caused by: java.lang.RuntimeException: cannot translate call COUNT() OVER (PARTITION BY $t3 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateCall(RexToLixTranslator.java:533)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:507)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:219)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:472)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:219)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:214)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateList(RexToLixTranslator.java:700)
	at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateProjects(RexToLixTranslator.java:189)
	at org.apache.calcite.adapter.enumerable.EnumerableCalc.implement(EnumerableCalc.java:188)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
	at org.apache.kylin.query.relnode.OLAPRel$JavaImplementor.visitChild(OLAPRel.java:183)
	at org.apache.calcite.adapter.enumerable.EnumerableLimit.implement(EnumerableLimit.java:106)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
	at org.apache.kylin.query.relnode.OLAPRel$JavaImplementor.visitChild(OLAPRel.java:183)
	at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:108)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:102)
	at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
	at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1171)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:297)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:196)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:721)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:588)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:558)
	at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
	at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:573)
	at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
{code}
[jira] [Created] (KYLIN-1719) Add config in scan request to control compress the query result or not
Yerui Sun created KYLIN-1719:

Summary: Add config in scan request to control compress the query result or not
Key: KYLIN-1719
URL: https://issues.apache.org/jira/browse/KYLIN-1719
Project: Kylin
Issue Type: Improvement
Components: Query Engine
Affects Versions: v1.5.1, v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.5.3

The query result in CubeVisitService is compressed before being sent back to the client, which gives about a 10% size benefit. However, if the result is large, such as over 100MB, the compression is very slow and takes 60s or longer. We should add a config in the scan request to control whether to compress the result or not.
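One way such a switch could work is sketched below, assuming a boolean flag carried with the request and a one-byte marker in the response so the client knows how to decode it. The marker byte, the use of java.util.zip deflate, and all names are assumptions for illustration, not Kylin's wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Sketch only: hypothetical response encoding with an optional compression step.
class ScanResultCodecSketch {
    // Encodes rows; the first byte records whether the rest is compressed.
    static byte[] encode(byte[] rows, boolean compress) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(compress ? 1 : 0);
        try {
            if (compress) {
                try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
                    deflater.write(rows);
                }
            } else {
                out.write(rows); // skip the slow compression step entirely
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decodes a payload produced by encode(), honouring the marker byte.
    static byte[] decode(byte[] payload) {
        if (payload[0] == 0) {
            return Arrays.copyOfRange(payload, 1, payload.length);
        }
        try (InputStream in = new InflaterInputStream(
                new ByteArrayInputStream(payload, 1, payload.length - 1))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the flag off, the server trades a somewhat larger response for not spending time compressing a very large result.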
[jira] [Created] (KYLIN-1718) Grow ByteBuffer Dynamically in Cube Building and Query
Yerui Sun created KYLIN-1718:

Summary: Grow ByteBuffer Dynamically in Cube Building and Query
Key: KYLIN-1718
URL: https://issues.apache.org/jira/browse/KYLIN-1718
Project: Kylin
Issue Type: Improvement
Components: Job Engine, Query Engine
Affects Versions: v1.5.1, v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.5.3

In the Cube Building Mapper/Reducer and CubeVisitService, we use a preallocated ByteBuffer to store encoded metrics values, with a constant size RowConstants.ROWVALUE_BUFFER_SIZE, which is 1MB by default. If a metrics value is larger than 1MB, such as a high-cardinality bitmap, a BufferOverflowException is thrown. We need to grow the ByteBuffer when the exception occurs.
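A rough sketch of the grow-on-overflow approach follows. The doubling strategy and the class name are assumptions made for illustration; the real change would apply to the buffers sized by RowConstants.ROWVALUE_BUFFER_SIZE.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Sketch only: a buffer wrapper that doubles its capacity on overflow.
class GrowingBufferSketch {
    private ByteBuffer buffer;

    GrowingBufferSketch(int initialCapacity) {
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    // Appends the value, growing the buffer until it fits.
    void write(byte[] value) {
        while (true) {
            int mark = buffer.position();
            try {
                buffer.put(value);
                return;
            } catch (BufferOverflowException e) {
                buffer.position(mark); // discard any partial write before retrying
                grow();
            }
        }
    }

    // Allocates a buffer twice as large and copies the existing content over.
    private void grow() {
        int newCapacity = Math.max(buffer.capacity() * 2, 1);
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        buffer.flip();
        bigger.put(buffer);
        buffer = bigger;
    }

    int capacity() {
        return buffer.capacity();
    }

    int position() {
        return buffer.position();
    }
}
```

For example, a wrapper created with capacity 4 that receives a 10-byte value grows to 8 and then to 16 before the write succeeds. The sketch does not guard against integer overflow for buffers approaching 2GB.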
[jira] [Created] (KYLIN-1372) Query with PrepareStatement failed with OR clause
Yerui Sun created KYLIN-1372:

Summary: Query with PrepareStatement failed with OR clause
Key: KYLIN-1372
URL: https://issues.apache.org/jira/browse/KYLIN-1372
Project: Kylin
Issue Type: Bug
Affects Versions: v1.2, v2.0
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: v1.3

Querying with a prepared statement and the filter 'where lstg_format_name in (?, ?)' throws an exception:
{code}
Caused by: java.util.NoSuchElementException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1431)
	at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
	at org.apache.kylin.metadata.filter.CompareTupleFilter.addChild(CompareTupleFilter.java:84)
	at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.mergeToInClause(OLAPFilterRel.java:159)
	at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:126)
	at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:45)
	at org.apache.calcite.rex.RexCall.accept(RexCall.java:107)
	at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:117)
	at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:45)
	at org.apache.calcite.rex.RexCall.accept(RexCall.java:107)
	at org.apache.kylin.query.relnode.OLAPFilterRel.translateFilter(OLAPFilterRel.java:257)
	at org.apache.kylin.query.relnode.OLAPFilterRel.implementOLAP(OLAPFilterRel.java:241)
	at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
	at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:100)
	at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
	at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67)
	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
	at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
	at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
	at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
	at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
	at org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:158)
	... 84 more
{code}

With the filter 'where lstg_format_name in ('FP-GTC', ?)', the query succeeds, but the result contains only the 'FP-GTC' row; the dynamic filter doesn't seem to work.

The reason is that, with multiple OR clauses, OLAPFilterRel.mergeToInClause is called to merge them into one IN clause, but the new CompareTupleFilter loses the dynamic variables. I'll fix this issue later.
[jira] [Created] (KYLIN-1274) Query from JDBC is partial results by default
Yerui Sun created KYLIN-1274:

Summary: Query from JDBC is partial results by default
Key: KYLIN-1274
URL: https://issues.apache.org/jira/browse/KYLIN-1274
Project: Kylin
Issue Type: Bug
Components: Driver - JDBC
Affects Versions: 1.2
Reporter: Yerui Sun
Assignee: Yerui Sun
Fix For: 1.3, 2.0

How to reproduce this problem: create a query with the result over 1 rows, and run it with 'order by desc'. Check the first row: it's not the last row of the full result, but a middle row, maybe the 1th row. I checked the query log on the Kylin server and found 'Accept Partial: false', indicating it's indeed a partial query.

The reason is that the JDBC driver sends the QueryRequest with JSON encoding, and it is parsed into a SQLRequest on the server side. By default, QueryRequest only has the sql and project parameters, which leaves all other attributes of SQLRequest at their defaults, and in SQLRequest acceptPartial defaults to true. That's why the query was processed as a partial query.

The fix is simple: set acceptPartial to false in QueryRequest, and update the JDBC driver.