[jira] [Created] (KYLIN-2192) More Robust Global Dictionary

2016-11-15 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-2192:


 Summary: More Robust Global Dictionary
 Key: KYLIN-2192
 URL: https://issues.apache.org/jira/browse/KYLIN-2192
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.5.4.1
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.6.0


The global dictionary was released over 2 months ago, and I've received some 
feedback and bug reports. Here's a patch to make the global dictionary more 
robust, including some functional improvements.
* Break through the 255-byte limit on value length; a value length of less 
than 8K is still recommended, to avoid stack overflow errors;
* Fix the 'Value not exists' or stack overflow error when the dict size is 
larger than 1GB; the root cause is similar to KYLIN-1834. A check tool is also 
provided to check whether existing dict data is corrupted;
* Support parallel dictionary building in one job server, used for building 
segments in parallel;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KYLIN-2088) Support intersect count for calculation of retention or conversion rates

2016-10-12 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-2088:


 Summary: Support intersect count for calculation of retention or 
conversion rates
 Key: KYLIN-2088
 URL: https://issues.apache.org/jira/browse/KYLIN-2088
 Project: Kylin
  Issue Type: New Feature
  Components: Query Engine
Reporter: Yerui Sun
Assignee: Yerui Sun


Retention and conversion rates are very important in data analysis. 
They can be calculated from two datasets that correspond to two different 
values of one dimension. 
For example, given a count distinct measure such as uv (a dataset of uuid) and 
a dimension such as date, the retention of uv between '20161015' and 
'20161016' is the intersection of the two uv datasets.
Fortunately, Kylin already implements such datasets as bitmaps, for precise 
count distinct. Only a UDAF is needed to calculate the intersection of two or 
more bitmaps.
I'll try this and post a patch later.
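
The intersection idea above can be sketched as follows. This is only an illustration, not the planned UDAF: java.util.BitSet stands in for Kylin's bitmap, and the class name, method names and sample ids are all made up for the example.

```java
import java.util.BitSet;

// Illustrative only: BitSet is a stand-in for the bitmap used by precise count distinct.
final class IntersectCountSketch {

    // Builds a bitmap from a list of (hypothetical) integer user ids.
    static BitSet bitmapOf(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Counts the ids that are present in every one of the given bitmaps.
    static int intersectCount(BitSet... bitmaps) {
        if (bitmaps.length == 0) {
            return 0;
        }
        BitSet result = (BitSet) bitmaps[0].clone(); // keep the inputs untouched
        for (int i = 1; i < bitmaps.length; i++) {
            result.and(bitmaps[i]);
        }
        return result.cardinality();
    }

    public static void main(String[] args) {
        BitSet uv20161015 = bitmapOf(1, 2, 3, 5, 8);
        BitSet uv20161016 = bitmapOf(2, 3, 8, 13);
        int retained = intersectCount(uv20161015, uv20161016);
        System.out.println("retained " + retained + " of " + uv20161015.cardinality());
    }
}
```

Dividing the intersection count by the cardinality of the first day's bitmap gives the retention rate.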





[jira] [Created] (KYLIN-1934) 'Value not exist' During Cube Merging Caused by Empty Dict

2016-08-02 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1934:


 Summary: 'Value not exist' During Cube Merging Caused by Empty Dict
 Key: KYLIN-1934
 URL: https://issues.apache.org/jira/browse/KYLIN-1934
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.4
Reporter: Yerui Sun
Assignee: Yerui Sun
Priority: Critical
 Fix For: v1.5.4


When cubes are merged, a new dictionary is created that consists of all values 
in the old dictionaries.
The values in the old dicts are enumerated by 
MultipleDictionaryValueEnumerator. However, if the first dict is empty, 
Enumerator.moveNext() returns false directly and ignores all values in the 
other dicts, so the new dict is also empty. 
The cube merge then fails because the new dict contains no values. 

Not sure whether this issue is related to KYLIN-1834 or not.
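
The moveNext() behaviour described above can be illustrated with a small stand-alone sketch. It is not the Kylin class: plain lists of strings stand in for the old dictionaries, and all names are hypothetical. The point is only that the enumerator has to keep advancing past empty sources instead of stopping at the first one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustrative stand-in for an enumerator over the values of several dictionaries.
final class MultiSourceEnumeratorSketch {
    private final Iterator<List<String>> sources;
    private Iterator<String> currentSource = Collections.emptyIterator();
    private String currentValue;

    MultiSourceEnumeratorSketch(List<List<String>> dicts) {
        this.sources = dicts.iterator();
    }

    // Advances to the next value, skipping any source that is empty or exhausted.
    boolean moveNext() {
        while (!currentSource.hasNext()) {
            if (!sources.hasNext()) {
                return false; // no sources left at all
            }
            currentSource = sources.next().iterator();
        }
        currentValue = currentSource.next();
        return true;
    }

    String current() {
        return currentValue;
    }

    // Collects every value from every source, in order.
    static List<String> drain(List<List<String>> dicts) {
        MultiSourceEnumeratorSketch e = new MultiSourceEnumeratorSketch(dicts);
        List<String> all = new ArrayList<>();
        while (e.moveNext()) {
            all.add(e.current());
        }
        return all;
    }
}
```

With an empty first list, drain still returns the values of the later lists.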





[jira] [Created] (KYLIN-1910) Support Separate HBase Cluster with NN HA and Kerberos Authentication

2016-07-21 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1910:


 Summary: Support Separate HBase Cluster with NN HA and Kerberos 
Authentication
 Key: KYLIN-1910
 URL: https://issues.apache.org/jira/browse/KYLIN-1910
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun


Since KYLIN-957, we've supported deploying a separate hbase cluster. However, 
it doesn't fully support an hbase cluster with NameNode HA and Kerberos 
Authentication.

The key point is that the cube building MR job needs to access two hdfs 
clusters: the main cluster, which stores the source data, and the hbase 
cluster, which stores the cube result data. The MR task can't access the hbase 
cluster through an HA qualified path, like hdfs://hbase-cluster:8020/path, 
because it can't find the dfs.nameservices related configs of the hbase 
cluster.
To solve this problem, we need to read the hbase cluster's HA configs and set 
them in the job conf.
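
A rough sketch of that copy step follows. It is not the actual patch: plain maps stand in for the Hadoop configuration objects, the class and method names are hypothetical, and the set of keys is an assumption based on the standard HDFS HA client settings (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*, dfs.client.failover.proxy.provider.*).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: Map<String, String> stands in for a Hadoop configuration.
final class HaConfigCopySketch {

    // Splits a comma-separated config value into trimmed, non-empty tokens.
    static List<String> split(String value) {
        List<String> tokens = new ArrayList<>();
        if (value == null) {
            return tokens;
        }
        for (String token : value.split(",")) {
            if (!token.trim().isEmpty()) {
                tokens.add(token.trim());
            }
        }
        return tokens;
    }

    // Copies one key from source to target when the source defines it.
    static void copyIfPresent(Map<String, String> source, Map<String, String> target, String key) {
        String value = source.get(key);
        if (value != null) {
            target.put(key, value);
        }
    }

    // Merges the hbase cluster's HA settings into the job conf, keeping the main cluster's.
    static void copyHaConfigs(Map<String, String> hbaseConf, Map<String, String> jobConf) {
        List<String> hbaseNameservices = split(hbaseConf.get("dfs.nameservices"));
        if (hbaseNameservices.isEmpty()) {
            return; // the hbase cluster is not configured for HA
        }
        Set<String> merged = new LinkedHashSet<>(split(jobConf.get("dfs.nameservices")));
        for (String ns : hbaseNameservices) {
            merged.add(ns);
            copyIfPresent(hbaseConf, jobConf, "dfs.client.failover.proxy.provider." + ns);
            copyIfPresent(hbaseConf, jobConf, "dfs.ha.namenodes." + ns);
            for (String nn : split(hbaseConf.get("dfs.ha.namenodes." + ns))) {
                copyIfPresent(hbaseConf, jobConf, "dfs.namenode.rpc-address." + ns + "." + nn);
            }
        }
        jobConf.put("dfs.nameservices", String.join(",", merged));
    }
}
```

Merging dfs.nameservices rather than overwriting it keeps the main cluster reachable from the same job conf.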

Another point is about authentication tokens. During job submission, the 
Resource Manager tries to renew all tokens, to ensure they can be kept alive. 
Renewing the hdfs token of the hbase cluster causes an exception, because the 
RM can't find the hbase cluster's HA configs.
This problem can be resolved with YARN-3021 support.






[jira] [Created] (KYLIN-1905) Wrong Default Date in Cube Build Web UI

2016-07-19 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1905:


 Summary: Wrong Default Date in Cube Build Web UI
 Key: KYLIN-1905
 URL: https://issues.apache.org/jira/browse/KYLIN-1905
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.5.2
Reporter: Yerui Sun
 Attachments: screenshot-1.png

When building a cube from the Web UI, there's a confirm dialog to select the 
start/end date. However, the default date is one month later than the current 
date. For example, if today is in 2016-07, the default date shows '2016-Aug'. 






[jira] [Created] (KYLIN-1894) GlobalDictionary may corrupt when server suddenly crash

2016-07-14 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1894:


 Summary: GlobalDictionary may corrupt when server suddenly crash
 Key: KYLIN-1894
 URL: https://issues.apache.org/jira/browse/KYLIN-1894
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.5.3


The Global Dictionary stores its data on hdfs directly, and overwrites the data 
file in place when it is updated. If the server crashes suddenly while writing 
the file, the data file may be corrupted and can't be recovered.

To resolve this problem, copy the data file into a tmp directory, and copy it 
back after the file has been updated successfully. 
I'll post a patch with this solution later.
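
A minimal sketch of the idea, under loud assumptions: it runs against the local file system through java.nio.file rather than hdfs, all names are hypothetical, and the final step is a move (rename) of the updated copy over the original instead of a second copy. The original file is never opened for writing, so a crash during the update leaves it intact.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Illustrative only: local files stand in for the dictionary data on hdfs.
final class SafeUpdateSketch {

    // Updates a working copy in tmpDir, then swaps it in place of the original.
    static void safeAppend(Path dataFile, Path tmpDir, byte[] extra) throws IOException {
        Files.createDirectories(tmpDir);
        Path working = tmpDir.resolve(dataFile.getFileName());
        // 1. Copy the current data file into the tmp directory.
        Files.copy(dataFile, working, StandardCopyOption.REPLACE_EXISTING);
        // 2. Apply the update to the working copy only.
        Files.write(working, extra, StandardOpenOption.APPEND);
        // 3. Replace the original only after the update succeeded.
        Files.move(working, dataFile, StandardCopyOption.REPLACE_EXISTING);
    }

    // Small end-to-end run in a temp directory; returns the final file content.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("dict-sketch");
            Path data = dir.resolve("dict.data");
            Files.write(data, "v1".getBytes(StandardCharsets.UTF_8));
            safeAppend(data, dir.resolve("tmp"), "v2".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(data), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```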





[jira] [Created] (KYLIN-1803) ExtendedColumn Measure Encoding with Non-ascii Characters

2016-06-17 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1803:


 Summary: ExtendedColumn Measure Encoding with Non-ascii Characters
 Key: KYLIN-1803
 URL: https://issues.apache.org/jira/browse/KYLIN-1803
 Project: Kylin
  Issue Type: Bug
  Components: Job Engine
Affects Versions: v1.5.2, v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.5.3


ExtendedColumn measure ingests data by converting a String to a byte array. The 
current conversion can't deal with non-ASCII characters properly. For example, 
the Chinese characters '北京' were converted to '??' rather than to their 
UTF-8 bytes.
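
The effect can be reproduced with plain Java. This is only a demonstration, and it assumes the lossy conversion behaves like encoding with an ASCII-only charset; the issue doesn't say which conversion the measure actually uses. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

// Demonstrates a lossy byte conversion next to an explicit UTF-8 one.
final class ExtendedColumnEncodingSketch {

    // Encodes with US-ASCII and decodes again; unmappable characters become '?'.
    static String viaAscii(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Encodes with UTF-8 and decodes again; the value survives the round trip.
    static String viaUtf8(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String city = "\u5317\u4eac"; // the two characters from the example above
        System.out.println(viaAscii(city));              // prints ??
        System.out.println(viaUtf8(city).equals(city));  // prints true
        System.out.println(city.getBytes(StandardCharsets.UTF_8).length); // prints 6
    }
}
```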





[jira] [Created] (KYLIN-1775) Add Cube Migrate Support for Global Dictionary

2016-06-08 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1775:


 Summary: Add Cube Migrate Support for Global Dictionary
 Key: KYLIN-1775
 URL: https://issues.apache.org/jira/browse/KYLIN-1775
 Project: Kylin
  Issue Type: Improvement
  Components: Metadata
Affects Versions: v1.5.3
Reporter: Yerui Sun
Assignee: Yerui Sun


KYLIN-1705 introduced the global dictionary. The global dictionary serializes 
dict data directly into hdfs storage, instead of saving it in the hbase 
resource store. However, when a cube is migrated from one metadata store to 
another, the global dict data is not copied to the new metadata.






[jira] [Created] (KYLIN-1762) Query threw NPE with 3 or more join conditions

2016-06-04 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1762:


 Summary: Query threw NPE with 3 or more join conditions
 Key: KYLIN-1762
 URL: https://issues.apache.org/jira/browse/KYLIN-1762
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun


Here's an example to reproduce the error with the kylin sample data:
{code}
select t1.leaf_categ_id, max_price, min_price, sum_price
from
(select leaf_categ_id, sum(price) as sum_price from kylin_sales group by leaf_categ_id) t1
join
(select leaf_categ_id, max(price) as max_price from kylin_sales group by leaf_categ_id) t2
on t1.leaf_categ_id = t2.leaf_categ_id
join
(select leaf_categ_id, min(price) as min_price from kylin_sales group by leaf_categ_id) t3
on t1.leaf_categ_id = t3.leaf_categ_id
order by t1.leaf_categ_id
{code}

And here's the error stack:
{code}
Caused by: java.lang.NullPointerException: null
at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:104)
at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
at org.apache.kylin.query.relnode.OLAPSortRel.implementOLAP(OLAPSortRel.java:68)
at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:69)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
at org.apache.calcite.adapter.enumerable.EnumerableSort.implement(EnumerableSort.java:70)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:102)
at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1171)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:297)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:196)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:721)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:588)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:558)
at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:573)
at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:186)
{code}

In OLAPJoinRel.implementOLAP, a context is allocated only by the root join 
node, but is deleted in every join node, including child join nodes. In other 
words, the numbers of context allocations and deletions do not match. As a 
result, the parent node of the join gets an empty context and throws an NPE.





[jira] [Created] (KYLIN-1732) Support Window Function

2016-05-24 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1732:


 Summary: Support Window Function
 Key: KYLIN-1732
 URL: https://issues.apache.org/jira/browse/KYLIN-1732
 Project: Kylin
  Issue Type: Bug
  Components: Query Engine
Affects Versions: v1.5.2
Reporter: Yerui Sun
Assignee: liyang


Kylin doesn't support window functions yet. Here's a test query:
{code}
select lstg_format_name, count(*) over(partition by lstg_format_name)
from kylin_sales
{code}
The query threw an exception; here are the error log and the stack trace:
{code}
Error while executing SQL "select lstg_format_name, count(*) over(partition by lstg_format_name) from kylin_sales LIMIT 5": cannot translate call COUNT() OVER (PARTITION BY $t3 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
{code}
{code}
Caused by: java.lang.RuntimeException: cannot translate call COUNT() OVER (PARTITION BY $t3 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateCall(RexToLixTranslator.java:533)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:507)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:219)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:472)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:219)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:214)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateList(RexToLixTranslator.java:700)
at org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateProjects(RexToLixTranslator.java:189)
at org.apache.calcite.adapter.enumerable.EnumerableCalc.implement(EnumerableCalc.java:188)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
at org.apache.kylin.query.relnode.OLAPRel$JavaImplementor.visitChild(OLAPRel.java:183)
at org.apache.calcite.adapter.enumerable.EnumerableLimit.implement(EnumerableLimit.java:106)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.visitChild(EnumerableRelImplementor.java:97)
at org.apache.kylin.query.relnode.OLAPRel$JavaImplementor.visitChild(OLAPRel.java:183)
at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:108)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:102)
at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1171)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:297)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:196)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:721)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:588)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:558)
at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:214)
at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:573)
at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:571)
at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:135)
{code}





[jira] [Created] (KYLIN-1719) Add config in scan request to control compress the query result or not

2016-05-20 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1719:


 Summary: Add config in scan request to control compress the query 
result or not
 Key: KYLIN-1719
 URL: https://issues.apache.org/jira/browse/KYLIN-1719
 Project: Kylin
  Issue Type: Improvement
  Components: Query Engine
Affects Versions: v1.5.1, v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.5.3


The query result in CubeVisitService is compressed before being sent back to 
the client, which reduces its size by about 10%. However, if the result is 
large, such as over 100MB, the compression is very slow and takes 60s or 
longer. 
We should add a config to the scan request, to control whether the result is 
compressed or not.
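
A toy sketch of such a switch is shown below. Everything in it is an assumption made for illustration: the flag name, the class, and the use of java.util.zip.Deflater, since the compression that CubeVisitService really applies isn't described here.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Illustrative only: a per-request flag decides whether the result bytes are compressed.
final class OptionalCompressionSketch {

    // Returns the rows unchanged when compress is false, deflated otherwise.
    static byte[] encodeResult(byte[] rows, boolean compress) {
        if (!compress) {
            return rows; // skip the expensive step for large results
        }
        Deflater deflater = new Deflater();
        deflater.setInput(rows);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(chunk);
            out.write(chunk, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

The response would also have to tell the client whether the payload is compressed, so that it knows whether to decompress it.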





[jira] [Created] (KYLIN-1718) Grow ByteBuffer Dynamically in Cube Building and Query

2016-05-20 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1718:


 Summary: Grow ByteBuffer Dynamically in Cube Building and Query
 Key: KYLIN-1718
 URL: https://issues.apache.org/jira/browse/KYLIN-1718
 Project: Kylin
  Issue Type: Improvement
  Components: Job Engine, Query Engine
Affects Versions: v1.5.1, v1.5.2
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.5.3


In the Cube Building Mapper/Reducer and CubeVisitService, we use a 
pre-allocated ByteBuffer to store encoded metrics values, with a constant size 
RowConstants.ROWVALUE_BUFFER_SIZE, which is 1MB by default.
If a metrics value is larger than 1MB, such as a high cardinality bitmap, a 
BufferOverflowException will be thrown. We need to grow the ByteBuffer when 
the exception occurs.
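
One way to grow the buffer is sketched below. It is not the patch itself: the class and method names are hypothetical, doubling is just one possible growth policy, and a raw byte array stands in for the encoded metrics value.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Illustrative only: retry the write with a larger buffer after an overflow.
final class GrowableBufferSketch {

    // Writes value into a buffer, doubling the capacity until the value fits.
    static ByteBuffer encodeWithGrowth(byte[] value, int initialSize) {
        if (initialSize <= 0) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        ByteBuffer buffer = ByteBuffer.allocate(initialSize);
        while (true) {
            try {
                buffer.clear();    // restart from position 0 on every attempt
                buffer.put(value); // throws BufferOverflowException if too small
                buffer.flip();     // make the written bytes readable
                return buffer;
            } catch (BufferOverflowException e) {
                buffer = ByteBuffer.allocate(buffer.capacity() * 2);
            }
        }
    }
}
```

For example, a 5000-byte value with an initial size of 1024 ends up in an 8192-byte buffer after three retries.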





[jira] [Created] (KYLIN-1372) Query with PrepareStatement failed with OR clause

2016-01-26 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1372:


 Summary: Query with PrepareStatement failed with OR clause
 Key: KYLIN-1372
 URL: https://issues.apache.org/jira/browse/KYLIN-1372
 Project: Kylin
  Issue Type: Bug
Affects Versions: v1.2, v2.0
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: v1.3


When querying with a prepared statement and the filter 'where lstg_format_name 
in (?, ?)', an exception is thrown:
{code}
Caused by: java.util.NoSuchElementException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1431)
at java.util.HashMap$KeyIterator.next(HashMap.java:1453)
at org.apache.kylin.metadata.filter.CompareTupleFilter.addChild(CompareTupleFilter.java:84)
at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.mergeToInClause(OLAPFilterRel.java:159)
at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:126)
at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:45)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107)
at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:117)
at org.apache.kylin.query.relnode.OLAPFilterRel$TupleFilterVisitor.visitCall(OLAPFilterRel.java:45)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107)
at org.apache.kylin.query.relnode.OLAPFilterRel.translateFilter(OLAPFilterRel.java:257)
at org.apache.kylin.query.relnode.OLAPFilterRel.implementOLAP(OLAPFilterRel.java:241)
at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
at org.apache.kylin.query.relnode.OLAPProjectRel.implementOLAP(OLAPProjectRel.java:100)
at org.apache.kylin.query.relnode.OLAPRel$OLAPImplementor.visitChild(OLAPRel.java:81)
at org.apache.kylin.query.relnode.OLAPToEnumerableConverter.implement(OLAPToEnumerableConverter.java:67)
at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
at org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:158)
... 84 more
{code}

With the filter 'where lstg_format_name in ('FP-GTC', ?)', the query succeeds, 
but the result contains only the 'FP-GTC' rows; the dynamic filter doesn't 
seem to work.

The reason is that, with multiple OR clauses, OLAPFilterRel.mergeToInClause is 
called to merge them into one IN clause, but the new CompareTupleFilter loses 
the dynamic variables. 

I'll fix this issue later.





[jira] [Created] (KYLIN-1274) Query from JDBC is partial results by default

2015-12-30 Thread Yerui Sun (JIRA)
Yerui Sun created KYLIN-1274:


 Summary: Query from JDBC is partial results by default
 Key: KYLIN-1274
 URL: https://issues.apache.org/jira/browse/KYLIN-1274
 Project: Kylin
  Issue Type: Bug
  Components: Driver - JDBC
Affects Versions: 1.2
Reporter: Yerui Sun
Assignee: Yerui Sun
 Fix For: 1.3, 2.0


How to reproduce this problem: create a query with the result over 1 rows, 
and run it with 'order by desc'. Check the first row: it's not the last row of 
the full result, but a middle row, maybe the 1th row.
Checking the query log on the kylin server, I found 'Accept Partial: true', 
indicating it was indeed a partial query.
The reason is that the JDBC driver sends the QueryRequest with json encoding, 
and the server parses it into a SQLRequest. By default, QueryRequest has only 
the sql and project parameters, so all other attributes of SQLRequest keep 
their defaults, and in SQLRequest acceptPartial defaults to true. That's why 
the query was processed as a partial query.
The solution is simple: add acceptPartial as false to QueryRequest, and update 
the JDBC driver.
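
The default-value effect can be modelled with a toy example. It does not use the real QueryRequest or SQLRequest classes or a real json library: a map stands in for the json payload, and every name below is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: fields missing from the payload keep the server-side defaults.
final class AcceptPartialSketch {

    // Stand-in for the server-side request; the default mirrors the description above.
    static final class SqlRequestModel {
        String sql;
        String project;
        boolean acceptPartial = true;
    }

    // Mimics json binding: only keys present in the payload overwrite the defaults.
    static SqlRequestModel bind(Map<String, Object> payload) {
        SqlRequestModel request = new SqlRequestModel();
        if (payload.containsKey("sql")) {
            request.sql = (String) payload.get("sql");
        }
        if (payload.containsKey("project")) {
            request.project = (String) payload.get("project");
        }
        if (payload.containsKey("acceptPartial")) {
            request.acceptPartial = (Boolean) payload.get("acceptPartial");
        }
        return request;
    }

    // Builds the driver's payload, with or without the explicit acceptPartial field.
    static Map<String, Object> driverPayload(String sql, String project, boolean sendAcceptPartial) {
        Map<String, Object> payload = new HashMap<>();
        payload.put("sql", sql);
        payload.put("project", project);
        if (sendAcceptPartial) {
            payload.put("acceptPartial", Boolean.FALSE);
        }
        return payload;
    }
}
```

Without the explicit field the bound request keeps acceptPartial as true; once the driver sends it as false, the query is no longer treated as partial.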


