[jira] [Created] (KYLIN-2951) Fix couple of NullPointerException in Kylin code base

2017-10-19 Thread Wang Ken (JIRA)
Wang Ken created KYLIN-2951:
---

 Summary: Fix couple of NullPointerException in Kylin code base
 Key: KYLIN-2951
 URL: https://issues.apache.org/jira/browse/KYLIN-2951
 Project: Kylin
  Issue Type: Bug
Reporter: Wang Ken
Priority: Minor


I am creating this ticket as an umbrella to cover some NPE bugs in the Kylin 
2.1 & 2.2 code base.
We see NPE errors in our server logs; some of them are so small that it is not 
worth creating an individual ticket for each one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KYLIN-2951) Fix couple of NullPointerException in Kylin code base

2017-10-19 Thread Wang Ken (JIRA)

 [ 
https://issues.apache.org/jira/browse/KYLIN-2951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Ken reassigned KYLIN-2951:
---

Assignee: Wang Ken

> Fix couple of NullPointerException in Kylin code base
> -
>
> Key: KYLIN-2951
> URL: https://issues.apache.org/jira/browse/KYLIN-2951
> Project: Kylin
>  Issue Type: Bug
>Reporter: Wang Ken
>Assignee: Wang Ken
>Priority: Minor
>
> I am creating this ticket as an umbrella to cover some NPE bugs in the Kylin 
> 2.1 & 2.2 code base.
> We see NPE errors in our server logs; some of them are so small that it is 
> not worth creating an individual ticket for each one.





[jira] [Commented] (KYLIN-2898) Introduce memcached as a distributed cache for queries

2017-09-30 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186933#comment-16186933
 ] 

Wang Ken commented on KYLIN-2898:
-

I would suggest adding a new module called query-cache to hold all the new 
cache-related code, including the segment-level cache.
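A rough sketch of what a segment-level entry point in such a query-cache module could look like (all names here, including SegmentQueryCache and the key layout, are hypothetical; the in-process LRU map only stands in for the eventual memcached backend):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a segment-level query cache keyed by cube name,
// segment name, and SQL digest. Not Kylin code; a distributed deployment
// would replace the LinkedHashMap with a memcached client.
public class SegmentQueryCache {
    private static final int MAX_ENTRIES = 1024;

    // LinkedHashMap in access order gives a simple LRU eviction policy.
    private final Map<String, byte[]> cache =
            new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    private static String key(String cube, String segment, String sqlDigest) {
        return cube + "/" + segment + "/" + sqlDigest;
    }

    public synchronized void put(String cube, String segment, String sqlDigest, byte[] result) {
        cache.put(key(cube, segment, sqlDigest), result);
    }

    public synchronized byte[] get(String cube, String segment, String sqlDigest) {
        return cache.get(key(cube, segment, sqlDigest)); // null on miss
    }
}
```

Keying by segment (rather than whole query) lets a merge or refresh invalidate only the affected segment's entries.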

> Introduce memcached as a distributed cache for queries
> --
>
> Key: KYLIN-2898
> URL: https://issues.apache.org/jira/browse/KYLIN-2898
> Project: Kylin
>  Issue Type: Sub-task
>  Components: Query Engine
>Affects Versions: v2.1.0
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
> Fix For: v2.2.0
>
>






[jira] [Commented] (KYLIN-1723) GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer

2016-08-25 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436636#comment-15436636
 ] 

Wang Ken commented on KYLIN-1723:
-

The original implementation of Dump.flush() is actually wrong, and it is a 
critical bug for versions prior to Kylin 1.5.3: it causes the HBase coprocessor 
to return incorrect results whenever a spill to disk happens, and fast cube 
building results can be wrong as well.
This is because in Java serialization, if we call writeObject multiple times 
for the same object instance, the serialized stream holds the contents once and 
afterwards writes only back-references to that same serialized instance.

https://docs.oracle.com/javase/7/docs/platform/serialization/spec/output.html

 /**
  * Writes an "unshared" object to the ObjectOutputStream.  This method is
  * identical to writeObject, except that it always writes the given object
  * as a new, unique object in the stream (as opposed to a back-reference
  * pointing to a previously serialized instance).  Specifically:
  *
  *   - An object written via writeUnshared is always serialized in the
  *     same manner as a newly appearing object (an object that has not
  *     been written to the stream yet), regardless of whether or not the
  *     object has been written previously.
  *
  *   - If writeObject is used to write an object that has been previously
  *     written with writeUnshared, the previous writeUnshared operation
  *     is treated as if it were a write of a separate object.  In other
  *     words, ObjectOutputStream will never generate back-references to
  *     object data written by calls to writeUnshared.
  *
  * While writing an object via writeUnshared does not in itself guarantee a
  * unique reference to the object when it is deserialized, it allows a
  * single object to be defined multiple times in a stream, so that multiple
  * calls to readUnshared by the receiver will not conflict.  Note that the
  * rules described above only apply to the base-level object written with
  * writeUnshared, and not to any transitively referenced sub-objects in the
  * object graph to be serialized.
  *
  * ObjectOutputStream subclasses which override this method can only be
  * constructed in security contexts possessing the
  * "enableSubclassImplementation" SerializablePermission; any attempt to
  * instantiate such a subclass without this permission will cause a
  * SecurityException to be thrown.
  *
  * @param   obj object to write to stream
  * @throws  NotSerializableException if an object in the graph to be
  *          serialized does not implement the Serializable interface
  * @throws  InvalidClassException if a problem exists with the class of an
  *          object to be serialized
  * @throws  IOException if an I/O error occurs during serialization
  * @since 1.4
  */

// Wrong: the whole backing array is serialized once, then back-referenced
oos.writeObject(metricsBuf.array());

// Correct, but the file will be huge (whole buffer written every time)
oos.writeUnshared(metricsBuf.array());

// Correct, and the file stays small (copy only the filled part)
oos.writeObject(Arrays.copyOf(metricsBuf.array(), metricsBuf.position()));

// Correct, and the file stays small (write length + filled bytes directly)
oos.writeInt(metricsBuf.position());
oos.write(metricsBuf.array(), 0, metricsBuf.position());
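The back-reference behavior can be verified with a standalone demo (this is not Kylin code, just a minimal illustration of the writeObject sharing semantics described above):

```java
import java.io.*;
import java.util.Arrays;

public class WriteObjectSharing {
    // Serializes the same array twice with writeObject, mutating it in between,
    // then deserializes both. Returns {first[0], second[0], sameInstance ? 1 : 0}.
    static int[] roundTrip() throws Exception {
        byte[] buf = new byte[8];
        Arrays.fill(buf, (byte) 1);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(buf);       // first write: full contents go to the stream
            Arrays.fill(buf, (byte) 2); // mutate the buffer, as repeated flushes do
            oos.writeObject(buf);       // second write: only a back-reference handle
        }

        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            byte[] first = (byte[]) ois.readObject();
            byte[] second = (byte[]) ois.readObject();
            return new int[] { first[0], second[0], first == second ? 1 : 0 };
        }
    }

    public static void main(String[] args) throws Exception {
        // Both reads resolve to the first copy; the mutation to 2 is lost.
        System.out.println(Arrays.toString(roundTrip())); // [1, 1, 1]
    }
}
```

This is exactly why each Dump.flush() on a reused metrics buffer corrupts later reads: the second and later writes never carry the new contents.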



> GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer
> ---
>
> Key: KYLIN-1723
> URL: https://issues.apache.org/jira/browse/KYLIN-1723
> Project: Kylin
>  Issue Type: Bug
>Reporter: liyang
>Assignee: Dong Li
> Fix For: v1.5.3
>
>
> GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer, but 
> only the part that contains data.
> Note the metrics buffer is allocated at the max possible size of metrics, 
> which can be way larger than actual size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KYLIN-1702) The Key of the Snapshot to the related lookup table may be not informative

2016-08-09 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413608#comment-15413608
 ] 

Wang Ken commented on KYLIN-1702:
-

Enabling compression can work without breaking the current bytes format; it 
only needs a small change to SnapshotTableSerializer and 
DictionaryInfoSerializer.
We can keep it working with existing uncompressed bytes, so it stays backward 
compatible.
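As a rough illustration of the backward-compatible approach (the class name and magic header here are made up for this sketch, not the actual serializer code; a legacy blob that happened to begin with the same magic bytes would be misread, so a real implementation would need a more careful marker):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: wrap the existing serialized bytes in GZIP, tagged with a magic
// header so old uncompressed blobs still deserialize unchanged.
public class CompressedBlob {
    private static final byte[] MAGIC = {'K', 'Y', 'G', 'Z'};

    public static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(MAGIC);                       // tag new-format blobs
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static byte[] decompress(byte[] stored) throws IOException {
        boolean hasMagic = stored.length >= MAGIC.length;
        for (int i = 0; hasMagic && i < MAGIC.length; i++) {
            hasMagic = stored[i] == MAGIC[i];
        }
        if (!hasMagic) {
            return stored; // legacy uncompressed bytes: pass through unchanged
        }
        ByteArrayInputStream bis =
                new ByteArrayInputStream(stored, MAGIC.length, stored.length - MAGIC.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(bis)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```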

> The Key of the Snapshot to the related lookup table may be not informative
> --
>
> Key: KYLIN-1702
> URL: https://issues.apache.org/jira/browse/KYLIN-1702
> Project: Kylin
>  Issue Type: Improvement
>Reporter: Zhong Yanghong
>Assignee: Zhong Yanghong
>
> Currently the key for the snapshot stored in the hbase metadata file is as 
> follows:
> ResourceStore.SNAPSHOT_RESOURCE_ROOT + "/" + new 
> File(signature.getPath()).getName() + "/" + uuid + ".snapshot"
> However, some tables stored in hive may be organized like 
> dirName/tableName/00, dirName/tableName/01.
> With the current setting, the key will be 
> ResourceStore.SNAPSHOT_RESOURCE_ROOT + "/" + 00 + "/" + uuid + ".snapshot", 
> which lacks the table name information.





[jira] [Commented] (KYLIN-1702) The Key of the Snapshot to the related lookup table may be not informative

2016-08-09 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413593#comment-15413593
 ] 

Wang Ken commented on KYLIN-1702:
-

Most of the time is spent loading extra-large resources from HDFS. HBase has a 
value size limitation, so extra-large resources are stored in HDFS.
Besides changing the layout of the resource path, there are other optimization 
points that would make loading more efficient:

a) Enable compression for snapshot and dictionary resources when saving to 
HBase, to reduce how often we need to read/write HDFS.
b) Use more threads to load resources asynchronously and to compare duplicated 
snapshot table contents.
c) Avoid duplicated loading/comparing.



[jira] [Commented] (KYLIN-1819) Exception swallowed when start DefaultScheduler fail

2016-07-05 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362382#comment-15362382
 ] 

Wang Ken commented on KYLIN-1819:
-

I have a different opinion here. A distributed system should be designed to 
recover automatically from different kinds of failures without much operational 
effort. Many distributed systems have a controller/coordinator role, and some 
implementations leverage ZooKeeper for leader election. During first-time 
initialization, if they encounter ZK connection issues, they usually fail fast. 
But at runtime, if they encounter ZK connection loss, they just retry until the 
ZK connection comes back.

Back to Kylin's job engine: during first-time initialization, if it sees a ZK 
connection issue it should fail fast, but if it simply can't get the job lock, 
it should wait until the competitor releases the lock.
If it sees ZK connection loss at runtime, it should give up the lock, shut down 
the scheduling thread pools, and wait for the connection to come back; the 
Curator client framework will retry the underlying ZK client. When Curator 
detects reconnection, the engine just rejoins the lock competition, and once it 
gets the lock again it restarts the scheduling thread pools and resumes 
scheduling.
The HA implementation is not as complicated as we think. Curator has already 
implemented recipes for leader election and other distributed coordination, so 
we don't need to build it ourselves on top of the low-level mutex lock.



> Exception swallowed when start DefaultScheduler fail
> 
>
> Key: KYLIN-1819
> URL: https://issues.apache.org/jira/browse/KYLIN-1819
> Project: Kylin
>  Issue Type: Bug
>  Components: Job Engine
>Affects Versions: v1.5.1, v1.5.2
>Reporter: Ma Gang
>Assignee: Ma Gang
> Attachments: fix_swallow_scheduler_start_exception.patch
>
>
> Starting the job scheduler needs to acquire the job lock from zookeeper. When 
> the lock acquisition fails, it throws an IllegalStateException, but because 
> the scheduler is started in a new thread, the exception thrown by that thread 
> is ignored: the server still starts successfully and no exceptions are 
> logged. That makes troubleshooting hard; the server should fail to start when 
> the scheduler fails to start.





[jira] [Comment Edited] (KYLIN-1790) Metadata upgrade tool didn't handle aggregation group joint rule successfully

2016-06-16 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333446#comment-15333446
 ] 

Wang Ken edited comment on KYLIN-1790 at 6/16/16 9:30 AM:
--

Can we relax this validation rule?
Also, I think we can add a dry-run mode to the Cube Meta Upgrade Tool to 
validate first without doing a real update.


was (Author: wang ken):
We can release this validate rule.
And I think we can add a try run model to the Cube Meta Upgrade Tool to 
validate first without real update

> Metadata upgrade tool didn't handle aggregation group joint rule successfully
> -
>
> Key: KYLIN-1790
> URL: https://issues.apache.org/jira/browse/KYLIN-1790
> Project: Kylin
>  Issue Type: Bug
>Reporter: qianqiaoneng
>Assignee: Vic Wang
>
> If the total dimension count is 10 and the aggregation group number is 9, 
> then after the metadata upgrade there would be a joint rule with 10-9=1 
> dimension. But this definition will cause 
> java.lang.IllegalStateException: Aggregation group 0 require at least 2 dims 
> in a joint
> The upgrade logic should be changed not to add a joint rule if the dimension 
> number is 1.





[jira] [Commented] (KYLIN-1790) Metadata upgrade tool didn't handle aggregation group joint rule successfully

2016-06-16 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333446#comment-15333446
 ] 

Wang Ken commented on KYLIN-1790:
-

We can relax this validation rule.
And I think we can add a dry-run mode to the Cube Meta Upgrade Tool to 
validate first without doing a real update.
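A minimal sketch of what such a dry-run switch could look like (the class and method names below are illustrative, not the actual Cube Meta Upgrade Tool API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a dry-run mode: validation always runs, so problems surface
// before any metadata is persisted; only a real run performs the save.
public class UpgradeTool {
    private final boolean dryRun;
    private final List<String> report = new ArrayList<>();

    public UpgradeTool(boolean dryRun) {
        this.dryRun = dryRun;
    }

    public List<String> upgrade(List<String> cubeNames) {
        for (String cube : cubeNames) {
            report.add("validated " + cube);   // always validate
            if (!dryRun) {
                report.add("saved " + cube);   // persist only on a real run
            }
        }
        return report;
    }
}
```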



[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed

2016-06-14 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329350#comment-15329350
 ] 

Wang Ken commented on KYLIN-1590:
-

Hi, Yang

The duplicated merge jobs with the same range are triggered by an unexpected 
call sequence into CubeManager.updateCube() from multiple threads (the job 
thread and HTTP threads).

Thanks
Ken

> 2 Kylin Steaming merge jobs of same time range triggered and failed 
> 
>
> Key: KYLIN-1590
> URL: https://issues.apache.org/jira/browse/KYLIN-1590
> Project: Kylin
>  Issue Type: Bug
>  Components: streaming
>Affects Versions: v1.4.0
>Reporter: qianqiaoneng
>Assignee: Zhong Yanghong
>Priority: Critical
>
> 2 issues:
> 1. Kylin allows 2 merge jobs with the same time range to run.
> 2. When 2 merge jobs with the same time range run at the same time, they mix 
> up metadata and always hit the HTable not found error.
> Build Result of Job site_gmb - 20160415212000_20160415215000 - MERGE - PDT 
> 2016-04-15 14:58:38
> Build Result: ERROR
> Job Engine: ***
> Cube Name: site_gmb
> Source Records Count: 0
> Start Time: Fri Apr 15 14:58:44 PDT 2016
> Duration: 2mins
> MR Waiting: 0mins
> Last Update Time: Fri Apr 15 15:01:42 PDT 2016
> Submitter: SYSTEM
> Error Log: org.apache.hadoop.hbase.TableNotFoundException: KYLIN_NB2J0SRADJ
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1299)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070)
>   at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
>   at 
> org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:87)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>   at 
> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:118)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>   at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105)
>   at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> result code:2





[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed

2016-06-14 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329340#comment-15329340
 ] 

Wang Ken commented on KYLIN-1590:
-

It's caused by a multi-threading issue in CubeManager.updateCubeWithRetry: 
different threads can run into this method and update the cube instance 
concurrently, while the HBase resource update and the cubeMap (cache) update 
should be atomic.
The fix is to add a lock on the cube instance. It may hold multiple locks in 
case of retry, but there is no deadlock risk: it always acquires the lock on 
the stale cube before the fresh one.

CubeInstance _cube = null;
synchronized (cube) {
    try {
        getStore().putResource(cube.getResourcePath(), cube, CUBE_SERIALIZER);
    } catch (IllegalStateException ise) {
        logger.warn("Write conflict to update cube " + cube.getName()
                + " at try " + retry + ", will retry...");
        if (retry >= 7) {
            logger.error("Retried 7 times till got error, abandoning...", ise);
            throw ise;
        }
        // reload the latest cube and retry; the recursive call locks the
        // fresh cube while this frame still holds the stale cube's lock
        update.setCubeInstance(reloadCubeLocal(cube.getName()));
        retry++;
        _cube = updateCubeWithRetry(update, retry);
    }
    if (_cube != null) {
        cube = _cube;                          // retry succeeded with the fresh cube
    } else {
        cubeMap.put(cube.getName(), cube);     // publish to the cache under the lock
    }
}



[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed

2016-06-03 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313851#comment-15313851
 ] 

Wang Ken commented on KYLIN-1590:
-

The cube merge scheduling logic is problematic and not robust.



[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed

2016-06-03 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313728#comment-15313728
 ] 

Wang Ken commented on KYLIN-1590:
-

I see many duplicated cache-wipe requests coming from the event broadcaster; 
duplicated broadcast events are being triggered.


2016-06-03 05:14:17,394 INFO  [http-bio-8080-exec-20] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:15:09,090 INFO  [http-bio-8080-exec-18] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:15:38,980 INFO  [http-bio-8080-exec-1] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:19:24,505 INFO  [http-bio-8080-exec-7] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:19:45,109 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:20:36,173 INFO  [http-bio-8080-exec-16] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:20:36,200 INFO  [http-bio-8080-exec-7] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:22:02,593 INFO  [http-bio-8080-exec-20] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:22:02,909 INFO  [http-bio-8080-exec-12] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:25:17,037 INFO  [http-bio-8080-exec-19] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:25:38,314 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:25:38,363 INFO  [http-bio-8080-exec-19] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:25:38,437 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:26:21,672 INFO  [http-bio-8080-exec-19] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:31:24,411 INFO  [http-bio-8080-exec-17] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:32:17,007 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:33:51,768 INFO  [http-bio-8080-exec-13] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:35:33,821 INFO  [http-bio-8080-exec-13] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:35:33,894 INFO  [http-bio-8080-exec-19] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:37:26,395 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:40:35,114 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:41:23,895 INFO  [http-bio-8080-exec-2] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:44:21,587 INFO  [http-bio-8080-exec-1] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:45:59,702 INFO  [http-bio-8080-exec-1] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:45:59,725 INFO  [http-bio-8080-exec-1] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:46:12,933 INFO  [http-bio-8080-exec-1] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:46:28,291 INFO  [http-bio-8080-exec-12] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:46:28,351 INFO  [http-bio-8080-exec-18] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:47:06,758 INFO  [http-bio-8080-exec-7] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:54:23,253 INFO  [http-bio-8080-exec-4] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:54:23,462 INFO  [http-bio-8080-exec-2] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:site_gmb
2016-06-03 05:55:00,501 INFO  [http-bio-8080-exec-17] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE 
name:seo_sessions
2016-06-03 05:55:48,472 INFO  [http-bio-8080-exec-19] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb
2016-06-03 05:55:48,583 INFO  [http-bio-8080-exec-7] 
controller.CacheController:64 : wipe cache type: CUBE event:UPDATE name:seo_gmb

[jira] [Commented] (KYLIN-1726) Scalable streaming cubing

2016-06-02 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312468#comment-15312468
 ] 

Wang Ken commented on KYLIN-1726:
-

Hi, Yang

Let's meet and discuss how to achieve this. Generally speaking, consuming Kafka 
from MR is doable.
But I don't understand why you want to add Kafka offsets to the segment's 
metadata. Kafka offsets are per partition, and a single Kafka topic can have up 
to hundreds of partitions.
I don't think we should create a segment per Kafka partition per topic.

Thanks
Ken
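One way to record Kafka positions without a segment per partition (purely a sketch for the discussion; the class and field names are not Kylin's actual design) is a single segment per time range that keeps an offset range per partition:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one streaming segment covering all partitions of a topic,
// with a start/end offset recorded per partition. Names are illustrative.
public class StreamingSegment {
    private final String topic;
    private final Map<Integer, Long> startOffsets = new HashMap<>();
    private final Map<Integer, Long> endOffsets = new HashMap<>();

    public StreamingSegment(String topic) {
        this.topic = topic;
    }

    public String topic() {
        return topic;
    }

    public void recordRange(int partition, long start, long end) {
        startOffsets.put(partition, start);
        endOffsets.put(partition, end);
    }

    // Total records covered, summed across all partitions of the topic.
    public long recordCount() {
        long total = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            total += e.getValue() - startOffsets.get(e.getKey());
        }
        return total;
    }
}
```

This keeps segments aligned with time ranges while still making Kafka consumption resumable per partition.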

> Scalable streaming cubing
> -
>
> Key: KYLIN-1726
> URL: https://issues.apache.org/jira/browse/KYLIN-1726
> Project: Kylin
>  Issue Type: New Feature
>Reporter: liyang
>Assignee: liyang
>
> We try to achieve:
> 1. Scale streaming cubing workload on a computation cluster, e.g. YARN
> 2. Support Kafka as a formal data source
> 3. Guarantee no data loss reading from Kafka, even records are not strictly 
> ordered by time





[jira] [Commented] (KYLIN-1539) query fails when calcite parse sql

2016-04-07 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230201#comment-15230201
 ] 

Wang Ken commented on KYLIN-1539:
-

I think it's the same bug as https://issues.apache.org/jira/browse/KYLIN-1518, 
just thrown from a different place.
The field sizes in the input and within RexProgramBuilder don't match.

Both are related to Calcite thread safety:
https://issues.apache.org/jira/browse/CALCITE-1009


> query fails when calcite parse sql
> --
>
> Key: KYLIN-1539
> URL: https://issues.apache.org/jira/browse/KYLIN-1539
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Reporter: Zhong Yanghong
>Assignee: liyang
>
> There are two cases of query failure when calcite parses sql. However, both 
> are solved by the same solution, just restarting kylin server.





[jira] [Comment Edited] (KYLIN-1518) When a cube built successfully with around 170 measures, some queries cannot be executed

2016-04-07 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230141#comment-15230141
 ] 

Wang Ken edited comment on KYLIN-1518 at 4/7/16 12:34 PM:
--

There is a similar issue in Drill; it looks like a Calcite bug:
https://issues.apache.org/jira/browse/DRILL-4175

It is fixed in Calcite 1.6.0:
https://issues.apache.org/jira/browse/CALCITE-1009

I think we can catch the exception in our code and dump the state of 
RexInputRef.NAMES.
The field is private but static, so we can read it via reflection and dump its 
state to check whether the server has run into this situation.
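As a sketch of that reflection approach (the CacheHolder class below is a hypothetical stand-in for RexInputRef; its NAMES field is assumed here to be a private static list, so check the actual Calcite source for the real type and modifiers):

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Calcite class holding a private static cache,
// analogous to RexInputRef.NAMES (assumption: a private static List).
class CacheHolder {
    private static final List<String> NAMES = Arrays.asList("$0", "$1", "$2");
}

public class DumpPrivateStatic {
    // Reflectively read a private static field so its state can be logged
    // from the exception handler.
    static Object readPrivateStatic(Class<?> cls, String fieldName) throws Exception {
        Field f = cls.getDeclaredField(fieldName);
        f.setAccessible(true); // bypass the private modifier
        return f.get(null);    // static field, so no instance is needed
    }

    public static void main(String[] args) throws Exception {
        System.out.println("NAMES = " + readPrivateStatic(CacheHolder.class, "NAMES"));
    }
}
```

Calling readPrivateStatic(RexInputRef.class, "NAMES") from the catch block would let us log the cache's contents at the moment the query fails.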



was (Author: wang ken):
There is similar issue in drill, looks like a calcite bug.

https://issues.apache.org/jira/browse/DRILL-4175


> When a cube built successfully with around 170 measures, some queries cannot 
> be executed
> 
>
> Key: KYLIN-1518
> URL: https://issues.apache.org/jira/browse/KYLIN-1518
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Reporter: Zhong Yanghong
>Assignee: liyang
> Attachments: error_stack_trace_exponential.txt
>
>
> A cube with the same data model, the same cube definition except the number 
> of measures. If the number is around 150, the same query can be executed 
> successfully. The error log is as follows:
> at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:112)
>   at 
> org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
>   at 
> org.apache.kylin.rest.service.QueryService.execute(QueryService.java:361)
>   at 
> org.apache.kylin.rest.service.QueryService.queryWithSqlMassage(QueryService.java:276)
>   at 
> org.apache.kylin.rest.service.QueryService.query(QueryService.java:118)
>   at 
> org.apache.kylin.rest.service.QueryService$$FastClassByCGLIB$$4957273f.invoke()
>   at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
>   at 
> org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:618)
>   at 
> org.apache.kylin.rest.service.QueryService$$EnhancerByCGLIB$$ab2dbbe7.query()
>   at 
> org.apache.kylin.rest.controller.QueryController.doQueryWithCache(QueryController.java:191)
>   at 
> org.apache.kylin.rest.controller.QueryController.query(QueryController.java:95)
>   at sun.reflect.GeneratedMethodAccessor169.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
>   at 
> org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
>   at 
> org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
>   at 
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
>   at 
> org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
>   at 
> org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80)
>   at 
> org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
>   at 
> org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
>   at 
> org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
>   at 
> org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>   at 
> org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>   at 
> org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)
>   at 
> 

[jira] [Commented] (KYLIN-1518) When a cube built successfully with around 170 measures, some queries cannot be executed

2016-04-07 Thread Wang Ken (JIRA)

[ 
https://issues.apache.org/jira/browse/KYLIN-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230141#comment-15230141
 ] 

Wang Ken commented on KYLIN-1518:
-

There is a similar issue in Drill; it looks like a Calcite bug.

https://issues.apache.org/jira/browse/DRILL-4175


> When a cube built successfully with around 170 measures, some queries cannot 
> be executed
> 
>
> Key: KYLIN-1518
> URL: https://issues.apache.org/jira/browse/KYLIN-1518
> Project: Kylin
>  Issue Type: Bug
>  Components: Query Engine
>Reporter: Zhong Yanghong
>Assignee: liyang
> Attachments: error_stack_trace_exponential.txt
>
>
> A cube with the same data model, the same cube definition except the number 
> of measures. If the number is around 150, the same query can be executed 
> successfully.