Re: user ADMIN can't login

2018-10-25 Thread 陶 加涛
Seems similar to this JIRA (https://issues.apache.org/jira/browse/KYLIN-3562).


---
Regards!
Aron Tao



From: Fei Yi 
Reply-To: "u...@kylin.apache.org" 
Date: Thursday, October 25, 2018, 11:26
To: "dev@kylin.apache.org" , "u...@kylin.apache.org" 

Subject: Re: user ADMIN can't login

Environment: Kylin 2.5.0-cdh57 and CDH 5.14.4

Fei Yi <yijianhui...@gmail.com> wrote on Thursday, October 25, 2018 at 11:20 AM:
User ADMIN can't log in. After restarting Kylin it works again, but the 
login fails again the next day. The ANALYST user has been working normally 
the whole time.

HTTP Status 500 – Internal Server Error


Type Exception Report

Message Overwriting conflict /user/ADMIN, expect old TS 1540406402017, but it 
is 1540406402299

Description The server encountered an unexpected condition that prevented it 
from fulfilling the request.

Exception

org.apache.kylin.common.persistence.WriteConflictException: Overwriting conflict /user/ADMIN, expect old TS 1540406402017, but it is 1540406402299
	org.apache.kylin.storage.hbase.HBaseResourceStore.checkAndPutResourceImpl(HBaseResourceStore.java:325)
	org.apache.kylin.common.persistence.ResourceStore.checkAndPutResourceCheckpoint(ResourceStore.java:323)
	org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:308)
	org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:287)
	org.apache.kylin.metadata.cachesync.CachedCrudAssist.save(CachedCrudAssist.java:192)
	org.apache.kylin.rest.security.KylinUserManager.update(KylinUserManager.java:122)
	org.apache.kylin.rest.service.KylinUserService.updateUser(KylinUserService.java:85)
	org.apache.kylin.rest.security.KylinAuthenticationProvider.authenticate(KylinAuthenticationProvider.java:117)
	org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
	org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:199)
	org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:180)
	org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:200)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
	org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
	org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
	org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
	org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214)
	org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
	org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346)
	org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:262)
	com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:209)
	com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:244)

Note The full stack trace of the root cause is available in the server logs.


Apache Tomcat/7.0.90
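The conflict above comes from an optimistic check-and-put on the metadata store: a writer carries the timestamp it last read, and the store rejects the write if the stored timestamp has moved on. A minimal stand-in sketch of that mechanism (plain Python, a toy model rather than Kylin's actual HBaseResourceStore API):

```python
class ToyResourceStore:
    """Toy optimistic check-and-put, mimicking the conflict seen above.

    A writer passes the timestamp (TS) it last read; the write is rejected
    if another writer has bumped the stored TS in the meantime.
    """

    def __init__(self):
        self._store = {}  # path -> (timestamp, payload)

    def check_and_put(self, path, expected_ts, payload, new_ts):
        current_ts = self._store.get(path, (0, None))[0]
        if current_ts != expected_ts:
            # Same shape as Kylin's WriteConflictException message.
            raise RuntimeError(
                f"Overwriting conflict {path}, expect old TS {expected_ts}, "
                f"but it is {current_ts}")
        self._store[path] = (new_ts, payload)


store = ToyResourceStore()
store.check_and_put("/user/ADMIN", 0, "profile-v1", 1540406402017)
# A second writer still holding an older TS would now raise the conflict.
```

Two login requests updating /user/ADMIN concurrently can race exactly like this, which matches the restart-then-fail-again symptom and the KYLIN-3562 report.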

Re: Unable to connect to Kylin Web UI

2018-10-25 Thread jenkinsliu
How were you able to resolve this? Please share the steps.

--
Sent from: http://apache-kylin.74782.x6.nabble.com/


Re: Unable to connect to Kylin Web UI

2018-10-25 Thread jenkinsliu
Here is log.

kylin.start   
kylin.log   
kylin.out   

Kylin seems to start, and there is no error in the log.
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /root/apache-kylin-2.5.0-bin-hbase1x/logs/kylin.log
Web UI is at http://:7070/kylin
[root@sandbox-hdp bin]# netstat -tunlp | grep 7070
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      5160/java

The cause may be the sandbox HDP (2.6.5) port forwarding. Do you know how to
enable port 7070 for the HDP (2.6.5) sandbox?
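Since netstat shows the port listening inside the sandbox, the open question is whether it is reachable from outside. A generic TCP connect probe can tell the two apart (a plain-Python sketch; host and port are whatever your sandbox setup forwards):

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run from the host machine, e.g. port_reachable("localhost", 7070):
#   True  -> the sandbox forwards the port; look elsewhere for the UI issue
#   False -> the port is not forwarded out of the sandbox
```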

--
Sent from: http://apache-kylin.74782.x6.nabble.com/


Re: Unable to connect to Kylin Web UI

2018-10-25 Thread JiaTao Tao
If you suspect this problem is related to the port, you could temporarily
change Kylin's default port (7070) to another available port, just to
confirm your suspicion first.


The way to modify this is changing "

jenkinsliu wrote on Friday, October 26, 2018 at 9:41 AM:

> Here is log.
>
> kylin.start 
>
> kylin.log 
> kylin.out 
>
> The kylin seems start and there is no error in the log.
> A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
> Check the log at /root/apache-kylin-2.5.0-bin-hbase1x/logs/kylin.log
> Web UI is at http://:7070/kylin
> [root@sandbox-hdp bin]# netstat -tunlp |grep 7070
> tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN
> 5160/java
>
> The reason may be the sandbox hdp(2.6.5) port forwarding.Do you know how to
> enable 7070 port for sandbox hdp(2.6.5) in sandbox.
>
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/


Re: Unable to connect to Kylin Web UI

2018-10-25 Thread Lijun Cao
Hi jenkinsliu:

Could you provide more details or any logs ?

Best Regards

Lijun Cao

> On October 25, 2018, at 20:14, jenkinsliu wrote:
> 
> How you were able to resolve ? Please share the steps 
> 
> --
> Sent from: http://apache-kylin.74782.x6.nabble.com/
> 



Re: MapReduceException: Counter 0

2018-10-25 Thread liuzhixin
Hello, Na Zhai,

Step: Extract Fact Table Distinct Columns

Kylin can't submit the MR task, and it throws "Counters: 0".

Best wishes!

> On October 24, 2018, at 9:06 PM, liuzhixin wrote:
> 
> Hello Na Zhai:
> 
> Beeline connects to Hive successfully, and there are no further messages.
> 
> 
> Best wishes!
> 
> 
>> On October 23, 2018, at 11:35 AM, Na Zhai wrote:
>> 
>> Hi, liuzhixin,
>> By "Beeline connect hive", do you mean the first step of the cube build, 
>> as circled in the picture? 
>> If so, can you check whether other cubes build successfully? You should 
>> make sure that Hive is healthy and that Beeline connects to Hive 
>> successfully.
>>  
>> Best wishes!
>>  
>> Sent from Mail for Windows 10
>>  
>> From: liuzhixin <liuz...@163.com>
>> Sent: Monday, October 22, 2018 8:05:09 PM
>> To: Na Zhai
>> Cc: 335960...@qq.com 
>> Subject: Re: MapReduceException: Counter 0
>>  
>> Hello, Beeline connects to Hive
>> 
>> 
>> 2018-10-22 20:03:32,945 DEBUG [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] common.HadoopCmdOutput:98 : 
>> Counters: 0
>> 2018-10-22 20:03:32,987 DEBUG [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] common.HadoopCmdOutput:104 : 
>> outputFolder 
>> is hdfs://hdfscluster/mnt/kylin/kylin_metadata/kylin-23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f/kylin_sales_mode1/fact_distinct_columns
>>  
>> 
>> 2018-10-22 20:03:32,999 DEBUG [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] common.HadoopCmdOutput:109 : Seems 
>> no counter found for hdfs
>> 2018-10-22 20:03:33,014 INFO  [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] execution.ExecutableManager:434 : 
>> job id:23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-02 from RUNNING to ERROR
>> 2018-10-22 20:03:33,015 ERROR [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] execution.AbstractExecutable:165 : 
>> error running Executable: CubingJob{id=23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f, 
>> name=BUILD CUBE - kylin_sales_mode1 - 2012010100_2016010100 - 
>> GMT+08:00 2018-10-22 19:49:39, state=RUNNING}
>> 2018-10-22 20:03:33,020 DEBUG [pool-7-thread-1] cachesync.Broadcaster:113 : 
>> Servers in the cluster: [localhost:7070]
>> 2018-10-22 20:03:33,021 DEBUG [pool-7-thread-1] cachesync.Broadcaster:123 : 
>> Announcing new broadcast to all: BroadcastEvent{entity=execute_output, 
>> event=update, cacheKey=23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f}
>> 2018-10-22 20:03:33,024 INFO  [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] execution.ExecutableManager:434 : 
>> job id:23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f from RUNNING to ERROR
>> 2018-10-22 20:03:33,024 DEBUG [pool-7-thread-1] cachesync.Broadcaster:113 : 
>> Servers in the cluster: [localhost:7070]
>> 2018-10-22 20:03:33,024 DEBUG [Scheduler 1236831559 Job 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f-142] execution.AbstractExecutable:316 : 
>> no need to send email, user list is empty
>> 2018-10-22 20:03:33,024 DEBUG [pool-7-thread-1] cachesync.Broadcaster:123 : 
>> Announcing new broadcast to all: BroadcastEvent{entity=execute_output, 
>> event=update, cacheKey=23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f}
>> 2018-10-22 20:03:33,028 DEBUG [http-nio-7070-exec-10] 
>> cachesync.Broadcaster:247 : Broadcasting UPDATE, execute_output, 
>> 23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f
>> 2018-10-22 20:03:33,029 ERROR [pool-11-thread-1] 
>> threadpool.DefaultScheduler:115 : ExecuteException 
>> job:23b694f2-c7cc-9d0e-4e3d-0b91ce16e21f
>> org.apache.kylin.job.exception.ExecuteException: 
>> org.apache.kylin.job.exception.ExecuteException: 
>> org.apache.kylin.engine.mr.exception.MapReduceException: Counters: 0
>> 
>> at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:178)
>> at 
>> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.kylin.job.exception.ExecuteException: 
>> org.apache.kylin.engine.mr.exception.MapReduceException: Counters: 0
>> 
>> at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:178)
>> at 
>> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
>> at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:163)
>> ... 4 more
>> Caused by: org.apache.kylin.engine.mr.exception.MapReduceException: 
>> Counters: 0
>> 
>> at 
>> org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
>> at 
>> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:163)
>> ... 6 more
>> 2018-10-22 20:03:33,030 DEBUG 

Issues found while using apache-kylin-2.5.0-bin-hadoop3

2018-10-25 Thread zeng . kong
Hi,

I am using the apache-kylin-2.5.0-bin-hadoop3 version.

In my environment, Hive and HBase use different filesystems. When building
a cube with the Spark engine, I ran into the following two problems:

1. At the "Build Cube with Spark" step, the filesystem looked up from
hbase-config.xml does not match the filesystem the path belongs to: the
file was placed on Hive's HDFS in an earlier step, but this step reads it
using HBase's filesystem.

2. At the "Convert Cuboid Data to HFile" step, some HBase-related jars are
missing.

In addition, after a successful build, when querying data with an inner
join, the SQL is parsed incorrectly, causing a large number of HBase
records to be scanned. The main reason is that after Tomcat was upgraded to
8.5.33, the class loading order changed, so the code rewritten by Calcite
is not loaded.

I found the above problems during my own usage and am not sure whether they
are related to the way I set up my environment. Also, when will this
version be merged into master?

Thanks!



Default Bloom Filters for HTables

2018-10-25 Thread Shrikant Bang
Hi Team,

  To my understanding, in Kylin v2.5.x bloom filters are disabled by
default for HTables. I am curious to know whether there is any specific
reason for using BloomType.NONE?

Please correct me if I am wrong.

Thank You,
Shrikant Bang
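For context on what is traded away: a Bloom filter lets a store skip files that definitely don't contain a key, at some memory cost per file, and it only helps point lookups, not range scans. A minimal, generic sketch of the data structure (stdlib Python, not HBase's implementation):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored as one big integer

    def _positions(self, key: bytes):
        # Derive num_hashes independent bit positions from the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(i.to_bytes(2, "big") + key).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        # False means definitely absent; True means "possibly present".
        return all(self.bits >> pos & 1 for pos in self._positions(key))
```

One possible (unconfirmed) rationale for BloomType.NONE: Kylin's cuboid reads are mostly range scans over rowkey prefixes rather than point gets, which is the case Bloom filters don't accelerate, so the per-HFile memory cost may not pay off.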


Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-25 Thread JiaTao Tao
As far as I'm concerned, using Parquet as Kylin's storage format is quite
appropriate. From the perspective of integrating Spark: Spark has made a
lot of optimizations for Parquet, e.g. we can benefit from Spark's
vectorized reading and lazy dictionary decoding, etc.


And here are my thoughts about integrating Spark with our query engine. As
Shaofeng mentioned, a cuboid is a Parquet file; you can think of it as a
small table, and we can read a cuboid as a DataFrame directly, which Spark
can then query, a bit like this:
ss.read.parquet("path/to/CuboidFile").filter("xxx").agg("xxx").select("xxx").
(We need to implement some of Kylin's advanced aggregations; for Kylin's
basic aggregations like sum/min/max, we can use Spark's directly.)
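To make the per-cuboid pipeline concrete without a Spark cluster, here is a toy in-memory stand-in (plain Python with invented sample data; the real thing would be DataFrame operators like the ss.read.parquet(...) chain above):

```python
# A cuboid modeled as a small table: a list of rows (dicts).
cuboid = [
    {"city": "SH", "year": 2018, "sales": 10},
    {"city": "SH", "year": 2017, "sales": 7},
    {"city": "BJ", "year": 2018, "sales": 5},
]


def query(rows, predicate, group_key, measure):
    """filter -> aggregate(sum) -> project, each phase inspectable on its own."""
    filtered = [r for r in rows if predicate(r)]           # filter phase
    groups = {}
    for r in filtered:                                     # aggregation phase
        groups[r[group_key]] = groups.get(r[group_key], 0) + r[measure]
    return [{group_key: k, f"sum_{measure}": v}            # projection phase
            for k, v in sorted(groups.items())]


result = query(cuboid, lambda r: r["year"] == 2018, "city", "sales")
# result == [{"city": "BJ", "sum_sales": 5}, {"city": "SH", "sum_sales": 10}]
```

Because each phase is an explicit step, intermediate data can be collected and inspected after filter, aggregation, or projection — which is exactly the debuggability argument in point 2 below.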



*Compared to our old query engine, the advantages are as follows:*



1. It is distributed! Our old query engine pulls all data onto a single
query node and then computes there; that node is a single point of failure
and often hits OOM on large amounts of data.



2. It is simple and easy to debug (every step is clear and transparent):
you can collect data after every single phase (filter/aggregation/
projection, etc.), so you can easily check which operation/phase went
wrong. Our old query engine uses Calcite for post-calculation, which makes
pinpointing problems difficult, especially when code generation is
involved, and you cannot insert your own logic during computation.



3. We can fully enjoy all efforts that Spark made for optimizing
performance, e.g. Catalyst/Tungsten, etc.



4. It is easy to unit test: you can test every step separately, which
allows finer-grained testing of Kylin's query engine.



5. Thanks to Spark's DataSource API, we can change Parquet to other data
formats easily.



6. Many tools built on top of Spark, such as machine-learning libraries,
can be integrated with us directly.



==
==

 Hi Kylin developers.



HBase has been Kylin’s storage engine since the first day; Kylin on HBase
has been verified as a success, supporting low-latency & high-concurrency
queries on a very large data scale. Thanks to HBase, most Kylin users can
get, on average, less than 1-second query response.



But we also see some limitations when putting cubes into HBase; I shared
some of them at HBaseCon Asia 2018[1] this August. The typical limitations
include:



   - Rowkey is the primary index, no secondary index so far;



Filtering by a row key’s prefix versus its suffix can yield very different
performance. So the user needs to design the row key well; otherwise,
queries will be slow. This is sometimes difficult because the user might
not be able to predict the filtering patterns ahead of cube design.
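The prefix/suffix asymmetry can be sketched with sorted keys, which is essentially what HBase's rowkey index provides: a prefix maps to one contiguous key range (cheap seek plus range read), while a suffix forces a full scan. A toy Python sketch with made-up rowkeys, not HBase code:

```python
import bisect

# Made-up rowkeys shaped like cuboid_id|dimension|dimension, kept sorted
# the way HBase keeps rowkeys.
rowkeys = sorted(f"{cuboid:03d}|{city}|{year}"
                 for cuboid in (7, 15)
                 for city in ("BJ", "SH")
                 for year in (2017, 2018))


def prefix_scan(keys, prefix):
    """Seek to the contiguous range sharing `prefix`: O(log n) seek + range read."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\xff")
    return keys[lo:hi]


def suffix_filter(keys, suffix):
    """No index on the key's tail: every key must be inspected, O(n)."""
    return [k for k in keys if k.endswith(suffix)]
```

On a real table the suffix case means touching every row in the scanned region, which is why a rowkey designed against the actual filtering pattern matters so much.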



   - HBase is a key-value store instead of a columnar storage



Kylin combines multiple measures (columns) into fewer column families to
reduce data size (the row key overhead is significant). This often causes
HBase to read more data than requested.



   - HBase couldn't run on YARN



This makes deployment and auto-scaling a little complicated, especially in
the cloud.



In short, HBase is complicated as Kylin’s storage; maintenance and
debugging are also hard for ordinary developers. Now we’re planning to seek
a simple, lightweight, read-only storage engine for Kylin. The new solution
should have the following characteristics:



   - Columnar layout with compression for efficient I/O;

   - Index by each column for quick filtering and seeking;

   - MapReduce / Spark API for parallel processing;

   - HDFS compliant for scalability and availability;

   - Mature, stable and extensible;



With the plugin architecture[2] introduced in Kylin 1.5, adding multiple
storages to Kylin is possible. Some companies, like Kyligence Inc. and
Meituan.com, have developed customized storage engines for Kylin in their
products or platforms. In their experience, columnar storage is a good
supplement to the HBase engine. Kaisen Kang from Meituan.com shared their
KOD (Kylin on Druid) solution[3] at this August’s Kylin meetup in Beijing.



We plan to do a PoC with Apache Parquet + Apache Spark in the next phase.
Parquet is a standard columnar file format and has been widely supported by
many projects like Hive, Impala, Drill, etc. Parquet is adding a page-level
column index to support fine-grained filtering. Apache Spark can provide
parallel computing over Parquet and can be deployed on YARN/Mesos and
Kubernetes. With this combination, data persistence and computation are
separated, which makes scaling in/out much easier than before. Benefiting
from Spark's flexibility, we can not only push down more