Re:Re: Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-31 Thread Ma Gang
Hi ShaoFeng, Very good questions; please see my comments starting with [Gang]: 1) How to bridge the real-time cube with a cube built from Hive? You know, in Kylin the source type is marked at the table level, which means a table is either a Hive table, a JDBC table or a streaming table. To implement

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-26 Thread JiaTao Tao
You are welcome, ShaoFeng! The storage and query engines are inseparable and should be designed together to take full advantage of each other's capabilities. And I'm very excited about the upcoming columnar storage and query engine! -- Regards! Aron Tao ShaoFeng Shi wrote on Fri, Oct 26, 2018 at 10:28 PM: > Exactly;

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-26 Thread ShaoFeng Shi
Exactly; thank you, JiaTao, for the comments! JiaTao Tao wrote on Thu, Oct 25, 2018 at 6:12 PM: > As far as I'm concerned, using Parquet as Kylin's storage format is pretty > appropriate. From the aspect of integrating Spark, Spark made a lot of > optimizations for Parquet, e.g. we can enjoy Spark's vectorized

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-25 Thread JiaTao Tao
As far as I'm concerned, using Parquet as Kylin's storage format is quite appropriate. From the perspective of Spark integration, Spark has made many optimizations for Parquet, e.g. we can benefit from Spark's vectorized reading and lazy dictionary decoding, etc. And here are my thoughts about integrating Spark

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-16 Thread ShaoFeng Shi
Hi guys, I uploaded the initial design document to JIRA, please feel free to comment: https://issues.apache.org/jira/browse/KYLIN-3621 ShaoFeng Shi wrote on Fri, Oct 12, 2018 at 9:44 AM: > JIRA and sub-tasks are created for this. Welcome to comment there: > https://issues.apache.org/jira/browse/KYLIN-3621

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-11 Thread ShaoFeng Shi
A JIRA and sub-tasks have been created for this. You are welcome to comment there: https://issues.apache.org/jira/browse/KYLIN-3621 ShaoFeng Shi wrote on Mon, Oct 8, 2018 at 2:45 PM: > I agree; the new storage should be Hadoop/HDFS compliant, and also needs to be > cloud storage (like S3, blob storage) friendly, as more and

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-08 Thread ShaoFeng Shi
I agree; the new storage should be Hadoop/HDFS compliant, and also needs to be friendly to cloud storage (such as S3 or blob storage), as more and more users are running big data analytics in the cloud. Luke Han wrote on Sun, Oct 7, 2018 at 7:44 PM: > It makes sense to bring a better storage option for Kylin. > > The

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-07 Thread Luke Han
It makes sense to bring a better storage option for Kylin. The option should be open, so that people can create adaptors for the underlying storage in different ways. Considering that the large Kylin deployments today all run on Hadoop/HDFS, I prefer Parquet, ORC, or other HDFS-compatible

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-03 Thread Li Yang
Love this discussion. I'd like to highlight the 3 major roles HBase currently plays, so we don't miss any of them when looking for a replacement. 1) Storage: A high-speed big data storage 2) Cache: A distributed storage cache layer (was BlockCache) 3) MPP: A distributed computation framework (was

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-01 Thread ShaoFeng Shi
Hi Billy, Yes, cloud storage should be considered. The traditional file layouts on HDFS may not work well on cloud storage. Kylin needs to allow extension here. I will add this to the requirements. Billy Liu wrote on Sat, Sep 29, 2018 at 3:22 PM: > Hi Shaofeng, > > I'd like to add one more characteristic:

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-29 Thread Billy Liu
Hi Shaofeng, I'd like to add one more characteristic: cloud-native storage support. Quite a few users are using S3 on AWS, or Azure Data Lake Storage on Azure. If the new storage engine could be more cloud-friendly, more users could benefit from it. With Warm regards Billy Liu ShaoFeng Shi

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread ShaoFeng Shi
Hi Yanghong, Thanks for your question. I think it is not required that other engines know how to read Kylin's storage, but it would be nice to have if possible. We can extend the file format if Parquet or ORC cannot meet Kylin's requirements, but it is not necessary to invent a new format. Zhong,

Re:Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread Ma Gang
I like Parquet; it is a very efficient format and is supported by various projects, but there are some questions if we use Parquet as the cube storage format: 1. Is it possible to locate a cuboid quickly in a Parquet file? How can cuboid metadata info be saved in Parquet's FileMetaData, just in the

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread Zhong, Yanghong
I have one question about the characteristics of Kylin columnar storage files: should the format be a standard or common one? Since the data stored in the storage engine is Kylin-specific, is it necessary for other engines to know how to build data into and how to read data from the

[DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread ShaoFeng Shi
Hi Kylin developers, HBase has been Kylin’s storage engine since day one; Kylin on HBase has proven successful, supporting low-latency, high-concurrency queries at a very large data scale. Thanks to HBase, most Kylin users can get on average less than 1-second query