Re:Re: Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-31 Thread Ma Gang
Hi ShaoFeng, Very good questions; please see my comments starting with [Gang]: 1) How to bridge the real-time cube with a cube built from Hive? You know, in Kylin the source type is marked at the table level, which means a table is either a Hive table, a JDBC table or a streaming table. To implement

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-26 Thread JiaTao Tao
You are welcome, ShaoFeng! The storage and query engines are inseparable and should be designed together to take full advantage of each other's abilities. And I'm very excited about the upcoming columnar storage and query engine! -- Regards! Aron Tao ShaoFeng Shi wrote on Fri, Oct 26, 2018 at 10:28 PM: > Exactly; Than

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-26 Thread ShaoFeng Shi
Exactly; thank you, JiaTao, for the comments! JiaTao Tao wrote on Thu, Oct 25, 2018 at 6:12 PM: > As far as I'm concerned, using Parquet as Kylin's storage format is pretty > appropriate. From the aspect of integrating Spark, Spark made a lot of > optimizations for Parquet, e.g. we can enjoy Spark's vectorized

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-25 Thread JiaTao Tao
As far as I'm concerned, using Parquet as Kylin's storage format is quite appropriate. From the perspective of Spark integration, Spark has made many optimizations for Parquet; e.g., we can benefit from Spark's vectorized reading and lazy dictionary decoding. And here are my thoughts about integrating Spark a

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-16 Thread ShaoFeng Shi
Hi guys, I uploaded the initial design document to JIRA, please feel free to comment: https://issues.apache.org/jira/browse/KYLIN-3621 ShaoFeng Shi wrote on Fri, Oct 12, 2018 at 9:44 AM: > JIRA and sub-tasks are created for this. Welcome to comment there: > https://issues.apache.org/jira/browse/KYLIN-3621

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-11 Thread ShaoFeng Shi
A JIRA issue and sub-tasks have been created for this. You are welcome to comment there: https://issues.apache.org/jira/browse/KYLIN-3621 ShaoFeng Shi wrote on Mon, Oct 8, 2018 at 2:45 PM: > I agree; the new storage should be Hadoop/HDFS compliant, and also needs to be > cloud storage (like S3, blob storage) friendly, as more and more

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-07 Thread ShaoFeng Shi
I agree; the new storage should be Hadoop/HDFS compliant, and also needs to be cloud storage (like S3, blob storage) friendly, as more and more users are running big data analytics in the cloud. Luke Han wrote on Sun, Oct 7, 2018 at 7:44 PM: > It makes sense to bring a better storage option for Kylin. > > The opt

RE: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-07 Thread Wang, Ken
From: Luke Han Sent: October 7, 2018 19:44 To: dev Subject: Re: [DISCUSS] Columnar storage engine for Apache Kylin It makes sense to bring a better storage option for Kylin. The option should be open and people could have different ways to create an adaptor for the underlying storage. Considering huge a

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-07 Thread Luke Han
It makes sense to bring a better storage option for Kylin. The option should be open, so that people can create adaptors for the underlying storage in different ways. Considering that the large Kylin deployments today all run on Hadoop/HDFS, I prefer Parquet, ORC, or other HDFS-compatible opti

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-02 Thread Li Yang
Love this discussion. I'd like to highlight the 3 major roles HBase currently plays, so we don't miss any of them when looking for a replacement. 1) Storage: A high speed big data storage 2) Cache: A distributed storage cache layer (was BlockCache) 3) MPP: A distributed computation framework (was Cop

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-10-01 Thread ShaoFeng Shi
Hi Billy, Yes, cloud storage should be considered. The traditional file layouts on HDFS may not work well on cloud storage. Kylin needs to allow extension here. I will add this to the requirements. Billy Liu wrote on Sat, Sep 29, 2018 at 3:22 PM: > Hi Shaofeng, > > I'd like to add one more characteristic: cloud

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-29 Thread Billy Liu
Hi Shaofeng, I'd like to add one more characteristic: cloud-native storage support. Quite a few users are using S3 on AWS, or Azure Data Lake Storage on Azure. If the new storage engine were more cloud-friendly, more users could benefit from it. With Warm regards Billy Liu ShaoFeng Shi wrote on 2018-09-2

Re: Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-29 Thread ShaoFeng Shi
Hi Gang, very good questions; that's why we need to raise such a discussion publicly. Please check my comments below, starting with [shaofengshi]. Feel free to comment. 1. Is it possible to locate a cuboid quickly in a Parquet file? How to save cuboid metadata info in the parquet's FileMetaData, jus

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread ShaoFeng Shi
Hi Yanghong, Thanks for your question. I think it is not required that other engines know how to read Kylin's storage, but it would be nice to have if possible. We can extend the file format if Parquet or ORC can't meet Kylin's requirements, but it is not necessary to reinvent a new format. Zhong, Yang

Re: [DISCUSS] Columnar storage engine for Apache Kylin

2018-09-28 Thread Zhong, Yanghong
I have one question about the characteristics of Kylin columnar storage files: should the format be a standard or common one? Since the data stored in the storage engine is Kylin-specific, is it necessary for other engines to know how to build data into and how to read data from the sto