RE: Abstracting CarbonData's Index Interface

2016-10-04 Thread Jihong Ma
, 2016 9:15 PM To: dev@carbondata.incubator.apache.org Subject: Re: Abstracting CarbonData's Index Interface > On Oct 4, 2016, at 8:01 AM, Jihong Ma <jihong...@huawei.com> wrote: > > It is a great idea to open the door for a more flexible/scalable way of > accessing index to help with query p

Re: Abstracting CarbonData's Index Interface

2016-10-04 Thread Qingqing Zhou
On Mon, Oct 3, 2016 at 9:14 PM, Jacky Li wrote: > If data is append only, I think read and load of index is enough. I > think you are right, we should have a clean Index API, currently I am > thinking of read and load API only: I am not a super fan of external index at this

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
Original Message- > From: Jacky Li [mailto:jacky.li...@qq.com] > Sent: Sunday, October 02, 2016 10:25 PM > To: dev@carbondata.incubator.apache.org > Subject: Re: Abstracting CarbonData's Index Interface > > After a second thought regarding the index part, another option is to > hav

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
> On Oct 4, 2016, at 5:43 AM, Qingqing Zhou wrote: > > On Fri, Sep 30, 2016 at 10:31 PM, Jacky Li wrote: >> However, it also introduces memory consumption of the index tree and >> impact first query time because the process of loading of index from >> file

RE: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jihong Ma
Jacky Li [mailto:jacky.li...@qq.com] Sent: Sunday, October 02, 2016 10:25 PM To: dev@carbondata.incubator.apache.org Subject: Re: Abstracting CarbonData's Index Interface After a second thought regarding the index part, another option is to have a very simple Segment definition which can only list all fil

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Qingqing Zhou
On Fri, Sep 30, 2016 at 10:31 PM, Jacky Li wrote: > However, it also introduces memory consumption of the index tree and > impact first query time because the process of loading of index from > file footer into memory. On the other side, in a multi-tenant > environment,

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
I have created a JIRA and a PR for this: CARBONDATA-284 (https://issues.apache.org/jira/browse/CARBONDATA-284) PR208 (https://github.com/apache/incubator-carbondata/pull/208) Please review the interface. Regards, Jacky

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
Sure. I think what I am doing will not affect how the index is stored and loaded for the current in-memory B-tree approach; I am only adding the interface for it. You can go ahead and continue your part. Regards, Jacky > On Oct 3, 2016, at 6:36 PM, Kumar Vishal [via Apache CarbonData Mailing List archive]

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Kumar Vishal
Hi Jacky, I am also changing the CarbonData file Thrift structure so that loading the B-tree reads less data, only what is required. The main changes will be removing the data chunk from the blocklet info and keeping only the offset of the data chunk and some of the redundant information which is

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Jacky Li
Agreed. Shall I create a JIRA issue and PR for this abstraction? I think reviewing the interface code will be clearer. Regards, Jacky > On Oct 3, 2016, at 2:38 PM, Aniket Adnaik [via Apache CarbonData Mailing List > archive] wrote: > > I would agree with having

Re: Abstracting CarbonData's Index Interface

2016-10-03 Thread Aniket Adnaik
I would agree with having a simple segment definition. A segment can carry metadata that describes it, for example: segment type, index availability, index type, index storage type (attached or detached/secondary), etc. For a streaming ingest segment, it may also contain min-max
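
A minimal Java sketch of the segment metadata listed in this message, for illustration only. All names here (`SegmentMetadataSketch`, `SegmentType`, `IndexType`, `IndexStorage`, `SegmentMetadata`) and all enum values are hypothetical and are not taken from CarbonData; the only index kind named in the thread is the in-memory B-tree. Index availability is derived from the index type instead of being stored separately, which is one possible choice among several.

```java
// Hypothetical sketch only: none of these types are CarbonData's real classes.
final class SegmentMetadataSketch {

    // Assumed segment kinds; the thread mentions streaming ingest segments.
    enum SegmentType { BATCH, STREAMING }

    // Assumed index kinds; BTREE stands for the in-memory B-tree from the thread.
    enum IndexType { NONE, BTREE }

    // Attached (stored with the data) or detached/secondary, as in the message.
    enum IndexStorage { NONE, ATTACHED, DETACHED }

    static final class SegmentMetadata {
        final SegmentType segmentType;
        final IndexType indexType;
        final IndexStorage indexStorage;

        SegmentMetadata(SegmentType segmentType, IndexType indexType,
                        IndexStorage indexStorage) {
            // Reject descriptions such as "no index, but stored attached".
            if ((indexType == IndexType.NONE) != (indexStorage == IndexStorage.NONE)) {
                throw new IllegalArgumentException(
                    "index type and index storage must both be NONE or both be set");
            }
            this.segmentType = segmentType;
            this.indexType = indexType;
            this.indexStorage = indexStorage;
        }

        // Index availability, derived from the index type.
        boolean hasIndex() {
            return indexType != IndexType.NONE;
        }
    }

    public static void main(String[] args) {
        SegmentMetadata batch = new SegmentMetadata(
            SegmentType.BATCH, IndexType.BTREE, IndexStorage.ATTACHED);
        SegmentMetadata streaming = new SegmentMetadata(
            SegmentType.STREAMING, IndexType.NONE, IndexStorage.NONE);
        System.out.println("batch has index: " + batch.hasIndex());
        System.out.println("streaming has index: " + streaming.hasIndex());
    }
}
```

A reader could consult `hasIndex()` to decide whether to prune files through the index or to scan the whole segment.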

Re: Abstracting CarbonData's Index Interface

2016-10-02 Thread Jacky Li
After a second thought regarding the index part, another option is to have a very simple Segment definition which can only list all files it has, or listFile taking the QueryModel as input. Implementations of Segment can be IndexSegment, MultiIndexSegment or StreamingSegment (no index). In
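
The simple Segment idea in this message might look roughly like the Java sketch below. It is not the interface from the actual PR: the method name `listFiles`, the fields of `QueryModel` and `DataFile`, and the min/max pruning rule are all assumptions made for illustration. Only the names `Segment`, `QueryModel`, `IndexSegment` and `StreamingSegment` come from the message, and `MultiIndexSegment` is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are CarbonData's real classes.
final class SegmentSketch {

    // Stand-in for QueryModel: a closed range filter [lower, upper] on one column.
    static final class QueryModel {
        final long lower;
        final long upper;
        QueryModel(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    // A data file plus the min and max of the filtered column (assumed metadata).
    static final class DataFile {
        final String path;
        final long min;
        final long max;
        DataFile(String path, long min, long max) {
            this.path = path;
            this.min = min;
            this.max = max;
        }
    }

    // The simple Segment: list every file, or list the files a query needs.
    interface Segment {
        List<String> listFiles();
        List<String> listFiles(QueryModel query);
    }

    // No index: every file is returned whatever the query is.
    static final class StreamingSegment implements Segment {
        private final List<DataFile> files;
        StreamingSegment(List<DataFile> files) { this.files = files; }
        public List<String> listFiles() { return paths(files); }
        public List<String> listFiles(QueryModel query) { return paths(files); }
    }

    // With an index: keep only files whose [min, max] overlaps the query range.
    static final class IndexSegment implements Segment {
        private final List<DataFile> files;
        IndexSegment(List<DataFile> files) { this.files = files; }
        public List<String> listFiles() { return paths(files); }
        public List<String> listFiles(QueryModel query) {
            List<DataFile> hits = new ArrayList<>();
            for (DataFile f : files) {
                if (f.max >= query.lower && f.min <= query.upper) {
                    hits.add(f);
                }
            }
            return paths(hits);
        }
    }

    static List<String> paths(List<DataFile> files) {
        List<String> out = new ArrayList<>();
        for (DataFile f : files) {
            out.add(f.path);
        }
        return out;
    }

    public static void main(String[] args) {
        List<DataFile> files = new ArrayList<>();
        files.add(new DataFile("part-0", 0, 9));
        files.add(new DataFile("part-1", 10, 19));
        QueryModel query = new QueryModel(12, 15);
        System.out.println(new IndexSegment(files).listFiles(query));     // pruned
        System.out.println(new StreamingSegment(files).listFiles(query)); // all files
    }
}
```

The caller only sees `Segment`, so whether files are pruned by an index or returned wholesale is an implementation detail of each segment type.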

Re: Abstracting CarbonData's Index Interface

2016-10-02 Thread Jacky Li
I am currently thinking of these abstractions: - A SegmentManager is the global manager of all segments for one table. It can be used to get all segments and to manage segments during loading and compaction. - A CarbonInputFormat will take the table path as input, which means it represents the whole
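
As a rough illustration of the SegmentManager role described in this message (get all segments, manage segments during loading and compaction), here is a small in-memory Java sketch. Every method name (`startLoad`, `finishLoad`, `getAllSegments`, `compact`) and the two-state lifecycle are assumptions, not the API proposed in the thread or in CarbonData itself; a real manager would also have to persist its state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: not CarbonData's real SegmentManager.
final class SegmentManagerSketch {

    // Assumed lifecycle: a segment is invisible to queries until its load finishes.
    enum Status { LOADING, READY }

    // One manager instance per table, tracking segment ids and their status.
    static final class SegmentManager {
        private final Map<Integer, Status> segments = new TreeMap<>();
        private int nextId = 0;

        // Begin a data load: reserve a new segment id.
        synchronized int startLoad() {
            int id = nextId++;
            segments.put(id, Status.LOADING);
            return id;
        }

        // Finish a data load: the segment becomes visible to queries.
        synchronized void finishLoad(int id) {
            if (segments.get(id) != Status.LOADING) {
                throw new IllegalStateException("segment " + id + " is not loading");
            }
            segments.put(id, Status.READY);
        }

        // Ids of all queryable segments, in ascending order.
        synchronized List<Integer> getAllSegments() {
            List<Integer> ready = new ArrayList<>();
            for (Map.Entry<Integer, Status> e : segments.entrySet()) {
                if (e.getValue() == Status.READY) {
                    ready.add(e.getKey());
                }
            }
            return ready;
        }

        // Compaction: replace several ready segments with one merged segment.
        synchronized int compact(List<Integer> ids) {
            for (Integer id : ids) {
                if (segments.get(id) != Status.READY) {
                    throw new IllegalArgumentException("segment " + id + " is not ready");
                }
            }
            for (Integer id : ids) {
                segments.remove(id);
            }
            int merged = nextId++;
            segments.put(merged, Status.READY);
            return merged;
        }
    }

    public static void main(String[] args) {
        SegmentManager manager = new SegmentManager();
        int a = manager.startLoad();
        int b = manager.startLoad();
        manager.finishLoad(a);
        manager.finishLoad(b);
        System.out.println("before compaction: " + manager.getAllSegments());
        manager.compact(manager.getAllSegments());
        System.out.println("after compaction: " + manager.getAllSegments());
    }
}
```

Keeping loading segments out of `getAllSegments()` is one way to make sure a query never reads a half-written segment.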

Re: Abstracting CarbonData's Index Interface

2016-10-02 Thread Venkata Gollamudi
Yes Jacky, the interfaces need to be revisited. For Goal 1 and Goal 2: abstraction is required for both Index and Index store. Also, multi-column index (composite index) needs to be considered. Regards, Ramana On Sat, Oct 1, 2016 at 11:01 AM, Jacky Li wrote: > Hi community, > >