Re: [Discussion] Please vote and comment for carbon data file format change

2016-12-10 Thread Jean-Baptiste Onofré
+1

Regards
JB

On Dec 10, 2016, at 09:33, "bill.zhou" wrote:
>+1, this modification will help in all scenarios.
>
>Kumar Vishal wrote
>> Hello All,
>>
>> Improving carbon first-time query performance
>>
>> Reason:
>> 1. When the file system cache is cleared, file reads are slower
>> because the data has to be read and cached again.
>> 2. In a first-time query, carbon has to read the footer from the data
>> file to form the btree.
>> 3. Carbon reads more footer data than is required (data chunk).
>> 4. There are lots of random seeks in carbon, because column data
>> (data page, rle, inverted index) are not stored together.
>>
>> Solution:
>> 1. Improve block loading time. This can be done by removing the data
>> chunk from blockletInfo and storing only the offset and length of the
>> data chunk.
>> 2. Compress the presence meta bitset stored for null values of measure
>> columns using snappy.
>> 3. Store the metadata and data of a column together and read them
>> together; this reduces random seeks and improves IO.
>>
>> For this I am planning to change the carbondata thrift format.
>>
>> *Old format*
>>
>> *New format*
>>
>> Please vote and comment on this new format change.
>>
>> -Regards
>> Kumar Vishal
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p4049.html
>Sent from the Apache CarbonData Mailing List archive mailing list
>archive at Nabble.com.
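
The three proposed changes can be illustrated with a small sketch. It is not the CarbonData thrift format, and the format figures from the original mail are not in this archive. The layout below (a row count and metadata length, then the compressed null bitset, then the values) and the names `null_bitset`, `write_blocklet` and `read_column` are assumptions made up for illustration. `zlib` stands in for snappy only because snappy is not in the Python standard library.

```python
import io
import struct
import zlib  # stand-in for snappy, which is not in the standard library


def null_bitset(values):
    """Pack one bit per row; a set bit marks a null (None) value."""
    bits = bytearray((len(values) + 7) // 8)
    for row, value in enumerate(values):
        if value is None:
            bits[row // 8] |= 1 << (row % 8)
    return bytes(bits)


def write_blocklet(measure_columns):
    """Write each column as one contiguous chunk; return (file_bytes, footer).

    Hypothetical chunk layout: [row count][metadata length][metadata][data].
    The footer keeps only (offset, length) per chunk, not the chunk itself.
    """
    buf = io.BytesIO()
    footer = []
    for values in measure_columns:
        meta = zlib.compress(null_bitset(values))  # compressed null bitset
        data = struct.pack(f">{len(values)}d",
                           *(0.0 if v is None else v for v in values))
        offset = buf.tell()
        buf.write(struct.pack(">II", len(values), len(meta)))
        buf.write(meta)
        buf.write(data)
        footer.append((offset, buf.tell() - offset))
    return buf.getvalue(), footer


def read_column(file_bytes, footer, column_index):
    """One seek and one contiguous read fetch metadata and data together."""
    offset, length = footer[column_index]
    source = io.BytesIO(file_bytes)
    source.seek(offset)
    chunk = source.read(length)
    row_count, meta_len = struct.unpack(">II", chunk[:8])
    bits = zlib.decompress(chunk[8:8 + meta_len])
    raw = struct.unpack(f">{row_count}d", chunk[8 + meta_len:])
    return [None if bits[row // 8] & (1 << (row % 8)) else raw[row]
            for row in range(row_count)]


columns = [[1.5, None, 3.0], [None, None, 42.0, 7.25]]
file_bytes, footer = write_blocklet(columns)
second_column = read_column(file_bytes, footer, 1)
```

Because the footer carries two integers per column rather than the chunk metadata, loading it stays cheap, and each column read touches a single contiguous byte range.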


Re: [Discussion] Please vote and comment for carbon data file format change

2016-12-09 Thread jarray888
+1, the current data format has a slow first-time query issue, which should be fixed.




Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-29 Thread Kumar Vishal
Hi All,
Please find the JIRA issue that I have raised for the above discussion.

https://issues.apache.org/jira/browse/CARBONDATA-458

-Regards
Kumar Vishal

On Tue, Nov 29, 2016 at 7:14 PM, Kumar Vishal <kumarvishal1...@gmail.com>
wrote:

> Hi Jihong Ma,
> Please find the attachment.
>
> -Regards
> Kumar Vishal


Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-29 Thread Kumar Vishal
Hi Jihong Ma,
Please find the attachment.

-Regards
Kumar Vishal

On Fri, Nov 4, 2016 at 12:16 AM, Jihong Ma <jihong...@huawei.com> wrote:

> Hi Kumar,
>
> Please place the proposed format changes in attachment or attach to the
> associated JIRA, I would like to take a look.
>
> Thanks!
>
> Jihong


RE: [Discussion] Please vote and comment for carbon data file format change

2016-11-03 Thread Jihong Ma
Hi Kumar, 

Please place the proposed format changes in attachment or attach to the 
associated JIRA, I would like to take a look. 

Thanks!

Jihong

-Original Message-
From: Jacky Li [mailto:jacky.li...@qq.com] 
Sent: Thursday, November 03, 2016 7:54 AM
To: dev@carbondata.incubator.apache.org
Subject: Re: [Discussion] Please vote and comment for carbon data file format 
change

The proposed change is reasonable, +1.
But is there a plan to make the reader backward compatible with the old format,
so that the impact on current deployments is minimal?

Regards,
Jacky




Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-03 Thread Kumar Vishal
Dear Jacky,
   Yes, I am planning to support readers and writers for both data formats
(new and old). The new writer will be enabled by default, but if a user wants
to write in the older format, I will expose a configuration option for that.
Please let me know if you have any other suggestions.

-Regards
Kumar Vishal
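
One way to read the plan above is a version marker in each file plus a writer-side setting. The sketch below is a toy under that assumption and is not CarbonData code. The one-byte version header, the configuration key `writer.format.version`, and the two payload encodings are all hypothetical.

```python
OLD_FORMAT = 1
NEW_FORMAT = 2

# Toy payload encodings that stand in for the real old and new layouts.
# They assume non-empty strings that contain neither "," nor "\n".


def _write_old(values):
    return ",".join(values).encode()


def _write_new(values):
    return "\n".join(values).encode()


def _read_old(payload):
    return payload.decode().split(",")


def _read_new(payload):
    return payload.decode().split("\n")


WRITERS = {OLD_FORMAT: _write_old, NEW_FORMAT: _write_new}
READERS = {OLD_FORMAT: _read_old, NEW_FORMAT: _read_new}


def write_file(values, config=None):
    """Write with the new format unless the (hypothetical) setting says otherwise."""
    version = (config or {}).get("writer.format.version", NEW_FORMAT)
    return bytes([version]) + WRITERS[version](values)


def read_file(file_bytes):
    """Pick the reader from the version byte, so old files stay readable."""
    version = file_bytes[0]
    if version not in READERS:
        raise ValueError(f"unsupported format version: {version}")
    return READERS[version](file_bytes[1:])


new_file = write_file(["a", "b"])
old_file = write_file(["a", "b"], {"writer.format.version": OLD_FORMAT})
```

The reader never consults the configuration. It trusts only what the file declares, so a deployment can hold a mix of old and new files.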

On Thu, Nov 3, 2016 at 8:24 PM, Jacky Li wrote:

> The proposed change is reasonable, +1.
> But is there a plan to make the reader backward compatible with the old
> format, so that the impact on current deployments is minimal?
>
> Regards,
> Jacky


Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-01 Thread Kumar Vishal
Hi Xiaoqiao He,

Please find the attachment.

-Regards
Kumar Vishal

On Tue, Nov 1, 2016 at 9:27 PM, Xiaoqiao He wrote:

> Hi Kumar Vishal,
>
> I couldn't see the figures of the file format; could you re-upload them?
> Thanks.
>
> Best Regards


Re: [Discussion] Please vote and comment for carbon data file format change

2016-11-01 Thread Xiaoqiao He
Hi Kumar Vishal,

I couldn't see the figures of the file format; could you re-upload them?
Thanks.

Best Regards
