Re: [DISCUSSION] About data backward compatibility

2017-08-17 Thread Erlu Chen
I agree with caolu. I think users may be confused by having so many formats.

In the future, it would be better for CarbonData to unify the data format. The
unified format should be compatible with the previous formats. If different
formats are unavoidable to support different use cases with better
performance, I think we can add configuration parameters to this unified
format.

The key point is that CarbonData should have only one format.
That is easier for users to understand and also easier for developers to
extend.


Regards,
Chenerlu





--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-About-data-backward-compatibility-tp20183p20423.html
Sent from the Apache CarbonData Dev Mailing List archive mailing list archive 
at Nabble.com.


Re: [DISCUSSION] About data backward compatibility

2017-08-15 Thread bill.zhou
Hi Jacky & Ravindra,

I think we can choose option 2 only if we make sure the migration tool
performs well. For example, some users have already used V1 for a year, so
they have one year of historical data, and the migration tool must handle that
volume efficiently. In my view, the tool should have the following key
features:
1. High performance: for example, one year of data can be migrated within
one day.
2. Breakpoint (resumable) migration: for example, if a table has one year of
data in 365 segments, the user can migrate the table segment by segment and
can already query the segments that have been migrated.


--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-About-data-backward-compatibility-tp20183p20229.html


Re: [DISCUSSION] About data backward compatibility

2017-08-14 Thread Lu Cao
+1
Option 2 is better. Multiple formats will confuse users and developers; we
should unify them as soon as possible.



Re: [DISCUSSION] About data backward compatibility

2017-08-14 Thread David CaiQiang
I agree with Ravindra; now is the time to implement the migration tool.



-
Best Regards
David Cai
--
View this message in context: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-About-data-backward-compatibility-tp20183p20219.html


Re: [DISCUSSION] About data backward compatibility

2017-08-12 Thread Ravindra Pesala
Hi Jacky,

I feel option 2 is better, but it should come with a one-time migration of
data from the old formats to the latest format. So first we should have a
migration tool that reads old-format data using the old version and writes it
using the latest version. This migration should support converting any old
format to the latest format. We can then support only V3 going forward and
clean up all the old format code once this tool is available.
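A minimal Python sketch of the "many old readers, one new writer" structure this suggests. The readers, the writer and the dict-based segment layout are hypothetical placeholders, not CarbonData APIs; the point is that the V1/V2 reading code only has to live in the migration tool, while the main code base keeps a single V3 writer.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
SegmentData = Dict[str, object]  # hypothetical: {"version": "V1" | "V2" | "V3", "rows": [...]}


# Hypothetical old-format readers. In a real tool these would wrap the old
# CarbonData version's reader; here they just return the stored rows.
def read_v1(segment: SegmentData) -> List[Row]:
    return list(segment["rows"])


def read_v2(segment: SegmentData) -> List[Row]:
    return list(segment["rows"])


# Adding support for another old format means registering one more reader.
OLD_READERS: Dict[str, Callable[[SegmentData], List[Row]]] = {
    "V1": read_v1,
    "V2": read_v2,
}


def write_v3(rows: List[Row]) -> SegmentData:
    """Hypothetical V3 writer: the only writer the main code base would keep."""
    return {"version": "V3", "rows": rows}


def migrate_segment(segment: SegmentData) -> SegmentData:
    """Convert any registered old format to V3; V3 input is returned unchanged."""
    version = segment["version"]
    if version == "V3":
        return segment
    reader = OLD_READERS.get(version)
    if reader is None:
        raise ValueError(f"no reader registered for format {version}")
    return write_v3(reader(segment))


old_store = [{"version": "V1", "rows": [{"id": 1}]},
             {"version": "V2", "rows": [{"id": 2}]}]
print([migrate_segment(s)["version"] for s in old_store])  # ['V3', 'V3']
```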

Regards,
Ravindra.



-- 
Thanks & Regards,
Ravi


[DISCUSSION] About data backward compatibility

2017-08-11 Thread Jacky Li
Hi All,

While implementing a new encoding feature for CarbonData, I found it hard to 
maintain both read and write backward compatibility with all CarbonData formats, 
including V1, V2, and V3.

In this post, I want to discuss the roadmap for backward compatibility support. 

I am proposing the following feature plan:
1. Write support: starting from CarbonData 1.2, support writing the V3 format 
only.
The V3 format was introduced in CarbonData 1.1 (Feb 2017), and it has been stable 
for more than half a year now. Since we are going to add new features to the V3 
format only, it is better to clean up the write path so that it handles V3 only. 
If there are bugs in the V1 and V2 formats, we will still fix them in maintenance 
versions before CarbonData 1.1.

2. Read support: there are two options.
Option 1: Support reading the V1 and V2 formats, and in CarbonData 1.3 build a 
data migration tool to help users migrate old carbon stores. Stop supporting 
reading V1 and V2 after CarbonData 1.3.
The pro is that users who still use V1 or V2 carbon data in their applications 
can continue to use CarbonData 1.2 to read the old data.
The con is that any new feature introduced for V3 must be careful not to break 
read compatibility with V1 and V2. For example, some new encodings will be very 
hard to introduce.

Option 2: Support reading only the V3 format starting from CarbonData 1.2.
The pro is that the code will be cleaner and there is no restriction on adding 
new encodings.
The con is that any old carbon store based on the V1 or V2 format can be read 
using CarbonData 1.1 only.
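To make the two read-support options concrete, here is a minimal Python sketch of which formats each release would read and write under this plan. It only encodes the rules stated in this mail; the function names are hypothetical, releases are modeled as (major, minor[, patch]) tuples, and treating 1.3 (including its maintenance releases) as the last release that still reads V1/V2 is an assumption about what "after CarbonData 1.3" means.

```python
from enum import Enum
from typing import FrozenSet, Tuple


class Fmt(Enum):
    V1 = "V1"
    V2 = "V2"
    V3 = "V3"


ALL_FORMATS: FrozenSet[Fmt] = frozenset(Fmt)
V3_ONLY: FrozenSet[Fmt] = frozenset({Fmt.V3})


def _check(release: Tuple[int, ...]) -> None:
    # The plan only describes CarbonData 1.2 and later.
    if release[:2] < (1, 2):
        raise ValueError("the plan only covers CarbonData 1.2 onwards")


def writable_formats(release: Tuple[int, ...]) -> FrozenSet[Fmt]:
    """Point 1: from 1.2 onwards, only V3 is written."""
    _check(release)
    return V3_ONLY


def readable_formats(release: Tuple[int, ...], option: int) -> FrozenSet[Fmt]:
    """Point 2: formats a release can read under Option 1 or Option 2."""
    _check(release)
    if option == 1:
        # Assumption: 1.3.x is the last release line that still reads V1/V2.
        return ALL_FORMATS if release[:2] <= (1, 3) else V3_ONLY
    if option == 2:
        return V3_ONLY
    raise ValueError("option must be 1 or 2")


print(sorted(f.value for f in readable_formats((1, 2), 1)))  # ['V1', 'V2', 'V3']
print(sorted(f.value for f in readable_formats((1, 2), 2)))  # ['V3']
print(sorted(f.value for f in readable_formats((1, 4), 1)))  # ['V3']
```

The only difference between the options is the read set of 1.2 and 1.3; the write set is V3 in both.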

I want to collect opinions from the community. If there are users still using 
the V1 or V2 format, I think it is safer to go with Option 1. Otherwise, if all 
users are using the V3 format (CarbonData 1.1 and 1.1.1), I think Option 2 is 
the better choice.


Thanks,
Jacky Li