Re: [DISCUSS] For the dimension default should be no dictionary

2017-03-02 Thread bill.zhou
Hi all, let me summarize this discussion. 1. To keep CarbonData compatible with older versions, keep DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE, with no dictionary as the default. I do not suggest changing these properties to table_dictionary. 2. I suggest keeping the sort_column properties in the same style for

Re: [DISCUSS] For the dimension default should be no dictionary

2017-03-01 Thread Ravindra Pesala
Hi All, In order to make no-dictionary columns the default, we should improve the storage and performance of these columns. I have sent another mail to discuss the improvement points. Please comment on it. Regards, Ravindra On 1 March 2017 at 10:12, Ravindra Pesala wrote:

Re: [DISCUSS] For the dimension default should be no dictionary

2017-03-01 Thread Kumar Vishal
Hi Jacky, I agree with Ravindra's point: making columns no-dictionary by default will increase the store size and impact IO. In addition, Carbon currently supports only the String data type for no-dictionary columns, so we cannot make dimension columns no-dictionary by default. -Regards

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Ravindra Pesala
Hi Likun, The same applies if we make all columns no-dictionary by default: it will increase the store size and decrease performance, and poor performance will not encourage more users either. If we need to make no-dictionary columns the default, then we should first focus on

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread QiangCai
+1 The previous options are not easy for users to understand. The logic of the two options, SORT_COLUMNS and TABLE_DICTIONARY, is very clear. I am implementing the SORT_COLUMNS option this way. Best Regards David Caiqiang

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, I agree with your point. My only concern is loading: I have seen many users accidentally make a high-cardinality column a dictionary column, and then the load failed with out of memory or ran very slowly. I guess they just do not know to use DICTIONARY_EXCLUDE for these

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
> On 28 February 2017, at 8:35 PM, Liang Chen wrote: > > Hi > > A couple of questions: > > 1) For SORT_KEY option: only build "MDK index, inverted index, minmax > index" for these columns which be specified into the option(SORT_KEY) ? > Yes, build MDK index, inverted index, minmax

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Ravindra Pesala
Hi Likun, You mentioned that if the user does not specify dictionary columns, then by default they are treated as no-dictionary columns. But keeping no dictionary as the default has many disadvantages, as I mentioned in my mail above. We initially introduced no-dictionary columns to handle high

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread bill.zhou
Hi Ravindra, It is a good idea to consider the sort columns and dictionary columns together. For DDL usability I have the following suggestions; please share yours. 1. The sort column properties should keep the same style as the dictionary ones, so I suggest changing the keyword to SORT_INCLUDE

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Liang Chen
Hi, A couple of questions: 1) For the SORT_KEY option: are the "MDK index, inverted index, minmax index" built only for the columns specified in the option (SORT_KEY)? 2) If users don't specify TABLE_DICTIONARY, then no columns get dictionary encoding, and all shuffle operations are

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-28 Thread Jacky Li
Yes, first we should simplify the DDL options. I propose the following options; please check whether they miss any scenario. 1. SORT_COLUMNS, or SORT_KEY This indicates three things: 1) All columns specified in options will be used to construct Multi-Dimensional Key, which will be sorted along this

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-27 Thread Ravindra Pesala
Hi Bill, I got your point, but making no-dictionary the default may not be a perfect solution. Basically, no-dictionary columns are only meant for high-cardinality dimensions, so the usage may change from user to user or scenario to scenario. This is the basic issue of usability of

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-26 Thread bill.zhou
Dear Vishal & Ravindra, Thanks for your replies. I think I didn't describe it clearly, so you didn't get the full idea. 1. Dictionary is an important feature in CarbonData, and we introduce it to every new customer. So new customers will know it clearly and will set the

Re: [DISCUSS] For the dimension default should be no dictionary

2017-02-26 Thread Ravindra Pesala
Hi, I feel there are more disadvantages than advantages in this approach. In your current scenario you want to set dictionary only for columns which are used as filters, but the use of dictionary is not limited to filters; it can also reduce the store size and improve aggregation queries.
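
As a rough illustration of the trade-off debated in this thread, the Python sketch below compares the approximate size of a string column stored raw against the same column stored as a dictionary plus fixed-width ids. It is not CarbonData's actual encoding: the function names, the 4-byte id width, and the sample values are all assumptions made for the example. It only shows why a dictionary tends to shrink a low-cardinality column and tends to cost extra space (and dictionary memory) for a high-cardinality one.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer id (illustrative only)."""
    dictionary = {}
    encoded = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        encoded.append(dictionary[v])
    return dictionary, encoded


def approx_sizes(values, id_bytes=4):
    """Rough byte counts: raw strings vs. dictionary plus fixed-width ids.

    id_bytes is an assumed id width, not a CarbonData setting.
    """
    raw = sum(len(v.encode("utf-8")) for v in values)
    dictionary, encoded = dictionary_encode(values)
    dict_bytes = sum(len(k.encode("utf-8")) for k in dictionary)
    return raw, dict_bytes + id_bytes * len(encoded)


# Hypothetical sample columns.
low_card = ["shenzhen", "bangalore", "beijing"] * 1000   # 3 distinct values
high_card = [f"user-{i:06d}" for i in range(3000)]       # every value distinct

low_raw, low_encoded = approx_sizes(low_card)
high_raw, high_encoded = approx_sizes(high_card)

print("low cardinality: ", low_raw, "raw bytes vs", low_encoded, "encoded")
print("high cardinality:", high_raw, "raw bytes vs", high_encoded, "encoded")
```

In the high-cardinality case the dictionary holds one entry per row, which is the situation the thread associates with slow or failing loads and the reason DICTIONARY_EXCLUDE exists.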