Hi all,
To summarize this discussion:
1. To keep CarbonData compatible with older versions, keep
DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE; the default is no dictionary. I do
not suggest changing these properties to table_dictionary.
2. I suggest keeping the sort_column properties in the same style for
Hi All,
In order to make no-dictionary columns the default, we should improve the
storage and performance for these columns. I have sent another mail to
discuss the improvement points. Please comment on it.
Regards,
Ravindra
On 1 March 2017 at 10:12, Ravindra Pesala wrote:
Hi Jacky,
I agree with Ravindra's point: making columns no-dictionary by default
will increase the store size and impact I/O. Also, currently in Carbon only
the String data type is supported for no-dictionary columns, so we cannot
make dimension columns no-dictionary by default.
Regards
Hi Likun,
It would be the same case if we made all columns no-dictionary by default:
it would increase the store size and decrease performance, and poor
performance does not encourage more users either.
If we need to make no-dictionary columns the default, then we should first
focus on
+1
It is not easy for users to understand the previous options.
The logic of these two options, SORT_COLUMNS and TABLE_DICTIONARY, is very
clear.
I am implementing the SORT_COLUMNS option this way.
Best Regards
David Caiqiang
Yes, I agree with your point. My only concern is loading: I have seen
many users accidentally make a high-cardinality column a dictionary column,
and then loading failed with out-of-memory errors or was very slow. I guess
they just do not know to use DICTIONARY_EXCLUDE for these
> On 28 February 2017, at 8:35 PM, Liang Chen wrote:
>
> Hi
>
> A couple of questions:
>
> 1) For the SORT_KEY option: are the "MDK index, inverted index, minmax
> index" built only for the columns specified in the option (SORT_KEY)?
>
Yes, build MDK index, inverted index, minmax
Hi Likun,
You mentioned that if the user does not specify dictionary columns, then by
default they are treated as no-dictionary columns.
But keeping no dictionary as the default has many disadvantages, as I
mentioned in the mail above. We initially introduced no-dictionary columns
to handle high
Hi Ravindra,
It is a good idea to consider the sort columns and dictionary columns
together.
For DDL usability I have the following suggestions. Please share your
thoughts.
1. The sort column properties should keep the same style as the dictionary
ones, so I suggest changing the keyword to SORT_INCLUDE
Hi
A couple of questions:
1) For the SORT_KEY option: are the "MDK index, inverted index, minmax
index" built only for the columns specified in the option (SORT_KEY)?
2) If users don't specify TABLE_DICTIONARY, then no columns use
dictionary encoding, and all shuffle operations are
Yes, first we should simplify the DDL options. I propose the following
options; please check whether they miss any scenario.
1. SORT_COLUMNS, or SORT_KEY
This indicates three things:
1) All columns specified in options will be used to construct
Multi-Dimensional Key, which will be sorted along this
Hi Bill,
I got your point, but making no-dictionary the default may
not be a perfect solution. Basically, no-dictionary columns are only meant for
high-cardinality dimensions, so the usage may change from user to user or
scenario to scenario.
This is the basic issue of usability of
Dear Vishal & Ravindra,
Thanks for your replies. I think I didn't describe it clearly, so you
didn't get the full idea.
1. Dictionary is an important feature in CarbonData, and we introduce it to
every new customer. So new customers will know it
clearly, will set the
Hi,
I feel there are more disadvantages than advantages in this approach. In
your current scenario you want to set dictionary only for columns that are
used as filters, but the use of dictionary is not limited to
filters; it can also reduce the store size and improve aggregation queries.
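
The store-size and aggregation argument can be illustrated with a small
Python sketch. This is a toy model, not CarbonData's actual encoding: the
helper names (dictionary_encode, raw_size, encoded_size) and the fixed
4-byte surrogate key are assumptions made only for illustration.

```python
from collections import Counter


def dictionary_encode(values):
    """Replace each distinct string with a small integer surrogate key."""
    dictionary = {}
    encoded = []
    for v in values:
        # A new value gets the next unused key; a known value reuses its key.
        key = dictionary.setdefault(v, len(dictionary))
        encoded.append(key)
    return dictionary, encoded


def raw_size(values):
    """Bytes needed to store every value as a plain UTF-8 string."""
    return sum(len(v.encode("utf-8")) for v in values)


def encoded_size(dictionary, encoded, key_bytes=4):
    """Bytes for the dictionary itself plus one fixed-width key per row.

    key_bytes=4 is an assumed key width, not a CarbonData constant.
    """
    dict_bytes = sum(len(v.encode("utf-8")) + key_bytes for v in dictionary)
    return dict_bytes + key_bytes * len(encoded)


# Low-cardinality column: few distinct values, many rows.
low = ["china", "india", "china", "usa"] * 1000
low_dict, low_keys = dictionary_encode(low)

# High-cardinality column: every value is distinct.
high = [f"user-{i}" for i in range(1000)]
high_dict, high_keys = dictionary_encode(high)

# Aggregation (a group-by count) can run on the integer keys alone.
counts_by_key = Counter(low_keys)
```

In this toy model the low-cardinality column shrinks once encoded, while
the high-cardinality column grows, because the dictionary then holds every
value plus a key. That matches the trade-off discussed in this thread.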