I suppose you want to create something like a partitioned table with
measures. I would suggest splitting that table into smaller logical
units and creating a view that joins those smaller tables. That
would also speed up recalculation in the future, since you would have to
rewrite far fewer columns if you only need to recalculate a few of them.
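As a rough sketch of what I mean (all table and column names here are hypothetical, and the grouping would depend on your data):

```sql
-- Split the wide table into smaller per-subject measure tables
CREATE TABLE measures_sales   (id BIGINT, m1 DOUBLE, m2 DOUBLE /* ... */) STORED AS PARQUET;
CREATE TABLE measures_traffic (id BIGINT, m500 DOUBLE /* ... */)          STORED AS PARQUET;

-- A view that joins them back into one wide logical table
CREATE VIEW measures_all AS
SELECT s.id, s.m1, s.m2, t.m500
FROM measures_sales s
JOIN measures_traffic t ON s.id = t.id;
```

Recalculating one group of measures then only rewrites that group's table, not the whole wide row.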

If you are building a dimension table, I believe such a large number of
columns is not a problem, since it would not have many rows.

If you are using it for machine learning features, I think you can still
split it into smaller chunks and merge them from code.
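Merging the chunks from code might look like this (a minimal sketch with pandas; the chunk contents and the "id" join key are hypothetical):

```python
import pandas as pd

# Hypothetical feature chunks, each sharing the join key "id"
chunk_a = pd.DataFrame({"id": [1, 2], "f1": [0.1, 0.2]})
chunk_b = pd.DataFrame({"id": [1, 2], "f2": [10, 20]})
chunk_c = pd.DataFrame({"id": [1, 2], "f3": [True, False]})

# Merge all chunks on the key to rebuild the wide feature matrix
features = chunk_a
for chunk in (chunk_b, chunk_c):
    features = features.merge(chunk, on="id", how="inner")

print(list(features.columns))  # -> ['id', 'f1', 'f2', 'f3']
```

In practice each chunk would come from one of the smaller Impala tables, but the merge step is the same.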

Hope you find this useful.

Cheers,

Dejan Prokić | Data Engineer | Nordeus

On Tue, 20 Nov 2018 at 01:11, David Lauzon <davidonlap...@gmail.com>
wrote:

> Hi folks!
>
> I'm evaluating the possibility to build an Impala table with 2200 columns.
> I came across the recommendation of max 2000 columns
> <https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_scalability.html#big_tables>
> in the documentation and would like to understand the impact.
>
> So far, I've found that it could impact the memory usage of the catalog
> service. Is the catalog memory usage formula
> <https://www.slideshare.net/cloudera/the-impala-cookbook-42530186/17>
> still relevant? What other performance aspects should I consider?
>
> Thanks,
>
> -David
>
