They did a pretty good job in the docs of explaining why Kudu was born:
https://kudu.apache.org/docs/

And there are tons of posts on the subject, including my own:
https://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala/

Mainly, Kudu gives you mutable data and faster seeks. So if your table
does not need to be updated in real time, and you are fine with batch
reprocessing and managing partitions yourself, you will be happy with
Impala/HDFS. In fact, in such cases we use Hive for the processing, and our
users then query the data with Impala.
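To make the "mutable data vs. batch reprocessing" trade-off concrete, here is a toy sketch. It is only an assumption-laden illustration: plain Python dicts stand in for storage, and the function names `upsert_row` and `rewrite_partition` are hypothetical, not Kudu or HDFS APIs. The point it shows is that with immutable HDFS files, changing one row means rewriting its whole partition (e.g. with INSERT OVERWRITE), while Kudu can update a single row by primary key.

```python
# Toy model only: dicts stand in for storage; no real Kudu/HDFS APIs are used.

def upsert_row(table, key, row):
    """Kudu-style (hypothetical helper): update or insert one row by primary key."""
    table[key] = row
    return 1  # number of rows physically written


def rewrite_partition(partitions, part_key, key, row):
    """HDFS-style (hypothetical helper): data files are immutable, so changing
    one row means writing out a new copy of the entire partition."""
    new_partition = dict(partitions.get(part_key, {}))  # copy every existing row
    new_partition[key] = row                            # apply the single change
    partitions[part_key] = new_partition                # swap in the rewritten partition
    return len(new_partition)  # number of rows physically written


if __name__ == "__main__":
    kudu_table = {i: {"v": i} for i in range(1000)}
    hdfs_table = {"2019-12-14": {i: {"v": i} for i in range(1000)}}
    print(upsert_row(kudu_table, 5, {"v": -1}))                       # 1
    print(rewrite_partition(hdfs_table, "2019-12-14", 5, {"v": -1}))  # 1000
```

If updates are rare and can wait for a batch job, paying for that partition rewrite is usually acceptable, which is why Impala/HDFS works well in that case.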

On Sat, Dec 14, 2019 at 8:53 AM l vic <lvic4...@gmail.com> wrote:

> Naive question: when would using Impala/HDFS be preferable over
> Impala/Kudu? In particular, what would make more sense for large tables
> (> 1TB)?
> Thanks...
>
