They did a pretty good job explaining why Kudu was born in the docs (https://kudu.apache.org/docs/),
and there are tons of posts on the subject, including mine:
https://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala/

Mainly, Kudu gives you mutable data and faster seeks. So if your table does not need to be updated in real time, and you are fine with batch reprocessing and managing partitions, you will be happy with Impala/HDFS. Actually, we use Hive for the processing in such cases, and our users then query the results with Impala.

On Sat, Dec 14, 2019 at 8:53 AM l vic <lvic4...@gmail.com> wrote:

> Naive question: when using of Impala/hdfs would be preferable over
> Impala/kudu? In particular: what would makes more sense for large
> tables (> 1TB)?
> Thanks...