Thanks, Dan! I am mostly worried about read/join performance with Impala.
Should I use 3 partitions to be on the safe side? For example, one of the
most frequently used tables is only 50 MB before replication and another is
130 MB (I checked the total tablet size). Impala would almost always
broadcast them, doing a full scan of the large 10B-row table first. What do
you think?
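
To make the broadcast question concrete, here is a rough back-of-the-envelope
sketch in Python. It assumes a simplified cost model (a broadcast join ships
the whole small table to every node, while a partitioned/shuffle join moves
both tables across the network once). The function names, the 20-node
cluster, and the 100-bytes-per-row figure are hypothetical, chosen only for
illustration; this is not Impala's actual planner logic.

```python
MB = 1024 ** 2


def broadcast_bytes(small_table_bytes, num_nodes):
    """Approximate network traffic if the small table is sent to every node."""
    return small_table_bytes * num_nodes


def shuffle_bytes(small_table_bytes, large_table_bytes):
    """Approximate network traffic if both tables are hash-partitioned."""
    return small_table_bytes + large_table_bytes


def cheaper_strategy(small_table_bytes, large_table_bytes, num_nodes):
    """Pick whichever strategy moves fewer bytes under this simplified model."""
    if broadcast_bytes(small_table_bytes, num_nodes) <= shuffle_bytes(
        small_table_bytes, large_table_bytes
    ):
        return "broadcast"
    return "shuffle"


# Hypothetical inputs: 20 nodes, 10B rows at an assumed 100 bytes per row.
NUM_NODES = 20
LARGE_TABLE_BYTES = 10_000_000_000 * 100

for small_mb in (50, 130):
    strategy = cheaper_strategy(small_mb * MB, LARGE_TABLE_BYTES, NUM_NODES)
    print(f"{small_mb} MB table -> {strategy}")
```

Under these assumptions the number of tablets in the small table does not
appear in the estimate at all, so splitting it into 3 partitions would not
change the network cost of the join.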

On Mon, Oct 15, 2018 at 3:45 PM Dan Burkert <danburk...@apache.org> wrote:

> Often for these cases having multiple partitions doesn't provide any
> advantage.  There are fixed-cost overheads to having many tablets, so if
> the tablets are small these costs can outweigh the benefit.  Additionally,
> if you aren't actively writing to the table then the benefit of
> parallelizing those writes isn't there.  As far as join performance, that's
> usually dominated by the join strategy used by the SQL engine, and whether
> the engine can take advantage of the size disparity in the tables.
>
> - Dan
>
> On Mon, Oct 15, 2018 at 10:44 AM Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Of the 300 tables I need to ingest into Kudu, 250 are really small: fewer
>> than 500k rows each, so each will fit in a single 1 GB partition. Does it
>> still make sense to create 3 partitions, or to have no partitions at all?
>>
>> Some of these tables are frequently joined to very large 1-10B row
>> tables...
>>
>> Thanks,
>> Boris
>>
>
