The problem, AFAIK, is that the replication count is not necessarily the 
distribution count, so you can't guarantee that all tablet servers will have a 
copy.

On Mar 16, 2018 1:41 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
I'm new to Kudu, but we are also going to use Impala mostly with Kudu. We have a 
few tables that are small but used a lot. My plan is to replicate them more than 
3 times. When you create a Kudu table, you can specify the number of replicas 
(3 by default), and I guess you could set that to a number corresponding to the 
node count in your cluster. The downside is that you cannot change that number 
unless you recreate the table.
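
To illustrate the replica setting described above, here is a minimal sketch that 
builds an Impala CREATE TABLE statement for a Kudu table. The table name 
dim_meta, its columns, and the helper kudu_create_table_ddl are hypothetical and 
exist only for this example; the 'kudu.num_tablet_replicas' table property is 
the Impala setting for the replica count. Kudu expects an odd replication 
factor, and the cluster may also enforce an upper limit, so a value equal to the 
node count will not always be accepted.

```python
def kudu_create_table_ddl(table, columns, key, num_replicas=3, hash_partitions=2):
    """Build an Impala CREATE TABLE statement for a Kudu table (illustrative helper)."""
    # Kudu expects an odd replication factor (1, 3, 5, ...).
    if num_replicas < 1 or num_replicas % 2 == 0:
        raise ValueError("replication factor must be a positive odd number")
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} ({cols}, PRIMARY KEY ({key})) "
        f"PARTITION BY HASH ({key}) PARTITIONS {hash_partitions} "
        f"STORED AS KUDU "
        f"TBLPROPERTIES ('kudu.num_tablet_replicas' = '{num_replicas}')"
    )


# Hypothetical small, frequently joined metadata table with 5 replicas.
ddl = kudu_create_table_ddl(
    "dim_meta", [("id", "BIGINT"), ("name", "STRING")], "id", num_replicas=5
)
print(ddl)
```

Note that this only raises the number of copies of each tablet. It does not 
pin a copy to every tablet server, so a scan may still read from a remote node.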

On Fri, Mar 16, 2018 at 10:42 AM, Cliff Resnick <cre...@gmail.com> wrote:
We will soon be moving our analytics from AWS Redshift to Impala/Kudu. One 
Redshift feature that we will miss is its ALL distribution, where a copy of a 
table is maintained on each server. We define a number of metadata tables this 
way since they are used in nearly every query. We are considering using Parquet 
in HDFS cache for these. Kudu would be a much better fit for the update 
semantics, but we are worried about the additional contention. I'm wondering if 
a Broadcast, or ALL, tablet replication might be an easy feature to add to 
Kudu?

-Cliff
