I should also mention that we currently cap the number of replicas of a table at 7 via the --max-num-replicas flag. To raise it you'd also have to enable --unlock-unsafe-flags, which means you're going into untested territory. Your mileage may vary, but I wouldn't try it on a production system.
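The constraints raised in this thread (an odd count because of Raft, the default cap of 7 from --max-num-replicas, enough tablet servers to host every replica, ideally plus a spare for automatic re-replication) can be summed up as a small pre-flight check. This is a minimal sketch under those assumptions: `replication_problems` is a hypothetical helper, not part of any Kudu client or CLI, and the cap is passed in as a parameter rather than read from a live cluster.

```python
from typing import List

# Assumed default of the --max-num-replicas flag, as described above.
DEFAULT_MAX_NUM_REPLICAS = 7


def replication_problems(num_replicas: int, num_tablet_servers: int,
                         max_num_replicas: int = DEFAULT_MAX_NUM_REPLICAS) -> List[str]:
    """Hypothetical pre-flight check for a requested replication factor.

    Returns human-readable problems; an empty list means the request
    satisfies every rule of thumb mentioned in the thread.
    """
    if num_replicas < 1:
        return ["replication factor must be at least 1"]
    problems = []
    if num_replicas % 2 == 0:
        # Even counts need extra flags and add no fault tolerance under Raft.
        problems.append("even replication factor; use an odd count")
    if num_replicas > max_num_replicas:
        problems.append(f"exceeds --max-num-replicas ({max_num_replicas}); "
                        "raising it needs --unlock-unsafe-flags")
    if num_replicas > num_tablet_servers:
        problems.append("not enough tablet servers to host every replica")
    elif num_replicas == num_tablet_servers:
        # No spare node to re-replicate onto after a permanent failure.
        problems.append("no spare tablet server to re-replicate onto")
    return problems


if __name__ == "__main__":
    print(replication_problems(3, 4))   # []
    print(replication_problems(5, 5))   # ['no spare tablet server to re-replicate onto']
    print(replication_problems(8, 10))  # even count and over the cap of 7
```

A true "broadcast" layout, with the replication factor equal to the number of tablet servers, always trips the spare-server check, which echoes Dan's point below that a 4th server is useful for a 3x replicated table.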
-Todd

On Fri, Mar 16, 2018 at 12:00 PM, Dan Burkert <[email protected]> wrote:

>
> On Fri, Mar 16, 2018 at 11:35 AM, Clifford Resnick <[email protected]> wrote:
>
>> Thanks for that, glad I was wrong there! Aside from replication
>> considerations, is it also recommended that the number of tablet servers be odd?
>>
>
> No, so long as you have enough tablet servers to host your desired
> replication factor you should be fine. In production scenarios we
> typically recommend at least 4, since if you are 3x replicated and suffer a
> permanent node failure, the 4th node comes in handy as a fail-over target
> (Kudu will do this automatically). But above and beyond that you don't
> need to worry about odd/even WRT number of tablet servers.
>
> - Dan
>
>
>>
>> From: Dan Burkert <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, March 16, 2018 at 2:09 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: "broadcast" tablet replication for kudu?
>>
>> The replication count is the number of tablet servers on which Kudu will
>> host copies. So if you set the replication level to 5, Kudu will put
>> the data on 5 separate tablet servers. There's no built-in broadcast table
>> feature; upping the replication factor is the closest thing. A couple of
>> things to keep in mind:
>>
>> - Always use an odd replication count. This is important due to how the
>> Raft algorithm works. Recent versions of Kudu won't even let you specify
>> an even number without flipping some flags.
>> - We don't test much beyond 5 replicas. It *should* work, but you
>> may run into issues since it's a relatively rare configuration. With a
>> heavy write workload and many replicas you are even more likely to
>> encounter issues.
>>
>> It's also worth checking in an Impala forum whether it has features that
>> make joins against small broadcast tables better. Perhaps Impala can cache
>> small tables locally when doing joins.
>>
>> - Dan
>>
>> On Fri, Mar 16, 2018 at 10:55 AM, Clifford Resnick <[email protected]> wrote:
>>
>>> The problem is, AFAIK, that replication count is not necessarily the
>>> distribution count, so you can't guarantee all tablet servers will have a
>>> copy.
>>>
>>> On Mar 16, 2018 1:41 PM, Boris Tyukin <[email protected]> wrote:
>>> I'm new to Kudu but we are also going to use Impala mostly with Kudu. We
>>> have a few tables that are small but used a lot. My plan is to replicate them
>>> more than 3 times. When you create a Kudu table, you can specify the number of
>>> replicated copies (3 by default), and I guess you can set it to a number
>>> corresponding to the node count in your cluster. The downside is that you
>>> cannot change that number without recreating the table.
>>>
>>> On Fri, Mar 16, 2018 at 10:42 AM, Cliff Resnick <[email protected]>
>>> wrote:
>>>
>>>> We will soon be moving our analytics from AWS Redshift to Impala/Kudu.
>>>> One Redshift feature that we will miss is its ALL Distribution, where a
>>>> copy of a table is maintained on each server. We define a number of
>>>> metadata tables this way since they are used in nearly every query. We are
>>>> considering using Parquet in HDFS cache for these, and Kudu would be a much
>>>> better fit for the update semantics, but we are worried about the additional
>>>> contention. I'm wondering if having a Broadcast, or ALL, tablet
>>>> replication might be an easy feature to add to Kudu?
>>>>
>>>> -Cliff
>>>>
>>>
>>>
>>
>

--
Todd Lipcon
Software Engineer, Cloudera
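Dan's advice to always use an odd replication count follows from Raft's majority rule: a tablet stays available for writes only while a strict majority of its replicas can vote and acknowledge. The arithmetic below is a plain illustration of that rule in Python, not Kudu code; `raft_majority` and `tolerated_failures` are hypothetical names chosen for this sketch.

```python
def raft_majority(num_replicas: int) -> int:
    """Votes (or write acknowledgements) needed for a Raft majority."""
    return num_replicas // 2 + 1


def tolerated_failures(num_replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return num_replicas - raft_majority(num_replicas)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"replicas={n} majority={raft_majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

For 3, 4 and 5 replicas this gives majorities of 2, 3 and 3, tolerating 1, 1 and 2 failures. A 4th replica raises the quorum each write must reach without surviving any extra failure, so an even count buys nothing over the odd count just below it. More replicas also mean the leader ships every write to more followers, which is consistent with Dan's caution about heavy write workloads.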
