Re: A few questions for using Kudu

Dan Burkert Thu, 15 Mar 2018 10:06:28 -0700

Hi, answers inline:

On Thu, Mar 15, 2018 at 3:12 AM, 张晓宁 <[email protected]> wrote:


> I have a few questions about using Kudu:
>
> 1.       As more and more data is inserted into Kudu, performance
> decreases. After about 30 minutes of continuous insertion, the TPS
> dropped by 20%, and after 1 hour of insertion it had dropped by
> 40%. Is this a known issue?
>
This is expected if you are inserting data in random order.  If you try
another benchmark where you insert data in primary key sorted order, you'll
see that the performance is much higher and more consistent.  If you
have a heavy insert workload, this kind of optimization is critical.  The
table's partitioning and primary key can often be designed to make this
happen naturally, but that is dataset-dependent, so without more
specifics about your data it's difficult to give more precise advice.
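
As a rough illustration of the sorted-order idea, here is a minimal Python
sketch that sorts a set of rows by their primary key columns before splitting
them into insert batches.  It deliberately does not call the Kudu client; the
function name, the row layout (key columns first), and the batch size are all
hypothetical, and the actual write step is left out.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple  # assumption: the leading columns of each tuple form the primary key


def sorted_batches(rows: Iterable[Row], key_len: int, batch_size: int) -> Iterator[List[Row]]:
    """Yield batches of rows ordered by their primary-key prefix."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort on the first key_len columns so inserts arrive in key order.
    ordered = sorted(rows, key=lambda r: r[:key_len])
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Hypothetical data: (id, payload), with id as the single key column.
rows = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
batches = list(sorted_batches(rows, key_len=1, batch_size=2))
```

Each batch would then be handed to whatever insert path you already use.
Sorting in the client only helps within the data it has in hand, which is why
a key and partitioning design that makes the order arise naturally is usually
the better fix.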


> 2.       When setting the replica number to 1, I will have 2 copies of
> the data in total (1 master copy + 1 replica copy). Is this true?
>
That's incorrect.  The master node does not hold any table data.  If you
set the number of replicas to 1, there is only one copy, and you will lose
data if you lose the tablet server which holds that replica.  We always
recommend that production workloads set the number of replicas to 3 in order
to have fault tolerance.
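
To make the arithmetic concrete, here is a small Python sketch of how many
replica failures a tablet can survive, assuming majority-based replication
(Kudu replicates tablets with Raft consensus, which needs a majority of
replicas to be available).  The function name is hypothetical and this is
not part of any Kudu API.

```python
def tolerated_failures(num_replicas: int) -> int:
    """Return how many replicas can fail while a majority stays available."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    majority = num_replicas // 2 + 1  # smallest group that is more than half
    return num_replicas - majority


for n in (1, 3, 5):
    print(n, "replicas ->", tolerated_failures(n), "tolerated failure(s)")
```

With 1 replica the result is 0, which is the situation described above; with
3 replicas a tablet can survive the loss of one tablet server.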


> 3.       I want to install Kudu 1.6, but our machine cannot connect to the
> public internet. Will the Kudu team build RPM packages for version 1.6?
>

The Apache Kudu project does not provide binary artifacts for releases;
however, vendors can and do.  For instance, you can find Cloudera's RPMs
corresponding to Kudu 1.6 here
<https://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5.14/RPMS/x86_64/>.

- Dan
