We are using CDH 5.12, with HDFS as our primary data storage and Impala for
querying it. Our worker nodes host both the HDFS DataNode and Impalad
services. We're starting to move some of our data into Kudu and would like
to understand the community's experience and recommendations on
disk/machine allocation, and the pros/cons of each option:

- Installing the Kudu tablet server on each worker node vs. on separate machines
- Giving the Kudu tablet server its own physical disks on the same machine vs.
  sharing disks with the DataNode
- SSDs vs. spinning disks

Some more questions on a separate note, but related to the POC. We have a
small table as a first candidate for Kudu (a couple of GB before
replication). Does Kudu try to distribute each table's data across all
tablet servers, and if so, does performance suffer when a small amount of
data is spread too thinly? For a small table, which is better: fewer disk
partitions (host-partition) or data evenly distributed across the worker
nodes?

Thanks,
Sunil Parmar
