We are running CDH 5.12, with HDFS as our primary data storage and Impala for querying it. Each worker node hosts both the HDFS DataNode and Impalad services. We're starting to move some of our data into Kudu and would like to hear the community's experience and recommendations on disk/machine allocation, and the pros and cons of each of the following options:
- Installing the Kudu tablet server on each worker node vs. on separate machines
- Separate physical disks for the Kudu tablet server on the same machine vs. sharing the disks with the data nodes
- SSDs vs. spinning disks

A few more questions, on a separate note but related to the POC:

Our first candidate for Kudu is a small table (a couple of GB before replication). Does Kudu try to distribute the data for each table across the tablet servers, i.e. could we see slow performance because the data is spread too sparsely? For a small table, which is better: fewer disk partitions (host-partition) or data evenly distributed across the worker nodes?

Thanks,
Sunil Parmar