Hello list,
Although similar questions have been discussed here many times, I would
still like to ask for your guidance. Until now I have worked only on small or
mid-sized clusters, but this time the situation is a bit different. I have to
collect a lot of legacy data accumulated over the last few decades. This data
is on tape drives, and I have to read it from there and store it in my cluster.
The size could go somewhere near 24 Petabytes (inclusive of replication).
Now I need some help to kick this off:
What would be the optimal config for my NN+JT, DN+TT+RS, HMaster and ZK
machines?
How many slaves and ZK peers should I plan for, keeping this config in mind?
What is the optimal network config for a cluster of this size?
Which kind of disks would be more efficient?
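To make the question about the number of slaves concrete, a rough
back-of-envelope could look like the sketch below. Only the 24 PB total comes
from the figures above; the disks per node, the disk size and the headroom
fraction are placeholder assumptions, not values I have settled on.

```python
import math

PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def slaves_needed(total_bytes, disks_per_node, disk_bytes, headroom=0.25):
    """Return the minimum number of slave nodes needed to hold total_bytes.

    headroom is the fraction of each node's raw disk kept free for
    non-HDFS use (temp/shuffle space, OS). It is an assumed value.
    """
    usable_per_node = disks_per_node * disk_bytes * (1.0 - headroom)
    return math.ceil(total_bytes / usable_per_node)


# The 24 PB figure already includes replication, so no replication
# factor is applied here. 12 disks of 3 TB per node is a hypothetical layout.
print(slaves_needed(24 * PB, disks_per_node=12, disk_bytes=3 * TB))
```

Changing the per-node layout or the headroom moves the count a lot, which is
partly why I am asking about disks and node config together.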
Please share any guidance you can, as I would like some expert comments
before moving ahead. Many thanks.
Regards,
Mohammad Tariq