John Martyniak wrote:
Thanks for all of the input. I was leaning towards setting up a Hadoop cluster for this, as the data set is getting quite large and creating indexes, etc., is taking longer and longer.

My other option would be to set up several Virtual Private Servers across the two boxes and then run a Hadoop cluster on all of the VPSes, so in effect I could create 4, 6, or 8 nodes running on two physical boxes. Has anyone tried something like this? Would this reduce the amount of disk contention? Or would it make no difference, so that it is better just to have a two-node cluster?

VPSes wouldn't help with IO contention: after all, you're still using the same single physical disk on the machine, no matter how many VPSes run on it. VPSes may help in testing a distributed setup if all you have at the moment is a single physical machine.
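
As a rough back-of-the-envelope illustration of the point above, here is a small sketch. It assumes one disk per box and that busy nodes on a box split that disk's bandwidth evenly; the 100 MB/s figure and the function names are made up for illustration, not measured on any real setup.

```python
# Illustrative model only: DISK_MB_PER_S is an assumed number, not a
# measurement, and real virtualization adds overhead on top of this.

def per_node_throughput(disk_mb_per_s, nodes_on_box):
    """Bandwidth each node gets when all nodes on one box hit the same disk."""
    return disk_mb_per_s / nodes_on_box

def aggregate_throughput(disk_mb_per_s, physical_boxes):
    """Total bandwidth is bounded by the number of physical disks, not nodes."""
    return disk_mb_per_s * physical_boxes

DISK_MB_PER_S = 100.0  # assumed sequential bandwidth of one disk
BOXES = 2              # the two physical machines from the question

for vps_per_box in (1, 2, 3, 4):
    nodes = BOXES * vps_per_box
    share = per_node_throughput(DISK_MB_PER_S, vps_per_box)
    total = aggregate_throughput(DISK_MB_PER_S, BOXES)
    print(f"{nodes} nodes: {share:.1f} MB/s per node, {total:.1f} MB/s total")
```

Under these assumptions the total stays the same whether you run 2 or 8 nodes; only the per-node share shrinks.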

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
