John Martyniak wrote:
> Thanks for all of the input. I was leaning towards setting up a Hadoop
> cluster for this, as the data set is getting quite large and creating
> indexes etc. is taking longer and longer.
> My other option would be to set up several Virtual Private Servers across
> the two boxes and then run a Hadoop cluster on all of the VPSes, so in
> effect I could create 4, 6, or 8 nodes running on two physical boxes. Has
> anyone tried something like this? Would this reduce the amount of disk
> contention, or would it make no difference, so that it is better just to
> have a two-node cluster?
VPSes wouldn't help with I/O contention - after all, you're still using the
same single physical disk on each machine, no matter how many VPSes run
on it. VPSes may help in testing a distributed setup if all you have at
the moment is a single physical machine.
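
To put rough numbers on the point above, here is a minimal back-of-the-envelope sketch. The function names and the 100 MB/s disk figure are hypothetical, chosen only for illustration, and the model assumes each box has one disk whose bandwidth is split evenly among the virtual nodes. It ignores the extra seek overhead from interleaved access, which in practice would likely make the shared case worse.

```python
def per_node_disk_share(disk_mb_per_s: float, nodes_per_box: int) -> float:
    """Best-case throughput each virtual node gets from one shared disk."""
    if nodes_per_box < 1:
        raise ValueError("nodes_per_box must be at least 1")
    # All virtual nodes on a box compete for the same physical disk.
    return disk_mb_per_s / nodes_per_box


def aggregate_throughput(disk_mb_per_s: float, boxes: int, nodes_per_box: int) -> float:
    """Total best-case disk throughput across the whole cluster."""
    total_nodes = boxes * nodes_per_box
    return per_node_disk_share(disk_mb_per_s, nodes_per_box) * total_nodes


if __name__ == "__main__":
    DISK_MB_PER_S = 100.0  # hypothetical figure, not a measurement
    for nodes_per_box in (1, 2, 3, 4):
        total = aggregate_throughput(DISK_MB_PER_S, boxes=2, nodes_per_box=nodes_per_box)
        share = per_node_disk_share(DISK_MB_PER_S, nodes_per_box)
        print(f"{2 * nodes_per_box} nodes: {share:.1f} MB/s per node, {total:.1f} MB/s total")
```

Under these assumptions the total stays the same whether the two boxes host 2, 4, 6, or 8 nodes; only the per-node share shrinks.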
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com