I'm trying to set up Nutch 2.x on AWS EC2 clusters, and I was wondering whether anyone knows of a "best setup" for it. The Hadoop and HBase versions in current EMR releases don't seem to work with Nutch 2.x. Does it sound like a good idea to set up a Hadoop cluster manually and then run Nutch on it? Will I be able to use S3 as data storage so that I can keep the data when an EC2 instance stops?

Any suggestions would be very helpful!

Thanks in advance,

