We have HBase set up on an AWS cluster running CentOS 7. The setup was
done using Cloudera Manager. Nutch can then be run in standalone mode, or
over YARN by running the deployment jar in the deploy folder.
I have not tested with S3 directly, but you can always back up the HBase
data to S3 daily.
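
A rough, untested sketch of what such a daily backup could look like, using
an HBase snapshot plus the ExportSnapshot tool. The table name "webpage", the
bucket URI, and the helper function names are all placeholders, not anything
from our setup. It also assumes your HBase version supports "hbase shell -n"
and that the cluster has the s3a filesystem and AWS credentials configured.

```python
import datetime
import subprocess


def snapshot_name(table, day):
    """Build a dated snapshot name, e.g. webpage_20170806."""
    return "{}_{}".format(table.replace(":", "_"), day.strftime("%Y%m%d"))


def build_backup_commands(table, bucket_uri, day, mappers=4):
    """Return (shell_cmd, shell_input, export_cmd) without running anything.

    table and bucket_uri are placeholders you must supply yourself.
    """
    name = snapshot_name(table, day)
    # Non-interactive HBase shell; the snapshot command is fed on stdin.
    shell_cmd = ["hbase", "shell", "-n"]
    shell_input = "snapshot '{}', '{}'\n".format(table, name)
    # ExportSnapshot copies the snapshot files to another filesystem.
    export_cmd = [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", name,
        "-copy-to", bucket_uri,
        "-mappers", str(mappers),
    ]
    return shell_cmd, shell_input, export_cmd


def run_backup(table, bucket_uri, mappers=4):
    """Take today's snapshot and export it; raises if either step fails."""
    shell_cmd, shell_input, export_cmd = build_backup_commands(
        table, bucket_uri, datetime.date.today(), mappers)
    subprocess.run(shell_cmd, input=shell_input, text=True, check=True)
    subprocess.run(export_cmd, check=True)
```

Calling run_backup("webpage", "s3a://your-bucket/hbase") from a daily cron job
would be one way to schedule it. Check the actual table name in your HBase
first, since it depends on your Nutch configuration.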
Hope this helps. Let me know if you have further queries.
On Sun, Aug 6, 2017 at 5:59 AM, Michael Chen <
> I'm trying to set up Nutch 2.x on AWS EC2 clusters, and I was wondering if
> anyone knows of a "best set up" for it. The Hadoop and HBase versions in
> current EMR releases don't seem to work with Nutch 2.x. Does it sound
> like a good idea to manually set up Hadoop clusters and then run Nutch on
> it? Will I be able to use S3 as data storage so that I can keep the data
> when EC2 instance stops?
> Any suggestions would be very helpful!
> Thanks in advance,