Hello,

I'm setting up an AWS cluster for Nutch 1.10 to crawl about 100M+ pages from
www.

Could someone please advise on choosing an AWS instance type and storage:
- We don't use EMR
- Which AWS instance type is best for us?
- Should we use EBS for storage?
- Should we use a dedicated DNS server, or is installing a local DNS on each
crawl machine good enough?
- Which is best for Nutch 1.10: Hadoop 1.x or Hadoop 2.x?

Thanks
TIen
