Hello, I'm setting up an AWS cluster for Nutch 1.10 to crawl about 100M+ pages from the web.
Could someone please advise on choosing an AWS instance type and storage? Note that we don't use EMR.

- Which AWS instance type is best for us?
- Should we use EBS for storage?
- Should we use a dedicated DNS server, or is installing a local DNS on each crawl machine good enough?
- Which is best for Nutch 1.10: Hadoop 1.x or Hadoop 2.x?

Thanks,
TIen

