Re: Recommendation for free and production-ready Hadoop setup to run Nutch

2021-06-03 Thread Nicholas Roberts
Does the Apache Bigtop project not meet the requirements of a free distribution? https://github.com/apache/bigtop What is the status of that project?

DuplexWeb-Google - GoogleBot Crawler For Duplex / Google Assistant

2021-06-03 Thread lewis john mcgibbney
Some interesting content for a short read :) https://www.seroundtable.com/duplexweb-google-bot-31522.html?utm_source=search_engine_roundtable_campaign=ser_newsletter_2021-06-03_medium=email -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc

Re: Recommendation for free and production-ready Hadoop setup to run Nutch

2021-06-03 Thread lewis john mcgibbney
Hi Sebastian, whenever we did not know how long our crawl infrastructure would be required (i.e. the customer might revoke or extend the contract with very little notice), we always chose AWS EMR. Specifically, to reduce costs, we made sure that all worker/task nodes ran on spot instances