Hello Sebastian,
We have always used vanilla Apache Hadoop on our own physical servers,
running the latest Debian, which also runs on ARM. It runs HDFS
and YARN and any other custom job you can think of. It ships with snappy
compression, which is a massive improvement for large data sets.
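If it helps with verification: the `hadoop checknative` tool reports which native libraries a build has actually loaded. A minimal sketch, assuming `hadoop` is on the PATH; the exact codec list shown depends on the build and on the shared libraries installed on the host:

```shell
# Report native library support: libhadoop.so plus compression codecs
# such as zlib, snappy, bzip2, and zstd. Output varies per build/host,
# so treat the codec list as illustrative.
hadoop checknative -a

# With -a ("check all"), the command exits non-zero if any checked
# native library is unavailable, which makes it easy to script.
```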
Hi,
Does anybody have a recommendation for a free, production-ready Hadoop setup?
- HDFS + YARN
- run Nutch but also other MapReduce and Spark-on-Yarn jobs
- with native library support: libhadoop.so and compression libs (bzip2, zstd, snappy)
- must run on AWS EC2 instances and read/write
Hi Kieran,
thanks for the feedback!
> I didn't realise that it is intended for users to edit the bin/crawl file.
Maybe we should add a comment to encourage users to adapt the shell scripts
to their needs. Almost 10 years ago, the Java "Crawl" class was replaced
by the scripts because a shell
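To make the intent concrete, the kind of edit meant here might look like the sketch below. The variable and function names are illustrative only, since the contents of bin/crawl differ between Nutch versions:

```shell
# Hypothetical edits inside bin/crawl -- the names below are
# illustrative, not necessarily the script's actual variables.

# Pass an extra Hadoop/Java property to every job the script launches:
commonOptions="$commonOptions -D my.custom.property=true"

# Or skip a crawl phase a particular setup does not need
# by commenting out the corresponding step:
# __bin_nutch dedup "$CRAWL_PATH"/crawldb
```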
Hi Sebastian,
Thank you for your response. It was a great help.
I didn't realise that it is intended for users to edit the bin/crawl file.
Although, looking at it now, it's clear.
This makes it easier for me to access the HTML content within my plugin.
Thanks again!
On Fri, May 28, 2021 at 8:36 PM