Hi,

I would like to use the dataset from the Big Data Benchmark <https://amplab.cs.berkeley.edu/benchmark/> on my own cluster, to run some comparison tests between Hadoop and Spark. According to the benchmark page, the dataset is available on Amazon S3 at s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix].

Is there a way to download it without being an AWS user? I tried the following, with dummy credentials in the URI:

    bin/hadoop distcp s3n://123:456@big-data-benchmark/pavlo/text/tiny/* ./

but it asks for an AWS Access Key ID and Secret Access Key, which I do not have.
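To be explicit about which locations I mean, here is a small Python sketch that expands the path pattern above into concrete URIs. The helper name `benchmark_paths` is my own invention for illustration, and "tiny" is the only suffix I have actually tried; it only builds strings and does not contact S3.

```python
# Hypothetical helper (not part of Hadoop, Spark, or the benchmark itself):
# expands the path pattern quoted above into one URI per storage format.
FORMATS = ["text", "text-deflate", "sequence", "sequence-snappy"]


def benchmark_paths(suffix, bucket="big-data-benchmark", prefix="pavlo"):
    """Return the s3n:// URI of each storage format for the given suffix."""
    return [f"s3n://{bucket}/{prefix}/{fmt}/{suffix}/" for fmt in FORMATS]


if __name__ == "__main__":
    # "tiny" is the suffix used in the distcp attempt above.
    for uri in benchmark_paths("tiny"):
        print(uri)
```

Any of these URIs could then be used as the source argument of the distcp command, provided the access problem is solved.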
Thanks in advance,
Tom