Hi Burak,

Thanks, I will then start benchmarking the cluster.
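
For anyone finding this thread later, here is a minimal sketch of the path layout described in the quoted reply below: format, then size, then the lower-case dataset name. The helper name `benchmark_path` and the validation tuples are my own illustration, not part of the benchmark or of Spark; only the bucket layout and the `sc.textFile` call come from the thread.

```python
# Sketch only: the path layout comes from this thread; the helper and the
# tuples below are hypothetical names introduced for illustration.
FORMATS = ("text", "text-deflate", "sequence", "sequence-snappy")
SIZES = ("tiny", "1node", "5nodes")
DATASETS = ("crawl", "uservisits", "rankings")


def benchmark_path(fmt, size, dataset):
    """Build s3n://big-data-benchmark/pavlo/<format>/<size>/<dataset>."""
    dataset = dataset.lower()  # dataset names must be lower case
    if fmt not in FORMATS:
        raise ValueError("unknown format: %r" % fmt)
    if size not in SIZES:
        raise ValueError("unknown size: %r" % size)
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % dataset)
    return "s3n://big-data-benchmark/pavlo/%s/%s/%s" % (fmt, size, dataset)


print(benchmark_path("text", "tiny", "crawl"))
# prints: s3n://big-data-benchmark/pavlo/text/tiny/crawl

# Assumes a live SparkContext `sc` with S3 credentials configured:
# rdd = sc.textFile(benchmark_path("text", "tiny", "crawl"))
```

The `sc.textFile` line is left commented out because it needs a running cluster and S3 access. It is the call for the text format; the sequence formats would presumably need a different reader.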

> Date: Wed, 27 Aug 2014 11:52:05 -0700
> From: bya...@stanford.edu
> To: ssti...@live.com
> CC: user@spark.apache.org
> Subject: Re: Amplab: big-data-benchmark
> 
> Hi Sameer,
> 
> I've faced this issue before. They don't show up on 
> http://s3.amazonaws.com/big-data-benchmark/. But you can directly use: 
> `sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/crawl")`
> The gotcha is that you also need to supply which dataset you want: crawl, 
> uservisits, or rankings in lower case after the format and size you want them 
> in.
> They should be there.
> 
> Best,
> Burak
> 
> ----- Original Message -----
> From: "Sameer Tilak" <ssti...@live.com>
> To: user@spark.apache.org
> Sent: Wednesday, August 27, 2014 11:42:28 AM
> Subject: Amplab: big-data-benchmark
> 
> Hi All,
> I am planning to run the AMPLab benchmark suite to evaluate the performance of 
> our cluster. I looked at https://amplab.cs.berkeley.edu/benchmark/ and it 
> mentions that the data is available at:
> s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
> where /tiny/, /1node/ and /5nodes/ are options for suffix. However, I am not able 
> to download these datasets directly. I read that they can 
> be used directly by doing: sc.textFile(s3:/....). However, I wanted to make 
> sure that my understanding is correct. Here is what I see at 
> http://s3.amazonaws.com/big-data-benchmark/
> I do not see anything for sequence or text-deflate.
> I see the sequence-snappy dataset:
> <Contents><Key>pavlo/sequence-snappy/5nodes/crawl/000738_0</Key><LastModified>2013-05-27T21:26:40.000Z</LastModified><ETag>"a978d18721d5a533d38a88f558461644"</ETag><Size>42958735</Size><StorageClass>STANDARD</StorageClass></Contents>
> For text, I get the following error:
> <Error><Code>NoSuchKey</Code><Message>The specified key does not 
> exist.</Message><Key>pavlo/text/1node/crawl</Key><RequestId>166D239D38399526</RequestId><HostId>4Bg8BHomWqJ6BXOkx/3fQZhN5Uw1TtCn01uQzm+1qYffx2s/oPV+9sGoAWV2thCI</HostId></Error>
> 
> Please let me know if there is a way to readily download the dataset and view 
> it.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 
                                          
