Yes, it does read it in parallel, based on the input format you use (e.g. text file, SequenceFile, etc.). By default it uses 32 MB blocks. All of this just goes through Hadoop's S3 library, so anything Hadoop can do can be done here.
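For example, a minimal sketch of reading from S3 (assuming the Spark 0.8-era Scala API, placeholder bucket/master names, and that your AWS credentials are already configured for Hadoop via fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):

import org.apache.spark.SparkContext

object S3ReadExample {
  def main(args: Array[String]): Unit = {
    // "spark://master:7077" and the bucket/path below are placeholders
    val sc = new SparkContext("spark://master:7077", "S3ReadExample")

    // Hadoop's S3 filesystem splits the object into blocks, so each
    // partition is fetched by a worker in parallel
    val lines = sc.textFile("s3n://my-bucket/path/to/file.txt")

    // You can also request a minimum number of splits explicitly
    val moreSplits = sc.textFile("s3n://my-bucket/path/to/file.txt", 64)

    println("line count: " + lines.count())
    sc.stop()
  }
}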
Matei

On Oct 23, 2013, at 6:36 PM, Ankur Chauhan <[email protected]> wrote:

> Just a follow up question. How does the spark task/job/master know how to
> split the file that is in s3. In most cases, it would be better to fetch
> different parts of the file in parallel. Is that something that is done by
> the workers?
>
> On Oct 23, 2013, at 18:28, Ayush Mishra <[email protected]> wrote:
>
>> You can check
>> http://blog.knoldus.com/2013/09/09/running-standalone-scala-job-on-amazon-ec2-spark-cluster/.
>>
>> On Thu, Oct 24, 2013 at 6:54 AM, Nan Zhu <[email protected]> wrote:
>> Great!!!
>>
>> On Wed, Oct 23, 2013 at 9:21 PM, Matei Zaharia <[email protected]> wrote:
>> Yes, take a look at
>> http://spark.incubator.apache.org/docs/latest/ec2-scripts.html#accessing-data-in-s3
>>
>> Matei
>>
>> On Oct 23, 2013, at 6:17 PM, Nan Zhu <[email protected]> wrote:
>>
>>> Hi, all
>>>
>>> Is there any solution running Spark with Amazon S3?
>>>
>>> Best,
>>>
>>> Nan
