This is interesting. I would really appreciate it if you could share what
exactly you changed in *core-site.xml* and *yarn-site.xml*.

On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Just wondering - what is the advantage of doing this?
>
> Regards
> Gourav Sengupta
>
> On Wed, May 22, 2019 at 3:01 AM Huizhe Wang <wang.h...@husky.neu.edu>
> wrote:
>
>> Hi Hari,
>> Thanks :) I tried it as you suggested, and it works ;)
>>
>>
>> Hariharan <hariharan...@gmail.com> wrote on Mon, May 20, 2019 at 3:54 PM:
>>
>>> Hi Huizhe,
>>>
>>> You can set the "fs.defaultFS" property in core-site.xml to an s3a:// path.
>>> That way your Spark job will use S3 for all operations that need HDFS.
>>> Intermediate data will still be stored on local disk, though.
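>>> For reference, a minimal sketch of what that change could look like in
>>> core-site.xml (the bucket name and keys below are placeholders, and this
>>> assumes the hadoop-aws and matching AWS SDK jars are on the classpath; on
>>> EC2 an instance profile can replace the keys):
>>>
>>> <configuration>
>>>   <!-- Point the default filesystem at an S3 bucket via the s3a connector -->
>>>   <property>
>>>     <name>fs.defaultFS</name>
>>>     <value>s3a://your-bucket</value>
>>>   </property>
>>>   <!-- Placeholder credentials for s3a; omit when using an instance profile -->
>>>   <property>
>>>     <name>fs.s3a.access.key</name>
>>>     <value>YOUR_ACCESS_KEY</value>
>>>   </property>
>>>   <property>
>>>     <name>fs.s3a.secret.key</name>
>>>     <value>YOUR_SECRET_KEY</value>
>>>   </property>
>>> </configuration>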
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari <
>>> abdealikoth...@gmail.com> wrote:
>>>
>>>> While Spark can read from S3 directly in EMR, I believe it still needs
>>>> HDFS to perform shuffles and to write intermediate data to disk when
>>>> running jobs (i.e., when in-memory data needs to spill over to disk).
>>>>
>>>> For these operations, Spark does need a distributed file system - you
>>>> could use something like EMRFS (which is an HDFS-like file system backed
>>>> by S3) on Amazon.
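>>>> For what it's worth, reading and writing S3 directly from Spark usually
>>>> looks something like the sketch below (the bucket and paths are
>>>> placeholders, and it assumes the hadoop-aws jars and S3 credentials are
>>>> already set up):
>>>>
>>>> from pyspark.sql import SparkSession
>>>>
>>>> # Assumes the hadoop-aws / AWS SDK jars are on the classpath and credentials are configured
>>>> spark = SparkSession.builder.appName("s3a-read-example").getOrCreate()
>>>>
>>>> # Read input straight from S3 via the s3a connector (placeholder path)
>>>> df = spark.read.csv("s3a://your-bucket/input/data.csv", header=True)
>>>>
>>>> # Write results back to S3; shuffle and spill data still land on the executors' local disks
>>>> df.write.parquet("s3a://your-bucket/output/")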
>>>>
>>>> The issue could be something else too - so a stacktrace or error
>>>> message could help in understanding the problem.
>>>>
>>>>
>>>>
>>>> On Mon, May 20, 2019, 07:20 Huizhe Wang <wang.h...@husky.neu.edu>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I want to use Spark on YARN without HDFS. I store my resources in AWS
>>>>> S3 and use s3a to fetch them. However, after I ran stop-dfs.sh to stop
>>>>> the NameNode and DataNode, I got an error when using yarn cluster mode.
>>>>> Can I use YARN without starting DFS, and if so, how?
>>>>>
>>>>> Yours,
>>>>> Jane
>>>>>
>>>>
