This is interesting. Would really appreciate it if you could share what exactly you changed in *core-site.xml* and *yarn-site.xml*.
On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta wrote:
just wondering what is the advantage of doing this?
Regards
Gourav Sengupta
On Wed, May 22, 2019 at 3:01 AM Huizhe Wang wrote:
Hi Hari,
Thanks :) I tried doing it as you said. It works ;)
On Mon, May 20, 2019 at 3:54 PM Hariharan wrote:
> You can set the "fs.defaultFS" field in core-site.xml to some path on s3. […]
There is a kind of check in *yarn-site.xml*: the *yarn.nodemanager.remote-app-log-dir* property, set here to */var/yarn/logs*.
Using *hdfs://<namenode>:9000* as *fs.defaultFS* in *core-site.xml*, you have to run *hdfs dfs -mkdir /var/yarn/logs* first.
Using *S3://* as *fs.defaultFS*…
Take care of the *.dir* properties in *hdfs-site.xml* too.
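For concreteness, a sketch of that property as it would appear in *yarn-site.xml*, using the same /var/yarn/logs example value as above:

    <!-- where the NodeManager aggregates application logs -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/var/yarn/logs</value>
    </property>

With *fs.defaultFS* pointing at HDFS, this directory has to exist on HDFS first, which is what the *hdfs dfs -mkdir /var/yarn/logs* step above takes care of.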
Hi Huizhe,
You can set the "fs.defaultFS" field in core-site.xml to some path on s3.
That way your spark job will use S3 for all operations that need HDFS.
Intermediate data will still be stored on local disk though.
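For reference, a minimal *core-site.xml* sketch along these lines (the bucket name is a placeholder, and it assumes the hadoop-aws / s3a connector jars are already on the classpath):

    <configuration>
        <!-- make S3 the default filesystem instead of HDFS -->
        <property>
            <name>fs.defaultFS</name>
            <value>s3a://your-bucket</value>
        </property>
    </configuration>

Credentials can then be supplied via *fs.s3a.access.key* / *fs.s3a.secret.key* or, on EC2, an instance profile.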
Thanks,
Hari
On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote:
While Spark can read from S3 directly in EMR, I believe it still needs HDFS to perform shuffles and to write intermediate data to disk when running jobs (i.e. when the in-memory data needs to spill over to disk).
For these operations, Spark does need a distributed file system - you could use something …
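A hedged aside on where that intermediate data lands: under YARN, executor scratch space comes from the NodeManager's local directories rather than from HDFS, configured in *yarn-site.xml* (the path below is a placeholder):

    <!-- local disk(s) used for container scratch space -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/mnt/yarn/local</value>
    </property>

Spark containers launched by YARN write their shuffle and spill files under these directories.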
I am afraid not, because YARN needs a DFS.
On Mon, May 20, 2019 at 9:50 AM Huizhe Wang wrote:
Hi,
I want to use Spark on YARN without HDFS. I store my resources in AWS and use s3a to get them. However, when I used stop-dfs.sh to stop the NameNode and DataNode, I got an error when using yarn cluster mode. Could I use YARN without starting DFS, and how would I use this mode?
Yours,
Jane