RE: Not able to write output to local filesystem from Standalone mode.

2016-05-27 Thread Yong Zhang
I am not familiar with that particular piece of code, but Spark's 
concurrency comes from multi-threading: one executor uses multiple threads to 
process tasks, and those tasks share the executor's JVM memory. So it 
isn't surprising that Spark needs some blocking/synchronization around memory in some 
places.
Yong
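
A minimal sketch of that threading model (illustrative only; this is not Spark's code, and ExecutorSketch, reserve, and the shared byte counter are all made up): an executor is one JVM running tasks on a thread pool, so any bookkeeping the tasks share has to be synchronized.

import java.util.concurrent.Executors

object ExecutorSketch extends App {
  // Executor-level state shared by every task thread (hypothetical).
  private var bytesReserved = 0L
  private val lock = new Object

  // Without the synchronized block, concurrent += updates could be lost.
  def reserve(n: Long): Unit = lock.synchronized { bytesReserved += n }

  val pool = Executors.newFixedThreadPool(4) // the executor's "cores"
  (1 to 8).foreach { _ =>
    pool.submit(new Runnable {
      override def run(): Unit = reserve(1024) // each task touches shared memory
    })
  }
  pool.shutdown()
}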


Re: Not able to write output to local filesystem from Standalone mode.

2016-05-27 Thread Jacek Laskowski
Hi Yong,

It makes sense...almost. :) I'm not sure how relevant it is, but just
today I was reviewing the BlockInfoManager code with its locks for reading
and writing, and my understanding of the code is that Spark is fine
with multiple concurrent attempts to write new memory blocks
(pages) guarded by a mere synchronized code block. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L324-L325

With that, it's not so simple to say "that just makes sense".

p.s. The more I know, the fewer things "just make sense" to me.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
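
For what it's worth, the pattern behind that synchronized block can be sketched like this (a paraphrase of the idea only; the class and names below are hypothetical, not Spark's actual BlockInfoManager): writers racing to register a new block are serialized by an ordinary synchronized check-and-put.

import scala.collection.mutable

class BlockRegistrySketch {
  private val infos = mutable.HashMap.empty[String, AnyRef]

  // Returns true if this thread won the race and may write the new block;
  // false means another writer registered it first, so read it instead.
  def lockNewBlockForWriting(blockId: String, newInfo: AnyRef): Boolean =
    synchronized {
      if (infos.contains(blockId)) false
      else { infos(blockId) = newInfo; true }
    }
}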



RE: Not able to write output to local filesystem from Standalone mode.

2016-05-26 Thread Yong Zhang
That just makes sense, doesn't it?
The only place it can be is the driver. Otherwise the executors would contend 
over which of them should create the directory.
The coordinator (the driver in this case) is the best place to do it.
Yong
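
A small sketch of that coordinator argument (illustrative only; the path and names are hypothetical, and this paraphrases the Hadoop-style output-commit flow rather than quoting Spark's code): the driver does the job-level setup exactly once, so tasks never race over creating the directory and only add their own files.

import java.nio.file.{Files, Paths}

object CommitFlowSketch extends App {
  val jobDir = Paths.get("/tmp/commit-sketch/_temporary") // hypothetical path

  // Driver side, once per job: create the job-output directory up front.
  Files.createDirectories(jobDir)

  // Task side: each task only writes its own part file, so there is
  // no contention over who creates the directory.
  (0 until 4).foreach { taskId =>
    Files.write(jobDir.resolve(s"part-0000$taskId"), s"task $taskId\n".getBytes)
  }
}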


Re: Not able to write output to local filesystem from Standalone mode.

2016-05-25 Thread Mathieu Longtin
Experience. I don't use Mesos or Yarn or Hadoop, so I don't know.



Re: Not able to write output to local filesystem from Standalone mode.

2016-05-25 Thread Jacek Laskowski
Hi Mathieu,

Thanks a lot for the answer! I did *not* know it's the driver that
creates the directory.

You said "standalone mode"; is this also the case for the other modes,
YARN and Mesos?

p.s. Did you find it in the code, or... just experience it before? #curious

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski



RE: Not able to write output to local filesystem from Standalone mode.

2016-05-24 Thread Stuti Awasthi



Thanks Mathieu,
So I must have either a shared filesystem or Hadoop as the filesystem in order to write data from a standalone-mode cluster. Thanks for your input.


Regards
Stuti Awasthi
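
Two common workarounds in that spirit, as rough sketches (assuming an active Spark shell where data is the RDD from the original post; the /shared NFS mount is hypothetical):

// 1) Save to a path that is mounted identically (e.g. over NFS) on the
//    driver and on every slave:
data.saveAsTextFile("file:///shared/stuti/test1")

// 2) For small results only: pull everything to the driver and write it
//    with plain Java I/O on the driver's local disk.
import java.io.PrintWriter
val out = new PrintWriter("/home/stuti/test1.txt")
data.collect().foreach(line => out.println(line))
out.close()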




Re: Not able to write output to local filesystem from Standalone mode.

2016-05-24 Thread Mathieu Longtin
In standalone mode, executors assume they have access to a shared file
system. The driver creates the directory and the executors write the files,
so the executors end up not writing anything, since the directory does not
exist locally on their machines.
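
That failure mode can be paraphrased in a few lines (a sketch, not Spark's or Hadoop's actual code; the check below only mimics what the committer effectively does on the executor side): with a file:// output path, the driver creates <path>/_temporary on its own disk, and an executor on another machine then looks for the same path on its own disk and throws.

import java.io.{File, IOException}

object TempDirCheckSketch extends App {
  // What the output committer effectively verifies before a task writes.
  def openTaskOutput(tempDir: File): Unit =
    if (!tempDir.exists())
      throw new IOException(s"The temporary job-output directory $tempDir doesn't exist!")

  // On the driver's machine _temporary exists; on a slave with no shared
  // mount it was never created, so every write task fails exactly like this.
  openTaskOutput(new File("/home/stuti/test1/_temporary"))
}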


-- 
Mathieu Longtin
1-514-803-8977


RE: Not able to write output to local filesystem from Standalone mode.

2016-05-24 Thread Stuti Awasthi



Hi Jacek,

The parent directory is already present; it's my home directory. I'm using a 64-bit Linux (Red Hat) machine.
Also, I noticed that a "test1" folder is created on my master with an empty "_temporary" subdirectory, but on the slaves no such directory is created under /home/stuti.


Thanks
Stuti 





Re: Not able to write output to local filesystem from Standalone mode.

2016-05-24 Thread Jacek Laskowski
Hi,

What happens when you create the parent directory /home/stuti? I think the
failure is due to missing parent directories. What's the OS?

Jacek

Not able to write output to local filesystem from Standalone mode.

2016-05-24 Thread Stuti Awasthi
Hi All,
I have a 3-node Spark 1.6 standalone-mode cluster with 1 master and 2 
slaves, and I am not using Hadoop as the filesystem. I am able to launch the 
shell, read the input file from the local filesystem, and perform 
transformations successfully. When I try to write my output to a local 
filesystem path, I receive the error below.

I searched the web and found a similar JIRA: 
https://issues.apache.org/jira/browse/SPARK-2984 . Even though it is marked 
resolved for Spark 1.3+, people have posted that the same issue still 
persists in later versions.

ERROR
scala> data.saveAsTextFile("/home/stuti/test1")
16/05/24 05:03:42 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, server1): java.io.IOException: The temporary job-output directory file:/home/stuti/test1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

What is the best way to resolve this issue if I don't want to have Hadoop 
installed? Or is it mandatory to have Hadoop in order to write output from 
standalone cluster mode?

Please suggest.

Thanks 
Stuti Awasthi


