Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
Hi Dhruve, thanks.
I've solved the issue by adding a cap on max executors.
I wanted to find a place where I could add this behavior in Spark itself, so
that users would not have to worry about setting max executors.

Cheers

- Thanks, via mobile,  excuse brevity.

On Sep 24, 2016 1:15 PM, "dhruve ashar"  wrote:

> From your log, it's trying to launch every executor with approximately
> 6.6 GB of memory. 168510 is an extremely large number of executors, and
> 168510 x 6.6 GB is roughly 1.1 PB of memory, which is unrealistic for a 12-node cluster.
> 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
> containers, each with 2 cores and 6758 MB memory including 614 MB overhead
>
> I don't know the size of the data that you are processing here.
>
> Here are some general choices that I would start with.
>
> Start with a smaller number of minimum executors and assign them reasonable
> memory. This can be around 48, assuming 12 nodes x 4 cores each. You could
> start by processing a subset of your data and see whether you get
> decent performance, then gradually increase the maximum number of executors for
> dynamic allocation and process the remaining data.
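For illustration, the same advice expressed as Spark configuration. The property keys are standard Spark settings; the concrete values (12, 48, 500) are only placeholders to adapt to the cluster:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative values only: bound dynamic allocation instead of letting the
// driver request an unbounded number of executors.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")          // needed for dynamic allocation on YARN
  .set("spark.dynamicAllocation.minExecutors", "12")
  .set("spark.dynamicAllocation.initialExecutors", "48") // ~12 nodes x 4 cores
  .set("spark.dynamicAllocation.maxExecutors", "500")

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}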
>
>
>
>
> On Fri, Sep 23, 2016 at 7:54 PM, Yash Sharma  wrote:
>
>> Is there anywhere I can help fix this?
>>
>> I can see the requests being made in the YARN allocator. What should be
>> the upper limit of the requests made?
>>
>> https://github.com/apache/spark/blob/master/yarn/src/main/
>> scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L222
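Not Spark's actual allocator code, just a rough sketch of the kind of clamp that could be applied before requesting containers; every name below is an assumption for illustration:

{code}
// Hypothetical sketch only -- not the real YarnAllocator logic. The idea is to
// bound the driver's requested executor total by a configured ceiling before
// asking YARN for containers.
def boundedExecutorRequest(requestedTotal: Int,
                           running: Int,
                           pendingAllocate: Int,
                           maxExecutors: Int): Int = {
  val target = math.min(requestedTotal, maxExecutors)
  math.max(target - running - pendingAllocate, 0)
}

// e.g. boundedExecutorRequest(168510, 0, 0, 500) == 500
{code}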
>>
>> On Sat, Sep 24, 2016 at 10:27 AM, Yash Sharma  wrote:
>>
>>> Have been playing around with configs to crack this. Adding them here
>>> in case they are helpful to others :)
>>> The number of executors and the timeouts seemed to be the core issue.
>>>
>>> {code}
>>> --driver-memory 4G \
>>> --conf spark.dynamicAllocation.enabled=true \
>>> --conf spark.dynamicAllocation.maxExecutors=500 \
>>> --conf spark.core.connection.ack.wait.timeout=6000 \
>>> --conf spark.akka.heartbeat.interval=6000 \
>>> --conf spark.akka.frameSize=100 \
>>> --conf spark.akka.timeout=6000 \
>>> {code}
>>>
>>> Cheers !
>>>
>>> On Fri, Sep 23, 2016 at 7:50 PM, 
>>> wrote:
>>>
 For testing purposes, can you run with a fixed number of executors and try?
 Maybe 12 executors for testing, and let us know the status.

 Get Outlook for Android 



 On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma"  wrote:

 Thanks Aditya, appreciate the help.
>
> I had the exact thought about the huge number of executors requested.
> I am going with dynamic executors and not specifying the number of
> executors. Are you suggesting that I should limit the number of executors
> when the dynamic allocator requests more executors?
>
> It's a 12-node EMR cluster with more than a TB of memory.
>
>
>
> On Fri, Sep 23, 2016 at 5:12 PM, Aditya wrote:
>
>> Hi Yash,
>>
>> What is your total cluster memory and number of cores?
>> Problem might be with the number of executors you are allocating. The
>> logs show it as 168510, which is on the very high side. Try reducing your
>> executors.
>>
>>
>> On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
>>
>>> Hi All,
>>> I have a Spark job which runs over a huge bulk of data with dynamic
>>> allocation enabled.
>>> The job takes some 15 minutes to start up and fails as soon as it
>>> starts*.
>>>
>>> Is there anything I can check to debug this problem? There is not a
>>> lot of information in the logs about the exact cause, but here is a snapshot
>>> below.
>>>
>>> Thanks All.
>>>
>>> * - by starts I mean when it shows something on the Spark web UI;
>>> before that it's just a blank page.
>>>
>>> Logs here -
>>>
>>> {code}
>>> 16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter
>>> thread with (heartbeat : 3000, initial allocation : 200) intervals
>>> 16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total
>>> number of 168510 executor(s).
>>> 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
>>> containers, each with 2 cores and 6758 MB memory including 614 MB 
>>> overhead
>>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason
>>> for non-existent executor 22
>>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason
>>> for non-existent executor 19
>>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason
>>> for non-existent executor 18
>>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason
>>> for non-existent executor 12
>>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason
>>> for non-existent executor 11

Re: @scala.annotation.varargs or @_root_.scala.annotation.varargs?

2016-09-23 Thread Hyukjin Kwon
Then, are we going to submit a PR and fix this maybe?

On 9 Sep 2016 9:30 p.m., "Sean Owen"  wrote:

> Oh I get it now. It was necessary in the past. Sure, seems like it
> could be standardized now.
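For reference, a minimal sketch of the standardized form being discussed (plain import plus @varargs), assuming the file does not sit inside a package named "scala":

{code}
import scala.annotation.varargs

object VarargsDemo {
  // @varargs makes the compiler emit a Java-friendly varargs overload
  // alongside the Scala repeated-parameter method.
  @varargs
  def join(parts: String*): String = parts.mkString("/")
}
{code}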
>
> On Fri, Sep 9, 2016 at 1:13 AM, Reynold Xin  wrote:
> > Yeah, but the earlier email was asking why they were introduced in the first
> > place.
> >
> >
> > On Friday, September 9, 2016, Marcelo Vanzin 
> wrote:
> >>
> >> Not after SPARK-14642, right?
> >>
> >> On Thu, Sep 8, 2016 at 5:07 PM, Reynold Xin 
> wrote:
> >> > There is a package called scala.
> >> >
> >> >
> >> > On Friday, September 9, 2016, Hyukjin Kwon 
> wrote:
> >> >>
> >> >> I was also actually wondering why it is being written like this.
> >> >>
> >> >> I actually took a look at this before and wanted to fix them, but I
> >> >> found
> >> >> https://github.com/apache/spark/pull/12077/files#r58041468
> >> >>
> >> >> So, I kind of persuaded myself that committers already know about it
> >> >> and
> >> >> there is a reason for this.
> >> >>
> >> >> I'd like to know the full details of why we don't import it but write the
> >> >> full path, though.
> >> >>
> >> >>
> >> >> On 9 Sep 2016 5:28 a.m., "Jakob Odersky"  wrote:
> >> >>>
> >> >>> +1 to Sean's answer, importing varargs.
> >> >>> In this case the _root_ is also unnecessary (it would be required in
> >> >>> case you were using it in a nested package called "scala" itself)
> >> >>>
> >> >>> On Thu, Sep 8, 2016 at 9:27 AM, Sean Owen 
> wrote:
> >> >>> > I think the @_root_ version is redundant because
> >> >>> > @scala.annotation.varargs is redundant. Actually wouldn't we just
> >> >>> > import varargs and write @varargs?
> >> >>> >
> >> >>> > On Thu, Sep 8, 2016 at 1:24 PM, Jacek Laskowski 
> >> >>> > wrote:
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> The code is not consistent in its use of the @scala.annotation.varargs
> >> >>> >> annotation.
> >> >>> >> There are classes with @scala.annotation.varargs, like DataFrameReader
> >> >>> >> or functions, as well as examples of @_root_.scala.annotation.varargs,
> >> >>> >> e.g. Window or UserDefinedAggregateFunction.
> >> >>> >>
> >> >>> >> I think it should be consistent and use @scala.annotation.varargs
> only.
> >> >>> >> WDYT?
> >> >>> >>
> >> >>> >> Pozdrawiam,
> >> >>> >> Jacek Laskowski
> >> >>> >> 
> >> >>> >> https://medium.com/@jaceklaskowski/
> >> >>> >> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> >> >>> >> Follow me at https://twitter.com/jaceklaskowski
> >> >>> >>
> >> >>> >>
> >> >>> >> 
> -
> >> >>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>> > 
> -
> >> >>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>> >
> >> >>>
> >> >>> 
> -
> >> >>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >> >>>
> >> >
> >>
> >>
> >>
> >> --
> >> Marcelo
>


Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
Is there anywhere I can help fix this?

I can see the requests being made in the YARN allocator. What should be the
upper limit of the requests made?

https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala#L222

On Sat, Sep 24, 2016 at 10:27 AM, Yash Sharma  wrote:

> Have been playing around with configs to crack this. Adding them here
> in case they are helpful to others :)
> The number of executors and the timeouts seemed to be the core issue.
>
> {code}
> --driver-memory 4G \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.maxExecutors=500 \
> --conf spark.core.connection.ack.wait.timeout=6000 \
> --conf spark.akka.heartbeat.interval=6000 \
> --conf spark.akka.frameSize=100 \
> --conf spark.akka.timeout=6000 \
> {code}
>
> Cheers !
>
> On Fri, Sep 23, 2016 at 7:50 PM, 
> wrote:
>
>> For testing purposes, can you run with a fixed number of executors and try?
>> Maybe 12 executors for testing, and let us know the status.
>>
>> Get Outlook for Android 
>>
>>
>>
>> On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma" 
>> wrote:
>>
>> Thanks Aditya, appreciate the help.
>>>
>>> I had the exact thought about the huge number of executors requested.
>>> I am going with dynamic executors and not specifying the number of
>>> executors. Are you suggesting that I should limit the number of executors
>>> when the dynamic allocator requests more executors?
>>>
>>> It's a 12-node EMR cluster with more than a TB of memory.
>>>
>>>
>>>
>>> On Fri, Sep 23, 2016 at 5:12 PM, Aditya wrote:
>>>
 Hi Yash,

 What is your total cluster memory and number of cores?
 Problem might be with the number of executors you are allocating. The
 logs show it as 168510, which is on the very high side. Try reducing your
 executors.


 On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:

> Hi All,
> I have a spark job which runs over a huge bulk of data with Dynamic
> allocation enabled.
> The job takes some 15 minutes to start up and fails as soon as it
> starts*.
>
> Is there anything I can check to debug this problem. There is not a
> lot of information in logs for the exact cause but here is some snapshot
> below.
>
> Thanks All.
>
> * - by starts I mean when it shows something on the spark web ui,
> before that its just blank page.
>
> Logs here -
>
> {code}
> 16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter
> thread with (heartbeat : 3000, initial allocation : 200) intervals
> 16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number
> of 168510 executor(s).
> 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
> containers, each with 2 cores and 6758 MB memory including 614 MB overhead
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 22
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 19
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 18
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 12
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 11
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 20
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 15
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 7
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 8
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 16
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 21
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 6
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 13
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 14
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 9
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 3
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 17
> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
> non-existent executor 1
> 16/09/23 06:33:36 WARN 

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
Have been playing around with configs to crack this. Adding them here in case
they are helpful to others :)
The number of executors and the timeouts seemed to be the core issue.

{code}
--driver-memory 4G \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=500 \
--conf spark.core.connection.ack.wait.timeout=6000 \
--conf spark.akka.heartbeat.interval=6000 \
--conf spark.akka.frameSize=100 \
--conf spark.akka.timeout=6000 \
{code}

Cheers !

On Fri, Sep 23, 2016 at 7:50 PM,  wrote:

> For testing purposes, can you run with a fixed number of executors and try?
> Maybe 12 executors for testing, and let us know the status.
>
> Get Outlook for Android 
>
>
>
> On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma" 
> wrote:
>
> Thanks Aditya, appreciate the help.
>>
>> I had the exact thought about the huge number of executors requested.
>> I am going with dynamic executors and not specifying the number of
>> executors. Are you suggesting that I should limit the number of executors
>> when the dynamic allocator requests more executors?
>>
>> It's a 12-node EMR cluster with more than a TB of memory.
>>
>>
>>
>> On Fri, Sep 23, 2016 at 5:12 PM, Aditya wrote:
>>
>>> Hi Yash,
>>>
>>> What is your total cluster memory and number of cores?
>>> Problem might be with the number of executors you are allocating. The
>>> logs show it as 168510, which is on the very high side. Try reducing your
>>> executors.
>>>
>>>
>>> On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
>>>
 Hi All,
 I have a spark job which runs over a huge bulk of data with Dynamic
 allocation enabled.
 The job takes some 15 minutes to start up and fails as soon as it
 starts*.

 Is there anything I can check to debug this problem. There is not a lot
 of information in logs for the exact cause but here is some snapshot below.

 Thanks All.

 * - by starts I mean when it shows something on the spark web ui,
 before that its just blank page.

 Logs here -

 {code}
 16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter
 thread with (heartbeat : 3000, initial allocation : 200) intervals
 16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number
 of 168510 executor(s).
 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
 containers, each with 2 cores and 6758 MB memory including 614 MB overhead
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 22
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 19
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 18
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 12
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 11
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 20
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 15
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 7
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 8
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 16
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 21
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 6
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 13
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 14
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 9
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 3
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 17
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 1
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 10
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 4
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 2
 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
 non-existent executor 5
 16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1
 time(s) in a row.

Re: Why Expression.deterministic method and Nondeterministic trait?

2016-09-23 Thread Reynold Xin
The deterministic method describes whether this instance of the expression tree
is deterministic, whereas the Nondeterministic trait is a property of the class.
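A toy illustration of the distinction (simplified stand-ins, not Spark's actual classes): the boolean is computed per expression-tree instance from its children, while the trait marks a class as stateful and needing per-partition initialization.

{code}
// Simplified stand-ins for illustration only; Spark's real Expression API differs.
trait Expr {
  def children: Seq[Expr]
  // Instance-level property: a node is deterministic only if all of its children are.
  def deterministic: Boolean = children.forall(_.deterministic)
}

trait Nondeterministic extends Expr {
  // The class itself is non-deterministic...
  override def deterministic: Boolean = false
  // ...and stateful, so it must be initialized once per partition before evaluation.
  def initialize(partitionIndex: Int): Unit
}

case class Upper(child: Expr) extends Expr {
  def children: Seq[Expr] = Seq(child)        // deterministic iff the child is
}

case class Rand(seed: Long) extends Expr with Nondeterministic {
  def children: Seq[Expr] = Nil
  private var rng: scala.util.Random = _
  def initialize(partitionIndex: Int): Unit = {
    rng = new scala.util.Random(seed + partitionIndex)
  }
}
{code}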


On Fri, Sep 23, 2016 at 10:46 AM, Jacek Laskowski  wrote:

> Hi Herman,
>
> It helps to know that someone can explain why we've got the two
> notions of non-determinism.
>
> It's not possible to say...a non-Nondeterministic expression can be
> non-deterministic (the former is the trait while the latter is the
> method) #strange
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Sep 23, 2016 at 6:44 PM, Herman van Hövell tot Westerflier
>  wrote:
> > Jacek,
> >
> > A non-deterministic expression usually holds some state. The
> > Nondeterministic trait makes sure a user can initialize this state
> properly.
> > Take a look at InterpretedProjection for instance.
> >
> > HTH
> >
> > -Herman
> >
> > On Fri, Sep 23, 2016 at 8:28 AM, Jacek Laskowski 
> wrote:
> >>
> >> Hi,
> >>
> >> Just came across the Expression trait [1] that can be checked for
> >> determinism via the deterministic method [2] and the Nondeterministic trait
> >> [3]. Why both?
> >>
> >> [1]
> >> https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L53
> >> [2]
> >> https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L80
> >> [3]
> >> https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L271
> >>
> >> Pozdrawiam,
> >> Jacek Laskowski
> >> 
> >> https://medium.com/@jaceklaskowski/
> >> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> >> Follow me at https://twitter.com/jaceklaskowski
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread vaquar khan
+1 (non-binding)
No issues found.
Regards,
Vaquar khan

On 23 Sep 2016 17:25, "Mark Hamstra"  wrote:

Similar but not identical configuration (Java 8 / macOS 10.12 with build/mvn
-Phive -Phive-thriftserver -Phadoop-2.7 -Pyarn clean install);
Similar but not identical failure:

...

- line wrapper only initialized once when used as encoder outer scope

Spark context available as 'sc' (master = local-cluster[1,1,1024], app id =
app-20160923150640-).

Spark session available as 'spark'.

Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
GC overhead limit exceeded

Exception in thread "dispatcher-event-loop-7" java.lang.OutOfMemoryError:
GC overhead limit exceeded

- define case class and create Dataset together with paste mode

java.lang.OutOfMemoryError: GC overhead limit exceeded

- should clone and clean line object in ClosureCleaner *** FAILED ***

  java.util.concurrent.TimeoutException: Futures timed out after [10
minutes]

...


On Fri, Sep 23, 2016 at 3:08 PM, Sean Owen  wrote:

> +1 Signatures and hashes check out. I checked that the Kinesis
> assembly artifacts are not present.
>
> I compiled and tested on Java 8 / Ubuntu 16 with -Pyarn -Phive
> -Phive-thriftserver -Phadoop-2.7 -Psparkr and only saw one test
> problem. This test never completed. If nobody else sees it, +1,
> assuming it's a bad test or env issue.
>
> - should clone and clean line object in ClosureCleaner *** FAILED ***
>   isContain was true Interpreter output contained 'Exception':
>   Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
> /_/
>
>   Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
>   Type in expressions to have them evaluated.
>   Type :help for more information.
>
>   scala> // Entering paste mode (ctrl-D to finish)
>
>
>   // Exiting paste mode, now interpreting.
>
>   org.apache.spark.SparkException: Job 0 cancelled because
> SparkContext was shut down
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfte
> rSchedulerStop$1.apply(DAGScheduler.scala:818)
> ...
>
>
> On Fri, Sep 23, 2016 at 7:01 AM, Reynold Xin  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.1. The vote is open until Sunday, Sep 25, 2016 at 23:59 PDT and
> passes
> > if a majority of at least 3+1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.0.1
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v2.0.1-rc2
> > (04141ad49806a48afccc236b699827997142bd57)
> >
> > This release candidate resolves 284 issues:
> > https://s.apache.org/spark-2.0.1-jira
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1199
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-docs/
> >
> >
> > Q: How can I help test this release?
> > A: If you are a Spark user, you can help us test this release by taking
> an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 2.0.0.
> >
> > Q: What justifies a -1 vote for this release?
> > A: This is a maintenance release in the 2.0.x series.  Bugs already
> present
> > in 2.0.0, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
> > Q: What happened to 2.0.1 RC1?
> > A: There was an issue with RC1 R documentation during release candidate
> > preparation. As a result, rc1 was canceled before a vote was called.
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread Mark Hamstra
Similar but not identical configuration (Java 8 / macOS 10.12 with build/mvn
-Phive -Phive-thriftserver -Phadoop-2.7 -Pyarn clean install);
Similar but not identical failure:

...

- line wrapper only initialized once when used as encoder outer scope

Spark context available as 'sc' (master = local-cluster[1,1,1024], app id =
app-20160923150640-).

Spark session available as 'spark'.

Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
GC overhead limit exceeded

Exception in thread "dispatcher-event-loop-7" java.lang.OutOfMemoryError:
GC overhead limit exceeded

- define case class and create Dataset together with paste mode

java.lang.OutOfMemoryError: GC overhead limit exceeded

- should clone and clean line object in ClosureCleaner *** FAILED ***

  java.util.concurrent.TimeoutException: Futures timed out after [10
minutes]

...


On Fri, Sep 23, 2016 at 3:08 PM, Sean Owen  wrote:

> +1 Signatures and hashes check out. I checked that the Kinesis
> assembly artifacts are not present.
>
> I compiled and tested on Java 8 / Ubuntu 16 with -Pyarn -Phive
> -Phive-thriftserver -Phadoop-2.7 -Psparkr and only saw one test
> problem. This test never completed. If nobody else sees it, +1,
> assuming it's a bad test or env issue.
>
> - should clone and clean line object in ClosureCleaner *** FAILED ***
>   isContain was true Interpreter output contained 'Exception':
>   Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version 2.0.1
> /_/
>
>   Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
>   Type in expressions to have them evaluated.
>   Type :help for more information.
>
>   scala> // Entering paste mode (ctrl-D to finish)
>
>
>   // Exiting paste mode, now interpreting.
>
>   org.apache.spark.SparkException: Job 0 cancelled because
> SparkContext was shut down
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$
> cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:818)
> ...
>
>
> On Fri, Sep 23, 2016 at 7:01 AM, Reynold Xin  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.1. The vote is open until Sunday, Sep 25, 2016 at 23:59 PDT and
> passes
> > if a majority of at least 3+1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.0.1
> > [ ] -1 Do not release this package because ...
> >
> >
> > The tag to be voted on is v2.0.1-rc2
> > (04141ad49806a48afccc236b699827997142bd57)
> >
> > This release candidate resolves 284 issues:
> > https://s.apache.org/spark-2.0.1-jira
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1199
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-docs/
> >
> >
> > Q: How can I help test this release?
> > A: If you are a Spark user, you can help us test this release by taking
> an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 2.0.0.
> >
> > Q: What justifies a -1 vote for this release?
> > A: This is a maintenance release in the 2.0.x series.  Bugs already
> present
> > in 2.0.0, missing features, or bugs related to new features will not
> > necessarily block this release.
> >
> > Q: What happened to 2.0.1 RC1?
> > A: There was an issue with RC1 R documentation during release candidate
> > preparation. As a result, rc1 was canceled before a vote was called.
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread Ricardo Almeida
+1 (non-binding)

Build:
OK, but I can no longer use the "--tgz" option when
calling make-distribution.sh (maybe a problem on my side?)

Run:
No regressions from 2.0.0 detected. Tested our pipelines on a standalone
cluster (Python API)



On 23 September 2016 at 08:01, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.1. The vote is open until Sunday, Sep 25, 2016 at 23:59 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.1
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.1-rc2 (04141ad49806a48afccc236b69982
> 7997142bd57)
>
> This release candidate resolves 284 issues: https://s.apache.org/spark-2.0
> .1-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1199
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.0.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series.  Bugs already
> present in 2.0.0, missing features, or bugs related to new features will
> not necessarily block this release.
>
> Q: What happened to 2.0.1 RC1?
> A: There was an issue with RC1 R documentation during release candidate
> preparation. As a result, rc1 was canceled before a vote was called.
>
>


Re: Why Expression.deterministic method and Nondeterministic trait?

2016-09-23 Thread Herman van Hövell tot Westerflier
Jacek,

A non-deterministic expression usually holds some state. The
Nondeterministic trait makes sure a user can initialize this state
properly. Take a look at InterpretedProjection, for instance.

HTH

-Herman

On Fri, Sep 23, 2016 at 8:28 AM, Jacek Laskowski  wrote:

> Hi,
>
> Just came across the Expression trait [1] that can be checked for
> determinism via the deterministic method [2] and the Nondeterministic trait
> [3]. Why both?
>
> [1] https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L53
> [2] https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L80
> [3] https://github.com/apache/spark/blob/master/sql/
> catalyst/src/main/scala/org/apache/spark/sql/catalyst/
> expressions/Expression.scala#L271
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Why Expression.deterministic method and Nondeterministic trait?

2016-09-23 Thread Jacek Laskowski
Hi,

Just came across the Expression trait [1] that can be checked for
determinism via the deterministic method [2] and the Nondeterministic trait
[3]. Why both?

[1] 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L53
[2] 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L80
[3] 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L271

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [SPARK-15717][GraphX] status

2016-09-23 Thread Asher Krim
Thanks Anderson!

I have not tried the fix yet due to the way we currently build Spark (we
don't really, yet :-(). Once we build internally, I can give it a whirl.

On Thu, Sep 22, 2016 at 6:03 PM, Anderson de Andrade  wrote:

> Done.
>
> On Thu, Sep 22, 2016 at 5:53 PM, Anderson de Andrade <
> adeandrad...@gmail.com> wrote:
>
>> I have updates to that PR that cover other cases. Let me update it.
>>
>> On Thu, Sep 22, 2016 at 5:51 PM, Reynold Xin  wrote:
>>
>>> Did you try the proposed fix? Would be good to know whether it fixes the
>>> issue.
>>>
>>> On Thu, Sep 22, 2016 at 2:49 PM, Asher Krim  wrote:
>>>
 Does anyone know what the status of SPARK-15717 is? It's a simple
 enough looking PR, but there has been no activity on it since June 16th.

 I believe that we are hitting that bug with checkpointed distributed
 LDA. It's a blocker for us and we would really appreciate getting it fixed.

 Jira: https://issues.apache.org/jira/browse/SPARK-15717
 GitHub: https://github.com/apache/spark/pull/13458

 Thanks,
 Asher
 Senior Software Engineer
 HubSpot

>>>
>>>
>>
>


Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread aditya . calangutkar


For testing purposes, can you run with a fixed number of executors and try? Maybe 12
executors for testing, and let us know the status.


Get Outlook for Android

On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma"  wrote:

Thanks Aditya, appreciate the help.
I had the exact thought about the huge number of executors requested. I am going
with dynamic executors and not specifying the number of executors. Are you
suggesting that I should limit the number of executors when the dynamic
allocator requests more executors?
It's a 12-node EMR cluster with more than a TB of memory.


On Fri, Sep 23, 2016 at 5:12 PM, Aditya  
wrote:
Hi Yash,



What is your total cluster memory and number of cores?

Problem might be with the number of executors you are allocating. The logs
show it as 168510, which is on the very high side. Try reducing your executors.



On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:


Hi All,

I have a spark job which runs over a huge bulk of data with Dynamic allocation 
enabled.

The job takes some 15 minutes to start up and fails as soon as it starts*.



Is there anything I can check to debug this problem? There is not a lot of
information in the logs about the exact cause, but here is a snapshot below.



Thanks All.



* - by starts I mean when it shows something on the Spark web UI; before that
it's just a blank page.



Logs here -



{code}

16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread with 
(heartbeat : 3000, initial allocation : 200) intervals

16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of 168510 
executor(s).

16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers, 
each with 2 cores and 6758 MB memory including 614 MB overhead

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 22

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 19

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 18

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 12

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 11

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 20

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 15

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 7

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 8

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 16

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 21

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 6

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 13

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 14

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 9

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 3

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 17

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 1

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 10

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 4

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 2

16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 5

16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s) in a 
row.

java.lang.StackOverflowError

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

        at 

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
Thanks Aditya, appreciate the help.

I had the exact thought about the huge number of executors requested.
I am going with dynamic executors and not specifying the number of
executors. Are you suggesting that I should limit the number of executors
when the dynamic allocator requests more executors?

It's a 12-node EMR cluster with more than a TB of memory.



On Fri, Sep 23, 2016 at 5:12 PM, Aditya 
wrote:

> Hi Yash,
>
> What is your total cluster memory and number of cores?
> Problem might be with the number of executors you are allocating. The logs
> show it as 168510, which is on the very high side. Try reducing your executors.
>
>
> On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
>
>> Hi All,
>> I have a spark job which runs over a huge bulk of data with Dynamic
>> allocation enabled.
>> The job takes some 15 minutes to start up and fails as soon as it starts*.
>>
>> Is there anything I can check to debug this problem. There is not a lot
>> of information in logs for the exact cause but here is some snapshot below.
>>
>> Thanks All.
>>
>> * - by starts I mean when it shows something on the spark web ui, before
>> that its just blank page.
>>
>> Logs here -
>>
>> {code}
>> 16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter
>> thread with (heartbeat : 3000, initial allocation : 200) intervals
>> 16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of
>> 168510 executor(s).
>> 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
>> containers, each with 2 cores and 6758 MB memory including 614 MB overhead
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 22
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 19
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 18
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 12
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 11
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 20
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 15
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 7
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 8
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 16
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 21
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 6
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 13
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 14
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 9
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 3
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 17
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 1
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 10
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 4
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 2
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
>> non-existent executor 5
>> 16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s)
>> in a row.
>> java.lang.StackOverflowError
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.TraversableLike$WithFilter$$anonfun$
>> foreach$1.apply(TraversableLike.scala:772)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.TraversableLike$WithFilter$$anonfun$
>> foreach$1.apply(TraversableLike.scala:772)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.
>> apply(MapLike.scala:245)
>> at scala.collection.TraversableLike$WithFilter$$anonfun$
>> foreach$1.apply(TraversableLike.scala:772)
>> at 

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread Jacek Laskowski
+1

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Sep 23, 2016 at 8:01 AM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.1. The vote is open until Sunday, Sep 25, 2016 at 23:59 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.1
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.1-rc2
> (04141ad49806a48afccc236b699827997142bd57)
>
> This release candidate resolves 284 issues:
> https://s.apache.org/spark-2.0.1-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1199
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-docs/
>
>
> Q: How can I help test this release?
> A: If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 2.0.0.
>
> Q: What justifies a -1 vote for this release?
> A: This is a maintenance release in the 2.0.x series.  Bugs already present
> in 2.0.0, missing features, or bugs related to new features will not
> necessarily block this release.
>
> Q: What happened to 2.0.1 RC1?
> A: There was an issue with RC1 R documentation during release candidate
> preparation. As a result, rc1 was canceled before a vote was called.
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark Yarn Cluster with Reference File

2016-09-23 Thread Aditya

Hi Abhishek,

From your spark-submit it seems you're passing the file as a parameter to 
the driver program, so it depends on what exactly you are doing with 
that parameter. With the --files option the file will be available on all the 
worker nodes, but if your code references it using the original 
path in distributed mode, it won't find the file on the worker nodes.


If you can share a snippet of the code, it will be easier to debug.
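One pattern worth checking, sketched below under the assumption that the file keeps its base name abc.drl: on the executors, refer to the localized copy by name (for example through SparkFiles) rather than by the original driver-side path. Whether --files registers the file with SparkFiles can depend on the deploy mode, so treat this as a sketch to verify rather than a guaranteed fix.

{code}
import org.apache.spark.SparkFiles
import scala.io.Source

// Sketch only: with `--files /home/abhietc/abc/abc.drl` (or an hdfs:// path),
// YARN localizes abc.drl into each container's working directory, so look it
// up by base name instead of the original path.
val rulesPath = SparkFiles.get("abc.drl")      // resolves the localized copy
val rulesText = Source.fromFile(rulesPath).mkString
{code}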

On Friday 23 September 2016 01:03 PM, ABHISHEK wrote:

Hello there,

I have a Spark application which refers to an external file ‘abc.drl’ 
containing unstructured data.
The application is able to find this reference file if I run the app in local 
mode, but in YARN cluster mode it is not able to find the file 
at the specified path.
I tried both local and HDFS paths with the --files option but it 
didn’t work.



What is working?
1. The current Spark application runs fine if I run it in local mode as 
mentioned below.

In the command below, the file path is a local path, not HDFS.
spark-submit --master local[*]  --class "com.abc.StartMain" 
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl


3. I want to run this Spark application on YARN in cluster mode.
For that, I used the commands below, but the application is not able to find the 
path for the reference file abc.drl. I tried giving both local and HDFS 
paths but it didn’t work.


spark-submit --master yarn --deploy-mode cluster  --files 
/home/abhietc/abc/abc.drl --class com.abc.StartMain 
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl


spark-submit --master yarn --deploy-mode cluster --files 
hdfs://abhietc.com:8020/user/abhietc/abc.drl --class 
com.abc.StartMain abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
hdfs://abhietc.com:8020/user/abhietc/abc.drl


spark-submit --master yarn --deploy-mode cluster --files 
hdfs://abc.com:8020/tmp/abc.drl 
--class com.abc.StartMain abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
hdfs://abc.com:8020/tmp/abc.drl



Error Messages:
Surprisingly, we are not doing any write operation on the reference file, but 
the log still shows that the application is trying to write to the file instead of 
reading it.

The log also shows a FileNotFoundException for both the HDFS and local paths.
-
16/09/20 14:49:50 ERROR scheduler.JobScheduler: Error running job 
streaming job 1474363176000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 
in stage 1.0 (TID 4, abc.com ): 
java.lang.RuntimeException: Unable to write Resource: 
FileResource[file=hdfs:/abc.com:8020/user/abhietc/abc.drl 
]
at 
org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:71)
at 
com.hmrc.taxcalculator.KieSessionFactory$.getNewSession(KieSessionFactory.scala:49)
at 
com.hmrc.taxcalculator.KieSessionFactory$.getKieSession(KieSessionFactory.scala:21)
at 
com.hmrc.taxcalculator.KieSessionFactory$.execute(KieSessionFactory.scala:27)
at 
com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
at 
com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

at org.apache.spark.scheduler.Task.run(Task.scala:89)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: 
hdfs:/abc.com:8020/user/abhietc/abc.drl 
 (No such file or directory)

at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at 
org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
at 

Re: Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Aditya

Hi Yash,

What is your total cluster memory and number of cores?
Problem might be with the number of executors you are allocating. The 
logs show it as 168510, which is on the very high side. Try reducing your 
executors.


On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:

Hi All,
I have a spark job which runs over a huge bulk of data with Dynamic 
allocation enabled.

The job takes some 15 minutes to start up and fails as soon as it starts*.

Is there anything I can check to debug this problem. There is not a 
lot of information in logs for the exact cause but here is some 
snapshot below.


Thanks All.

* - by starts I mean when it shows something on the spark web ui, 
before that its just blank page.


Logs here -

{code}
16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter 
thread with (heartbeat : 3000, initial allocation : 200) intervals
16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number 
of 168510 executor(s).
16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor 
containers, each with 2 cores and 6758 MB memory including 614 MB overhead
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 22
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 19
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 18
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 12
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 11
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 20
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 15
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 7
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 8
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 16
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 21
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 6
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 13
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 14
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 9
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 3
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 17
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 1
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 10
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 4
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 2
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for 
non-existent executor 5
16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 
time(s) in a row.

java.lang.StackOverflowError
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at 
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)

{code}

... 

{code}
16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: 
Attempted to get executor loss reason for executor id 7 at RPC address 
, but got no response. Marking as slave lost.
org.apache.spark.SparkException: Fail to find loss reason for 
non-existent executor 7
at 
org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
at 

Spark job fails as soon as it starts. Driver requested a total number of 168510 executor

2016-09-23 Thread Yash Sharma
Hi All,
I have a Spark job which runs over a huge bulk of data with dynamic
allocation enabled.
The job takes some 15 minutes to start up and fails as soon as it starts*.

Is there anything I can check to debug this problem? There is not a lot of
information in the logs about the exact cause, but here is a snapshot below.

Thanks All.

* - by starts I mean when it shows something on the Spark web UI; before
that it's just a blank page.

Logs here -

{code}
16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread
with (heartbeat : 3000, initial allocation : 200) intervals
16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of
168510 executor(s).
16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor
containers, each with 2 cores and 6758 MB memory including 614 MB overhead
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 22
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 19
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 18
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 12
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 11
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 20
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 15
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 7
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 8
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 16
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 21
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 6
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 13
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 14
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 9
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 3
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 17
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 1
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 10
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 4
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 2
16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for
non-existent executor 5
16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s)
in a row.
java.lang.StackOverflowError
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
at
scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
{code}

... 

{code}
16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:
Attempted to get executor loss reason for executor id 7 at RPC address ,
but got no response. Marking as slave lost.
org.apache.spark.SparkException: Fail to find loss reason for non-existent
executor 7
at
org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
at
org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
at
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at

[VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-23 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version
2.0.1. The vote is open until Sunday, Sep 25, 2016 at 23:59 PDT and passes
if a majority of at least 3+1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.1
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.1-rc2
(04141ad49806a48afccc236b699827997142bd57)

This release candidate resolves 284 issues:
https://s.apache.org/spark-2.0.1-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1199

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.1-rc2-docs/


Q: How can I help test this release?
A: If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions from 2.0.0.

Q: What justifies a -1 vote for this release?
A: This is a maintenance release in the 2.0.x series.  Bugs already present
in 2.0.0, missing features, or bugs related to new features will not
necessarily block this release.

Q: What happened to 2.0.1 RC1?
A: There was an issue with RC1 R documentation during release candidate
preparation. As a result, rc1 was canceled before a vote was called.