Re: Too many executors are created

2015-10-11 Thread Akhil Das
For some reason the executors are getting killed: 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated: app-20150929120924-/24463 is now EXITED (Command exited with code 1). Can you paste your spark-submit command? You can also look in the executor logs and see what's going on.

No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Disha Shrivastava
Dear Spark developers, I am trying to study the effect of increasing the number of cores (CPUs) on speedup and accuracy (scalability with Spark ANN) for the MNIST dataset, using the ANN implementation provided in the latest Spark release. I have formed a cluster of 5 machines with 88
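
For reference, a minimal sketch of the Spark ML multilayer perceptron API under discussion (Scala, Spark 1.5.x; the hidden-layer size, column names, and the DataFrames `train` and `test` are illustrative assumptions, not details from the thread):

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

    // MNIST: 784 input pixels, one assumed hidden layer of 100 units, 10 classes.
    val layers = Array[Int](784, 100, 10)

    val trainer = new MultilayerPerceptronClassifier()
      .setLayers(layers)
      .setBlockSize(128) // rows stacked per matrix operation
      .setSeed(1234L)
      .setMaxIter(100)

    // `train` and `test` are assumed DataFrames with "features" and "label" columns.
    val model = trainer.fit(train)
    val precision = new MulticlassClassificationEvaluator()
      .setMetricName("precision") // a valid metric name in 1.5.x
      .evaluate(model.transform(test))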

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Still confused. Why are you saying we didn't vote on an archive? Refer to the email I linked, which includes both the git tag and a link to all generated artifacts (also in my email). So, there are two things at play here: First, I am not sure what you mean that a source distro can't have binary

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
Out of curiosity: How can you vote on a release that contains 34 binary files? Surely a source code release should only contain source code and not binaries, as you cannot verify the content of these. Looking forward to a response. With regards, Daniel. On 10/2/2015, 4:42:31 AM, Reynold Xin

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
The Spark releases include a source distribution and several binary distributions. This is pretty normal for Apache projects. What are you referring to here? On Sun, Oct 11, 2015 at 3:26 PM, Daniel Gruno wrote: > Out of curiosity: How can you vote on a release that contains

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
On 10/11/2015 05:12 PM, Sean Owen wrote: > The Spark releases include a source distribution and several binary > distributions. This is pretty normal for Apache projects. What are you > referring to here? Surely the _source_ distribution does not contain binaries? How else can you vote on a

Re: Operations with cached RDD

2015-10-11 Thread Nitin Goyal
The problem is not that zipWithIndex is executed again. "groupBy" triggered hash partitioning on your keys, and a shuffle happened because of that; that's why you are seeing 2 stages. You can confirm this by clicking on the latter "zipWithIndex" stage: its input data has "(memory)" written, which means
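
A minimal sketch of the behavior described (Scala; `sc` is an assumed SparkContext, and the RDD contents and key function are illustrative assumptions):

    val indexed = sc.parallelize(1 to 1000000).zipWithIndex().cache()
    indexed.count() // materializes the cache

    // groupBy hash-partitions on the key, which forces a shuffle: the job runs
    // as two stages, and the latter stage's input shows "(memory)" because it
    // reads the cached zipWithIndex output rather than recomputing it.
    val grouped = indexed.groupBy { case (_, idx) => idx % 10 }
    grouped.count()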

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Daniel: we did not vote on a tag. Please again read the VOTE email I linked to you: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none among other things, it contains a link to the concrete source (and binary) distribution under vote:

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Nicholas Chammas
You can find the source tagged for release on GitHub, as was clearly linked to in the thread to vote on the release (titled "[VOTE] Release Apache Spark 1.5.1 (RC1)"). Is there something about that thread that was unclear? Nick On Sun, Oct

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Of course, but what's making you think this was a binary-only distribution? The downloads page points you directly to the source distro: http://spark.apache.org/downloads.html Look for the last vote, and you'll find it was of course a vote on source (and binary) artifacts:

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
On 10/11/2015 05:29 PM, Sean Owen wrote: > Of course, but what's making you think this was a binary-only > distribution? I'm not saying binary-only; I am saying your source release contains binary programs, which would invalidate a release vote. Is there a release candidate package that is

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
Here's my issue: How am I to audit that the dependencies you bundle are in fact what you claim they are? How do I know they don't contain malware or - in light of recent events - emissions test rigging? ;) I am not interested in a git tag - that means nothing in the ASF voting process, you

Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Mike Hynes
Having only 2 workers for 5 machines would be your problem: you probably want 1 worker per physical machine, which entails running the spark-daemon.sh script to start a worker on those machines. The partitioning is agnostic to how many executors are available for running the tasks, so you can't
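
As a rough illustration of the partitioning point (Scala; the core counts and the RDD `data` are assumptions): the partition count comes from the user or the input data, not from the executor count, so it should be sized to the cores actually available.

    // Say 5 machines with 22 cores each (numbers assumed), i.e. 110 cores total.
    val totalCores = 5 * 22

    // The partitioning is independent of how many executors exist; with fewer
    // partitions than cores, the surplus cores simply sit idle.
    val repartitioned = data.repartition(totalCores)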

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Agree, but we are talking about the build/ bit, right? I don't agree that it invalidates the release, which is probably the more important idea. As a point of process, you would not want to modify and republish the artifact that was already released after being voted on - unless it was invalid in

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
I think Daniel is correct here. The source artifact incorrectly includes jars. It is inadvertent and not part of our intended release process. This was something I noticed in Spark 1.5.0; I filed a JIRA, and it was fixed by updating our build scripts. However, our build environment was not

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
*to not include binaries. On Sun, Oct 11, 2015 at 9:35 PM, Patrick Wendell wrote: > I think Daniel is correct here. The source artifact incorrectly includes > jars. It is inadvertent and not part of our intended release process. This > was something I noticed in Spark 1.5.0

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Oh I see - yes it's the build/. I always thought release votes related to a source tag rather than specific binaries. But maybe we can just fix it in 1.5.2 if there is concern about mutating binaries. It seems reasonable to me. For tests... in the past we've tried to avoid having jars inside of

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Yeah I mean I definitely think we're not violating the *spirit* of the "no binaries" policy, in that we do not include any binary code that is used at runtime. This is because the binaries we distribute relate only to build and testing. Whether we are violating the *letter* of the policy, I'm not

taking the heap dump when an executor goes OOM

2015-10-11 Thread Niranda Perera
Hi all, is there a way for me to get the heap dump (hprof) of an executor JVM when it goes out of memory? Is this currently supported, or do I have to change some configurations? Cheers -- Niranda @n1r44 +94-71-554-8430 https://pythagoreanscript.wordpress.com/
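
The standard JVM flag for this applies to executors as well; a minimal sketch (the dump path is an assumption, and the executors need write access to it):

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask each executor JVM to write an .hprof file when it hits OutOfMemoryError.
    // The same setting can be passed to spark-submit via --conf.
    val conf = new SparkConf()
      .setAppName("heap-dump-example")
      .set("spark.executor.extraJavaOptions",
        "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps")
    val sc = new SparkContext(conf)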

yarn-cluster mode throwing NullPointerException

2015-10-11 Thread Rachana Srivastava
I am trying to submit a job in yarn-cluster mode using the spark-submit command. My code works fine when I use yarn-client mode. Cloudera Version: CDH-5.4.7-1.cdh5.4.7.p0.3 Command Submitted: spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \ --driver-java-options

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
No, we are voting on the artifacts being released (too), in principle. Although of course the artifacts should be a deterministic function of the source at a certain point in time. I think the concern is about putting Spark binaries or its dependencies into a source release. That should not happen,