Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Sean Owen
I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OS X, Maven 3.2, and Java 8, but I don't know that any of those would be relevant. On Mon, Dec 22, 2014 at 4:10 AM, wyphao.2007

Spark exception when sending message to akka actor

2014-12-22 Thread Priya Ch
Hi All, I have Akka remote actors running on 2 nodes. I submitted a Spark application from node1. In the Spark code, in one of the RDDs, I am sending a message to an actor running on node1. My Spark code is as follows: class ActorClient extends Actor with Serializable { import context._ val
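
The snippet above cuts off mid-definition. Below is a minimal, hypothetical sketch of the pattern being described (messaging a remote Akka actor from inside an RDD operation); the system, host, port, and actor names are illustrative, not from the original post.

```scala
import akka.actor.ActorSystem
import org.apache.spark.{SparkConf, SparkContext}

object ActorMessagingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("actor-demo"))
    sc.parallelize(1 to 100).foreachPartition { part =>
      // Look up the remote actor on the executor side: ActorRefs are not
      // serializable, so they must not be captured in a driver-side closure.
      // Assumes the executors carry an akka-remote-enabled configuration.
      val system = ActorSystem("workerSide")
      val remote = system.actorSelection("akka.tcp://app@node1:5150/user/actorClient")
      part.foreach(n => remote ! n.toString) // fire-and-forget tell
      system.shutdown()
    }
    sc.stop()
  }
}
```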

Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi, After facing issues with the performance of some of our Spark Streaming jobs, we invested quite some effort figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune
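
(Not from the original post, which is truncated above.) One first-order factor a model like this has to capture is receiver parallelism: each receiver pins one executor core for the life of the job, so the receiver count versus total cores drives throughput. A sketch against the Spark 1.2 Kafka receiver API, with all names and counts invented for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReceiverParallelismSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("tuning-demo"), Seconds(10))
    // Several receivers, each a long-running task occupying one core; the
    // unioned stream is then processed with whatever cores remain.
    val streams = (1 to 4).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "group", Map("events" -> 1))
    }
    val unified = ssc.union(streams)
    unified.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```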

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Timothy Chen
Hi Gerard, Really nice guide! I'm particularly interested in the Mesos scheduling side to more evenly distribute cores across the cluster. I wonder if you are using coarse-grained mode or fine-grained mode? I'm making changes to the Spark Mesos scheduler and I think we can propose a best way to

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issue. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen so...@cloudera.com wrote: I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OSX, Maven

cleaning up cache files left by SPARK-2713

2014-12-22 Thread Cody Koeninger
Is there a reason not to go ahead and move the _cache and _lock files created by Utils.fetchFiles into the work directory, so they can be cleaned up more easily? I saw comments to that effect in the discussion of the PR for 2713, but it doesn't look like it got done. And no, I didn't just have a

Re: cleaning up cache files left by SPARK-2713

2014-12-22 Thread Marcelo Vanzin
https://github.com/apache/spark/pull/3705 On Mon, Dec 22, 2014 at 10:19 AM, Cody Koeninger c...@koeninger.org wrote: Is there a reason not to go ahead and move the _cache and _lock files created by Utils.fetchFiles into the work directory, so they can be cleaned up more easily? I saw comments

Re: spark-yarn_2.10 1.2.0 artifacts

2014-12-22 Thread David McWhorter
Thank you, Sean; using spark-network-yarn seems to do the trick. On 12/19/2014 12:13 PM, Sean Owen wrote: I believe spark-yarn does not exist from 1.2 onwards. Have a look at spark-network-yarn for where some of that went, I believe. On Fri, Dec 19, 2014 at 5:09 PM, David McWhorter
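
For anyone following along, the dependency change being suggested would look roughly like this in sbt; the coordinates are assumed from the 1.2.0 release and worth verifying against Maven Central:

```scala
// build.sbt (assumed coordinates; the spark-yarn artifact no longer ships in 1.2)
libraryDependencies += "org.apache.spark" %% "spark-network-yarn" % "1.2.0"
```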

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi Tim, That would be awesome. We have seen some really disparate Mesos allocations for our Spark Streaming jobs (like (7,4,1) over 3 executors for 4 Kafka consumers instead of the ideal (3,3,3,3)). For network-dependent consumers, achieving an even deployment would provide a reliable and

Re: Data source interface for making multiple tables available for query

2014-12-22 Thread Michael Armbrust
I agree and this is something that we have discussed in the past. Essentially I think instead of creating a RelationProvider that returns a single table, we'll have something like an external catalog that can return multiple base relations. On Sun, Dec 21, 2014 at 6:43 PM, Venkata ramana
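
To make the contrast concrete: Spark 1.2's data sources API exposes the single-table hook shown first below; the multi-table CatalogProvider trait that follows is purely a hypothetical sketch of the idea under discussion, not a Spark API.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

// Spark 1.2's actual interface: one set of options resolves to one table.
class MySource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = ???
}

// Hypothetical sketch of the proposed shape: one connection exposes many
// named tables. Invented here for illustration only.
trait CatalogProvider {
  def createRelations(
      sqlContext: SQLContext,
      parameters: Map[String, String]): Map[String, BaseRelation]
}
```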

Announcing Spark Packages

2014-12-22 Thread Xiangrui Meng
Dear Spark users and developers, I’m happy to announce Spark Packages (http://spark-packages.org), a community package index to track the growing number of open source packages and libraries that work with Apache Spark. Spark Packages makes it easy for users to find, discuss, rate, and install

More general submitJob API

2014-12-22 Thread Alessandro Baretta
Fellow Sparkers, I'm rather puzzled by the submitJob API. I can't quite figure out how it is supposed to be used. Is there any more documentation about it? Also, is there any simpler way to multiplex jobs on the cluster, such as starting multiple computations in as many threads in the driver and

Re: Announcing Spark Packages

2014-12-22 Thread Andrew Ash
Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out when the page is back up? Thanks! Andrew On Mon, Dec 22, 2014 at 12:37 PM, Xiangrui Meng men...@gmail.com wrote: Dear Spark users and developers, I’m happy to announce Spark

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng pc...@uowmail.edu.au wrote: Me 2 :) On 12/22/2014 06:14 PM, Andrew Ash wrote: Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out

Re: Announcing Spark Packages

2014-12-22 Thread Hitesh Shah
Hello Xiangrui, If you have not already done so, you should look at http://www.apache.org/foundation/marks/#domains for the policy on use of ASF trademarked terms in domain names. thanks — Hitesh On Dec 22, 2014, at 12:37 PM, Xiangrui Meng men...@gmail.com wrote: Dear Spark users and

Re: More general submitJob API

2014-12-22 Thread Andrew Ash
Hi Alex, SparkContext.submitJob() is marked as experimental -- most client programs shouldn't be using it. What are you looking to do? For multiplexing jobs, one thing you can do is have multiple threads in your client JVM each submit jobs on your SparkContext. This is described here in

Re: More general submitJob API

2014-12-22 Thread Alessandro Baretta
Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs concurrently in threads. Alex On Mon, Dec 22, 2014 at 4:04 PM, Andrew Ash and...@andrewash.com wrote: Hi Alex, SparkContext.submitJob() is marked as experimental -- most client programs shouldn't be using it.

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread-safe, so you can just have different threads that create their own RDDs and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta alexbare...@gmail.com wrote: Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs
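
A minimal sketch of the pattern described in this thread: several driver-side threads sharing one SparkContext, each triggering its own job. Everything here (app name, data, thread count) is illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("concurrent-jobs"))
    val threads = (1 to 3).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          // Each thread builds its own RDD and runs its own action; the
          // scheduler interleaves the resulting jobs on the cluster.
          val sum = sc.parallelize(1 to 1000000).map(_ * i).reduce(_ + _)
          println(s"job $i sum = $sum")
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()
  }
}
```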

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape here vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas

Re: Announcing Spark Packages

2014-12-22 Thread Nicholas Chammas
Okie doke! (I just assumed there was an issue since the policy was brought up.) On Mon Dec 22 2014 at 8:33:53 PM Patrick Wendell pwend...@gmail.com wrote: Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Nicholas Chammas
Does this include contributions made against the spark-ec2 https://github.com/mesos/spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to Spark, so we get it there. This may not always be the case though, so let me know if you think there is something

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to spark so