Re: About akka used in spark

2015-06-10 Thread Akhil Das
If you look at the Maven repo, you can see it's from Typesafe only: http://mvnrepository.com/artifact/org.spark-project.akka/akka-actor_2.10/2.3.4-spark For sbt, you can download the sources by adding withSources() like: libraryDependencies += "org.spark-project.akka" % "akka-actor_2.10" % "2.3.4-spark"

Re: About akka used in spark

2015-06-10 Thread Cheng Lian
We only shaded protobuf dependencies because of compatibility issues. The source code is not modified. On 6/10/15 1:55 PM, wangtao (A) wrote: Hi guys, I see group id of akka used in spark is “org.spark-project.akka”. What is its difference with the typesafe one? What is its version? And

Re: [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-10 Thread Tao Wang
+1 Tested by building with Hadoop 2.7.0 and running these tests: WordCount in yarn-client/yarn-cluster mode works fine; basic SQL queries pass; “spark.sql.autoBroadcastJoinThreshold” works fine; Thrift Server is fine; running streaming with Kafka is good; External shuffle in YARN mode is

Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Ashwin Shankar
All, I was wondering if any of you have solved this problem: I have pyspark (ipython mode) running on Docker talking to a YARN cluster (AM/executors are NOT running on Docker). When I start pyspark in the Docker container, it binds to port 49460. Once the app is submitted to YARN, the app (AM)

[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC4)

2015-06-10 Thread Patrick Wendell
This vote passes! Thanks to everyone who voted. I will get the release artifacts and notes up within a day or two. +1 (23 votes): Reynold Xin* Patrick Wendell* Matei Zaharia* Andrew Or* Timothy Chen Calvin Jia Burak Yavuz Krishna Sankar Hari Shreedharan Ram Sriharsha* Kousuke Saruta Sandy Ryza

Re: Approximate rank-based statistics (median, 95-th percentile, etc.) for Spark

2015-06-10 Thread Reynold Xin
This email is good. Just one note -- a lot of people are swamped right before Spark Summit, so you might not get prompt responses this week. On Wed, Jun 10, 2015 at 2:53 PM, Grega Kešpret gr...@celtra.com wrote: I have some time to work on it now. What's a good way to continue the discussions

RE: Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Eron Wright
Options include: (1) use the 'spark.driver.host' and 'spark.driver.port' settings to stabilize the driver-side endpoint (ref); (2) use host networking for your container, i.e. docker run --net=host ...; (3) use yarn-cluster mode (see SPARK-5162). Hope this helps, Eron Date: Wed, 10 Jun 2015 13:43:04 -0700 Subject:
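A rough illustration of the first option, not taken from the thread: a minimal Python sketch that builds the `--conf key=value` arguments for spark-submit so the driver advertises a fixed host and port. The helper name `driver_endpoint_args` and the host name are hypothetical; port 49460 is only the example port from the original question. Whether pinning the endpoint is enough depends on the container's networking, which this sketch does not address.

```python
# Sketch only: builds spark-submit arguments that pin the driver endpoint.
# The helper name and the host value are illustrative assumptions.

def driver_endpoint_args(host, port):
    """Return --conf arguments that fix spark.driver.host and spark.driver.port."""
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1..65535")
    return [
        "--conf", f"spark.driver.host={host}",
        "--conf", f"spark.driver.port={port}",
    ]


if __name__ == "__main__":
    # Placeholder host; 49460 is the port mentioned in the original question.
    args = driver_endpoint_args("docker-host.example.com", 49460)
    print(" ".join(args))
```

The resulting list can be appended to whatever launcher command is already in use; the host passed in should be an address the YARN side can actually reach.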

Re: Approximate rank-based statistics (median, 95-th percentile, etc.) for Spark

2015-06-10 Thread Grega Kešpret
I have some time to work on it now. What's a good way to continue the discussions before coding it? This e-mail list, JIRA or something else? On Mon, Apr 6, 2015 at 12:59 AM, Reynold Xin r...@databricks.com wrote: I think those are great to have. I would put them in the DataFrame API though,

Jcenter / bintray support for spark packages?

2015-06-10 Thread Hector Yee
Hi Spark devs, Is it possible to add jcenter or bintray support for Spark packages? I'm trying to add our artifact which is on jcenter https://bintray.com/airbnb/aerosolve but I noticed in Spark packages it only accepts Maven coordinates. -- Yee Yang Li Hector google.com/+HectorYee

Re: Jcenter / bintray support for spark packages?

2015-06-10 Thread Patrick Wendell
Hey Hector, It's not a bad idea. I think we'd want to do this by virtue of allowing custom repositories, so users can add bintray or others. - Patrick On Wed, Jun 10, 2015 at 6:23 PM, Hector Yee hector@gmail.com wrote: Hi Spark devs, Is it possible to add jcenter or bintray support for

Re: Approximate rank-based statistics (median, 95-th percentile, etc.) for Spark

2015-06-10 Thread Ray Ortigas
Hi Grega and Reynold, Grega, if you still want to use t-digest, I filed this PR because I thought your t-digest suggestion was a good idea. https://github.com/tdunning/t-digest/pull/56 If it is helpful feel free to do whatever with it. Regards, Ray On Wed, Jun 10, 2015 at 2:54 PM, Reynold

Re: Problem with pyspark on Docker talking to YARN cluster

2015-06-10 Thread Ashwin Shankar
Hi Eron, Thanks for your reply, but none of these options works for us. 1. use 'spark.driver.host' and 'spark.driver.port' setting to stabilize the driver-side endpoint. (ref https://spark.apache.org/docs/latest/configuration.html#networking) This unfortunately won't help since if

Re: How to support dependency jars and files on HDFS in standalone cluster mode?

2015-06-10 Thread Cheng Lian
Since the jars are already on HDFS, you can access them directly in your Spark application without using --jars. Cheng On 6/11/15 11:04 AM, Dong Lei wrote: Hi spark-dev: I cannot use an HDFS location for the “--jars” or “--files” option when doing a spark-submit in a standalone cluster

Re: [sample code] deeplearning4j for Spark ML (@DeveloperAPI)

2015-06-10 Thread Nick Pentreath
Looks very interesting, thanks for sharing this. I haven't had much chance to do more than a quick glance over the code. Quick question - are the Word2Vec and GLOVE implementations fully parallel on Spark? On Mon, Jun 8, 2015 at 6:20 PM, Eron Wright ewri...@live.com wrote: The deeplearning4j

Re: [DISCUSS] Minimize use of MINOR, BUILD, and HOTFIX w/ no JIRA

2015-06-10 Thread Joseph Bradley
+1 On Sat, Jun 6, 2015 at 9:01 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Just a request here - it would be great if people could create JIRAs for any and all merged pull requests. The reason is that when patches get reverted due to build breaks or other issues, it is very

How to support dependency jars and files on HDFS in standalone cluster mode?

2015-06-10 Thread Dong Lei
Hi spark-dev: I cannot use an HDFS location for the --jars or --files option when doing a spark-submit in standalone cluster mode. For example: spark-submit ... --jars hdfs://ip/1.jar hdfs://ip/app.jar (standalone cluster mode) will not download 1.jar to driver's

Re: [ml] Why all model classes are final?

2015-06-10 Thread Joseph Bradley
Hi Peter, We've tried to be cautious about making APIs public without need, to allow for changes needed in the future which we can't foresee now. Marking classes as final is part of that. While marking things as Experimental or DeveloperApi is a sort of warning, we've often felt that even