Re: Will Spark-SQL support vectorized query engine someday?

2015-01-20 Thread Reynold Xin
I don't know if there is a list, but in general running a performance profiler can identify a lot of things... On Tue, Jan 20, 2015 at 12:30 AM, Xuelin Cao xuelincao2...@gmail.com wrote: Thanks, Reynold Regarding the lower-hanging fruit, can you give me some examples? Where can I find

not found: type LocalSparkContext

2015-01-20 Thread James
Hi all, When I was trying to write a test for my Spark application I met ``` Error:(14, 43) not found: type LocalSparkContext class HyperANFSuite extends FunSuite with LocalSparkContext { ``` In the source code of spark-core I could not find LocalSparkContext, thus I wonder how to write a test

Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-20 Thread Andrew Or
Hi Preeze, Is there any designed way that the client connects back to the driver (still running in YARN) for collecting results at a later stage? No, there is no support built into Spark for this. For this to happen seamlessly the driver will have to start a server (pull model) or send the

Re: not found: type LocalSparkContext

2015-01-20 Thread Will Benton
It's declared here: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/LocalSparkContext.scala I assume you're already importing LocalSparkContext, but since the test classes aren't included in Spark packages, you'll also need to package them up in order to use

Re: Spectral clustering

2015-01-20 Thread Xiangrui Meng
Fan and Stephen (cc'ed) are working on this feature. They will update the JIRA page and report progress soon. -Xiangrui On Fri, Jan 16, 2015 at 12:04 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Hi, thinking of picking up this Jira ticket:

Re: Spectral clustering

2015-01-20 Thread Andrew Musselman
Awesome, thanks On Tue, Jan 20, 2015 at 12:56 PM, Xiangrui Meng men...@gmail.com wrote: Fan and Stephen (cc'ed) are working on this feature. They will update the JIRA page and report progress soon. -Xiangrui On Fri, Jan 16, 2015 at 12:04 PM, Andrew Musselman andrew.mussel...@gmail.com

Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and publishing it under `dev/`? The goal would be to make it easier for new developers to get started with all the right configs and tools pre-installed. If we use something like

Re: Standardized Spark dev environment

2015-01-20 Thread shenyan zhen
Great suggestion. On Jan 20, 2015 7:14 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and publishing it under `dev/`? The goal would be to make it easier for new developers

Re: Standardized Spark dev environment

2015-01-20 Thread Ted Yu
How many profiles (hadoop / hive / scala) would this development environment support? Cheers On Tue, Jan 20, 2015 at 4:13 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and

Re: Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
How many profiles (hadoop / hive / scala) would this development environment support? As many as we want. We probably want to cover a good chunk of the build matrix https://issues.apache.org/jira/browse/SPARK-2004 that Spark officially supports. What does this provide, concretely? It provides a

Re: Standardized Spark dev environment

2015-01-20 Thread jay vyas
I can comment on both... hi will and nate :) 1) Will's Dockerfile solution is the most simple, direct solution to the dev environment question: it's an efficient way to build and develop Spark environments for dev/test.. It would be cool to put that Dockerfile (and/or maybe a shell script

RE: Standardized Spark dev environment

2015-01-20 Thread nate
If there is some interest in more standardization and setup of dev/test environments, the Spark community might be interested in starting to participate in the Apache Bigtop effort: http://bigtop.apache.org/ While the project had its start and initial focus on packaging, testing, deploying Hadoop/hdfs

Re: Standardized Spark dev environment

2015-01-20 Thread Will Benton
Hey Nick, I did something similar with a Docker image last summer; I haven't updated the images to cache the dependencies for the current Spark master, but it would be trivial to do so: http://chapeau.freevariable.com/2014/08/jvm-test-docker.html best, wb - Original Message -

Re: not found: type LocalSparkContext

2015-01-20 Thread James
I could not correctly import org.apache.spark.LocalSparkContext. I use sbt on IntelliJ for developing; here is my build.sbt. ``` libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.2.0" libraryDependencies +=
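For reference, a complete build.sbt along those lines might look like the sketch below; the project name and the Scala/ScalaTest versions are assumptions for illustration, not taken from the thread:

```scala
// build.sbt -- minimal sketch for a Spark 1.2.0 project
name := "hyper-anf"

scalaVersion := "2.10.4"  // assumed; Spark 1.2.0 was built against Scala 2.10

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.2.0",
  "org.apache.spark" %% "spark-graphx" % "1.2.0",
  // test framework for writing suites; version is an assumption
  "org.scalatest"    %% "scalatest"    % "2.2.1" % "test"
)
```

Note that spark-core's own test classes (such as LocalSparkContext) are not published in these artifacts, which is why the import fails even with the dependencies above.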

Re: GraphX ShortestPaths backwards?

2015-01-20 Thread Michael Malak
I created https://issues.apache.org/jira/browse/SPARK-5343 for this. - Original Message - From: Michael Malak michaelma...@yahoo.com To: dev@spark.apache.org dev@spark.apache.org Cc: Sent: Monday, January 19, 2015 5:09 PM Subject: GraphX ShortestPaths backwards? GraphX ShortestPaths

Re: not found: type LocalSparkContext

2015-01-20 Thread Reynold Xin
You don't need LocalSparkContext. It is only for Spark's own unit tests. You can just create a SparkContext and use it in your unit tests, e.g. val sc = new SparkContext("local", "my test app", new SparkConf) On Tue, Jan 20, 2015 at 7:27 PM, James alcaid1...@gmail.com wrote: I could not
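Reynold's suggestion can be turned into a self-contained ScalaTest suite that manages its own SparkContext instead of mixing in Spark's internal LocalSparkContext trait. A minimal sketch, assuming spark-core and scalatest are on the test classpath (the suite and test names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Creates one SparkContext for the whole suite and stops it afterwards,
// which is roughly what Spark's own LocalSparkContext trait does internally.
class HyperANFSuite extends FunSuite with BeforeAndAfterAll {
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("my test app")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    if (sc != null) sc.stop()  // always release the context between suites
  }

  test("a trivial RDD computation") {
    assert(sc.parallelize(1 to 10).sum() === 55.0)
  }
}
```

Stopping the context in afterAll matters: only one local SparkContext can be active per JVM, so a leaked context will make subsequent suites fail.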

Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick. I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers or people trying to exhaustively run Spark tests could replicate (or at least

KNN for large data set

2015-01-20 Thread DEVAN M.S.
Hi all, Please help me find the best way to do K-nearest neighbors using Spark for large data sets.

Re: Standardized Spark dev environment

2015-01-20 Thread Paolo Platter
Hi all, I also tried the Docker way and it works well. I suggest looking at the sequenceiq/spark Docker images; they are very active in that field. Paolo Sent from my Windows Phone From: jay vyas jayunit100.apa...@gmail.com Sent: 21/01/2015 04:45 To: Nicholas

Re: Is there any way to support multiple users executing SQL on thrift server?

2015-01-20 Thread Cheng Lian
Hey Yi, I'm quite unfamiliar with Hadoop/HDFS auth mechanisms for now, but would like to investigate this issue later. Would you please open a JIRA for it? Thanks! Cheng On 1/19/15 1:00 AM, Yi Tian wrote: Is there any way to support multiple users executing SQL on one thrift server? I