Exception while running unit tests that make use of local-cluster mode

2014-10-23 Thread Varadharajan Mukundan
Hi All, When I try to run unit tests that make use of local-cluster mode (e.g. "Accessing HttpBroadcast variables in a local cluster" in BroadcastSuite.scala), they fail with the exception below. I'm using Java version 1.8.0_05 and Scala version 2.10. I tried to look into the Jenkins build

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Jianshi Huang
Upvote for the multitenancy requirement. I'm also building a data analytics platform, and there'll be multiple users running queries and computations simultaneously. One of the pain points is control of resource size. Users don't really know how many nodes they need; they always use as much as

PR for Hierarchical Clustering Needs Review

2014-10-23 Thread RJ Nowling
Hi all, A few months ago, I collected feedback on what the community was looking for in clustering methods. A number of the community members requested a divisive hierarchical clustering method. Yu Ishikawa has stepped up to implement such a method. I've been working with him to communicate

Memory

2014-10-23 Thread Tom Hubregtsen
Hi all, I would like to validate my understanding of memory regions in Spark. Any comments on my description below would be appreciated! Execution is split up into stages, based on wide dependencies between RDDs and actions such as save. All transformations involving narrow dependencies before
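A minimal Java sketch of the stage split described above (not from the original thread; names and paths are illustrative): a narrow dependency such as map is pipelined within a stage, while a wide (shuffle) dependency such as reduceByKey ends it.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class StageBoundaryExample {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[2]", "StageBoundaryExample");
            JavaRDD<String> words = sc.parallelize(Arrays.asList("a", "b", "a"));

            // Narrow dependency: each output partition depends on one input partition,
            // so this map is pipelined into the same stage as the shuffle write below.
            JavaPairRDD<String, Integer> ones = words.mapToPair(w -> new Tuple2<>(w, 1));

            // Wide dependency: reduceByKey shuffles data, so a new stage begins here.
            JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

            counts.saveAsTextFile("/tmp/word-counts");  // action that triggers execution
            sc.stop();
        }
    }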

Re: reading/writing parquet decimal type

2014-10-23 Thread Michael Allman
Hi Matei, Another thing occurred to me. Will the binary format you're writing sort the data in numeric order? Or would the decimals have to be decoded for comparison? Cheers, Michael On Oct 12, 2014, at 10:48 PM, Matei Zaharia matei.zaha...@gmail.com wrote: The fixed-length binary type

Receiver/DStream storage level

2014-10-23 Thread Michael Allman
I'm implementing a custom ReceiverInputDStream and I'm not sure how to initialize the Receiver with the storage level. The storage level is set on the DStream, but there doesn't seem to be a way to pass it to the Receiver. At the same time, setting the storage level separately on the Receiver
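A minimal sketch of the constructor-based approach (not from the original message; the class name is illustrative): the Receiver base class takes a StorageLevel when it is constructed, so a custom ReceiverInputDStream can forward its own storage level when it builds the receiver in getReceiver().

    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    public class MyReceiver extends Receiver<String> {
        public MyReceiver(StorageLevel storageLevel) {
            super(storageLevel);  // the level store() uses for received blocks
        }

        @Override
        public void onStart() {
            // Start a background thread that feeds records to store().
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!isStopped()) {
                        store("example record");  // placeholder for a real data source
                    }
                }
            }).start();
        }

        @Override
        public void onStop() {
            // Nothing to clean up here; the thread exits once isStopped() is true.
        }
    }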

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Marcelo Vanzin
You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174. On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang jianshi.hu...@gmail.com wrote: Upvote for the multitenancy requirement. I'm also building a data analytics platform, and there'll be multiple users running queries and

scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
A 100-character max width seems very restrictive to me. Even in the most restrictive environment I have for development (SSH with Emacs), I get a lot more characters to work with than that. Personally, I find the code harder to read, not easier; I kept wondering why there are weird newlines in the middle of

Re: scalastyle annoys me a little bit

2014-10-23 Thread Patrick Wendell
Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything

Re: scalastyle annoys me a little bit

2014-10-23 Thread Marcelo Vanzin
I know this is all very subjective, but I find long lines difficult to read. I also like how 100 characters fit in my editor setup fine (split wide screen), while a longer line length would mean I can't have two buffers side-by-side without horizontal scrollbars. I think it's fine to add a

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Koert: Have you tried adding the following on your command line? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you

Spark 1.2 feature freeze on November 1

2014-10-23 Thread Patrick Wendell
Hey All, Just a reminder that as planned [1] we'll go into a feature freeze on November 1. On that date I'll cut a 1.2 release branch and make the up-or-down call on any patches that go into that branch, along with individual committers. It is common for us to receive a very large volume of

Re: scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
Hey Ted, I tried "mvn clean package -DskipTests -Dscalastyle.failOnViolation=false" with no luck; I still get: [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-core_2.10: Failed during scalastyle execution: You have 3 Scalastyle violation(s). -

Re: PR for Hierarchical Clustering Needs Review

2014-10-23 Thread Xiangrui Meng
Hi RJ, We are close to the v1.2 feature freeze deadline, so I'm busy with the pipeline feature and a couple of bugs. I will ask other developers to help review the PR. Thanks for working with Yu and helping with the code review! Best, Xiangrui On Thu, Oct 23, 2014 at 2:58 AM, RJ Nowling

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Koert: If you have time, you can try this diff - with which you would be able to specify the following on the command line: -Dscalastyle.failonviolation=false diff --git a/pom.xml b/pom.xml index 687cc63..108585e 100644 --- a/pom.xml +++ b/pom.xml @@ -123,6 +123,7 @@

Re: scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
Great, thanks, I will do that. On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu yuzhih...@gmail.com wrote: Koert: If you have time, you can try this diff - with which you would be able to specify the following on the command line: -Dscalastyle.failonviolation=false diff --git a/pom.xml b/pom.xml

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Created SPARK-4066 and attached a patch there. On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers ko...@tresata.com wrote: Great, thanks, I will do that. On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu yuzhih...@gmail.com wrote: Koert: If you have time, you can try this diff - with which you would be able

label points with a given index

2014-10-23 Thread Lochana Menikarachchi
SparkConf conf = new SparkConf().setAppName("LogisticRegression").setMaster("local[4]"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> lines = sc.textFile("some.csv"); JavaRDD<LabeledPoint> lPoints = lines.map(new CSVLineParser()); Is there any way to parse an index

Re: label points with a given index

2014-10-23 Thread Lochana Menikarachchi
Figured out that the constructor can be used for this purpose. On 10/24/14 7:57 AM, Lochana Menikarachchi wrote: SparkConf conf = new SparkConf().setAppName("LogisticRegression").setMaster("local[4]"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> lines = sc.textFile("some.csv");
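A hypothetical sketch of that constructor-based approach (IndexedCSVLineParser and its parameter are illustrative, not the actual CSVLineParser from the thread): the index of the label column is passed in through the constructor and used inside call().

    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.regression.LabeledPoint;

    public class IndexedCSVLineParser implements Function<String, LabeledPoint> {
        private final int labelIndex;  // column to treat as the label

        public IndexedCSVLineParser(int labelIndex) {
            this.labelIndex = labelIndex;
        }

        @Override
        public LabeledPoint call(String line) {
            String[] tokens = line.split(",");
            double label = Double.parseDouble(tokens[labelIndex]);
            double[] features = new double[tokens.length - 1];
            int j = 0;
            for (int i = 0; i < tokens.length; i++) {
                if (i != labelIndex) {
                    features[j++] = Double.parseDouble(tokens[i]);
                }
            }
            return new LabeledPoint(label, Vectors.dense(features));
        }
    }

    // Usage: JavaRDD<LabeledPoint> lPoints = lines.map(new IndexedCSVLineParser(0));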

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Evan Chan
Ashwin, I would say the strategies in general are: 1) Have each user submit a separate Spark app (each with its own SparkContext), with its own resource settings, and share data through HDFS or something like Tachyon for speed. 2) Share a single SparkContext amongst multiple users, using fair
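A minimal sketch of option 2, assuming the fair scheduler is what is meant (pool and file names are illustrative): one shared SparkContext with spark.scheduler.mode set to FAIR, and each user's jobs assigned to a scheduler pool via a thread-local property.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SharedContextExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setAppName("SharedAnalyticsPlatform")
                .setMaster("local[2]")                 // for a standalone run; normally set by spark-submit
                .set("spark.scheduler.mode", "FAIR");  // schedule concurrent jobs fairly
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Set the pool on the thread that submits a user's jobs; jobs submitted
            // from this thread are then scheduled within that pool.
            sc.setLocalProperty("spark.scheduler.pool", "userA");
            long count = sc.textFile("data/shared.csv").count();
            System.out.println("userA's job counted " + count + " lines");

            sc.stop();
        }
    }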