Re: Removing the Mesos fine-grained mode

2015-11-30 Thread Timothy Chen
Hi Adam, Thanks for the graphs and the tests, definitely interested to dig a bit deeper to find out what could be the cause of this. Do you have the spark driver logs for both runs? Tim On Mon, Nov 30, 2015 at 9:06 AM, Adam McElwee wrote: > To eliminate any skepticism

Re: Bringing up JDBC Tests to trunk

2015-11-30 Thread Josh Rosen
The JDBC drivers are currently being pulled in as test-scope dependencies of the `sql/core` module: https://github.com/apache/spark/blob/f2fbfa444f6e8d27953ec2d1c0b3abd603c963f9/sql/core/pom.xml#L91 In SBT, these wind up on the Docker JDBC tests' classpath as a transitive dependency of the

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread Burak Yavuz
Or you could also use reflection like in this Spark Package: https://github.com/brkyvz/lazy-linalg/blob/master/src/main/scala/com/brkyvz/spark/linalg/BLASUtils.scala Best, Burak On Mon, Nov 30, 2015 at 12:48 PM, DB Tsai wrote: > The workaround is to have your code in the same

Re: Removing the Mesos fine-grained mode

2015-11-30 Thread Adam McElwee
To eliminate any skepticism around whether cpu is a good performance metric for this workload, I did a couple comparison runs of an example job to demonstrate a more universal change in performance metrics (stage/job time) between coarse and fine-grained mode on mesos. The workload is identical

Re: Problem in running MLlib SVM

2015-11-30 Thread Fazlan Nazeem
You should never use the training data to measure your prediction accuracy. Always use a fresh dataset (test data) for this purpose. On Sun, Nov 29, 2015 at 8:36 AM, Jeff Zhang wrote: > I think this should represent the label of LabeledPoint (0 means negative 1 > means
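
The advice above (score on held-out data, never on the training set) can be sketched in plain Python. This is only an illustration under stated assumptions: `train_test_split`, `accuracy`, and the toy data are hypothetical names made up here, not part of MLlib or of the original thread.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, rows):
    """Fraction of (features, label) rows whose label matches predict(features)."""
    if not rows:
        raise ValueError("cannot score an empty dataset")
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)

# Toy data (an assumption for illustration): label is 1 when the feature is positive.
data = [([x], 1 if x > 0 else 0) for x in range(-50, 50)]
train, test = train_test_split(data)

# Fit on `train` only; report accuracy on `test` only.
def toy_model(features):
    return 1 if features[0] > 0 else 0

test_accuracy = accuracy(toy_model, test)
```

With Spark RDDs the split itself would normally be done with `randomSplit`; the point of the sketch is only that the rows used for scoring never overlap the rows used for fitting.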

Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Jacek Laskowski
Hi, My limited understanding of Spark tells me that a task is the least possible working unit and Spark itself won't give you much. I wouldn't expect so since "account" is a business entity not Spark's one. What about using mapPartitions* to know the details of partitions and do whatever you
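
One way to read the mapPartitions suggestion: write a function that consumes a whole partition and yields one per-account summary, so progress can be tracked per business entity rather than per task. The sketch below is plain Python and only an assumption about what was meant; `count_accounts` and the `"account"` field are hypothetical.

```python
from collections import Counter

def count_accounts(partition):
    """Consume one partition's records and yield a single Counter keyed by account."""
    counts = Counter()
    for record in partition:
        counts[record["account"]] += 1
    yield counts

# Local stand-in for one partition's iterator (illustrative data only).
sample_partition = iter([
    {"account": "a"},
    {"account": "b"},
    {"account": "a"},
])
summary = list(count_accounts(sample_partition))
```

In PySpark a function of this shape (iterator in, iterator out) is what `rdd.mapPartitions(count_accounts)` expects; the per-partition counters would then be merged on the driver.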

Re: Need suggestions on monitor Spark progress

2015-11-30 Thread Alex Rovner
In these scenarios it's fairly standard to report the metrics either directly or through accumulators ( http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka) to a time series database such as Graphite (http://graphite.wikidot.com/) or OpenTSDB
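
As a rough sketch of the reporting side, assuming the driver has already read an accumulator's value into a plain number: Graphite's plaintext protocol takes one `path value timestamp` line per metric, conventionally on TCP port 2003. The metric path, host, and helper names below are hypothetical.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    # Graphite plaintext protocol: "<metric path> <value> <unix timestamp>" plus newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_to_graphite(lines, host="localhost", port=2003):
    # Host and port are assumptions; point them at your own Carbon listener.
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)

# Example: a value the driver read from an accumulator (the number is illustrative).
line = graphite_line("spark.job.records_processed", 1200, 1448841600)
```

Calling `send_to_graphite([line])` periodically from the driver would give a time series of the job's progress; nothing is sent in the sketch itself.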

Re: question about combining small parquet files

2015-11-30 Thread Nezih Yigitbasi
This looks interesting, thanks Ruslan. But compaction with Hive is as simple as an insert overwrite statement, since Hive supports CombineFileInputFormat. Is it possible to do the same with Spark? On Thu, Nov 26, 2015 at 9:47 AM, Ruslan Dautkhanov wrote: > An interesting

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
The workaround is to have your code in the same package, or write some utility wrapper in the same package so you can use them in your code. Mostly we implement those BLAS for our own need, and we don't have a general use case in mind. As a result, if we open them up prematurely, it will add our api

Re: Problem in running MLlib SVM

2015-11-30 Thread Joseph Bradley
model.predict should return a 0/1 predicted label. The example code is misleading when it calls the prediction a "score." On Mon, Nov 30, 2015 at 9:13 AM, Fazlan Nazeem wrote: > You should never use the training data to measure your prediction > accuracy. Always use a fresh
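
The difference between a predicted label and a raw "score" can be illustrated with a plain-Python linear model. This is a sketch of the idea only, not MLlib's code: the function names are hypothetical, and the `threshold=None` case stands in for what MLlib exposes through `clearThreshold()` on the model.

```python
def raw_margin(weights, intercept, features):
    """Linear decision value w . x + b (the raw score)."""
    return sum(w * x for w, x in zip(weights, features)) + intercept

def predict(weights, intercept, features, threshold=0.0):
    """Return a 0/1 label, or the raw margin when threshold is None."""
    margin = raw_margin(weights, intercept, features)
    if threshold is None:
        return margin
    return 1.0 if margin > threshold else 0.0

weights, intercept = [1.0, -1.0], 0.5
label_pos = predict(weights, intercept, [2.0, 1.0])          # thresholded label
label_neg = predict(weights, intercept, [0.0, 2.0])          # thresholded label
score_pos = predict(weights, intercept, [2.0, 1.0], None)    # raw margin
```

Comparing thresholded labels against 0/1 ground truth gives accuracy; raw margins are what ranking metrics such as ROC need.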

Re: Grid search with Random Forest

2015-11-30 Thread Joseph Bradley
It should work with 1.5+. On Thu, Nov 26, 2015 at 12:53 PM, Ndjido Ardo Bar wrote: > > Hi folks, > > Does anyone know whether the Grid Search capability is enabled since the > issue spark-9011 of version 1.4.0 ? I'm getting the "rawPredictionCol > column doesn't exist" when

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
I used reflection initially, but I found it's very slow especially in a tight loop. Maybe caching the reflection can help which I never tried. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Nov 30, 2015 at

FOSDEM 2016 - take action by 4th of December 2015

2015-11-30 Thread Roman Shaposhnik
As most of you probably know FOSDEM 2016 (the biggest, 100% free open source developer conference) is right around the corner: https://fosdem.org/2016/ We hope to have an ASF booth and we would love to see as many ASF projects as possible present at various tracks (AKA Developer rooms):

Re: Grid search with Random Forest

2015-11-30 Thread Benjamin Fradet
Hi Ndjido, This is because GBTClassifier doesn't yet have a rawPredictionCol like the RandomForestClassifier has. Cf: http://spark.apache.org/docs/latest/ml-ensembles.html#output-columns-predictions-1 On 1 Dec 2015 3:57 a.m., "Ndjido Ardo BAR" wrote: > Hi Joseph, > > Yes

Re: Grid search with Random Forest

2015-11-30 Thread Ndjido Ardo BAR
Hi Benjamin, Thanks, the documentation you sent is clear. Is there any other way to perform a Grid Search with GBT? Ndjido On Tue, 1 Dec 2015 at 08:32, Benjamin Fradet wrote: > Hi Ndjido, > > This is because GBTClassifier doesn't yet have a rawPredictionCol like >

Re: How to add 1.5.2 support to ec2/spark_ec2.py ?

2015-11-30 Thread Alexander Pivovarov
just want to follow up On Nov 25, 2015 9:19 PM, "Alexander Pivovarov" wrote: > Hi Everyone > > I noticed that spark ec2 script is outdated. > How to add 1.5.2 support to ec2/spark_ec2.py? > What else (except for updating spark version in the script) should be done > to add