Hi everyone,
I was using JavaPairRDD's combineByKey() to compute all of my aggregations
before, since I assumed that every aggregation required a key. However, I
realized I could do my analysis using JavaRDD's aggregate() instead and not
use a key.
I have set spark.serializer to use Kryo. As a re
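This is not Spark code, but the shape of the keyless aggregate() contract can be sketched with the JDK's three-argument Stream.reduce, which takes the same (zero, seqOp, combOp) triple as JavaRDD.aggregate; the class and method names here are my own:

```java
import java.util.List;

// Sketch of the aggregate() contract using plain JDK streams rather than
// Spark itself: like JavaRDD.aggregate(zeroValue, seqOp, combOp),
// Stream.reduce takes an identity, an accumulator applied per element,
// and a combiner that merges partial results (in Spark, one partial
// result per partition).
public class AggregateSketch {
    // Sum of squares over the whole dataset, no key required.
    public static long sumOfSquares(List<Integer> data) {
        return data.parallelStream()
                   .reduce(0L,
                           (acc, x) -> acc + (long) x * x,  // seqOp
                           Long::sum);                      // combOp
    }
}
```

The combiner must be associative for this to be correct under parallel (or partitioned) execution, which is the same requirement Spark places on combOp.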
Niranda,
I'm not sure if I'd say Spark's use of Jetty to expose its UI monitoring
layer constitutes a use of "two web servers in a single product". Hadoop
uses Jetty as well as do many other applications today that need embedded
http layers for serving up their monitoring UI to users. This is comp
Hey Niranda,
It seems to me a lot of effort to support multiple libraries inside of
Spark like this, so I'm not sure that's a great solution.
If you are building an application that embeds Spark, is it not
possible for you to continue to use Jetty for Spark's internal servers
and use tomcat for y
Hi Sean,
The main issue we have is running two web servers in a single product; we
think that would not be an elegant solution.
Could you please point me to the main areas where the Jetty server is tightly
coupled, or to extension points where I could plug in Tomcat instead of Jetty?
If successful I could con
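To illustrate the "embedded http layer for a monitoring UI" pattern the thread is debating, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver. This is not how Spark wires up Jetty; it only shows why embedding a small server inside an application is routine:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Minimal embedded monitoring endpoint using the JDK's bundled HttpServer.
// Hypothetical sketch only; Spark's actual UI layer is built on Jetty.
public class MonitorSketch {
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();  // serves on a background thread
        return server;
    }
}
```

Passing port 0 asks the OS for any free port, which is handy when the host product already owns the well-known ports, one of the practical frictions behind running two servers in one product.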
There is a usability difference... I am not sure whether recommendation.ALS
would like to add both userConstraint and productConstraint? GraphLab CF, for
example, has them, and we are ready to support all the features for modest
ranks, where the Gram matrices can be formed...
For large ranks I am still working on
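To make the "modest ranks" point concrete: the normal-equation approach builds the k-by-k Gram matrix Y^T Y of a rank-k factor matrix Y, which is only cheap while k stays small. A toy sketch (the class and method are my own names, not MLlib code):

```java
// Hypothetical sketch of the Gram matrix computation behind ALS normal
// equations: for n factor vectors of length k (rank), Y^T Y is k x k,
// so its cost grows with k^2 regardless of n.
public class GramSketch {
    // rows = n factor vectors of length k; result is the k x k Gram matrix.
    public static double[][] gram(double[][] y) {
        int k = y[0].length;
        double[][] g = new double[k][k];
        for (double[] row : y)
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    g[i][j] += row[i] * row[j];
        return g;
    }
}
```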
It would really help us if we merge it, but I guess it has already diverged
from the new ALS... I will also take a look at it again and try to update it
against the new ALS...
On Tue, Feb 17, 2015 at 3:22 PM, Xiangrui Meng wrote:
> It may be too late to merge it into 1.3. I'm going to make another
> pass
It may be too late to merge it into 1.3. I'm going to make another
pass on your PR today. -Xiangrui
On Tue, Feb 10, 2015 at 8:01 AM, Debasish Das wrote:
> Hi,
>
> Will it be possible to merge this PR to 1.3 ?
>
> https://github.com/apache/spark/pull/3098
>
> The batch prediction API in ALS will b
The current ALS implementation allows pluggable solvers for
NormalEquation, where we provide CholeskySolver and an NNLS solver. Please
check the current implementation and let us know how your constraint
solver would fit. For a general matrix factorization package, let's
make a JIRA and move our discussio
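A hypothetical sketch of what such a pluggable solver boundary looks like (this is not Spark's actual private API; the interface and names are invented for illustration). Each solver consumes the same accumulated normal equation A^T A x = A^T b and differs only in how it solves it, e.g. plain Cholesky versus NNLS with non-negativity constraints:

```java
// Invented sketch of a pluggable least-squares solver interface; Spark's
// real NormalEquation/CholeskySolver/NNLS classes are private and differ.
public class SolverSketch {
    public interface LeastSquaresSolver {
        // Solve (A^T A) x = A^T b for x.
        double[] solve(double[][] ata, double[] atb);
    }

    // Unconstrained solver via Cholesky factorization of the SPD matrix
    // A^T A = L L^T, then forward/back substitution. A constrained (NNLS)
    // solver would implement the same interface.
    public static final LeastSquaresSolver CHOLESKY = (ata, atb) -> {
        int n = atb.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j <= i; j++) {
                double s = ata[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        double[] y = new double[n];               // forward solve: L y = A^T b
        for (int i = 0; i < n; i++) {
            double s = atb[i];
            for (int k = 0; k < i; k++) s -= l[i][k] * y[k];
            y[i] = s / l[i][i];
        }
        double[] x = new double[n];               // back solve: L^T x = y
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= l[k][i] * x[k];
            x[i] = s / l[i][i];
        }
        return x;
    };
}
```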
There are three different regParams defined in the grid and there are
three folds. For simplicity, we don't split the dataset into three parts and
reuse them, but redo the split for each fold. Then we need to cache 3*3
times. Note that the pipeline API is not yet optimized for
performance. It would be nice
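The 3*3 counting above can be made explicit with a small sketch (the helper names are invented): without sharing, each grid point redoes and caches its own fold splits; with sharing, the k splits would be computed and cached once.

```java
// Invented helpers spelling out the caching arithmetic from the message
// above: gridSize regParam values crossed with numFolds CV folds.
public class CacheCount {
    // Splits cached when every grid point redoes its own fold splits.
    public static int withoutReuse(int gridSize, int numFolds) {
        return gridSize * numFolds;  // 3 * 3 = 9 in the thread above
    }

    // Splits cached if the folds were computed once and shared.
    public static int withReuse(int gridSize, int numFolds) {
        return numFolds;             // 3
    }
}
```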
Hi Quizhuang,
Right now, char is not supported in DDL. Can you try varchar or string?
Thanks,
Yin
On Mon, Feb 16, 2015 at 10:39 PM, Qiuzhuang Lian
wrote:
> Hi,
>
> I am not sure whether this has been reported already or not; I ran into this
> error under the spark-sql shell as built from the newest Spark
It's fixed today: https://github.com/apache/spark/pull/4593
Thanks,
Peter Rudenko
On 2015-02-17 18:25, Evan R. Sparks wrote:
Josh - thanks for the detailed write up - this seems a little funny to me.
I agree that with the current code path there is extra work being done than
needs to be (e.g. th
Josh - thanks for the detailed write up - this seems a little funny to me.
I agree that with the current code path there is extra work being done than
needs to be (e.g. the features are re-scaled at every iteration, but the
relatively costly process of fitting the StandardScaler should not be
re-do
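The fix under discussion, fit the costly statistics once and reuse them each iteration, can be sketched with a toy scaler (this is not MLlib's StandardScaler; the class is invented for illustration):

```java
import java.util.Arrays;

// Toy standard scaler illustrating the fit-once, transform-many pattern:
// the expensive pass over the data happens in fit(), and each optimizer
// iteration only calls the cheap transform().
public class ScalerSketch {
    private final double mean, std;

    private ScalerSketch(double mean, double std) {
        this.mean = mean;
        this.std = std;
    }

    // Costly part: one pass over the data to estimate statistics.
    public static ScalerSketch fit(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(0.0);
        double var = Arrays.stream(xs)
                           .map(x -> (x - mean) * (x - mean))
                           .average().orElse(0.0);
        return new ScalerSketch(mean, Math.sqrt(var));
    }

    // Cheap part: safe to call once per iteration.
    public double transform(double x) {
        return (x - mean) / std;
    }
}
```

Re-fitting inside the loop recomputes the mean/variance pass every iteration for the same answer, which matches the "extra work" described above.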
Cross-posting as I got no response on the users mailing list last
week. Any response would be appreciated :)
Josh
-- Forwarded message --
From: Josh Devins
Date: 9 February 2015 at 15:59
Subject: [MLlib] Performance problem in GeneralizedLinearAlgorithm
To: "u...@spark.apache.or