Hi everyone,
I was using JavaPairRDD's combineByKey() to compute all of my aggregations
before, since I assumed that every aggregation required a key. However, I
realized I could do my analysis using JavaRDD's aggregate() instead and not
use a key.
I have set spark.serializer to use Kryo. As a re
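This is not Spark code, but the shape of the keyless aggregate() contract can be sketched with the JDK's three-argument Stream.reduce, which takes the same (zero, seqOp, combOp) triple as JavaRDD.aggregate; the class and method names here are my own:

```java
import java.util.List;

// Sketch of the aggregate() contract using plain JDK streams rather than
// Spark itself: like JavaRDD.aggregate(zeroValue, seqOp, combOp),
// Stream.reduce takes an identity, an accumulator applied per element,
// and a combiner that merges partial results (in Spark, one partial
// result per partition).
public class AggregateSketch {
    // Sum of squares over the whole dataset, no key required.
    public static long sumOfSquares(List<Integer> data) {
        return data.parallelStream()
                   .reduce(0L,
                           (acc, x) -> acc + (long) x * x,  // seqOp
                           Long::sum);                      // combOp
    }
}
```

The combiner must be associative for this to be correct under parallel (or partitioned) execution, which is the same requirement Spark places on combOp.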
Niranda,
I'm not sure if I'd say Spark's use of Jetty to expose its UI monitoring
layer constitutes a use of "two web servers in a single product". Hadoop
uses Jetty as well as do many other applications today that need embedded
http layers for serving up their monitoring UI to users. This is comp
Hey Niranda,
It seems to me a lot of effort to support multiple libraries inside of
Spark like this, so I'm not sure that's a great solution.
If you are building an application that embeds Spark, is it not
possible for you to continue to use Jetty for Spark's internal servers
and use tomcat for y
Hi Sean,
The main issue we have is running two web servers in a single product; we
think that would not be an elegant solution.
Could you please point me to the main areas where the Jetty server is tightly
coupled, or to extension points where I could plug in Tomcat instead of Jetty?
If successful I could con
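To illustrate the "embedded http layer for a monitoring UI" pattern the thread is debating, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver. This is not how Spark wires up Jetty; it only shows why embedding a small server inside an application is routine:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Minimal embedded monitoring endpoint using the JDK's bundled HttpServer.
// Hypothetical sketch only; Spark's actual UI layer is built on Jetty.
public class MonitorSketch {
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();  // serves on a background thread
        return server;
    }
}
```

Passing port 0 asks the OS for any free port, which is handy when the host product already owns the well-known ports, one of the practical frictions behind running two servers in one product.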
There is a usability difference... I am not sure whether recommendation.ALS
would like to add both userConstraint and productConstraint? GraphLab CF, for
example, has them, and we are ready to support all the features for modest
ranks, where the Gram matrices can be formed...
For large ranks I am still working on
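To make the "modest ranks" point concrete: the normal-equation approach builds the k-by-k Gram matrix Y^T Y of a rank-k factor matrix Y, which is only cheap while k stays small. A toy sketch (the class and method are my own names, not MLlib code):

```java
// Hypothetical sketch of the Gram matrix computation behind ALS normal
// equations: for n factor vectors of length k (rank), Y^T Y is k x k,
// so its cost grows with k^2 regardless of n.
public class GramSketch {
    // rows = n factor vectors of length k; result is the k x k Gram matrix.
    public static double[][] gram(double[][] y) {
        int k = y[0].length;
        double[][] g = new double[k][k];
        for (double[] row : y)
            for (int i = 0; i < k; i++)
                for (int j = 0; j < k; j++)
                    g[i][j] += row[i] * row[j];
        return g;
    }
}
```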
It would really help us if we merge it, but I guess it has already diverged
from the new ALS... I will also take a look at it again and try to update it
against the new ALS...
On Tue, Feb 17, 2015 at 3:22 PM, Xiangrui Meng wrote:
> It may be too late to merge it into 1.3. I'm going to make another
> pass
It may be too late to merge it into 1.3. I'm going to make another
pass on your PR today. -Xiangrui
On Tue, Feb 10, 2015 at 8:01 AM, Debasish Das wrote:
> Hi,
>
> Will it be possible to merge this PR to 1.3 ?
>
> https://github.com/apache/spark/pull/3098
>
> The batch prediction API in ALS will b
The current ALS implementation allows pluggable solvers for
NormalEquation, where we provide CholeskySolver and an NNLS solver. Please
check the current implementation and let us know how your constraint
solver would fit. For a general matrix factorization package, let's
make a JIRA and move our discussio
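A hypothetical sketch of what such a pluggable solver boundary looks like (this is not Spark's actual private API; the interface and names are invented for illustration). Each solver consumes the same accumulated normal equation A^T A x = A^T b and differs only in how it solves it, e.g. plain Cholesky versus NNLS with non-negativity constraints:

```java
// Invented sketch of a pluggable least-squares solver interface; Spark's
// real NormalEquation/CholeskySolver/NNLS classes are private and differ.
public class SolverSketch {
    public interface LeastSquaresSolver {
        // Solve (A^T A) x = A^T b for x.
        double[] solve(double[][] ata, double[] atb);
    }

    // Unconstrained solver via Cholesky factorization of the SPD matrix
    // A^T A = L L^T, then forward/back substitution. A constrained (NNLS)
    // solver would implement the same interface.
    public static final LeastSquaresSolver CHOLESKY = (ata, atb) -> {
        int n = atb.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j <= i; j++) {
                double s = ata[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        double[] y = new double[n];               // forward solve: L y = A^T b
        for (int i = 0; i < n; i++) {
            double s = atb[i];
            for (int k = 0; k < i; k++) s -= l[i][k] * y[k];
            y[i] = s / l[i][i];
        }
        double[] x = new double[n];               // back solve: L^T x = y
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= l[k][i] * x[k];
            x[i] = s / l[i][i];
        }
        return x;
    };
}
```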
There are three different regParams defined in the grid and there are
three folds. For simplicity, we don't split the dataset into three parts and
reuse them, but redo the split for each fold. Then we need to cache 3*3
times. Note that the pipeline API is not yet optimized for
performance. It would be nice
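The 3*3 counting above can be made explicit with a small sketch (the helper names are invented): without sharing, each grid point redoes and caches its own fold splits; with sharing, the k splits would be computed and cached once.

```java
// Invented helpers spelling out the caching arithmetic from the message
// above: gridSize regParam values crossed with numFolds CV folds.
public class CacheCount {
    // Splits cached when every grid point redoes its own fold splits.
    public static int withoutReuse(int gridSize, int numFolds) {
        return gridSize * numFolds;  // 3 * 3 = 9 in the thread above
    }

    // Splits cached if the folds were computed once and shared.
    public static int withReuse(int gridSize, int numFolds) {
        return numFolds;             // 3
    }
}
```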
Hi Quizhuang,
Right now, char is not supported in DDL. Can you try varchar or string?
Thanks,
Yin
On Mon, Feb 16, 2015 at 10:39 PM, Qiuzhuang Lian
wrote:
> Hi,
>
> I am not sure whether this has been reported already or not; I ran into this
> error under the spark-sql shell as built from the newest Spark
It's fixed today: https://github.com/apache/spark/pull/4593
Thanks,
Peter Rudenko
On 2015-02-17 18:25, Evan R. Sparks wrote:
Josh - thanks for the detailed write up - this seems a little funny to me.
I agree that with the current code path there is extra work being done than
needs to be (e.g. th
Josh - thanks for the detailed write up - this seems a little funny to me.
I agree that with the current code path there is extra work being done than
needs to be (e.g. the features are re-scaled at every iteration, but the
relatively costly process of fitting the StandardScaler should not be
re-do
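The fix under discussion, fit the costly statistics once and reuse them each iteration, can be sketched with a toy scaler (this is not MLlib's StandardScaler; the class is invented for illustration):

```java
import java.util.Arrays;

// Toy standard scaler illustrating the fit-once, transform-many pattern:
// the expensive pass over the data happens in fit(), and each optimizer
// iteration only calls the cheap transform().
public class ScalerSketch {
    private final double mean, std;

    private ScalerSketch(double mean, double std) {
        this.mean = mean;
        this.std = std;
    }

    // Costly part: one pass over the data to estimate statistics.
    public static ScalerSketch fit(double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(0.0);
        double var = Arrays.stream(xs)
                           .map(x -> (x - mean) * (x - mean))
                           .average().orElse(0.0);
        return new ScalerSketch(mean, Math.sqrt(var));
    }

    // Cheap part: safe to call once per iteration.
    public double transform(double x) {
        return (x - mean) / std;
    }
}
```

Re-fitting inside the loop recomputes the mean/variance pass every iteration for the same answer, which matches the "extra work" described above.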
Cross-posting as I got no response on the users mailing list last
week. Any response would be appreciated :)
Josh
-- Forwarded message --
From: Josh Devins
Date: 9 February 2015 at 15:59
Subject: [MLlib] Performance problem in GeneralizedLinearAlgorithm
To: "u...@spark.apache.or