Hello Devs,
This email concerns some timing results for a treeAggregate in
computing a (stochastic) gradient over an RDD of labelled points, as
is currently done in the MLlib optimization routine for SGD.
In SGD, the underlying RDD is downsampled by a fraction f \in (0,1],
and the subgradients
Mike,
I believe the reason you're seeing near identical performance on the
gradient computations is twofold:
1) Gradient computations for GLM models are computationally pretty cheap
from a FLOPs/byte read perspective. They are essentially a BLAS "gemv" call
in the dense case, which is well known
y
> without resorting to simulations of nested RDDs.
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14195.html
> Sent from the Apache Spark Develop
ions.
Best,
Sim
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14222.html
Sent from the Apache Spark Developers List mailing list archive at Nabble
& performance in terms of how the mass of developers make
technology choices. I have found no exceptions to this, which is why I
wanted to bring the issue with the RDD API up here.
without resorting to simulations of nested RDDs.
Aniket, yes, I've done the separate file trick. :) Still, I think we can
solve this problem without nested RDDs.
> sampleByKeyExact and your problem 2 could be implemented in a few fewer
> lines of code.
certain information for the key
could be provided along with an Iterable e.g. the counts for the key. Both
sampleByKeyExact and your problem 2 could be implemented in a few fewer lines
of code.
there are absolutely no
high-level abstractions that we can expose via the Iterables borne of RDDs?
I'd love your thoughts.
/Sim
http://linkedin.com/in/simeons