Re: RDD API patterns

2015-09-26 Thread Mike Hynes
Hello Devs, This email concerns some timing results for a treeAggregate in computing a (stochastic) gradient over an RDD of labelled points, as is currently done in the MLlib optimization routine for SGD. In SGD, the underlying RDD is downsampled by a fraction f \in (0,1], and the subgradients
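The pattern described in this message (downsample by a fraction f, then sum per-partition gradients with a tree-style aggregate) can be sketched without Spark as follows. This is only an illustrative sketch, not the MLlib code: partitions are plain Python lists, the loss is assumed to be least squares, and the names `partition_gradient`, `tree_combine`, and `sampled_gradient` are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, List[float]]      # (label, features)
Partial = Tuple[List[float], int]      # (gradient sum, number of sampled points)


def partition_gradient(points: List[Point], weights: List[float],
                       fraction: float, rng: random.Random) -> Partial:
    """Per-partition step: sum least-squares gradients over a Bernoulli sample."""
    grad = [0.0] * len(weights)
    count = 0
    for label, features in points:
        # Keep each point with probability `fraction`; fraction = 1.0 keeps all.
        if rng.random() >= fraction:
            continue
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
        count += 1
    return grad, count


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial results by adding gradients and counts."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def tree_combine(partials: List[Partial]) -> Partial:
    """Merge partial results pairwise, level by level, instead of all at once."""
    level = list(partials)  # assumes at least one partition
    while len(level) > 1:
        merged = [combine(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])  # odd one out moves up unchanged
        level = merged
    return level[0]


def sampled_gradient(partitions: List[List[Point]], weights: List[float],
                     fraction: float, seed: int = 0) -> Partial:
    rng = random.Random(seed)
    partials = [partition_gradient(p, weights, fraction, rng) for p in partitions]
    return tree_combine(partials)
```

Calling `sampled_gradient(partitions, weights, fraction=1.0)` uses every point, which makes it easy to compare against a sampled run with a smaller fraction.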

Re: RDD API patterns

2015-09-26 Thread Evan R. Sparks
Mike, I believe the reason you're seeing near-identical performance on the gradient computations is twofold: 1) Gradient computations for GLM models are computationally pretty cheap from a FLOPs/byte read perspective. They are essentially a BLAS "gemv" call in the dense case, which is well known
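To make the "gemv" remark concrete, here is a rough sketch of a dense gradient written as two matrix-vector products, plus a back-of-the-envelope FLOPs-per-byte estimate. The least-squares loss, the 8-byte double assumption, and all function names are illustrative assumptions, not taken from MLlib or from the message.

```python
from typing import List

Matrix = List[List[float]]


def gemv(matrix: Matrix, vector: List[float]) -> List[float]:
    """Plain matrix-vector product, the operation BLAS calls gemv."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def transpose(matrix: Matrix) -> Matrix:
    return [list(col) for col in zip(*matrix)]


def least_squares_gradient(features: Matrix, labels: List[float],
                           weights: List[float]) -> List[float]:
    """Gradient of 0.5 * ||Xw - y||^2, i.e. X^T (Xw - y): two gemv calls."""
    predictions = gemv(features, weights)
    residuals = [p - y for p, y in zip(predictions, labels)]
    return gemv(transpose(features), residuals)


def gemv_flops_per_byte(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Rough arithmetic intensity of one gemv, counting only matrix reads."""
    flops = 2 * rows * cols                      # one multiply and one add per entry
    bytes_read = bytes_per_value * rows * cols   # each matrix entry is read once
    return flops / bytes_read
```

Under these assumptions the estimate comes out to 0.25 FLOPs per byte regardless of the matrix shape, which is one way to see why such a computation spends most of its time reading data.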

Re: RDD API patterns

2015-09-19 Thread Juan Rodríguez Hortalá
y > without resorting to simulations of nested RDDs. > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14195.html > Sent from the Apache Spark Develop

Re: RDD API patterns

2015-09-19 Thread sim
ions. Best, Sim

Re: RDD API patterns

2015-09-18 Thread sim
& performance in terms of how the mass of developers make technology choices. I have found no exceptions to this, which is why I wanted to bring the issue with the RDD API up here.

Re: RDD API patterns

2015-09-18 Thread sim
without resorting to simulations of nested RDDs.

Re: RDD API patterns

2015-09-18 Thread sim
Aniket, yes, I've done the separate file trick. :) Still, I think we can solve this problem without nested RDDs.

Re: RDD API patterns

2015-09-18 Thread sim

Re: RDD API patterns

2015-09-17 Thread Debasish Das
> sampleByKeyExact and your problem 2 could be implemented in a few less lines of code.

Re: RDD API patterns

2015-09-16 Thread robineast
certain information for the key could be provided along with an Iterable e.g. the counts for the key. Both sampleByKeyExact and your problem 2 could be implemented in a few less lines of code.

RDD API patterns

2015-09-14 Thread sim
there are absolutely no high-level abstractions that we can expose via the Iterables borne of RDDs? I'd love your thoughts. /Sim http://linkedin.com/in/simeons