Hello List,
I was wondering about the design principle behind an RDD inheriting its
partition count from its parent. See one simple example below [*].
'ngauss_rdd2' has significantly less data; intuitively, in such cases,
shouldn't Spark invoke coalesce automatically for performance?
What would b
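For reference, a minimal sketch of the explicit alternative (local mode; the RDD names are stand-ins modeled on the question). Narrow transformations such as filter inherit the parent's partition count, and shrinking it has to be requested by the user, presumably because only the user knows whether the merge cost pays off:

```scala
import org.apache.spark.sql.SparkSession

object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[4]").appName("coalesce-sketch").getOrCreate()
    val sc = spark.sparkContext

    val ngaussRdd = sc.parallelize(1 to 1000000, 100)
    // A highly selective filter: the child RDD still reports 100 partitions,
    // because narrow transformations inherit the parent's partitioning.
    val ngaussRdd2 = ngaussRdd.filter(_ % 1000 == 0)
    println(ngaussRdd2.getNumPartitions)  // 100

    // Shrinking must be asked for explicitly; coalesce merges partitions
    // without a full shuffle.
    val compacted = ngaussRdd2.coalesce(4)
    println(compacted.getNumPartitions)   // 4

    spark.stop()
  }
}
```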
There is a BigDL project:
https://github.com/intel-analytics/BigDL
On 20 June 2017 at 16:17, Jules Damji wrote:
> And we will be having a webinar on July 27 going into more detail. Stay
> tuned.
>
> Cheers
> Jules
>
> Sent from my iPhone
> Pardon the dumb thumb typos :)
>
> On Jun 20, 2017, a
I suggest the RandomRDDs API; it provides nice tools. Writing wrappers
around it might be a good approach:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$
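A small usage sketch in local mode; the `gaussianRDD` wrapper is a hypothetical example of the kind of wrapper suggested above:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.random.RandomRDDs

object RandomRDDsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "random-rdds-sketch")

    // 10,000 i.i.d. draws from N(0, 1), spread over 4 partitions.
    val standardNormal = RandomRDDs.normalRDD(sc, 10000L, 4)

    // A thin (hypothetical) wrapper for N(mu, sigma^2).
    def gaussianRDD(mu: Double, sigma: Double) =
      standardNormal.map(x => mu + sigma * x)

    println(gaussianRDD(5.0, 2.0).mean())  // roughly 5.0

    sc.stop()
  }
}
```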
On 3 August 2017 at 01:05, jeff saremi wrote:
> Vadim:
>
> This is from the Mastering Spark book:
>
> "It is strongly recommended that a checkpointed RDD is persisted in memory,
> otherwise saving it on a file will require recomputation."
Is this really true? I had the impression that DAG will no
On 3 August 2017 at 03:00, Vadim Semenov wrote:
> `saveAsObjectFile` doesn't save the DAG, it acts as a typical action, so it
> just saves data to some destination.
Yes, that's what I thought, so the statement "...otherwise saving it on
a file will require recomputation." from the book is not entirely accurate.
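A sketch of the persist-before-checkpoint pattern the book is describing (local mode; the checkpoint directory is an assumed writable path). Checkpointing launches a separate job that re-evaluates the lineage, so persisting first lets that second pass read cached blocks instead of recomputing:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "checkpoint-sketch")
    sc.setCheckpointDir("/tmp/spark-checkpoints")  // assumed writable path

    val expensive = sc.parallelize(1 to 1000).map(x => x * x)

    // Persist first: the checkpoint job re-runs the lineage, so without
    // this the map above would be computed twice.
    expensive.persist(StorageLevel.MEMORY_ONLY)
    expensive.checkpoint()
    expensive.count()  // materializes the RDD and writes the checkpoint

    println(expensive.isCheckpointed)  // true

    sc.stop()
  }
}
```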
It depends on which model you would like to train, but models requiring
optimisation could use SGD with mini-batches. See:
https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
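To make the mini-batch idea concrete, here is a hand-rolled sketch on a toy 1-D least-squares problem in local mode. This is not MLlib's optimizer (whose GradientDescent exposes the same knob as miniBatchFraction); it just shows that each iteration samples a fraction of the RDD, so no single step scans the full dataset:

```scala
import org.apache.spark.SparkContext

object MiniBatchSGDSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "minibatch-sgd-sketch")

    // Toy data for y = 2x; stands in for a dataset too large to scan per step.
    val data = sc.parallelize(1 to 1000)
      .map(i => (i / 1000.0, 2.0 * i / 1000.0))
      .cache()

    var w = 0.0          // single weight for 1-D least squares
    val stepSize = 0.5
    for (iter <- 1 to 50) {
      // Each step sees only ~10% of the data.
      val batch = data.sample(withReplacement = false, fraction = 0.1, seed = iter.toLong)
      // Gradient of 0.5 * (w*x - y)^2 with respect to w is (w*x - y) * x.
      val (gradSum, count) = batch
        .map { case (x, y) => ((w * x - y) * x, 1L) }
        .fold((0.0, 0L)) { case ((g1, n1), (g2, n2)) => (g1 + g2, n1 + n2) }
      if (count > 0) w -= stepSize * gradSum / count
    }
    println(w)  // converges towards 2.0

    sc.stop()
  }
}
```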
On 23 August 2017 at 14:27, Sea aj wrote:
> Hi,
>
> I am trying to feed a huge dat
I think ordering has no meaning in RDDs; see this post, especially the zip methods:
https://stackoverflow.com/questions/29268210/mind-blown-rdd-zip-method
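To make the pairing explicit rather than positional, one common pattern (sketched here in local mode, assuming each RDD's own element order is the meaningful one) is to key both sides with zipWithIndex and join; the index, not the partition layout, then defines the match:

```scala
import org.apache.spark.SparkContext

object SafeZipSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "safe-zip-sketch")

    val left  = sc.parallelize(Seq("a", "b", "c"), 2)
    val right = sc.parallelize(Seq(1, 2, 3), 2)

    // rdd.zip pairs elements purely by position and requires identical
    // partitioning; keying by an explicit index makes the pairing explicit
    // and survives shuffles.
    val byIndexLeft  = left.zipWithIndex.map(_.swap)
    val byIndexRight = right.zipWithIndex.map(_.swap)
    val joined = byIndexLeft.join(byIndexRight).sortByKey().values

    joined.collect().foreach(println)  // (a,1), (b,2), (c,3)

    sc.stop()
  }
}
```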
> ...sequence across a
> partition as partition is local and computation happens one record at a
> time.
>
> On 13-Sep-2017 9:54 PM, "Suzen, Mehmet" wrote:
>
> I think ordering has no meaning in RDDs; see this post, especially the zip
> methods:
> https://stackoverflo
of partitions in mapPartition?
On 13 Sep 2017 19:54, "Ankit Maloo" wrote:
>
> RDDs are fault tolerant, as they can be recomputed using the DAG without
> storing the intermediate RDDs.
>
> On 13-Sep-2017 11:16 PM, "Suzen, Mehmet" wrote:
>>
>> But what h
On 14 September 2017 at 10:42, wrote:
> val noTs = myData.map(dropTimestamp)
>
> val scaled = scaler.transform(noTs)
>
> val projected = (new RowMatrix(scaled)).multiply(principalComponents).rows
>
> val clusters = myModel.predict(projected)
>
> val result = myData.zip(clusters)
>
>
>
> Do you th
Hi Johan,
DataFrames are built on top of RDDs; I am not sure whether the ordering
issues are different there. Maybe you could create a minimal simulated
dataset and an example series of transformations to experiment on.
Best,
-m
Mehmet Süzen, MSc, PhD
You can use Breeze, which is part of the Spark distribution:
https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebra
Check out the modules under import breeze._
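As a small illustration, a Shannon-entropy helper in Breeze; the function name and the bits (log2) convention are my choice, and it assumes a strictly positive probability vector summing to 1:

```scala
import breeze.linalg.{DenseVector, sum}
import breeze.numerics.log2

object EntropySketch {
  def main(args: Array[String]): Unit = {
    // H(p) = -sum_i p_i * log2(p_i), computed elementwise via Breeze.
    def entropy(p: DenseVector[Double]): Double = -sum(p *:* log2(p))

    val uniform = DenseVector(0.25, 0.25, 0.25, 0.25)
    println(entropy(uniform))  // 2.0 bits for a uniform 4-state distribution
  }
}
```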
On 23 May 2018 at 07:04, umargeek wrote:
> Hi Folks,
>
> I am planning to rewrite one of my Python modules written for entropy
> cal