MLlib, what online (streaming) algorithms are available?

2014-09-23 Thread aka.fe2s
Hi, I'm looking for the available online ML algorithms (those that improve the model with new streaming data). The only one I found is linear regression. Is there anything else implemented as part of MLlib? Thanks, Oleksiy.
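For reference, the online learner mentioned here is exposed as StreamingLinearRegressionWithSGD; a minimal sketch of wiring it to a DStream (the socket source, input format "label,f1 f2 f3", and feature size are made-up assumptions for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("OnlineLinearRegression")
val ssc = new StreamingContext(conf, Seconds(10))

// Assumed input: lines of the form "label,f1 f2 f3" on a local socket.
val trainingData = ssc.socketTextStream("localhost", 9999).map { line =>
  val Array(label, features) = line.split(",")
  LabeledPoint(label.toDouble, Vectors.dense(features.trim.split(' ').map(_.toDouble)))
}

// The model weights are updated incrementally on every batch of the stream.
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(3))

model.trainOn(trainingData)
model.predictOnValues(trainingData.map(lp => (lp.label, lp.features))).print()

ssc.start()
ssc.awaitTermination()
```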

Re: HdfsWordCount only counts some of the words

2014-09-24 Thread aka.fe2s
I guess it's because this example is stateless, so it outputs counts only for the given RDD (i.e. per batch). Take a look at the stateful word counter, StatefulNetworkWordCount.scala. On Wed, Sep 24, 2014 at 4:29 AM, SK skrishna...@gmail.com wrote: I execute it as follows: $SPARK_HOME/bin/spark-submit --master master url
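For context, the stateful example keeps a running count across batches with updateStateByKey; a rough sketch of that pattern (the checkpoint directory and socket source are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("StatefulWordCount")
val ssc = new StreamingContext(conf, Seconds(2))
ssc.checkpoint("/tmp/checkpoint") // required for stateful operations

val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

// Add the counts from the current batch to the running total kept per word.
val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
  Some(newValues.sum + runningCount.getOrElse(0))

val totalCounts = words.map(w => (w, 1)).updateStateByKey[Int](updateFunc)
totalCounts.print()

ssc.start()
ssc.awaitTermination()
```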

Re: Reading Back a Cached RDD

2016-03-28 Thread aka.fe2s
Nick, what is your use-case? On Thu, Mar 24, 2016 at 11:55 PM, Marco Colombo wrote: > You can persist off-heap, for example with Tachyon, now called Alluxio. > Take a look at off-heap persistence. > > Regards > > > On Thursday, 24 March 2016, Holden Karau
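For reference, the off-heap persistence suggested in the quoted reply is just a storage level on the RDD; a minimal sketch for Spark 1.x, assuming `sc` exists and the external block store is already configured to point at a Tachyon/Alluxio master:

```scala
import org.apache.spark.storage.StorageLevel

// Persist the RDD off-heap (backed by Tachyon/Alluxio in Spark 1.x) so the
// cached blocks live outside the executor JVM heap.
val data = sc.textFile("hdfs:///input/events")
data.persist(StorageLevel.OFF_HEAP)
data.count() // an action is needed to materialize the cached blocks
```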

ml and mllib persistence

2016-07-12 Thread aka.fe2s
What is the reason Spark has individual implementations of the read/write routines for every model in mllib and ml (the Saveable and MLWritable trait impls)? Wouldn't a generic implementation via the Java serialization mechanism work? I would like to use it to store the models in a custom storage. --
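For context, the per-model routines in question look roughly like this on the ml side; a small sketch, where `trainingDf` (a DataFrame with "label" and "features" columns) and the save path are made-up placeholders:

```scala
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

// Each ml model implements MLWritable, so persistence goes through the model's
// own writer rather than plain Java serialization.
val lr = new LogisticRegression().setMaxIter(10)
val model = lr.fit(trainingDf)

model.write.overwrite().save("/models/lr")
val restored = LogisticRegressionModel.load("/models/lr")
```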

Re: ml and mllib persistence

2016-07-12 Thread aka.fe2s
Okay, I think I found an answer to my question. Some models (for instance org.apache.spark.mllib.recommendation.MatrixFactorizationModel) hold RDDs, so just serializing these objects will not work. -- Oleksiy Dyagilev On Tue, Jul 12, 2016 at 5:40 PM, aka.fe2s <aka.f...@gmail.com> wrote:
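For reference, that is why such models ship their own save/load that writes the underlying RDDs out to storage; a small sketch, assuming `sc` and an RDD of ratings exist, with the path made up:

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// ratings: RDD[Rating] -- assumed to exist for illustration
val model = ALS.train(ratings, /* rank = */ 10, /* iterations = */ 10)

// save() writes the user/product factor RDDs to storage instead of
// serializing the model's object graph.
model.save(sc, "hdfs:///models/als")
val reloaded = MatrixFactorizationModel.load(sc, "hdfs:///models/als")
```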

Re: location of a partition in the cluster / how the parallelize method distributes the RDD partitions over the cluster.

2016-07-12 Thread aka.fe2s
The local collection is distributed across the cluster only when you run an action (http://spark.apache.org/docs/latest/programming-guide.html#actions), due to the laziness of RDDs. If you want to control how the collection is split into partitions, you can create your own RDD implementation and implement
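For context, the simplest knob is the numSlices argument of parallelize; a quick sketch of inspecting how a local collection gets split, assuming `sc` exists (values are illustrative):

```scala
// Split the local collection into 4 partitions explicitly.
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// Nothing is distributed until an action runs; glom() groups the elements of
// each partition into an array so the actual split is visible after collect().
rdd.glom().collect().zipWithIndex.foreach { case (part, i) =>
  println(s"partition $i -> ${part.length} elements")
}
```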

off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread aka.fe2s
Hi folks, What has happened with Tachyon / Alluxio in Spark 2? The docs no longer mention it. -- Oleksiy Dyagilev
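For reference, in Spark 2 the Tachyon-backed storage level is gone; off-heap caching is controlled by Spark's own memory settings, and Alluxio is typically used simply as a Hadoop-compatible filesystem URI. A hedged sketch (the host/port, paths, and sizes are made up, and the Alluxio client jar is assumed to be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("OffHeapExample")
  // Off-heap execution/storage memory managed by Spark itself (no external store).
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")
  .getOrCreate()

// Alluxio is accessed like any other filesystem via its URI scheme.
val df = spark.read.parquet("alluxio://alluxio-master:19998/data/events")
df.write.parquet("alluxio://alluxio-master:19998/data/events_out")
```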

Re: LabeledPoint creation

2016-09-07 Thread aka.fe2s
It has 4 categories: a = [1, 0, 0], b = [0, 0, 0], c = [0, 1, 0], d = [0, 0, 1]. -- Oleksiy Dyagilev On Wed, Sep 7, 2016 at 10:42 AM, Madabhattula Rajesh Kumar < mrajaf...@gmail.com> wrote: > Hi, > > Any help on the above mail use case? > > Regards, > Rajesh > > On Tue, Sep 6, 2016 at 5:40 PM, Madabhattula Rajesh
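For context, this is the dropLast behaviour of Spark's one-hot encoding: with 4 categories the output vectors have size 3, and whichever category receives the last index becomes the all-zeros vector. A minimal sketch using the Spark 2.x-era API, with toy data and an assumed existing `spark` session:

```scala
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

// Toy data: a single categorical column with 4 distinct values.
val df = spark.createDataFrame(Seq(
  (0, "a"), (1, "b"), (2, "c"), (3, "d")
)).toDF("id", "category")

// Map the string categories to numeric indices first.
val indexed = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
  .transform(df)

// dropLast = true (the default) encodes 4 categories into vectors of size 3.
val encoded = new OneHotEncoder()
  .setInputCol("categoryIndex")
  .setOutputCol("categoryVec")
  .transform(indexed)

encoded.select("category", "categoryVec").show()
```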

Re: How to write data into CouchBase using Spark & Scala?

2016-09-07 Thread aka.fe2s
Most likely you are missing an import statement that enables some Scala implicits. I haven't used this connector, but it looks like you need "import com.couchbase.spark._". -- Oleksiy Dyagilev On Wed, Sep 7, 2016 at 9:42 AM, Devi P.V wrote: > I am a newbie in CouchBase. I am

static dataframe to streaming

2019-11-05 Thread aka.fe2s
Hi All, What is the most efficient way of converting a static dataframe to a streaming one (structured streaming)? I have a custom sink implemented for structured streaming and I would like to use it to write a static dataframe. I know that I can write a dataframe to files and then source them to a
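For what it's worth, the file-based workaround mentioned here looks roughly like this; a sketch, where the paths and the custom sink's format name are all placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StaticToStreaming").getOrCreate()

// 1. Write the static dataframe out to files.
val staticDf = spark.read.parquet("/data/static_input")
staticDf.write.mode("overwrite").parquet("/data/staging")

// 2. Read the same directory back as a stream (a schema must be supplied up front).
val streamingDf = spark.readStream
  .schema(staticDf.schema)
  .parquet("/data/staging")

// 3. Feed the stream into the custom sink ("my.custom.sink" is a placeholder format name).
val query = streamingDf.writeStream
  .format("my.custom.sink")
  .option("checkpointLocation", "/data/checkpoints/static_to_streaming")
  .start()

query.awaitTermination()
```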