Re: Machine Learning: Flink and MOA

2018-02-23 Thread Theodore Vasiloudis
Hello Christophe, That's very interesting, I've been working with MOA/SAMOA recently and was considering if we could create some easy integration with Flink. I have a Master student this year that could do some work on this, hopefully we can create something interesting there. Regards, Theodore

Re: Flink Scheduling and FlinkML

2017-04-03 Thread Theodore Vasiloudis
Hello Fabio, what you describe sounds very possible, the easiest way to do it would be to save your incoming data in HDFS as you already do if I understand correctly, and then use the batch ALS algorithm [1] to create your recommendations from the static data, which you could do at regular interva

Re: [POLL] Who still uses Java 7 with Flink ?

2017-03-23 Thread Theodore Vasiloudis
Hello all, I'm sure you've considered this already, but what this data does not include is all the potential future users, i.e. slower moving organizations (banks etc.) which could be on Java 7 still. Whether those are relevant is up for debate. Cheers, Theo On Thu, Mar 23, 2017 at 12:14 PM, Ro

A question about iterations and prioritizing "control" over "data" inputs

2017-03-15 Thread Theodore Vasiloudis
Hello all, I've started thinking about online learning in Flink and one of the issues that has come up in other frameworks is the ability to prioritize "control" over "data" events in iterations. To set an example, say we develop an ML model, that ingests events in parallel, performs an aggregati

Re: FlinkML and DataStream API

2016-12-21 Thread Theodore Vasiloudis
Hello Mäki, I think what you would like to do is train a model using batch, and use the Flink streaming API as a way to serve your model and make predictions. While we don't have an integrated way to do that in FlinkML currently, I definitely think that's possible. I know Marton Balassi has been

Re: Understanding connected streams use without timestamps

2016-11-21 Thread Theodore Vasiloudis
cution of the Connected functions (map1/map2 in this case) are not > affected by the timestamps. In other words it is pretty much arbitrary > which input arrives at the CoMapFunction first. > > So I think you did everything correctly. > > Gyula > > Theodore Vasiloudis ezt í

Understanding connected streams use without timestamps

2016-11-21 Thread Theodore Vasiloudis
Hello all, I was playing around with the IncrementalLearningSkeleton example and I had a couple of questions regarding the behavior of connected streams. In the example the elements are assigned timestamps, and there is a stream, model, that produces Double[] elements by ingesting and process

Additional steps needed for the Java quickstart guide

2016-11-16 Thread Theodore Vasiloudis
Hello all, I was preparing an exercise for some Master students and I went through running the Java quickstart setup [1] again to verify everything works as expected. I ran into a problem when running from within IDEA, we've encountered this in the past during trainings. While the quickstart gui

Re: Multiclass classification example

2016-10-19 Thread Theodore Vasiloudis
Hello Kursat, We don't have a multi class classifier in FlinkML currently. Regards, Theodore -- Sent from a mobile device. May contain autocorrect errors. On Oct 19, 2016 12:33 AM, "Kürşat Kurt" wrote: > Hi; > > > I am trying to learn Flink Ml lib. > > Where can i find detailed multiclass cl

Re: FlinkML - Fail to execute QuickStart example

2016-10-17 Thread Theodore Vasiloudis
That is my bad, I must have been testing against a private branch when writing the guide, the SVM as it stands only has a predict operation for Vector not LabeledVector. IMHO I would like to have a predict operator for LabeledVector for all predictors (that would just call the existing Vector pred

Re: SVM Multiclass classification

2016-10-14 Thread Theodore Vasiloudis
Hello Kursat, As noted in the documentation, the SVM implementation is for binary classification only for the time being. Regards, Theodore -- Sent from a mobile device. May contain autocorrect errors. On Oct 13, 2016 8:53 PM, "Kürşat Kurt" wrote: > Hi; > > > > I am trying to classify docume

Re: Flink Iterations vs. While loop

2016-09-06 Thread Theodore Vasiloudis
Have you tried profiling the application to see where most of the time is spent during the runs? If most of the time is spent reading in the data maybe any difference between the two methods is being obscured. -- Sent from a mobile device. May contain autocorrect errors. On Sep 6, 2016 4:55 PM,

Re: Flink Iterations vs. While loop

2016-09-05 Thread Theodore Vasiloudis
Hello Dan, are you broadcasting the 85GB of data then? I don't get why you wouldn't store that file on HDFS so it's accessible by your workers. If you have the full code available somewhere we might be able to help better. For L-BFGS you should only be broadcasting the model (i.e. the weight ve

Re: Having a single copy of an object read in a RichMapFunction

2016-08-08 Thread Theodore Vasiloudis
like it creates multiple copies per co-map operation. I >> use the keyed version to match side inputs with the data. >> >> Sent from my iPhone >> >> On Aug 5, 2016, at 12:36 PM, Theodore Vasiloudis < >> theodoros.vasilou...@gmail.com> wrote: >> >>

Re: Having a single copy of an object read in a RichMapFunction

2016-08-05 Thread Theodore Vasiloudis
uld have used side-inputs. > > Sameer > > > > > On Thu, Aug 4, 2016 at 8:56 PM, Theodore Vasiloudis < > theodoros.vasilou...@gmail.com> wrote: > >> Hello all, >> >> for a prototype we are looking into we would like to read a big matrix >> from HDF

Having a single copy of an object read in a RichMapFunction

2016-08-04 Thread Theodore Vasiloudis
Hello all, for a prototype we are looking into we would like to read a big matrix from HDFS, and for every element that comes in a stream of vectors do one multiplication with the matrix. The matrix should fit in the memory of one machine. We can read in the matrix using a RichMapFunction, but tha

Re: Using ML lib SVM with Java

2016-05-09 Thread Theodore Vasiloudis
Hello Malte, As Simone said there is no Java support currently for FlinkML unfortunately. Regards, Theodore On Mon, May 9, 2016 at 3:05 PM, Simone Robutti wrote: > To my knowledge FlinkML does not support an unified API and most things > must be used exclusively with Scala Datasets. > > 2016-0

Re: FYI: Updated Slides Section

2016-04-05 Thread Theodore Vasiloudis
Hello all, you can find my slides on Large-Scale Machine Learning with FlinkML here (from SICS Data Science day and FOSDEM 2016): http://www.slideshare.net/TheodorosVasiloudis/flinkml-large-scale-machine-learning-with-apache-flink Best, Theodore On Mon, Apr 4, 2016 at 3:19 PM, Rubén Casado wrot

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-21 Thread Theodore Vasiloudis
> at > org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) > at > org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59) > at > org.apache.flink.runtime.operators.sort.UnilateralSortMer

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-21 Thread Theodore Vasiloudis
s.java:52) > at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:577) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:68) > ... 9 more > On Wed, Jan 20, 2016 at 9:45 PM, Stephan Ewen wrote: > Can you again po

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
ependencies are used. > > Alternatively, you could compile an example program with example input > data which can reproduce the problem. Then I could also take a look at it. > > Cheers, > Till > > > On Wed, Jan 20, 2016 at 5:58 PM, Theodore Vasiloudis < > theodoros.vas

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
ll request ( > https://github.com/apache/flink/pull/1528) or from this branch ( > https://github.com/StephanEwen/incubator-flink kryo) and see if that > fixes it? > > > Thanks, > Stephan > > > > > > On Wed, Jan 20, 2016 at 3:33 PM, Theodore Vasiloudis < > t

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
tion of readLibSVM is what's wrong here. I've tried the new version committed recently by Chiwan, but I still get the same error. I'll see if I can spot a bug in readLibSVM. On Wed, Jan 20, 2016 at 1:43 PM, Theodore Vasiloudis < theodoros.vasilou...@gmail.com> wrote: > It

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
apReferenceResolver" - there > should be no reference resolution during serialization / deserialization. > > Can you try what happens when you explicitly register the type > SparseVector at the ExecutionEnvironment? > > Stephan > > > On Wed, Jan 20, 2016 at 11:24 AM, The

Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Theodore Vasiloudis
Hello all, I'm trying to run a job using FlinkML and I'm confused about the source of an error. The job reads a libSVM formatted file and trains an SVM classifier on it. I've tried this with small datasets and everything works out fine. When trying to run the same job on a large dataset (~11GB

Re: compile flink-gelly-scala using sbt

2015-10-28 Thread Theodore Vasiloudis
2 Dresden > E-Mail: d...@se.inf.tu-dresden.de > > On Wed, Oct 28, 2015 at 3:50 PM, Theodore Vasiloudis < > theodoros.vasilou...@gmail.com> wrote: > >> Your build.sbt seems correct. >> It might be that you are missing some basic imports. >> >> In your code

Re: compile flink-gelly-scala using sbt

2015-10-28 Thread Theodore Vasiloudis
Your build.sbt seems correct. It might be that you are missing some basic imports. In your code have you imported import org.apache.flink.api.scala._ ? On Tue, Oct 27, 2015 at 8:45 PM, Vasiliki Kalavri wrote: > Hi Do, > > I don't really have experience with sbt, but one thing that might caus

Re: Scala Breeze Dependencies not resolving when adding flink-ml on build.sbt

2015-10-28 Thread Theodore Vasiloudis
This sounds similar to this problem: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-ML-as-Dependency-td1582.html The reason is (quoting Till, replace gradle with sbt here): the flink-ml pom contains as a dependency an artifact with artifactId > breeze_${scala.binary.ver

Re: Scala Code Generation

2015-10-19 Thread Theodore Vasiloudis
> > You could generate your own case classes which have more than the 22 > fields, though. Actually that is not possible with case classes in Scala 2.10, you would have to use a normal class if you have more than 22 fields. This constraint was removed in 2.11. On Wed, Oct 14, 2015 at 11:42 AM, T

Re: Extracting weights from linear regression model

2015-10-08 Thread Theodore Vasiloudis
Hello Trevor, I assume you are using the MultipleLinearRegression class in a manner similar to our examples, i.e.:
// Create multiple linear regression learner
val mlr = MultipleLinearRegression().setIterations(10).setStepsize(0.5).setConvergenceThreshold(0.001)
// Obtain training and testing data set