Re: How to run wordcount program on Multi-node Cluster.

2017-08-08 Thread Felix Neutatz
Hi, as Timo said, you need a distributed filesystem, e.g. HDFS. Best regards, Felix On Aug 8, 2017 09:01, "P. Ramanjaneya Reddy" wrote: Hi Timo, how can the files be made accessible across TMs? Thanks & Regards, Ramanji. On Mon, Aug 7, 2017 at 7:45 PM, Timo Walther

[jira] [Created] (FLINK-7029) Documentation for WindowFunction is confusing

2017-06-28 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-7029: Summary: Documentation for WindowFunction is confusing Key: FLINK-7029 URL: https://issues.apache.org/jira/browse/FLINK-7029 Project: Flink Issue Type

Re: Wish to Contribute - Andrea Spina

2017-02-17 Thread Felix Neutatz
Great to have you Andrea :) On Feb 17, 2017 15:21, "Aljoscha Krettek" wrote: > Welcome to the community, Andrea! :-) > > On Fri, 17 Feb 2017 at 10:22 Fabian Hueske wrote: > > > Hi Andrea, > > > > welcome to the community! > > I gave you Contributor

Re: New Flink team member - Kate Eri.

2017-02-13 Thread Felix Neutatz
ports > > between > > > engines / native solvers (i.e. CPU/GPU). > > > > > > https://github.com/apache/mahout/tree/master/viennacl-omp > > > https://github.com/apache/mahout/tree/master/viennacl > > > > > > Best, > > > tg >

Re: New Flink team member - Kate Eri.

2017-02-08 Thread Felix Neutatz
SystemML provides GPU support. I have seen > SystemML’s source code and would like to ask: why have you decided to > implement your own integration with CUDA? Did you try to consider ND4J, or > because it is younger, you support your own implementation? > > Tue, Feb 7, 2017 at 18:35,

Re: New Flink team member - Kate Eri.

2017-02-07 Thread Felix Neutatz
Hi Katherin, we are also working in a similar direction. We implemented a prototype to integrate with SystemML: https://github.com/apache/incubator-systemml/pull/119 SystemML provides many different matrix formats, operations, GPU support and a couple of DL algorithms. Unfortunately, we realized

Re: [DISCUSS] FLIP-5 Only send data to each taskmanager once for broadcasts

2016-12-02 Thread Felix Neutatz
GMT+02:00 Till Rohrmann <trohrm...@apache.org>: > >> Cool first version Felix :-) >> >> On Wed, Aug 10, 2016 at 6:48 PM, Stephan Ewen <se...@apache.org> wrote: >> >> > Cool, nice results! >> > >> > For the iteration unspecializatio

Re: [DISCUSS] FLIP-5 Only send data to each taskmanager once for broadcasts

2016-11-10 Thread Felix Neutatz
> For the iteration unspecialization - we probably should design this hand > in > > hand with the streaming fault tolerance, as they share the notion of > > "intermediate result versions". > > > > > > On Wed, Aug 10, 2016 at 6:09 PM, Felix Neutatz <neut...@go

Re: [DISCUSS] FLIP-5 Only send data to each taskmanager once for broadcasts

2016-08-10 Thread Felix Neutatz
ound the problem with blocking intermediate results > in > > iterations, then we have to make FLINK-1713 a blocker for this new issue. > > But maybe you can also keep the current broadcasting mechanism to be used > > within iterations only. Then we can address the ite

[DISCUSS] FLIP-5 Only send data to each taskmanager once for broadcasts

2016-07-22 Thread Felix Neutatz
Hi everybody, I want to improve the performance of broadcasts in Flink. Therefore Till told me to start a FLIP on this topic to discuss how to go forward to solve the current issues for broadcasts. The problem in a nutshell: Instead of sending data to each taskmanager only once, at the moment
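
The scaling problem named here (and in the FLINK-4175 title further down) can be illustrated with a back-of-the-envelope calculation. The following is only a sketch in plain Java, not Flink code: the class and method names are hypothetical, and it assumes a simplified model in which the current mechanism ships one copy of the broadcast set per task slot, while the goal of the FLIP is one copy per TaskManager.

```java
// Simplified traffic model for broadcast sets; NOT Flink code.
// Assumption: the current behaviour ships one copy per task slot,
// the proposed behaviour ships one copy per TaskManager.
final class BroadcastVolume {
    private BroadcastVolume() {}

    /** Bytes shipped when every slot on every TaskManager receives its own copy. */
    static long perSlotBytes(long broadcastBytes, int taskManagers, int slotsPerTaskManager) {
        return broadcastBytes * taskManagers * slotsPerTaskManager;
    }

    /** Bytes shipped when each TaskManager receives the data exactly once. */
    static long perTaskManagerBytes(long broadcastBytes, int taskManagers) {
        return broadcastBytes * taskManagers;
    }

    public static void main(String[] args) {
        long size = 100L * 1024 * 1024; // 100 MiB broadcast set (made-up number)
        int taskManagers = 10;          // made-up cluster size
        for (int slots : new int[] {1, 4, 16}) {
            System.out.printf("slots=%d perSlot=%d perTaskManager=%d%n",
                    slots,
                    perSlotBytes(size, taskManagers, slots),
                    perTaskManagerBytes(size, taskManagers));
        }
    }
}
```

Under this model the per-slot volume grows linearly with the number of slots per TaskManager, while the per-TaskManager volume stays constant.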

Re: Broadcast data sent increases with # slots per TM

2016-07-07 Thread Felix Neutatz
weekend, Felix P.S. for super curious people: https://github.com/FelixNeutatz/incubator-flink/commit/7d79d4dfe3f18208a73d6b692b3909f9c69a1da7 2016-06-09 11:50 GMT+02:00 Felix Neutatz <neut...@googlemail.com>: > Hi everybody, > > could we use the org.apache.flink.api.common.cache.D

[jira] [Created] (FLINK-4175) Broadcast data sent increases with # slots per TM

2016-07-07 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-4175: Summary: Broadcast data sent increases with # slots per TM Key: FLINK-4175 URL: https://issues.apache.org/jira/browse/FLINK-4175 Project: Flink Issue Type

Re: Broadcast data sent increases with # slots per TM

2016-06-09 Thread Felix Neutatz
Hi everybody, could we use the org.apache.flink.api.common.cache.DistributedCache to work around this Broadcast issue for the moment, until we have fixed it? Or do you think it won't scale either? Best regards, Felix 2016-06-09 10:57 GMT+02:00 Stephan Ewen : > Till is right.

Re: A soft reminder

2015-07-27 Thread Felix Neutatz
Hi, I also encountered the EOF exception for a delta iteration with more data. With less data it works ... Best regards, Felix On 27.07.2015 at 10:25 AM, Andra Lungu lungu.an...@gmail.com wrote: Hi Stephan, I tried to debug a bit around the EOF Exception. It seems that I am pretty useless

serialization issue

2015-07-09 Thread Felix Neutatz
Hi, I want to use t-digest by Ted Dunning ( https://github.com/tdunning/t-digest/blob/master/src/main/java/com/tdunning/math/stats/ArrayDigest.java) on Flink. Locally that works perfectly. But on the cluster I get the following error: java.lang.Exception: Call to registerInputOutput() of

Re: Building several models in parallel

2015-07-08 Thread Felix Neutatz
models on partitions of my data with mapPartition and the model-parameters (weights) as broadcast variable. If I understood broadcast variables in Flink correctly, you should end up with one model on each TaskManager. Does that work? Felix On 07.07.2015 at 17:32, Felix Neutatz wrote: Hi

Re: Read 727 gz files ()

2015-07-07 Thread Felix Neutatz
, 2015 at 3:50 PM, Felix Neutatz neut...@googlemail.com wrote: So do you know how to solve this issue apart from increasing the current file-max (4748198)? 2015-07-06 15:35 GMT+02:00 Stephan Ewen se...@apache.org: I think the error is pretty much exactly in the stack trace: Caused

Building several models in parallel

2015-07-07 Thread Felix Neutatz
) Is there any way besides an iteration to do this at the moment? Thanks for your help, Felix Neutatz

Read 727 gz files ()

2015-07-06 Thread Felix Neutatz
Hi, I want to do some simple aggregations on 727 gz files (68 GB total) from HDFS. See code here: https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/Stats.scala We are using a Flink-0.9 SNAPSHOT. I get the following error:

Re: Read 727 gz files ()

2015-07-06 Thread Felix Neutatz
-964b-4883-8eee-12869b9476ab/ 995a38a2c92536383d0057e3482999a9.000329.channel (Too many open files in system) On Mon, Jul 6, 2015 at 3:31 PM, Felix Neutatz neut...@googlemail.com wrote: Hi, I want to do some simple aggregations on 727 gz files (68 GB total) from HDFS. See code here

Hadoopinputformat for numpy and .mat matrices

2015-07-01 Thread Felix Neutatz
Hi everybody, does anybody know whether there is an implementation of the hadoopinputformats for matrices in numpy or .mat format? This would be really needed since a lot of machine learning data is stored in these formats. Thanks for your help, Felix

Documentation not reachable

2015-06-29 Thread Felix Neutatz
Hi, when I want to access the documentation on the website I get the following error: http://ci.apache.org/projects/flink/flink-docs-release-0.9 Service Unavailable The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again

Re: Apache Flink 0.9 ALS API

2015-06-13 Thread Felix Neutatz
Hi Ronny, I agree with you and I would go even further and generalize it overall. So that the movieID could be of type Long or Int and the userID of type String. This would increase usability of the ALS implementation :) Best regards, Felix 2015-06-10 11:28 GMT+02:00 Ronny Bräunlich

[jira] [Created] (FLINK-2208) Built error for IBM Java

2015-06-11 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-2208: Summary: Built error for IBM Java Key: FLINK-2208 URL: https://issues.apache.org/jira/browse/FLINK-2208 Project: Flink Issue Type: Bug Components

Re: built problem - flink 0.9-SNAPSHOT

2015-06-11 Thread Felix Neutatz
done: https://issues.apache.org/jira/browse/FLINK-2208 2015-06-12 0:50 GMT+02:00 Ufuk Celebi u...@apache.org: On 12 Jun 2015, at 00:42, Felix Neutatz neut...@googlemail.com wrote: Yes, it is on an IBM PowerPC machine. So we change that in the documentation to all Java 7,8 ( except IBM

built problem - flink 0.9-SNAPSHOT

2015-06-11 Thread Felix Neutatz
Hi, the documentation says: It [the build of the 0.9 snapshot] works well with OpenJDK 6 and all Java 7 and 8 compilers. But I got the following error: [INFO] --- scala-maven-plugin:3.1.4:compile (scala-compile-first) @ flink-runtime --- [INFO]

Run scala.App on Cluster

2015-06-10 Thread Felix Neutatz
Hi, I try to run this Scala program: https://github.com/FelixNeutatz/wikiTrends/blob/master/extraction/src/main/scala/io/sanfran/wikiTrends/extraction/flink/DownloadTopKPages.scala on a cluster. I tried this command: /share/flink/flink-0.9-SNAPSHOT/bin/flink run

Re: Run scala.App on Cluster

2015-06-10 Thread Felix Neutatz
Thanks :) works like a charm. 2015-06-10 22:28 GMT+02:00 Fabian Hueske fhue...@gmail.com: Hi, use ./bin/flink run -c your.MainClass yourJar to specify the Main class. Check the documentation of the CLI client for details. Cheers, Fabian On Jun 10, 2015 22:24, Felix Neutatz neut

Re: Problem with ML pipeline

2015-06-08 Thread Felix Neutatz
operation. Cheers! Sachin On Jun 8, 2015 10:17 AM, Felix Neutatz neut...@googlemail.com wrote: Probably we also need it for the other classes of the pipeline as well, in order to be able to pass the ID through the whole pipeline. Best regards, Felix

Re: Problem with ML pipeline

2015-06-06 Thread Felix Neutatz
. Cheers, Till On Jun 4, 2015 7:30 PM, Felix Neutatz neut...@googlemail.com wrote: Hi, I have the following use case: I want to do regression for a timeseries dataset like: id, x1, x2, ..., xn, y id = point in time x = features y = target value In the Flink framework I

Re: ALS implementation

2015-06-05 Thread Felix Neutatz
to your function should help. see: http://mail-archives.apache.org/mod_mbox/flink-user/201504.mbox/%3ccanc1h_vffbqyyiktzcdpihn09r4he4oluiursjnci_rwc+c...@mail.gmail.com%3E Cheers, Andra On Thu, Jun 4, 2015 at 7:07 PM, Felix Neutatz neut...@googlemail.com wrote: after bug fix

Problem with ML pipeline

2015-06-04 Thread Felix Neutatz
Hi, I have the following use case: I want to do regression for a timeseries dataset like: id, x1, x2, ..., xn, y id = point in time x = features y = target value In the Flink framework I would map this to a LabeledVector (y, DenseVector(x)). (I don't want to use the id as a feature) When I
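
One way to picture the requirement above (keep the id out of the feature vector, but still carry it along with each example) is sketched below in plain Java. This is not FlinkML code: `IdLabeledPoint` and `fromRow` are hypothetical names introduced only for illustration, and the row layout `id, x1, ..., xn, y` is taken from the message.

```java
import java.util.Arrays;

// Hypothetical container; NOT part of FlinkML. It pairs a time-step id with
// the (label, features) data that would otherwise go into a LabeledVector.
final class IdLabeledPoint {
    final long id;           // point in time, carried along but never used as a feature
    final double label;      // y, the target value
    final double[] features; // x1..xn

    IdLabeledPoint(long id, double label, double[] features) {
        this.id = id;
        this.label = label;
        this.features = features;
    }

    /** Splits a row laid out as: id, x1, ..., xn, y. */
    static IdLabeledPoint fromRow(double[] row) {
        if (row.length < 2) {
            throw new IllegalArgumentException("row needs at least an id and a target value");
        }
        long id = (long) row[0];
        double label = row[row.length - 1];
        double[] features = Arrays.copyOfRange(row, 1, row.length - 1);
        return new IdLabeledPoint(id, label, features);
    }

    public static void main(String[] args) {
        IdLabeledPoint p = fromRow(new double[] {7, 1.5, 2.5, 9.0});
        System.out.println(p.id + " " + p.label + " " + Arrays.toString(p.features));
    }
}
```

The id travels next to the training data instead of inside it, so it can be used to match predictions back to points in time.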

ALS implementation

2015-06-04 Thread Felix Neutatz
Hi, I played a bit with the ALS recommender algorithm. I used the MovieLens dataset: http://files.grouplens.org/datasets/movielens/ml-latest-README.html The rating matrix has 21,063,128 entries (ratings). I ran the algorithm with 3 configurations: 1. standard JVM heap space: val als = ALS()

Re: MultipleLinearRegression - Strange results

2015-06-02 Thread Felix Neutatz
-master/libs/ml/multiple_linear_regression.html On Mon, Jun 1, 2015 at 11:48 AM, Felix Neutatz neut...@googlemail.com wrote: Hi, I want to use MultipleLinearRegression, but I got really strange results. So I tested it with the housing price dataset

MultipleLinearRegression - Strange results

2015-06-01 Thread Felix Neutatz
Hi, I want to use MultipleLinearRegression, but I got really strange results. So I tested it with the housing price dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data And here I get negative house prices - even when I use the training set as dataset:

Re: Parquet Article / Tutorial

2015-05-11 Thread Felix Neutatz
to keep defaults? On Fri, Apr 24, 2015 at 11:19 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Thanks Felix, Thanks for the response! I'm looking forward to using it! On Apr 24, 2015 9:01 PM, Felix Neutatz neut...@googlemail.com wrote: Hi Flavio, in Thrift you can try

Re: New project website

2015-05-11 Thread Felix Neutatz
Hi Ufuk, I really like the idea of redesigning the start page. But in my opinion your page design looks more like a documentation webpage than a starting page. Personally, I like the current design better, since you get a really quick overview with many fancy pictures. (So if you

[jira] [Created] (FLINK-1939) Add Parquet Documentation to Wiki

2015-04-26 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1939: Summary: Add Parquet Documentation to Wiki Key: FLINK-1939 URL: https://issues.apache.org/jira/browse/FLINK-1939 Project: Flink Issue Type: Task

Re: Parquet Article / Tutorial

2015-04-24 Thread Felix Neutatz
be great! 2015-04-07 12:45 GMT+02:00 Kostas Tzoumas ktzou...@apache.org: Looks very nice! Would love to see a blog post on that! On Mon, Apr 6, 2015 at 7:19 PM, Felix Neutatz neut...@googlemail.com wrote

TableAPI - Join on two keys

2015-04-16 Thread Felix Neutatz
Hi, I want to join two tables in the following way: case class WeightedEdge(src: Int, target: Int, weight: Double) case class Community(communityID: Int, nodeID: Int) case class CommunitySumTotal(communityID: Int, sumTotal: Double) val communities: DataSet[Community] val weightedEdges:

[jira] [Created] (FLINK-1899) Table API Bug

2015-04-16 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1899: Summary: Table API Bug Key: FLINK-1899 URL: https://issues.apache.org/jira/browse/FLINK-1899 Project: Flink Issue Type: Bug Components: Expression

Fwd: Collect of user defined classes doesn't work

2015-04-14 Thread Felix Neutatz
Hi, I want to run the following example: import org.apache.flink.api.scala._ case class EdgeType(src: Int, target: Int) object Test { def main(args: Array[String]) { implicit val env = ExecutionEnvironment.getExecutionEnvironment val graphEdges = readEdges("edges.csv")

Re: Collect of user defined classes doesn't work

2015-04-14 Thread Felix Neutatz
in the DataSet.scala file? Does it contain parentheses or not? On Tue, Apr 14, 2015 at 3:48 PM, Felix Neutatz neut...@googlemail.com wrote: I use the latest maven snapshot: <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-scala</artifactId> <version>0.9-SNAPSHOT</version>

Re: Collect of user defined classes doesn't work

2015-04-14 Thread Felix Neutatz
GMT+02:00 Robert Metzger rmetz...@apache.org: Hi, which version of Flink are you using? On Tue, Apr 14, 2015 at 3:36 PM, Felix Neutatz neut...@googlemail.com wrote: Hi, I want to run the following example: import org.apache.flink.api.scala._ case class EdgeType(src: Int

Should collect() and count() be treated as data sinks?

2015-04-02 Thread Felix Neutatz
Hi, I have run the following program: final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); List l = Arrays.asList(new Tuple1<Long>(1L)); TypeInformation t = TypeInfoParser.parse("Tuple1<Long>"); DataSet<Tuple1<Long>> data = env.fromCollection(l, t); long value = data.count();

Re: Planning Release 0.8.1

2015-02-10 Thread Felix Neutatz
would like to merge this to the release-0.8 branch as well: https://github.com/apache/flink/pull/376 After that, all important changes are in the branch. I'm going to create a release candidate once #376 is merged. On Mon, Feb 9, 2015 at 3:03 PM, Felix Neutatz neut...@googlemail.com wrote

Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Felix Neutatz
Hi, my use case is the following: I have a Tuple2<String,Long>. I want to group by the String and sum up the Long values accordingly. This works fine with these lines: DataSet<Lineitem> lineitems = getLineitemDataSet(env); lineitems.project(new int []{3,0}).groupBy(0).aggregate(Aggregations.SUM,
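
The result expected from the group-and-sum step described above can be stated in plain Java, independent of Flink. This is only a reference sketch of the semantics (group by the String, sum the Long values), not the Flink DataSet API; the class name `GroupAndSum` and the sample keys are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Reference semantics only; NOT Flink code.
final class GroupAndSum {
    private GroupAndSum() {}

    /** Groups the pairs by their String key and sums the Long values per key. */
    static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>(); // TreeMap keeps keys in sorted order
        for (Map.Entry<String, Long> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("R", 2L)); // made-up sample data
        pairs.add(new SimpleEntry<>("N", 5L));
        pairs.add(new SimpleEntry<>("R", 3L));
        System.out.println(sumByKey(pairs));
    }
}
```

A small in-memory reference like this can be compared against the output of the distributed job on a sample of the data when the grouped result looks suspicious.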

[jira] [Created] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction

2015-01-21 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1428: Summary: Typos in Java code example for RichGroupReduceFunction Key: FLINK-1428 URL: https://issues.apache.org/jira/browse/FLINK-1428 Project: Flink Issue