Nested Iterations Outlook

2015-07-17 Thread Maximilian Alber
Hi Flinksters, as far as I know, there is still no support for nested iterations planned. Am I right? So my question is how such use cases should be handled in the future. More specifically: once pinning/caching becomes available, do you suggest using that feature and programming in Spark style? Or is

Map and Reduce cycle

2015-07-17 Thread Bill Sparks
Does Flink require all the map tasks to finish before the reducers can proceed, like Spark, or can the reducer operations start before all the mappers have finished, like the older Hadoop MapReduce? Also, my understanding is that Flink manages its own heap; do you/we have a sense of the

Re: Scala: registerAggregationConvergenceCriterion

2015-07-17 Thread Till Rohrmann
Hi Max, I’d recommend you to use the DataSet[T].iterateWithTermination method instead. It has the following syntax: iterateWithTermination(maxIterations: Int)(stepFunction: (DataSet[T]) => (DataSet[T], DataSet[_])): DataSet[T] There you see that your step function has to return a tuple of data
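
For reference, a minimal runnable sketch of the method described above; the halving step and the > 1.0 threshold are made-up examples, not from the thread:

import org.apache.flink.api.scala._

object IterateWithTerminationExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val initial: DataSet[Double] = env.fromElements(100.0, 200.0)

    // iterate at most 100 times; the iteration also stops as soon as the
    // second DataSet of the returned tuple (the termination criterion) is empty
    val result = initial.iterateWithTermination(100) { current =>
      val next = current.map(_ / 2.0)
      val termination = next.filter(_ > 1.0) // empty once all values have converged
      (next, termination)
    }

    result.print() // recent Flink versions execute eagerly on print(); older ones need env.execute()
  }
}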

Re: Scala: registerAggregationConvergenceCriterion

2015-07-17 Thread Maximilian Alber
Thanks Till! That should work for me. Cheers, Max On Fri, Jul 17, 2015 at 4:13 PM, Till Rohrmann trohrm...@apache.org wrote: Hi Max, I’d recommend you to use the DataSet[T].iterateWithTermination method instead. It has the following syntax: iterateWithTermination(maxIterations:

Scala: registerAggregationConvergenceCriterion

2015-07-17 Thread Maximilian Alber
Hi Flinksters, I'm trying to use BulkIterations with a convergence criterion. Unfortunately, I'm not sure how to use them, and I couldn't find a nice example. Here are two code snippets and the resulting error; maybe someone can help. I'm working on the current branch. Example 1: if(true){ val

Re: One improvement suggestion: “flink-xx-jobmanager-linux-3lsu.log” file can't be automatically recovered/detected after accidental deletion

2015-07-17 Thread Maximilian Michels
Hi Chenliang, I've posted a comment in the associated JIRA issue: https://issues.apache.org/jira/browse/FLINK-2367 Thanks, Max On Fri, Jul 17, 2015 at 8:27 AM, Chenliang (Liang, DataSight) chenliang...@huawei.com wrote: *One improvement suggestion, please check if it is valid?* For

Re: Submitting jobs from within Scala code

2015-07-17 Thread Stephan Ewen
Seems that version mismatches are one of the most common sources of issues... Maybe we should think about putting a version number into the messages (at least between client and JobManager) and fail fast on version mismatches... On Thu, Jul 16, 2015 at 5:56 PM, Till Rohrmann trohrm...@apache.org

Streaming window : count with timeout ?

2015-07-17 Thread LINZ, Arnaud
Hello, The data in my stream have a timestamp that may be slightly out of order, but I need to process the data in the proper order. To do this, I use a windowing function and sort the items in a flatMap. However, the source may sometimes send data in “bulk batches” and sometimes “on the
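
A minimal, Flink-agnostic sketch of the buffer-and-sort step described above; Event and the window size are made-up names, not from the thread:

object SortedWindowSketch {
  case class Event(timestamp: Long, payload: String)

  // Given one window's worth of possibly out-of-order events, re-establish
  // timestamp order before handing them downstream -- the flatMap step above.
  def sortWindow(window: Seq[Event]): Seq[Event] =
    window.sortBy(_.timestamp)

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(3, "c"), Event(1, "a"), Event(2, "b"))
    // group into windows of up to 100 elements, sort each window independently
    val ordered = events.grouped(100).flatMap(sortWindow).toList
    println(ordered) // List(Event(1,a), Event(2,b), Event(3,c))
  }
}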

HDFS directory rename

2015-07-17 Thread Flavio Pompermaier
Hi to all, in my Flink job I wanted to move a folder (containing other folders and files) to another location. For example, I wanted to move folder A to folder Y, where my HDFS looks like: myRootDir/X/a/aa/aaa/someFile1 myRootDir/X/b/bb/bbb/someFile2 myRootDir/Y I tried to use rename but it

Re: HDFS directory rename

2015-07-17 Thread fhueske
Do you want to move the folder within a running job? This might cause a lot of problems, because you cannot (easily) control when a move command would be executed. Wouldn’t it be a better idea to do that after a job is finished and use the regular HDFS client? From: Flavio Pompermaier

Re: Flink Kafka example in Scala

2015-07-17 Thread Till Rohrmann
These two links [1, 2] might help to get your job running. The first link describes how to set up a job using Flink's machine learning library, but it works also for the flink-connector-kafka library. Cheers, Till [1] http://stackoverflow.com/a/31455068/4815083 [2]

Re: Flink Scala performance

2015-07-17 Thread Stephan Ewen
The 349ms is how long it takes to run the job. The 18s is what it takes the command line client to submit the job. Like I said before, maybe there are super long delays on your system when you spawn JVMs, or in your DNS resolution. That way, connecting to the cluster to submit the job will take

Re: No accumulator results in streaming

2015-07-17 Thread Stephan Ewen
Hi Arnaud! In 0.9.0, the streaming API does not forward accumulators. In the next version, it will, and it will actually update them live so that you can retrieve continuously updating accumulators of a streaming job in your client, and in the web frontend. We merged the first part for that on

Re: HDFS directory rename

2015-07-17 Thread Flavio Pompermaier
Of course I move the folder before the job starts or ends :) My job does some transformation on the raw data and puts the results in another folder. The next time the job is executed, it checks whether the output folder exists and, if so, moves that folder to an archive dir. I wanted to use the
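
For reference, a minimal sketch of that archive step with the plain Hadoop HDFS client, as suggested earlier in the thread; all paths are made-up examples, and the archive parent directory is assumed to exist, since rename does not create parents:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ArchiveOutputDir {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration()) // picks up fs.defaultFS from the Hadoop config

    val out     = new Path("/myRootDir/output")
    val archive = new Path("/myRootDir/archive/output-" + System.currentTimeMillis())

    // If a previous run left output behind, move the whole folder aside
    // before the job starts; rename moves directories recursively.
    if (fs.exists(out) && !fs.rename(out, archive)) {
      sys.error(s"Could not move $out to $archive")
    }
    fs.close()
  }
}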

Re: Submitting jobs from within Scala code

2015-07-17 Thread Till Rohrmann
This should be rather easy to add with the latest addition of the ActorGateway and the message decoration. On Fri, Jul 17, 2015 at 5:04 PM, Stephan Ewen se...@apache.org wrote: Seems that version mismatches are one of the most common sources of issues... Maybe we should think about putting

Re: OutOfMemoryException: unable to create native thread

2015-07-17 Thread Stephan Ewen
Right now, I would go with the extra field. The roadmap has pending features that improve the scheduling for plans like yours (with many data sources), but they are not yet in the code. On Fri, Jul 17, 2015 at 11:24 AM, chan fentes chanfen...@gmail.com wrote: I am testing my regex file input

Re: Flink Kafka runtime error

2015-07-17 Thread Stephan Ewen
Hi Wendong! The streaming connectors are not in Flink's system classpath, because they depend on many libraries (zookeeper, asm, protocol buffers), and we want to keep the default dependencies slim. This reduces version conflicts for users whose code depends on these libraries. As a
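
For reference, a minimal build.sbt sketch of the usual consequence of this -- bundling the connector into the user jar. This assumes the sbt-assembly plugin is set up in project/plugins.sbt; the artifact names follow the 0.9.0 versions mentioned in this thread:

// Flink core is on the cluster's system classpath already, so mark it provided;
// the Kafka connector is not, so let sbt-assembly pack it into the fat jar.
libraryDependencies ++= Seq(
  "org.apache.flink" % "flink-scala" % "0.9.0" % "provided",
  "org.apache.flink" % "flink-connector-kafka" % "0.9.0"
)
// build the job jar with: sbt assembly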

Flink deadLetters

2015-07-17 Thread Flavio Pompermaier
Hi to all, my job seems to be stuck and nothing is logged, even in debug mode. The only strange thing is a "Received message SendHeartbeat at akka://flink/user/taskmanager_1 from Actor[akka://flink/deadLetters]" entry. Could it be a symptom of a problem? Best, Flavio

Re: Containment Join Support

2015-07-17 Thread Martin Junghanns
Hi Fabian, hi Stephan, thanks for answering my question. Good hint with the list replication, I will benchmark this vs. cross + filter. Best, Martin On 17.07.2015 at 11:17, Stephan Ewen wrote: I would rewrite this to replicate the list into tuples: foreach x in list: emit (x, list) Then
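
A minimal sketch of that replication idea in the Flink Scala DataSet API; the sample data and names are made-up, not from the thread:

import org.apache.flink.api.scala._

object ContainmentJoinSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val lists: DataSet[Seq[Int]] = env.fromElements(Seq(1, 2, 3), Seq(2, 4))
    val elements: DataSet[Int]   = env.fromElements(2, 3)

    // foreach x in list: emit (x, list) -- containment becomes an equi-join key
    val replicated = lists.flatMap(list => list.map(x => (x, list)))

    // equi-join on the replicated element instead of cross + filter
    val contained = replicated
      .join(elements)
      .where(_._1)
      .equalTo(x => x)
      .map(pair => pair._1._2) // the lists that contain the joined element

    contained.print()
  }
}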

Re: OutOfMemoryException: unable to create native thread

2015-07-17 Thread chan fentes
I am testing my regex file input format, but because I have a workflow that depends on the filename (each filename contains a number that I need), I need to add another field to each of my tuples. What is the best way to avoid this additional field, which I only need for grouping and one

Re: Flink deadLetters

2015-07-17 Thread Till Rohrmann
That is usually nothing to worry about. This just means that the message was sent without specifying a sender. What Akka then does is to use the `/deadLetters` actor as the sender. What kind of job is it? Cheers, Till On Fri, Jul 17, 2015 at 6:30 PM, Flavio Pompermaier pomperma...@okkam.it

Re: Flink Kafka example in Scala

2015-07-17 Thread Wendong
Hi Aljoscha, Yes, the flink-connector-kafka jar file is under the Flink lib directory: flink-0.9.0/lib/flink-connector-kafka-0.9.0.jar and it shows that the KafkaSink class exists: $ jar tf lib/flink-connector-kafka-0.9.0.jar | grep KafkaSink

Re: Flink Kafka example in Scala

2015-07-17 Thread Wendong
Hi Till, Thanks for the information. I'm using sbt and I have the following line in build.sbt: libraryDependencies += "org.apache.flink" % "flink-connector-kafka" % "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}") Also, I copied flink-connector-kafka-0.9.0.jar under