Hi Flinksters,
as far as I know, there is still no support for nested iterations planned.
Am I right?
So my question is how such use cases should be handled in the future.
More specifically: once pinning/caching becomes available, would you suggest
using that feature and programming in Spark style? Or is …
Does Flink require all the map tasks to finish before the reducers can proceed,
as in Spark, or can the reduce operations start before all the mappers have
finished, as in the older Hadoop MapReduce?
Also, my understanding is that Flink manages its own heap; do you/we have a
sense of the …
Hi Max,
I’d recommend using the DataSet[T].iterateWithTermination method
instead. It has the following signature: iterateWithTermination(maxIterations:
Int)(stepFunction: (DataSet[T]) => (DataSet[T], DataSet[_])): DataSet[T]
There you can see that your step function has to return a tuple of data sets:
the next partial solution and a termination criterion data set. The iteration
stops as soon as the termination data set is empty or maxIterations is reached.
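For illustration, a minimal sketch with the Scala DataSet API could look like
this (the data and step function are made up for the example):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    val initial: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0)

    val result = initial.iterateWithTermination(100) { current =>
      val next = current.map(_ / 2.0)
      // the iteration stops as soon as this termination data set is empty
      val termination = next.filter(_ > 0.01)
      (next, termination)
    }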
Thanks Till!
That should work for me.
Cheers,
Max
On Fri, Jul 17, 2015 at 4:13 PM, Till Rohrmann trohrm...@apache.org wrote: …
Hi Flinksters,
I am trying to use bulk iterations with a convergence criterion. Unfortunately,
I'm not sure how to use them, and I couldn't find a good example.
Here are two code snippets and the resulting error, maybe someone can help.
I'm working on the current branch.
Example 1:
if (true) {
  val …
Hi Chenliang,
I've posted a comment in the associated JIRA issue:
https://issues.apache.org/jira/browse/FLINK-2367
Thanks,
Max
On Fri, Jul 17, 2015 at 8:27 AM, Chenliang (Liang, DataSight)
chenliang...@huawei.com wrote:
One improvement suggestion; please check whether it is valid.
For …
Seems that version mismatches are one of the most common sources of
issues...
Maybe we should think about putting a version number into the messages (at
least between client and JobManager) and fail fast on version mismatches...
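As a rough sketch of the idea (all names are hypothetical, not actual Flink
messages):

    // carry a version field in the first message of the handshake and
    // reject mismatching clients immediately
    case class RegisterClient(clientVersion: String)

    val jobManagerVersion = "0.9.0"

    def handleRegistration(msg: RegisterClient): Unit =
      if (msg.clientVersion != jobManagerVersion)
        throw new IllegalStateException(
          s"Version mismatch: client ${msg.clientVersion}, " +
            s"JobManager $jobManagerVersion")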
On Thu, Jul 16, 2015 at 5:56 PM, Till Rohrmann trohrm...@apache.org
Hello,
The data in my stream have timestamps that may be slightly out of order, but I
need to process the data in the proper order. To do this, I use a windowing
function and sort the items in a flatMap.
However, the source may sometimes send data in “bulk batches” and sometimes “on
the …
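The sorting step I use is essentially this sketch (Event and its fields are
simplified stand-ins for my actual types):

    case class Event(ts: Long, payload: String)

    // sort the elements buffered in one window by their embedded timestamp
    // before emitting them downstream
    def sortWindow(elements: Seq[Event]): Seq[Event] =
      elements.sortBy(_.ts)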
Hi to all,
in my Flink job I wanted to move a folder (containing other folders and
files) to another location.
For example, I wanted to move folder A to folder Y, where my HDFS looks
like:
myRootDir/X/a/aa/aaa/someFile1
myRootDir/X/b/bb/bbb/someFile2
myRootDir/Y
I tried to use rename, but it …
Do you want to move the folder within a running job? This might cause a lot of
problems, because you cannot (easily) control when a move command would be
executed.
Wouldn’t it be a better idea to do that after the job has finished and use the
regular HDFS client?
From: Flavio Pompermaier
These two links [1, 2] might help to get your job running. The first link
describes how to set up a job using Flink's machine learning library, but
it works also for the flink-connector-kafka library.
Cheers,
Till
[1] http://stackoverflow.com/a/31455068/4815083
[2]
The 349ms is how long it takes to run the job. The 18s is how long it takes the
command line client to submit the job.
Like I said before, maybe there are super long delays on your system when
you spawn JVMs, or in your DNS resolution. That way, connecting to the
cluster to submit the job will take …
Hi Arnaud!
In 0.9.0, the streaming API does not forward accumulators.
In the next version, it will, and it will actually update them live so that
you can retrieve continuously updating accumulators of a streaming job in
your client, and in the web frontend.
We merged the first part for that on
Of course I move the folder before the job starts or ends :)
My job does some transformations on the raw data and puts the results in
another folder.
The next time the job is executed, it checks whether the output folder exists
and, if so, moves that folder to an archive dir.
I wanted to use the …
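Something like this minimal sketch of the move, using the standard Hadoop
FileSystem API, is what I had in mind (the paths are placeholders):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    val output = new Path("myRootDir/X")
    val archive = new Path("myRootDir/Y/X")

    // rename moves the whole directory tree if both paths are on the same
    // file system; it returns false instead of throwing on most failures
    if (fs.exists(output) && !fs.rename(output, archive)) {
      throw new RuntimeException(s"Could not move $output to $archive")
    }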
This should be rather easy to add with the latest addition of the
ActorGateway and the message decoration.
On Fri, Jul 17, 2015 at 5:04 PM, Stephan Ewen se...@apache.org wrote: …
Right now, I would go with the extra field.
The roadmap has pending features that improve the scheduling for plans like
yours (with many data sources), but they are not yet in the code.
On Fri, Jul 17, 2015 at 11:24 AM, chan fentes chanfen...@gmail.com wrote:
I am testing my regex file input
Hi Wendong!
The streaming connectors are not in Flink's system classpath, because
they depend on many libraries (zookeeper, asm, protocol buffers), and we
want to keep the default dependencies slim. This reduces version conflicts
for users whose code depends on these libraries.
As a
Hi to all,
my job seems to be stuck and nothing is logged, even in debug mode.
The only strange thing is a
Received message SendHeartbeat at akka://flink/user/taskmanager_1 from
Actor[akka://flink/deadLetters].
Could it be a symptom of a problem?
Best,
Flavio
Hi Fabian, hi Stephan,
thanks for answering my question. Good hint with the list replication, I
will benchmark this vs. cross + filter.
Best, Martin
On 17.07.2015 at 11:17, Stephan Ewen wrote:
I would rewrite this to replicate the list into tuples:
foreach x in list: emit (x, list)
Then
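In plain Scala terms, the replication idea amounts to this toy sketch (not the
actual job code):

    val list = Seq(1, 2, 3)

    // pair every element with the full list, so the pairwise comparison can
    // happen locally instead of via cross + filter
    val replicated: Seq[(Int, Seq[Int])] = list.map(x => (x, list))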
I am testing my regex file input format, but because I have a workflow that
depends on the filename (each filename contains a number that I need), I
need to add another field to each of my tuples. What is the best way to
avoid this additional field, which I only need for grouping and one
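For reference, a minimal sketch of the extra-field approach, attaching the
file-name number in a custom input format, could look like this (class and
field names are made up; it builds on Flink's DelimitedInputFormat):

    import org.apache.flink.api.common.io.DelimitedInputFormat
    import org.apache.flink.core.fs.FileInputSplit

    class NumberedFileFormat extends DelimitedInputFormat[(Long, String)] {
      private var fileNumber: Long = -1L

      override def open(split: FileInputSplit): Unit = {
        super.open(split)
        // e.g. "part-0042.txt" -> 42
        fileNumber = """\d+""".r
          .findFirstIn(split.getPath.getName)
          .map(_.toLong)
          .getOrElse(-1L)
      }

      override def readRecord(reuse: (Long, String), bytes: Array[Byte],
                              offset: Int, numBytes: Int): (Long, String) =
        (fileNumber, new String(bytes, offset, numBytes, "UTF-8"))
    }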
That is usually nothing to worry about. This just means that the message
was sent without specifying a sender. What Akka then does is to use the
`/deadLetters` actor as the sender.
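To illustrate, a message sent without an explicit sender, e.g. as in this
sketch (the names are placeholders), will show up with `/deadLetters` as its
sender:

    import akka.actor.ActorRef

    // outside of an actor there is no implicit sender, so sending with
    // ActorRef.noSender makes Akka report /deadLetters as the sender
    def sendHeartbeat(taskManager: ActorRef, msg: AnyRef): Unit =
      taskManager.tell(msg, ActorRef.noSender)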
What kind of job is it?
Cheers,
Till
On Fri, Jul 17, 2015 at 6:30 PM, Flavio Pompermaier pomperma...@okkam.it
Hi Aljoscha,
Yes, the flink-connector-kafka jar file is under Flink lib directory:
flink-0.9.0/lib/flink-connector-kafka-0.9.0.jar
and it shows KafkaSink class exists:
$ jar tf lib/flink-connector-kafka-0.9.0.jar | grep KafkaSink
Hi Till,
Thanks for the information. I'm using sbt and I have the following line in
build.sbt:
libraryDependencies += "org.apache.flink" % "flink-connector-kafka" %
  "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}")
Also, I copied flink-connector-kafka-0.9.0.jar under