Hi Miguel,
I just had another idea that you could try.
The problem seems to be that the plan space the optimizer enumerates
grows exponentially as you add more iterations.
This happens when there are multiple valid execution strategies for a given
operator (this mostly applies to joins).
You coul
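The exponential growth Fabian describes can be illustrated with a back-of-the-envelope calculation in plain Java (an analogy only, not Flink code): the number of distinct binary join-tree shapes for n joined inputs is the Catalan number C(n-1), which explodes quickly. The Flink optimizer prunes aggressively rather than enumerating every shape, but the underlying plan space grows at a comparable rate.

```java
// Illustration: counts binary join-tree shapes per number of joined inputs.
// This is not what Flink's optimizer literally does; it only shows why a plan
// with one extra join per iteration becomes expensive to optimize.
public class PlanSpaceGrowth {
    // Catalan number C(n) via the recurrence C(0)=1, C(i)=sum C(j)*C(i-1-j)
    static long catalan(int n) {
        long[] c = new long[n + 1];
        c[0] = 1;
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                c[i] += c[j] * c[i - 1 - j];
            }
        }
        return c[n];
    }

    public static void main(String[] args) {
        for (int inputs = 2; inputs <= 12; inputs++) {
            // tree shapes for 'inputs' relations = C(inputs - 1)
            System.out.printf("%2d inputs -> %,d join-tree shapes%n",
                    inputs, catalan(inputs - 1));
        }
    }
}
```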
Hello Fabian,
This really looks like an issue that needs attention.
Since I am launching the client application via Maven, I opted to change
the maximum heap size with
export MAVEN_OPTS="-Xms256m -Xmx4g".
To give an example, for three (3) iterations it worked fine with around 4
GB of memory
Hmm, this does not look too good.
As I expected, the program gets stuck in the optimizer. Plan optimization
can be quite expensive for large plans.
There might be a way to improve the optimization of large plans by pruning
the plan space, but I would not expect this to be fixed in the near future.
Hello Fabian,
After increasing the Akka message-size parameter, the client failed with
the following exception after some time.
This confirms that the JobManager never received the job request:
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.i
Hi Miguel,
if the message size were the problem, the client would fail with an
exception.
What might be happening is that the client gets stuck while optimizing the
program.
You could take a stacktrace of the client process to identify at which part
the client gets stuck.
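Externally, such a stack trace is usually taken with `jstack <pid>` on the client's JVM process. The sketch below shows the in-process equivalent using the standard `Thread.getAllStackTraces()` API, which prints where every live thread currently is (for a stuck client, you would expect the main thread deep inside optimizer frames):

```java
import java.util.Map;

// Dumps the stack of every live thread in the current JVM, similar in spirit
// to what `jstack <pid>` prints for an external process. Useful to see where
// a stuck client is spending its time.
public class StackDump {
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```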
Best, Fabian
Hello Fabian,
Thanks for the help.
I am interested in the duration of specific operators, so the fact that
parts of the execution are pipelined is not a problem for me.
From my understanding, the automated way to approach this is to run the
Flink job with the web interface active and then make
The monitoring REST interface provides detailed stats about a job, its
tasks, and processing vertices, including their start and end times [1].
However, it is not trivial to make sense of the execution times because
Flink uses pipelined shuffles by default.
That means that the execution of multiple
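Computing a per-vertex runtime from that REST response can be sketched as follows. This is a plain-Java illustration: the JSON sample is hand-made and heavily trimmed, with the `start-time`/`end-time` field names taken from the REST docs referenced as [1]; the real `/jobs/<jobid>` response carries many more fields per vertex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracting a vertex's runtime from (a simplified version of) the
// JSON the monitoring REST interface returns for a job's vertices.
public class VertexDuration {
    // Hand-made sample shaped like one entry of the "vertices" array.
    static final String SAMPLE =
        "{\"name\":\"CHAIN Join -> Map\"," +
        "\"start-time\":1512400000000,\"end-time\":1512400015000}";

    static long field(String json, String name) {
        Matcher m = Pattern.compile("\"" + name + "\":(\\d+)").matcher(json);
        if (!m.find()) throw new IllegalArgumentException(name + " not found");
        return Long.parseLong(m.group(1));
    }

    public static long durationMillis(String vertexJson) {
        return field(vertexJson, "end-time") - field(vertexJson, "start-time");
    }

    public static void main(String[] args) {
        System.out.println("duration: " + durationMillis(SAMPLE) + " ms");
    }
}
```

Note that, as Fabian warns, with pipelined shuffles the intervals of different vertices overlap, so these durations cannot simply be summed.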
Hello,
You're right, I was overlooking that.
Following your suggestion, I now define a different sink in each iteration
of the loop.
They all write to disk when the single, bigger plan is executed.
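The pattern Miguel describes can be mimicked in plain Java (an analogy for Flink's deferred execution, not Flink code): each loop iteration only registers a "sink", and nothing runs until a single final trigger, the counterpart of calling `env.execute()` once after the loop. The output paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Analogy for deferred execution: each iteration registers a sink as a
// Runnable, and nothing actually runs until the single trigger at the end --
// the counterpart of adding writeAsText(...) per iteration and calling
// env.execute() exactly once after the loop.
public class DeferredSinks {
    static final List<Runnable> plan = new ArrayList<>();
    static final List<String> written = new ArrayList<>();

    static void addSink(int iteration) {
        // Hypothetical per-iteration output path; nothing happens yet.
        plan.add(() -> written.add("file:///tmp/result-" + iteration));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            addSink(i); // like dataSet.writeAsText(path) inside the loop
        }
        plan.forEach(Runnable::run); // like one env.execute() after the loop
        System.out.println(written);
    }
}
```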
I have one more question: I know I can retrieve the total time this single
job takes to execute, b
Hi,
by calling result.count(), you recompute the complete plan from the
beginning, not just the operations you added since the last execution.
Looking at the output you posted, each step takes about 15 seconds (with
about 5 secs of initialization).
So the 20 seconds of the first step include init
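Under Fabian's reading, the k-th call to count() re-runs all k steps from scratch, so wall-clock time grows linearly per call. A small sketch of that cost model, using the figures quoted in the thread (~5 s initialization, ~15 s per step; these numbers are assumptions taken from the discussion, not measurements):

```java
// Cost model Fabian describes: every count() triggers a full re-execution
// from the source, so the k-th call pays for all k steps plus init.
// The 5 s init and 15 s per-step figures come from the thread.
public class RecomputationCost {
    static int secondsForCall(int k, int initSecs, int stepSecs) {
        return initSecs + k * stepSecs; // steps 1..k are all recomputed
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++) {
            System.out.printf("count() #%d: ~%d s%n",
                    k, secondsForCall(k, 5, 15));
        }
    }
}
```

This reproduces the observed pattern: ~20 s for the first step (init plus one step), then an increase of roughly 15 s per subsequent call.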
Hello Fabian,
Thank you for the reply.
I was hoping the situation had in fact changed.
As far as I know, I am not calling execute() directly even once; it is
called implicitly by the simple DataSink elements that count() adds to the
plan:
System.out.println(String.format("%d-th graph algo
Hi Miguel,
I'm sorry but AFAIK, the situation has not changed.
Is it possible that you are calling execute() multiple times?
In that case, the first and second graphs would be recomputed before the
third graph is computed.
That would explain the execution time increasing by about 15 seconds per
step.
Best, Fabian
Hello,
I'm facing a problem in an algorithm where I would like to repeatedly
update a DataSet representing a graph, perform some computation, output one
or more DataSinks (such as a file on the local file system), and then reuse
the DataSet in the next iteration.
I want to avoid spilling the results to d