Hi Andra,
sure, if you do the logging for each record (or group of records) using a
list accumulator is a very bad idea.
If you don't need exact stats for each vertex but rather a distribution
over all vertices, you can use a histogram accumulator.
If you need exact vertex stats, I'd go with Vasi
Andra,
why don't you simply print to standard output and gather your metrics from
the taskmanagers' log files after execution?
Wouldn't that work for you?
-V.
On 29 June 2015 at 22:36, Andra Lungu wrote:
> Caution! I am getting philosophical. Stop me if I'm talking nonsense!
>
> You are sugges
Caution! I am getting philosophical. Stop me if I'm talking nonsense!
You are suggesting a list that will have one or two entries per vertex =
(approx) billions. Won't this over-saturate my memory? I am already filling
it with lots of junk resulted from the computation...
On Mon, Jun 29, 2015 at
Have you tried to use a custom accumulator that just appends to a list?
2015-06-29 12:59 GMT+02:00 Andra Lungu :
> Hey Fabian,
>
> I am aware of the way open, preSuperstep(), postSuperstep() etc can help me
> within an interation, unfortunately I am writing my own method here. I
> could try to br
Hey Fabian,
I am aware of the way open, preSuperstep(), postSuperstep() etc can help me
within an interation, unfortunately I am writing my own method here. I
could try to briefly describe it:
public static final class PropagateNeighborValues implements
NeighborsFunctionWithVertexValue(...) {
You can measure the time of each iteration in the open() methods operators
within an iteration. open() will be called before each iteration.
The times can be collected by either printing to std out (you need to
collect the files then...) or by implementing a list accumulator. Each time
should inclu
Why don't you use Flink dataset output functions (like writeAsText,
writeAsCsv, etc..)?
Or if they are not sufficient you can implement/override your own
InputFormat.
>From what is my experience static variables are evil in distributed
environments..
Moreover, one of the main strengths of Flink ar
Hey guys,
Me again :) So now that my wonderful job finishes, I would like to monitor
it a bit (i.e. build some charts on the number of messages per vertex,
compute the total amount of time elapsed per computation per vertex, etc).
The main computational-intensive operation is a coGroup. There, wi