Re: How to analyze space usage of Flink algorithms

2016-12-19 Thread Fabian Hueske
Your functions do not need to implement RichFunction (although any function
can be a RichFunction, so adapting the job should not be a problem).
The system metrics are collected automatically and exposed via a
Reporter [1].
So you do not need to take care of the collection; you only need to specify
where the collected metrics should be reported.

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html#reporter

2016-12-19 9:59 GMT+01:00 otherwise777:

> Thank you for your reply.
> I'm afraid I still don't understand it; the part I don't understand is how
> to actually analyze it. It's OK if I can only analyze the system instead of
> the actual job, but how would I actually do that?
> As far as I know, no function in my program extends RichFunction,
> so how would I call getRuntimeContext() to print or store the metrics?


Re: How to analyze space usage of Flink algorithms

2016-12-19 Thread otherwise777
Thank you for your reply.
I'm afraid I still don't understand it; the part I don't understand is how
to actually analyze it. It's OK if I can only analyze the system instead of
the actual job, but how would I actually do that?
As far as I know, no function in my program extends RichFunction,
so how would I call getRuntimeContext() to print or store the metrics?





Re: How to analyze space usage of Flink algorithms

2016-12-16 Thread Fabian Hueske
The system metrics [1] are only available at the system level, i.e. not for
an individual job.
The reason is that multiple jobs might run concurrently in the same task
manager JVM process, so it would not be possible to separate their heap
usage.
The same limitation applies to the approach that monitors the task manager
tmp directory.

You would need to correlate your measurements with the time range in which
a job is executed.
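
One way to do that correlation, as a minimal sketch: assume the reporter (or
your own script) has produced a list of (timestamp, value) samples, and that
you noted the wall-clock time just before and just after the job ran. The
function names and the sample layout below are hypothetical, not part of
Flink.

```python
def samples_in_window(samples, start_ts, end_ts):
    """Keep only the (timestamp, value) samples taken while the job was running."""
    return [(ts, value) for ts, value in samples if start_ts <= ts <= end_ts]


def peak_in_window(samples, start_ts, end_ts):
    """Return the largest sampled value inside the job's time range, or None if there is none."""
    values = [value for _ts, value in samples_in_window(samples, start_ts, end_ts)]
    return max(values) if values else None
```

Note that the result still includes whatever other jobs were doing on the
same task manager during that time range, so it is most meaningful when the
job runs alone.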

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html#system-metrics

2016-12-16 9:08 GMT+01:00 otherwise777:

> Hey Fabian,
>
> Thanks for the quick reply.
> I was looking through the Flink metrics [1], but I couldn't find anything in
> there about how to analyze the environment from start to finish, only for
> functions that extend RichMapFunction.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html#list-of-all-variables


Re: How to analyze space usage of Flink algorithms

2016-12-16 Thread otherwise777
Hey Fabian,

Thanks for the quick reply.
I was looking through the Flink metrics [1], but I couldn't find anything in
there about how to analyze the environment from start to finish, only for
functions that extend RichMapFunction.

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html#list-of-all-variables





Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Greg Hogan
This does sound like a nice feature: bytes written to and read from disk,
both per job and per task manager.

On Fri, Dec 9, 2016 at 8:51 AM, Chesnay Schepler wrote:

> We do not measure how much data we are spilling to disk.
>
>
> On 09.12.2016 14:43, Fabian Hueske wrote:
>
> Hi,
>
> the heap memory usage should be available via Flink's metrics system.
> I am not sure whether that also captures spilled data. Chesnay (in CC)
> should know that.
>
> If the spilled data is not available as a metric, you can try to write a
> small script that monitors the directories to which Flink spills (config
> parameter: taskmanager.tmp.dirs [1]).
> The script would repeatedly list all files and keep, for each file, the
> maximum size seen (files are deleted once they are no longer used). This is
> not super precise but might be good enough.
>
> Hope this helps,
> Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#jobmanager-amp-taskmanager
>
> 2016-12-09 14:12 GMT+01:00 otherwise777:
>
>> Currently I'm analyzing some algorithms that I use in Flink.
>> I'm interested in the space and time it takes to execute them. For the
>> time I used getNetRuntime() in the execution environment, but I have no
>> idea how to analyze the amount of space an algorithm uses.
>> Space can mean different things here, such as heap space, disk space,
>> overall memory, or allocated memory. I would like to analyze some of these.


Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Chesnay Schepler

We do not measure how much data we are spilling to disk.

On 09.12.2016 14:43, Fabian Hueske wrote:

> Hi,
>
> the heap memory usage should be available via Flink's metrics system.
> I am not sure whether that also captures spilled data. Chesnay (in CC)
> should know that.
>
> If the spilled data is not available as a metric, you can try to write a
> small script that monitors the directories to which Flink spills (config
> parameter: taskmanager.tmp.dirs [1]).
> The script would repeatedly list all files and keep, for each file, the
> maximum size seen (files are deleted once they are no longer used). This is
> not super precise but might be good enough.
>
> Hope this helps,
> Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#jobmanager-amp-taskmanager
>
> 2016-12-09 14:12 GMT+01:00 otherwise777:
>
>> Currently I'm analyzing some algorithms that I use in Flink.
>> I'm interested in the space and time it takes to execute them. For the
>> time I used getNetRuntime() in the execution environment, but I have no
>> idea how to analyze the amount of space an algorithm uses.
>> Space can mean different things here, such as heap space, disk space,
>> overall memory, or allocated memory. I would like to analyze some of these.









Re: How to analyze space usage of Flink algorithms

2016-12-09 Thread Fabian Hueske
Hi,

the heap memory usage should be available via Flink's metrics system.
I am not sure whether that also captures spilled data. Chesnay (in CC)
should know that.

If the spilled data is not available as a metric, you can try to write a
small script that monitors the directories to which Flink spills (config
parameter: taskmanager.tmp.dirs [1]).
The script would repeatedly list all files and keep, for each file, the
maximum size seen (files are deleted once they are no longer used). This is
not super precise but might be good enough.
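
Such a script could look roughly like the sketch below. It is only an
illustration, under the assumption that you pass it the directories
configured in taskmanager.tmp.dirs; the function names, the polling interval,
and the rounds parameter are made up for this example.

```python
import os
import time


def scan_once(dirs, max_sizes):
    """List all files under dirs and remember the largest size seen for each path."""
    for root_dir in dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    # The file was deleted between listing and stat; skip it.
                    continue
                if size > max_sizes.get(path, -1):
                    max_sizes[path] = size
    return max_sizes


def monitor(dirs, interval_s=1.0, rounds=None):
    """Scan repeatedly; rounds=None means poll until the process is interrupted."""
    max_sizes = {}
    done = 0
    while rounds is None or done < rounds:
        scan_once(dirs, max_sizes)
        done += 1
        time.sleep(interval_s)
    return max_sizes
```

Summing the recorded maxima gives a rough estimate of how much was spilled in
total, not the peak disk usage at any single moment, and files that appear
and disappear between two scans are missed entirely.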

Hope this helps,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/config.html#jobmanager-amp-taskmanager

2016-12-09 14:12 GMT+01:00 otherwise777:

> Currently I'm analyzing some algorithms that I use in Flink.
> I'm interested in the space and time it takes to execute them. For the time
> I used getNetRuntime() in the execution environment, but I have no idea how
> to analyze the amount of space an algorithm uses.
> Space can mean different things here, such as heap space, disk space,
> overall memory, or allocated memory. I would like to analyze some of these.