Re: Resource usage of a spark application

2015-05-21 Thread Ryan Williams
On Thu, May 21, 2015 at 5:22 AM Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:

> Thanks Akhil, Ryan!
>
> @Akhil: YARN can only tell me how many vcores my app has been granted, but
> not the actual CPU usage, right? Pulling memory/CPU usage from the OS means I
> need to map JVM executor processes to the application (SparkContext) they
> belong to, right?
>
> @Ryan: what a great blog post -- this is super relevant for analyzing the
> state of the cluster as a whole. However, it seems to me that those metrics
> are mostly reported globally rather than per Spark application.
>

Thanks! You can definitely analyze metrics per-application in several ways:

   - If you're running Spark on YARN, use the "app" URL param to specify a
   YARN application ID, which will set the Spark application ID as well as
   parse job start/end times.
   - Set the "prefix" URL param to your Spark app's ID, and all metrics will
   be namespaced to that app ID (see the example metric names below).
   - You actually have to do one of these two; otherwise the dashboard doesn't
   know which app's metrics to look for, since it is set up specifically to
   view per-app metrics.
   - There is a dropdown in the upper-left of the page (sorry, don't have a
   screenshot right now) that will let you select from all app IDs that
   Graphite has seen metrics from.
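
For context on the namespacing (a rough sketch, not specific to these
dashboards): with spark.app.id set, Spark's MetricsSystem typically prefixes
each metric name with the app ID and the executor ID ("driver" for the
driver), so the series arriving in Graphite look roughly like the following.
The app ID and the leading "spark" sink prefix here are made-up examples:

    spark.application_1432212000000_0042.driver.BlockManager.memory.memUsed_MB
    spark.application_1432212000000_0042.driver.jvm.heap.used
    spark.application_1432212000000_0042.1.jvm.heap.used

(The jvm.* series only appear if the JvmSource is enabled; see the
metrics.properties sketch further down.)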

Let me know, here or in issues on the repo, if you run into problems or if
anything doesn't make sense!


Re: Resource usage of a spark application

2015-05-21 Thread Akhil Das
Yes Peter, that's correct: you need to identify the processes, and from those
you can pull the actual usage metrics.
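
To make that concrete, here is a rough sketch of one way to do the
identification on a Linux YARN node. It is an illustration only: the
application ID is made up, and it assumes the app ID shows up in each
executor's command line (on YARN it usually does, via the container's tmp/log
directory paths).

    #!/usr/bin/env python
    # Rough sketch: find the executor JVMs that belong to one Spark-on-YARN
    # application on this node, then report their resident memory and
    # accumulated CPU time by reading /proc.
    import os
    import sys

    def cmdline(pid):
        # /proc/<pid>/cmdline is NUL-separated
        with open("/proc/%s/cmdline" % pid) as f:
            return f.read().replace("\0", " ")

    def rss_kb(pid):
        # "VmRSS:   123456 kB" in /proc/<pid>/status
        with open("/proc/%s/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        return 0

    def cpu_seconds(pid):
        # utime + stime (fields 14 and 15 of /proc/<pid>/stat), in clock ticks
        with open("/proc/%s/stat" % pid) as f:
            fields = f.read().rsplit(")", 1)[1].split()
        ticks = int(fields[11]) + int(fields[12])
        return ticks / float(os.sysconf("SC_CLK_TCK"))

    def executor_pids(app_id):
        for pid in os.listdir("/proc"):
            if not pid.isdigit():
                continue
            try:
                cmd = cmdline(pid)
            except (IOError, OSError):  # process exited or not readable
                continue
            if "CoarseGrainedExecutorBackend" in cmd and app_id in cmd:
                yield pid

    if __name__ == "__main__":
        app_id = sys.argv[1]  # e.g. application_1432212000000_0042
        pids = list(executor_pids(app_id))
        rss_mb = sum(rss_kb(p) for p in pids) / 1024.0
        cpu = sum(cpu_seconds(p) for p in pids)
        print("%d executor JVM(s): ~%.0f MB resident, %.0f s CPU time"
              % (len(pids), rss_mb, cpu))

Run it on each NodeManager host with the application ID as the argument and
sum the results across hosts.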

Thanks
Best Regards

On Thu, May 21, 2015 at 2:52 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:

> Thanks Akhil, Ryan!
>
> @Akhil: YARN can only tell me how many vcores my app has been granted, but
> not the actual CPU usage, right? Pulling memory/CPU usage from the OS means I
> need to map JVM executor processes to the application (SparkContext) they
> belong to, right?
>
> @Ryan: what a great blog post -- this is super relevant for analyzing the
> state of the cluster as a whole. However, it seems to me that those metrics
> are mostly reported globally rather than per Spark application.
> --
> Peter Prettenhofer
>


Re: Resource usage of a spark application

2015-05-21 Thread Peter Prettenhofer
Thanks Akhil, Ryan!

@Akhil: YARN can only tell me how many vcores my app has been granted, but
not the actual CPU usage, right? Pulling memory/CPU usage from the OS means I
need to map JVM executor processes to the application (SparkContext) they
belong to, right?

@Ryan: what a great blog post -- this is super relevant for analyzing the
state of the cluster as a whole. However, it seems to me that those metrics
are mostly reported globally rather than per Spark application.

2015-05-19 21:43 GMT+02:00 Ryan Williams :

> Hi Peter, a few months ago I was using MetricsSystem to export to Graphite
> and then view in Grafana; relevant scripts and some instructions are here
>  if you want to
> take a look.


-- 
Peter Prettenhofer


Re: Resource usage of a spark application

2015-05-19 Thread Ryan Williams
Hi Peter, a few months ago I was using MetricsSystem to export to Graphite
and then view in Grafana; relevant scripts and some instructions are here
if you want to take a look.
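
For reference, the exporting side of that is just Spark's metrics
configuration; a minimal sketch of a Graphite sink in conf/metrics.properties
(the hostname below is a placeholder) would be roughly:

    # Report every instance's metrics to a Graphite/carbon endpoint every 10s.
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds
    # Optional: prepend a fixed prefix to every metric name.
    *.sink.graphite.prefix=spark

    # Also emit JVM (heap, GC) metrics from the driver and executors.
    driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
    executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Spark picks up conf/metrics.properties by default, or you can point
spark.metrics.conf at another path.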

On Sun, May 17, 2015 at 8:48 AM Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:

> Hi all,
>
> I'm looking for a way to measure the current memory / CPU usage of a Spark
> application, to give users feedback on how many resources are actually being
> used.
> It seems that the metrics system provides this information to some extent:
> it logs metrics at the application level (number of cores granted) and at
> the JVM level (memory usage).
> Is this the recommended way to gather this kind of information? If so, how
> do I best map a Spark application to its corresponding JVM processes?
>
> If not, should I instead request this information from the resource manager
> (e.g. Mesos/YARN)?
>
> thanks,
>  Peter
>
> --
> Peter Prettenhofer
>


Re: Resource usage of a spark application

2015-05-17 Thread Akhil Das
You can either pull the high-level information from your resource manager,
or, if you want more control or more specific information, you can write a
script that pulls the resource usage information from the OS. Something like
this will help.
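
For the resource-manager route on YARN, a rough sketch (the ResourceManager
address and the application ID are placeholders, and the exact fields
returned depend on your Hadoop version) is to query the RM's REST API:

    #!/usr/bin/env python
    # Rough sketch: ask the YARN ResourceManager REST API what one application
    # has been allocated. Note this is what YARN has granted, not what the
    # JVMs are actually using.
    import json
    import sys
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2

    RM = "http://resourcemanager.example.com:8088"   # placeholder RM address

    def app_report(app_id):
        url = "%s/ws/v1/cluster/apps/%s" % (RM, app_id)
        return json.loads(urlopen(url).read().decode("utf-8"))["app"]

    if __name__ == "__main__":
        app = app_report(sys.argv[1])  # e.g. application_1432212000000_0042
        # Current grants:
        print("allocated: %s MB, %s vcores, %s containers"
              % (app.get("allocatedMB"), app.get("allocatedVCores"),
                 app.get("runningContainers")))
        # Lifetime aggregates, if your Hadoop version exposes them:
        print("memorySeconds=%s vcoreSeconds=%s"
              % (app.get("memorySeconds"), app.get("vcoreSeconds")))

Keep in mind this reports allocations (and, on newer Hadoop versions,
aggregate memory/vcore-seconds), not the JVMs' actual usage.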

Thanks
Best Regards

On Sun, May 17, 2015 at 6:18 PM, Peter Prettenhofer <
peter.prettenho...@gmail.com> wrote:

> Hi all,
>
> I'm looking for a way to measure the current memory / CPU usage of a Spark
> application, to give users feedback on how many resources are actually being
> used.
> It seems that the metrics system provides this information to some extent:
> it logs metrics at the application level (number of cores granted) and at
> the JVM level (memory usage).
> Is this the recommended way to gather this kind of information? If so, how
> do I best map a Spark application to its corresponding JVM processes?
>
> If not, should I instead request this information from the resource manager
> (e.g. Mesos/YARN)?
>
> thanks,
>  Peter
>
> --
> Peter Prettenhofer
>


Resource usage of a spark application

2015-05-17 Thread Peter Prettenhofer
Hi all,

I'm looking for a way to measure the current memory / CPU usage of a Spark
application, to give users feedback on how many resources are actually being
used.
It seems that the metrics system provides this information to some extent:
it logs metrics at the application level (number of cores granted) and at the
JVM level (memory usage).
Is this the recommended way to gather this kind of information? If so, how
do I best map a Spark application to its corresponding JVM processes?

If not, should I instead request this information from the resource manager
(e.g. Mesos/YARN)?

thanks,
 Peter

-- 
Peter Prettenhofer