Re: Cluster-mode job compute-time/cost metrics

2023-12-12 Thread murat migdisoglu
Hey Jack,

EMR Serverless is a great fit for this. You can get these metrics for each
job once it completes. Besides that, if you create separate EMR
applications per group and tag them appropriately, you can use Cost
Explorer to see the amount of resources being used.
If EMR Serverless is not an option, I would probably work with Spark event
listeners, collect the metrics at runtime, and publish them to a
monitoring tool such as Grafana or Datadog.
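
A rough sketch of that listener approach in Scala (the class name and the
choice of executorRunTime as the metric are illustrative assumptions, not
something prescribed in this thread):

import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerTaskEnd}

// Illustrative listener: sums executor run time across all finished tasks.
class ComputeTimeListener extends SparkListener {
  private val totalExecutorRunTimeMs = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for tasks that failed before reporting metrics.
    Option(taskEnd.taskMetrics).foreach { metrics =>
      totalExecutorRunTimeMs.addAndGet(metrics.executorRunTime) // milliseconds
    }
  }

  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    // Replace this log line with a push to Grafana/Datadog/CloudWatch etc.
    println(s"Total executor run time: ${totalExecutorRunTimeMs.get()} ms")
  }
}

Register it with spark.sparkContext.addSparkListener(new ComputeTimeListener)
or via --conf spark.extraListeners= and the fully qualified class name (which
requires a zero-arg constructor). Multiplying the reported total by the hourly
rate of the instance type then gives a rough per-job cost estimate.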


On Tue, Dec 12, 2023 at 9:31 AM Jörn Franke wrote:

> It could be simpler and faster to use tagging of resources for billing:
>
>
> https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-tags-billing.html
>
> That could also include other resources (e.g. S3).
>
> On 12.12.2023 at 04:47, Jack Wells wrote:
>
> 
> Hello Spark experts - I’m running Spark jobs in cluster mode using a
> dedicated cluster for each job. Is there a way to see how much compute time
> each job takes via Spark APIs, metrics, etc.? In case it makes a
> difference, I’m using AWS EMR - I’d ultimately like to be able to say this
> job costs $X since it took Y minutes on Z instance types (assuming all of
> the nodes are the same instance type), though I figure I'd probably need
> to get the Z instance type through the EMR APIs.
>
> Thanks!
> Jack
>
>

-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare


Re: Cluster-mode job compute-time/cost metrics

2023-12-12 Thread Jörn Franke
It could be simpler and faster to use tagging of resources for billing:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-tags-billing.html

That could also include other resources (e.g. S3).
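
As a concrete illustration of the tagging route, here is a hedged sketch that
attaches tags to an existing cluster using the AWS SDK for Java v2 (called
from Scala); the cluster ID and tag keys below are placeholders, not values
from this thread:

import software.amazon.awssdk.services.emr.EmrClient
import software.amazon.awssdk.services.emr.model.{AddTagsRequest, Tag}

object TagEmrCluster {
  def main(args: Array[String]): Unit = {
    // Placeholder cluster ID; pass the real one, e.g. via an env variable.
    val clusterId = sys.env.getOrElse("EMR_CLUSTER_ID", "j-XXXXXXXXXXXXX")

    val emr = EmrClient.create()
    try {
      // Attach tags to the cluster; EMR propagates them for billing purposes.
      emr.addTags(
        AddTagsRequest.builder()
          .resourceId(clusterId)
          .tags(
            Tag.builder().key("team").value("analytics").build(),
            Tag.builder().key("job").value("nightly-etl").build()
          )
          .build()
      )
    } finally {
      emr.close()
    }
  }
}

Once such tags are activated as cost-allocation tags in the AWS Billing
console, Cost Explorer can break the EMR spend down per tag, along the lines
of the documentation linked above.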

> On 12.12.2023 at 04:47, Jack Wells wrote:
> 
> 
> Hello Spark experts - I’m running Spark jobs in cluster mode using a 
> dedicated cluster for each job. Is there a way to see how much compute time 
> each job takes via Spark APIs, metrics, etc.? In case it makes a difference, 
> I’m using AWS EMR - I’d ultimately like to be able to say this job costs $X 
> since it took Y minutes on Z instance types (assuming all of the nodes are 
> the same instance type), though I figure I'd probably need to get the Z
> instance type through the EMR APIs.
> 
> Thanks!
> Jack
>