On 10 Mar 2016, at 18:21, Frank Luo wrote:
Thanks David/Jeff.
To avoid further confusion, let me make sure I am clear on what I am
trying to do: I would like to know how many hours in a day my cluster
is running at full capacity and, when that happens, how long my
waiting queue is. I found similar information in Ambari, as below, but
I’d like to dive deeper, hence the question.
From what I see, per-job container information, especially pending
containers, is only available from an application’s trackingUrl, and
that only works for M/R jobs. I am not able to get the same
information from a Tez application’s trackingUrl (the Tez URL
doesn’t do anything on HDP 2.2). So how does Ambari find this
information?
Using the REST API, you'd query the resource manager's "apps" method,
then each appmaster through the RM proxy with the "jobs" method
(sequentially, using the app ids found in step 1). This works for MR;
there used to be an issue with Spark jobs, but I haven't looked at
that. This only covers running jobs; you'd probably want to query the
history server too, which may return more complete info with less
indirection. Also, have a look at the "scheduler" method on the RM,
which you may find useful.
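The two-step lookup described above can be sketched roughly as follows. This is only an illustration, not the way Ambari does it: the RM address is a placeholder, the helper names (fetch_json, running_mr_app_ids, task_counts) are made up for this example, and it assumes the web proxy is served from the RM address, which is the default but may differ on your cluster. The paths and field names are the ones given in the ResourceManager and MapReduce Application Master REST docs linked below.

```python
import json
from urllib.request import urlopen

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager web address


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def running_mr_app_ids(rm, fetch=fetch_json):
    """Step 1: ask the RM "apps" resource for running MapReduce apps."""
    url = rm + "/ws/v1/cluster/apps?states=RUNNING&applicationTypes=MAPREDUCE"
    body = fetch(url)
    # "apps" is null when nothing matches, so guard both levels.
    apps = (body.get("apps") or {}).get("app") or []
    return [app["id"] for app in apps]


def task_counts(rm, fetch=fetch_json):
    """Step 2: query each app master's "jobs" resource through the proxy."""
    totals = {"pending": 0, "running": 0, "completed": 0}
    for app_id in running_mr_app_ids(rm, fetch):
        url = "%s/proxy/%s/ws/v1/mapreduce/jobs" % (rm, app_id)
        try:
            body = fetch(url)
        except (OSError, ValueError):
            # The app may have finished between the two calls, or the
            # response was not JSON; skip it rather than fail the scan.
            continue
        for job in (body.get("jobs") or {}).get("job") or []:
            totals["pending"] += job.get("mapsPending", 0) + job.get("reducesPending", 0)
            totals["running"] += job.get("mapsRunning", 0) + job.get("reducesRunning", 0)
            totals["completed"] += job.get("mapsCompleted", 0) + job.get("reducesCompleted", 0)
    return totals
```

Calling task_counts(RM) against a live cluster would return one dictionary of summed map and reduce task counts for the running MR jobs. The fetch parameter exists only so the HTTP call can be swapped out when trying the logic without a cluster.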
The docs are here:
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html
For MR stuff:
https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html
https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html
But the most useful is probably the timeline server, which I haven't
had a chance to use but which possibly provides what you need:
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1
All this from memory since I haven't touched a cluster lately, and
hoping it's not completely missing the point ;-)
David
From: David Morel [mailto:[email protected]]
Sent: Thursday, March 10, 2016 1:03 AM
To: Jeff Zhang
Cc: [email protected]; Frank Luo
Subject: Re: how to use Yarn API to find task/attempt status
The REST API should help. A working implementation (in Perl, not Java,
sorry) is visible here: http://search.cpan.org/dist/Net-Hadoop-YARN/
Read the comments, they matter :-)
On 10 March 2016 at 7:28 AM, "Jeff Zhang"
<[email protected]> wrote:
If it is for M/R, then maybe this is what you want
https://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/JobStatus.html
On Thu, Mar 10, 2016 at 1:58 PM, Frank Luo
<[email protected]> wrote:
Let’s say there are 10 standard M/R jobs running. How do I find out
how many tasks are done/running/pending?
From: Jeff Zhang [mailto:[email protected]]
Sent: Wednesday, March 09, 2016 9:33 PM
To: Frank Luo
Cc: [email protected]
Subject: Re: how to use Yarn API to find task/attempt status
I don't think this is related to YARN. YARN doesn't know about tasks
or task attempts; it only knows about containers. So it is up to your
application to provide such a function.
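What YARN does track at the container level can be read from the RM "metrics" resource (/ws/v1/cluster/metrics in the ResourceManager REST docs). A minimal sketch, assuming a placeholder RM address; the function names and the 0.95 "full" threshold are invented for illustration and are not part of any YARN API:

```python
import json
from urllib.request import urlopen

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager web address


def fetch_cluster_metrics(rm):
    """GET the RM cluster metrics resource and decode the JSON body."""
    with urlopen(rm + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)


def summarize_metrics(body, full_threshold=0.95):
    """Reduce a clusterMetrics response to memory usage and pending counts.

    full_threshold is an arbitrary cut-off chosen for this example.
    """
    m = body["clusterMetrics"]
    total = m["totalMB"]
    used_fraction = m["allocatedMB"] / total if total else 0.0
    return {
        "used_fraction": used_fraction,
        "is_full": used_fraction >= full_threshold,
        "containers_pending": m["containersPending"],
        "apps_pending": m["appsPending"],
    }
```

Sampling summarize_metrics(fetch_cluster_metrics(RM)) at a fixed interval and recording the results would give a rough picture of how long the cluster stays full and how many containers are waiting while it is. These are container counts only; they say nothing about M/R tasks as such.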
On Thu, Mar 10, 2016 at 11:29 AM, Frank Luo
<[email protected]> wrote:
Does anyone who has had a similar issue know the answer?
From: Frank Luo
Sent: Wednesday, March 09, 2016 1:59 PM
To: '[email protected]'
Subject: how to use Yarn API to find task/attempt status
I have a need to programmatically find out how many tasks are pending
in Yarn. Is there a way to do it through a Java API?
I looked at YarnClient, but was not able to find what I need.
Thx in advance.
Frank Luo
This email and any attachments transmitted with it are intended for
use by the intended recipient(s) only. If you have received this email
in error, please notify the sender immediately and then delete it. If
you are not the intended recipient, you must not keep, use, disclose,
copy or distribute this email without the author’s prior permission.
We take precautions to minimize the risk of transmitting software
viruses, but we advise you to perform your own virus checks on any
attachment to this message. We cannot accept liability for any loss or
damage caused by software viruses. The information contained in this
communication may be confidential and may be subject to the
attorney-client privilege.
--
Best Regards
Jeff Zhang