I added a couple of bigger servers and now I see multiple containers running, but I still can’t get a job to run. The job details now say:
Diagnostics: [Wed Jul 10 17:27:59 +0000 2019] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:3072, vCores:1>; User AM Resource Limit of the queue = <memory:3072, vCores:1>; Queue AM Resource Usage = <memory:2048, vCores:2>;

I understand WHAT that’s saying, but I don’t understand WHY. Here’s what my scheduler details look like; I don’t see why it’s complaining about the AM, unless something’s not talking to something else right:

Queue State: RUNNING
Used Capacity: 12.5%
Configured Capacity: 100.0%
Configured Max Capacity: 100.0%
Absolute Used Capacity: 12.5%
Absolute Configured Capacity: 100.0%
Absolute Configured Max Capacity: 100.0%
Used Resources: <memory:3072, vCores:3>
Configured Max Application Master Limit: 10.0
Max Application Master Resources: <memory:3072, vCores:1>
Used Application Master Resources: <memory:3072, vCores:3>
Max Application Master Resources Per User: <memory:3072, vCores:1>
Num Schedulable Applications: 3
Num Non-Schedulable Applications: 38
Num Containers: 3
Max Applications: 10000
Max Applications Per User: 10000
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Accessible Node Labels: *
Ordering Policy: FifoOrderingPolicy
Preemption: disabled
Intra-queue Preemption: disabled
Default Node Label Expression: <DEFAULT_PARTITION>
Default Application Priority: 0

User Name | Max Resource             | Weight | Used Resource           | Max AM Resource         | Used AM Resource        | Schedulable Apps | Non-Schedulable Apps
hdfs      | <memory:0, vCores:0>     | 1.0    | <memory:0, vCores:0>    | <memory:3072, vCores:1> | <memory:0, vCores:0>    | 0                | 1
dr.who    | <memory:24576, vCores:1> | 1.0    | <memory:3072, vCores:3> | <memory:3072, vCores:1> | <memory:3072, vCores:3> | 3                | 37

> On Jul 10, 2019, at 3:37 AM, yangtao.yt <yangtao...@alibaba-inc.com> wrote:
>
> Hi, Jason.
>
> According to the information you provided, your cluster has two nodes with
> the same resource <memory:1732, vCores:2>, and the single running container
> is the AM container, which already takes <memory:1024, vCores:1>.
> I think one possible cause is that the available resource of your cluster is
> insufficient for requesting new containers; please refer to the application
> attempt UI
> (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/appattempt/<APP-ATTEMPT-ID>), where
> you can find the outstanding requests with their required resources. Another
> possible cause is the queue/user limit; you can refer to the scheduler UI
> (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/scheduler) to check the resource
> quotas and usage of the queue.
> Hope it helps.
>
> Best,
> Tao Yang
>
>> On Jul 10, 2019, at 8:23 AM, Jason Laughman <ja...@bernetechconsulting.com> wrote:
>>
>> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating
>> through HDFS, but when I try to run a job via Hive (I see that it’s
>> deprecated, but it’s what I’m working with for now) it never gets out of
>> ACCEPTED state in the web tool. I’ve done some Googling and the general
>> consensus is that it’s resource constraints, so can someone tell me if I’ve
>> got enough horsepower here?
>>
>> I’ve got one small name server, three small data servers, and two larger
>> data servers. I figured out that the small data servers were too small
>> because even when I tried to tweak the YARN parameters for RAM and CPU, the
>> resource managers would immediately shut down.
>> I added the two larger data servers, and now I see two active nodes, but
>> only a total of one container:
>>
>> $ yarn node -list
>> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at <resource_manager>:8032
>> Total Nodes:2
>>         Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
>>     node1:40079       RUNNING           node1:8042                               1
>>     node2:36311       RUNNING           node2:8042                               0
>>
>> There are a ton of some sort of automated jobs backed up on there, and when
>> I try to run anything through Hive it just sits there and eventually times
>> out (I do see it get accepted). My larger nodes have 4 GB RAM and 2 vcores,
>> and I set YARN to do automatic resource allocation with
>> yarn.nodemanager.resource.detect-hardware-capabilities. Is that enough to
>> even get a POC lab working? I don’t care about having the three smaller
>> servers running as resource nodes, but I’d like to have a better
>> understanding of what’s going on with the larger servers, because it seems
>> like they’re close to working.
>>
>> Here’s the metrics data from the website; hopefully somebody can parse it.
>> Cluster Metrics
>> Apps Submitted: 292    Apps Pending: 284    Apps Running: 1    Apps Completed: 7
>> Containers Running: 1    Memory Used: 1 GB    Memory Total: 3.38 GB    Memory Reserved: 0 B
>> VCores Used: 1    VCores Total: 4    VCores Reserved: 0
>>
>> Cluster Nodes Metrics
>> Active Nodes: 2    Decommissioning Nodes: 0    Decommissioned Nodes: 0    Lost Nodes: 0
>> Unhealthy Nodes: 0    Rebooted Nodes: 0    Shutdown Nodes: 4
>>
>> Scheduler Metrics
>> Scheduler Type: Capacity Scheduler    Scheduling Resource Type: [MEMORY]
>> Minimum Allocation: <memory:1024, vCores:1>    Maximum Allocation: <memory:1732, vCores:2>
>> Maximum Cluster Application Priority: 0
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: user-h...@hadoop.apache.org
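[Editor's note] If the AM limit is what blocks activation, as in the later diagnostics, the usual knob is the queue's maximum AM resource percent in capacity-scheduler.xml. A sketch, with 0.5 purely as an example value for a small POC lab, not a general recommendation:

```xml
<!-- capacity-scheduler.xml: let ApplicationMasters use up to 50% of the
     queue's resources. The default is 0.1 (10%); on a cluster this small,
     a couple of 1-2 GB AMs can exhaust the AM limit on their own. -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
```

The change can be applied without restarting the ResourceManager via `yarn rmadmin -refreshQueues`. Giving the NodeManagers more than ~1.7 GB of registered memory (so a node can hold an AM plus at least one task container) addresses the other half of the problem.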
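[Editor's note] The arithmetic behind both symptoms in this thread can be sketched in a few lines, using only the numbers quoted above. The function and variable names are illustrative, not YARN internals:

```python
# Back-of-the-envelope checks for the two symptoms in this thread,
# using the numbers quoted above. Names are illustrative only.

def containers_per_node(node_mb, min_alloc_mb):
    """How many minimum-size containers fit on one NodeManager."""
    return node_mb // min_alloc_mb

def am_can_activate(queue_am_used_mb, am_request_mb, am_limit_mb):
    """The Capacity Scheduler only activates a new application's AM if the
    queue's current AM usage plus the new AM request fits under the AM limit."""
    return queue_am_used_mb + am_request_mb <= am_limit_mb

# Each node registers <memory:1732> and the minimum allocation is 1024 MB,
# so a single 1024 MB AM container fills a node: no room for a task container.
print(containers_per_node(1732, 1024))    # -> 1

# From the later diagnostics: AM usage 2048 MB + request 2048 MB exceeds the
# 3072 MB queue AM limit, so the new application is "added to the scheduler
# and is not yet activated".
print(am_can_activate(2048, 2048, 3072))  # -> False
```

In other words, the original "stuck in ACCEPTED" and the later "AM resource limit exceeded" are the same shortage seen from two angles: the cluster is small enough that ApplicationMasters alone consume most of what the queue is allowed to give them.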