Hi Jason,

According to the information you provided, your cluster has two nodes with the same resource <memory:1732, vCores:2>, and the single running container is the AM container, which already occupies <memory:1024, vCores:1>. One possible cause is that the available resources in your cluster are insufficient to satisfy new container requests; please refer to the application attempt UI (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/appattempt/<APP-ATTEMPT-ID>), where you can find the outstanding requests and the resources they require. Another possible cause is a queue or user limit; you can check the scheduler UI (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/scheduler) for the resource quotas and current usage of the queue.
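Since each node has only 1732 MB and the minimum allocation is 1024 MB, the node that already runs the AM container has about 708 MB left, so it is worth confirming exactly what is free and which requests are still outstanding. If the web UI is inconvenient, the same information is exposed by the ResourceManager REST API; below is a minimal sketch, assuming the default RM web port 8088 (substitute your own ResourceManager host for <RM-HOST>):

$ # Cluster-wide totals: memory/vcores allocated vs. available, pending apps
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/metrics"

$ # Per-queue capacities, usage and limits (Capacity Scheduler)
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/scheduler"

$ # Applications still in ACCEPTED state, with their queue and resource demands
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/apps?states=ACCEPTED"

For the Capacity Scheduler, the /ws/v1/cluster/scheduler response also reports the per-queue application and user limits, which may be worth checking given how many applications are pending.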
Hope it helps.

Best,
Tao Yang

> On Jul 10, 2019, at 8:23 AM, Jason Laughman <ja...@bernetechconsulting.com> wrote:
>
> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating through
> HDFS, but when I try to run a job via Hive (I see that it’s deprecated, but
> it’s what I’m working with for now) it never gets out of accepted state in
> the web tool. I’ve done some Googling and the general consensus is that it’s
> resource constraints, so can someone tell me if I’ve got enough horsepower
> here?
>
> I’ve got one small name server, three small data servers, and two larger data
> servers. I figured out that the small data servers were too small because
> even if I tried to tweak YARN parameters for RAM and CPU the resource
> managers would immediately shut down. I added the two larger data servers,
> and now I see two active nodes but only with a total of one container:
>
> $ yarn node -list
> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at <resource_manager>:8032
> Total Nodes:2
>       Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
>   node1:40079       RUNNING           node1:8042                               1
>   node2:36311       RUNNING           node2:8042                               0
>
> There are a ton of some sort of automated jobs backed up on there, and when I
> try to run anything through Hive it just sits there and eventually times out
> (I do see it get accepted). My larger nodes are 4 GB RAM and 2 vcores and I
> set YARN to do automated resource allocation with
> yarn.nodemanager.resource.detect-hardware-capabilities. Is that enough to
> even get a POC lab working? I don’t care about having the three smaller
> servers running as resource nodes, but I’d like to have a better
> understanding of what’s going on with the larger servers, because it seems
> like they’re close to working.
>
> Here’s the metrics data from the website, hopefully somebody can parse it.
>
> Cluster Metrics
>   Apps Submitted: 292    Apps Pending: 284    Apps Running: 1    Apps Completed: 7
>   Containers Running: 1    Memory Used: 1 GB    Memory Total: 3.38 GB    Memory Reserved: 0 B
>   VCores Used: 1    VCores Total: 4    VCores Reserved: 0
>
> Cluster Nodes Metrics
>   Active Nodes: 2    Decommissioning Nodes: 0    Decommissioned Nodes: 0    Lost Nodes: 0
>   Unhealthy Nodes: 0    Rebooted Nodes: 0    Shutdown Nodes: 4
>
> Scheduler Metrics
>   Scheduler Type: Capacity Scheduler    Scheduling Resource Type: [MEMORY]
>   Minimum Allocation: <memory:1024, vCores:1>    Maximum Allocation: <memory:1732, vCores:2>
>   Maximum Cluster Application Priority: 0