Hi Jason,

According to the information you provided, your cluster has two nodes with the same resource <memory:1732, vCores:2>, and the single running container is the AM container, which already occupies <memory:1024, vCores:1>. One possible cause is that the available resources in your cluster are insufficient to satisfy new container requests; please refer to the application attempt UI (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/appattempt/<APP-ATTEMPT-ID>), where you can find the outstanding requests and the resources they require. Another possible cause is a queue or user limit; you can check the scheduler UI (http://<RM-HOST>:<RM-HTTP-PORT>/cluster/scheduler) for the resource quotas and current usage of the queue.
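Since each node has only 1732 MB and the minimum allocation is 1024 MB, the node that already runs the AM container has about 708 MB left, so it is worth confirming exactly what is free and which requests are still outstanding. If the web UI is inconvenient, the same information is exposed by the ResourceManager REST API; below is a minimal sketch, assuming the default RM web port 8088 (substitute your own ResourceManager host for <RM-HOST>):

$ # Cluster-wide totals: memory/vcores allocated vs. available, pending apps
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/metrics"

$ # Per-queue capacities, usage and limits (Capacity Scheduler)
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/scheduler"

$ # Applications still in ACCEPTED state, with their queue and resource demands
$ curl -s "http://<RM-HOST>:8088/ws/v1/cluster/apps?states=ACCEPTED"

For the Capacity Scheduler, the /ws/v1/cluster/scheduler response also reports the per-queue application and user limits, which may be worth checking given how many applications are pending.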
Hope it helps.

Best,
Tao Yang

> On Jul 10, 2019, at 8:23 AM, Jason Laughman <ja...@bernetechconsulting.com> wrote:
>
> I’ve been setting up a Hadoop 2.9.1 cluster and have data replicating through
> HDFS, but when I try to run a job via Hive (I see that it’s deprecated, but
> it’s what I’m working with for now) it never gets out of accepted state in
> the web tool. I’ve done some Googling and the general consensus is that it’s
> resource constraints, so can someone tell me if I’ve got enough horsepower
> here?
>
> I’ve got one small name server, three small data servers, and two larger data
> servers. I figured out that the small data servers were too small because
> even if I tried to tweak YARN parameters for RAM and CPU the resource
> managers would immediately shut down. I added the two larger data servers,
> and now I see two active nodes but only with a total of one container:
>
> $ yarn node -list
> 19/07/09 23:54:11 INFO client.RMProxy: Connecting to ResourceManager at <resource_manager>:8032
> Total Nodes:2
>       Node-Id    Node-State    Node-Http-Address    Number-of-Running-Containers
>   node1:40079       RUNNING           node1:8042                               1
>   node2:36311       RUNNING           node2:8042                               0
>
> There are a ton of some sort of automated jobs backed up on there, and when I
> try to run anything through Hive it just sits there and eventually times out
> (I do see it get accepted). My larger nodes are 4 GB RAM and 2 vcores and I
> set YARN to do automated resource allocation with
> yarn.nodemanager.resource.detect-hardware-capabilities. Is that enough to
> even get a POC lab working? I don’t care about having the three smaller
> servers running as resource nodes, but I’d like to have a better
> understanding of what’s going on with the larger servers, because it seems
> like they’re close to working.
>
> Here’s the metrics data from the website, hopefully somebody can parse it.
>
> Cluster Metrics
>   Apps Submitted: 292    Apps Pending: 284    Apps Running: 1    Apps Completed: 7
>   Containers Running: 1    Memory Used: 1 GB    Memory Total: 3.38 GB    Memory Reserved: 0 B
>   VCores Used: 1    VCores Total: 4    VCores Reserved: 0
>
> Cluster Nodes Metrics
>   Active Nodes: 2    Decommissioning Nodes: 0    Decommissioned Nodes: 0    Lost Nodes: 0
>   Unhealthy Nodes: 0    Rebooted Nodes: 0    Shutdown Nodes: 4
>
> Scheduler Metrics
>   Scheduler Type: Capacity Scheduler    Scheduling Resource Type: [MEMORY]
>   Minimum Allocation: <memory:1024, vCores:1>    Maximum Allocation: <memory:1732, vCores:2>
>   Maximum Cluster Application Priority: 0