Hi,

I create a job with the following parameters:
org.apache.flink.configuration.Configuration{
yarn.containers.vcores=2
yarn.appmaster.vcores=1
}

ClusterSpecification{
taskManagerMemoryMB=1024
slotsPerTaskManager=1
}
After I launch the job programmatically, I see:
yarn node -list -showDetails
Configured Resources : <memory:8192, vCores:8>
Allocated Resources : <memory:1250, vCores:1> (I suppose this was allocated
for the JobManager)

But in the logs I see 3 requests of the form "Requesting new TaskExecutor
container with resources <memory:2048, vCores:2>".

Here is a log fragment:
 JobManager successfully registered at ResourceManager, leader id:
00000000000000000000000000000000.
 org.apache.flink.yarn.YarnResourceManager                     - Requesting
new TaskExecutor container with resources <memory:2048, vCores:2>. Number
pending requests 1.
 org.apache.flink.yarn.YarnResourceManager                     - Request
slot with profile ResourceProfile{UNKNOWN} for job
64080d7889797133215e501e72b23a74 with allocation id
a1c9ff2b7ec9ad662108b8a2b2301fcf.
 org.apache.flink.yarn.YarnResourceManager                     - Requesting
new TaskExecutor container with resources <memory:2048, vCores:2>. Number
pending requests 2.
 org.apache.flink.yarn.YarnResourceManager                     - Request
slot with profile ResourceProfile{UNKNOWN} for job
64080d7889797133215e501e72b23a74 with allocation id
21f57b4324bdd50dd293547bc4b19ce2.
 org.apache.flink.yarn.YarnResourceManager                     - Requesting
new TaskExecutor container with resources <memory:2048, vCores:2>. Number
pending requests 3.
Close ResourceManager connection
Shut down cluster because application is in FAILED, diagnostics null.

Here are things I would like to clarify:
Why are there 3 requests to create a TaskExecutor instead of 1?
Why is no TaskExecutor created even though I have 7 cores and about 7 GB of
free RAM?
What is ResourceProfile{UNKNOWN}?
What does "diagnostics null" mean?
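
To make the second question concrete, here is a minimal sketch of the arithmetic I have in mind, using only the numbers from the output above. The class and method names are hypothetical and are not part of Flink or YARN. It is a naive check that ignores any scheduler-side limits or rounding, so it may not reflect how YARN actually decides.

```java
public class YarnHeadroomCheck {

    // Free memory on the node in MB: configured minus already allocated.
    static int freeMemoryMb(int configuredMb, int allocatedMb) {
        return configuredMb - allocatedMb;
    }

    // Free vcores on the node: configured minus already allocated.
    static int freeVcores(int configuredVcores, int allocatedVcores) {
        return configuredVcores - allocatedVcores;
    }

    // True if `count` containers of the given size fit into the free resources.
    // Assumption: a plain sum comparison, with no scheduler rounding or queue limits.
    static boolean fits(int freeMb, int freeCores,
                        int containerMb, int containerVcores, int count) {
        return count * containerMb <= freeMb
            && count * containerVcores <= freeCores;
    }

    public static void main(String[] args) {
        // Values from `yarn node -list -showDetails` above.
        int freeMb = freeMemoryMb(8192, 1250);
        int freeCores = freeVcores(8, 1);

        // Each logged request asks for <memory:2048, vCores:2>; there are 3 of them.
        boolean allThreeFit = fits(freeMb, freeCores, 2048, 2, 3);

        System.out.println("free memory (MB): " + freeMb);
        System.out.println("free vcores: " + freeCores);
        System.out.println("3 pending requests fit: " + allThreeFit);
    }
}
```

By this simple count, even all 3 pending requests (6144 MB and 6 vcores in total) should fit into the 6942 MB and 7 vcores that remain, which is why I do not understand why no container is allocated.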

When I change ClusterSpecification.slotsPerTaskManager to 1, I get:
"Cannot serve slot request, no ResourceManager connected"
"Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources"
Why isn't the ResourceManager created even though I request even fewer
resources for this?


Regards,
Vitaliy
