Andrey Zagrebin created FLINK-15300:
---------------------------------------

             Summary: Shuffle memory fraction sanity check does not account for its min/max limit
                 Key: FLINK-15300
                 URL: https://issues.apache.org/jira/browse/FLINK-15300
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Configuration
            Reporter: Andrey Zagrebin
            Assignee: Andrey Zagrebin
             Fix For: 1.10.0


If a configuration causes the shuffle memory size to be set to its min or max
limit rather than to the size derived from the fraction, TM startup fails. While
starting, the TM parses the generated dynamic properties, and the sanity check
(TaskExecutorResourceUtils#sanityCheckShuffleMemory) then fails because it still
expects the shuffle memory size to match the exact fraction, even though the size
was set to the min/max value.
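The failure mode can be modeled with a small Java sketch. This is a hypothetical simplification, not Flink's actual code: the class and method names (`NaiveShuffleCheckSketch`, `deriveShuffleSize`, `naiveSanityCheck`) are illustrative. It assumes that the shuffle size is the fraction of total Flink memory clamped to [min, max], and that the faulty check compares the result against the unclamped fraction.

```java
// Hypothetical sketch of the failure mode; names are illustrative, not Flink's API.
public class NaiveShuffleCheckSketch {

    /** Shuffle size = fraction * total Flink memory, clamped to [minBytes, maxBytes]. */
    static long deriveShuffleSize(long totalFlinkBytes, double fraction, long minBytes, long maxBytes) {
        long fromFraction = (long) (totalFlinkBytes * fraction);
        return Math.max(minBytes, Math.min(maxBytes, fromFraction));
    }

    /** Faulty check: ignores min/max and expects exactly fraction * total. */
    static void naiveSanityCheck(long derivedBytes, long totalFlinkBytes, double fraction) {
        long fromFraction = (long) (totalFlinkBytes * fraction);
        if (derivedBytes != fromFraction) {
            throw new IllegalStateException("Derived shuffle memory size (" + derivedBytes
                    + " bytes) does not match configured fraction (" + fraction + ")");
        }
    }

    public static void main(String[] args) {
        long total = 350L << 20; // 350m total Flink memory
        long min = 64L << 20;    // 64m shuffle min
        long max = 1L << 30;     // assumed 1g shuffle max (illustrative value)
        long derived = deriveShuffleSize(total, 0.1, min, max);
        System.out.println("derived = " + derived); // clamped up to the 64m min
        try {
            naiveSanityCheck(derived, total, 0.1);
            System.out.println("check passed");
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

Because the clamped size no longer equals fraction * total, the exact comparison rejects a configuration that was derived correctly.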

For example, start the TM with the following Flink config:
{code:java}
taskmanager.memory.total-flink.size: 350m
taskmanager.memory.framework.heap.size: 16m
taskmanager.memory.shuffle.fraction: 0.1{code}
It will result in the following extra program args:
{code:java}
taskmanager.memory.shuffle.max: 67108864b
taskmanager.memory.framework.off-heap.size: 134217728b
taskmanager.memory.managed.size: 146800642b
taskmanager.cpu.cores: 1.0
taskmanager.memory.task.heap.size: 2097150b
taskmanager.memory.task.off-heap.size: 0b
taskmanager.memory.shuffle.min: 67108864b{code}
Here the size derived from the fraction (0.1 × 350mb = 35mb) was less than the
shuffle memory min size (64mb), so the shuffle memory was set to the min value: 64mb.
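The arithmetic, taking 350m as 350 × 2^20 = 367,001,600 bytes (consistent with the byte values in the generated args), shows why an exact fraction check cannot pass once the min limit has been applied:

$$
\begin{aligned}
0.1 \times 367\,001\,600\ \text{B} &= 36\,700\,160\ \text{B}\ (35\ \text{MiB}) < 67\,108\,864\ \text{B}\ (64\ \text{MiB, min}) \\
\Rightarrow\ \text{shuffle size} &= 67\,108\,864\ \text{B}, \qquad \frac{67\,108\,864}{367\,001\,600} \approx 0.183 \neq 0.1
\end{aligned}
$$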

While the TM starts, TaskExecutorResourceUtils#sanityCheckShuffleMemory throws the
following exception:
{code:java}
org.apache.flink.configuration.IllegalConfigurationException: Derived Shuffle Memory size(64 Mb (67108864 bytes)) does not match configured Shuffle Memory fraction (0.10000000149011612).
    at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.sanityCheckShuffleMemory(TaskExecutorResourceUtils.java:552)
    at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithExplicitTaskAndManagedMemory(TaskExecutorResourceUtils.java:183)
    at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:135)
{code}
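This can be fixed by asserting the exact fraction only when the size derived from
it lies within the min/max range; otherwise the min/max limit applies and the
fraction check should be skipped.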
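A minimal sketch of such a range-aware check follows, under the same assumptions as a simplified model (hypothetical names, sizes in bytes, fraction applied to total Flink memory). It is not the actual Flink patch: the derived size must lie within [min, max], and the exact fraction is asserted only when the fraction-derived size falls inside that range.

```java
// Hypothetical sketch of a range-aware check; names are illustrative, not Flink's API.
public class RangeAwareShuffleCheckSketch {

    static void rangeAwareSanityCheck(long derivedBytes, long totalFlinkBytes, double fraction,
                                      long minBytes, long maxBytes) {
        // The derived size must always respect the configured limits.
        if (derivedBytes < minBytes || derivedBytes > maxBytes) {
            throw new IllegalStateException("Derived shuffle memory size (" + derivedBytes
                    + " bytes) is outside [" + minBytes + ", " + maxBytes + "]");
        }
        long fromFraction = (long) (totalFlinkBytes * fraction);
        // Only when the fraction-derived size is within the range did the fraction
        // determine the result, so only then must the sizes match exactly.
        boolean fractionWithinRange = fromFraction >= minBytes && fromFraction <= maxBytes;
        if (fractionWithinRange && derivedBytes != fromFraction) {
            throw new IllegalStateException("Derived shuffle memory size (" + derivedBytes
                    + " bytes) does not match configured fraction (" + fraction + ")");
        }
    }

    public static void main(String[] args) {
        long total = 350L << 20; // 350m total Flink memory
        long pinned = 64L << 20; // min = max = 64m, as in the generated dynamic properties
        rangeAwareSanityCheck(pinned, total, 0.1, pinned, pinned);
        System.out.println("check passed for the clamped 64m size");
    }
}
```

With min = max = 64m, the fraction-derived 35m lies outside the range, so the fraction assertion is skipped and the 64m size passes. A size inside the range that disagrees with the fraction is still rejected.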



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
