[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953585#comment-16953585 ]

Till Rohrmann commented on FLINK-14123:
---------------------------------------

Great. Please also close the now obsolete PR.

> Change taskmanager.memory.fraction default value to 0.6
> -------------------------------------------------------
>
>                 Key: FLINK-14123
>                 URL: https://issues.apache.org/jira/browse/FLINK-14123
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Configuration
>    Affects Versions: 1.9.0
>            Reporter: liupengcheng
>            Assignee: liupengcheng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we are testing Flink batch tasks such as terasort; however, the
> job ran only briefly before failing with an OOM error.
>
> {code:java}
> org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: a807e1d635bd4471ceea4282477f8850)
>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
>     at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>     at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:539)
>     at com.github.ehiggs.spark.terasort.FlinkTeraSort$.main(FlinkTeraSort.scala:89)
>     at com.github.ehiggs.spark.terasort.FlinkTeraSort.main(FlinkTeraSort.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1007)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1080)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1080)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
>     ... 23 more
> Caused by: java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger Reading Thread' terminated due to an exception: GC overhead limit exceeded
>     at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:650)
>     at org.apache.flink.runtime.operators.BatchTask.getInput(BatchTask.java:1109)
>     at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:82)
>     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: GC overhead limit exceeded
>     at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:831)
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.deserialize(BytePrimitiveArraySerializer.java:84)
>     at
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952874#comment-16952874 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

[~trohrmann] Thanks! I will update these docs in this PR.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952651#comment-16952651 ]

Till Rohrmann commented on FLINK-14123:
---------------------------------------

You can find the release notes under https://github.com/apache/flink/blob/master/docs/release-notes/. The Getting Help page is in the flink-web repository: https://github.com/apache/flink-web/blob/asf-site/gettinghelp.md.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952028#comment-16952028 ]

Xintong Song commented on FLINK-14123:
--------------------------------------

I don't know either. I think you can try asking on the dev mailing list.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951816#comment-16951816 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

Thanks [~xintongsong]. I now tend to agree with Till too; let us just update these docs in this PR.

Can you tell me the file path for these docs in the Flink project? I tried to find them under the docs directory, but only found the release_notes files.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950981#comment-16950981 ]

Xintong Song commented on FLINK-14123:
--------------------------------------

[~liupengcheng], I consulted [~trohrmann] today regarding this issue. Till is an Apache Flink PMC member who works on the runtime components.

Till shares the concern that changing the default fraction may break some setups that previously worked fine, and that we should not do that in bugfix versions. Instead, he suggests that we add information about this issue to [Getting Help - Got an Error Message|https://flink.apache.org/gettinghelp.html#got-an-error-message] and to the [release notes|https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.9.html] for the bugfix versions, to guide users who run into this GC error to the workaround (manually configuring the fraction).

I tend to agree with Till. What do you think?
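
For readers following the workaround mentioned in the comment above (manually configuring the fraction), the arithmetic behind it can be sketched as follows. This is only an illustration under stated assumptions, not Flink code: the class and method names are hypothetical, the 1000 MB budget is an arbitrary example, 0.7 is assumed to be the 1.9 default, and the fraction is assumed to apply to the memory left after network buffers, with the remainder available to objects created by user code.

```java
// Hypothetical helper, not part of Flink. It only illustrates how a
// fraction such as taskmanager.memory.fraction would split a memory budget.
// The workaround itself would be a single line in flink-conf.yaml, e.g.:
//   taskmanager.memory.fraction: 0.6
final class ManagedMemorySplit {

    /** Managed memory in MB, given the budget left after network buffers (assumption). */
    static long managedMb(long budgetMb, double fraction) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1): " + fraction);
        }
        return Math.round(budgetMb * fraction);
    }

    /** Memory in MB left over for objects created by user code. */
    static long userMb(long budgetMb, double fraction) {
        return budgetMb - managedMb(budgetMb, fraction);
    }

    public static void main(String[] args) {
        long budgetMb = 1000; // arbitrary example value, not taken from the issue
        // Compare the assumed 1.9 default (0.7) with the value proposed in the issue title (0.6).
        for (double fraction : new double[] {0.7, 0.6}) {
            System.out.printf("fraction=%.1f managed=%d MB user=%d MB%n",
                    fraction, managedMb(budgetMb, fraction), userMb(budgetMb, fraction));
        }
    }
}
```

Under these assumptions, lowering the fraction from 0.7 to 0.6 leaves one tenth of the budget more for user-code objects, at the cost of the same amount of managed memory for sorting and hashing. Whether that is enough to avoid the GC overhead error depends on the job and the JVM settings.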
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950063#comment-16950063 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

[~xintongsong] I think it's a common case, because I only ran a plain terasort test. Maybe the reason nobody else has reported this issue before is that they are using Java 9+ (G1 GC by default), or that Flink is now mainly used for streaming jobs.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950010#comment-16950010 ]

Xintong Song commented on FLINK-14123:
--------------------------------------

Good point. FLIP-49 does not fix this problem for bugfix versions of previous releases.

My concern is that changing this value may fix the GC overhead error but may also introduce regressions for other use cases. I think it's OK for users to have to change their configs when switching between releases, as long as there's a good reason for it. But having to change configs when switching to a bugfix version of the same release sounds a little aggressive, especially for users who do not run into this error. I'm not sure which is worse: having an error that can be fixed by manually adjusting the configuration, or having a regression in cases that used to work.

Does this error occur only in rare cases, or commonly for all Flink batch jobs? I'm asking because I haven't found anyone else running into the same problem. If it's a common case, maybe we should fix it in the bugfix versions.

[~sewen] What do you think?
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950006#comment-16950006 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

[~xintongsong] I still have some doubts. Even if, as you said, FLIP-49 is very likely to be released in Flink 1.10, the new design only affects Flink 1.10+. How can we avoid the OOM risk for 1.9.x? I think this is a serious stability problem, and we'd better fix it in later bugfix versions of 1.9. What do you think? cc [~sewen]
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948071#comment-16948071 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

[~xintongsong] Thanks, I got it! Maybe we can discuss this further in the future and run more tests on your PR. For now, maybe let's just reference my PR in your pull request, or add more comments about the new default value and warn people that it may cause performance regressions.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946470#comment-16946470 ]

Xintong Song commented on FLINK-14123:
--------------------------------------

[~liupengcheng], the reason we have a smaller default value for the new 'taskmanager.memory.managed.fraction' than for the current 'taskmanager.memory.fraction' is the definition change. As explained in my previous comment, the new fraction has a larger denominator than the original one, so to keep (roughly) the same managed memory size we need a smaller default value.

I admit there's no proof that choosing 0.5 as the default value for the new fraction results in the same managed memory size as before. Actually, given that we are changing the entire task executor memory configuration, I don't think it's even possible to choose one default value that gives the same managed memory size as before in all scenarios. I think the default value for the new fraction is still open to discussion and can be adjusted based on real-world use and performance tests before the next release.

As for this PR, I don't think merging it would cause any problem. It just seems to me that merging it would not bring any benefit either. We decided to deprecate the original fraction (according to the latest discussion [here|https://github.com/apache/flink/pull/9760]) for FLIP-49, which is planned for, and very likely to be released in, Flink 1.10. Merging this PR, which changes the default value of the to-be-removed original fraction, can therefore only affect the intermediate snapshot versions before FLIP-49 is finished, not any release version.
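The point about the larger denominator can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the two definitions quoted in this thread, the function name `equivalent_new_fraction` is hypothetical, and the legacy value 0.7 and the network-memory shares are assumed inputs, not measurements. Setting legacy_fraction * (total - network) equal to new_fraction * total gives new_fraction = legacy_fraction * (1 - network / total), so the matching value moves with the network share, which is why no single default can match in every setup.

```python
def equivalent_new_fraction(legacy_fraction, network_share):
    """Return the new-style fraction that yields the same managed memory size.

    legacy_fraction: value of the legacy option, defined against
                     (total Flink memory - network memory).
    network_share:   network memory / total Flink memory (assumed input).
    """
    if not 0.0 <= network_share < 1.0:
        raise ValueError("network_share must be in [0, 1)")
    return legacy_fraction * (1.0 - network_share)


if __name__ == "__main__":
    # 0.7 is an assumed legacy setting; the shares are hypothetical.
    for share in (0.05, 0.10, 0.20, 0.30):
        matched = equivalent_new_fraction(0.7, share)
        print(f"network share {share:.2f} -> matching new fraction {matched:.3f}")
```

Under these assumptions the matching value only reaches 0.5 when network memory takes a fairly large share of total Flink memory, which is consistent with the remark that 0.5 is a starting point to be tuned rather than an exact equivalent.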
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946446#comment-16946446 ]

liupengcheng commented on FLINK-14123:
--------------------------------------

[~xintongsong] Hi, thanks for pointing me to FLIP-49. However, I don't see any comment or annotation explaining why the new `taskmanager.memory.managed.fraction` should go down to 0.5; such a big change is not convincing without one. So I think we could merge this change before your [PR|https://github.com/apache/flink/pull/9760].
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942428#comment-16942428 ]

Xintong Song commented on FLINK-14123:
--------------------------------------

[~StephanEwen], I'm not sure about this.

Actually, there is a discussion about whether we should be backwards compatible on 'taskmanager.memory.fraction' in this [PR|https://github.com/apache/flink/pull/9760]. The problem is that the definition of the managed memory fraction changes.
 * In the current codebase, 'taskmanager.memory.fraction' = managed memory / (total flink memory - network memory).
 * According to FLIP-49, the new config option 'taskmanager.memory.managed.fraction' = managed memory / total flink memory.

If we support backwards compatibility on the key 'taskmanager.memory.fraction', it could be confusing for users that the same configured fraction now leads to a different managed memory size. Not supporting the legacy fraction might be a good way to draw users' attention to the definition change.

If we decide not to support the legacy fraction, there would be no need to change the default value. If we decide to be backwards compatible on this, it makes more sense to decrease the default value because of the definition change. Actually, the default value of the new fraction according to FLIP-49 is 0.5.
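To see why reusing the same number under both definitions would be confusing, here is a minimal sketch that evaluates the two formulas from the comment above. The function names are hypothetical, and the memory sizes (4096 MB total Flink memory, 410 MB network memory) and the configured value 0.7 are assumed purely for illustration.

```python
def managed_legacy(total_flink_mb, network_mb, fraction):
    """Legacy definition: fraction of (total Flink memory - network memory)."""
    return fraction * (total_flink_mb - network_mb)


def managed_flip49(total_flink_mb, fraction):
    """FLIP-49 definition as quoted in the thread: fraction of total Flink memory."""
    return fraction * total_flink_mb


if __name__ == "__main__":
    total_mb = 4096.0    # assumed total Flink memory
    network_mb = 410.0   # assumed network memory
    configured = 0.7     # same configured value, interpreted two ways

    legacy = managed_legacy(total_mb, network_mb, configured)
    new = managed_flip49(total_mb, configured)
    print(f"legacy definition:  {legacy:.1f} MB managed")
    print(f"FLIP-49 definition: {new:.1f} MB managed")
    print(f"difference:         {new - legacy:.1f} MB")
```

With these assumed sizes the same configured value reserves noticeably more managed memory under the new definition, which is the silent change in behaviour the comment warns about.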
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941325#comment-16941325 ]

Stephan Ewen commented on FLINK-14123:
--------------------------------------

[~xintongsong] [~azagrebin] Given that in the future we will compute the amount of managed memory prior to starting the internal services, it probably makes sense to lower the fraction anyway.

It would also be good to have a good experience with the parallel GC, which still seems to be the best GC, throughput-wise, for batch processing.

What do you think about this change?
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940610#comment-16940610 ]

liupengcheng commented on FLINK-14123:

[~StephanEwen] Thanks for your reply. I tested the app with G1 GC and with CMS, and it succeeded in both cases, so I think this failure mainly affects the Parallel GC. Still, I think we should make the default GC work: many users are still on JDK 8, and G1 GC has bugs in JDK 8. What do you think about adjusting this default value to 0.6?
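To make the proposed change concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that `taskmanager.memory.fraction` reserves that share of the TaskManager heap for Flink's managed memory and leaves the rest for objects created by user code and the runtime. It ignores network buffers and any other deductions, the 4096 MB heap size is hypothetical, and `split_heap` is an illustrative helper, not part of Flink.

```python
# Illustrative arithmetic only; this is not Flink code.
def split_heap(heap_mb, fraction):
    """Return (managed_mb, remaining_mb) for a given memory fraction.

    Assumption: the fraction is applied to the whole heap; network
    buffers and other deductions are ignored.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    managed_mb = heap_mb * fraction
    return managed_mb, heap_mb - managed_mb


if __name__ == "__main__":
    heap_mb = 4096  # hypothetical TaskManager heap size
    # 0.7 is assumed to be the current default; 0.6 is the proposed value.
    for fraction in (0.7, 0.6):
        managed, remaining = split_heap(heap_mb, fraction)
        print(f"fraction={fraction}: managed={managed:.0f} MB, "
              f"remaining={remaining:.0f} MB")
```

Under these assumptions, lowering the fraction from 0.7 to 0.6 leaves roughly 410 MB more of a 4096 MB heap for non-managed objects, which may be enough headroom to keep the Parallel GC below its overhead limit.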
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934441#comment-16934441 ]

Stephan Ewen commented on FLINK-14123:

Would the behavior be the same when using G1 GC? Or CMS? CMS may be used mainly by streaming-only users, but G1 is definitely used as the default by various batch users as well.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934283#comment-16934283 ]

liupengcheng commented on FLINK-14123:

[~StephanEwen] Yes, you are right. We are using JDK 8 with the default Parallel GC.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934190#comment-16934190 ]

Stephan Ewen commented on FLINK-14123:

Thanks! Which GC were you using: G1, CMS, or Parallel? The logs look like you were using the Parallel Scavenge GC. Is that correct?
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933272#comment-16933272 ]

liupengcheng commented on FLINK-14123:

[~StephanEwen] Yes, I tested terasort with the changed value on real clusters, and it passed. That's why I so strongly suggest changing it to 0.6.
[jira] [Commented] (FLINK-14123) Change taskmanager.memory.fraction default value to 0.6
[ https://issues.apache.org/jira/browse/FLINK-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933186#comment-16933186 ]

Stephan Ewen commented on FLINK-14123:

I get the reasoning behind this; it probably makes sense. We only have to be careful, because changing this value now will cause regressions for some users in some cases, so we need to capture this in the release notes. Did you verify that the changed value avoids the GC overhead error?