[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420568#comment-16420568 ] ASF GitHub Bot commented on DRILL-6125: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1105 > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419083#comment-16419083 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1105 +1 > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Labels: ready-to-commit > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416189#comment-16416189 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva Please review. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395574#comment-16395574 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 Squashed commits. @arina-ielchiieva please let me know if you have any comments. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393419#comment-16393419 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173545351 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -227,11 +280,19 @@ public void run() { @Override public Void run() throws Exception { injector.injectChecked(fragmentContext.getExecutionControls(), "fragment-execution", IOException.class); - /* - * Run the query until root.next returns false OR we no longer need to continue. - */ - while (shouldContinue() && root.next()) { -// loop + + while (shouldContinue()) { +// Fragment is not cancelled + +for (FragmentHandle fragmentHandle; (fragmentHandle = receiverFinishedQueue.poll()) != null;) { --- End diff -- OK, I see. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393411#comment-16393411 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173543287 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -245,19 +306,17 @@ public Void run() throws Exception { // we have a heap out of memory error. The JVM in unstable, exit. CatastrophicFailure.exit(e, "Unable to handle out of memory condition in FragmentExecutor.", -2); } +} catch (InterruptedException e) { + // Swallow interrupted exceptions since we intentionally interrupt the root when cancelling a query + logger.trace("Interruped root: {}", e); } catch (Throwable t) { fail(t); } finally { - // no longer allow this thread to be interrupted. We synchronize here to make sure that cancel can't set an - // interruption after we have moved beyond this block. - synchronized (myThreadRef) { -myThreadRef.set(null); -Thread.interrupted(); - } - - // Make sure the event processor is started at least once - eventProcessor.start(); + // Don't process any more termination requests, we are done. + eventProcessor.terminate(); --- End diff -- There is a corner case. If we didn't include eventProcessor.terminate() we could theoretically receive a cancellation request for the first time after the interrupts were cleared for the FragmentExecutor#run thread. The cancellation would then interrupt the Thread again, and our FragmentExecutor would finish and leave the thread it used in an interrupted state. This could cause problems for the next FragmentExecutor that uses the same thread. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393397#comment-16393397 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173542042 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -227,11 +280,19 @@ public void run() { @Override public Void run() throws Exception { injector.injectChecked(fragmentContext.getExecutionControls(), "fragment-execution", IOException.class); - /* - * Run the query until root.next returns false OR we no longer need to continue. - */ - while (shouldContinue() && root.next()) { -// loop + + while (shouldContinue()) { +// Fragment is not cancelled + +for (FragmentHandle fragmentHandle; (fragmentHandle = receiverFinishedQueue.poll()) != null;) { --- End diff -- The original code allowed PartitionSenderRootExec#receivingFragmentFinished() to be called before, after or concurrently with PartitionSenderRootExec#innerNext(). The idea was that PartitionSenderRootExec#receivingFragmentFinished would terminate a partition. The terminate method of OutgoingRecordBatch would cause all outgoing records to a finished receiver to be dropped. So the finished downstream receivers would stop receiving data from the PartitionSenderRootExec, but the downstream receivers that did not finish would continue to receive records. There is a gray area here because I'm not sure why only some of the downstream receivers would send a RECEIVER_FINISHED message and others wouldn't, but the original design seems to make an assumption that this is a very common thing and is optimized for it. So I assume the original authors know something that we don't with respect to that. Also calling receivingFragmentFinished after we finished processing all the data would defeat the purpose, since the intention was to allow our fragment to terminate early if the downstream receivers decided they don't need anymore data (ex. A select limit query which only asks the first 100 rows of a result with 10 million rows). If we called it only after the next() loop was done we would always process all upstream records even when we didn't have to and we would see significant degradation in the performance of limit queries. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393371#comment-16393371 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173537455 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +548,74 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (event.type.equals(EventType.RECEIVER_FINISHED)) { +// Finish request +if (terminate.get()) { + // We have already recieved a cancellation or we have terminated the event processor. Do not process anymore finish requests. + return; +} + } else { +// Cancel request +if (!terminate.compareAndSet(false, true)) { + // We have already received a cancellation or we have terminated the event processor. Do not process anymore cancellation requests. + // This prevents the root thread from being interrupted at an inappropriate time. + return; +} + } + switch (event.type) { case CANCEL: - /* - * We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. - */ + // We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. updateState(FragmentState.CANCELLATION_REQUESTED); - - /* - * Interrupt the thread so that it exits from any blocking operation it could be executing currently. We - * synchronize here to ensure we don't accidentally create a race condition where we interrupt the close out - * procedure of the main thread. - */ - synchronized (myThreadRef) { -final Thread myThread = myThreadRef.get(); -if (myThread != null) { - logger.debug("Interrupting fragment thread {}", myThread.getName()); - myThread.interrupt(); -} - } + // The root was started so we have to interrupt it in case it is performing a blocking operation. + killThread(); + terminate.set(true); --- End diff -- Doh! Thanks for catching > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393372#comment-16393372 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173537487 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +548,74 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (event.type.equals(EventType.RECEIVER_FINISHED)) { +// Finish request +if (terminate.get()) { + // We have already recieved a cancellation or we have terminated the event processor. Do not process anymore finish requests. + return; +} + } else { +// Cancel request +if (!terminate.compareAndSet(false, true)) { + // We have already received a cancellation or we have terminated the event processor. Do not process anymore cancellation requests. + // This prevents the root thread from being interrupted at an inappropriate time. + return; +} + } + switch (event.type) { case CANCEL: - /* - * We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. - */ + // We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. updateState(FragmentState.CANCELLATION_REQUESTED); - - /* - * Interrupt the thread so that it exits from any blocking operation it could be executing currently. We - * synchronize here to ensure we don't accidentally create a race condition where we interrupt the close out - * procedure of the main thread. - */ - synchronized (myThreadRef) { -final Thread myThread = myThreadRef.get(); -if (myThread != null) { - logger.debug("Interrupting fragment thread {}", myThread.getName()); - myThread.interrupt(); -} - } + // The root was started so we have to interrupt it in case it is performing a blocking operation. + killThread(); + terminate.set(true); break; - case CANCEL_AND_FINISH: + // In this case the root was never started so we do not have to interrupt the thread. updateState(FragmentState.CANCELLATION_REQUESTED); + // The FragmentExecutor#run() loop will not execute in this case so we have to cleanup resources here cleanup(FragmentState.FINISHED); + terminate.set(true); --- End diff -- Thanks for catching > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393325#comment-16393325 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173527617 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -227,11 +280,19 @@ public void run() { @Override public Void run() throws Exception { injector.injectChecked(fragmentContext.getExecutionControls(), "fragment-execution", IOException.class); - /* - * Run the query until root.next returns false OR we no longer need to continue. - */ - while (shouldContinue() && root.next()) { -// loop + + while (shouldContinue()) { +// Fragment is not cancelled + +for (FragmentHandle fragmentHandle; (fragmentHandle = receiverFinishedQueue.poll()) != null;) { --- End diff -- How this processing affects `root.next()`? Should it be also/only done after fragment is canceled or done (after while loop)? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393326#comment-16393326 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173528013 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +548,74 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (event.type.equals(EventType.RECEIVER_FINISHED)) { +// Finish request +if (terminate.get()) { + // We have already recieved a cancellation or we have terminated the event processor. Do not process anymore finish requests. + return; +} + } else { +// Cancel request +if (!terminate.compareAndSet(false, true)) { + // We have already received a cancellation or we have terminated the event processor. Do not process anymore cancellation requests. + // This prevents the root thread from being interrupted at an inappropriate time. + return; +} + } + switch (event.type) { case CANCEL: - /* - * We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. - */ + // We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. updateState(FragmentState.CANCELLATION_REQUESTED); - - /* - * Interrupt the thread so that it exits from any blocking operation it could be executing currently. We - * synchronize here to ensure we don't accidentally create a race condition where we interrupt the close out - * procedure of the main thread. - */ - synchronized (myThreadRef) { -final Thread myThread = myThreadRef.get(); -if (myThread != null) { - logger.debug("Interrupting fragment thread {}", myThread.getName()); - myThread.interrupt(); -} - } + // The root was started so we have to interrupt it in case it is performing a blocking operation. + killThread(); + terminate.set(true); break; - case CANCEL_AND_FINISH: + // In this case the root was never started so we do not have to interrupt the thread. updateState(FragmentState.CANCELLATION_REQUESTED); + // The FragmentExecutor#run() loop will not execute in this case so we have to cleanup resources here cleanup(FragmentState.FINISHED); + terminate.set(true); --- End diff -- it is not necessary (terminate should be already true). > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference >
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393323#comment-16393323 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173526625 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -245,19 +306,17 @@ public Void run() throws Exception { // we have a heap out of memory error. The JVM in unstable, exit. CatastrophicFailure.exit(e, "Unable to handle out of memory condition in FragmentExecutor.", -2); } +} catch (InterruptedException e) { + // Swallow interrupted exceptions since we intentionally interrupt the root when cancelling a query + logger.trace("Interruped root: {}", e); } catch (Throwable t) { fail(t); } finally { - // no longer allow this thread to be interrupted. We synchronize here to make sure that cancel can't set an - // interruption after we have moved beyond this block. - synchronized (myThreadRef) { -myThreadRef.set(null); -Thread.interrupted(); - } - - // Make sure the event processor is started at least once - eventProcessor.start(); + // Don't process any more termination requests, we are done. + eventProcessor.terminate(); --- End diff -- Is this terminate() required? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393324#comment-16393324 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173527951 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +548,74 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (event.type.equals(EventType.RECEIVER_FINISHED)) { +// Finish request +if (terminate.get()) { + // We have already recieved a cancellation or we have terminated the event processor. Do not process anymore finish requests. + return; +} + } else { +// Cancel request +if (!terminate.compareAndSet(false, true)) { + // We have already received a cancellation or we have terminated the event processor. Do not process anymore cancellation requests. + // This prevents the root thread from being interrupted at an inappropriate time. + return; +} + } + switch (event.type) { case CANCEL: - /* - * We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. - */ + // We set the cancel requested flag but the actual cancellation is managed by the run() loop, if called. updateState(FragmentState.CANCELLATION_REQUESTED); - - /* - * Interrupt the thread so that it exits from any blocking operation it could be executing currently. We - * synchronize here to ensure we don't accidentally create a race condition where we interrupt the close out - * procedure of the main thread. - */ - synchronized (myThreadRef) { -final Thread myThread = myThreadRef.get(); -if (myThread != null) { - logger.debug("Interrupting fragment thread {}", myThread.getName()); - myThread.interrupt(); -} - } + // The root was started so we have to interrupt it in case it is performing a blocking operation. + killThread(); + terminate.set(true); --- End diff -- it is not necessary (terminate should be already true). > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392513#comment-16392513 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @vrozov @arina-ielchiieva Applied review comments, please let me know if there are anymore comments. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392432#comment-16392432 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173367485 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -231,9 +261,9 @@ public Void run() throws Exception { while (shouldContinue()) { // Fragment is not cancelled -if (eventProcessor.hasFinishedRequests()) { +if (!recieverFinishedQueue.isEmpty()) { --- End diff -- Thanks for catching removed. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392431#comment-16392431 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173367462 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +527,66 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public synchronized void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (terminate.get()) { --- End diff -- Thanks for catching. Fixed. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392427#comment-16392427 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173367313 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -475,6 +483,8 @@ public void drillbitUnregistered(final Set * This is especially important as fragments can take longer to start */ private class FragmentEventProcessor extends EventProcessor { +private boolean terminate = false; --- End diff -- Done > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392429#comment-16392429 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173367348 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -67,6 +88,11 @@ private volatile RootExec root; private final AtomicReference fragmentState = new AtomicReference<>(FragmentState.AWAITING_ALLOCATION); + /** + * Holds all of the messages sent by downstream recievers that have finished. The {@link FragmentExecutor#run()} thread reads from this queue and passes the + * finished messages to the fragment's {@link RootExec} via the {@link RootExec#receivingFragmentFinished(FragmentHandle)} method. + */ + private final Queue recieverFinishedQueue = new ConcurrentLinkedQueue<>(); --- End diff -- Thanks for catching. Fixed all the misspellings of receiver. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392426#comment-16392426 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173367305 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -475,6 +483,8 @@ public void drillbitUnregistered(final Set * This is especially important as fragments can take longer to start */ private class FragmentEventProcessor extends EventProcessor { +private boolean terminate = false; +private List finishedHandles = Lists.newArrayList(); --- End diff -- Done > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391959#comment-16391959 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173303074 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -488,47 +527,66 @@ void receiverFinished(FragmentHandle handle) { sendEvent(new FragmentEvent(EventType.RECEIVER_FINISHED, handle)); } +/** + * Tell the {@link FragmentEventProcessor} not to process anymore events. This keeps stray cancellation requests + * from being processed after the root has finished running and interrupts in the root thread have been cleared. + */ +public synchronized void terminate() { + terminate.set(true); +} + @Override protected void processEvent(FragmentEvent event) { + if (terminate.get()) { --- End diff -- use atomic compare and set to avoid race condition. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391958#comment-16391958 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173300352 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -67,6 +88,11 @@ private volatile RootExec root; private final AtomicReference fragmentState = new AtomicReference<>(FragmentState.AWAITING_ALLOCATION); + /** + * Holds all of the messages sent by downstream recievers that have finished. The {@link FragmentExecutor#run()} thread reads from this queue and passes the + * finished messages to the fragment's {@link RootExec} via the {@link RootExec#receivingFragmentFinished(FragmentHandle)} method. + */ + private final Queue recieverFinishedQueue = new ConcurrentLinkedQueue<>(); --- End diff -- please fix typo (receiver) > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391960#comment-16391960 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173300621 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -231,9 +261,9 @@ public Void run() throws Exception { while (shouldContinue()) { // Fragment is not cancelled -if (eventProcessor.hasFinishedRequests()) { +if (!recieverFinishedQueue.isEmpty()) { --- End diff -- I don't think `isEmpty()` is necessary, it is the same as `poll() != null`. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391562#comment-16391562 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173229346 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -475,6 +483,8 @@ public void drillbitUnregistered(final Set * This is especially important as fragments can take longer to start */ private class FragmentEventProcessor extends EventProcessor { +private boolean terminate = false; --- End diff -- Consider using AtomicBoolean and avoiding `synchronized`. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391563#comment-16391563 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r173224242 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -475,6 +483,8 @@ public void drillbitUnregistered(final Set * This is especially important as fragments can take longer to start */ private class FragmentEventProcessor extends EventProcessor { +private boolean terminate = false; +private List finishedHandles = Lists.newArrayList(); --- End diff -- Consider using Queue and possibly concurrent Queue and moving `finishedHandles` to FragmentExecutor. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390683#comment-16390683 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @vrozov @arina-ielchiieva Handled multiple receiver finished messages correctly. This PR is ready for another round of review. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390184#comment-16390184 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 Nope never mind I think I spoke too soon. I just realized we may get multiple receivingFragmentFinished requests, one for each downstream receiver. Back to the drawing board. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390169#comment-16390169 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva @vrozov In light of Vlad's comments I have reworked the synchronization model yet again. This change now removes all synchronization from PartitionSenderRootExec and enforces the guarantee that all the lifecycle methods of the PartitionSenderRootExec will only be called by a single Run thread in the FragmentExecutor. Also while making this change I discovered a few other bugs with how cancellations and receiver finishes are handled, so I have addressed those bugs as well. I will go into more detail about what I changed below. # Motivation As Vlad pointed out **close** and **innerNext** are never called concurrently. After closer inspection of the code I also released that currently (in apache master) innerNext and close will always be called by the **FragmentExecutor#run** thread. The only method of PartitionSenderRootExec that is not called by the **FragmentExecutor#run** thread is **receivingFragmentFinished**. In order to simplify the implementation of PartitionSenderRootExec and also simplify the design of the FragmentExecutor I changed the code so that only the **FragmentExecutor#run** thread calls **receivingFragmentFinished** as well. In this way we can remove all the synchronization from PartitionSenderRootExec. This was done by by: 1. Making the event processor save the FragmentHandle in the event that a receiver finish request was sent. 2. After the **root.next()** loop terminates in **FragmentExecutor#run** the eventProcessor is checked to see if a finish request was received. If so **receivingFragmentFinished** is called on root by the **FragmentExecutor#run** method. # Other Bugs ## Processing of multiple termination requests The event processor would process all cancellation and finish requests, even if there is more than one. This doesn't really make sense, since we can only cancel or finish once. So I have changed the event processor to execute only the first termination request and ignore all the others. ## Simultaneous Cancellation and Finishing Since the event processor would process multiple termination requests concurrently it was possible for a cancel and a finish message to be received and processed simultaneously. The results of this were not well defined since **close** and **receivingFragmentFinished** could be called concurrently. # Other Improvements Vlad also pointed out that we did not need the hasCloseoutThread atomic reference, since we were already using the myThreadRef atomic reference. That cleaned up the code a bit. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.14.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388572#comment-16388572 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @vrozov Thanks for catching this, I believe you are right. hasClouseoutThread guarantees innerNext and close won't be called concurrently. However, I still believe innerNext and receivingFragmentFinished could be called concurrently, since the ControlMessageHandler thread executes recievingFragmentFinished. Additionally in rare cases where a limit query is cancelled recievingFragmentFinished and close could be called concurrently as well. While reflecting on your comments I also saw another issue where the root could be blocked on a next call but a Finished event would not cause root to terminate. In light of all of this I actually think the **synchronized** is not sufficient. We will have to have some way to interrupt the execution of the root when we received a finish signal and only close out the resources after receivingFragmentFinished has been called. Similarly if we receive a finish signal we should ignore any cancellation requests instead of trying to cancel and finish simultaneously and vice versa. I will rework the solution to address these issues and let you know when I have an update. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388402#comment-16388402 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1105 @ilooner @arina-ielchiieva How `innerNext()` and `close()` can execute concurrently? Does not `FragmentExecutor.hasCloseoutThread` ensure that either `close()` is called on the `run()` thread or `run()` exits if the fragment is already cancelled? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388261#comment-16388261 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/1105 Looks good, @vrozov are you ok with the changes? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386765#comment-16386765 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172327159 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -358,7 +357,7 @@ public void close() throws Exception { } } - public void sendEmptyBatch(boolean isLast) { + public synchronized void sendEmptyBatch(boolean isLast) { --- End diff -- Done > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386766#comment-16386766 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva Applied review comments. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386762#comment-16386762 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172327066 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -245,6 +244,8 @@ public Void run() throws Exception { // we have a heap out of memory error. The JVM in unstable, exit. CatastrophicFailure.exit(e, "Unable to handle out of memory condition in FragmentExecutor.", -2); } +} catch (InterruptedException e) { + // Swallow interrupted exceptions since we intentionally interrupt the root when cancelling a query --- End diff -- Done > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386763#comment-16386763 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172327106 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -231,7 +231,7 @@ public boolean innerNext() { } @VisibleForTesting - protected void createPartitioner() throws SchemaChangeException { + protected synchronized void createPartitioner() throws SchemaChangeException { --- End diff -- Removed > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386690#comment-16386690 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @vrozov The main issue is that InnerNext and Close should not execute concurrently. Even when we are using AtomicReferences and volatile the following sequence of events could happen which could cause a memory leak: 1. Let's say there are two Threads. The **Close thread** which starts at the beginning of the close method, and the **Next thread** which starts at the beginning of the innerNext method. 1. Now let's say the **Next Thread** runs and checks **ok**. Since close has not been called yet **ok** is true. 1. Now the **Next Thread** is after the ok check. 1. The **Close thread** now starta executing. And the close thread clears the partitioner. 1. Now after the partitioner is cleared the **Next Thread** can resume executing. If the next thread receives an OK_SCHEMA he will allocate a new partitioner. Since the OK_SCHEMA message may include records the partitioner may partition some data as well. 1. Now the **Close thread** is done, but there is a partitioner that has not been closed, and we will leak memory. In order to property solve this problem we need to make sure that the innerNext and close methods are mutually exclusive so the above scenario can never happen. The easiest way to do that is to use the synchronized key word. If we use the synchronized keyword then we don't have to use volatile or atomic reference. Also as a side note using synchronized will probably be more efficient since a cache flush would only be triggered at the start of the innerNext and close method. Alternatively if we used volatile and AtomicReference a cache flush would be triggered every time we accessed the ok and partitioner variables. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386506#comment-16386506 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on the issue: https://github.com/apache/drill/pull/1105 @ilooner What state is protected by `syncrhonized`? Why is it not sufficient to use `volatile` and `AtomicReference`? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384784#comment-16384784 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172024863 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -231,7 +231,7 @@ public boolean innerNext() { } @VisibleForTesting - protected void createPartitioner() throws SchemaChangeException { + protected synchronized void createPartitioner() throws SchemaChangeException { --- End diff -- Why you need to synchronized`createParitioner`? It's being called from `innerNext` which is already synchronized. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384786#comment-16384786 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172024881 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/fragment/FragmentExecutor.java --- @@ -245,6 +244,8 @@ public Void run() throws Exception { // we have a heap out of memory error. The JVM in unstable, exit. CatastrophicFailure.exit(e, "Unable to handle out of memory condition in FragmentExecutor.", -2); } +} catch (InterruptedException e) { + // Swallow interrupted exceptions since we intentionally interrupt the root when cancelling a query --- End diff -- Maybe add logger.trace? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384785#comment-16384785 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r172024966 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -358,7 +357,7 @@ public void close() throws Exception { } } - public void sendEmptyBatch(boolean isLast) { + public synchronized void sendEmptyBatch(boolean isLast) { --- End diff -- Method can be made private. Plus maybe not be synchronized as well since is being called from `innerNext`. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383023#comment-16383023 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva @vrozov I believe I have a solution. There were several issues with the original code. 1. It made incorrect assumptions about how cache invalidation works with java **synchronized**. 2. It assumed **innerNext** and **close** would be called sequentially. I believe this PR fixes these issues now and I have gone into more detail about the problems below. # 1. Incorrect Cache Invalidation Assumptions The original code was trying to be smart by trying to reduce synchronization overhead on **innerNext**. So the code in **innerNext** did not synchronize before changing the partitioner object since this would be called often. The code in **close()** and ** receivingFragmentFinished()** synchronized before accessing the partitioner with the intention that this would trigger an update of the partitioner variable state across all threads. Unfortunately, this assumption is invalid (see https://stackoverflow.com/questions/22706739/does-synchronized-guarantee-a-thread-will-see-the-latest-value-of-a-non-volatile). Every thread that accesses a variable must synchronize before accessing a variable in order to properly invalidate cached data on a core. For example if **Thread A** modifies **Variable 1** then **Thread B** synchronizes before accessing **Variable 1** then there is no guarantee **Thread B** will see the most updated value for **Variable 1** since it might . ## Solution In summary the right thing to do is the simple thing. Make the methods synchronized. Unfortunately there is no way to outsmart the system and reduce synchronization overhead without causing race conditions. # 2. Concurrent InnerNext and Close Calls The original code did not consider the case that innerNext was in the middle of execution when close was called. It did try to handle the case where **innerNext** could be called after **close** by setting the **ok** variable. But it didn't even do that right because there was no synchronization around the **ok** variable. ## Solution The right thing to do is the simple thing. Make sure the methods are synchronized so close has to wait until innerNext is done before executing. Also when a query is cancelled the executing thread should be interrupted the thread running innerNext incase it is on a blocking call. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377840#comment-16377840 ] ASF GitHub Bot commented on DRILL-6125: --- Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r170782805 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -62,7 +62,7 @@ private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PartitionSenderRootExec.class); private RecordBatch incoming; private HashPartitionSender operator; - private PartitionerDecorator partitioner; + private volatile PartitionerDecorator partitioner; --- End diff -- Consider using `AtomicReference` instead of `volatile` > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374057#comment-16374057 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @priteshm @arina-ielchiieva I should have updated this PR earlier this week, here is my update. After reflecting on Arina's comments and reading some more docs about how java implements volatile and synchronization, I think this solution might not fix the original race condition. I need to to more reading to get a better understanding. Additionally I realized there is another race condition where two threads are simultaneously calling close and innerNext which could cause a memory leak. Haven't had a chance to dig further this week, so I will try to wrap this up next week. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373941#comment-16373941 ] ASF GitHub Bot commented on DRILL-6125: --- Github user priteshm commented on the issue: https://github.com/apache/drill/pull/1105 @arina-ielchiieva is this bug ready to commit? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369007#comment-16369007 ] ASF GitHub Bot commented on DRILL-6125: --- Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/1105#discussion_r169048352 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -348,9 +348,12 @@ public void close() throws Exception { logger.debug("Partition sender stopping."); super.close(); ok = false; -if (partitioner != null) { - updateAggregateStats(); - partitioner.clear(); + +synchronized (this) { --- End diff -- 1. Should partitioner be volatile? 2. Should we check if partitioner is not null before synchronization as well (DCL)? > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359167#comment-16359167 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @sachouche I traced through the code. The updateAggregateStats method is called, which then calls the getOutgoingBatches method of the code generated Partitioners. That method is just a simple getter. So no one is acquiring the same lock. But even if someone else was, the code in the close method is single threaded, and synchronize blocks are reentrant. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.13.0 >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Minor > Fix For: 1.13.0 > > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347395#comment-16347395 ] ASF GitHub Bot commented on DRILL-6125: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/1105 @sachouche @arina-ielchiieva > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6125) PartitionSenderRootExec can leak memory because close method is not synchronized
[ https://issues.apache.org/jira/browse/DRILL-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347393#comment-16347393 ] ASF GitHub Bot commented on DRILL-6125: --- GitHub user ilooner opened a pull request: https://github.com/apache/drill/pull/1105 DRILL-6125: Fix possible memory leak when query is cancelled. A detailed description of the problem and solution can be found here: https://issues.apache.org/jira/browse/DRILL-6125 You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilooner/drill DRILL-6125 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1105.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1105 commit 1d1725a276c058e8c09e456963bac928d1f062ed Author: Timothy FarkasDate: 2018-01-30T23:55:41Z DRILL-6125: Fix possible memory leak when query is cancelled. > PartitionSenderRootExec can leak memory because close method is not > synchronized > > > Key: DRILL-6125 > URL: https://issues.apache.org/jira/browse/DRILL-6125 > Project: Apache Drill > Issue Type: Bug >Reporter: Timothy Farkas >Assignee: Timothy Farkas >Priority: Major > > PartitionSenderRootExec creates a PartitionerDecorator and saves it in the > *partitioner* field. The creation of the partitioner happens in the > createPartitioner method. This method get's called by the main fragment > thread. The partitioner field is accessed by the fragment thread during > normal execution but it can also be accessed by the receivingFragmentFinished > method which is a callback executed by the event processor thread. Because > multiple threads can access the partitioner field synchronization is done on > creation and on when receivingFragmentFinished. However, the close method can > also be called by the event processor thread, and the close method does not > synchronize before accessing the partitioner field. Since synchronization is > not done the event processor thread may have an old reference to the > partitioner when a query cancellation is done. Since it has an old reference > the current partitioner can may not be cleared and a memory leak may occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)