[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-06 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@vrozov Thanks for catching this, I believe you are right. 
hasCloseoutThread guarantees innerNext and close won't be called concurrently. 
However, I still believe innerNext and receivingFragmentFinished could be 
called concurrently, since the ControlMessageHandler thread executes 
receivingFragmentFinished. Additionally, in rare cases where a limit query is 
cancelled, receivingFragmentFinished and close could be called concurrently as 
well. While reflecting on your comments I also saw another issue: the root 
could be blocked on a next call, yet a Finished event would not cause the root 
to terminate.

In light of all of this I actually think **synchronized** alone is not 
sufficient. We will need some way to interrupt the execution of the root when 
we receive a finish signal, and we should only close out the resources after 
receivingFragmentFinished has been called. Similarly, if we receive a finish 
signal we should ignore any cancellation requests instead of trying to cancel 
and finish simultaneously, and vice versa.

I will rework the solution to address these issues and let you know when I 
have an update.
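To make the finish-vs-cancel exclusion concrete, here is a minimal sketch of one way to let whichever signal arrives first win and have the other be ignored. This is an illustration only, not Drill's actual code; all class, enum, and method names here are made up for the example:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a single atomic state transition decides whether the
// fragment finishes or cancels, so the two paths can never run simultaneously.
public class FragmentStateSketch {
    enum State { RUNNING, FINISHED, CANCELLED }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.RUNNING);

    /** Returns true if the finish signal won (no cancel arrived first). */
    public boolean receivingFragmentFinished() {
        return state.compareAndSet(State.RUNNING, State.FINISHED);
    }

    /** Returns true if the cancel won (no finish arrived first). */
    public boolean cancel() {
        return state.compareAndSet(State.RUNNING, State.CANCELLED);
    }

    public State currentState() {
        return state.get();
    }
}
```

With this shape, a cancellation request that arrives after a finish signal simply fails its compare-and-set and is dropped, and vice versa.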


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-06 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1105
  
@ilooner @arina-ielchiieva How `innerNext()` and `close()` can execute 
concurrently? Does not `FragmentExecutor.hasCloseoutThread` ensure that either 
`close()` is called on the `run()` thread or `run()` exits if the fragment is 
already cancelled?


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-06 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1105
  
Looks good, @vrozov are you ok with the changes?


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-05 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@arina-ielchiieva Applied review comments.


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-05 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@vrozov The main issue is that innerNext and close should not execute 
concurrently. Even when using AtomicReference and volatile, the following 
sequence of events could occur and cause a memory leak:

  1. Let's say there are two threads: the **Close thread**, which starts at 
the beginning of the close method, and the **Next thread**, which starts at the 
beginning of the innerNext method.
  1. Now let's say the **Next thread** runs and checks **ok**. Since close 
has not been called yet, **ok** is true.
  1. The **Next thread** is now paused just after the **ok** check.
  1. The **Close thread** starts executing and clears the partitioner.
  1. After the partitioner is cleared, the **Next thread** resumes 
executing. If it receives an OK_NEW_SCHEMA it will allocate a new partitioner. 
Since that batch may include records, the partitioner may partition some data 
as well.
  1. Now the **Close thread** is done, but there is a partitioner that has 
not been closed, and we will leak memory.

In order to properly solve this problem we need to make sure that the 
innerNext and close methods are mutually exclusive, so the above scenario can 
never happen. The easiest way to do that is to use the synchronized keyword. 
If we use the synchronized keyword then we don't have to use volatile or 
AtomicReference.

Also, as a side note, using synchronized will probably be more efficient, 
since memory synchronization is only triggered at the start of the innerNext 
and close methods. Alternatively, if we used volatile and AtomicReference, 
memory synchronization would be triggered every time we accessed the ok and 
partitioner variables.
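The scenario above can be sketched in miniature. This is not Drill's actual operator code; the class, field, and method names are simplified stand-ins. The point is that once both methods hold the same monitor, close() can no longer interleave between the **ok** check and the partitioner allocation:

```java
// Minimal sketch of the proposed fix: innerNext() and close() are mutually
// exclusive, so a partitioner can never be allocated after close() has run.
public class PartitionSenderSketch {
    private boolean ok = true;    // always read/written under the lock, so no volatile
    private Object partitioner;   // stand-in for the real Partitioner resource

    public synchronized boolean innerNext() {
        if (!ok) {
            return false;         // close() already ran; do not allocate anything
        }
        // The real operator would (re)create the partitioner on a new schema here.
        partitioner = new Object();
        return true;
    }

    public synchronized void close() {
        ok = false;
        partitioner = null;       // release the partitioner exactly once
    }

    public synchronized boolean hasPartitioner() {
        return partitioner != null;
    }
}
```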



---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-05 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1105
  
@ilooner What state is protected by `synchronized`? Why is it not 
sufficient to use `volatile` and `AtomicReference`?


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-03-01 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@arina-ielchiieva @vrozov I believe I have a solution. There were several 
issues with the original code.

1. It made incorrect assumptions about how cache invalidation works with 
Java **synchronized**.
2. It assumed **innerNext** and **close** would be called sequentially.

I believe this PR fixes these issues now and I have gone into more detail 
about the problems below.

# 1. Incorrect Cache Invalidation Assumptions

The original code tried to be clever by reducing synchronization overhead 
in **innerNext**: since **innerNext** is called often, it did not synchronize 
before changing the partitioner object. The code in **close()** and 
**receivingFragmentFinished()** synchronized before accessing the partitioner, 
with the intention that this would propagate the partitioner variable's state 
across all threads. Unfortunately, this assumption is invalid (see 
https://stackoverflow.com/questions/22706739/does-synchronized-guarantee-a-thread-will-see-the-latest-value-of-a-non-volatile).
 Every thread must synchronize before accessing a shared variable in order to 
properly invalidate cached data on a core. 

For example, if **Thread A** modifies **Variable 1** without synchronizing, 
and **Thread B** synchronizes before accessing **Variable 1**, there is no 
guarantee that **Thread B** will see the most up-to-date value of **Variable 
1**, since no happens-before relationship was established by the writer.

## Solution

In summary, the right thing to do is the simple thing: make the methods 
synchronized. Unfortunately there is no way to outsmart the system and reduce 
synchronization overhead without causing race conditions.
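The visibility rule above can be shown with a tiny sketch (illustrative names, not Drill's). A happens-before edge only exists when the writer releases the same lock the reader later acquires, so both sides must synchronize:

```java
// Sketch of the rule "every thread must synchronize before accessing a shared
// variable": the writer and every reader use the same monitor, which creates
// the happens-before edge that makes the write visible.
public class SharedValueSketch {
    private int value;            // deliberately not volatile

    public synchronized void set(int v) {  // writer synchronizes...
        value = v;
    }

    public synchronized int get() {        // ...and so does every reader
        return value;
    }
}
```

If `get()` dropped its `synchronized` modifier, a reader thread could legally observe a stale value forever, which is exactly the flaw in the original one-sided scheme.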

# 2. Concurrent InnerNext and Close Calls

The original code did not consider the case where innerNext was in the 
middle of execution when close was called. It did try to handle the case where 
**innerNext** could be called after **close** by setting the **ok** variable, 
but even that was not done correctly, because there was no synchronization 
around the **ok** variable.

## Solution

The right thing to do is the simple thing: make sure the methods are 
synchronized, so close has to wait until innerNext is done before executing. 
Also, when a query is cancelled, the thread running innerNext should be 
interrupted in case it is blocked on a call.
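The interruption idea can be sketched as follows. This is a standalone illustration with made-up names, not the fragment executor itself; it just shows that a thread parked on a blocking call unwinds via InterruptedException so resources can then be closed:

```java
import java.util.concurrent.CountDownLatch;

public class InterruptSketch {
    // Stand-in for a root thread blocked inside a next() call waiting on data.
    static boolean blockingNext(CountDownLatch started) {
        started.countDown();
        try {
            Thread.sleep(60_000);                // simulates waiting for upstream batches
            return true;                         // would mean next() completed normally
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return false;                        // unwind so resources can be closed
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch started = new CountDownLatch(1);
        boolean[] completed = new boolean[1];
        Thread next = new Thread(() -> completed[0] = blockingNext(started));
        next.start();
        started.await();   // wait until the thread has entered the blocking call
        next.interrupt();  // the cancellation path interrupts the blocked thread
        next.join();
        if (completed[0]) {
            throw new AssertionError("interrupt should have unwound the blocking call");
        }
    }
}
```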


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-02-23 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@priteshm @arina-ielchiieva I had meant to update this PR earlier this 
week, so here is my update. After reflecting on Arina's comments and reading 
some more docs about how Java implements volatile and synchronization, I think 
this solution might not fix the original race condition. I need to do more 
reading to get a better understanding. Additionally, I realized there is 
another race condition where two threads simultaneously call close and 
innerNext, which could cause a memory leak. I haven't had a chance to dig 
further this week, so I will try to wrap this up next week.


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-02-22 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1105
  
@arina-ielchiieva is this bug ready to commit?


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-02-09 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@sachouche I traced through the code. The updateAggregateStats method is 
called, which then calls the getOutgoingBatches method of the code-generated 
Partitioners. That method is just a simple getter, so no one is acquiring the 
same lock. But even if someone else were, the code in the close method is 
single-threaded, and synchronized blocks are reentrant. 
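The reentrancy point can be demonstrated in two lines (illustrative names only): a thread already holding an object's monitor can enter another synchronized method on the same object without deadlocking.

```java
// Java monitors are reentrant: outer() holds the lock and calls inner(),
// which re-acquires the same monitor without blocking.
public class ReentrantSketch {
    public synchronized int outer() {
        return inner() + 1;   // same monitor, same thread: no deadlock
    }

    public synchronized int inner() {
        return 1;
    }
}
```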


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-01-31 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@sachouche @arina-ielchiieva 


---