[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-11-07 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678526#comment-16678526
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 11/8/18 6:49 AM:
--

06.patch fixed test failure and findbugs issue.


was (Author: sankarh):
06.patch fixed test failure and windbags issue.

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch, HIVE-20682.05.patch, 
> HIVE-20682.06.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared my multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> Also, when we reset *sessionHive* object with new one due to changes in 
> *sessionConf*, the old one should be closed when no async thread is referring 
> to it. This can be done using "*finalize*" method of Hive object where we can 
> close HMS connection when Hive object is garbage collected.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-31 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665492#comment-16665492
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 10/31/18 7:03 AM:
---

[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

cc [~daijy], [~thejas]


was (Author: sankarh):
[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

 

cc [~daijy], [~thejas], [~maheshk114]

 

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared my multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to 

[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-31 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665492#comment-16665492
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 10/31/18 7:02 AM:
---

[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

 

cc [~daijy], [~thejas], [~maheshk114]

 


was (Author: sankarh):
[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

Btw, there is one another issue where Hive.get(sessionConf) directly stores the 
reference to sessionConf in Hive object which makes isCompatible() method to 
return true always.
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean 
isFastCheck,
boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
  || (c != null && !isCompatible(db, c, isFastCheck))) {
if (db != null) {
  LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + 
needsRefresh +
  ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
  closeCurrent();
}
db = create(c, doRegisterAllFns);
  }
  if (c != null) {
db.conf = c;
  }
  return db;
}{code}
So, as of today, Thread local Hive object never gets recreated even if we 
change any HMS configs. Only REPL commands re-create Hive object as it uses 
different HiveConf object not sessionConf to set HMS configs.
  
 This issue can be resolved by always cloning HiveConf object instead of 
storing the reference to input HiveConf.

But, it will be problem for Hive.getWithFastCheck(conf). I think, this method 
is added as an optimisation to avoid checking individual HMS configs. It only 
checks if the input conf object is same as the one stored in Hive object and if 
not, then re-create Hive object.

So, this needs to be fixed carefully. 

Please share your thoughts on this.

cc [~daijy], [~thejas], [~maheshk114]

 

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a 

[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-31 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669666#comment-16669666
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 10/31/18 6:59 AM:
---

[~maheshk114]

I think there is misunderstanding. There won't be any impact in current usage.
 * Reference count of Hive object should be incremented only when some thread 
sets it in Thread local. Not for every Hive.get() call.
 * It will be decremented when the thread removes it from Thread local. If the 
reference count is 0, then current thread will close the MS connection as well.
 * If the Thread local is forcefully overwritten without removing the previous 
Thread local properly, then the set() method ensures the previous one is 
removed gracefully before overwriting with new one. 
 * So, just like current usage, it is expected to close the Hive object only 
once within a thread after setting it in thread local.

Regarding, changes in sessionConf, we need to do the following.
 * Reset sessionHive object using new sessionConf. But, the MS connection in 
sessionHive object should be closed based on reference count as it might be 
referred by some other async thread. All these methods are synchronised and so 
no race conditions possible.
 * If the previous query execution itself has recreated Thread local due to 
config change, then sessionHive should be reset to previous Thread local Hive 
itself instead of re-allocating.

Let me update the patch with this change and please review it.

cc [~pvary], [~daijy]


was (Author: sankarh):
[~maheshk114]

I think there is misunderstanding. There won't be any impact in current usage.
 * Reference count of Hive object should be incremented only when some thread 
sets it in Thread local. Not for every Hive.get() call.
 * It will be decremented when the thread removes it from Thread local. If the 
reference count is 0, then current thread will close the MS connection as well.
 * If the Thread local is forcefully overwritten without removing the previous 
Thread local properly, then the set() method ensures the previous one is 
removed gracefully before overwriting with new one. 
 * So, just like current usage, it is expected to close the Hive object only 
once within a thread after setting it in thread local.

Regarding, changes in sessionConf, we need to do the following.
 * Reset sessionHive object using new sessionConf. But, the MS connection in 
sessionHive object should be closed based on reference count as it might be 
referred by some other async thread. All these methods are synchronised and so 
no race conditions possible.
 * Currently, sessionHive stores the sessionConf object reference and the same 
sessionConf is updated for any configurations set within the session. This 
makes isCompatible() to return true always even if there is MS related config 
changes. So, to fix this, we need to recreate sessionConf object when there is 
a set command.

Let me update the patch with this change and please review it.

cc [~pvary], [~daijy]

> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> 

[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-30 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669573#comment-16669573
 ] 

mahesh kumar behera edited comment on HIVE-20682 at 10/31/18 4:02 AM:
--

[~sankarh]  [~pvary]  [~thejas]  [~daijy]

I have concerned for the reference count design as 
1.  It will break the current usage pattern where we don't bother to close the 
old hive object and any code any where can call hive.get without bothering 
about leak. The only place close has to be called is during thread exit.
2. Even with reference count , we can not allow closure of session hive object 
without proper synchronization control.

i think there are two constrains within which we have to do the code change 
1. If session hive config is changed, then session hive object should also be 
reset.
2. The session hive object is shared by async threads, so some synchronization 
control has to be added if the session hive object is changed.

 isFastCheck behavior should not be changed as i feel it is supposed to be like 
that. We just have to be cautious while using get with isFastCheck set to true. 




was (Author: maheshk114):
[~sankarh]  [~pvary]  [~thejas]  [~daijy]

I have concerned for the reference count design as 
1.  It will break the current usage pattern where we don't bother to close the 
old hive object and any code any where can call hive.get without bothering 
about leak. The only place close has to be called is during thread exit.
2. Even with reference count , we can not allow closure of session hive object 
without proper synchronization control.

i think there are two constrains within which we have to do the code change 
1. If session hive config is changed, then session hive object should also be 
reset.
2. The session hive object is shared by async threads, so some synchronization 
control has to be added if the session hive object is changed.

 isFastCheck behavior should not be changed as i feel it is supposed to be like 
that. We just have to be cautious when to use get with isFastCheck set to true. 



> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared my multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-30 Thread mahesh kumar behera (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669573#comment-16669573
 ] 

mahesh kumar behera edited comment on HIVE-20682 at 10/31/18 4:01 AM:
--

[~sankarh]  [~pvary]  [~thejas]  [~daijy]

I have concerned for the reference count design as 
1.  It will break the current usage pattern where we don't bother to close the 
old hive object and any code any where can call hive.get without bothering 
about leak. The only place close has to be called is during thread exit.
2. Even with reference count , we can not allow closure of session hive object 
without proper synchronization control.

i think there are two constrains within which we have to do the code change 
1. If session hive config is changed, then session hive object should also be 
reset.
2. The session hive object is shared by async threads, so some synchronization 
control has to be added if the session hive object is changed.

 isFastCheck behavior should not be changed as i feel it is supposed to be like 
that. We just have to be cautious when to use get with isFastCheck set to true. 




was (Author: maheshk114):
[~sankarh]  [~pvary]  [~thejas]  [~daijy]

I have concerned for the reference count design as 
1.  It will break the current usage pattern where we don't bother to close the 
old hive object and any code any where can call hive.get without bothering 
about leak. The only place close has to be called is during thread exit.
2. Even with reference count , we can not allow closure of session hive object 
without proper synchronization control.

i think there are two constrains within which we have to do the code change 
1. If session hive config is changed, then session hive object should also be 
reset.
2. The session hive object is shared by async threads, so some synchronization 
control has to be added if the session hive object is changed.
3. isFastCheck behavior should not be changed as i feel it is supposed to be 
like that. We just have to be cautious when to use get with isFastCheck set to 
true. 



> Async query execution can potentially fail if shared sessionHive is closed by 
> master thread.
> 
>
> Key: HIVE-20682
> URL: https://issues.apache.org/jira/browse/HIVE-20682
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20682.01.patch, HIVE-20682.02.patch, 
> HIVE-20682.03.patch, HIVE-20682.04.patch
>
>
> *Problem description:*
> The master thread initializes the *sessionHive* object in *HiveSessionImpl* 
> class when we open a new session for a client connection and by default all 
> queries from this connection shares the same sessionHive object. 
> If the master thread executes a *synchronous* query, it closes the 
> sessionHive object (referred via thread local hiveDb) if  
> {{Hive.isCompatible}} returns false and sets new Hive object in thread local 
> HiveDb but doesn't change the sessionHive object in the session. Whereas, 
> *asynchronous* query execution via async threads never closes the sessionHive 
> object and it just creates a new one if needed and sets it as their thread 
> local hiveDb.
> So, the problem can happen in the case where an *asynchronous* query is being 
> executed by async threads refers to sessionHive object and the master thread 
> receives a *synchronous* query that closes the same sessionHive object. 
> Also, each query execution overwrites the thread local hiveDb object to 
> sessionHive object which potentially leaks a metastore connection if the 
> previous synchronous query execution re-created the Hive object.
> *Possible Fix:*
> The *sessionHive* object could be shared my multiple threads and so it 
> shouldn't be allowed to be closed by any query execution threads when they 
> re-create the Hive object due to changes in Hive configurations. But the Hive 
> objects created by query execution threads should be closed when the thread 
> exits.
> So, it is proposed to have an *isAllowClose* flag (default: *true*) in Hive 
> object which should be set to *false* for *sessionHive* and would be 
> forcefully closed when the session is closed or released.
> cc [~pvary]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-26 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665492#comment-16665492
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 10/26/18 6:43 PM:
---

[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

Btw, there is one another issue where Hive.get(sessionConf) directly stores the 
reference to sessionConf in Hive object which makes isCompatible() method to 
return true always.
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean 
isFastCheck,
boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
  || (c != null && !isCompatible(db, c, isFastCheck))) {
if (db != null) {
  LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + 
needsRefresh +
  ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
  closeCurrent();
}
db = create(c, doRegisterAllFns);
  }
  if (c != null) {
db.conf = c;
  }
  return db;
}{code}
So, as of today, Thread local Hive object never gets recreated even if we 
change any HMS configs. Only REPL commands re-create Hive object as it uses 
different HiveConf object not sessionConf to set HMS configs.
  
 This issue can be resolved by always cloning HiveConf object instead of 
storing the reference to input HiveConf.

But, it will be problem for Hive.getWithFastCheck(conf). I think, this method 
is added as an optimisation to avoid checking individual HMS configs. It only 
checks if the input conf object is same as the one stored in Hive object and if 
not, then re-create Hive object.

So, this needs to be fixed carefully. 

Please share your thoughts on this.

cc [~daijy], [~thejas], [~maheshk114]

 


was (Author: sankarh):
[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

Btw, there is one another issue where Hive.get(sessionConf) directly stores the 
reference to sessionConf in Hive object which makes isCompatible() method to 
return true always.
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean 
isFastCheck,
boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
  || (c != null && !isCompatible(db, c, isFastCheck))) {
if (db != null) {
  LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + 
needsRefresh +
  ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
  closeCurrent();
}
db = create(c, doRegisterAllFns);
  }
  if (c != null) {
db.conf = c;
  }
  return db;
}{code}
So, as of today, Thread local Hive object never gets recreated even if we 

[jira] [Comment Edited] (HIVE-20682) Async query execution can potentially fail if shared sessionHive is closed by master thread.

2018-10-26 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665492#comment-16665492
 ] 

Sankar Hariappan edited comment on HIVE-20682 at 10/26/18 6:16 PM:
---

[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

Btw, there is one another issue where Hive.get(sessionConf) directly stores the 
reference to sessionConf in Hive object which makes isCompatible() method to 
return true always.
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean 
isFastCheck,
boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
  || (c != null && !isCompatible(db, c, isFastCheck))) {
if (db != null) {
  LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + 
needsRefresh +
  ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
  closeCurrent();
}
db = create(c, doRegisterAllFns);
  }
  if (c != null) {
db.conf = c;
  }
  return db;
}{code}
So, as of today, Thread local Hive object never gets recreated even if we 
change any HMS configs. Only REPL commands re-create Hive object as it uses 
different HiveConf object not sessionConf to set HMS configs.
 
This issue can be resolved by always cloning HiveConf object instead of storing 
the input HiveConf reference.

But, it will be problem for Hive.getWithFastCheck(conf). I think, this method 
is added as an optimisation to avoid checking individual HMS configs. It only 
checks if the input conf object is same as the one stored in Hive object and if 
not, then re-create Hive object.

So, this needs to be fixed carefully. 

Please share your thoughts on this.

cc [~daijy], [~thejas], [~maheshk114]

 


was (Author: sankarh):
[~pvary], Thanks for your thoughts!
 * H1 is sessionHive object and will be preserved in session object through out 
the life time of session. HMS connection of sessionHive will be closed only 
when we close the session.
 * H2 is local to given thread and will be closed when we set sessionHive again 
in next query. ThreadLocalHive.set() method closes the previous thread local 
Hive object H2 before overwriting it.
 * Also, if an async thread reallocates Hive object (let's say H3), then it 
will be closed by Hive.closeCurrent call when the thread exits.
 * So, I don't think there would be any HMS connection leak here.
 * parentHive is always sessionHive object where allowClose flag is false. So, 
the "assert (!parentHive.allowClose());" will never fail.

But I agree, there is a chance that each and every query re-creates Hive object 
if the HMS relevant configs are changed in between queries. I missed this part 
where sessionConf would be changed when user sets any Hive configurations via 
cli. 

So, I think, the earlier thread reference count solution would solve this 
issue. [~maheshk114], please comment if you think otherwise.

Btw, there is one another issue where Hive.get(sessionConf) directly stores the 
reference to sessionConf in Hive object which makes isCompatible() method to 
return true always.
{code:java}
private static Hive getInternal(HiveConf c, boolean needsRefresh, boolean 
isFastCheck,
boolean doRegisterAllFns) throws HiveException {
  Hive db = hiveDB.get();
  if (db == null || !db.isCurrentUserOwner() || needsRefresh
  || (c != null && !isCompatible(db, c, isFastCheck))) {
if (db != null) {
  LOG.debug("Creating new db. db = " + db + ", needsRefresh = " + 
needsRefresh +
  ", db.isCurrentUserOwner = " + db.isCurrentUserOwner());
  closeCurrent();
}
db = create(c, doRegisterAllFns);
  }
  if (c != null) {
db.conf = c;
  }
  return db;
}{code}
So, as of today, Thread local Hive object never gets recreated even if we 
change