AngersZhuuuu opened a new pull request #26141:
URL: https://github.com/apache/spark/pull/26141


   ### What changes were proposed in this pull request?
   When SQL runs in the Spark Thrift Server, each session's Thrift server 
methods are called in a single thread. When a query statement is executed, 
there are two modes:
    1. sync
    2. async
    
https://github.com/apache/spark/blob/5a482e72091c8db940408905e8c044f7f5d7814f/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L205-L238
   
   In sync mode, the query is submitted in the thread that corresponds to the 
current session, and the query method blocks until Spark has run the query and 
returned the result.
   In async mode, SparkExecuteStatementOperation submits the query to a 
background thread pool and updates the operation state. Once the query has been 
submitted to the pool, the ExecuteStatement method returns an OperationHandle 
to the client, and the client then polls the operation status continuously. 
When the background thread finishes running the SQL, it updates the 
corresponding operation status. Once the client sees a final status, it either 
receives the error or starts fetching the result of the operation.
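   
   The difference between the two modes can be sketched in a few lines of Java. 
This is only an illustration under stated assumptions, not the code in 
SparkExecuteStatementOperation: `SketchOperation`, its `State` enum, and the 
`runSync`, `runAsync`, and `pollUntilFinal` methods are hypothetical names 
invented for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for a statement operation; not Spark's actual class.
class SketchOperation {
    enum State { INITIALIZED, PENDING, RUNNING, FINISHED, ERROR }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.INITIALIZED);
    private volatile String result;
    private volatile String executingThread;

    State getState() { return state.get(); }
    String getResult() { return result; }
    String getExecutingThread() { return executingThread; }

    boolean isFinal() {
        State s = state.get();
        return s == State.FINISHED || s == State.ERROR;
    }

    // Runs the query on whichever thread calls this method.
    private void execute(Supplier<String> query) {
        executingThread = Thread.currentThread().getName();
        state.set(State.RUNNING);
        try {
            result = query.get();
            state.set(State.FINISHED);
        } catch (RuntimeException e) {
            state.set(State.ERROR);
        }
    }

    // Sync mode: the caller (the session's thread) blocks until the query ends.
    void runSync(Supplier<String> query) {
        execute(query);
    }

    // Async mode: hand the query to a background pool and return immediately.
    void runAsync(Supplier<String> query, ExecutorService pool) {
        state.set(State.PENDING);
        pool.execute(() -> execute(query));
    }

    // Client side: keep asking for the status until it is a final one.
    State pollUntilFinal(long intervalMillis) {
        while (!isFinal()) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        return state.get();
    }
}
```

   In this sketch, `runSync` records the calling thread as the executing 
thread, while `runAsync` records a pool thread. That is the property that 
matters for this bug: in sync mode the query runs on the session's own thread, 
with whatever state is attached to that thread.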
   
   When PyHive connects to the Spark Thrift Server, it runs statements in 
sync mode.
   When we query data from a Hive table, the SerDe class is checked in 
HiveTableScanExec#addColumnMetadataToConf:
   
   
https://github.com/apache/spark/blob/5a482e72091c8db940408905e8c044f7f5d7814f/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala#L123
   
   Since the statement runs in sync mode, it uses the HiveSession's 
SessionState and the classLoader of that SessionState's conf, and the error 
occurs.
   We should reset that classLoader when we start running SQL in sync mode.
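   
   A minimal sketch of the general idea follows, assuming a conf object that 
carries its own class loader. It is not the patch itself: `SketchSessionConf`, 
`SyncClassLoaderSketch`, and `runSyncWithLoader` are hypothetical names, and 
`executionLoader` stands in for whichever loader the execution side should use. 
Only `Thread#getContextClassLoader` and `Thread#setContextClassLoader` are real 
JDK calls here.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a per-session conf that holds a class loader.
class SketchSessionConf {
    private ClassLoader classLoader =
        Thread.currentThread().getContextClassLoader();

    ClassLoader getClassLoader() { return classLoader; }
    void setClassLoader(ClassLoader classLoader) { this.classLoader = classLoader; }
}

class SyncClassLoaderSketch {
    // Points the session conf and the current thread at executionLoader,
    // runs the query on the calling thread, then restores the thread's
    // previous context class loader.
    static <T> T runSyncWithLoader(SketchSessionConf conf,
                                   ClassLoader executionLoader,
                                   Supplier<T> query) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        conf.setClassLoader(executionLoader);
        current.setContextClassLoader(executionLoader);
        try {
            return query.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

   Any class lookup that goes through the conf's loader while `query` runs 
then sees `executionLoader` rather than whatever loader the session state was 
created with. Whether the conf's loader should also be restored afterwards is a 
design choice that this sketch leaves open.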
   ### Why are the changes needed?
   Fix bug
   
   ### Does this PR introduce any user-facing change?
   NO
   
   
   ### How was this patch tested?
   Unit tests (UT).



