juliuszsompolski commented on issue #26340: [WIP][SPARK-29108][SQL] Add new module sql/thriftserver with all code and UT
URL: https://github.com/apache/spark/pull/26340#issuecomment-548830411
 
 
   > Add ThriftServerDelegationTokenManager and implement a verifyToken method, since Hive 1.2 doesn't have it
   > Remove the unused table-type classes, as I mentioned before: ClassicTypeMapping, HiveTableTypeMapping, TableTypeMapping and TableTypeMappingFactory
   > Add VariableSubstitution, since it differs between Hive 1.2.1 and Hive 2.3.5
   > Change package name 
   
   These parts are needed to get it to work in an initial PR, but could we keep the others for follow-up PRs? I think each of them deserves a separate, focused review:
   
   > return class StructType etc
    
   In this PR, the change to StructType propagates through all layers, all the way to ThriftCLIService, and only there is it converted into the Thrift format by the SchemaMapper. It would be good to see all the changes that propagation causes, and whether it makes sense to keep TableSchema and just use SchemaMapper in more places to limit those changes.
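   As a toy sketch of that alternative, internal layers could keep a single schema type and convert to the wire representation only at the Thrift edge. SparkField and WireColumn below are stand-ins for illustration, not the real StructType/TableSchema/SchemaMapper classes:

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaMapperSketch {
    // Stand-in for the engine-side schema field (hypothetical, not StructField).
    public static class SparkField {
        public final String name, dataType;
        public SparkField(String name, String dataType) { this.name = name; this.dataType = dataType; }
    }

    // Stand-in for a wire-format column descriptor (hypothetical, not TColumnDesc).
    public static class WireColumn {
        public final String columnName, typeDesc;
        public final int position;
        public WireColumn(String n, String t, int p) { columnName = n; typeDesc = t; position = p; }
    }

    // Single conversion point at the boundary, analogous to applying the
    // mapper only in the Thrift layer so inner layers stay unchanged.
    public static List<WireColumn> toWireSchema(List<SparkField> fields) {
        List<WireColumn> out = new ArrayList<>();
        int pos = 1; // column positions are conventionally 1-based on the wire
        for (SparkField f : fields) {
            out.add(new WireColumn(f.name, f.dataType, pos++));
        }
        return out;
    }
}
```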
   
   > Implement RowSet classes 
   
   I am looking forward to this change, as it lets us skip one format-conversion step in serialization, but I think it deserves some performance benchmarks to see what the impact is. For ColumnBasedSet, for example, it can matter for performance whether we do "for each row, append to each column" or "for each column, construct the TColumn from all rows".
   Also, both RowBasedSet and ColumnBasedSet now keep SparkRows internally and only convert all rows to Thrift structures at the end, whereas before they converted on the fly as rows were added.
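   The two construction orders can be sketched with plain lists standing in for the Thrift TColumn structures (a toy model of the benchmark shape, not the actual RowSet API):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnBatchSketch {
    // Row-major order: for each row, append one value to every column buffer.
    static List<List<Object>> rowMajor(List<Object[]> rows, int numCols) {
        List<List<Object>> cols = new ArrayList<>();
        for (int c = 0; c < numCols; c++) cols.add(new ArrayList<>());
        for (Object[] row : rows) {
            for (int c = 0; c < numCols; c++) cols.get(c).add(row[c]);
        }
        return cols;
    }

    // Column-major order: build each column in one full pass over all rows.
    static List<List<Object>> columnMajor(List<Object[]> rows, int numCols) {
        List<List<Object>> cols = new ArrayList<>();
        for (int c = 0; c < numCols; c++) {
            List<Object> col = new ArrayList<>(rows.size()); // presized
            for (Object[] row : rows) col.add(row[c]);
            cols.add(col);
        }
        return cols;
    }
}
```

   Both produce identical output; the column-major variant can presize each buffer and touches one column at a time, which is often friendlier to the cache, but a benchmark would have to confirm whether that matters at typical fetch sizes.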
   
   > integrate HiveServer2 and SparkThriftServer
   
   This could use a detailed review to verify that there are no behaviour changes.
   
   > Some implementation classes converted to Scala, such as OperationManager, SessionManager, etc.
   
   I think this should be reviewed in two separate PRs:
   * remove the Hive Operation classes that we override, like ExecuteStatementOperation, GetTablesOperation, etc., and make the Spark operations inherit from Operation directly. When doing that, it would be good to review what shared code can be moved to the base class (e.g. every operation needing to execute in the correct scheduler pool, exception handling, etc.)
   * rewrite the manager classes like OperationManager and SessionManager; again, review what shared code can be moved to the base class.
   
   In both cases I think it also deserves a discussion of whether the base class should be translated to Scala or stay in Java.
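   The shared code mentioned in the first bullet (scheduler-pool setup, exception handling) could be pulled up in a template-method style. This is a hypothetical sketch with made-up names, not the actual HiveServer2/Spark Operation API:

```java
// Sketch: the base class owns the run-with-pool-and-error-handling skeleton,
// and each concrete operation supplies only its statement-specific work.
public abstract class BaseOperationSketch {
    private volatile String state = "INITIALIZED";
    private Throwable error;

    // Shared skeleton, reviewed once here instead of repeated per operation.
    public final void run(String schedulerPool) {
        state = "RUNNING";
        try {
            // e.g. set the scheduler pool as a local property before running
            // (sc.setLocalProperty("spark.scheduler.pool", schedulerPool))
            runInternal();
            state = "FINISHED";
        } catch (Exception e) {
            // shared exception handling: record the failure as operation state
            error = e;
            state = "ERROR";
        }
    }

    // The only method a concrete operation (ExecuteStatement, GetTables, ...)
    // would need to implement in this sketch.
    protected abstract void runInternal() throws Exception;

    public String getState() { return state; }
    public Throwable getError() { return error; }
}
```

   Subclasses then implement only runInternal(), so the pool setup and error-state transitions live in one place regardless of whether that place ends up in Scala or Java.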
