juliuszsompolski commented on issue #26340: [WIP][SPARK-29108][SQL] Add new module sql/thriftserver with all code and UT
URL: https://github.com/apache/spark/pull/26340#issuecomment-548784101

@AngersZhuuuu I am trying to review it, but there are too many changes happening at the same time... Having the code translated into Scala also makes it very hard to diff against the old thriftserver. I think it will be very hard to merge this PR with so many changes. It needs to be more incremental.

The first motivation behind this is that right now, because of the Hadoop profile changes and Hive dependencies, the thriftserver code had to be forked into v1.2.1 and v2.3.5. Could we try to fix that with minimal changes first?

Could you try to modify the sql/hive-thriftserver/v2.3.5 code in a **minimal** way so that it can compile and run with both hadoop-2.7 and hadoop-3.2? Try to make the fewest changes possible: don't translate from Java to Scala, don't modify the RowSets, still work with TableSchema instead of StructType, and don't remove the intermediate Operation classes (if there is any code there that is Hive-version dependent, just fix it there, or remove it if it's dead because it's overridden). Just make the minimum changes so that sql/hive-thriftserver/pom.xml can point to the v2.3.5 code regardless of hive.version and compile and run successfully in both the hadoop-2.7 and hadoop-3.2 profiles (a rough sketch of what I mean is at the end of this comment).

Then, if we get that to work and merged:

* in a 2nd PR, the no-longer-used v1.2.1 code could be removed and v2.3.5 moved to sql/hive-thriftserver/src

Then, on top of that, other changes could be made in separate PRs:

* remove the intermediate, unused Operation classes
* change the RowSets to work with SparkRows
* update the v2.3.5 code with even newer Hive code, if we want that
* get rid of other Hive dependencies
* change the namespace to org.apache.spark.sql.thriftserver and do other renaming
* ...

I know I wrote before to do it in a new module, but my intention was to be able to do it in smaller incremental changes, and now it ended up in one huge PR anyway. It would be much more manageable to review, test, and merge each of these changes separately. I think the committers would be more willing to merge a first step that makes the existing v2.3.5 code work with both Hadoop profiles and gets rid of the code duplication, and then let us work incrementally on top of that.

@wangyum @gatorsmile @dongjoon-hyun what do you think?
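
For the pom.xml part, here is a rough sketch of what I have in mind, assuming we keep the existing build-helper-maven-plugin setup in sql/hive-thriftserver/pom.xml and just point it at the v2.3.5 directories unconditionally instead of switching on hive.version (the directory names below are illustrative, not an exact list of what the module contains):

```xml
<!-- sql/hive-thriftserver/pom.xml (sketch): always compile the v2.3.5 sources,
     independent of which hadoop-2.7 / hadoop-3.2 profile is active. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-source</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <!-- instead of v${hive.version.short}/..., hard-code the v2.3.5 tree -->
          <source>v2.3.5/src/main/java</source>
          <source>v2.3.5/src/main/scala</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The SBT build would need the equivalent change wherever it selects the thriftserver source directories. If that single source tree can be made to compile and pass tests under both profiles, dropping v1.2.1 in the follow-up PR becomes a pure deletion.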
