juliuszsompolski commented on issue #26340: [WIP][SPARK-29108][SQL] Add new module sql/thriftserver with all code and UT
URL: https://github.com/apache/spark/pull/26340#issuecomment-548784101

@AngersZhuuuu I am trying to review it, but there are too many changes happening at the same time... Having the code translated into Scala also makes it very hard to diff against the old thriftserver. I think it will be very hard to merge this PR with so many changes. It needs to be more incremental.

The first motivation behind this is that right now, because of the Hadoop profile changes and Hive dependencies, the thriftserver code had to be forked into v1.2.1 and v2.3.5. Could we try to fix that with minimal changes first?

Could you try to modify the sql/hive-thriftserver/v2.3.5 code in a **minimal** way so that it can compile and run with both hadoop-2.7 and hadoop-3.2? Try to make the fewest changes possible: don't translate from Java to Scala, don't modify the RowSets, still work with TableSchema instead of StructType, and don't remove the intermediate Operation classes (if there is any code there that is Hive-version dependent, just fix it there, or remove it if it's dead because it's overridden). Just make the minimum changes so that sql/hive-thriftserver/pom.xml can point to the v2.3.5 code regardless of hive.version and compile and run successfully in both the hadoop-2.7 and hadoop-3.2 profiles (a rough sketch of what I mean is at the end of this comment).

Then, if we get that to work and merged:

* in a 2nd PR, the no-longer-used v1.2.1 code could be removed and v2.3.5 moved to sql/hive-thriftserver/src

Then, on top of that, other changes could be made in separate PRs:

* remove the intermediate, unused Operation classes
* change the RowSets to work with SparkRows
* update the v2.3.5 code with even newer Hive code, if we want that
* get rid of other Hive dependencies
* change the namespace to org.apache.spark.sql.thriftserver and do other renaming
* ...

I know I wrote before to do it in a new module, but my intention was to be able to do it in smaller incremental changes, and now it ended up in one huge PR anyway. It would be much more manageable to review, test, and merge each of these changes separately. I think the committers would be more willing to merge a first step that makes the existing v2.3.5 code work with both Hadoop profiles and gets rid of the code duplication, and then let us work incrementally on top of that.

@wangyum @gatorsmile @dongjoon-hyun what do you think?
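
For the pom.xml part, here is a rough sketch of what I have in mind, assuming we keep the existing build-helper-maven-plugin setup in sql/hive-thriftserver/pom.xml and just point it at the v2.3.5 directories unconditionally instead of switching on hive.version (the directory names below are illustrative, not an exact list of what the module contains):

```xml
<!-- sql/hive-thriftserver/pom.xml (sketch): always compile the v2.3.5 sources,
     independent of which hadoop-2.7 / hadoop-3.2 profile is active. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-source</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <!-- instead of v${hive.version.short}/..., hard-code the v2.3.5 tree -->
          <source>v2.3.5/src/main/java</source>
          <source>v2.3.5/src/main/scala</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The SBT build would need the equivalent change wherever it selects the thriftserver source directories. If that single source tree can be made to compile and pass tests under both profiles, dropping v1.2.1 in the follow-up PR becomes a pure deletion.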
