Peng Cheng created SPARK-38009:
----------------------------------

Summary: In start-thriftserver.sh arguments, "--hiveconf xxx" should have higher precedence over "--conf spark.hadoop.xxx", or any other hadoop configurations
Key: SPARK-38009
URL: https://issues.apache.org/jira/browse/SPARK-38009
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.0, 2.4.8
Environment: The above experiment is conducted on Apache Spark 2.4.7 & 3.2.0 respectively.
OS: Ubuntu 20.04
Java: OpenJDK1.8.0
Reporter: Peng Cheng

By convention, an Apache Hive server reads configuration options from several sources, each with a different precedence. Options passed with "--hiveconf" on the command line should be overridden only by options set with the SET command (see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration for details). They should take precedence over Hadoop configuration and over any configuration file on the server (including, but not limited to, hive-site.xml and core-site.xml).

The Apache Spark thrift server does not follow this convention, as the following example demonstrates. If I start the server with conflicting values for "hive.server2.thrift.port":

```
./sbin/start-thriftserver.sh \
  --conf spark.hadoop.hive.server2.thrift.port=10001 \
  --hiveconf hive.server2.thrift.port=10002
```

the "--conf" value (port 10001) is preferred over the "--hiveconf" value (port 10002):

```
Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/conf/:/home/xxx/spark-2.4.7-bin-hadoop2.7-scala2.12/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --conf spark.hadoop.hive.server2.thrift.port=10001 --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server spark-internal --hiveconf hive.server2.thrift.port=10002
========================================
...
22/01/24 17:32:18 INFO ThriftCLIService: Starting ThriftBinaryCLIService on port 10001 with 5...500 worker threads
```

Replacing the "--conf" line with an entry in core-site.xml makes no difference.

I doubt that this divergence from conventional Hive server behaviour is deliberate. I am therefore asking for the precedence of Hive configuration options to be made identical, or as close as possible, to that of an Apache Hive server of the same version. To my knowledge, it should be:

SET command > --hiveconf > hive-site.xml > hive-default.xml > --conf > core-site.xml >
core-default.xml
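
To make the requested ordering concrete, here is a minimal Python sketch of a layered lookup that walks the sources from highest to lowest precedence. It is only an illustration of the ordering proposed in this report, not Spark's or Hive's actual code: the names `PRECEDENCE`, `resolve` and `sources` are hypothetical, and the "--conf" entry is shown with the "spark.hadoop." prefix already stripped, on the assumption that it ends up in the Hadoop configuration under the bare key.

```python
# Sources ordered from highest to lowest precedence, as proposed in this report.
PRECEDENCE = [
    "SET command",
    "--hiveconf",
    "hive-site.xml",
    "hive-default.xml",
    "--conf",
    "core-site.xml",
    "core-default.xml",
]


def resolve(key, sources):
    """Return (value, source_name) from the highest-precedence source that
    defines `key`, or (None, None) if no source defines it.

    `sources` is a hypothetical dict mapping a source name from PRECEDENCE
    to a dict of the settings that source provides.
    """
    for name in PRECEDENCE:
        layer = sources.get(name, {})
        if key in layer:
            return layer[key], name
    return None, None


# The scenario from the report: both flags set the same key to different ports.
sources = {
    "--conf": {"hive.server2.thrift.port": "10001"},
    "--hiveconf": {"hive.server2.thrift.port": "10002"},
}

value, origin = resolve("hive.server2.thrift.port", sources)
print(value, origin)
```

Under this ordering the lookup picks the "--hiveconf" value (10002), whereas the log above shows the thrift server binding to the "--conf" value (10001).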