[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131329214

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -134,6 +135,16 @@ private[hive] object SparkSQLCLIDriver extends Logging {
           // Hive 1.2 + not supported in CLI
           throw new RuntimeException("Remote operations not supported")
         }
    +    // Respect the configurations set by --hiveconf from the command line
    +    // (based on Hive's CliDriver).
    +    val hiveConfFromCmd = sessionState.getOverriddenConfigurations.entrySet().asScala
    +    val newHiveConf = hiveConfFromCmd.map { kv =>
    +      // If the same property is configured by spark.hadoop.xxx, we ignore it and
    +      // obey settings from spark properties
    +      val k = kv.getKey
    +      val v = sys.props.getOrElseUpdate(SPARK_HADOOP_PROP_PREFIX + k, kv.getValue)
    --- End diff --

    I checked the whole project: `newClientForExecution` is only used at [HiveThriftServer2.scala#L58](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L58) and [HiveThriftServer2.scala#L86](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L86).

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131323098

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
    @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
         propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
         propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
    +    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
    +    sys.props.foreach { case (key, value) =>
    --- End diff --

    As I mentioned above, we should not do this for `newClientForExecution`.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131322795

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -134,6 +135,16 @@ private[hive] object SparkSQLCLIDriver extends Logging {
           // Hive 1.2 + not supported in CLI
           throw new RuntimeException("Remote operations not supported")
         }
    +    // Respect the configurations set by --hiveconf from the command line
    +    // (based on Hive's CliDriver).
    +    val hiveConfFromCmd = sessionState.getOverriddenConfigurations.entrySet().asScala
    +    val newHiveConf = hiveConfFromCmd.map { kv =>
    +      // If the same property is configured by spark.hadoop.xxx, we ignore it and
    +      // obey settings from spark properties
    +      val k = kv.getKey
    +      val v = sys.props.getOrElseUpdate(SPARK_HADOOP_PROP_PREFIX + k, kv.getValue)
    --- End diff --

    `newClientForExecution` is what we use to read/write Hive serde tables. This is the major concern I have. Let us add another parameter to `newTemporaryConfiguration`. When `newClientForExecution` calls `newTemporaryConfiguration`, we should not read the Hive conf from `sys.props`.
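The extra-parameter idea in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the object name, the `includeSysProps` flag, the `props` argument, and the placeholder default key are all hypothetical, and the function is a dependency-free stand-in, not the real `HiveUtils.newTemporaryConfiguration`.

```scala
import scala.collection.mutable

object TemporaryConfSketch {
  private val Prefix = "spark.hadoop."

  // Hypothetical signature: `includeSysProps` is the extra parameter under
  // discussion; `props` defaults to the JVM system properties.
  def newTemporaryConfiguration(
      includeSysProps: Boolean,
      props: scala.collection.Map[String, String] = sys.props): Map[String, String] = {
    val propMap = mutable.HashMap[String, String]()
    // Placeholder for the defaults the real method populates.
    propMap("example.default.key") = ""
    if (includeSysProps) {
      // Copy "spark.hadoop.foo=bar" into the result as "foo=bar".
      props.foreach { case (k, v) =>
        if (k.startsWith(Prefix)) propMap(k.stripPrefix(Prefix)) = v
      }
    }
    propMap.toMap
  }
}
```

A caller in the position of `newClientForExecution` would pass `includeSysProps = false`, so the execution client's configuration is unaffected by `spark.hadoop.*` system properties.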
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131322107

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -50,6 +50,7 @@ private[hive] object SparkSQLCLIDriver extends Logging {
       private val prompt = "spark-sql"
       private val continuedPrompt = "".padTo(prompt.length, ' ')
       private var transport: TSocket = _
    +  private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop."
    --- End diff --

    After thinking about it more, I think we should only consider `spark.hadoop.` in this PR, unless we get other feedback from the community.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131321807

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -134,6 +135,16 @@ private[hive] object SparkSQLCLIDriver extends Logging {
           // Hive 1.2 + not supported in CLI
           throw new RuntimeException("Remote operations not supported")
         }
    +    // Respect the configurations set by --hiveconf from the command line
    +    // (based on Hive's CliDriver).
    +    val hiveConfFromCmd = sessionState.getOverriddenConfigurations.entrySet().asScala
    +    val newHiveConf = hiveConfFromCmd.map { kv =>
    +      // If the same property is configured by spark.hadoop.xxx, we ignore it and
    +      // obey settings from spark properties
    +      val k = kv.getKey
    +      val v = sys.props.getOrElseUpdate(SPARK_HADOOP_PROP_PREFIX + k, kv.getValue)
    --- End diff --

    `newClientForExecution` is used ONLY in HiveThriftServer2, where it is used to get a HiveConf. There is no longer an execution Hive client, so IMO this method should be removed. That call happens after `SparkSQLEnv.init`, so it is OK for `spark.hadoop.` properties. I realize that `--hiveconf` values should be added to `sys.props` as `spark.hadoop.xxx` before `SparkSQLEnv.init`.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131320806

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -134,6 +135,16 @@ private[hive] object SparkSQLCLIDriver extends Logging {
           // Hive 1.2 + not supported in CLI
           throw new RuntimeException("Remote operations not supported")
         }
    +    // Respect the configurations set by --hiveconf from the command line
    +    // (based on Hive's CliDriver).
    +    val hiveConfFromCmd = sessionState.getOverriddenConfigurations.entrySet().asScala
    +    val newHiveConf = hiveConfFromCmd.map { kv =>
    +      // If the same property is configured by spark.hadoop.xxx, we ignore it and
    +      // obey settings from spark properties
    +      val k = kv.getKey
    +      val v = sys.props.getOrElseUpdate(SPARK_HADOOP_PROP_PREFIX + k, kv.getValue)
    --- End diff --

    When we build `SparkConf` in `SparkSQLEnv`, we get the conf from the system properties because `loadDefaults` is set to `true`. That is the way we pass `-hiveconf` values to `sc.hadoopConfiguration`.
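The first step described above (a `SparkConf` built with `loadDefaults = true` picking up `spark.*` system properties) can be pictured with a small dependency-free sketch. The object and function names are hypothetical, and this mimics the idea only; it is not Spark's implementation.

```scala
object LoadDefaultsSketch {
  // Mimics the loadDefaults idea: keep only the properties whose key starts
  // with "spark.", which is how a system property such as
  // spark.hadoop.abc.def would become visible to the Spark configuration.
  def sparkPropsFrom(props: scala.collection.Map[String, String]): Map[String, String] =
    props.filter { case (k, _) => k.startsWith("spark.") }.toMap
}
```

Passing `sys.props` as the argument would apply the same filter to the running JVM's system properties.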
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131320240

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
    @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
         propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
         propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
    +    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
    --- End diff --

    ok
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131320143

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
         // Execute -i init files (always in silent mode)
         cli.processInitFiles(sessionState)
    -    // Respect the configurations set by --hiveconf from the command line
    -    // (based on Hive's CliDriver).
    -    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
    -    while (it.hasNext) {
    -      val kv = it.next()
    -      SparkSQLEnv.sqlContext.setConf(kv.getKey, kv.getValue)
    +    newHiveConf.foreach{ kv =>
    --- End diff --

    thanks
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131320120

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
         // Execute -i init files (always in silent mode)
         cli.processInitFiles(sessionState)
    -    // Respect the configurations set by --hiveconf from the command line
    -    // (based on Hive's CliDriver).
    -    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
    --- End diff --

    `--hiveconf abc.def` will be added to the system properties as `spark.hadoop.abc.def`, if it does not already exist, before `SparkSQLEnv.init`.
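The "add it only if it does not already exist" rule is what `getOrElseUpdate` gives in the diff. A minimal sketch of that precedence follows, using an ordinary mutable map in place of `sys.props` so that it has no side effects; the object and method names are hypothetical.

```scala
import scala.collection.mutable

object HiveConfPrecedenceSketch {
  val Prefix = "spark.hadoop."

  // For each --hiveconf key, return the effective value and register the key
  // under the spark.hadoop. prefix when it is not already present in `props`.
  def resolve(
      hiveConfFromCmd: Map[String, String],
      props: mutable.Map[String, String]): Map[String, String] =
    hiveConfFromCmd.map { case (k, v) =>
      // An existing spark.hadoop.<k> wins; otherwise the --hiveconf value is stored.
      k -> props.getOrElseUpdate(Prefix + k, v)
    }
}
```

With this rule a value already supplied as `spark.hadoop.<key>` takes precedence over the same key given through `--hiveconf`, which matches the comment in the diff.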
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131318739

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
         // Execute -i init files (always in silent mode)
         cli.processInitFiles(sessionState)
    -    // Respect the configurations set by --hiveconf from the command line
    -    // (based on Hive's CliDriver).
    -    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
    --- End diff --

    What is the reason you moved it to line 140?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131317729

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -134,6 +135,16 @@ private[hive] object SparkSQLCLIDriver extends Logging {
           // Hive 1.2 + not supported in CLI
           throw new RuntimeException("Remote operations not supported")
         }
    +    // Respect the configurations set by --hiveconf from the command line
    +    // (based on Hive's CliDriver).
    +    val hiveConfFromCmd = sessionState.getOverriddenConfigurations.entrySet().asScala
    +    val newHiveConf = hiveConfFromCmd.map { kv =>
    +      // If the same property is configured by spark.hadoop.xxx, we ignore it and
    +      // obey settings from spark properties
    +      val k = kv.getKey
    +      val v = sys.props.getOrElseUpdate(SPARK_HADOOP_PROP_PREFIX + k, kv.getValue)
    --- End diff --

    Let me try to summarize the impact of these changes. The [initial call](https://github.com/yaooqinn/spark/blob/5043eb69b41d1d0263e8814da27a934491bc936c/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L86) of `newTemporaryConfiguration` happens before we set `sys.props`. The subsequent call of `newTemporaryConfiguration` in `newClientForExecution` will be used for Hive execution clients. Thus, the changes will affect Hive execution clients. Could you check all the code in Spark that uses `sys.props`? Will this change impact it?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131315665

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
         // Execute -i init files (always in silent mode)
         cli.processInitFiles(sessionState)
    -    // Respect the configurations set by --hiveconf from the command line
    -    // (based on Hive's CliDriver).
    -    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
    -    while (it.hasNext) {
    -      val kv = it.next()
    -      SparkSQLEnv.sqlContext.setConf(kv.getKey, kv.getValue)
    +    newHiveConf.foreach{ kv =>
    --- End diff --

    `foreach{` -> `foreach {`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131314418

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
    @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
         propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
         propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
    +    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
    --- End diff --

    @yaooqinn Please follow what @tejasapatil said and create a util function. In addition, `newTemporaryConfiguration` is being used for `SparkSQLCLIDriver`, and thus, please update the function description of `newTemporaryConfiguration`.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131194926

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -50,6 +50,7 @@ private[hive] object SparkSQLCLIDriver extends Logging {
       private val prompt = "spark-sql"
       private val continuedPrompt = "".padTo(prompt.length, ' ')
       private var transport: TSocket = _
    +  private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop."
    --- End diff --

    `spark.hadoop.` was tribal knowledge and was a sneaky way to stick values into the Hadoop `Configuration` object (which can later also be passed on to `HiveConf`). What does `spark.hive.` do? I have never seen such configs and would like to know. Keeping that aside, are you proposing to drop that prefix at L145?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131188713

    --- Diff: docs/configuration.md ---
    @@ -2326,7 +2326,7 @@ from this directory.
     # Inheriting Hadoop Cluster Configuration
     If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that
    -should be included on Spark's classpath:
    +should be included on Spark's class path:
    --- End diff --

    nit: `classpath` is used everywhere else in the documentation, so changing just one instance will make the doc inconsistent. Let's keep this as it is.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131187701

    --- Diff: docs/configuration.md ---
    @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but
     a common location is inside of `/etc/hadoop/conf`. Some tools create
     configurations on-the-fly, but offer a mechanisms to download copies of them.
    -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
    +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
     to a location containing the configuration files.
    +
    +# Custom Hadoop/Hive Configuration
    +
    +If your Spark applications interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive
    +configuration files in Spark's class path.
    +
    +Multiple running applications might require different Hadoop/Hive client side configurations.
    +You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, `hive-site.xml` in
    +Spark's class path for each application, but it is not very convenient and these
    +files are best to be shared with common properties to avoid hard-coding certain configurations.
    +
    +The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`.
    +They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defalut.conf`
    +
    +In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
    +instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties.
    +
    +{% highlight scala %}
    +val conf = new SparkConf().set("spark.hadoop.abc.def","xyz")
    +val sc = new SparkContext(conf)
    +{% endhighlight %}
    +
    +Also, you can modify or add configurations at runtime:
    +{% highlight bash %}
    +./bin/spark-submit \
    +  --name "My app" \
    +  --master local[4] \
    +  --conf spark.eventLog.enabled=false \
    +  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    +  --conf spark.hadoop.abc.def=xyz \
    +  myApp.jar
    +{% endhighlight %}
    +
    +## Typical Hadoop/Hive Configurations
    --- End diff --

    Curious: what's the motive behind having this section? I feel that we should not get into suggesting configs that are external to Spark.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131185632

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
    @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
         propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
         propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
    +    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
    --- End diff --

    Let's move this to a util method so that we know this is done in two places.
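A shared util method along the lines suggested above might look like the following sketch. The object name, method name, and signature are hypothetical illustrations rather than the code that was merged; only the `spark.hadoop.` prefix and the "foo=bar" copying rule come from the diff.

```scala
import scala.collection.mutable

object SparkHadoopPropsSketch {
  // Prefix named in the PR; the helper itself is a hypothetical illustration.
  val Prefix = "spark.hadoop."

  // Copies every "spark.hadoop.foo=bar" entry of `src` into `dest` as "foo=bar".
  // Entries without the prefix are ignored.
  def appendSparkHadoopConfigs(
      src: scala.collection.Map[String, String],
      dest: mutable.Map[String, String]): Unit = {
    for ((key, value) <- src if key.startsWith(Prefix)) {
      dest(key.substring(Prefix.length)) = value
    }
  }
}
```

Both call sites could then invoke this one helper (for example with `sys.props` as `src`), which keeps the prefix-stripping rule in a single place.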
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131095350

    --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
    @@ -50,6 +50,7 @@ private[hive] object SparkSQLCLIDriver extends Logging {
       private val prompt = "spark-sql"
       private val continuedPrompt = "".padTo(prompt.length, ' ')
       private var transport: TSocket = _
    +  private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop."
    --- End diff --

    Good point. I see `spark.hive` in some of my configs.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131094720

    --- Diff: docs/configuration.md ---
    @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but
     a common location is inside of `/etc/hadoop/conf`. Some tools create
     configurations on-the-fly, but offer a mechanisms to download copies of them.
    -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
    +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
     to a location containing the configuration files.
    +
    +# Custom Hadoop/Hive Configuration
    +
    +If your Spark applications interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive
    +configuration files in Spark's class path.
    +
    +Multiple running applications might require different Hadoop/Hive client side configurations.
    +You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, `hive-site.xml` in
    +Spark's class path for each application, but it is not very convenient and these
    +files are best to be shared with common properties to avoid hard-coding certain configurations.
    +
    +The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`.
    +They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defalut.conf`
    +
    +In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
    +instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties.
    +
    +{% highlight scala %}
    +val conf = new SparkConf().set("spark.hadoop.abc.def","xyz")
    +val sc = new SparkContext(conf)
    +{% endhighlight %}
    +
    +Also, you can modify or add configurations at runtime:
    +{% highlight bash %}
    +./bin/spark-submit \
    +  --name "My app" \
    +  --master local[4] \
    +  --conf spark.eventLog.enabled=false \
    +  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
    +  --conf spark.hadoop.abc.def=xyz \
    +  myApp.jar
    +{% endhighlight %}
    +
    +## Typical Hadoop/Hive Configurations
    +
    +
    +
    +  spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
    +  1
    +
    +The file output committer algorithm version, valid algorithm version number: 1 or 2.
    +Version 2 may have better performance, but version 1 may handle failures better in certain situations,
    +as per MAPREDUCE-4815 (https://issues.apache.org/jira/browse/MAPREDUCE-4815).
    +
    +
    +
    +
    +  spark.hadoop.fs.hdfs.impl.disable.cache
    --- End diff --

    This is a pretty dangerous one to point people at, especially since it's fixed in future Hadoop versions and backported to some distros, and the cost of creating a new HDFS client on every worker can get very expensive if you have a Spark process with many threads, all fielding work from the same user (thread pools, IPC connections, ...).
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131093892

    --- Diff: docs/configuration.md ---
    @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but
     a common location is inside of `/etc/hadoop/conf`. Some tools create
     configurations on-the-fly, but offer a mechanisms to download copies of them.
    -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
    +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
     to a location containing the configuration files.
    +
    +# Custom Hadoop/Hive Configuration
    +
    +If your Spark applications interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive
    +configuration files in Spark's class path.
    +
    +Multiple running applications might require different Hadoop/Hive client side configurations.
    +You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, `hive-site.xml` in
    +Spark's class path for each application, but it is not very convenient and these
    +files are best to be shared with common properties to avoid hard-coding certain configurations.
    --- End diff --

    "best shared". You can't do that anyway on a production Spark-on-YARN cluster: if you did, lots of other things would break. How about

    ```
    In a Spark cluster running on YARN, these configuration files are set cluster-wide, and cannot safely be changed by the application.
    ```
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18668#discussion_r131093320

    --- Diff: docs/configuration.md ---
    @@ -2335,5 +2335,61 @@ The location of these configuration files varies across Hadoop versions, but
     a common location is inside of `/etc/hadoop/conf`. Some tools create
     configurations on-the-fly, but offer a mechanisms to download copies of them.
    -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
    +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh`
     to a location containing the configuration files.
    +
    +# Custom Hadoop/Hive Configuration
    +
    +If your Spark applications interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive
    --- End diff --

    s/applications/application is/
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131074348 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. --- End diff -- ok
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131068575 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. + +In most cases, you may have more than one applications running and rely on some different Hadoop/Hive --- End diff -- OK
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131068501 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. + +In most cases, you may have more than one applications running and rely on some different Hadoop/Hive +client side configurations. You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, +`hive-site.xml` in Spark's ClassPath for each application, but it is not very convenient and these +files are best to be shared with common properties to avoid hard-coding certain configurations. + +The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`. +They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defalut.conf` + +In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For +instance. Spark allows you to simply create an empty conf and set spark/spark hadoop properties. 
+ +{% highlight scala %} +val conf = new SparkConf().set("spark.hadoop.abc.def","xyz") +val sc = new SparkContext(conf) +{% endhighlight %} + +Also, you can modify or add configurations at runtime: +{% highlight bash %} +./bin/spark-submit \ + --name "My app" \ + --master local[4] \ + --conf spark.eventLog.enabled=false \ + --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \ + --conf spark.hadoop.abc.def=xyz + myApp.jar +{% endhighlight %} + +## Typical Hadoop/Hive Configurations + + + + spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version + 1 + +The file output committer algorithm version, valid algorithm version number: 1 or 2. +Version 2 may have better performance, but version 1 may handle failures better in certain situations, +as per https://issues.apache.org/jira/browse/MAPREDUCE-4815;>MAPREDUCE-4815. + + + + + spark.hadoop.fs.hdfs.impl.disable.cache + false + +Don't cache 'hdfs' filesystem instances. Set true if HDFS Token Expiry in long-running spark applicaitons.https://issues.apache.org/jira/browse/HDFS-9276;>HDFS-9276. --- End diff -- @gatorsmile I believe setting `fs.hdfs.impl.disable.cache` to true disables caching of the DFS client instance, not of tokens; `FileSystem.get()` will then always create a new one.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131060237 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. + +In most cases, you may have more than one applications running and rely on some different Hadoop/Hive --- End diff -- `In most cases, you may have more than one applications running and rely on some different Hadoop/Hive` -> `Multiple running applications might require different Hadoop/Hive client side configurations.`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131059952 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. --- End diff -- `ClassPath ` -> `class path`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131059673 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive --- End diff -- `Application ` -> `applications`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131059429 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a location containing the configuration files. + +# Custom Hadoop/Hive Configuration + +If your Spark Application interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark's ClassPath. + +In most cases, you may have more than one applications running and rely on some different Hadoop/Hive +client side configurations. You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, +`hive-site.xml` in Spark's ClassPath for each application, but it is not very convenient and these +files are best to be shared with common properties to avoid hard-coding certain configurations. + +The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`. +They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defalut.conf` + +In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For +instance. Spark allows you to simply create an empty conf and set spark/spark hadoop properties. 
+ +{% highlight scala %} +val conf = new SparkConf().set("spark.hadoop.abc.def","xyz") +val sc = new SparkContext(conf) +{% endhighlight %} + +Also, you can modify or add configurations at runtime: +{% highlight bash %} +./bin/spark-submit \ + --name "My app" \ + --master local[4] \ + --conf spark.eventLog.enabled=false \ + --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \ + --conf spark.hadoop.abc.def=xyz + myApp.jar +{% endhighlight %} + +## Typical Hadoop/Hive Configurations + + + + spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version + 1 + +The file output committer algorithm version, valid algorithm version number: 1 or 2. +Version 2 may have better performance, but version 1 may handle failures better in certain situations, +as per https://issues.apache.org/jira/browse/MAPREDUCE-4815;>MAPREDUCE-4815. + + + + + spark.hadoop.fs.hdfs.impl.disable.cache + false + +Don't cache 'hdfs' filesystem instances. Set true if HDFS Token Expiry in long-running spark applicaitons.https://issues.apache.org/jira/browse/HDFS-9276;>HDFS-9276. --- End diff -- `When true, HDFS instances do not cache delegation tokens. With the cached tokens, HDFS delegation token updates might fail in long-running Spark applications.`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131057644 --- Diff: docs/configuration.md --- @@ -2335,5 +2335,59 @@ The location of these configuration files varies across Hadoop versions, but a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` --- End diff -- @zsxwing @liancheng Could you please take a look at the documentation? Is anything missing or inaccurate?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r130792745 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -283,4 +283,17 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { "SET conf3;" -> "conftest" ) } + + test("SPARK-21451: spark.sql.warehouse.dir should respect options in --hiveconf") { +runCliWithin(1.minute)("set spark.sql.warehouse.dir;" -> warehousePath.getAbsolutePath) + } + + test("SPARK-21451: Apply spark.hadoop.* configurations") { --- End diff -- Yes. After `sc` is initialized, `spark.hadoop.hive.metastore.warehouse.dir` is translated into the Hadoop conf `hive.metastore.warehouse.dir`, which serves as an alternative warehouse dir. This test case cannot tell whether this PR works. CliSuite may not see these values unless we explicitly set them in SQLConf. The original code did break another test case anyway.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r130536572 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala --- @@ -50,6 +50,7 @@ private[hive] object SparkSQLCLIDriver extends Logging { private val prompt = "spark-sql" private val continuedPrompt = "".padTo(prompt.length, ' ') private var transport: TSocket = _ + private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop." --- End diff -- Just a question: why does the prefix have to be `spark.hadoop.`? See the related PR: https://github.com/apache/spark/pull/2379
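For context on how this prefix is used: elsewhere in this pull request's `SparkSQLCLIDriver` change, the constant is combined with `getOrElseUpdate` so that a value already set as `spark.hadoop.xxx` is kept over the same key passed with `--hiveconf`. A minimal sketch of that precedence rule, using a plain mutable map in place of `sys.props`; the object and method names are hypothetical and this is not the PR's actual code:

```scala
import scala.collection.mutable

// Hypothetical illustration only; names are assumptions, not Spark APIs.
object HiveConfPrecedenceSketch {
  val SparkHadoopPropPrefix = "spark.hadoop."

  // For each --hiveconf entry, keep an existing "spark.hadoop.<key>" value if
  // one is present; otherwise record the --hiveconf value under that name.
  def resolve(
      props: mutable.Map[String, String],
      hiveConfFromCmd: Map[String, String]): Map[String, String] =
    hiveConfFromCmd.map { case (k, v) =>
      k -> props.getOrElseUpdate(SparkHadoopPropPrefix + k, v)
    }
}
```

Note that `getOrElseUpdate` also writes the `--hiveconf` value back into the property map when no `spark.hadoop.` entry existed, so later readers of that map see a single consistent value.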
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r130534640 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -283,4 +283,17 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { "SET conf3;" -> "conftest" ) } + + test("SPARK-21451: spark.sql.warehouse.dir should respect options in --hiveconf") { +runCliWithin(1.minute)("set spark.sql.warehouse.dir;" -> warehousePath.getAbsolutePath) + } + + test("SPARK-21451: Apply spark.hadoop.* configurations") { --- End diff -- This test case still succeeds without the fix.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r130029786 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -33,4 +33,13 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(conf(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname) === "") } } + + test("newTemporaryConfiguration respect spark.hadoop.foo=bar in SparkConf") { +sys.props.put("spark.hadoop.foo", "bar") +Seq(true, false) foreach { useInMemoryDerby => + val hiveConf = HiveUtils.newTemporaryConfiguration(useInMemoryDerby) + intercept[NoSuchElementException](hiveConf("spark.hadoop.foo") === "bar") --- End diff -- nit: assert(!hiveConf.contains("spark.hadoop.foo"))
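The nit above swaps an exception-based check for a direct one. A minimal sketch of the two styles, using a plain Scala `Map` as a stand-in for the configuration map; the object name and map contents are assumptions for illustration, and `scala.util.Try` stands in for ScalaTest's `intercept` so the snippet runs without a test framework:

```scala
import scala.util.Try

// Hypothetical stand-in; not the real HiveUtils configuration map.
object MissingKeySketch {
  // Assumed contents: the prefixed key is absent, the stripped key is present.
  val hiveConf: Map[String, String] = Map("foo" -> "bar")

  // Style 1: rely on the lookup throwing NoSuchElementException.
  // The comparison against "bar" in the original test is never reached.
  val lookupFails: Boolean = Try(hiveConf("spark.hadoop.foo")).isFailure

  // Style 2: state the intent directly.
  val keyAbsent: Boolean = !hiveConf.contains("spark.hadoop.foo")

  def main(args: Array[String]): Unit = {
    assert(lookupFails)
    assert(keyAbsent)
  }
}
```

Both checks pass for the same reason, but the second one reads as what the test means: the prefixed key must not appear in the resulting configuration.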
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r129998159 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -33,4 +33,13 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(conf(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname) === "") } } + + test("newTemporaryConfiguration respect spark.hadoop.foo=bar in SparkConf") { +sys.props.put("spark.hadoop.foo", "bar") --- End diff -- @cloud-fan At the very beginning, spark-submit does the same thing: it adds properties from --conf and spark-defaults.conf to sys.props.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r129774912 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala --- @@ -33,4 +33,13 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton assert(conf(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname) === "") } } + + test("newTemporaryConfiguration respect spark.hadoop.foo=bar in SparkConf") { +sys.props.put("spark.hadoop.foo", "bar") --- End diff -- The test says we should respect the Hadoop conf in `SparkConf`, so why do we handle system properties?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r128186557 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging { propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "") propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "") +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" --- End diff -- check https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L102
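The diff comment above describes copying any `spark.hadoop.foo=bar` property into the conf as `foo=bar`. A minimal sketch of that prefix-stripping step on a plain immutable map; the object and method names are hypothetical, and this is an illustration rather than the code in `SparkHadoopUtil` or `HiveUtils`:

```scala
// Hypothetical helper, not Spark's actual implementation.
object SparkHadoopPrefixSketch {
  val Prefix = "spark.hadoop."

  // Keep only entries whose key starts with the prefix, and drop the prefix.
  def hadoopProps(props: Map[String, String]): Map[String, String] =
    props.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }

  def main(args: Array[String]): Unit = {
    // Example input values are assumptions chosen for illustration.
    val props = Map(
      "spark.hadoop.hive.exec.scratchdir" -> "/some/dir",
      "spark.app.name" -> "demo"
    )
    // Only the prefixed entry survives, with the prefix removed.
    println(hadoopProps(props))
  }
}
```

In a real run the input would be `sys.props` or a `SparkConf`; a plain `Map` is used here so the sketch has no Spark dependency.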
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r128184486 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging { propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "") propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "") +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" --- End diff -- Do we have documentation saying that `spark.hadoop.xxx` is supported? Or are you proposing a new feature?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r128170401 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging { propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "") propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "") +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" --- End diff -- If we run `bin/spark-sql --conf spark.hadoop.hive.exec.scratchdir=/some/dir`, or set the same property in spark-defaults.conf, `SessionState.start(cliSessionState)` will not use this dir but the default one.
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r128143343 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging { propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "") propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "") +// Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar" --- End diff -- Why should we do this?
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/18668 [SPARK-21451][SQL]get `spark.hadoop.*` properties from sysProps to hiveconf ## What changes were proposed in this pull request? get `spark.hadoop.*` properties from sysProps to hiveconf ## How was this patch tested? UT You can merge this pull request into a Git repository by running: $ git pull https://github.com/yaooqinn/spark SPARK-21451 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18668 commit 89d9b86616196fde5d0b3a08fb284e6af6afe588 Author: Kent Yao Date: 2017-07-18T06:41:24Z HiveConf in SparkSQLCLIDriver doesn't respect spark.hadoop.some.hive.variables