srowen commented on a change in pull request #31598:
URL: https://github.com/apache/spark/pull/31598#discussion_r584313063



##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,

Review comment:
       _an_ application

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>

Review comment:
       Lauch -> Launch
   I think this is better as "Configurations needed at driver launch"?

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,
+    these kind of properties only effect before driver's JVM is started, so it would be suggested to set through
+    configuration file or `spark-submit` command line options.  
+  </td>
+  <td>
+    The following is a list of such configurations:<br/>
+     `spark.driver.memory`<br/>
+     `spark.driver.memoryOverhead`<br/>
+     `spark.driver.cores`<br/>
+     `spark.driver.userClassPathFirst`<br/>
+     `spark.driver.extraClassPath`<br/>
+     `spark.driver.defaultJavaOptions`<br/>
+     `spark.driver.extraJavaOptions`<br/>
+     `spark.driver.extraLibraryPath`<br/>
+     `spark.driver.resource.*`<br/>
+     `spark.pyspark.driver.python`<br/>
+     `spark.pyspark.python`<br/>
+     `spark.r.shell.command`<br/>
+     `spark.launcher.childProcLoggerName`<br/>
+     `spark.launcher.childConnectionTimeout`<br/>
+     `spark.yarn.driver.*`
+  </td>
+</tr>
+<tr>
+  <td><code>Application Deploy Related Configuration</code></td>
+  <td>Effect before start SparkContext.</td>
+  <td>
+    Like "spark.master", "spark.executor.instances", this kind of properties 
may not
+    be affected when setting programmatically through `SparkConf` in runtime 
after SparkContext has been started,
+    or the behavior is depending on which cluster manager and deploy mode you 
choose, so it would be suggested to
+    set through configuration file, `spark-submit` command line options, or 
setting programmatically through `SparkConf`
+    in runtime before start SparkContext.  
+  </td>
+  <td>
+    
+  </td>
+</tr>
+<tr>
+  <td><code>Spark Runtime Control Related Configuration</code></td>
+  <td>Effect when runtime.</td>

Review comment:
       I am not sure this extra column helps. Move this into the description.

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,
+    these kind of properties only effect before driver's JVM is started, so it would be suggested to set through
+    configuration file or `spark-submit` command line options.  
+  </td>
+  <td>
+    The following is a list of such configurations:<br/>
+     `spark.driver.memory`<br/>

Review comment:
       Can you use a `<ul>` here?

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,
+    these kind of properties only effect before driver's JVM is started, so it would be suggested to set through

Review comment:
       Start a new sentence.
   It's not suggested, it's required, right?
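To illustrate the requirement being discussed: launch-time properties are consumed before the driver JVM exists, so setting them on a `SparkConf` at runtime cannot take effect. A minimal Scala sketch of the pitfall (the app name and the `4g` value are invented for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// In client mode, the driver JVM is the process running this code, so its
// heap size was fixed when the process launched. Setting spark.driver.memory
// here is therefore too late to have any effect:
val conf = new SparkConf()
  .setMaster("local[*]")              // illustrative; normally supplied via spark-submit
  .setAppName("launch-config-demo")
  .set("spark.driver.memory", "4g")   // too late: the JVM heap is already fixed

val sc = new SparkContext(conf)
sc.stop()

// The property must instead be supplied before the JVM starts, e.g.:
//   ./bin/spark-submit --driver-memory 4g ...
// or in conf/spark-defaults.conf:
//   spark.driver.memory 4g
```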

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,
+    these kind of properties only effect before driver's JVM is started, so it would be suggested to set through
+    configuration file or `spark-submit` command line options.  
+  </td>
+  <td>
+    The following is a list of such configurations:<br/>
+     `spark.driver.memory`<br/>
+     `spark.driver.memoryOverhead`<br/>
+     `spark.driver.cores`<br/>
+     `spark.driver.userClassPathFirst`<br/>
+     `spark.driver.extraClassPath`<br/>
+     `spark.driver.defaultJavaOptions`<br/>
+     `spark.driver.extraJavaOptions`<br/>
+     `spark.driver.extraLibraryPath`<br/>
+     `spark.driver.resource.*`<br/>
+     `spark.pyspark.driver.python`<br/>
+     `spark.pyspark.python`<br/>
+     `spark.r.shell.command`<br/>
+     `spark.launcher.childProcLoggerName`<br/>
+     `spark.launcher.childConnectionTimeout`<br/>
+     `spark.yarn.driver.*`
+  </td>
+</tr>
+<tr>
+  <td><code>Application Deploy Related Configuration</code></td>
+  <td>Effect before start SparkContext.</td>
+  <td>
+    Like "spark.master", "spark.executor.instances", this kind of properties 
may not

Review comment:
       Back-tick quote for consistency
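As a sketch of the deploy-related kind this hunk describes: such properties are read when the SparkContext starts, so programmatic use works only on the `SparkConf` passed to the constructor. A minimal Scala example (app name and values invented for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Deploy-related properties can be set programmatically, but only before the
// SparkContext is created:
val conf = new SparkConf()
  .setAppName("deploy-config-demo")
  .setMaster("local[2]")                 // same role as --master on spark-submit
  .set("spark.executor.instances", "4")  // honored on YARN/K8s; local mode ignores it

val sc = new SparkContext(conf)
// The context clones the conf at construction; mutating `conf` from here on
// has no effect on the running context.
sc.stop()
```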

##########
File path: docs/configuration.md
##########
@@ -114,12 +114,61 @@ in the `spark-defaults.conf` file. A few configuration keys have been renamed si
 versions of Spark; in such cases, the older key names are still accepted, but take lower
 precedence than any instance of the newer key.
 
-Spark properties mainly can be divided into two kinds: one is related to deploy, like
-"spark.driver.memory", "spark.executor.instances", this kind of properties may not be affected when
-setting programmatically through `SparkConf` in runtime, or the behavior is depending on which
-cluster manager and deploy mode you choose, so it would be suggested to set through configuration
-file or `spark-submit` command line options; another is mainly related to Spark runtime control,
-like "spark.task.maxFailures", this kind of properties can be set in either way.
+Spark properties mainly can be divided into three kinds: 
+<table class="table">
+<tr><th>Configuration Type</th><th>Effect Scope</th><th>Usage</th><th>Remark</th></tr>
+<tr>
+  <td><code>Lauch Driver Related Configuration</code></td>
+  <td>Effect before start driver JVM.</td>
+  <td>
+    Configuration used to submit application, such as `spark.driver.memory`, `spark.driver.extraClassPath`,
+    these kind of properties only effect before driver's JVM is started, so it would be suggested to set through
+    configuration file or `spark-submit` command line options.  
+  </td>
+  <td>
+    The following is a list of such configurations:<br/>
+     `spark.driver.memory`<br/>
+     `spark.driver.memoryOverhead`<br/>
+     `spark.driver.cores`<br/>
+     `spark.driver.userClassPathFirst`<br/>
+     `spark.driver.extraClassPath`<br/>
+     `spark.driver.defaultJavaOptions`<br/>
+     `spark.driver.extraJavaOptions`<br/>
+     `spark.driver.extraLibraryPath`<br/>
+     `spark.driver.resource.*`<br/>
+     `spark.pyspark.driver.python`<br/>
+     `spark.pyspark.python`<br/>
+     `spark.r.shell.command`<br/>
+     `spark.launcher.childProcLoggerName`<br/>
+     `spark.launcher.childConnectionTimeout`<br/>
+     `spark.yarn.driver.*`
+  </td>
+</tr>
+<tr>
+  <td><code>Application Deploy Related Configuration</code></td>
+  <td>Effect before start SparkContext.</td>
+  <td>
+    Like "spark.master", "spark.executor.instances", this kind of properties 
may not
+    be affected when setting programmatically through `SparkConf` in runtime 
after SparkContext has been started,
+    or the behavior is depending on which cluster manager and deploy mode you 
choose, so it would be suggested to
+    set through configuration file, `spark-submit` command line options, or 
setting programmatically through `SparkConf`
+    in runtime before start SparkContext.  
+  </td>
+  <td>
+    
+  </td>
+</tr>
+<tr>
+  <td><code>Spark Runtime Control Related Configuration</code></td>
+  <td>Effect when runtime.</td>
+  <td>
+    Like "spark.task.maxFailures", this kind of properties can be set in 
either way. 

Review comment:
       Just say "all other properties can be set either way"
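For completeness, runtime-control properties such as `spark.task.maxFailures` really can be set either way. A minimal Scala sketch (app name and the value `8` invented for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Runtime-control properties work both from the command line
// (e.g. --conf spark.task.maxFailures=8 on spark-submit) and
// programmatically through SparkConf:
val conf = new SparkConf()
  .setAppName("runtime-config-demo")
  .setMaster("local[2]")
  .set("spark.task.maxFailures", "8")

val sc = new SparkContext(conf)
println(sc.getConf.get("spark.task.maxFailures"))  // prints: 8
sc.stop()
```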




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


