[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213783925
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

I took a look at the history of this code, and the only constant is that every place has been confused about the units at one point or another.

It seems to me that things shifted back and forth a bit, and that the behavior now actually differs by cluster manager: YARN and k8s default to MiB, while the others default to bytes.

I think it's just safer not to say what the default unit is and to encourage people to specify it explicitly; I think that's what's being done for a lot of the new time- and size-based configs. The "no unit" approach was only ever meant for backwards compatibility anyway.
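
For illustration, a minimal sketch of that explicit-unit style using the public SparkConf API (my example, not from the thread):

```scala
import org.apache.spark.SparkConf

// Always carry an explicit unit suffix so the value is unambiguous no
// matter which cluster manager or code path ends up parsing it.
val conf = new SparkConf()
  .set("spark.driver.memory", "2g")      // 2 GiB, explicit
  .set("spark.executor.memory", "512m")  // 512 MiB, explicit
```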


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213774632
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

Oh, hm.

```
$ ./bin/spark-shell --master local --driver-memory=1024
Error occurred during initialization of VM
Too small initial heap
$ ./bin/spark-shell --master local --conf spark.driver.memory=1024
Error occurred during initialization of VM
Too small initial heap
```

The config entry referenced above is used, but by YARN and K8S, not in this code path.

Paging @vanzin for an opinion. I'd prefer to change the docs rather than the behavior, but it looks like we might have divergent behavior here.
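
One plausible reading of the failure above (a sketch of my assumption that the launcher appends the raw config string to the JVM's -Xmx flag, so the JVM itself parses the unitless value as bytes):

```scala
// If the raw value of spark.driver.memory is passed through unchanged,
// the resulting flag is -Xmx1024, which the JVM reads as 1024 bytes --
// far too small a heap, hence the error above.
val memory = "1024"           // spark.driver.memory with no unit suffix
val jvmFlag = s"-Xmx$memory"  // "-Xmx1024" => 1024 bytes to the JVM
println(jvmFlag)
```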


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-29 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213708177
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

I think I got the point you want to report, @ivoson. IIUC, this is a bug in the code, not in the doc: we should make `spark.driver.memory=1024` be interpreted in MiB as well. That would change the original behavior, though, so maybe we should announce it in the migration guide? cc @srowen @HyukjinKwon.


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread ivoson
Github user ivoson commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213546414
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

If we start a job with spark.executor.memory=1024 or spark.driver.memory=1024, it currently means 1024 bytes. So I think we should update the doc to avoid confusing Spark users.


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread ivoson
Github user ivoson commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213544524
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

@xuanyuanking @HyukjinKwon @srowen thanks for your replies. I've also noticed the code above, but `DRIVER_MEMORY` and `EXECUTOR_MEMORY` in config/package.scala are never used; maybe they are intended for future use. The code below shows how the conf is actually used for now; please take a look.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L465


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1130


https://github.com/ivoson/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L265
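
For reference, here is my paraphrase (from reading that revision, not a verbatim quote) of what the linked Utils.memoryStringToMb does: a string with no suffix is first interpreted as bytes, then truncated down to MiB.

```scala
import org.apache.spark.network.util.JavaUtils

// Paraphrase of Utils.memoryStringToMb: convert to bytes first, because a
// string with no unit suffix is assumed to be bytes, then truncate to MiB.
def memoryStringToMb(str: String): Int =
  (JavaUtils.byteStringAsBytes(str) / 1024 / 1024).toInt

memoryStringToMb("1024")   // 0    -- "1024" is read as 1024 bytes
memoryStringToMb("1024m")  // 1024 -- explicit MiB suffix
```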


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213524321
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

MiB is correct in both cases, according to the code. I believe this PR 
should be closed.


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213522085
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

Why is MiB wrong, btw?


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/22252#discussion_r213343952
  
--- Diff: docs/configuration.md ---
@@ -152,7 +152,7 @@ of the most common options to set are:
   spark.driver.memory
   1g
   
-Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB
+Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes
--- End diff --

Check the config code here.

https://github.com/apache/spark/blob/99d2e4e00711cffbfaee8cb3da9b6b3feab8ff18/core/src/main/scala/org/apache/spark/internal/config/package.scala#L40-L43
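
Roughly what those lines declare (paraphrased from that revision; ConfigBuilder is Spark-internal, so this sketch will not compile outside the Spark source tree):

```scala
import org.apache.spark.internal.config.ConfigBuilder
import org.apache.spark.network.util.ByteUnit

// bytesConf(ByteUnit.MiB) makes MiB the unit assumed for bare numbers,
// and the default value itself carries an explicit unit.
private[spark] val DRIVER_MEMORY = ConfigBuilder("spark.driver.memory")
  .bytesConf(ByteUnit.MiB)
  .createWithDefaultString("1g")
```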


---




[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

2018-08-28 Thread ivoson
GitHub user ivoson opened a pull request:

https://github.com/apache/spark/pull/22252

[SPARK-25261][MINOR][DOC] correct the default unit for 
spark.executor|driver.memory as described in configuration.md

## What changes were proposed in this pull request?

As described in [SPARK-25261](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-25261), the unit of spark.executor.memory and spark.driver.memory is bytes if no unit is specified, while in https://spark.apache.org/docs/latest/configuration.html#application-properties they are described as MiB, which may lead to misunderstandings.
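
A hedged illustration of the mismatch, using JavaUtils (which, as far as I can tell, backs both size-parsing paths):

```scala
import org.apache.spark.network.util.JavaUtils

// The same unitless string diverges depending on the assumed default unit:
JavaUtils.byteStringAsBytes("1024")  // 1024 -- read as 1024 bytes
JavaUtils.byteStringAsMb("1024")     // 1024 -- read as 1024 MiB
JavaUtils.byteStringAsMb("1g")       // 1024 -- explicit unit, no ambiguity
```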

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ivoson/spark branch-correct-configuration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22252


commit 8f360b97e9f0afff7e6627df15cc17f103a5c2b3
Author: Huang Tengfei 
Date:   2018-08-28T07:28:06Z

update configuration.md, correct the default unit of spark.executor.memory 
and spark.driver.memory.

Change-Id: I2935b93559aa3e016bbab7a083c8c24bbdc6f685

commit 3eb3b66a52435366f258cbb40d01d8f3b0141bff
Author: huangtengfei02 
Date:   2018-08-28T08:29:25Z

fix style issue

Change-Id: I73e82b8bd07064d874bf76df52cb661097532884




---
