[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-13 Thread daisukebe
Github user daisukebe closed the pull request at:

https://github.com/apache/spark/pull/16195





[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16195#discussion_r92017061
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -61,11 +61,19 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
 
   // Additional memory to allocate to containers
   val amMemoryOverheadConf = if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
-  val amMemoryOverhead = sparkConf.getInt(amMemoryOverheadConf,
-    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
-
-  val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
-    math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
+  val amMemoryOverheadDefault = math.max(
+    (MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
--- End diff --

This is wrong, it should be `amMemory` not `executorMemory`.
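
For reference, the corrected default would read roughly as follows (a sketch only, relying on the surrounding ClientArguments members; not the final committed code):

```
  // Default AM memory overhead: a fraction of the AM memory, floored at MEMORY_OVERHEAD_MIN.
  val amMemoryOverheadDefault = math.max(
    (MEMORY_OVERHEAD_FACTOR * amMemory).toInt,
    MEMORY_OVERHEAD_MIN)
```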





[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16195#discussion_r91592278
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -61,11 +61,15 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
 
   // Additional memory to allocate to containers
   val amMemoryOverheadConf = if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
-  val amMemoryOverhead = sparkConf.getInt(amMemoryOverheadConf,
-    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
-
-  val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
-    math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
+  val amMemoryOverheadDefault = math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
+ MEMORY_OVERHEAD_MIN)
--- End diff --

Yes, that's better. Just try to follow the existing style, always.





[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-08 Thread daisukebe
Github user daisukebe commented on a diff in the pull request:

https://github.com/apache/spark/pull/16195#discussion_r91589664
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -61,11 +61,15 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
 
   // Additional memory to allocate to containers
   val amMemoryOverheadConf = if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
-  val amMemoryOverhead = sparkConf.getInt(amMemoryOverheadConf,
-    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
-
-  val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
-    math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
+  val amMemoryOverheadDefault = math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
+ MEMORY_OVERHEAD_MIN)
--- End diff --

I should have followed your example earlier. Is the indentation below correct?

```
  val amMemoryOverheadDefault = math.max(
(MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
MEMORY_OVERHEAD_MIN)
  val amMemoryOverhead = sparkConf.getSizeAsMb(
amMemoryOverheadConf,
amMemoryOverheadDefault.toString).toInt
```





[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-07 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16195#discussion_r91375843
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -61,11 +61,12 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
 
   // Additional memory to allocate to containers
   val amMemoryOverheadConf = if (isClusterMode) driverMemOverheadKey else amMemOverheadKey
-  val amMemoryOverhead = sparkConf.getInt(amMemoryOverheadConf,
-    math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toInt, MEMORY_OVERHEAD_MIN))
-
-  val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
-    math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN))
+  val amMemoryOverhead = sparkConf
+.getSizeAsMb(amMemoryOverheadConf,
+math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, MEMORY_OVERHEAD_MIN).toString).toInt
--- End diff --

Indentation here is really weird. I recommend instead:

```
val amMemoryOverhead = sparkConf.getSizeAsMb(
  amMemoryOverheadConf,
  blah blah blah)
```

Same elsewhere. It might even be good to pull the default value calculation into a separate variable.





[GitHub] spark pull request #16195: [Spark-18765] [CORE] Make values for spark.yarn.{...

2016-12-07 Thread daisukebe
GitHub user daisukebe opened a pull request:

https://github.com/apache/spark/pull/16195

[Spark-18765] [CORE] Make values for spark.yarn.{am|driver|executor}.memoryOverhead have configurable units

## What changes were proposed in this pull request?

Make the values for spark.yarn.{am|driver|executor}.memoryOverhead accept configurable units (for example 512m or 1g) by parsing them with SparkConf.getSizeAsMb instead of getInt.
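
As an illustration of the behavior (a minimal sketch, not code from this patch; the keys, the 384 default, and the values below are only examples), SparkConf.getSizeAsMb parses size strings with unit suffixes and treats plain numbers as MiB:

```
import org.apache.spark.SparkConf

object MemoryOverheadUnitsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.yarn.executor.memoryOverhead", "1g") // unit suffix now possible
      .set("spark.yarn.am.memoryOverhead", "512")      // plain number still read as MiB

    // getSizeAsMb parses size strings ("512", "1g", "2048k", ...) into MiB.
    val executorOverheadMb = conf.getSizeAsMb("spark.yarn.executor.memoryOverhead", "384")
    val amOverheadMb = conf.getSizeAsMb("spark.yarn.am.memoryOverhead", "384")

    println(s"executor overhead: $executorOverheadMb MiB") // 1024
    println(s"AM overhead: $amOverheadMb MiB")             // 512
  }
}
```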

## How was this patch tested?

Manual tests were done by running spark-shell and SparkPi.

Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daisukebe/spark SPARK-18765

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16195


commit 9c0cf22f7681ae05d894ae05f6a91a9467787519
Author: Grzegorz Chilkiewicz 
Date:   2016-02-02T19:16:24Z

[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication

Fixes problem and verifies fix by test suite.
Also adds an optional parameter, nullable (Boolean), to SchemaUtils.appendColumn and deduplicates the SchemaUtils.appendColumn functions.

Author: Grzegorz Chilkiewicz 

Closes #10741 from grzegorz-chilkiewicz/master.

(cherry picked from commit b1835d727234fdff42aa8cadd17ddcf43b0bed15)
Signed-off-by: Joseph K. Bradley 

commit 3c92333ee78f249dae37070d3b6558b9c92ec7f4
Author: Daoyuan Wang 
Date:   2016-02-02T19:09:40Z

[SPARK-13056][SQL] map column would throw NPE if value is null

Jira:
https://issues.apache.org/jira/browse/SPARK-13056

Create a map like
{ "a": "somestring", "b": null}
Query like
SELECT col["b"] FROM t1;
NPE would be thrown.

Author: Daoyuan Wang 

Closes #10964 from adrian-wang/npewriter.

(cherry picked from commit 358300c795025735c3b2f96c5447b1b227d4abc1)
Signed-off-by: Michael Armbrust 

Conflicts:
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

commit e81333be05cc5e2a41e5eb1a630c5af59a47dd23
Author: Kevin (Sangwoo) Kim 
Date:   2016-02-02T21:24:09Z

[DOCS] Update StructType.scala

The example will throw an error like:
<console>:20: error: not found: value StructType

Need to add this line:
import org.apache.spark.sql.types._

Author: Kevin (Sangwoo) Kim 

Closes #10141 from swkimme/patch-1.

(cherry picked from commit b377b03531d21b1d02a8f58b3791348962e1f31b)
Signed-off-by: Michael Armbrust 

commit 2f8abb4afc08aa8dc4ed763bcb93ff6b1d6f0d78
Author: Adam Budde 
Date:   2016-02-03T03:35:33Z

[SPARK-13122] Fix race condition in MemoryStore.unrollSafely()

https://issues.apache.org/jira/browse/SPARK-13122

A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition.

The initial proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID.

Author: Adam Budde 

Closes #11012 from budde/master.

(cherry picked from commit ff71261b651a7b289ea2312abd6075da8b838ed9)
Signed-off-by: Andrew Or 

Conflicts:
core/src/main/scala/org/apache/spark/storage/MemoryStore.scala

commit 5fe8796c2fa859e30cf5ba293bee8957e23163bc
Author: Mario Briggs 
Date:   2016-02-03T17:50:28Z

[SPARK-12739][STREAMING] Details of batch in Streaming tab uses two Duration columns

I have clearly prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'.

Author: Mario Briggs 
Author: mariobriggs 

Closes #11022 from mariobriggs/spark-12739.

(cherry picked from commit e9eb248edfa81d75f99c9afc2063e6b3d9ee7392)
Signed-off-by: Shixiong Zhu 

commit cdfb2a1410aa799596c8b751187dbac28b2cc678
Author: Wenchen Fan 
Date: