spark git commit: [SPARK-20425][SQL] Support a vertical display mode for Dataset.show
Repository: spark Updated Branches: refs/heads/master 66636ef0b -> b4724db19 [SPARK-20425][SQL] Support a vertical display mode for Dataset.show ## What changes were proposed in this pull request? This PR adds a new display mode for `Dataset.show` that prints output rows vertically (one line per column value). In the current master, output for a Dataset with many columns is hard to read, for example:
```
scala> val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)
scala> df.show(3, 0)
+--+--+--+---+--+--+---+ ... +---+--+---+
|c0|c1|c2|c3 |c4|c5|c6 | ... |c97|c98 |c99|
+--+--+--+---+--+--+---+ ...
```
(The horizontal output spans columns c0 through c99 and is far too wide to read; it is truncated in this digest.)
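A minimal sketch of the new mode, assuming the overload added by this PR exposes a boolean `vertical` flag on `Dataset.show` (the exact signature is not shown in this digest):

```scala
// In spark-shell, where `spark` is the provided SparkSession.
val df = spark.range(100).selectExpr((0 until 100).map(i => s"rand() AS c$i"): _*)

// Vertical mode prints one line per column value instead of one very wide row, e.g.:
// -RECORD 0------------------
//  c0  | 0.6663821229749803
//  c1  | 0.2517279821493754
//  ...
df.show(3, 0, vertical = true)  // assumed overload: show(numRows, truncate, vertical)
```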
[1/2] spark git commit: Preparing Spark release v2.2.0-rc1
Repository: spark Updated Branches: refs/heads/branch-2.2 d6efda512 -> 75544c019 Preparing Spark release v2.2.0-rc1 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ccb4a57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ccb4a57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ccb4a57 Branch: refs/heads/branch-2.2 Commit: 8ccb4a57c82146c1a8f8966c7e64010cf5632cb6 Parents: d6efda5 Author: Patrick Wendell Authored: Wed Apr 26 17:32:19 2017 -0700 Committer: Patrick Wendell Committed: Wed Apr 26 17:32:19 2017 -0700 -- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 9d8607d..3a7003f 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8657af7..5e9ffd1 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 24c10fb..c3e10d1 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/network-yarn/pom.xml -- diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 5e5a80b..10ea657 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/sketch/pom.xml -- diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 1356c47..1a1f652 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.2.0 ../../pom.xml 
http://git-wip-us.apache.org/repos/asf/spark/blob/8ccb4a57/common/tags/pom.xml ---
[2/2] spark git commit: Preparing development version 2.2.0-SNAPSHOT
Preparing development version 2.2.0-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75544c01 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75544c01 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75544c01 Branch: refs/heads/branch-2.2 Commit: 75544c01939297cc39b4c3095fce435a22b833c0 Parents: 8ccb4a5 Author: Patrick Wendell Authored: Wed Apr 26 17:32:23 2017 -0700 Committer: Patrick Wendell Committed: Wed Apr 26 17:32:23 2017 -0700 -- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 3a7003f..9d8607d 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 5e9ffd1..8657af7 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index c3e10d1..24c10fb 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/network-yarn/pom.xml -- diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 10ea657..5e5a80b 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/sketch/pom.xml -- diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 1a1f652..1356c47 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0 +2.2.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/75544c01/common/tags/pom.xml -- diff --git a/common/tags/pom.xml 
b/common/tags/pom.xml index 525ece5..9
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.2.0-rc1 [created] 8ccb4a57c - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20435][CORE] More thorough redaction of sensitive information
Repository: spark Updated Branches: refs/heads/branch-2.2 b48bb3ab2 -> d6efda512 [SPARK-20435][CORE] More thorough redaction of sensitive information This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information to the logs. The motivation for this change was the appearance of a password like the following in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..." ` Previously, the redaction logic only checked whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (sun.java.command) doesn't reveal much, so the value needs to be searched as well. This PR expands the check to cover values too. ## How was this patch tested? New unit tests were added that ensure no sensitive information is present in the event logs or the YARN logs. An existing unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value is not redacted; however, that non-sensitive value contained the literal "secret", which caused it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed it. Author: Mark Grover Closes #17725 from markgrover/spark-20435. (cherry picked from commit 66636ef0b046e5d1f340c3b8153d7213fa9d19c7) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6efda51 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6efda51 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d6efda51 Branch: refs/heads/branch-2.2 Commit: d6efda512e9d40e0a51c03675477bfb20c6bc7ae Parents: b48bb3a Author: Mark Grover Authored: Wed Apr 26 17:06:21 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 17:06:30 2017 -0700 -- .../apache/spark/internal/config/package.scala | 4 +-- .../spark/scheduler/EventLoggingListener.scala | 16 ++--- .../scala/org/apache/spark/util/Utils.scala | 22 ++--- .../apache/spark/deploy/SparkSubmitSuite.scala | 34 .../org/apache/spark/util/UtilsSuite.scala | 10 -- docs/configuration.md | 4 +-- .../spark/deploy/yarn/YarnClusterSuite.scala| 32 ++ 7 files changed, 100 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d6efda51/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 89aeea4..2f0a306 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -244,8 +244,8 @@ package object config { ConfigBuilder("spark.redaction.regex") .doc("Regex to decide which Spark configuration properties and environment variables in " + "driver and executor environments contain sensitive information. 
When this regex matches " + -"a property, its value is redacted from the environment UI and various logs like YARN " + -"and event logs.") +"a property key or value, the value is redacted from the environment UI and various logs " + +"like YARN and event logs.") .regexConf .createWithDefault("(?i)secret|password".r) http://git-wip-us.apache.org/repos/asf/spark/blob/d6efda51/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index aecb3a9..a7dbf87 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -252,11 +252,17 @@ private[spark] class EventLoggingListener( private[spark] def redactEvent( event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = { -// "Spark Properties" entry will always exist because the map is always populated with it. -val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties")) -val redactedEnvironmentDetails = event.environmentDetails + - ("Spark Properties" -> redactedProps) -SparkListenerEnvironmentUpdate(redactedEnvironmentDet
spark git commit: [SPARK-20435][CORE] More thorough redaction of sensitive information
Repository: spark Updated Branches: refs/heads/master 2ba1eba37 -> 66636ef0b [SPARK-20435][CORE] More thorough redaction of sensitive information This change does a more thorough redaction of sensitive information from logs and the UI, and adds unit tests that ensure no regressions leak sensitive information to the logs. The motivation for this change was the appearance of a password like the following in `SparkListenerEnvironmentUpdate` in event logs under some JVM configurations: `"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ..." ` Previously, the redaction logic only checked whether the key matched the secret regex pattern; if it did, its value was redacted. That worked for most cases. However, in the above case the key (sun.java.command) doesn't reveal much, so the value needs to be searched as well. This PR expands the check to cover values too. ## How was this patch tested? New unit tests were added that ensure no sensitive information is present in the event logs or the YARN logs. An existing unit test in UtilsSuite was modified because it asserted that a non-sensitive property's value is not redacted; however, that non-sensitive value contained the literal "secret", which caused it to be redacted. Updating the non-sensitive property's value to another arbitrary value (one without "secret" in it) fixed it. Author: Mark Grover Closes #17725 from markgrover/spark-20435. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66636ef0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66636ef0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66636ef0 Branch: refs/heads/master Commit: 66636ef0b046e5d1f340c3b8153d7213fa9d19c7 Parents: 2ba1eba Author: Mark Grover Authored: Wed Apr 26 17:06:21 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 17:06:21 2017 -0700 -- .../apache/spark/internal/config/package.scala | 4 +-- .../spark/scheduler/EventLoggingListener.scala | 16 ++--- .../scala/org/apache/spark/util/Utils.scala | 22 ++--- .../apache/spark/deploy/SparkSubmitSuite.scala | 34 .../org/apache/spark/util/UtilsSuite.scala | 10 -- docs/configuration.md | 4 +-- .../spark/deploy/yarn/YarnClusterSuite.scala| 32 ++ 7 files changed, 100 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/66636ef0/core/src/main/scala/org/apache/spark/internal/config/package.scala -- diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 89aeea4..2f0a306 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -244,8 +244,8 @@ package object config { ConfigBuilder("spark.redaction.regex") .doc("Regex to decide which Spark configuration properties and environment variables in " + "driver and executor environments contain sensitive information. 
When this regex matches " + -"a property, its value is redacted from the environment UI and various logs like YARN " + -"and event logs.") +"a property key or value, the value is redacted from the environment UI and various logs " + +"like YARN and event logs.") .regexConf .createWithDefault("(?i)secret|password".r) http://git-wip-us.apache.org/repos/asf/spark/blob/66636ef0/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala -- diff --git a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala index aecb3a9..a7dbf87 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala @@ -252,11 +252,17 @@ private[spark] class EventLoggingListener( private[spark] def redactEvent( event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = { -// "Spark Properties" entry will always exist because the map is always populated with it. -val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties")) -val redactedEnvironmentDetails = event.environmentDetails + - ("Spark Properties" -> redactedProps) -SparkListenerEnvironmentUpdate(redactedEnvironmentDetails) +// environmentDetails maps a string descriptor to a set of properties +// Similar to: +//
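To make the behaviour change concrete, here is a hedged sketch of key-or-value redaction (the helper below is illustrative and not the exact `Utils.redact` signature): the regex is now matched against both the key and the value, so a harmless-looking key whose value embeds a credential is still redacted.

```scala
// Illustrative only; the default regex is taken from spark.redaction.regex above.
val redactionRegex = "(?i)secret|password".r

def redactProps(kvs: Seq[(String, String)]): Seq[(String, String)] =
  kvs.map { case (key, value) =>
    val sensitive = redactionRegex.findFirstIn(key).isDefined ||
      redactionRegex.findFirstIn(value).isDefined
    if (sensitive) (key, "*********(redacted)") else (key, value)
  }

// "sun.java.command" is not a sensitive key, but its value embeds a password,
// so the value is redacted anyway:
redactProps(Seq("sun.java.command" ->
  "org.apache.spark.deploy.SparkSubmit --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"))
```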
spark git commit: [SPARK-12868][SQL] Allow adding jars from hdfs
Repository: spark Updated Branches: refs/heads/master a277ae80a -> 2ba1eba37 [SPARK-12868][SQL] Allow adding jars from hdfs ## What changes were proposed in this pull request? Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed. This PR adds a SparkUrlStreamHandlerFactory, which relies on the URL's protocol to choose the appropriate URLStreamHandlerFactory (such as FsUrlStreamHandlerFactory) to create the URLStreamHandler. ## How was this patch tested? 1. Added a new unit test. 2. Checked manually. Before: an exception was thrown with "failed unknown protocol: hdfs" (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png). After (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png). Author: Weiqing Yang Closes #17342 from weiqingy/SPARK-18910. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ba1eba3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ba1eba3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ba1eba3 Branch: refs/heads/master Commit: 2ba1eba371213d1ac3d1fa1552e5906e043c2ee4 Parents: a277ae8 Author: Weiqing Yang Authored: Wed Apr 26 13:54:40 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 13:54:40 2017 -0700 -- .../org/apache/spark/sql/internal/SharedState.scala| 10 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 + 2 files changed, 22 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2ba1eba3/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala index f834569..a93b701 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala @@ -17,12 +17,14 @@ package org.apache.spark.sql.internal +import java.net.URL import java.util.Locale import scala.reflect.ClassTag import scala.util.control.NonFatal import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FsUrlStreamHandlerFactory import org.apache.spark.{SparkConf, SparkContext, SparkException} import org.apache.spark.internal.Logging @@ -154,7 +156,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { } } -object SharedState { +object SharedState extends Logging { + try { +URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) + } catch { +case e: Error => + logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory") + } private val HIVE_EXTERNAL_CATALOG_CLASS_NAME = "org.apache.spark.sql.hive.HiveExternalCatalog" http://git-wip-us.apache.org/repos/asf/spark/blob/2ba1eba3/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 0dd9296..3ecbf96 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql import java.io.File import java.math.MathContext +import 
java.net.{MalformedURLException, URL} import java.sql.Timestamp import java.util.concurrent.atomic.AtomicBoolean @@ -2606,4 +2607,16 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage) } } + + test("SPARK-12868: Allow adding jars from hdfs ") { +val jarFromHdfs = "hdfs://doesnotmatter/test.jar" +val jarFromInvalidFs = "fffs://doesnotmatter/test.jar" + +// if 'hdfs' is not supported, MalformedURLException will be thrown +new URL(jarFromHdfs) + +intercept[MalformedURLException] { + new URL(jarFromInvalidFs) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
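A minimal sketch of what the registration enables (the namenode address and jar path below are illustrative): once `FsUrlStreamHandlerFactory` is installed, `java.net.URL` understands the hdfs scheme, so adding jars by HDFS path no longer fails with "unknown protocol: hdfs".

```scala
import java.net.URL
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// May be set at most once per JVM; in the patch this happens when SharedState is loaded.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

// Would have thrown MalformedURLException ("unknown protocol: hdfs") before the change.
val jar = new URL("hdfs://namenode:8020/libs/example-udfs.jar")
```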
spark git commit: [SPARK-12868][SQL] Allow adding jars from hdfs
Repository: spark Updated Branches: refs/heads/branch-2.2 e278876ba -> b48bb3ab2 [SPARK-12868][SQL] Allow adding jars from hdfs ## What changes were proposed in this pull request? Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed. This PR adds a SparkUrlStreamHandlerFactory, which relies on the URL's protocol to choose the appropriate URLStreamHandlerFactory (such as FsUrlStreamHandlerFactory) to create the URLStreamHandler. ## How was this patch tested? 1. Added a new unit test. 2. Checked manually. Before: an exception was thrown with "failed unknown protocol: hdfs" (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png). After (screenshot: https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png). Author: Weiqing Yang Closes #17342 from weiqingy/SPARK-18910. (cherry picked from commit 2ba1eba371213d1ac3d1fa1552e5906e043c2ee4) Signed-off-by: Marcelo Vanzin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b48bb3ab Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b48bb3ab Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b48bb3ab Branch: refs/heads/branch-2.2 Commit: b48bb3ab2c8134f6b533af29a241dce114076720 Parents: e278876 Author: Weiqing Yang Authored: Wed Apr 26 13:54:40 2017 -0700 Committer: Marcelo Vanzin Committed: Wed Apr 26 13:54:49 2017 -0700 -- .../org/apache/spark/sql/internal/SharedState.scala| 10 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 13 + 2 files changed, 22 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b48bb3ab/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala index f834569..a93b701 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala @@ -17,12 +17,14 @@ package org.apache.spark.sql.internal +import java.net.URL import java.util.Locale import scala.reflect.ClassTag import scala.util.control.NonFatal import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FsUrlStreamHandlerFactory import org.apache.spark.{SparkConf, SparkContext, SparkException} import org.apache.spark.internal.Logging @@ -154,7 +156,13 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging { } } -object SharedState { +object SharedState extends Logging { + try { +URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) + } catch { +case e: Error => + logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory") + } private val HIVE_EXTERNAL_CATALOG_CLASS_NAME = "org.apache.spark.sql.hive.HiveExternalCatalog" http://git-wip-us.apache.org/repos/asf/spark/blob/b48bb3ab/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 0dd9296..3ecbf96 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -19,6 +19,7 
@@ package org.apache.spark.sql import java.io.File import java.math.MathContext +import java.net.{MalformedURLException, URL} import java.sql.Timestamp import java.util.concurrent.atomic.AtomicBoolean @@ -2606,4 +2607,16 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage) } } + + test("SPARK-12868: Allow adding jars from hdfs ") { +val jarFromHdfs = "hdfs://doesnotmatter/test.jar" +val jarFromInvalidFs = "fffs://doesnotmatter/test.jar" + +// if 'hdfs' is not supported, MalformedURLException will be thrown +new URL(jarFromHdfs) + +intercept[MalformedURLException] { + new URL(jarFromInvalidFs) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/branch-2.2 6709bcf6e -> e278876ba [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17773 from michal-databricks/spark-20474. (cherry picked from commit a277ae80a2836e6533b338d2b9c4e59ed8a1daae) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e278876b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e278876b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e278876b Branch: refs/heads/branch-2.2 Commit: e278876ba3d66d3fb249df59c3de8d78ca25c5f0 Parents: 6709bcf Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:50 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e278876b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, 
elementsAppended); +if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = newData; } } else if (type instanceof LongType || type instanceof TimestampType || DecimalType.is64BitDecimalType(type)) { if (longData == null || longData.length < newCapacity) { long[] newData = new long[newCapacity]; -if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended); +if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity); longData = newData; } } else if (type instanceof FloatType) { if (floatData == null || floatData.length < newCapacity) { float[] newData = new float[newCapacity]; -if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended); +if (floatData != null) System.arraycopy(floatData, 0
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/master 99c6cf9ef -> a277ae80a [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17773 from michal-databricks/spark-20474. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a277ae80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a277ae80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a277ae80 Branch: refs/heads/master Commit: a277ae80a2836e6533b338d2b9c4e59ed8a1daae Parents: 99c6cf9 Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:37 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a277ae80/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, elementsAppended); +if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = 
newData; } } else if (type instanceof LongType || type instanceof TimestampType || DecimalType.is64BitDecimalType(type)) { if (longData == null || longData.length < newCapacity) { long[] newData = new long[newCapacity]; -if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended); +if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity); longData = newData; } } else if (type instanceof FloatType) { if (floatData == null || floatData.length < newCapacity) { float[] newData = new float[newCapacity]; -if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended); +if (floatData != null) System.arraycopy(floatData, 0, newData, 0, capacity); floatData = newData; } } else if (type instanceof DoubleTyp
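As a toy illustration in Scala (not the actual Java OnHeapColumnVector code) of why the reallocation has to copy up to the old `capacity` rather than `elementsAppended`: `putX`-style writes never advance `elementsAppended`, so copying only that many entries would silently drop rows written through the put API when the vector grows.

```scala
// Toy vector; field names mirror the Spark ones but this is a sketch only.
class ToyIntVector(initialCapacity: Int) {
  private var capacity = initialCapacity
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0

  def putInt(rowId: Int, value: Int): Unit = data(rowId) = value   // does NOT bump elementsAppended
  def appendInt(value: Int): Unit = { data(elementsAppended) = value; elementsAppended += 1 }
  def getInt(rowId: Int): Int = data(rowId)

  def reserve(newCapacity: Int): Unit = {
    val newData = new Array[Int](newCapacity)
    // The buggy version copied only `elementsAppended` entries; copying `capacity`
    // preserves rows written via putInt as well.
    System.arraycopy(data, 0, newData, 0, capacity)
    data = newData
    capacity = newCapacity
  }
}
```

With the old copy length, a `putInt(0, 42)` followed by `reserve(...)` would have lost the value, which is exactly the class of bug this commit fixes.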
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/branch-2.2 b65858bb3 -> 6709bcf6e [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17772 from michal-databricks/spark-20473. (cherry picked from commit 99c6cf9ef16bf8fae6edb23a62e46546a16bca80) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709bcf6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709bcf6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709bcf6 Branch: refs/heads/branch-2.2 Commit: 6709bcf6e66e99e17ba2a3b1482df2dba1a15716 Parents: b65858b Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:57 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6709bcf6/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/master 66dd5b83f -> 99c6cf9ef [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal Szafranski Closes #17772 from michal-databricks/spark-20473. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99c6cf9e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99c6cf9e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99c6cf9e Branch: refs/heads/master Commit: 99c6cf9ef16bf8fae6edb23a62e46546a16bca80 Parents: 66dd5b8 Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:25 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99c6cf9e/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20391][CORE] Rename memory related fields in ExecutorSummary
Repository: spark Updated Branches: refs/heads/master dbb06c689 -> 66dd5b83f [SPARK-20391][CORE] Rename memory related fields in ExecutorSummay ## What changes were proposed in this pull request? This is a follow-up of #14617 to make the name of memory related fields more meaningful. Here for the backward compatibility, I didn't change `maxMemory` and `memoryUsed` fields. ## How was this patch tested? Existing UT and local verification. CC squito and tgravescs . Author: jerryshao Closes #17700 from jerryshao/SPARK-20391. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/66dd5b83 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/66dd5b83 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/66dd5b83 Branch: refs/heads/master Commit: 66dd5b83ff95d5f91f37dcdf6aac89faa0b871c5 Parents: dbb06c6 Author: jerryshao Authored: Wed Apr 26 09:01:50 2017 -0500 Committer: Imran Rashid Committed: Wed Apr 26 09:01:50 2017 -0500 -- .../org/apache/spark/ui/static/executorspage.js | 48 +- .../org/apache/spark/status/api/v1/api.scala| 11 +++-- .../apache/spark/ui/exec/ExecutorsPage.scala| 21 .../executor_memory_usage_expectation.json | 51 .../executor_node_blacklisting_expectation.json | 51 5 files changed, 105 insertions(+), 77 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/66dd5b83/core/src/main/resources/org/apache/spark/ui/static/executorspage.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 930a069..cb9922d 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -253,10 +253,14 @@ $(document).ready(function () { var deadTotalBlacklisted = 0; response.forEach(function (exec) { -exec.onHeapMemoryUsed = exec.hasOwnProperty('onHeapMemoryUsed') ? exec.onHeapMemoryUsed : 0; -exec.maxOnHeapMemory = exec.hasOwnProperty('maxOnHeapMemory') ? exec.maxOnHeapMemory : 0; -exec.offHeapMemoryUsed = exec.hasOwnProperty('offHeapMemoryUsed') ? exec.offHeapMemoryUsed : 0; -exec.maxOffHeapMemory = exec.hasOwnProperty('maxOffHeapMemory') ? exec.maxOffHeapMemory : 0; +var memoryMetrics = { +usedOnHeapStorageMemory: 0, +usedOffHeapStorageMemory: 0, +totalOnHeapStorageMemory: 0, +totalOffHeapStorageMemory: 0 +}; + +exec.memoryMetrics = exec.hasOwnProperty('memoryMetrics') ? 
exec.memoryMetrics : memoryMetrics; }); response.forEach(function (exec) { @@ -264,10 +268,10 @@ $(document).ready(function () { allRDDBlocks += exec.rddBlocks; allMemoryUsed += exec.memoryUsed; allMaxMemory += exec.maxMemory; -allOnHeapMemoryUsed += exec.onHeapMemoryUsed; -allOnHeapMaxMemory += exec.maxOnHeapMemory; -allOffHeapMemoryUsed += exec.offHeapMemoryUsed; -allOffHeapMaxMemory += exec.maxOffHeapMemory; +allOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +allOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +allOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +allOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; allDiskUsed += exec.diskUsed; allTotalCores += exec.totalCores; allMaxTasks += exec.maxTasks; @@ -286,10 +290,10 @@ $(document).ready(function () { activeRDDBlocks += exec.rddBlocks; activeMemoryUsed += exec.memoryUsed; activeMaxMemory += exec.maxMemory; -activeOnHeapMemoryUsed += exec.onHeapMemoryUsed; -activeOnHeapMaxMemory += exec.maxOnHeapMemory; -activeOffHeapMemoryUsed += exec.offHeapMemoryUsed; -activeOffHeapMaxMemory += exec.maxOffHeapMemory; +activeOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +activeOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +activeOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +activeOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; activeDiskUsed += exec.diskUse
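For reference, a hedged sketch of how the renamed fields surface in the executor REST payload, inferred from the `executorspage.js` changes above (the exact case class in `api.scala` is not shown in this digest, and the real `ExecutorSummary` carries many more fields):

```scala
// Assumed shape only; field names are taken from the JavaScript diff above.
case class MemoryMetrics(
    usedOnHeapStorageMemory: Long,
    usedOffHeapStorageMemory: Long,
    totalOnHeapStorageMemory: Long,
    totalOffHeapStorageMemory: Long)

case class ExecutorSummary(
    id: String,
    memoryUsed: Long,                      // kept for backward compatibility
    maxMemory: Long,                       // kept for backward compatibility
    // many other executor fields omitted in this sketch
    memoryMetrics: Option[MemoryMetrics])  // new nested, storage-specific metrics
```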
spark git commit: [SPARK-20391][CORE] Rename memory related fields in ExecutorSummary
Repository: spark Updated Branches: refs/heads/branch-2.2 34dec68d7 -> b65858bb3 [SPARK-20391][CORE] Rename memory related fields in ExecutorSummay ## What changes were proposed in this pull request? This is a follow-up of #14617 to make the name of memory related fields more meaningful. Here for the backward compatibility, I didn't change `maxMemory` and `memoryUsed` fields. ## How was this patch tested? Existing UT and local verification. CC squito and tgravescs . Author: jerryshao Closes #17700 from jerryshao/SPARK-20391. (cherry picked from commit 66dd5b83ff95d5f91f37dcdf6aac89faa0b871c5) Signed-off-by: Imran Rashid Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b65858bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b65858bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b65858bb Branch: refs/heads/branch-2.2 Commit: b65858bb3cb8e69b1f73f5f2c76a7cd335120695 Parents: 34dec68 Author: jerryshao Authored: Wed Apr 26 09:01:50 2017 -0500 Committer: Imran Rashid Committed: Wed Apr 26 09:02:13 2017 -0500 -- .../org/apache/spark/ui/static/executorspage.js | 48 +- .../org/apache/spark/status/api/v1/api.scala| 11 +++-- .../apache/spark/ui/exec/ExecutorsPage.scala| 21 .../executor_memory_usage_expectation.json | 51 .../executor_node_blacklisting_expectation.json | 51 5 files changed, 105 insertions(+), 77 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b65858bb/core/src/main/resources/org/apache/spark/ui/static/executorspage.js -- diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 930a069..cb9922d 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -253,10 +253,14 @@ $(document).ready(function () { var deadTotalBlacklisted = 0; response.forEach(function (exec) { -exec.onHeapMemoryUsed = exec.hasOwnProperty('onHeapMemoryUsed') ? exec.onHeapMemoryUsed : 0; -exec.maxOnHeapMemory = exec.hasOwnProperty('maxOnHeapMemory') ? exec.maxOnHeapMemory : 0; -exec.offHeapMemoryUsed = exec.hasOwnProperty('offHeapMemoryUsed') ? exec.offHeapMemoryUsed : 0; -exec.maxOffHeapMemory = exec.hasOwnProperty('maxOffHeapMemory') ? exec.maxOffHeapMemory : 0; +var memoryMetrics = { +usedOnHeapStorageMemory: 0, +usedOffHeapStorageMemory: 0, +totalOnHeapStorageMemory: 0, +totalOffHeapStorageMemory: 0 +}; + +exec.memoryMetrics = exec.hasOwnProperty('memoryMetrics') ? 
exec.memoryMetrics : memoryMetrics; }); response.forEach(function (exec) { @@ -264,10 +268,10 @@ $(document).ready(function () { allRDDBlocks += exec.rddBlocks; allMemoryUsed += exec.memoryUsed; allMaxMemory += exec.maxMemory; -allOnHeapMemoryUsed += exec.onHeapMemoryUsed; -allOnHeapMaxMemory += exec.maxOnHeapMemory; -allOffHeapMemoryUsed += exec.offHeapMemoryUsed; -allOffHeapMaxMemory += exec.maxOffHeapMemory; +allOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +allOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +allOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +allOffHeapMaxMemory += exec.memoryMetrics.totalOffHeapStorageMemory; allDiskUsed += exec.diskUsed; allTotalCores += exec.totalCores; allMaxTasks += exec.maxTasks; @@ -286,10 +290,10 @@ $(document).ready(function () { activeRDDBlocks += exec.rddBlocks; activeMemoryUsed += exec.memoryUsed; activeMaxMemory += exec.maxMemory; -activeOnHeapMemoryUsed += exec.onHeapMemoryUsed; -activeOnHeapMaxMemory += exec.maxOnHeapMemory; -activeOffHeapMemoryUsed += exec.offHeapMemoryUsed; -activeOffHeapMaxMemory += exec.maxOffHeapMemory; +activeOnHeapMemoryUsed += exec.memoryMetrics.usedOnHeapStorageMemory; +activeOnHeapMaxMemory += exec.memoryMetrics.totalOnHeapStorageMemory; +activeOffHeapMemoryUsed += exec.memoryMetrics.usedOffHeapStorageMemory; +activeOffHeapMaxM
spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests
Repository: spark Updated Branches: refs/heads/branch-2.2 612952251 -> 34dec68d7 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during iteration makes sense; these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue in #17746 when we upgraded breeze to 0.13.1. ## How was this patch tested? Existing tests. Author: Yanbo Liang Closes #17757 from yanboliang/flaky-test. (cherry picked from commit dbb06c689c157502cb081421baecce411832aad8) Signed-off-by: Yanbo Liang Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34dec68d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34dec68d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34dec68d Branch: refs/heads/branch-2.2 Commit: 34dec68d7eb647d997fdb27fe65d579c74b39e58 Parents: 6129522 Author: Yanbo Liang Authored: Wed Apr 26 21:34:18 2017 +0800 Committer: Yanbo Liang Committed: Wed Apr 26 21:34:35 2017 +0800 -- .../tests/testthat/test_mllib_classification.R | 17 + python/pyspark/ml/classification.py | 71 ++-- 2 files changed, 38 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/34dec68d/R/pkg/inst/tests/testthat/test_mllib_classification.R -- diff --git a/R/pkg/inst/tests/testthat/test_mllib_classification.R b/R/pkg/inst/tests/testthat/test_mllib_classification.R index af7cbdc..cbc7087 100644 --- a/R/pkg/inst/tests/testthat/test_mllib_classification.R +++ b/R/pkg/inst/tests/testthat/test_mllib_classification.R @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # Test formula works well df <- suppressWarnings(createDataFrame(iris)) @@ -310,8 +299,6 @@ test_that("spark.mlp", { expect_equal(summary$numOfOutputs, 3) expect_equal(summary$layers, c(4, 3)) expect_equal(length(summary$weights), 15) - expect_equal(head(summary$weights, 5), list(-0.5793153, -4.652961, 6.216155, -6.649478, - -10.51147), tolerance = 1e-3) }) test_that("spark.naiveBayes", { http://git-wip-us.apache.org/repos/asf/spark/blob/34dec68d/python/pyspark/ml/classification.py -- diff 
--git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 8649683..a9756ea 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -185,34 +185,33 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti >>> from pyspark.sql import Row >>> from pyspark.ml.linalg import Vectors >>> bdf = sc.parallelize([ -... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)), -... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF() ->>> blor = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight") +... Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)), +... Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)), +... Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)), +... Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF() +
spark git commit: [MINOR][ML] Fix some PySpark & SparkR flaky tests
Repository: spark Updated Branches: refs/heads/master 7fecf5130 -> dbb06c689 [MINOR][ML] Fix some PySpark & SparkR flaky tests ## What changes were proposed in this pull request? Some PySpark & SparkR tests run with a tiny dataset and a tiny `maxIter`, which means they have not converged. I don't think checking intermediate results during iteration makes sense; these intermediate results may be fragile and unstable, so we should switch to checking the converged result. We hit this issue in #17746 when we upgraded breeze to 0.13.1. ## How was this patch tested? Existing tests. Author: Yanbo Liang Closes #17757 from yanboliang/flaky-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dbb06c68 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dbb06c68 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dbb06c68 Branch: refs/heads/master Commit: dbb06c689c157502cb081421baecce411832aad8 Parents: 7fecf51 Author: Yanbo Liang Authored: Wed Apr 26 21:34:18 2017 +0800 Committer: Yanbo Liang Committed: Wed Apr 26 21:34:18 2017 +0800 -- .../tests/testthat/test_mllib_classification.R | 17 + python/pyspark/ml/classification.py | 71 ++-- 2 files changed, 38 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dbb06c68/R/pkg/inst/tests/testthat/test_mllib_classification.R -- diff --git a/R/pkg/inst/tests/testthat/test_mllib_classification.R b/R/pkg/inst/tests/testthat/test_mllib_classification.R index af7cbdc..cbc7087 100644 --- a/R/pkg/inst/tests/testthat/test_mllib_classification.R +++ b/R/pkg/inst/tests/testthat/test_mllib_classification.R @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "2.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # Test formula works well df <- suppressWarnings(createDataFrame(iris)) @@ -310,8 +299,6 @@ test_that("spark.mlp", { expect_equal(summary$numOfOutputs, 3) expect_equal(summary$layers, c(4, 3)) expect_equal(length(summary$weights), 15) - expect_equal(head(summary$weights, 5), list(-0.5793153, -4.652961, 6.216155, -6.649478, - -10.51147), tolerance = 1e-3) }) test_that("spark.naiveBayes", { http://git-wip-us.apache.org/repos/asf/spark/blob/dbb06c68/python/pyspark/ml/classification.py -- diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 8649683..a9756ea 
100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -185,34 +185,33 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti >>> from pyspark.sql import Row >>> from pyspark.ml.linalg import Vectors >>> bdf = sc.parallelize([ -... Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)), -... Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF() ->>> blor = LogisticRegression(maxIter=5, regParam=0.01, weightCol="weight") +... Row(label=1.0, weight=1.0, features=Vectors.dense(0.0, 5.0)), +... Row(label=0.0, weight=2.0, features=Vectors.dense(1.0, 2.0)), +... Row(label=1.0, weight=3.0, features=Vectors.dense(2.0, 1.0)), +... Row(label=0.0, weight=4.0, features=Vectors.dense(3.0, 3.0))]).toDF() +>>> blor = LogisticRegression(regParam=0.01, weightCol="weight") >>> blorModel = blor.fit(bdf)
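A minimal Scala sketch of the pattern this commit moves the tests toward: fit until convergence and assert on the converged model, instead of capping ```maxIter``` at a tiny value and asserting on unstable intermediate coefficients. The toy data and object name are illustrative only, not the test code touched by this patch:
```
// Sketch only: let the solver run to convergence (default maxIter = 100) so the
// values a test pins down stay stable across solver/breeze upgrades.
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object ConvergedCheckSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("converged-check").getOrCreate()
    import spark.implicits._

    // Tiny, hypothetical dataset; mirrors the shape of the doctest above.
    val df = Seq(
      (1.0, 1.0, Vectors.dense(0.0, 5.0)),
      (0.0, 2.0, Vectors.dense(1.0, 2.0)),
      (1.0, 3.0, Vectors.dense(2.0, 1.0)),
      (0.0, 4.0, Vectors.dense(3.0, 3.0))
    ).toDF("label", "weight", "features")

    // No tiny maxIter: the model is allowed to converge before we look at it.
    val lr = new LogisticRegression().setRegParam(0.01).setWeightCol("weight")
    val model = lr.fit(df)

    assert(model.summary.totalIterations < lr.getMaxIter,
      "expected convergence before hitting maxIter")
    println(model.coefficients)  // converged coefficients are what a test should check
    spark.stop()
  }
}
```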
spark git commit: [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro…
Repository: spark Updated Branches: refs/heads/master 7a365257e -> 7fecf5130 [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro… …ss NFS directories ## What changes were proposed in this pull request? Change from using java Files.move to using Hadoop filesystem operations to move the directories. The java Files.move does not work when moving directories across NFS mounts, and in fact its documentation says that if the directory has entries you should do a recursive move. We are already using the Hadoop filesystem here, so just use the local filesystem from there as it handles this properly. Note that the DB here is actually a directory of files and not just a single file, hence the change in the name of the local var. ## How was this patch tested? Ran YarnShuffleServiceSuite unit tests. Unfortunately couldn't easily add one here since it involves NFS. Ran manual tests to verify that the DB directories were properly moved across NFS mounted directories. Have been running this internally for weeks. Author: Tom Graves Closes #17748 from tgravescs/SPARK-19812. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7fecf513 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7fecf513 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7fecf513 Branch: refs/heads/master Commit: 7fecf5130163df9c204a2764d121a7011d007f4e Parents: 7a36525 Author: Tom Graves Authored: Wed Apr 26 08:23:31 2017 -0500 Committer: Tom Graves Committed: Wed Apr 26 08:23:31 2017 -0500 -- .../spark/network/yarn/YarnShuffleService.java | 23 +++- 1 file changed, 13 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7fecf513/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java -- diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index c7620d0..4acc203 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -21,7 +21,6 @@ import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.nio.ByteBuffer; -import java.nio.file.Files; import java.util.List; import java.util.Map; @@ -340,9 +339,9 @@ public class YarnShuffleService extends AuxiliaryService { * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise * it will uses a YARN local dir. 
*/ - protected File initRecoveryDb(String dbFileName) { + protected File initRecoveryDb(String dbName) { if (_recoveryPath != null) { -File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName); +File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName); if (recoveryFile.exists()) { return recoveryFile; } @@ -350,7 +349,7 @@ public class YarnShuffleService extends AuxiliaryService { // db doesn't exist in recovery path go check local dirs for it String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs"); for (String dir : localDirs) { - File f = new File(new Path(dir).toUri().getPath(), dbFileName); + File f = new File(new Path(dir).toUri().getPath(), dbName); if (f.exists()) { if (_recoveryPath == null) { // If NM recovery is not enabled, we should specify the recovery path using NM local @@ -363,17 +362,21 @@ public class YarnShuffleService extends AuxiliaryService { // make sure to move all DBs to the recovery path from the old NM local dirs. // If another DB was initialized first just make sure all the DBs are in the same // location. - File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName); - if (!newLoc.equals(f)) { + Path newLoc = new Path(_recoveryPath, dbName); + Path copyFrom = new Path(f.toURI()); + if (!newLoc.equals(copyFrom)) { +logger.info("Moving " + copyFrom + " to: " + newLoc); try { - Files.move(f.toPath(), newLoc.toPath()); + // The move here needs to handle moving non-empty directories across NFS mounts + FileSystem fs = FileSystem.getLocal(_conf); + fs.rename(copyFrom, newLoc); } catch (Exception e) { // Fail to move recovery file to new path, just continue on with new DB location logger.error("Failed to move recovery file {} to the path {}", -dbFileName,
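For readers scanning the flattened diff, the core of the change is: stop using java.nio.file.Files.move, which fails with an IOException when moving a non-empty directory would require copying its entries (as happens between NFS and local mounts), and rename through Hadoop's local FileSystem instead. A hedged Scala sketch of that idea; the paths, DB name, and helper are illustrative, not the service's actual fields:
```
// Sketch under stated assumptions: move a recovery DB *directory* between mounts
// via Hadoop's local FileSystem, whose rename can fall back to copy-and-delete,
// unlike java.nio.file.Files.move on a non-empty directory across mounts.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RecoveryDbMoveSketch {
  def moveRecoveryDb(conf: Configuration, from: Path, to: Path): Boolean = {
    val localFs = FileSystem.getLocal(conf)  // local filesystem, as the patch uses
    if (from == to) true else localFs.rename(from, to)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical locations; the real service derives these from the NM local dirs
    // and the YARN recovery path.
    val conf = new Configuration()
    val from = new Path("/tmp/nm-local-dir/registeredExecutors.ldb")
    val to   = new Path("/tmp/nm-recovery/registeredExecutors.ldb")
    println(s"moved: ${moveRecoveryDb(conf, from, to)}")
  }
}
```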
spark git commit: [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro…
Repository: spark Updated Branches: refs/heads/branch-2.2 a2f5ced32 -> 612952251 [SPARK-19812] YARN shuffle service fails to relocate recovery DB acro… …ss NFS directories ## What changes were proposed in this pull request? Change from using java Files.move to using Hadoop filesystem operations to move the directories. The java Files.move does not work when moving directories across NFS mounts, and in fact its documentation says that if the directory has entries you should do a recursive move. We are already using the Hadoop filesystem here, so just use the local filesystem from there as it handles this properly. Note that the DB here is actually a directory of files and not just a single file, hence the change in the name of the local var. ## How was this patch tested? Ran YarnShuffleServiceSuite unit tests. Unfortunately couldn't easily add one here since it involves NFS. Ran manual tests to verify that the DB directories were properly moved across NFS mounted directories. Have been running this internally for weeks. Author: Tom Graves Closes #17748 from tgravescs/SPARK-19812. (cherry picked from commit 7fecf5130163df9c204a2764d121a7011d007f4e) Signed-off-by: Tom Graves Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/61295225 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/61295225 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/61295225 Branch: refs/heads/branch-2.2 Commit: 612952251c5ac626e256bc2ab9414faf1662dde9 Parents: a2f5ced Author: Tom Graves Authored: Wed Apr 26 08:23:31 2017 -0500 Committer: Tom Graves Committed: Wed Apr 26 08:24:12 2017 -0500 -- .../spark/network/yarn/YarnShuffleService.java | 23 +++- 1 file changed, 13 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/61295225/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java -- diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index c7620d0..4acc203 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -21,7 +21,6 @@ import java.io.File; import java.io.IOException; import java.nio.charset.StandardCharsets; import java.nio.ByteBuffer; -import java.nio.file.Files; import java.util.List; import java.util.Map; @@ -340,9 +339,9 @@ public class YarnShuffleService extends AuxiliaryService { * when it previously was not. If YARN NM recovery is enabled it uses that path, otherwise * it will uses a YARN local dir. 
*/ - protected File initRecoveryDb(String dbFileName) { + protected File initRecoveryDb(String dbName) { if (_recoveryPath != null) { -File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbFileName); +File recoveryFile = new File(_recoveryPath.toUri().getPath(), dbName); if (recoveryFile.exists()) { return recoveryFile; } @@ -350,7 +349,7 @@ public class YarnShuffleService extends AuxiliaryService { // db doesn't exist in recovery path go check local dirs for it String[] localDirs = _conf.getTrimmedStrings("yarn.nodemanager.local-dirs"); for (String dir : localDirs) { - File f = new File(new Path(dir).toUri().getPath(), dbFileName); + File f = new File(new Path(dir).toUri().getPath(), dbName); if (f.exists()) { if (_recoveryPath == null) { // If NM recovery is not enabled, we should specify the recovery path using NM local @@ -363,17 +362,21 @@ public class YarnShuffleService extends AuxiliaryService { // make sure to move all DBs to the recovery path from the old NM local dirs. // If another DB was initialized first just make sure all the DBs are in the same // location. - File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName); - if (!newLoc.equals(f)) { + Path newLoc = new Path(_recoveryPath, dbName); + Path copyFrom = new Path(f.toURI()); + if (!newLoc.equals(copyFrom)) { +logger.info("Moving " + copyFrom + " to: " + newLoc); try { - Files.move(f.toPath(), newLoc.toPath()); + // The move here needs to handle moving non-empty directories across NFS mounts + FileSystem fs = FileSystem.getLocal(_conf); + fs.rename(copyFrom, newLoc); } catch (Exception e) { // Fail to move recovery file to new path, just continue on with new DB location
spark git commit: [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools
Repository: spark Updated Branches: refs/heads/branch-2.2 c8803c068 -> a2f5ced32 [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools ## What changes were proposed in this pull request? Simple documentation change to remove explicit vendor references. ## How was this patch tested? NA Please review http://spark.apache.org/contributing.html before opening a pull request. Author: anabranch Closes #17695 from anabranch/remove-vendor. (cherry picked from commit 7a365257e934e838bd90f6a0c50362bf47202b0e) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2f5ced3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2f5ced3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2f5ced3 Branch: refs/heads/branch-2.2 Commit: a2f5ced3236db665bb33adc1bf1f90553997f46b Parents: c8803c0 Author: anabranch Authored: Wed Apr 26 09:49:05 2017 +0100 Committer: Sean Owen Committed: Wed Apr 26 09:49:13 2017 +0100 -- docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a2f5ced3/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 87b7632..8b53e92 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -2270,8 +2270,8 @@ should be included on Spark's classpath: * `hdfs-site.xml`, which provides default behaviors for the HDFS client. * `core-site.xml`, which sets the default filesystem name. -The location of these configuration files varies across CDH and HDP versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create +The location of these configuration files varies across Hadoop versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
spark git commit: [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools
Repository: spark Updated Branches: refs/heads/master df58a95a3 -> 7a365257e [SPARK-20400][DOCS] Remove References to 3rd Party Vendor Tools ## What changes were proposed in this pull request? Simple documentation change to remove explicit vendor references. ## How was this patch tested? NA Please review http://spark.apache.org/contributing.html before opening a pull request. Author: anabranch Closes #17695 from anabranch/remove-vendor. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7a365257 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7a365257 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7a365257 Branch: refs/heads/master Commit: 7a365257e934e838bd90f6a0c50362bf47202b0e Parents: df58a95 Author: anabranch Authored: Wed Apr 26 09:49:05 2017 +0100 Committer: Sean Owen Committed: Wed Apr 26 09:49:05 2017 +0100 -- docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7a365257/docs/configuration.md -- diff --git a/docs/configuration.md b/docs/configuration.md index 87b7632..8b53e92 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -2270,8 +2270,8 @@ should be included on Spark's classpath: * `hdfs-site.xml`, which provides default behaviors for the HDFS client. * `core-site.xml`, which sets the default filesystem name. -The location of these configuration files varies across CDH and HDP versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create +The location of these configuration files varies across Hadoop versions, but +a common location is inside of `/etc/hadoop/conf`. Some tools create configurations on-the-fly, but offer a mechanisms to download copies of them. To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh`
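The doc text touched by this patch describes a plain configuration recipe: export `HADOOP_CONF_DIR` (commonly `/etc/hadoop/conf`) in `$SPARK_HOME/conf/spark-env.sh` so that the client-side `hdfs-site.xml` and `core-site.xml` become visible to Spark. A small, hedged Scala sketch to confirm they were picked up; the property names are standard Hadoop ones and nothing here is part of this docs-only patch:
```
// Prints two standard Hadoop client settings from the configuration Spark actually
// uses; if HADOOP_CONF_DIR is exported in spark-env.sh, these come from
// core-site.xml and hdfs-site.xml rather than Hadoop's built-in defaults.
import org.apache.spark.sql.SparkSession

object HadoopConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hadoop-conf-check").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    println("fs.defaultFS    = " + hadoopConf.get("fs.defaultFS"))     // from core-site.xml
    println("dfs.replication = " + hadoopConf.get("dfs.replication"))  // from hdfs-site.xml
    spark.stop()
  }
}
```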