date:20180703

[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21701
  
**[Test build #92553 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92553/testReport)**
 for PR 21701 at commit 
[`c0d1c6e`](https://github.com/apache/spark/commit/c0d1c6e0a5532eeab0848834d2dc348808e54069).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed trait MultipleWatermarkPolicy`
  * `case class WatermarkTracker(policy: MultipleWatermarkPolicy) extends Logging`
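
For readers unfamiliar with the idea behind a multiple-watermark policy, here is a minimal Python sketch. It is not the `WatermarkTracker` API from this patch: the function name `global_watermark`, the dictionary input, and the fallback of `0` for an empty input are all assumptions made for illustration.

```python
def global_watermark(operator_watermarks, policy="min"):
    """Pick one global watermark (ms) from per-operator watermarks.

    Hypothetical helper: `operator_watermarks` maps an operator id to its
    current watermark in milliseconds.
    """
    if not operator_watermarks:
        return 0  # assumed fallback when no operator reports a watermark
    if policy == "min":
        # Conservative: the slowest stream holds the global watermark back.
        return min(operator_watermarks.values())
    if policy == "max":
        # Aggressive: the fastest stream drives the global watermark forward.
        return max(operator_watermarks.values())
    raise ValueError("unknown policy: %s" % policy)
```

With `{0: 1000, 1: 5000}` the "min" policy yields 1000 and the "max" policy yields 5000, so "max" may drop data from the slower stream as late.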


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21701
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92553/
Test FAILed.


---


[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss

Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701953
  
--- Diff: python/pyspark/sql/conf.py ---
@@ -59,7 +62,7 @@ def unset(self, key):
 
 def _checkType(self, obj, identifier):
 """Assert that an object is of type str."""
-if not isinstance(obj, str) and not isinstance(obj, unicode):
+if not isinstance(obj, basestring):
--- End diff --

Is there an issue here?
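
One way to see the concern with `basestring` is that the name exists only on Python 2. The sketch below is a possible version-neutral check, not the code in `conf.py`; the names `string_types` and `check_type` and the error message text are illustrative assumptions.

```python
import sys

# Python 2 has basestring as the common base of str and unicode;
# Python 3 only has str.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 (only evaluated on Python 2)


def check_type(obj, identifier):
    """Raise TypeError unless obj is a string on the running interpreter."""
    if not isinstance(obj, string_types):
        raise TypeError("expected %s '%s' to be a string (was '%s')" %
                        (identifier, obj, type(obj).__name__))
```

Because the `basestring` reference sits in the `else` branch, it is never evaluated on Python 3, although a linter may still flag it without the `noqa` marker.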


---


[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss

Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199702001
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,6 +49,9 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

On Python 3, it binds the name __raw_input__ to the builtin __input()__, so 
existing __raw_input()__ calls keep working.
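
A minimal sketch of how such a shim behaves, assuming it sits at module level as in the diff. The helper `ask` and its `reader` parameter are hypothetical and exist only so the example can run without a terminal.

```python
import sys

if sys.version_info[0] >= 3:
    raw_input = input  # bind the Python 2 name to the Python 3 builtin


def ask(prompt, reader=None):
    """Read one line of text; `reader` is a hypothetical hook for tests."""
    reader = reader or raw_input
    return reader(prompt).strip()
```

On Python 2 the `if` body is skipped and `raw_input` resolves to the builtin, so the same call sites work on both major versions.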


---


[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21701
  
Merged build finished. Test FAILed.


---


[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-03 Thread cclauss

Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20838#discussion_r199701965
  
--- Diff: dev/merge_spark_pr.py ---
@@ -39,6 +39,9 @@
 except ImportError:
 JIRA_IMPORTED = False
 
+if sys.version_info[0] >= 3:
+raw_input = input
--- End diff --

On Python 3, it binds the name __raw_input__ to the builtin __input()__, so 
existing __raw_input()__ calls keep working.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #92554 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92554/testReport)**
 for PR 21596 at commit 
[`5006467`](https://github.com/apache/spark/commit/50064675706f7ac46f2665da752e0f410ad84183).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92554/
Test FAILed.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21633
  
Jenkins, retest this please.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/632/
Test PASSed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21633
  
**[Test build #92555 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92555/testReport)**
 for PR 21633 at commit 
[`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4).


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss

GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/21702

[SPARK-23698] Remove raw_input() from Python 2

Signed-off-by: cclauss 

## What changes were proposed in this pull request?

Humans will be able to enter text in Python 3 prompts, which they cannot do 
today.
The Python builtin __raw_input()__ was removed in Python 3 in favor of 
__input()__.  This PR makes the same change for Python 2: the scripts call 
__input()__, which is bound to __raw_input()__ when running on Python 2.

## How was this patch tested?

flake8 testing



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark python-fix-raw_input

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21702


commit 960769a735933d58d136ea068954ad83d4731b10
Author: cclauss 
Date:   2018-07-03T07:10:46Z

[SPARK-23698] Remove raw_input() from Python 2

Signed-off-by: cclauss 




---


[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-03 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21667#discussion_r199705148
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ---
@@ -42,63 +38,27 @@ object DataSourceUtils {
 
   /**
* Verify if the schema is supported in datasource. This verification 
should be done
-   * in a driver side, e.g., `prepareWrite`, `buildReader`, and 
`buildReaderWithPartitionValues`
-   * in `FileFormat`.
-   *
-   * Unsupported data types of csv, json, orc, and parquet are as follows;
-   *  csv -> R/W: Interval, Null, Array, Map, Struct
-   *  json -> W: Interval
-   *  orc -> W: Interval, Null
-   *  parquet -> R/W: Interval, Null
+   * in a driver side.
*/
   private def verifySchema(format: FileFormat, schema: StructType, 
isReadPath: Boolean): Unit = {
-def throwUnsupportedException(dataType: DataType): Unit = {
-  throw new UnsupportedOperationException(
-s"$format data source does not support ${dataType.simpleString} 
data type.")
-}
-
-def verifyType(dataType: DataType): Unit = dataType match {
-  case BooleanType | ByteType | ShortType | IntegerType | LongType | 
FloatType | DoubleType |
-   StringType | BinaryType | DateType | TimestampType | _: 
DecimalType =>
-
-  // All the unsupported types for CSV
-  case _: NullType | _: CalendarIntervalType | _: StructType | _: 
ArrayType | _: MapType
-  if format.isInstanceOf[CSVFileFormat] =>
-throwUnsupportedException(dataType)
-
-  case st: StructType => st.foreach { f => verifyType(f.dataType) }
-
-  case ArrayType(elementType, _) => verifyType(elementType)
-
-  case MapType(keyType, valueType, _) =>
-verifyType(keyType)
-verifyType(valueType)
-
-  case udt: UserDefinedType[_] => verifyType(udt.sqlType)
-
-  // Interval type not supported in all the write path
-  case _: CalendarIntervalType if !isReadPath =>
-throwUnsupportedException(dataType)
-
-  // JSON and ORC don't support an Interval type, but we pass it in 
read pass
-  // for back-compatibility.
-  case _: CalendarIntervalType if format.isInstanceOf[JsonFileFormat] 
||
-format.isInstanceOf[OrcFileFormat] =>
+def verifyType(dataType: DataType): Unit = {
+  if (!format.supportDataType(dataType, isReadPath)) {
+throw new UnsupportedOperationException(
+  s"$format data source does not support ${dataType.simpleString} 
data type.")
+  }
+  dataType match {
--- End diff --

I see. I will update it.
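
The shape of the refactoring in this diff (ask the format whether it supports a type, then recurse into nested types) can be sketched in Python. This is only an illustration under assumptions: the `dataType match` block is truncated above, so the recursion here is a guess at its intent, and the tuple encoding of types and the names `verify_type` and `csv_like_supports` are hypothetical.

```python
def verify_type(data_type, supports):
    """Raise if `supports` rejects data_type or any type nested inside it.

    Toy encoding: atomic types are strings; nested types are tuples such as
    ("array", elem), ("map", key, value) or ("struct", [field_types]).
    """
    if not supports(data_type):
        raise ValueError(
            "data source does not support %r data type." % (data_type,))
    if isinstance(data_type, tuple):
        kind = data_type[0]
        if kind == "array":
            verify_type(data_type[1], supports)
        elif kind == "map":
            verify_type(data_type[1], supports)
            verify_type(data_type[2], supports)
        elif kind == "struct":
            for field_type in data_type[1]:
                verify_type(field_type, supports)


def csv_like_supports(data_type):
    """Toy predicate: flat atomic types only, no interval or null."""
    return isinstance(data_type, str) and data_type not in ("interval", "null")
```

The per-format rules live in the predicate, so the traversal itself no longer needs to know about individual formats.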


---


[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21702
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...

2018-07-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21649#discussion_r199706822
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3905,6 +3905,16 @@ setMethod("rollup",
 groupedData(sgd)
   })
 
+isTypeAllowedForSqlHint <- function(x) {
+  if (is.character(x) | is.numeric(x)) {
+TRUE
+  } else if (is.list(x)) {
+all (sapply(x, (function (y) is.character(y) | is.numeric(y))))
+  } else {
+FALSE
+  }
+}
+
 #' hint
 #'
 #' Specifies execution plan hint and return a new SparkDataFrame.
--- End diff --

The concern would be whether other types in Python or R are going to be 
translated/mapped properly to Java/Scala types, so this is probably OK.
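
For comparison, the same check can be written in Python. This is a loose transliteration of the R helper in the diff, not code from the PR; the name `is_type_allowed_for_sql_hint` and the decision to exclude `bool` are assumptions made here.

```python
import numbers


def is_type_allowed_for_sql_hint(x):
    """True for a string, a number, or a list made only of strings/numbers."""
    def is_scalar(v):
        # bool is a Number subclass in Python; exclude it, since R's
        # is.numeric() is FALSE for logical values.
        return isinstance(v, str) or (
            isinstance(v, numbers.Number) and not isinstance(v, bool))

    if is_scalar(x):
        return True
    if isinstance(x, list):
        return all(is_scalar(v) for v in x)
    return False
```

Nested lists are rejected because each list element must itself be a string or a number.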


---


[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-07-03 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21678
  
ok then, thanks


---


[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...

2018-07-03 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21678
  
LGTM


---


[GitHub] spark pull request #21666: [SPARK-24535][SPARKR] fix tests on java check err...

2018-07-03 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21666#discussion_r199713355
  
--- Diff: R/pkg/R/client.R ---
@@ -61,6 +61,11 @@ generateSparkSubmitArgs <- function(args, sparkHome, 
jars, sparkSubmitOpts, pack
 }
 
 checkJavaVersion <- function() {
+  if (is_windows()) {
+# See SPARK-24535
--- End diff --

updated


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/633/
Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21666
  
**[Test build #92556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92556/testReport)**
 for PR 21666 at commit 
[`e1d1a64`](https://github.com/apache/spark/commit/e1d1a64f5bf38710560c6c83b46d8562bb53dd35).


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21666
  
**[Test build #92557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92557/testReport)**
 for PR 21666 at commit 
[`8d9ef83`](https://github.com/apache/spark/commit/8d9ef83deaecc9a0c0c193b7a56c6c4177cbb952).


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/634/
Test PASSed.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #92558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92558/testReport)**
 for PR 21596 at commit 
[`4b78651`](https://github.com/apache/spark/commit/4b786518095c7ed2fd034f74e5b4bd83a3062c29).


---


[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21702#discussion_r199718062
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,13 +49,16 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version < '3':
+input = raw_input
--- End diff --

If we can do the opposite, the diff should be only 4 lines though


---


[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...

2018-07-03 Thread ueshin

GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/21703

[SPARK-24732][SQL] Type coercion between MapTypes.

## What changes were proposed in this pull request?

Currently we don't allow type coercion between maps.
We can support type coercion between MapTypes where both the key types and 
the value types are compatible.
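
The rule described above can be illustrated with a toy Python model. It does not reproduce Spark's `TypeCoercion` rules: the widening order `NUMERIC_ORDER` and the functions `common_type` and `common_map_type` are hypothetical and cover only a few numeric types.

```python
# Hypothetical widening order for a few numeric types (not Spark's real rules).
NUMERIC_ORDER = ["int", "bigint", "double"]


def common_type(a, b):
    """Return the wider of two toy types, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        return max(a, b, key=NUMERIC_ORDER.index)
    return None


def common_map_type(m1, m2):
    """Coerce two (key_type, value_type) pairs, or return None."""
    key = common_type(m1[0], m2[0])
    value = common_type(m1[1], m2[1])
    if key is None or value is None:
        return None
    return (key, value)
```

Two map types coerce only when both the key types and the value types have a common type, so a single incompatible side makes the whole coercion fail.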

## How was this patch tested?

Added tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark 
issues/SPARK-24732/maptypecoercion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21703


commit 928501a63f2ae90b4d95949e6fc505b762d03ac7
Author: Takuya UESHIN 
Date:   2018-07-03T08:08:25Z

Type coercion between MapTypes.




---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21703
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/635/
Test PASSed.


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21703
  
cc @gatorsmile @cloud-fan 


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21703
  
**[Test build #92559 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92559/testReport)**
 for PR 21703 at commit 
[`928501a`](https://github.com/apache/spark/commit/928501a63f2ae90b4d95949e6fc505b762d03ac7).


---


[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat

2018-07-03 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21073#discussion_r199722217
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -551,6 +551,36 @@ object TypeCoercion {
   case None => s
 }
 
+  case m @ MapConcat(children) if children.forall(c => 
MapType.acceptsType(c.dataType)) &&
+!haveSameType(children) =>
+val keyTypes = 
children.map(_.dataType.asInstanceOf[MapType].keyType)
--- End diff --

As for 1), I submitted PR #21703. I'm not sure we can merge it yet, but 
it will help you improve this PR.
As for 2), adding casts to the same type should not be a problem because 
the extra casts will be removed during the optimization phase.


---


[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss

Github user cclauss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21702#discussion_r199723478
  
--- Diff: dev/create-release/releaseutils.py ---
@@ -49,13 +49,16 @@
 print("Install using 'sudo pip install unidecode'")
 sys.exit(-1)
 
+if sys.version < '3':
+input = raw_input
--- End diff --

Two downsides to that approach:
1. We stick with the legacy Python syntax which will unnecessarily 
complicate our lives (and our diffs) [in 18 months](http://pythonclock.org)
2. This approach reduces the linting errors from 10 down to just 2.
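
A side note on the version test itself: the diff above compares `sys.version < '3'`, while the earlier patch compared `sys.version_info[0] >= 3`. The sketch below contrasts the two styles using made-up inputs; the function names are hypothetical and `'10.0.0'` is an invented version used only to show the difference.

```python
def is_python2_by_string(version):
    """Mimic `sys.version < '3'` for an arbitrary version string."""
    return version < '3'


def is_python2_by_tuple(version_info):
    """Mimic `sys.version_info[0] < 3` for an arbitrary version tuple."""
    return version_info[0] < 3


print(is_python2_by_string('2.7.15'))    # True
print(is_python2_by_string('3.6.5'))     # False
print(is_python2_by_string('10.0.0'))    # True (lexicographic: '1' < '3')
print(is_python2_by_tuple((10, 0, 0)))   # False
```

Both styles agree for every Python 2 and Python 3 release; the tuple form simply avoids depending on string ordering.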


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21666
  
**[Test build #92556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92556/testReport)**
 for PR 21666 at commit 
[`e1d1a64`](https://github.com/apache/spark/commit/e1d1a64f5bf38710560c6c83b46d8562bb53dd35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92556/
Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92557/
Test PASSed.


---


[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21666
  
**[Test build #92557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92557/testReport)**
 for PR 21666 at commit 
[`8d9ef83`](https://github.com/apache/spark/commit/8d9ef83deaecc9a0c0c193b7a56c6c4177cbb952).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21703
  
LGTM


---


[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...

2018-07-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21699#discussion_r199742580
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -340,36 +340,52 @@ class RelationalGroupedDataset protected[sql](
 
   /**
* Pivots a column of the current `DataFrame` and performs the specified 
aggregation.
-   * There are two versions of pivot function: one that requires the 
caller to specify the list
-   * of distinct values to pivot on, and one that does not. The latter is 
more concise but less
-   * efficient, because Spark needs to first compute the list of distinct 
values internally.
*
* {{{
*   // Compute the sum of earnings for each year by course with each 
course as a separate column
-   *   df.groupBy("year").pivot("course", Seq("dotNET", 
"Java")).sum("earnings")
-   *
-   *   // Or without specifying column values (less efficient)
-   *   df.groupBy("year").pivot("course").sum("earnings")
+   *   df.groupBy($"year").pivot($"course", Seq("dotNET", 
"Java")).sum($"earnings")
* }}}
*
-   * @param pivotColumn Name of the column to pivot.
+   * @param pivotColumn the column to pivot.
* @param values List of values that will be translated to columns in 
the output DataFrame.
-   * @since 1.6.0
+   * @since 2.4.0
*/
-  def pivot(pivotColumn: String, values: Seq[Any]): 
RelationalGroupedDataset = {
+  def pivot(pivotColumn: Column, values: Seq[Any]): 
RelationalGroupedDataset = {
--- End diff --

To make diffs smaller, can you move this under the signature `def 
pivot(pivotColumn: String, values: Seq[Any])`?
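
For context on the two pivot variants mentioned in the Scaladoc, here is a plain-Python sketch of pivot-and-sum. It is not Spark code: `pivot_sum` and its parameters are hypothetical, and it fills missing cells with 0 where Spark would produce null.

```python
from collections import defaultdict


def pivot_sum(rows, group_key, pivot_key, value_key, values=None):
    """Sum value_key per group, with one output column per pivot value."""
    if values is None:
        # Extra pass over the data to discover the distinct pivot values;
        # this is the cost of omitting the explicit value list.
        values = sorted({row[pivot_key] for row in rows})
    table = defaultdict(lambda: dict.fromkeys(values, 0))
    for row in rows:
        totals = table[row[group_key]]  # creates the group row if missing
        if row[pivot_key] in totals:
            totals[row[pivot_key]] += row[value_key]
    return dict(table)
```

Passing `values=["dotNET", "Java"]` skips the discovery pass and also fixes which columns appear in the output.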


---


[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21699
  
cc: @rxin @gatorsmile 


---


[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21699
  
 `def pivot(pivotColumn: String)`, too?


---


[GitHub] spark pull request #21693: [SPARK-24673][SQL] scala sql function from_utc_ti...

2018-07-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21693#discussion_r199744502
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2945,6 +2956,17 @@ object functions {
 ToUTCTimestamp(ts.expr, Literal(tz))
   }
 
+  /**
+   * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a 
time in the given time
+   * zone, and renders that time as a timestamp in UTC. For example, 
'GMT+1' would yield
+   * '2017-07-14 01:40:00.0'.
+   * @group datetime_funcs
+   * @since 1.5.0
--- End diff --

`@since 2.4.0`


---


[GitHub] spark pull request #21693: [SPARK-24673][SQL] scala sql function from_utc_ti...

2018-07-03 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21693#discussion_r199744569
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2934,6 +2934,17 @@ object functions {
 FromUTCTimestamp(ts.expr, Literal(tz))
   }
 
+  /**
+   * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a 
time in UTC, and renders
+   * that time as a timestamp in the given time zone. For example, 'GMT+1' 
would yield
+   * '2017-07-14 03:40:00.0'.
+   * @group datetime_funcs
+   * @since 1.5.0
--- End diff --

`@since 2.4.0`
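
The two doc comments in this PR describe opposite conversions, which is easy to mix up. The sketch below reproduces both examples with the Python standard library; it does not call Spark, and the helper names `to_utc` and `from_utc` are chosen here only to mirror the Spark function names.

```python
from datetime import datetime, timedelta, timezone

GMT_PLUS_1 = timezone(timedelta(hours=1))


def to_utc(wall_clock, tz):
    """Interpret a naive timestamp as local time in tz; return naive UTC."""
    aware = wall_clock.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def from_utc(wall_clock, tz):
    """Interpret a naive timestamp as UTC; return naive local time in tz."""
    aware = wall_clock.replace(tzinfo=timezone.utc)
    return aware.astimezone(tz).replace(tzinfo=None)


ts = datetime(2017, 7, 14, 2, 40)
print(to_utc(ts, GMT_PLUS_1))    # 2017-07-14 01:40:00
print(from_utc(ts, GMT_PLUS_1))  # 2017-07-14 03:40:00
```

The outputs match the values quoted in the two Scaladoc comments: 01:40 for the to-UTC direction and 03:40 for the from-UTC direction.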


---


[GitHub] spark issue #21693: [SPARK-24673][SQL] scala sql function from_utc_timestamp...

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21693
  
cc: @ueshin @gatorsmile 


---


[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
**[Test build #92560 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92560/testReport)**
 for PR 21260 at commit 
[`45eb477`](https://github.com/apache/spark/commit/45eb477623d89fb9352bf38b75c0a27e228f291f).


---


[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21260
  
**[Test build #92560 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92560/testReport)**
 for PR 21260 at commit 
[`45eb477`](https://github.com/apache/spark/commit/45eb477623d89fb9352bf38b75c0a27e228f291f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92560/
Test PASSed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21633
  
**[Test build #92555 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92555/testReport)**
 for PR 21633 at commit 
[`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92555/
Test FAILed.


---


[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21699
  
**[Test build #92561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92561/testReport)**
 for PR 21699 at commit 
[`d62b7e7`](https://github.com/apache/spark/commit/d62b7e789f38219b62fb5b010fb2cacc0324fe29).


---


[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...

2018-07-03 Thread ueshin

GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/21704

[SPARK-24734][SQL] Fix containsNull of Concat for array type.

## What changes were proposed in this pull request?

Currently `Concat` for array type uses the data type of the first child as 
its own data type, but the other children might include an array containing nulls.
We should take the nullability of all children into account.
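
To illustrate the problem, here is a minimal sketch using a toy model in plain Python. It is not Spark's actual `ArrayType` or `Concat` code; the class and function names are hypothetical. It only shows why taking the first child's type can under-report `containsNull`:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ArrayType:
    """Toy stand-in for an array data type (hypothetical, not Spark's class)."""
    element_type: str
    contains_null: bool


def concat_type_first_child(children: Sequence[ArrayType]) -> ArrayType:
    # Behaviour described as current: reuse the first child's type unchanged.
    return children[0]


def concat_type_all_children(children: Sequence[ArrayType]) -> ArrayType:
    # Behaviour proposed: the result may contain nulls if any child may.
    first = children[0]
    return ArrayType(first.element_type, any(c.contains_null for c in children))


children = [ArrayType("int", False), ArrayType("int", True)]
print(concat_type_first_child(children).contains_null)   # False
print(concat_type_all_children(children).contains_null)  # True
```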

## How was this patch tested?

Modified and added some tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-24734/concat_containsnull

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21704


commit d87a8c6c0d1a4db5c9444781160a65562f8ea738
Author: Takuya UESHIN 
Date:   2018-07-03T11:21:06Z

Fix containsNull of Concat for array type.




---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21704
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/636/
Test PASSed.


---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21704
  
**[Test build #92562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92562/testReport)**
 for PR 21704 at commit 
[`d87a8c6`](https://github.com/apache/spark/commit/d87a8c6c0d1a4db5c9444781160a65562f8ea738).


---


[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-07-03 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/13599
  
Is there any work being done on this PR at this point in time? 


---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21704
  
cc @mn-mikke @gatorsmile @cloud-fan 


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21703
  
**[Test build #92559 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92559/testReport)**
 for PR 21703 at commit 
[`928501a`](https://github.com/apache/spark/commit/928501a63f2ae90b4d95949e6fc505b762d03ac7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21703
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92559/
Test PASSed.


---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread mn-mikke

Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/21704
  
@ueshin Thanks for bringing up this topic! This problem with different 
```nullable```/```containsNull``` flags seems to be more generic. In 
[21687](https://github.com/apache/spark/pull/21687), we've touched on a similar 
problem with the ```CaseWhen``` and ```If``` expressions. So I think it would be 
nice if we could think together about a generic and consistent solution for all 
expressions. WDYT?


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #92558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92558/testReport)**
 for PR 21596 at commit 
[`4b78651`](https://github.com/apache/spark/commit/4b786518095c7ed2fd034f74e5b4bd83a3062c29).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92558/
Test PASSed.


---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21633
  
Jenkins, retest this please.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/637/
Test PASSed.


---


[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21633
  
**[Test build #92563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92563/testReport)**
 for PR 21633 at commit 
[`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4).


---


[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...

2018-07-03 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21704
  
@mn-mikke Thanks! I'll take a look and join the discussion later.


---


[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2

2018-07-03 Thread cclauss

Github user cclauss commented on the issue:

https://github.com/apache/spark/pull/21702
  
Tested data entry...
In __./dev/merge_spark_pr.py__, just after __clean_up()__, I added the lines:
```
while not continue_maybe('y to continue'):
    print('loop')
sys.exit()
```
Test: 'y' or 'Y' caused a loop, while all others, including "", "yes", "n", 
"N", "0", "1", ".", caused an exit().
Identical results on both Python 2 and Python 3.
---
In __./dev/create-release/releaseutils.py__, just after __yesOrNoPrompt()__, I 
added the lines:
```
while not yesOrNoPrompt('y to quit'):
    print("got an 'n'")
sys.exit()
```
Test: 'y' caused an exit(), while all others, including "", "Y", "yes", "n", 
"N", "0", "1", ".", caused a loop.
Identical results on both Python 2 and Python 3.
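
The observed outcomes can be summarised as two predicates. This is only a sketch inferred from the test results above, not the code of the two scripts; the function names are hypothetical:

```python
def continue_maybe_accepts(answer: str) -> bool:
    """Inferred from the first test: 'y' and 'Y' are accepted."""
    return answer.lower() == "y"


def yes_or_no_prompt_accepts(answer: str) -> bool:
    """Inferred from the second test: only a lowercase 'y' is accepted."""
    return answer == "y"


# Inputs used in both manual tests above.
for answer in ["y", "Y", "", "yes", "n", "N", "0", "1", "."]:
    print(repr(answer), continue_maybe_accepts(answer), yes_or_no_prompt_accepts(answer))
```

The only input on which the two prompts differ is "Y".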



---


[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread MaxGekk

Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21596
  
LGTM


---


[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...

2018-07-03 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/21669
  
@ifilonenko build fails due to the tags issue I guess. I fixed it in the 
other PR ;)


---


[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry and example for D...

2018-07-03 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20446
  
@WeichenXu123 looks like there was one more outstanding comment, about 
using `.show()`?


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...

2018-07-03 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r199801815
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -336,7 +336,7 @@ private[spark] class SparkSubmit extends Logging {
 val targetDir = Utils.createTempDir()
 
 // assure a keytab is available from any place in a JVM
-if (clusterManager == YARN || clusterManager == LOCAL || isMesosClient) {
+if (clusterManager == YARN || clusterManager == LOCAL || isMesosClient || isKubernetesCluster) {
--- End diff --

This check has been restrictive for customers in the past. There are cases 
where spark-submit should not have the file locally and the keytab should be 
mounted as a secret within the cluster.


---


[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21433
  
**[Test build #4203 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4203/testReport)**
 for PR 21433 at commit 
[`245181a`](https://github.com/apache/spark/commit/245181a6ebb03b4f394097297ae245705aaf9b0f).


---


[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21596#discussion_r199803209
  
--- Diff: pom.xml ---
@@ -158,8 +158,8 @@
 2.11.12
 2.11
 1.9.13
-2.6.7
-
2.6.7.1
+2.9.6
+
2.9.6
--- End diff --

I suspect we can collapse these two versions; they were broken out to 
handle the fact that a few 2.6.x Jackson releases didn't publish all artifacts. 


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...

2018-07-03 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r199803715
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -211,6 +211,51 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
   .createWithDefault("2")
 
+  val KUBERNETES_KERBEROS_SUPPORT =
+ConfigBuilder("spark.kubernetes.kerberos.enabled")
+  .doc("Specify whether your job is a job that will require a Delegation Token to access HDFS")
--- End diff --

I think Kerberos goes beyond DTs, so it shouldn't be specific to that. Also, I 
think you don't need the user to pass that. You just need to call: 
UserGroupInformation.isSecurityEnabled


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21705
  
cc: @cloud-fan 


---


[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...

2018-07-03 Thread maropu

GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21705

[SPARK-24727][SQL] Add a static config to control cache size for generated 
classes

## What changes were proposed in this pull request?
Since SPARK-24250 has been resolved, executors correctly reference 
user-defined configurations. So, this PR adds a static config to control the 
cache size for generated classes in `CodeGenerator`.

## How was this patch tested?
Manually checked that executors referenced `spark.sql.cacheSize` correctly.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-24727

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21705


commit 8ee8e00f156e577b32b01d015b8bd24f72ae7340
Author: Takeshi Yamamuro 
Date:   2018-07-03T13:14:35Z

Fix




---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21705
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/638/
Test PASSed.


---


[GitHub] spark issue #21693: [SPARK-24673][SQL] scala sql function from_utc_timestamp...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21693
  
**[Test build #4204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4204/testReport)**
 for PR 21693 at commit 
[`d4ebc8f`](https://github.com/apache/spark/commit/d4ebc8f45aa78eae13cb6166204f0f5de9de4bd8).


---


[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21657
  
@HyukjinKwon kindly ping


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21705
  
**[Test build #92564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92564/testReport)**
 for PR 21705 at commit 
[`8ee8e00`](https://github.com/apache/spark/commit/8ee8e00f156e577b32b01d015b8bd24f72ae7340).


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21705
  
LGTM


---


[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...

2018-07-03 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21669#discussion_r199806583
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala ---
@@ -81,4 +83,35 @@ private[spark] object Constants {
   val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc";
   val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
   val MEMORY_OVERHEAD_MIN_MIB = 384L
+
+  // Hadoop Configuration
+  val HADOOP_FILE_VOLUME = "hadoop-properties"
+  val HADOOP_CONF_DIR_PATH = "/etc/hadoop/conf"
+  val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR"
+  val HADOOP_CONF_DIR_LOC = "spark.kubernetes.hadoop.conf.dir"
+  val HADOOP_CONFIG_MAP_SPARK_CONF_NAME =
+"spark.kubernetes.hadoop.executor.hadoopConfigMapName"
+
+  // Kerberos Configuration
+  val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME =
+"spark.kubernetes.kerberos.delegation-token-secret-name"
+  val KERBEROS_KEYTAB_SECRET_NAME =
+"spark.kubernetes.kerberos.key-tab-secret-name"
+  val KERBEROS_KEYTAB_SECRET_KEY =
+"spark.kubernetes.kerberos.key-tab-secret-key"
+  val KERBEROS_SPARK_USER_NAME =
+"spark.kubernetes.kerberos.spark-user-name"
+  val KERBEROS_SECRET_LABEL_PREFIX =
+"hadoop-tokens"
+  val SPARK_HADOOP_PREFIX = "spark.hadoop."
+  val HADOOP_SECURITY_AUTHENTICATION =
+SPARK_HADOOP_PREFIX + "hadoop.security.authentication"
+
+  // Kerberos Token-Refresh Server
+  val KERBEROS_REFRESH_LABEL_KEY = "refresh-hadoop-tokens"
--- End diff --

I also left a comment in the design doc: can we also provide the option of 
using an existing renewal service, as when integrating with an external Hadoop 
cluster where people already have one? This is how it has worked for Mesos so far.


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21705
  
can we add a test case in `ExecutorSideSQLConfSuite` to prove that static 
conf also works?


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21705
  
ok, will do.


---


[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-03 Thread szhem

Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19410
  
@mallman 

Just my two cents regarding built-in solutions:

The periodic checkpointer deletes checkpoint files so as not to pollute the 
hard drive. Although disk storage is cheap, it's not free. 

For example, in my case (a graph with >1B vertices and about the same number 
of edges), a checkpoint directory with a single checkpoint took about 150-200GB. 
The checkpoint interval was set to 5, and the job was then able to complete in 
about 100 iterations.
So without cleaning up unnecessary checkpoints, the checkpoint 
directory could grow up to 6TB (which is quite a lot) in my case.
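
As a rough back-of-envelope check of these figures, here is a small sketch. It assumes one checkpoint is written every `interval` iterations and none is ever deleted; the function name is hypothetical and the inputs are the approximate values quoted above:

```python
def checkpoint_footprint_gb(iterations: int, interval: int, size_gb: int) -> int:
    """Total size if a checkpoint is kept for every `interval`-th iteration."""
    return (iterations // interval) * size_gb


low = checkpoint_footprint_gb(100, 5, 150)   # 20 checkpoints of 150 GB each
high = checkpoint_footprint_gb(100, 5, 200)  # 20 checkpoints of 200 GB each
print(low, high)  # 3000 4000
```

With exactly 100 iterations this gives roughly 3-4 TB; a total near 6 TB would correspond to a somewhat longer run or larger checkpoints. Either way the directory grows to several terabytes if nothing is cleaned up.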



---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21705
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/639/
Test PASSed.


---


[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...

2018-07-03 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21705
  
**[Test build #92565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92565/testReport)**
 for PR 21705 at commit 
[`0a9eaa2`](https://github.com/apache/spark/commit/0a9eaa26356e6c0adef53b07c47ed19265aa9383).


---


[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...

2018-07-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21705#discussion_r199817343
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala ---
@@ -40,16 +40,24 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite with SQLTestUtils {
 spark = null
   }
 
+  private def withStaticSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --

Ah, sorry, I was wrong. A static conf is no different from a normal conf; it's 
just immutable during runtime. Maybe just call this method `withSQLConf`?


---

