[GitHub] spark issue #21380: [SPARK-24329][SQL] Remove comments filtering before pars...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21380
  
**[Test build #90889 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90889/testReport)**
 for PR 21380 at commit 
[`3652268`](https://github.com/apache/spark/commit/36522689f9579ec05e7d69d1d7bd1f507f6bdbc0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r189525864
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
 val accUpdate3 = new LongAccumulator
 accUpdate3.metadata = acc3.metadata
 accUpdate3.setValue(18)
-val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3)
-val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo)
+
+val accumUpdates1 = Seq(accUpdate1, accUpdate2)
+val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo)
 val exceptionFailure = new ExceptionFailure(
   new SparkException("fondue?"),
-  accumInfo).copy(accums = accumUpdates)
+  accumInfo1).copy(accums = accumUpdates1)
--- End diff --

We can avoid the `copy` call.


---




[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...

2018-05-21 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21361
  
@cloud-fan, sorry, but I am not sure I understand. Could you please give me 
some more details about the end-to-end test case for `GetMapValue` that you 
want me to add? Thanks.


---




[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

2018-05-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21363#discussion_r189526889
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -90,6 +90,7 @@ private[csv] object CSVInferSchema {
   // DecimalTypes have different precisions and scales, so we try to find the common type.
   findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(StringType)
 case DoubleType => tryParseDouble(field, options)
+case DateType => tryParseDate(field, options)
--- End diff --

This is also a behavior change. Should we document it?
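To make the behavior change concrete, here is a toy sketch in Python. It is hypothetical and much simpler than Spark's `CSVInferSchema`: there is no per-row type widening, no timestamp, boolean or decimal handling, and a fixed `%Y-%m-%d` date format is assumed. It only shows how inserting a date step into an inference chain changes the inferred type for a column whose values previously fell through to string.

```python
from datetime import datetime
from typing import Callable, List, Tuple


def _is_int(s: str) -> bool:
    try:
        int(s)
        return True
    except ValueError:
        return False


def _is_double(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False


def _is_date(s: str, fmt: str = "%Y-%m-%d") -> bool:
    # Assumed date format; the real parser takes its format from the CSV options.
    try:
        datetime.strptime(s, fmt)
        return True
    except ValueError:
        return False


def infer_column_type(values: List[str], try_dates: bool) -> str:
    """Return the first type in the chain that accepts every value, else 'string'."""
    steps: List[Tuple[str, Callable[[str], bool]]] = [("int", _is_int), ("double", _is_double)]
    if try_dates:
        steps.append(("date", _is_date))  # the new step, analogous to tryParseDate
    for name, accepts in steps:
        if all(accepts(v) for v in values):
            return name
    return "string"
```

In this toy, a column of `2018-05-21`-style values is inferred as `string` with `try_dates=False` and as `date` with `try_dates=True`, which is the kind of difference users would notice when reading existing files.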


---




[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21331
  
**[Test build #90890 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90890/testReport)**
 for PR 21331 at commit 
[`ccbdd11`](https://github.com/apache/spark/commit/ccbdd11a1f2ff6f08db47694f315109b61c8726e).


---




[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3412/


---




[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21331
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r189530510
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
 val accUpdate3 = new LongAccumulator
 accUpdate3.metadata = acc3.metadata
 accUpdate3.setValue(18)
-val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3)
-val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo)
+
+val accumUpdates1 = Seq(accUpdate1, accUpdate2)
+val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo)
 val exceptionFailure = new ExceptionFailure(
   new SparkException("fondue?"),
-  accumInfo).copy(accums = accumUpdates)
+  accumInfo1).copy(accums = accumUpdates1)
--- End diff --

Ah, this `copy` call cannot be avoided, because only the two-argument 
constructor `private[spark] def this(e: Throwable, accumUpdates: Seq[AccumulableInfo])` 
is defined.
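For readers less familiar with Scala case classes, a rough Python analogy may help. It is not Spark code, and every name in it is hypothetical: when the only convenient constructor does not accept a field, the usual pattern is to build the object first and then derive a modified copy, which is what `copy(accums = ...)` does in Scala and `dataclasses.replace` does in Python.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ExceptionFailureLike:
    """Hypothetical stand-in for a case class with an extra `accums` field."""
    description: str
    accum_updates: Tuple[str, ...]
    accums: Tuple[int, ...] = ()

    @classmethod
    def from_exception(cls, e: Exception, accum_updates: Iterable[str]) -> "ExceptionFailureLike":
        # Like the two-argument auxiliary constructor: it has no `accums` parameter.
        return cls(str(e), tuple(accum_updates))


base = ExceptionFailureLike.from_exception(RuntimeError("fondue?"), ["info1", "info2"])
# Similar in spirit to `.copy(accums = accumUpdates1)` in the Scala test.
failure = replace(base, accums=(1, 2))
```

`replace` returns a new frozen instance and leaves `base` untouched, just as a Scala `copy` does.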


---




[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...

2018-05-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21165#discussion_r189530671
  
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
 val accUpdate3 = new LongAccumulator
 accUpdate3.metadata = acc3.metadata
 accUpdate3.setValue(18)
-val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3)
-val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo)
+
+val accumUpdates1 = Seq(accUpdate1, accUpdate2)
+val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo)
 val exceptionFailure = new ExceptionFailure(
   new SparkException("fondue?"),
-  accumInfo).copy(accums = accumUpdates)
+  accumInfo1).copy(accums = accumUpdates1)
 submit(new MyRDD(sc, 1, Nil), Array(0))
 runEvent(makeCompletionEvent(taskSets.head.tasks.head, exceptionFailure, "result"))
+
 assert(AccumulatorContext.get(acc1.id).get.value === 15L)
 assert(AccumulatorContext.get(acc2.id).get.value === 13L)
+
+val accumUpdates2 = Seq(accUpdate3)
+val accumInfo2 = accumUpdates2.map(AccumulatorSuite.makeInfo)
+
+val taskKilled = new TaskKilled(
+  "test",
+  accumInfo2).copy(accums = accumUpdates2)
--- End diff --

We can avoid this `copy` call.


---




[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21165
  
**[Test build #90891 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90891/testReport)**
 for PR 21165 at commit 
[`74911b7`](https://github.com/apache/spark/commit/74911b7a8d7714618ab060b3227e33505b0c5d05).


---




[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...

2018-05-21 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21368#discussion_r189781578
  
--- Diff: python/pyspark/sql/session.py ---
@@ -547,6 +547,40 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
 df._schema = schema
 return df
 
+@staticmethod
+def _create_shell_session():
+"""
+Initialize a SparkSession for a pyspark shell session. This is called from shell.py
+to make error handling simpler without needing to declare local variables in that
+script, which would expose those to users.
+"""
+import py4j
+from pyspark.conf import SparkConf
+from pyspark.context import SparkContext
+try:
+# Try to access HiveConf, it will raise exception if Hive is not added
+conf = SparkConf()
+if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 'hive':
+SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf()
+return SparkSession.builder\
+.enableHiveSupport()\
+.getOrCreate()
+else:
+return SparkSession.builder.getOrCreate()
+except py4j.protocol.Py4JError:
+if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive':
+warnings.warn("Fall back to non-hive support because failing to access HiveConf, "
+  "please make sure you build spark with hive")
+
+try:
+return SparkSession.builder.getOrCreate()
--- End diff --

The call flow seems to have changed here. Isn't this line meant to be inside 
the `Py4JError` handler?
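The flow the reviewer seems to be asking for can be sketched in plain Python. Everything here is hypothetical: `Py4JErrorStub`, `hive_available` and `build_session` stand in for py4j's `Py4JError`, the `HiveConf` probe and `SparkSession.builder`, so the control flow can run without a JVM. The point is only that the non-Hive fallback session is created inside the `except` handler. As in the diff, the catalog setting defaults to `'hive'` for the probe and to `''` when deciding whether to warn.

```python
import warnings
from typing import Callable, Optional


class Py4JErrorStub(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JError."""


def create_shell_session(catalog_impl: Optional[str],
                         hive_available: bool,
                         build_session: Callable[[bool], str]) -> str:
    try:
        # Mirrors conf.get('spark.sql.catalogImplementation', 'hive')
        if (catalog_impl or "hive").lower() == "hive":
            if not hive_available:
                # Models the HiveConf probe failing when Hive classes are missing.
                raise Py4JErrorStub("HiveConf is not on the classpath")
            return build_session(True)  # Hive-enabled session
        return build_session(False)
    except Py4JErrorStub:
        # Mirrors conf.get('spark.sql.catalogImplementation', '')
        if (catalog_impl or "").lower() == "hive":
            warnings.warn("Fall back to non-hive support because failing to access "
                          "HiveConf, please make sure you build spark with hive")
        # The fallback lives inside the handler, so it only runs after a failure.
        return build_session(False)
```

Keeping the fallback inside the handler makes it obvious to a reader that it is reached only when the Hive probe fails.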


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21389
  
**[Test build #90934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90934/testReport)**
 for PR 21389 at commit 
[`0d88bcb`](https://github.com/apache/spark/commit/0d88bcb58f9298bed433b8febc4c9cfb5d92f6a9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90934/


---




[GitHub] spark issue #21205: [SPARK-24134][Docs]A missing full-stop in doc "Tuning Sp...

2018-05-21 Thread XD-DENG
Github user XD-DENG commented on the issue:

https://github.com/apache/spark/pull/21205
  
Hi, could a project admin review this PR? I understand it's a very minor issue 
(just a missing comma), but the effort needed to check it is also quite low.


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #90876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90876/testReport)**
 for PR 20345 at commit 
[`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21236
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90880/


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90879/testReport)**
 for PR 21266 at commit 
[`d8c308f`](https://github.com/apache/spark/commit/d8c308fa43a001328b8645e0d339875342c25c67).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21236
  
**[Test build #90880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)**
 for PR 21236 at commit 
[`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #90878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90878/testReport)**
 for PR 21288 at commit 
[`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21236
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90876/


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21379
  
**[Test build #90874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90874/testReport)**
 for PR 21379 at commit 
[`8d97b0d`](https://github.com/apache/spark/commit/8d97b0deb5ed96094f70f16376b677fe3ff1bdfc).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90874/


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21379
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90879/


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90878/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21236
  
Jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21236
  
I'll retrigger the build just to check again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21236
  
**[Test build #90880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)**
 for PR 21236 at commit 
[`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/21370
  
So one thing we might want to look at is `application/vnd.dataresource+json` 
for tables in notebooks (see https://github.com/nteract/improved-spark-viz).
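As a rough illustration of that idea, the sketch below attaches an `application/vnd.dataresource+json` payload through IPython's `_repr_mimebundle_` hook. The `TablePreview` class is hypothetical (not a PySpark API), and the payload shape, a Table Schema-style `schema.fields` list plus a `data` list of records, is an assumption modeled on what pandas emits for this MIME type. A `text/plain` fallback is included for frontends that do not render it.

```python
import json
from typing import Any, Dict, List, Sequence, Tuple


class TablePreview:
    """Hypothetical wrapper around rows that were already collected to the driver."""

    def __init__(self, columns: List[Tuple[str, str]], rows: List[Sequence[Any]]):
        self.columns = columns  # (name, Table Schema type) pairs, e.g. ("id", "integer")
        self.rows = rows

    def _repr_mimebundle_(self, include=None, exclude=None) -> Dict[str, Any]:
        names = [name for name, _ in self.columns]
        records = [dict(zip(names, row)) for row in self.rows]
        payload = {
            "schema": {"fields": [{"name": n, "type": t} for n, t in self.columns]},
            "data": records,
        }
        return {
            "application/vnd.dataresource+json": payload,
            "text/plain": json.dumps(records),  # fallback for other frontends
        }
```

In a notebook, evaluating such an object as the last expression of a cell lets the frontend choose whichever representation it supports.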


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21356
  
**[Test build #90873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90873/testReport)**
 for PR 21356 at commit 
[`09d55af`](https://github.com/apache/spark/commit/09d55afa4167460e732b2f4acb3cdde6029cf952).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21356
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90873/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21356
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189509532
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
+  false
+  
+Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically
--- End diff --

"If true" reads better than "If yes" here, since the default value is `false`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189509097
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.jupyter.eagerEval.enabled
+  false
+  
+Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically
--- End diff --

nit: Open -> Enable


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r189510270
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -237,9 +238,13 @@ class Dataset[T] private[sql](
   * @param truncate If set to more than 0, truncates strings to `truncate` characters and
*   all cells will be aligned right.
   * @param vertical If set to true, prints output rows vertically (one line per column value).
+   * @param html If set to true, return output as html table.
--- End diff --

Hmm, should we generate the HTML on the Python side instead?
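If the HTML were generated on the Python side, it could be as simple as rendering rows that have already been collected (for example via `take(n)`) into a table. The helper below is a hypothetical sketch, not PySpark's actual `_repr_html_`. It escapes cell values with `html.escape`, prints `None` as `null` the way `show()` does, and caps the number of rendered rows.

```python
import html
from typing import Iterable, List, Sequence


def rows_to_html(header: Sequence[str], rows: List[Sequence[object]], max_rows: int = 20) -> str:
    """Render already-collected rows as an HTML table (hypothetical helper)."""

    def cells(tag: str, values: Iterable[object]) -> str:
        # Escape every value so strings such as "<b>" cannot inject markup.
        return "".join(
            "<{0}>{1}</{0}>".format(tag, html.escape("null" if v is None else str(v)))
            for v in values
        )

    parts = ["<table border='1'>", "<tr>{}</tr>".format(cells("th", header))]
    for row in rows[:max_rows]:
        parts.append("<tr>{}</tr>".format(cells("td", row)))
    parts.append("</table>")
    return "\n".join(parts)
```

Doing this in Python would keep the `Dataset.showString` signature on the JVM side unchanged; the trade-off is that formatting logic (truncation, alignment) may then be duplicated between the two languages.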


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...

2018-05-21 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/21366#discussion_r189739169
  
--- Diff: pom.xml ---
@@ -150,6 +150,7 @@
 
 4.5.4
 4.4.8
+3.0.1
--- End diff --

Noted, will remove in the next push.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...

2018-05-21 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21356
  
Merging to master / 2.3.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #90925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90925/testReport)**
 for PR 21366 at commit 
[`aabc187`](https://github.com/apache/spark/commit/aabc1872280f2f1c993a619e489c70370144990f).
 * This patch **fails build dependency tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90925/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20887
  
**[Test build #90918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90918/testReport)**
 for PR 20887 at commit 
[`f19cda3`](https://github.com/apache/spark/commit/f19cda3921fee2f7d7885b041b15607436e45d0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...

2018-05-21 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21295#discussion_r189748452
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
 this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
 this.reader = new ParquetFileReader(
 configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
-for (BlockMetaData block : blocks) {
+// use the blocks from the reader in case some do not match filters and will not be read
--- End diff --

Actually, it is fine, and more correct, for this to be ported to older 
versions. I doubt it will be, though, because it is unnecessary.
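The reason for using the reader's blocks can be illustrated with a toy model. In this hypothetical sketch (plain Python, not the parquet-mr API), dictionary filtering drops row groups whose dictionary proves the filter value is absent; summing row counts over the original block list would then overstate how many rows the reader actually returns. It simplifies heavily: a real reader can typically rely on a dictionary only when the whole column chunk is dictionary-encoded.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RowGroup:
    row_count: int
    dictionary: FrozenSet[str]  # distinct values of the filtered column


def dictionary_filter(groups: List[RowGroup], value: str) -> List[RowGroup]:
    """Keep only row groups whose dictionary may contain `value` (an equality filter)."""
    return [g for g in groups if value in g.dictionary]


def total_row_count(groups: List[RowGroup]) -> int:
    return sum(g.row_count for g in groups)
```

With `groups = [RowGroup(100, frozenset({"a", "b"})), RowGroup(200, frozenset({"c"}))]` and the filter `value == "a"`, counting over all blocks gives 300 rows, while only the 100 rows of the surviving block are actually read.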


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...

2018-05-21 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21295#discussion_r189748419
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ---
@@ -879,6 +879,18 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
   }
 }
   }
+
+  test("SPARK-24230: filter row group using dictionary") {
+withSQLConf(("parquet.filter.dictionary.enabled", "true")) {
--- End diff --

Actually, it is fine, and more correct, for this to be ported to older 
versions. I doubt it will be, though, because it is unnecessary.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...

2018-05-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20887#discussion_r189740429
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2792,4 +2793,40 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
   }
 }
   }
+
+  test("`Cast` to CHAR/VARCHAR should truncate the values") {
+withTable("t") {
+  val m = intercept[ParseException] {
+sql("SELECT CAST('abc' AS CHAR(0))")
+  }.getMessage
+  assert(m.contains("Char length 0 is out of range [1, 255]"))
+
+  val m2 = intercept[ParseException] {
+sql("SELECT CAST('abc' AS VARCHAR(0))")
+  }.getMessage
+  assert(m2.contains("VarChar length 0 is out of range [1, 65535]"))
+
+  checkAnswer(
+sql("SELECT CAST('abc' AS CHAR(2)), CAST('abc' AS CHAR(4))"),
+Row("ab", "abc"))
+
+  sql("CREATE TABLE t(a STRING) USING PARQUET")
+  sql("INSERT INTO t VALUES ('abc')")
+  sql("INSERT INTO t VALUES (null)")
+
+  checkAnswer(
+sql("SELECT CAST(a AS CHAR(2)), CAST(a AS CHAR(3)), CAST(a AS CHAR(4)) FROM t"),
+Row("ab", "abc", "abc") :: Row(null, null, null) :: Nil)
+
+  sql(
+"""
+  |CREATE TABLE t_ctas
+  |USING ORC
+  |AS SELECT CAST(a AS CHAR(2)) c1, CAST(a AS CHAR(3)) c2, CAST(a AS CHAR(4)) c3 FROM t
--- End diff --

We already accept `CHAR` and `VARCHAR` syntax, which misleads end users.
This PR tries to mitigate that pain.
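For readers following the semantics, here is a minimal plain-Scala sketch of the behavior the test in the diff asserts: a declared length outside [1, 255] for `CHAR` or [1, 65535] for `VARCHAR` is rejected, longer values are truncated to the declared length, shorter values come back unpadded, and `null` passes through. `CharVarcharSketch`, `castToChar` and `castToVarchar` are hypothetical names, not Spark APIs. In Spark the length check happens in the parser (surfacing as a `ParseException`); this sketch simply uses `require`.

```scala
// Hypothetical, standalone model of the CHAR/VARCHAR cast semantics exercised by the test.
// Not Spark's implementation: Spark validates the declared length while parsing the SQL.
object CharVarcharSketch {
  val MaxCharLength = 255
  val MaxVarcharLength = 65535

  // Rejects declared lengths outside [1, max], using the wording the test checks for.
  private def checkLength(kind: String, length: Int, max: Int): Unit =
    require(length >= 1 && length <= max, s"$kind length $length is out of range [1, $max]")

  // Truncates to at most `length` UTF-16 code units; shorter values are not padded.
  private def truncate(value: String, length: Int): String =
    if (value == null) null else value.take(length)

  def castToChar(value: String, length: Int): String = {
    checkLength("Char", length, MaxCharLength)
    truncate(value, length)
  }

  def castToVarchar(value: String, length: Int): String = {
    checkLength("VarChar", length, MaxVarcharLength)
    truncate(value, length)
  }
}
```

For example, `castToChar("abc", 2)` yields `"ab"` and `castToChar("abc", 4)` yields `"abc"`, matching the `Row("ab", "abc")` expectation in the test.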


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90918/


---




[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20887
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21386: [SPARK-23928][SQL][WIP] Add shuffle collection fu...

2018-05-21 Thread pkuwm
Github user pkuwm commented on a diff in the pull request:

https://github.com/apache/spark/pull/21386#discussion_r189746613
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -555,6 +557,100 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi
   override def prettyName: String = "array_sort"
 }
 
+
+/**
+ * Returns a random permutation of the given array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns a random permutation of the given array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 20, 3, 5));
+       [3, 1, 5, 20]
+      > SELECT _FUNC_(array(1, 20, null, 3));
+       [20, null, 3, 1]
+  """, since = "2.4.0")
+case class Shuffle(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
--- End diff --

Correct: the input is always an array, never a string. Fixed.
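To illustrate what "a random permutation of the given array" means, here is a minimal Fisher-Yates shuffle in plain Scala. It is a sketch under stated assumptions, not the `Shuffle` expression from this PR: `ShuffleSketch` is a hypothetical name, it works on a Scala `Array` rather than Spark's internal array representation, and it takes an explicit `scala.util.Random` so results are reproducible for a given seed. `null` elements are moved like any other element, as in the `[20, null, 3, 1]` example in the diff.

```scala
import scala.util.Random

// Hypothetical sketch of a Fisher-Yates shuffle; not the Shuffle expression from the PR.
object ShuffleSketch {
  // Returns a new array containing a uniformly random permutation of `input`.
  // The input array is left untouched; null elements are permuted like any other value.
  def shuffle[T](input: Array[T], rng: Random): Array[T] = {
    val out = input.clone()
    var i = out.length - 1
    while (i > 0) {
      val j = rng.nextInt(i + 1) // uniform in [0, i]
      val tmp = out(i)
      out(i) = out(j)
      out(j) = tmp
      i -= 1
    }
    out
  }
}
```

Drawing `j` from `[0, i]` rather than from the whole array is what makes every ordering equally likely.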


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3433/


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3322/



---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20208
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189754010
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ---
@@ -111,12 +112,18 @@ case class BroadcastExchangeExec(
   SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, metrics.values.toSeq)
   broadcasted
 } catch {
+  // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw
+  // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult
+  // will catch this exception and re-throw the wrapped fatal throwable.
   case oe: OutOfMemoryError =>
-throw new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
+throw new SparkFatalException(
+  new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " +
--- End diff --

I agree that we're likely to have reclaimable space at this point, so the
chance of a second OOM / failure here seems small. I'm pretty sure that the
OutOfMemoryError caught here often originates from Spark itself, where we
explicitly throw our own `OutOfMemoryError` at a lower layer of the system; in
that case we still have heap available to allocate strings. We should
investigate and clean up that practice, but let's do that in a separate PR.


---




[GitHub] spark issue #18894: [SPARK-21673] Use the correct sandbox environment variab...

2018-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18894
  
**[Test build #90927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90927/testReport)**
 for PR 18894 at commit 
[`4ccb4be`](https://github.com/apache/spark/commit/4ccb4be26083bd60de0538550a094b231cd8590f).


---




[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...

2018-05-21 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21342#discussion_r189754203
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SparkFatalException.scala ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+/**
+ * SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we catch
+ * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw
+ * SparkFatalException, which wraps the fatal throwable inside.
+ */
+private[spark] final class SparkFatalException(val throwable: Throwable) extends Exception
--- End diff --

On the other hand, we're only using this in one place right now, so I think
things are correct as written. I was just worrying, in the abstract, about
potential future pitfalls if people start using this pattern in new code
without also noticing the `ThreadUtils.awaitResult` requirement.
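To make the `ThreadUtils.awaitResult` requirement concrete, here is a minimal plain-Scala sketch of the wrap-and-unwrap pattern described in the diff. `FatalWrapper`, `runGuarded` and `awaitUnwrapped` are hypothetical stand-ins for `SparkFatalException` and `ThreadUtils.awaitResult`, not Spark code. The Future body converts an `OutOfMemoryError` into an ordinary `Exception`, so the Future fails normally instead of running into scala/bug#9554; the awaiting side must then unwrap it, which is the step a new caller could forget.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for Spark's private SparkFatalException: an ordinary
// Exception that carries a fatal throwable so a Future can fail with it normally.
final class FatalWrapper(val throwable: Throwable) extends Exception(throwable)

object FatalWrapping {
  private implicit val ec: ExecutionContext = ExecutionContext.global

  // Runs `body` asynchronously; an OutOfMemoryError is wrapped instead of escaping the Future.
  def runGuarded[T](body: => T): Future[T] = Future {
    try body catch {
      case oe: OutOfMemoryError => throw new FatalWrapper(oe)
    }
  }

  // Waits for the result and re-throws the original fatal throwable if it was wrapped.
  // A caller using plain Await.result instead would see FatalWrapper, not the OOM.
  def awaitUnwrapped[T](future: Future[T], atMost: Duration = 10.seconds): T =
    try Await.result(future, atMost) catch {
      case w: FatalWrapper => throw w.throwable
    }
}
```

Catching only `OutOfMemoryError` mirrors the diff; other fatal errors would need the same treatment if the pattern were generalized.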


---




[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...

2018-05-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21368#discussion_r189754122
  
--- Diff: 
repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -37,7 +37,14 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
 @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) {
 org.apache.spark.repl.Main.sparkSession
   } else {
-org.apache.spark.repl.Main.createSparkSession()
+try {
+  org.apache.spark.repl.Main.createSparkSession()
+} catch {
+  case e: Exception =>
+    println("Failed to initialize Spark session:")
+    e.printStackTrace()
+    sys.exit(1)
--- End diff --

How about just squashing the commits, if it's not hard?


---




[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-21 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21343
  
How is this different from SPARK-23639 or, in other words, why doesn't the 
fix for that bug work for you?


---




[GitHub] spark pull request #21268: [SPARK-24209][SHS] Automatic retrieve proxyBase f...

2018-05-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21268


---



