[GitHub] spark issue #21380: [SPARK-24329][SQL] Remove comments filtering before pars...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21380 **[Test build #90889 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90889/testReport)** for PR 21380 at commit [`3652268`](https://github.com/apache/spark/commit/36522689f9579ec05e7d69d1d7bd1f507f6bdbc0). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189525864 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi val accUpdate3 = new LongAccumulator accUpdate3.metadata = acc3.metadata accUpdate3.setValue(18) -val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3) -val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo) + +val accumUpdates1 = Seq(accUpdate1, accUpdate2) +val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo) val exceptionFailure = new ExceptionFailure( new SparkException("fondue?"), - accumInfo).copy(accums = accumUpdates) + accumInfo1).copy(accums = accumUpdates1) --- End diff -- We can avoid the `copy` call.
[GitHub] spark issue #21361: [SPARK-24313][SQL] Fix collection operations' interprete...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21361 @cloud-fan sorry, but I am not sure I got it. Could you please provide some more details about the end-to-end test case for `GetMapValue` that you want me to add? Thanks.
[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21363#discussion_r189526889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -90,6 +90,7 @@ private[csv] object CSVInferSchema { // DecimalTypes have different precisions and scales, so we try to find the common type. findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(StringType) case DoubleType => tryParseDouble(field, options) +case DateType => tryParseDate(field, options) --- End diff -- This is also a behavior change. Should we document it?
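The behavior change flagged in this comment can be illustrated with a toy model of per-field type inference. This is a minimal Python sketch, not Spark's `CSVInferSchema`: the function `infer_field_type`, the `try_date` flag, the candidate order, and the `%Y-%m-%d` pattern are all assumptions made for illustration.

```python
from datetime import datetime


def _parses(parser, field):
    """Return True if parser(field) succeeds without raising ValueError."""
    try:
        parser(field)
        return True
    except ValueError:
        return False


def infer_field_type(field, try_date=False):
    """Toy inference: return the first candidate type that parses the field.

    Hypothetical helper; the candidate order and date pattern are assumptions.
    """
    if _parses(int, field):
        return "integer"
    if _parses(float, field):
        return "double"
    if try_date and _parses(lambda s: datetime.strptime(s, "%Y-%m-%d"), field):
        return "date"
    # Anything that no candidate can parse falls back to string.
    return "string"
```

In this sketch the same field changes type depending on whether a date candidate is tried, which is the kind of difference a migration note would need to cover.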
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21331 **[Test build #90890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90890/testReport)** for PR 21331 at commit [`ccbdd11`](https://github.com/apache/spark/commit/ccbdd11a1f2ff6f08db47694f315109b61c8726e).
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3412/ Test PASSed.
[GitHub] spark issue #21331: [SPARK-24276][SQL] Order of literals in IN should not af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21331 Merged build finished. Test PASSed.
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189530510 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi val accUpdate3 = new LongAccumulator accUpdate3.metadata = acc3.metadata accUpdate3.setValue(18) -val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3) -val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo) + +val accumUpdates1 = Seq(accUpdate1, accUpdate2) +val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo) val exceptionFailure = new ExceptionFailure( new SparkException("fondue?"), - accumInfo).copy(accums = accumUpdates) + accumInfo1).copy(accums = accumUpdates1) --- End diff -- Ah, this `copy` call cannot be avoided, because only the two-argument constructor ``` private[spark] def this(e: Throwable, accumUpdates: Seq[AccumulableInfo]) ``` is defined.
[GitHub] spark pull request #21165: [Spark-20087][CORE] Attach accumulators / metrics...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/21165#discussion_r189530671 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -1868,15 +1868,26 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi val accUpdate3 = new LongAccumulator accUpdate3.metadata = acc3.metadata accUpdate3.setValue(18) -val accumUpdates = Seq(accUpdate1, accUpdate2, accUpdate3) -val accumInfo = accumUpdates.map(AccumulatorSuite.makeInfo) + +val accumUpdates1 = Seq(accUpdate1, accUpdate2) +val accumInfo1 = accumUpdates1.map(AccumulatorSuite.makeInfo) val exceptionFailure = new ExceptionFailure( new SparkException("fondue?"), - accumInfo).copy(accums = accumUpdates) + accumInfo1).copy(accums = accumUpdates1) submit(new MyRDD(sc, 1, Nil), Array(0)) runEvent(makeCompletionEvent(taskSets.head.tasks.head, exceptionFailure, "result")) + assert(AccumulatorContext.get(acc1.id).get.value === 15L) assert(AccumulatorContext.get(acc2.id).get.value === 13L) + +val accumUpdates2 = Seq(accUpdate3) +val accumInfo2 = accumUpdates2.map(AccumulatorSuite.makeInfo) + +val taskKilled = new TaskKilled( + "test", + accumInfo2).copy(accums = accumUpdates2) --- End diff -- We can avoid this `copy` call.
[GitHub] spark issue #21165: [Spark-20087][CORE] Attach accumulators / metrics to 'Ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21165 **[Test build #90891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90891/testReport)** for PR 21165 at commit [`74911b7`](https://github.com/apache/spark/commit/74911b7a8d7714618ab060b3227e33505b0c5d05).
[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21368#discussion_r189781578 --- Diff: python/pyspark/sql/session.py --- @@ -547,6 +547,40 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone): df._schema = schema return df +@staticmethod +def _create_shell_session(): +""" +Initialize a SparkSession for a pyspark shell session. This is called from shell.py +to make error handling simpler without needing to declare local variables in that +script, which would expose those to users. +""" +import py4j +from pyspark.conf import SparkConf +from pyspark.context import SparkContext +try: +# Try to access HiveConf, it will raise exception if Hive is not added +conf = SparkConf() +if conf.get('spark.sql.catalogImplementation', 'hive').lower() == 'hive': +SparkContext._jvm.org.apache.hadoop.hive.conf.HiveConf() +return SparkSession.builder\ +.enableHiveSupport()\ +.getOrCreate() +else: +return SparkSession.builder.getOrCreate() +except py4j.protocol.Py4JError: +if conf.get('spark.sql.catalogImplementation', '').lower() == 'hive': +warnings.warn("Fall back to non-hive support because failing to access HiveConf, " + "please make sure you build spark with hive") + +try: +return SparkSession.builder.getOrCreate() --- End diff -- The call flow seems to have changed here. I think this line is meant to be inside the handling of `Py4JError`?
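The control flow the reviewer is asking for can be sketched without Spark at all. The following is a minimal Python sketch under stated assumptions: `HiveUnavailable` is a hypothetical stand-in for `py4j.protocol.Py4JError`, the `hive_available` flag stands in for the `HiveConf()` probe, and the returned strings stand in for real `SparkSession` objects.

```python
import warnings


class HiveUnavailable(Exception):
    """Hypothetical stand-in for py4j.protocol.Py4JError."""


def create_shell_session(catalog_impl, hive_available):
    """Return a label describing which kind of session would be built."""
    try:
        if catalog_impl.lower() == "hive":
            if not hive_available:
                # Stand-in for the HiveConf() probe failing in the JVM.
                raise HiveUnavailable("HiveConf not found")
            return "hive-session"
        return "plain-session"
    except HiveUnavailable:
        # The fallback sits inside the handler, so it runs only after a failed probe.
        if catalog_impl.lower() == "hive":
            warnings.warn("Falling back to non-Hive support")
        return "plain-session"
```

Keeping the fallback `return` inside the `except` block makes it unreachable on the success paths, which matches the intent the comment describes.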
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21389 **[Test build #90934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90934/testReport)** for PR 21389 at commit [`0d88bcb`](https://github.com/apache/spark/commit/0d88bcb58f9298bed433b8febc4c9cfb5d92f6a9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Merged build finished. Test FAILed.
[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21389 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90934/ Test FAILed.
[GitHub] spark issue #21205: [SPARK-24134][Docs]A missing full-stop in doc "Tuning Sp...
Github user XD-DENG commented on the issue: https://github.com/apache/spark/pull/21205 Hi, could a project admin check this PR? I understand it's quite a minor issue (just a missing full stop), but the effort needed to check it is also quite low.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #90876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90876/testReport)** for PR 20345 at commit [`94d9171`](https://github.com/apache/spark/commit/94d9171b8ec26c21724dd393cf4fc83ff52623e7). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90880/ Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90879/testReport)** for PR 21266 at commit [`d8c308f`](https://github.com/apache/spark/commit/d8c308fa43a001328b8645e0d339875342c25c67). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21236 **[Test build #90880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #90878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90878/testReport)** for PR 21288 at commit [`39e5a50`](https://github.com/apache/spark/commit/39e5a507fe22cade6bed0613eefbccab15cf45ff). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21236 Merged build finished. Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test FAILed.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90876/ Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #90874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90874/testReport)** for PR 21379 at commit [`8d97b0d`](https://github.com/apache/spark/commit/8d97b0deb5ed96094f70f16376b677fe3ff1bdfc). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90874/ Test FAILed.
[GitHub] spark issue #21379: [SPARK-24327][SQL] Add an option to quote a partition co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90879/ Test FAILed.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90878/ Test FAILed.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21236 Jenkins, retest this please.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21236 I'll retrigger the build just to check again.
[GitHub] spark issue #21236: [SPARK-23935][SQL] Adding map_entries function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21236 **[Test build #90880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90880/testReport)** for PR 21236 at commit [`baa61e5`](https://github.com/apache/spark/commit/baa61e5a29b1626f203fb75197355bc136949e75).
[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/21370 One thing we might want to take a look at is `application/vnd.dataresource+json` for tables in notebooks (see https://github.com/nteract/improved-spark-viz).
[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21356 **[Test build #90873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90873/testReport)** for PR 21356 at commit [`09d55af`](https://github.com/apache/spark/commit/09d55afa4167460e732b2f4acb3cdde6029cf952). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21356 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90873/ Test PASSed.
[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21356 Merged build finished. Test PASSed.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189509532 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled + false + +Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically --- End diff -- "true" is better than "yes" here, since the default value is `false`.
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189509097 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM to Python worker for every task. + + spark.jupyter.eagerEval.enabled + false + +Open eager evaluation on jupyter or not. If yes, dataframe will be ran automatically --- End diff -- nit: Open -> Enable
[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189510270 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -237,9 +238,13 @@ class Dataset[T] private[sql]( * @param truncate If set to more than 0, truncates strings to `truncate` characters and * all cells will be aligned right. * @param vertical If set to true, prints output rows vertically (one line per column value). + * @param html If set to true, return output as html table. --- End diff -- Hmm, should we do this HTML rendering on the Python side?
[GitHub] spark pull request #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to...
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r189739169 --- Diff: pom.xml --- @@ -150,6 +150,7 @@ 4.5.4 4.4.8 +3.0.1 --- End diff -- Noted, will remove in the next push.
[GitHub] spark issue #21356: [SPARK-24309][CORE] AsyncEventQueue should stop on inter...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21356 Merging to master / 2.3.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #90925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90925/testReport)** for PR 21366 at commit [`aabc187`](https://github.com/apache/spark/commit/aabc1872280f2f1c993a619e489c70370144990f). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90925/ Test FAILed.
[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20887 **[Test build #90918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90918/testReport)** for PR 20887 at commit [`f19cda3`](https://github.com/apache/spark/commit/f19cda3921fee2f7d7885b041b15607436e45d0e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21295#discussion_r189748452 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java --- @@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString); this.reader = new ParquetFileReader( configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns()); -for (BlockMetaData block : blocks) { +// use the blocks from the reader in case some do not match filters and will not be read --- End diff -- Actually, it is fine and more correct for this to be ported to older versions. I doubt it will be, though, because it is unnecessary.
[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21295#discussion_r189748419 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -879,6 +879,18 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext } } } + + test("SPARK-24230: filter row group using dictionary") { +withSQLConf(("parquet.filter.dictionary.enabled", "true")) { --- End diff -- Actually, it is fine and more correct for this to be ported to older versions. I doubt it will be, though, because it is unnecessary.
[GitHub] spark pull request #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20887#discussion_r189740429 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2792,4 +2793,40 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("`Cast` to CHAR/VARCHAR should truncate the values") { +withTable("t") { + val m = intercept[ParseException] { +sql("SELECT CAST('abc' AS CHAR(0))") + }.getMessage + assert(m.contains("Char length 0 is out of range [1, 255]")) + + val m2 = intercept[ParseException] { +sql("SELECT CAST('abc' AS VARCHAR(0))") + }.getMessage + assert(m2.contains("VarChar length 0 is out of range [1, 65535]")) + + checkAnswer( +sql("SELECT CAST('abc' AS CHAR(2)), CAST('abc' AS CHAR(4))"), +Row("ab", "abc")) + + sql("CREATE TABLE t(a STRING) USING PARQUET") + sql("INSERT INTO t VALUES ('abc')") + sql("INSERT INTO t VALUES (null)") + + checkAnswer( +sql("SELECT CAST(a AS CHAR(2)), CAST(a AS CHAR(3)), CAST(a AS CHAR(4)) FROM t"), +Row("ab", "abc", "abc") :: Row(null, null, null) :: Nil) + + sql( +""" + |CREATE TABLE t_ctas + |USING ORC + |AS SELECT CAST(a AS CHAR(2)) c1, CAST(a AS CHAR(3)) c2, CAST(a AS CHAR(4)) c3 FROM t --- End diff -- We already support the `CHAR` and `VARCHAR` syntax, which misleads end users. This PR tries to mitigate that problem.
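The semantics asserted by the test in this diff can be modeled in a few lines. This is a minimal Python sketch, not Spark's `Cast` implementation: `cast_to_char` is a hypothetical helper, it raises `ValueError` where the test expects a `ParseException`, and the bounds and message text are taken from the assertions quoted in the diff.

```python
def cast_to_char(value, length, max_length=255):
    """Truncate value to at most `length` characters, mirroring the quoted test.

    Hypothetical helper. Pass max_length=65535 to model the VARCHAR bound.
    """
    if not 1 <= length <= max_length:
        raise ValueError(
            f"Char length {length} is out of range [1, {max_length}]"
        )
    if value is None:
        # NULL input stays NULL, as in Row(null, null, null).
        return None
    # Shorter values are returned unchanged; the test expects no padding.
    return value[:length]
```

For example, `cast_to_char("abc", 2)` gives `"ab"` and `cast_to_char("abc", 4)` gives `"abc"`, matching `Row("ab", "abc")` in the test.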
[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20887 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90918/ Test PASSed.
[GitHub] spark issue #20887: [SPARK-23774][SQL] `Cast` to CHAR/VARCHAR should truncat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20887 Merged build finished. Test PASSed.
[GitHub] spark pull request #21386: [SPARK-23928][SQL][WIP] Add shuffle collection fu...
Github user pkuwm commented on a diff in the pull request: https://github.com/apache/spark/pull/21386#discussion_r189746613 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -555,6 +557,100 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi override def prettyName: String = "array_sort" } + +/** + * Returns a random permutation of the given array.. + */ +@ExpressionDescription( + usage = "_FUNC_(array) - Returns a random permutation of the given array.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 20, 3, 5)); + [3, 1, 5, 20] + > SELECT _FUNC_(array(1, 20, null, 3)); + [20, null, 3, 1] + """, since = "2.4.0") +case class Shuffle(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { --- End diff -- Correct. The input is an array, not a string. Fixed.
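The documented contract, "returns a random permutation of the given array", can be sketched as follows. This is a minimal Python sketch, not the `Shuffle` expression from the PR: `shuffle_array` and its `seed` parameter are hypothetical, and it relies on the standard library's `random.Random.shuffle` rather than whatever algorithm the PR uses.

```python
import random


def shuffle_array(values, seed=None):
    """Return a new list holding the same elements in random order.

    Hypothetical helper. A None input yields None, and None elements
    are kept like any other element.
    """
    if values is None:
        return None
    result = list(values)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(result)
    return result
```

The result always has the same elements with the same multiplicities as the input, which is why the two example outputs in the diff are only one possible ordering each.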
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3433/ Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3322/
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189754010 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -111,12 +112,18 @@ case class BroadcastExchangeExec( SQLMetrics.postDriverMetricUpdates(sparkContext, executionId, metrics.values.toSeq) broadcasted } catch { + // SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we throw + // SparkFatalException, which is a subclass of Exception. ThreadUtils.awaitResult + // will catch this exception and re-throw the wrapped fatal throwable. case oe: OutOfMemoryError => -throw new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " + +throw new SparkFatalException( + new OutOfMemoryError(s"Not enough memory to build and broadcast the table to " + --- End diff -- I agree that we're likely to have reclaimable space at this point, so the chance of a second OOM / failure here seems small. I'm pretty sure that the OutOfMemoryError being caught here often originates from Spark itself where we explicitly throw another `OutOfMemoryError` at a lower layer of the system, in which case we still actually have heap to allocate strings. We should investigate and clean up that practice, but let's do that in a separate PR.
[GitHub] spark issue #18894: [SPARK-21673] Use the correct sandbox environment variab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18894 **[Test build #90927 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90927/testReport)** for PR 18894 at commit [`4ccb4be`](https://github.com/apache/spark/commit/4ccb4be26083bd60de0538550a094b231cd8590f).
[GitHub] spark pull request #21342: [SPARK-24294] Throw SparkException when OOM in Br...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/21342#discussion_r189754203 --- Diff: core/src/main/scala/org/apache/spark/util/SparkFatalException.scala --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.util + +/** + * SPARK-24294: To bypass scala bug: https://github.com/scala/bug/issues/9554, we catch + * fatal throwable in {@link scala.concurrent.Future}'s body, and re-throw + * SparkFatalException, which wraps the fatal throwable inside. + */ +private[spark] final class SparkFatalException(val throwable: Throwable) extends Exception --- End diff -- OTOH I guess we're actually only using this in one place right now, so I think things are correct as written, but I was just kind of abstractly worrying about potential future pitfalls in case people start using this pattern in new code without also noticing the `ThreadUtils.awaitResult` requirement.
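The wrap-and-unwrap pattern discussed in this thread can be sketched without Spark. The names below (`WrappedFatal`, `awaitUnwrapped`) are hypothetical stand-ins for `SparkFatalException` and `ThreadUtils.awaitResult`, and the sketch assumes only what the quoted comments state: a fatal throwable is caught inside the `Future` body, re-thrown wrapped in an ordinary `Exception`, and unwrapped again on the waiting side.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for SparkFatalException: a plain Exception carrying
// the original fatal throwable.
final class WrappedFatal(val throwable: Throwable) extends Exception(throwable)

object FatalInFuture {
  // Hypothetical stand-in for the await helper: waits for the result and
  // re-throws the original throwable if the failure was a wrapper.
  def awaitUnwrapped[T](f: Future[T], atMost: Duration): T =
    try Await.result(f, atMost)
    catch { case w: WrappedFatal => throw w.throwable }

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val f = Future[Int] {
      try throw new OutOfMemoryError("simulated") // simulated, not a real OOM
      catch { case oom: OutOfMemoryError => throw new WrappedFatal(oom) }
    }
    try awaitUnwrapped(f, 10.seconds)
    catch { case e: OutOfMemoryError => println(s"caller saw: ${e.getMessage}") }
  }
}
```

The pitfall raised in the comment shows up here as well: a caller that uses `Await.result` directly instead of the unwrapping helper would see `WrappedFatal` rather than the original error.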
[GitHub] spark pull request #21368: [SPARK-16451][repl] Fail shell if SparkSession fa...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21368#discussion_r189754122 --- Diff: repl/scala-2.12/src/main/scala/org/apache/spark/repl/SparkILoop.scala --- @@ -37,7 +37,14 @@ class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter) @transient val spark = if (org.apache.spark.repl.Main.sparkSession != null) { org.apache.spark.repl.Main.sparkSession } else { -org.apache.spark.repl.Main.createSparkSession() +try { + org.apache.spark.repl.Main.createSparkSession() +} catch { + case e: Exception => +println("Failed to initialize Spark session:") +e.printStackTrace() +sys.exit(1) --- End diff -- how about just squashing the commits if it's not hard?
[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21343 How is this different from SPARK-23639 or, in other words, why doesn't the fix for that bug work for you?
[GitHub] spark pull request #21268: [SPARK-24209][SHS] Automatic retrieve proxyBase f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21268