[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r232928066 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext { df.where($"city".contains(new java.lang.Character('A'))), Seq(Row("Amsterdam"))) } + + test("SPARK-25942: typed aggregation on primitive type") { +val ds = Seq(1, 2, 3).toDS() + +val agg = ds.groupByKey(_ >= 2) + .agg(sum("value").as[Long], sum($"value" + 1).as[Long]) +assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7))) + } + + test("SPARK-25942: typed aggregation on product type") { +val ds = Seq((1, 2), (2, 3), (3, 4)).toDS() +val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 1).as[Long]) +assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5))) --- End diff -- Is there any suggestion? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r232926151 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext { df.where($"city".contains(new java.lang.Character('A'))), Seq(Row("Amsterdam"))) } + + test("SPARK-25942: typed aggregation on primitive type") { +val ds = Seq(1, 2, 3).toDS() + +val agg = ds.groupByKey(_ >= 2) + .agg(sum("value").as[Long], sum($"value" + 1).as[Long]) +assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7))) + } + + test("SPARK-25942: typed aggregation on product type") { +val ds = Seq((1, 2), (2, 3), (3, 4)).toDS() +val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 1).as[Long]) +assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5))) --- End diff -- Using `checkDataset` produces an error: ``` [error] found : org.apache.spark.sql.Dataset[((Int, Int), Long, Long)] [error] required: org.apache.spark.sql.Dataset[((Int, Int), AnyVal, AnyVal)] [error] Note: ((Int, Int), Long, Long) <: ((Int, Int), AnyVal, AnyVal), but class Dataset is invariant in type T. [error] You may wish to define T as +T instead. (SLS 4.5) [error] checkDataset(agg, ((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5)) ```
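[Editor's note] The compile error quoted above can be reproduced without Spark. A minimal plain-Scala sketch of the inference failure: a method with one type parameter covering both arguments (like `checkDataset[T](ds: Dataset[T], expected: T*)`) must pick a single `T`, and mixing `Long` values with `Int` literals makes the compiler infer their least upper bound, `AnyVal`; since `Dataset[T]` is invariant, the call then fails to typecheck. `sameType` below is a hypothetical stand-in, not a Spark API.

```scala
// sameType plays the role of checkDataset's single type parameter T,
// which must cover both arguments at once.
def sameType[T](a: T, b: T): T = a

val fromLongs: Long = sameType(1L, 2L)  // both Long, so T = Long
val widened: AnyVal = sameType(1L, 1)   // lub(Long, Int) = AnyVal, not Long

// With an invariant container like Dataset[T], T = ((Int, Int), AnyVal, AnyVal)
// does not match Dataset[((Int, Int), Long, Long)]. Writing the expected
// literals with Long suffixes, e.g. ((1, 2), 1L, 3L) instead of ((1, 2), 1, 3),
// keeps the inferred T at ((Int, Int), Long, Long).
```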
[GitHub] spark pull request #23009: SPARK-26011: pyspark app with "spark.jars.package...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23009#discussion_r232921575 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging { if (!StringUtils.isBlank(resolvedMavenCoordinates)) { args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates) -if (args.isPython) { +if (args.isPython || isInternal(args.primaryResource)) { --- End diff -- Yeah I get what the code does, was just wondering why it always sets a pyfiles now even when it's not a pyspark app. But the answer is that pyspark apps also need resolved Maven dependencies, I believe. @vanzin does this look right?
[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4970/ Test PASSed.
[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23020 **[Test build #98758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98758/testReport)** for PR 23020 at commit [`494eb2c`](https://github.com/apache/spark/commit/494eb2c1a10f39378095fd08ee11865d8608bc4d).
[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23020 Merged build finished. Test PASSed.
[GitHub] spark pull request #23020: [MINOR][BUILD] Remove *.crc from .gitignore
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/23020 [MINOR][BUILD] Remove *.crc from .gitignore ## What changes were proposed in this pull request? Remove *.crc from .gitignore as there are actual .crc files in the test source dirs and IJ warns about it ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark gitignore Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23020.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23020 commit 494eb2c1a10f39378095fd08ee11865d8608bc4d Author: Sean Owen Date: 2018-11-13T07:23:03Z Remove *.crc from .gitignore as there are actual .crc files in the test source dirs and IJ warns about it
[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Merged build finished. Test FAILed.
[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98749/ Test FAILed.
[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22989 **[Test build #98749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98749/testReport)** for PR 22989 at commit [`ff234d3`](https://github.com/apache/spark/commit/ff234d31a5a8e296b845910717dcd78be67b1740). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98757/testReport)** for PR 22974 at commit [`d965752`](https://github.com/apache/spark/commit/d9657524b956bd1d4ddf5fb4dc18d7c69b01a50b).
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test PASSed.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4969/ Test PASSed.
[GitHub] spark pull request #22989: [SPARK-25986][Build] Add rules to ban throw Error...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22989#discussion_r23291 --- Diff: dev/checkstyle-suppressions.xml --- @@ -46,4 +46,12 @@ files="sql/catalyst/src/main/java/org/apache/spark/sql/streaming/GroupStateTimeout.java"/> +
[GitHub] spark pull request #22989: [SPARK-25986][Build] Add rules to ban throw Error...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22989#discussion_r232917995 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/VectorIndexerSuite.scala --- @@ -283,7 +283,9 @@ class VectorIndexerSuite extends MLTest with DefaultReadWriteTest with Logging { points.zip(rows.map(_(0))).foreach { case (orig: SparseVector, indexed: SparseVector) => assert(orig.indices.length == indexed.indices.length) - case _ => throw new UnknownError("Unit test has a bug in it.") // should never happen + case _ => +// should never happen +throw new IllegalAccessException("Unit test has a bug in it.") --- End diff -- Just `fail()` here? or at least not `IllegalAccessException`
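[Editor's note] A hedged sketch of what `fail()` buys over throwing an unrelated exception type: ScalaTest's `fail(message)` throws its own `TestFailedException`, so the runner reports a proper test failure instead of an unexpected error. The stand-in below uses `AssertionError` so it runs without ScalaTest on the classpath; the real `fail()` is defined in `org.scalatest.Assertions`.

```scala
// Minimal stand-in for ScalaTest's fail(): an exception type the test
// framework understands, rather than a repurposed JDK exception like
// IllegalAccessException. Return type Nothing lets it end any match arm.
def fail(message: String): Nothing = throw new AssertionError(message)

val reported =
  try { fail("Unit test has a bug in it."); "unreachable" }
  catch { case e: AssertionError => e.getMessage }
```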
[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22967#discussion_r232916947 --- Diff: pom.xml --- @@ -2718,7 +2710,6 @@ *:*_2.11 -*:*_2.10 --- End diff -- @dbtsai sorry for the late idea here -- this isn't essential for the change, and you don't have to make it here -- but I thought of a better way. Really we want the default `maven-enforcer-plugin` config above to exclude _2.10 and _2.11 dependencies, and remove everything from the `scala-2.12` profile (or else, one still has to enable the profile to get all Scala 2.12 config). Then, move this `maven-enforcer-plugin` config to the `scala-2.11` profile. That copy should only exclude _2.10 dependencies. However to make sure Maven doesn't also add that to the _2.11 exclusion rule in the parent, the `combine.children="append"` attribute here can become `combine.self="override"`. That should get the desired effects.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98748/ Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Merged build finished. Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23014 **[Test build #98748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98748/testReport)** for PR 23014 at commit [`d5084dc`](https://github.com/apache/spark/commit/d5084dc6a40b03567343701ecefd808ab9d8e453). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22967 **[Test build #98756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98756/testReport)** for PR 22967 at commit [`52dc4a1`](https://github.com/apache/spark/commit/52dc4a1d625154fb3baab201f9ff3f979b497602).
[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22967 Merged build finished. Test PASSed.
[GitHub] spark issue #22967: [SPARK-25956] Make Scala 2.12 as default Scala version i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22967 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4968/ Test PASSed.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98747/ Test FAILed.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test FAILed.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98747/testReport)** for PR 22974 at commit [`b0eb584`](https://github.com/apache/spark/commit/b0eb584aa6c3efe51f680578b86c523b14d41eff). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user LiehuoChen commented on the issue: https://github.com/apache/spark/pull/21588 Hi HyukjinKwon, thanks for all the work to make the Jenkins tests pass. I patched this PR onto Spark 2.4, and everything works fine except for the following four unit tests in org.apache.spark.deploy.yarn.YarnClusterSuite: 1) run Spark in yarn-cluster mode 2) run Spark in yarn-cluster mode with different configurations, ensuring redaction 3) run Spark in yarn-client mode 4) run Spark in yarn-client mode with different configurations, ensuring redaction. Tests 1) and 2) failed every time with very little useful error output, like: `FAILED did not equal FINISHED Exception in thread "main" org.apache.spark.SparkException: Application application_1542090777201_0002 finished with failed status [info] at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149) .. [info] at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) (BaseYarnClusterSuite.scala:201)` Tests 3) and 4) succeed most of the time, but sometimes fail on ` Exception in thread "main" java.io.IOException: Server returned HTTP response code: 500 for URL: http://user-c02wq03ghtdg.corp.uber.com:61313/node/containerlogs/container_1541809642345_0002_01_02/lhc/stdout?start=-4096` and `Fail to invoke HBaseConfiguration [info] java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration` Have you seen similar errors before? Did you make any other fixes besides this PR to get all the tests to pass? Thanks for your time.
[GitHub] spark pull request #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22977
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22954 **[Test build #98755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98755/testReport)** for PR 22954 at commit [`954bc0e`](https://github.com/apache/spark/commit/954bc0eec206902cb8176338e1f72886f5b3c626).
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22977 since this PR only touches mima, and the jenkins already passed the mima check, I'm going to merge it to master, thanks!
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22954 Merged build finished. Test PASSed.
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4967/ Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/22977 LGTM. Thanks!
[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 BTW can you include a simple benchmark to show this problem? e.g. just run a query in spark-shell, and post the result before and after this PR.
[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 I'd like to merge this simple PR first, to address the performance problem (unnecessary subquery execution). Let's create a new ticket for subquery filter pushdown to data sources, and invite more people to join the discussion.
[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22518#discussion_r232906707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala --- @@ -47,7 +47,8 @@ private[sql] object PruneFileSourcePartitions extends Rule[LogicalPlan] { case a: AttributeReference => a.withName(logicalRelation.output.find(_.semanticEquals(a)).get.name) } - } + }.filterNot(SubqueryExpression.hasSubquery) --- End diff -- ditto
[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22518#discussion_r232906743 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext { assert(getNumSortsInQuery(query5) == 1) } } + + test("SPARK-25482: Reuse same Subquery in order to execute it only once") { --- End diff -- let's update the test
[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22518#discussion_r232906652 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala --- @@ -155,15 +155,14 @@ object FileSourceStrategy extends Strategy with Logging { case a: AttributeReference => a.withName(l.output.find(_.semanticEquals(a)).get.name) } - } + }.filterNot(SubqueryExpression.hasSubquery) --- End diff -- shall we do the filter before the `map`?
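[Editor's note] The suggestion above is a general point: filtering before mapping means the map step (the attribute renaming in the quoted diff) never runs on elements that will be discarded. A generic plain-Scala illustration with a call counter; the numbers are illustrative, not from Spark:

```scala
// Same result either way, but filter-then-map invokes the mapping
// function on fewer elements than map-then-filter.
val filters = Seq(1, 2, 3, 4, 5)
var mapCalls = 0

mapCalls = 0
val mapThenFilter = filters.map { f => mapCalls += 1; f * 10 }.filterNot(_ > 30)
val callsMapFirst = mapCalls      // mapped all 5 elements

mapCalls = 0
val filterThenMap = filters.filterNot(_ > 3).map { f => mapCalls += 1; f * 10 }
val callsFilterFirst = mapCalls   // mapped only the 3 survivors
```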
[GitHub] spark pull request #22961: [SPARK-25947][SQL] Reduce memory usage in Shuffle...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22961#discussion_r232906123 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -214,13 +214,22 @@ object ShuffleExchangeExec { override def getPartition(key: Any): Int = key.asInstanceOf[Int] } case RangePartitioning(sortingExpressions, numPartitions) => -// Internally, RangePartitioner runs a job on the RDD that samples keys to compute -// partition bounds. To get accurate samples, we need to copy the mutable keys. +// Extract only fields used for sorting to avoid collecting large fields that does not +// affect sorting result when deciding partition bounds in RangePartitioner val rddForSampling = rdd.mapPartitionsInternal { iter => + val projection = +UnsafeProjection.create(sortingExpressions.map(_.child), outputAttributes) val mutablePair = new MutablePair[InternalRow, Null]() - iter.map(row => mutablePair.update(row.copy(), null)) + // Internally, RangePartitioner runs a job on the RDD that samples keys to compute + // partition bounds. To get accurate samples, we need to copy the mutable keys. + iter.map(row => mutablePair.update(projection(row).copy(), null)) } -implicit val ordering = new LazilyGeneratedOrdering(sortingExpressions, outputAttributes) +// Construct ordering on extracted sort key. +val orderingAttributes = sortingExpressions.zipWithIndex.map { case (ord, i) => + ord.copy(child = BoundReference(i, ord.dataType, ord.nullable)) +} +implicit val ordering: Ordering[InternalRow] = + new LazilyGeneratedOrdering(orderingAttributes) --- End diff -- yea, let's follow the previous style: https://github.com/apache/spark/pull/22961/files#diff-3ceee31a3da1b7c7132f666126fbL223
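[Editor's note] The diff above reduces shuffle sampling memory by projecting each row down to its sort key before copying it, so large non-key fields are never retained. A plain-Scala analogue of the idea, with illustrative names (`Record`, `payload`) that are not Spark's:

```scala
// Each record carries a large payload, but range-partition bounds depend
// only on the sort key, so sample just the key.
case class Record(key: Int, payload: String)

val records = Seq(Record(3, "x" * 100), Record(1, "y" * 100), Record(2, "z" * 100))

val sampledKeys = records.map(_.key)  // analogous to UnsafeProjection over the sort columns
val bounds = sampledKeys.sorted       // bounds computed from the sorted key sample
```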
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r232905784 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext { df.where($"city".contains(new java.lang.Character('A'))), Seq(Row("Amsterdam"))) } + + test("SPARK-25942: typed aggregation on primitive type") { +val ds = Seq(1, 2, 3).toDS() + +val agg = ds.groupByKey(_ >= 2) + .agg(sum("value").as[Long], sum($"value" + 1).as[Long]) +assert(agg.collect() === Seq((false, 1, 2), (true, 5, 7))) + } + + test("SPARK-25942: typed aggregation on product type") { +val ds = Seq((1, 2), (2, 3), (3, 4)).toDS() +val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 1).as[Long]) +assert(agg.collect().sorted === Seq(((1, 2), 1, 3), ((2, 3), 2, 4), ((3, 4), 3, 5))) --- End diff -- can we use `checkAnswer`/`checkDataset`?
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23002
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23002 thanks, merging to master!
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4966/ Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23014 **[Test build #98754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98754/testReport)** for PR 23014 at commit [`f807b8a`](https://github.com/apache/spark/commit/f807b8acc7169c5d2d560d3cb9d80b123981d49a).
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Merged build finished. Test PASSed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22721 cc @jiangxb1987 Could you take a look at this?
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4965/ Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22977 **[Test build #98753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98753/testReport)** for PR 22977 at commit [`802b521`](https://github.com/apache/spark/commit/802b521989c4e4365dcc44df0bae4bcc505a7428).
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Merged build finished. Test PASSed.
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Merged build finished. Test PASSed.
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21465 **[Test build #98751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98751/ Test PASSed.
[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r232895848 --- Diff: R/pkg/R/SQLContext.R --- @@ -172,36 +257,72 @@ getDefaultSqlSource <- function() { createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0, numPartitions = NULL) { sparkSession <- getSparkSession() - + arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.enabled")[[1]] == "true" + shouldUseArrow <- FALSE + firstRow <- NULL if (is.data.frame(data)) { - # Convert data into a list of rows. Each row is a list. - - # get the names of columns, they will be put into RDD - if (is.null(schema)) { -schema <- names(data) - } +# get the names of columns, they will be put into RDD +if (is.null(schema)) { + schema <- names(data) +} - # get rid of factor type - cleanCols <- function(x) { -if (is.factor(x)) { - as.character(x) -} else { - x -} +# get rid of factor type +cleanCols <- function(x) { + if (is.factor(x)) { +as.character(x) + } else { +x } +} +data[] <- lapply(data, cleanCols) + +args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE) +if (arrowEnabled) { + shouldUseArrow <- tryCatch({ --- End diff -- Yup, correct. Let me address other comments as well.
[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22944 **[Test build #98752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98752/testReport)** for PR 22944 at commit [`71dff40`](https://github.com/apache/spark/commit/71dff408a3da828e628aa29f81e30cdcb822fd37).
[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22944 Merged build finished. Test PASSed.
[GitHub] spark issue #22944: [SPARK-25942][SQL] Aggregate expressions shouldn't be re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4964/ Test PASSed.
[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21465 **[Test build #98751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98751/testReport)** for PR 21465 at commit [`1169db8`](https://github.com/apache/spark/commit/1169db8083c06248a43709f9e0b633029a37775d).
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Merged build finished. Test FAILed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22977 **[Test build #98750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98750/testReport)** for PR 22977 at commit [`8d9f5c7`](https://github.com/apache/spark/commit/8d9f5c768415607b9aa779a6dee291724047d6b4). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98750/ Test FAILed.
[GitHub] spark pull request #23006: [SPARK-26007][SQL] DataFrameReader.csv() respects...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23006
[GitHub] spark pull request #22944: [SPARK-25942][SQL] Aggregate expressions shouldn'...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22944#discussion_r232894304 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1556,6 +1556,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext { df.where($"city".contains(new java.lang.Character('A'))), Seq(Row("Amsterdam"))) } + + test("SPARK-25942: typed aggregation on primitive type") { +val ds = Seq(1, 2, 3).toDS() + +val agg = ds.groupByKey(_ >= 2) + .agg(sum("value").as[Long], sum($"value" + 1).as[Long]) --- End diff -- `TypedAggregateExpression.withInputInfo` needs the `UnresolvedDeserializer`, which depends on the input encoder and input attributes. In the analyzer, we can't have such inputs.
[GitHub] spark issue #23006: [SPARK-26007][SQL] DataFrameReader.csv() respects to spa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23006 Merged to master.
[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232893546 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -101,10 +101,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) { String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" + (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") + "). As a workaround, you can reduce the vectorized reader batch size, or disable the " + -"vectorized reader. For parquet file format, refer to " + +"vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " + +"from bucket table. For Parquet file format, refer to " + SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().key() + " (default " + SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().defaultValueString() + -") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for orc file format, " + +") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for Orc file format, " + --- End diff -- `Orc` is `ORC` BTW :-).
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22977 **[Test build #98750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98750/testReport)** for PR 22977 at commit [`8d9f5c7`](https://github.com/apache/spark/commit/8d9f5c768415607b9aa779a6dee291724047d6b4).
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4963/ Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Merged build finished. Test PASSed.
[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22989 @srowen Thanks a lot for your guidance. I addressed all your suggestions in ff234d3 and updated the record table in https://github.com/apache/spark/pull/22989#issuecomment-437939830.
[GitHub] spark issue #22989: [SPARK-25986][Build] Add rules to ban throw Errors in ap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22989 **[Test build #98749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98749/testReport)** for PR 22989 at commit [`ff234d3`](https://github.com/apache/spark/commit/ff234d31a5a8e296b845910717dcd78be67b1740).
[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Merged build finished. Test PASSed.
[GitHub] spark issue #22989: [SPARK-25986][Build] Banning throw new OutOfMemoryErrors
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4962/ Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23014 Yes. In a `BucketedRead`, the number of `filePartitions` is the same as the bucket number: https://github.com/apache/spark/blob/ab5752cb952e6536a68a988289e57100fdbba142/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L382-L414
[GitHub] spark pull request #22961: [SPARK-25947][SQL] Reduce memory usage in Shuffle...
Github user mu5358271 commented on a diff in the pull request: https://github.com/apache/spark/pull/22961#discussion_r232888324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -214,13 +214,22 @@ object ShuffleExchangeExec { override def getPartition(key: Any): Int = key.asInstanceOf[Int] } case RangePartitioning(sortingExpressions, numPartitions) => -// Internally, RangePartitioner runs a job on the RDD that samples keys to compute -// partition bounds. To get accurate samples, we need to copy the mutable keys. +// Extract only fields used for sorting to avoid collecting large fields that does not +// affect sorting result when deciding partition bounds in RangePartitioner val rddForSampling = rdd.mapPartitionsInternal { iter => + val projection = +UnsafeProjection.create(sortingExpressions.map(_.child), outputAttributes) val mutablePair = new MutablePair[InternalRow, Null]() - iter.map(row => mutablePair.update(row.copy(), null)) + // Internally, RangePartitioner runs a job on the RDD that samples keys to compute + // partition bounds. To get accurate samples, we need to copy the mutable keys. + iter.map(row => mutablePair.update(projection(row).copy(), null)) } -implicit val ordering = new LazilyGeneratedOrdering(sortingExpressions, outputAttributes) +// Construct ordering on extracted sort key. +val orderingAttributes = sortingExpressions.zipWithIndex.map { case (ord, i) => + ord.copy(child = BoundReference(i, ord.dataType, ord.nullable)) +} +implicit val ordering: Ordering[InternalRow] = + new LazilyGeneratedOrdering(orderingAttributes) --- End diff -- this line would actually exceed the 100 character per line limit by 2 characters if I keep the ": Ordering[InternalRow]" type info for the implicit value. I can remove the type info though. Is that what you are suggesting?
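The diff above changes range-partitioning sampling to project rows down to their sort keys before sampling, so the sampler never retains large non-key fields. A minimal plain-Scala sketch of that idea (hypothetical names, no Spark dependencies — `UnsafeProjection` is replaced by a simple key extraction):

```scala
object RangeBoundsSketch {
  // A row with a small sort key and a large payload we do not want to sample.
  final case class Row(key: Int, payload: Array[Byte])

  // Project rows to bare sort keys first, then compute approximate partition
  // bounds from the sorted keys (a simplified stand-in for RangePartitioner).
  def rangeBounds(rows: Seq[Row], numPartitions: Int): Seq[Int] = {
    val keys = rows.map(_.key).sorted          // the "projection" step
    val step = keys.length.toDouble / numPartitions
    (1 until numPartitions).map(i => keys(math.min(keys.length - 1, (i * step).toInt)))
  }

  def main(args: Array[String]): Unit = {
    // 8 rows, each dragging a 1 MiB payload that the bounds computation never touches.
    val rows = (1 to 8).map(i => Row(i, new Array[Byte](1 << 20)))
    println(rangeBounds(rows, 4)) // Vector(3, 5, 7)
  }
}
```

The design point mirrors the PR: ordering is then built on the extracted keys (the `BoundReference` step in the diff), not on the original full rows.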
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Merged build finished. Test FAILed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22977 **[Test build #98746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98746/testReport)** for PR 22977 at commit [`8b9efe1`](https://github.com/apache/spark/commit/8b9efe14fa4c53fa2f13f598879d7e45c47d3a6c). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98746/ Test FAILed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4961/ Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23014 Merged build finished. Test PASSed.
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23014 **[Test build #98748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98748/testReport)** for PR 23014 at commit [`d5084dc`](https://github.com/apache/spark/commit/d5084dc6a40b03567343701ecefd808ab9d8e453).
[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23014 > The reason is that each bucket file is too big Can you elaborate, please? Is it because we don't chunk each file into multiple splits when we read a bucketed table?
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22974 **[Test build #98747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98747/testReport)** for PR 22974 at commit [`b0eb584`](https://github.com/apache/spark/commit/b0eb584aa6c3efe51f680578b86c523b14d41eff).
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4960/ Test PASSed.
[GitHub] spark issue #22974: [SPARK-22450][WIP][Core][MLLib][FollowUp] Safely registe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22974 Merged build finished. Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4959/ Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22977 Merged build finished. Test PASSed.
[GitHub] spark issue #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in MimaBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22977 **[Test build #98746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98746/testReport)** for PR 22977 at commit [`8b9efe1`](https://github.com/apache/spark/commit/8b9efe14fa4c53fa2f13f598879d7e45c47d3a6c).
[GitHub] spark pull request #22977: [SPARK-26030][BUILD] Bump previousSparkVersion in...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22977#discussion_r232886260 --- Diff: project/MimaExcludes.scala --- @@ -164,7 +212,50 @@ object MimaExcludes { ProblemFilters.exclude[InheritedNewAbstractMethodProblem]("org.apache.spark.ml.param.shared.HasValidationIndicatorCol.validationIndicatorCol"), // [SPARK-23042] Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier - ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ml.classification.LabelConverter") + ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.ml.classification.LabelConverter"), + +// [SPARK-21842][MESOS] Support Kerberos ticket renewal and creation in Mesos --- End diff -- these changes are cherry-picked from https://github.com/apache/spark/pull/23015
[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232885260 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -101,7 +101,8 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) { String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" + (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") + "). As a workaround, you can reduce the vectorized reader batch size, or disable the " + -"vectorized reader. For parquet file format, refer to " + +"vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " + +"from bucket table. For parquet file format, refer to " + --- End diff -- parquet -> Parquet
[GitHub] spark pull request #23007: [SPARK-26010][R] fix vignette eval with Java 11
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23007
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Merged build finished. Test PASSed.
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98745/ Test PASSed.
[GitHub] spark issue #23007: [SPARK-26010][R] fix vignette eval with Java 11
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/23007 merged to master/2.4
[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21688 **[Test build #98745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98745/testReport)** for PR 21688 at commit [`271de2d`](https://github.com/apache/spark/commit/271de2d186bcd776105a419a0c4f2b8e26498e35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22866: WIP [SPARK-12172][SPARKR] Remove internal-only RDD metho...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22866 thx, but DO NOT MERGE - there's some nasty bug I'm still investigating..
[GitHub] spark issue #23018: [SPARK-26023][SQL] Dumping truncated plans and generated...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23018 Looks fine to me. Adding @cloud-fan and @hvanhovell.
[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23018#discussion_r232883084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +471,21 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { def treeString: String = treeString(verbose = true) def treeString(verbose: Boolean, addSuffix: Boolean = false): String = { -generateTreeString(0, Nil, new StringBuilder, verbose = verbose, addSuffix = addSuffix).toString +val writer = new StringBuilderWriter() +try { + treeString(writer, verbose, addSuffix, None) + writer.toString +} finally { + writer.close() +} + } + + def treeString( + writer: Writer, + verbose: Boolean, + addSuffix: Boolean, + maxFields: Option[Int]): Unit = { +generateTreeString(0, Nil, writer, verbose, "", addSuffix) --- End diff -- If #22879 is merged first, we should add that function here. If this one is merged first, that PR better have the function.
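The diff above switches `treeString` from building a `StringBuilder` to streaming into a `java.io.Writer`, so very large plans don't have to be materialized as one giant string before truncation. A minimal Scala sketch of that pattern on a hypothetical toy tree (not the actual `TreeNode` API):

```scala
import java.io.{StringWriter, Writer}

// Hypothetical tiny tree; TreeNode in the diff streams into a Writer the same way.
final case class Node(name: String, children: Seq[Node]) {
  // Convenience wrapper, mirroring treeString: back the Writer with a string buffer.
  def treeString: String = {
    val writer = new StringWriter()
    try {
      writeTree(writer, depth = 0)
      writer.toString
    } finally {
      writer.close()
    }
  }

  // Appending directly to the Writer avoids building huge intermediate strings;
  // a caller can pass a file- or socket-backed Writer instead of StringWriter.
  def writeTree(writer: Writer, depth: Int): Unit = {
    writer.write("  " * depth + name + "\n")
    children.foreach(_.writeTree(writer, depth + 1))
  }
}

object TreeStringDemo {
  def main(args: Array[String]): Unit = {
    val plan = Node("Project", Seq(Node("Filter", Seq(Node("Scan", Nil)))))
    print(plan.treeString)
  }
}
```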
[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r232881732 --- Diff: R/pkg/R/sparkR.R --- @@ -283,6 +283,10 @@ sparkR.session <- function( enableHiveSupport = TRUE, ...) { + if (utils::compareVersion(paste0(R.version$major, ".", R.version$minor), "3.4.0") == -1) { +warning("R prior to version 3.4 is deprecated as of Spark 3.0.") + } --- End diff -- ditto `Support for R prior to version 3.4 is deprecated since Spark 3.0.0`
[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r232882419 --- Diff: docs/index.md --- @@ -31,7 +31,8 @@ Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy locally on one machine --- all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation. -Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark {{site.SPARK_VERSION}} +Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. R prior to version 3.4 is deprecated as of Spark 3.0. --- End diff -- `R prior to version 3.4 support is deprecated as of Spark 3.0.0.`
[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r232882178 --- Diff: docs/index.md --- @@ -31,7 +31,8 @@ Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy locally on one machine --- all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation. -Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark {{site.SPARK_VERSION}} +Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. R prior to version 3.4 is deprecated as of Spark 3.0. --- End diff -- With all the other changes, we haven't listed all deprecations here, or have we?
[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r232881594 --- Diff: R/WINDOWS.md --- @@ -3,7 +3,7 @@ To build SparkR on Windows, the following steps are required 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to -include Rtools and R in `PATH`. +include Rtools and R in `PATH`. Note that R prior to version 3.4 is deprecated as of Spark 3.0. --- End diff -- I really would prefer "unsupported" but if we go with this it should say `Note that support for R prior to version 3.4 is deprecated as of Spark 3.0.0.`
[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 In this way, we could postpone the R upgrade in Jenkins until after the Spark 3.0.0 release, and could still test the deprecated R version 3.1.
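The sparkR.R diff in this thread gates the deprecation warning on `utils::compareVersion(..., "3.4.0") == -1`, i.e. a numeric comparison of dotted version components. A small Scala sketch of that comparison logic (hypothetical helper, not SparkR code; missing components are treated as 0, as `compareVersion` does):

```scala
object VersionGate {
  // Compare dotted version strings numerically: negative if a < b,
  // zero if equal, positive if a > b.
  def compareVersion(a: String, b: String): Int = {
    val xs = a.split("\\.").map(_.toInt)
    val ys = b.split("\\.").map(_.toInt)
    xs.zipAll(ys, 0, 0)                                // pad shorter side with 0s
      .collectFirst { case (x, y) if x != y => x.compare(y) }
      .getOrElse(0)
  }

  // Mirrors the check in the diff: warn when R is older than 3.4.0.
  def isDeprecated(rVersion: String): Boolean = compareVersion(rVersion, "3.4.0") < 0

  def main(args: Array[String]): Unit = {
    println(isDeprecated("3.1.2")) // true
    println(isDeprecated("3.4.0")) // false
  }
}
```

Note the comparison must be component-wise and numeric: a plain string comparison would wrongly order "3.10.0" before "3.4.0".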