[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 > I'm not sure, let's just try it :) All right, I finally tracked down why it's hanging on Jenkins. The global semaphores used by `interruptible iterator of shuffle reader` are interfered with by other tasks. Please check the latest change, @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
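To illustrate the kind of interference described in the comment above, here is a minimal Python sketch (the PR itself is Scala, and this is not its code). The names `run_task` and `permits_seen_by_test` are hypothetical. It assumes a task signals progress by releasing permits on a semaphore: when the semaphore is global, permits from an unrelated task leak into the count the test observes; with a per-test semaphore they do not. Isolating the semaphore is one possible remedy, not necessarily the change made in the PR.

```python
import threading


def run_task(sem: threading.Semaphore, elements: int) -> None:
    """Hypothetical task: releases one permit per element it processes."""
    for _ in range(elements):
        sem.release()


def permits_seen_by_test(shared: bool) -> int:
    """Return how many permits the test observes after its own task emits 3."""
    global_sem = threading.Semaphore(0)
    # With shared=True the test reuses the global semaphore; otherwise it owns one.
    test_sem = global_sem if shared else threading.Semaphore(0)

    # An unrelated task always signals through the global semaphore.
    other = threading.Thread(target=run_task, args=(global_sem, 2))
    mine = threading.Thread(target=run_task, args=(test_sem, 3))
    for t in (other, mine):
        t.start()
    for t in (other, mine):
        t.join()

    # Drain without blocking to count the permits visible to the test.
    count = 0
    while test_sem.acquire(blocking=False):
        count += 1
    return count


print(permits_seen_by_test(shared=True))   # 5: two permits came from the other task
print(permits_seen_by_test(shared=False))  # 3: only the test's own permits
```

A test that waits for an exact number of permits can hang or pass spuriously in the shared case, which is consistent with the Jenkins hang reported here.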
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20449 **[Test build #87822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87822/testReport)** for PR 20449 at commit [`a3d8ad5`](https://github.com/apache/spark/commit/a3d8ad56f0709c343e508c8b636083243f9ffdd2).
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20681 **[Test build #87821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87821/testReport)** for PR 20681 at commit [`6a962e9`](https://github.com/apache/spark/commit/6a962e900a2b9de2e434f2a6ec1eb256ea87a774).
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Merged build finished. Test PASSed.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1180/ Test PASSed.
[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87820/ Test FAILed.
[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20472 Merged build finished. Test FAILed.
[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20472 **[Test build #87820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87820/testReport)** for PR 20472 at commit [`51900da`](https://github.com/apache/spark/commit/51900da3266a9025ace567e3cbd5bf2b26051651). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20472 **[Test build #87820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87820/testReport)** for PR 20472 at commit [`51900da`](https://github.com/apache/spark/commit/51900da3266a9025ace567e3cbd5bf2b26051651).
[GitHub] spark issue #5785: [SPARK-7250][MLLIB] Added computeInverse to RowMatrix.sca...
Github user kingsaction commented on the issue: https://github.com/apache/spark/pull/5785 @srowen how can I add a function that inverts a matrix?
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Hi, @gatorsmile , @HyukjinKwon , @cloud-fan . Since 2.3 is officially announced, I'm pinging you guys again. :) Please let me know if there is something for me to do here.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87819/testReport)** for PR 20382 at commit [`1073be4`](https://github.com/apache/spark/commit/1073be420b2cc5fd099929fc0215bf8c1be4b6e0).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1179/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Merged build finished. Test FAILed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87814/ Test FAILed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #87814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)** for PR 19788 at commit [`fc0fe77`](https://github.com/apache/spark/commit/fc0fe77cc4f1222ffd8a4a492e623ce43fd1f28c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Merged build finished. Test PASSed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19788 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87812/ Test PASSed.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #87812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)** for PR 19788 at commit [`c133776`](https://github.com/apache/spark/commit/c13377601da21368955335eb9f10e72c4ac18738). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20576
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20576 thanks, merging to master!
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/20685 cc @cloud-fan @jiangxb1987 Could you please help take a look.
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20670 LGTM
[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20382#discussion_r171469866 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.streaming.sources + +import java.io.IOException +import java.net.InetSocketAddress +import java.nio.ByteBuffer +import java.nio.channels.ServerSocketChannel +import java.sql.Timestamp +import java.util.Optional +import java.util.concurrent.LinkedBlockingQueue + +import scala.collection.JavaConverters._ + +import org.scalatest.BeforeAndAfterEach + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.execution.datasources.DataSource +import org.apache.spark.sql.execution.streaming._ +import org.apache.spark.sql.sources.v2.{DataSourceOptions, MicroBatchReadSupport} +import org.apache.spark.sql.sources.v2.reader.streaming.{MicroBatchReader, Offset} +import org.apache.spark.sql.streaming.StreamTest +import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType} + +class TextSocketStreamSuite extends StreamTest with SharedSQLContext with BeforeAndAfterEach { + + override def afterEach() { +sqlContext.streams.active.foreach(_.stop()) +if (serverThread != null) { + serverThread.interrupt() + serverThread.join() + serverThread = null +} +if (batchReader != null) { + batchReader.stop() + batchReader = null +} + } + + private var serverThread: ServerThread = null + private var batchReader: MicroBatchReader = null + + case class AddSocketData(data: String*) extends AddData { +override def addData(query: Option[StreamExecution]): (BaseStreamingSource, Offset) = { + require( +query.nonEmpty, +"Cannot add data when there is no query for finding the active socket source") + + val sources = query.get.logicalPlan.collect { +case StreamingExecutionRelation(source: TextSocketMicroBatchReader, _) => source + } + if (sources.isEmpty) { +throw new Exception( + "Could not find socket source in the StreamExecution logical plan to add data to") + } else if (sources.size > 1) { +throw new Exception( + "Could 
not select the socket source in the StreamExecution logical plan as there" + +"are multiple socket sources:\n\t" + sources.mkString("\n\t")) + } + val socketSource = sources.head + + assert(serverThread != null && serverThread.port != 0) + val currOffset = socketSource.currentOffset + data.foreach(serverThread.enqueue) + + val newOffset = LongOffset(currOffset.offset + data.size) + (socketSource, newOffset) +} + +override def toString: String = s"AddSocketData(data = $data)" + } + + test("backward compatibility with old path") { + DataSource.lookupDataSource("org.apache.spark.sql.execution.streaming.TextSocketSourceProvider", + spark.sqlContext.conf).newInstance() match { + case ds: MicroBatchReadSupport => +assert(ds.isInstanceOf[TextSocketSourceProvider]) + case _ => +throw new IllegalStateException("Could not find socket source") +} + } + + test("basic usage") { +serverThread = new ServerThread() +serverThread.start() + +withSQLConf("spark.sql.streaming.unsupportedOperationCheck" -> "false") { + val ref = spark + import ref.implicits._ + + val socket = spark +.readStream +.format("socket") +.options(Map("host" -> "localhost", "port" ->
[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20295#discussion_r171469275 --- Diff: python/pyspark/sql/functions.py --- @@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None, functionType=None): | 2| 1.1094003924504583| +---+---+ + Alternatively, the user can define a function that takes two arguments. + In this case, the grouping key will be passed as the first argument and the data will + be passed as the second argument. The grouping key will be passed as a tuple of numpy + data types, e.g., `numpy.int32` and `numpy.float64`. The data will still be passed in + as a `pandas.DataFrame` containing all columns from the original Spark DataFrame. + This is useful when the user doesn't want to hardcode grouping key in the function. + + >>> from pyspark.sql.functions import pandas_udf, PandasUDFType + >>> df = spark.createDataFrame( + ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], + ... ("id", "v")) # doctest: +SKIP + >>> @pandas_udf("id long, v double", PandasUDFType.GROUP_MAP) # doctest: +SKIP + ... def mean_udf(key, pdf): + ... # key is a tuple of one numpy.int64, which is the value + ... # of 'id' for the current group + ... return pd.DataFrame([key + (pdf.v.mean(),)]) + >>> df.groupby('id').apply(mean_udf).show() #doctest: +SKIP --- End diff -- I think it's because we haven't yet found a minimal fix to enable the doctests only when PyArrow and Pandas are installed. Maybe we can try to conditionally drop the doctests right before we run `doctest.testmod` below, but that would be a new approach for Spark as far as I know. I will probably look into it separately soon.
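The idea floated in the comment above (dropping doctests conditionally right before `doctest.testmod` runs) could look roughly like the following sketch. It is only an illustration under stated assumptions, not Spark's implementation: the module `demo_functions`, the function names, the dependency name `some_optional_dependency_xyz`, and the helper `run_doctests` are all hypothetical. It relies on the fact that `doctest` skips objects whose `__doc__` is empty.

```python
import doctest
import importlib.util
import types

# Hypothetical module source: one doctest always runs, the other needs an
# optional dependency that is assumed not to be installed.
SOURCE = '''
def always_runs():
    """
    >>> 1 + 1
    2
    """

def needs_optional_dep():
    """
    >>> import some_optional_dependency_xyz
    """
'''


def build_demo_module():
    """Create an in-memory module so testmod treats the functions as its own."""
    module = types.ModuleType("demo_functions")
    exec(SOURCE, module.__dict__)
    return module


def run_doctests(module, optional_deps, dependent_funcs):
    """Drop doctests of dependent_funcs when any optional dependency is missing."""
    missing = [d for d in optional_deps if importlib.util.find_spec(d) is None]
    if missing:
        for name in dependent_funcs:
            # With no docstring, the doctest finder has nothing to collect.
            getattr(module, name).__doc__ = None
    return doctest.testmod(module, verbose=False)


result = run_doctests(
    build_demo_module(),
    optional_deps=["some_optional_dependency_xyz"],
    dependent_funcs=["needs_optional_dep"],
)
print(result.failed, result.attempted)
```

The trade-off is that the affected docstrings must be listed by name; the `# doctest: +SKIP` markers in the diff avoid that bookkeeping at the cost of never running those examples.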
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Merged build finished. Test PASSed.
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87811/ Test PASSed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1178/ Test PASSed.
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20670 **[Test build #87817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Merged build finished. Test PASSed.
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20689 **[Test build #87811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87811/testReport)** for PR 20689 at commit [`4bf17a7`](https://github.com/apache/spark/commit/4bf17a738de1b705ee673b8e889394ccbe972f47). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20647 **[Test build #87818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87818/testReport)** for PR 20647 at commit [`6fe7681`](https://github.com/apache/spark/commit/6fe76817032cb9b6bac47f14b79d7a4041e286dd).
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/20670 @gatorsmile thanks.
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1177/ Test PASSed.
[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r171468670 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StringFormat.scala --- @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.commons.lang3.StringUtils + +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.DataSourceRegister +import org.apache.spark.sql.sources.v2.DataSourceV2 +import org.apache.spark.sql.sources.v2.reader._ +import org.apache.spark.util.Utils + +/** + * A trait that can be used by data source v2 related query plans(both logical and physical), to + * provide a string format of the data source information for explain. + */ +trait DataSourceV2StringFormat { + + /** + * The instance of this data source implementation. Note that we only consider its class in + * equals/hashCode, not the instance itself. + */ + def source: DataSourceV2 + + /** + * The output of the data source reader, w.r.t. column pruning. 
+ */ + + def output: Seq[Attribute] + + /** + * The options for this data source reader. + */ + def options: Map[String, String] + + /** + * The created data source reader. Here we use it to get the filters that has been pushed down + * so far, itself doesn't take part in the equals/hashCode. + */ + def reader: DataSourceReader + + private lazy val filters = reader match { +case s: SupportsPushDownCatalystFilters => s.pushedCatalystFilters().toSet +case s: SupportsPushDownFilters => s.pushedFilters().toSet +case _ => Set.empty + } + + private def sourceName: String = source match { +case registered: DataSourceRegister => registered.shortName() +case _ => source.getClass.getSimpleName.stripSuffix("$") + } + + def metadataString: String = { +val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)] + +if (filters.nonEmpty) { + entries += "Pushed Filters" -> filters.mkString("[", ", ", "]") +} + +// TODO: we should only display some standard options like path, table, etc. --- End diff -- For the follow-up, there are two proposals: 1. Define some standard options and display only those, if they are specified. 2. Create a new mix-in interface that lets data source implementations decide which options to show during explain.
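The two follow-up proposals in the comment above can be contrasted in a short sketch. It is written in Python for brevity even though the code under review is Scala, and it is not Spark's API: `STANDARD_OPTIONS`, `SupportsExplainOptions`, `explain_options`, `metadata_string`, the two example source classes, and the `Options` output format are all hypothetical. Only the `Pushed Filters` entry and the `path`/`table` examples come from the diff.

```python
from typing import Dict, List

# Proposal 1 (assumed list): only these options are shown, if specified.
STANDARD_OPTIONS = ("path", "table")


class SupportsExplainOptions:
    """Proposal 2 (hypothetical mix-in): the source picks what to show."""

    def explain_options(self, options: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


def metadata_string(source: object, options: Dict[str, str],
                    filters: List[str]) -> str:
    """Build an explain string from pushed filters and the visible options."""
    entries = []
    if filters:
        entries.append(("Pushed Filters", "[" + ", ".join(filters) + "]"))

    if isinstance(source, SupportsExplainOptions):
        shown = source.explain_options(options)
    else:
        shown = {k: v for k, v in options.items() if k in STANDARD_OPTIONS}

    if shown:
        rendered = ", ".join(f"{k}={v}" for k, v in sorted(shown.items()))
        entries.append(("Options", "[" + rendered + "]"))
    return ", ".join(f"{key}: {value}" for key, value in entries)


class PlainSource:
    """A source without the mix-in falls back to the standard options."""


class SelectiveSource(SupportsExplainOptions):
    """A source that hides its password but shows everything else."""

    def explain_options(self, options: Dict[str, str]) -> Dict[str, str]:
        return {k: v for k, v in options.items() if k != "password"}


print(metadata_string(PlainSource(), {"path": "/data/t", "secret": "x"}, ["a > 1"]))
print(metadata_string(SelectiveSource(), {"url": "jdbc:h2:mem", "password": "pw"}, []))
```

Proposal 1 keeps explain output uniform across sources, while proposal 2 lets a source surface options that a fixed list could not anticipate; in this sketch the mix-in simply takes precedence when present.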
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20670 Merged build finished. Test PASSed.
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20576 Merged build finished. Test PASSed.
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87808/ Test PASSed.
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Merged build finished. Test PASSed.
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20576 **[Test build #87808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87809/ Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87816/ Test PASSed.
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20690 **[Test build #87809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87809/testReport)** for PR 20690 at commit [`f7efb22`](https://github.com/apache/spark/commit/f7efb22ddea3dc8eeccc833086d5a82cbce7e530). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class RequestExecutors(appId: String, requestedTotal: Int,`
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed.
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #87816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87816/testReport)** for PR 20695 at commit [`b3e9ddd`](https://github.com/apache/spark/commit/b3e9dddc5eff082a892d109ad959369d5f5510a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87807/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685 **[Test build #87807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87807/testReport)** for PR 20685 at commit [`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20692 We should clearly define when and where we need to display the attribute data type. I think leaf nodes and some nodes that produce new data, like `Generate`, are good places. We may also need to introduce a debug mode for explain. Personally, most of the time I only focus on the shape of the query plan, not on each attribute. The data type info is only needed when doing some deep debugging. also cc @rdblue
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20681 looks like test failures are related? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87810/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20295#discussion_r171466325 --- Diff: python/pyspark/sql/types.py --- @@ -1725,6 +1737,29 @@ def _get_local_timezone(): return os.environ.get('TZ', 'dateutil/:') +def _check_series_localize_timestamps(s, timezone): +""" +Convert timezone aware timestamps to timezone-naive in the specified timezone or local timezone. + +If the input series is not a timestamp series, then the same series is returned. If the input +series is a timestamp series, then a converted series is returned. + +:param s: pandas.Series +:param timezone: the timezone to convert. if None then use local timezone +:return pandas.Series that have been converted to tz-naive +""" +from pyspark.sql.utils import require_minimum_pandas_version +require_minimum_pandas_version() + +from pandas.api.types import is_datetime64tz_dtype --- End diff -- Do we have tests for these in tests.py? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
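A test for this helper would presumably cover the two behaviours the docstring names: non-timestamp input is returned unchanged, and timezone-aware input comes back timezone-naive in the target zone. The sketch below illustrates those two behaviours with the standard library only; `localize_series` is a hypothetical list-based analogue, not the pandas-based function in the diff, and the fixed-offset zone stands in for a named timezone.

```python
from datetime import datetime, timedelta, timezone


def _is_aware(value) -> bool:
    return isinstance(value, datetime) and value.tzinfo is not None


def localize_series(values, tz=None):
    """Return tz-naive wall-clock times in `tz`.

    If no element is a tz-aware datetime, the same list object is returned.
    `tz=None` lets `astimezone` fall back to the system local timezone.
    """
    if not any(_is_aware(v) for v in values):
        return values
    return [
        v.astimezone(tz).replace(tzinfo=None) if _is_aware(v) else v
        for v in values
    ]


utc_noon = datetime(2018, 2, 28, 12, 0, tzinfo=timezone.utc)
minus_eight = timezone(timedelta(hours=-8))   # fixed offset, for illustration only

converted = localize_series([utc_noon], minus_eight)
plain = [1, 2, 3]
untouched = localize_series(plain, minus_eight)
```

Here `converted` holds a naive 04:00 timestamp and `untouched` is the very same list object as `plain`.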
[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20295#discussion_r171465908 --- Diff: python/pyspark/sql/functions.py --- @@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None, functionType=None): | 2| 1.1094003924504583| +---+---+ + Alternatively, the user can define a function that takes two arguments. + In this case, the grouping key will be passed as the first argument and the data will + be passed as the second argument. The grouping key will be passed as a tuple of numpy + data types, e.g., `numpy.int32` and `numpy.float64`. The data will still be passed in + as a `pandas.DataFrame` containing all columns from the original Spark DataFrame. + This is useful when the user doesn't want to hardcode grouping key in the function. + + >>> from pyspark.sql.functions import pandas_udf, PandasUDFType + >>> df = spark.createDataFrame( + ... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], + ... ("id", "v")) # doctest: +SKIP + >>> @pandas_udf("id long, v double", PandasUDFType.GROUP_MAP) # doctest: +SKIP + ... def mean_udf(key, pdf): + ... # key is a tuple of one numpy.int64, which is the value + ... # of 'id' for the current group + ... return pd.DataFrame([key + (pdf.v.mean(),)]) + >>> df.groupby('id').apply(mean_udf).show() #doctest: +SKIP --- End diff -- why skip all of these btw? why not run them so they can be tested? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
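For readers without a Spark session at hand, the two-argument form described in the docstring can be illustrated in plain Python: the grouping key arrives as a tuple and the group's rows arrive as the second argument, so the function does not hardcode the key column. `group_apply` is a hypothetical helper written for this sketch; it is not the PySpark API and it uses dict rows instead of a `pandas.DataFrame`.

```python
from collections import defaultdict
from statistics import mean


def group_apply(rows, key_cols, func):
    """Group dict rows by `key_cols` and call func(key_tuple, group_rows)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        groups[key].append(row)
    out = []
    for key in sorted(groups):          # sorted only to make the output stable
        out.extend(func(key, groups[key]))
    return out


def mean_udf(key, rows):
    # `key` is a one-element tuple holding the value of 'id' for this group.
    return [key + (mean(r["v"] for r in rows),)]


rows = [
    {"id": 1, "v": 1.0}, {"id": 1, "v": 2.0},
    {"id": 2, "v": 3.0}, {"id": 2, "v": 5.0}, {"id": 2, "v": 10.0},
]
result = group_apply(rows, ["id"], mean_udf)
```

Because the sketch needs neither pandas nor a cluster, an example of this kind can run as an ordinary doctest, which is the direction the question about `+SKIP` points in.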
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #87810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87810/testReport)** for PR 20043 at commit [`37ae9b0`](https://github.com/apache/spark/commit/37ae9b0e217de323dbc73c9e1247ebe9bf2c278c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20695 **[Test build #87816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87816/testReport)** for PR 20695 at commit [`b3e9ddd`](https://github.com/apache/spark/commit/b3e9dddc5eff082a892d109ad959369d5f5510a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1176/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20682 **[Test build #87815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87815/testReport)** for PR 20682 at commit [`3d28bbf`](https://github.com/apache/spark/commit/3d28bbf9f218ce50ab08fb3e9e62ed9e2fc2307b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20682 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20692#discussion_r171464091 --- Diff: sql/core/src/test/resources/sql-tests/results/inline-table.sql.out --- @@ -166,18 +166,18 @@ struct == Analyzed Logical Plan == col1: string, col2: int, col1: string, col2: int -Project [col1#x, col2#x, col1#x, col2#x] +Project [col1#x: string, col2#x: int, col1#x: string, col2#x: int] +- Join Cross - :- LocalRelation [col1#x, col2#x] - +- LocalRelation [col1#x, col2#x] + :- LocalRelation [col1#x: string, col2#x: int] + +- LocalRelation [col1#x: string, col2#x: int] --- End diff -- Repeatedly showing the data types for the same attributes may not be useful. It seems too verbose. For a big query plan, the output will be filled with redundant info. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20697: Initial checkin of k8s integration tests.
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/20697 @ssuchter the jira ticket for this is https://issues.apache.org/jira/browse/SPARK-23010. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171463022 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints, such as equality + * constraints and `isNotNull` constraints, etc. + */ + lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints --- End diff -- We still need `if (conf.constraintPropagationEnabled)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
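The point about keeping the `constraintPropagationEnabled` check can be sketched as a lazily computed property that returns an empty set when the flag is off, so no inference work is done for users who disabled it. This is a toy Python model under stated assumptions: constraints are plain tuples, the two helper rules are invented for illustration, and returning an empty set when disabled is assumed rather than taken from the diff.

```python
from functools import cached_property


def _infer_additional(constraints):
    """Toy inference: from ("eq", a, b) and ("gt", a, c) derive ("gt", b, c)."""
    eqs = [(c[1], c[2]) for c in constraints if c[0] == "eq"]
    derived = set()
    for op, left, right in constraints:
        if op != "gt":
            continue
        for a, b in eqs:
            if left == a:
                derived.add(("gt", b, right))
            elif left == b:
                derived.add(("gt", a, right))
    return frozenset(derived)


def _is_not_null(constraints):
    """Toy rule: the left-hand attribute of a comparison is non-null."""
    return frozenset(
        ("isnotnull", c[1], None) for c in constraints if c[0] in ("eq", "gt")
    )


class PlanConstraints:
    def __init__(self, valid_constraints, constraint_propagation_enabled=True):
        self.valid_constraints = frozenset(valid_constraints)
        self.enabled = constraint_propagation_enabled

    @cached_property
    def all_constraints(self):
        # Computed once, on first access; skipped entirely when disabled.
        if not self.enabled:
            return frozenset()
        base = self.valid_constraints
        return base | _infer_additional(base) | _is_not_null(base)


given = {("eq", "a", "b"), ("gt", "a", 1)}
enabled = PlanConstraints(given).all_constraints
disabled = PlanConstraints(given, constraint_propagation_enabled=False).all_constraints
```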
[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r171462811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._ trait QueryPlanConstraints { self: LogicalPlan => + /** + * An [[ExpressionSet]] that contains an additional set of constraints, such as equality + * constraints and `isNotNull` constraints, etc. + */ + lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints +.union(inferAdditionalConstraints(validConstraints)) +.union(constructIsNotNullConstraints(validConstraints))) --- End diff -- Nit: indents --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE CO...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20696#discussion_r171462718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -314,8 +314,8 @@ case class AlterTableChangeColumnCommand( val resolver = sparkSession.sessionState.conf.resolver DDLUtils.verifyAlterTableType(catalog, table, isView = false) -// Find the origin column from schema by column name. -val originColumn = findColumnByName(table.schema, columnName, resolver) +// Find the origin column from dataSchema by column name. +val originColumn = findColumnByName(table.dataSchema, columnName, resolver) --- End diff -- Do we have a negative test case to cover that? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
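A negative test for this change would presumably assert that a partition column can no longer be found once the lookup runs against the data schema only. The sketch below shows the shape of such a check in plain Python; `find_column_by_name`, the column lists and the case-insensitive resolver are hypothetical stand-ins, and it assumes the data schema excludes partition columns.

```python
def case_insensitive(a: str, b: str) -> bool:
    return a.lower() == b.lower()


def find_column_by_name(columns, name, resolver):
    """Return the first column matching `name` under `resolver`, else raise."""
    for col in columns:
        if resolver(col, name):
            return col
    raise LookupError(f"Can't find column `{name}` in {columns}")


# Hypothetical table: two data columns plus one partition column.
data_schema = ["id", "value"]
partition_schema = ["dt"]
full_schema = data_schema + partition_schema

# Positive case: a data column is found regardless of case.
found = find_column_by_name(data_schema, "VALUE", case_insensitive)

# Negative case: the partition column is visible in the full schema
# but must be rejected when only the data schema is searched.
try:
    find_column_by_name(data_schema, "dt", case_insensitive)
    rejected = False
except LookupError:
    rejected = True
```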
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20670 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87804/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20670 **[Test build #87804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)** for PR 20670 at commit [`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #87814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)** for PR 19788 at commit [`fc0fe77`](https://github.com/apache/spark/commit/fc0fe77cc4f1222ffd8a4a492e623ce43fd1f28c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20686 Thanks! I will help review it later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/20576#discussion_r171452995 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala --- @@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers { mapSideCombine = false ))) +// We support serialized shuffle if we do not need to do map-side aggregation +assert(canUseSerializedShuffle(shuffleDep( + partitioner = new HashPartitioner(2), + serializer = kryo, + keyOrdering = None, + aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])), + mapSideCombine = false --- End diff -- You can see this code: `def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
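The `groupByKey` case referenced above is one where an aggregator is defined but map-side combine is turned off. A rough sketch of the check the new test exercises follows, in plain Python with hypothetical names; the partition limit and the exact set of conditions are assumptions for illustration and are not copied from `SortShuffleManager`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed upper bound on partitions for the serialized path (illustrative).
MAX_PARTITIONS_FOR_SERIALIZED_MODE = 1 << 24


@dataclass
class ShuffleDep:
    num_partitions: int
    serializer_supports_relocation: bool
    aggregator: Optional[object] = None
    map_side_combine: bool = False


def can_use_serialized_shuffle(dep: ShuffleDep) -> bool:
    if not dep.serializer_supports_relocation:
        return False
    # Key on map_side_combine, not on whether an aggregator is present:
    # a groupByKey-style dependency has an aggregator but no map-side combine.
    if dep.map_side_combine:
        return False
    return dep.num_partitions <= MAX_PARTITIONS_FOR_SERIALIZED_MODE


group_by_key_like = ShuffleDep(2, True, aggregator=object(), map_side_combine=False)
reduce_by_key_like = ShuffleDep(2, True, aggregator=object(), map_side_combine=True)
```

Under this sketch the first dependency qualifies for the serialized path and the second does not, which matches the assertion added in the diff.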
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20449 **[Test build #87813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87813/testReport)** for PR 20449 at commit [`756e0b7`](https://github.com/apache/spark/commit/756e0b7336fff3c72eca70c2ab489600211b9253). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...
Github user drboyer commented on the issue: https://github.com/apache/spark/pull/20658 Thanks @holdenk, I can open a separate JIRA about the missing field in `Function` if it seems worth fixing. It wasn't critical for me; I just happened to notice it while doing some testing, so I included it in my initial commit. I hadn't added more complex docstrings because these seemed like pretty simple methods with straightforward parameters. Happy to add :param: and :return: annotations if desired, but should we add these to some of the other catalog methods as well if we're adding them to these new ones (thinking especially of the `list*()` methods)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19788 **[Test build #87812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)** for PR 19788 at commit [`c133776`](https://github.com/apache/spark/commit/c13377601da21368955335eb9f10e72c4ac18738). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1175/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20689 **[Test build #87811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87811/testReport)** for PR 20689 at commit [`4bf17a7`](https://github.com/apache/spark/commit/4bf17a738de1b705ee673b8e889394ccbe972f47). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/20576#discussion_r171451103 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala --- @@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers { mapSideCombine = false ))) +// We support serialized shuffle if we do not need to do map-side aggregation +assert(canUseSerializedShuffle(shuffleDep( + partitioner = new HashPartitioner(2), + serializer = kryo, + keyOrdering = None, + aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])), + mapSideCombine = false --- End diff -- Under what scenario will `mapSideCombine` be `false` while an `aggregator` is set? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1174/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #87810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87810/testReport)** for PR 20043 at commit [`37ae9b0`](https://github.com/apache/spark/commit/37ae9b0e217de323dbc73c9e1247ebe9bf2c278c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20043#discussion_r171447686 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExprValue.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.codegen + +import scala.language.implicitConversions + +import org.apache.spark.sql.types.DataType + +// An abstraction that represents the evaluation result of [[ExprCode]]. +abstract class ExprValue { + + val javaType: ExprType + + // Whether we can directly access the evaluation value anywhere. + // For example, a variable created outside a method can not be accessed inside the method. + // For such cases, we may need to pass the evaluation as parameter. + val canDirectAccess: Boolean +} + +object ExprValue { + implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString +} + +// A literal evaluation of [[ExprCode]]. 
+class LiteralValue(val value: String, val javaType: ExprType) extends ExprValue { + override def toString: String = value + override val canDirectAccess: Boolean = true +} + +object LiteralValue { + def apply(value: String, javaType: ExprType): LiteralValue = new LiteralValue(value, javaType) + def unapply(literal: LiteralValue): Option[(String, ExprType)] = +Some((literal.value, literal.javaType)) +} + +// A variable evaluation of [[ExprCode]]. +case class VariableValue( +val variableName: String, +val javaType: ExprType, +val canDirectAccess: Boolean = false) extends ExprValue { --- End diff -- I want to give it a bit of flexibility for something like a static variable. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20698 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87806/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20698 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20698 **[Test build #87806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87806/testReport)** for PR 20698 at commit [`5f066a0`](https://github.com/apache/spark/commit/5f066a058f685a394397244cb46b022483f7e892). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1173/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20690 **[Test build #87809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87809/testReport)** for PR 20690 at commit [`f7efb22`](https://github.com/apache/spark/commit/f7efb22ddea3dc8eeccc833086d5a82cbce7e530). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/20690 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20698 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20698 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87805/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20698 **[Test build #87805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87805/testReport)** for PR 20698 at commit [`ebb9b51`](https://github.com/apache/spark/commit/ebb9b51c51a4411811a7e0e09fff8f8608faa017). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20382 Sure, I will do it today. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20576 **[Test build #87808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20576 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1172/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20043#discussion_r171444357 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -31,7 +31,7 @@ import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.ScalaReflection.universe.TermName import org.apache.spark.sql.catalyst.encoders.RowEncoder import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.expressions.codegen._ --- End diff -- An explicit import would list too many classes: `CodegenContext`, `ExprCode`, `ExprValue`, `GlobalValue`, `FalseLiteral`, `VariableValue`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685 **[Test build #87807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87807/testReport)** for PR 20685 at commit [`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org