[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20937 Seems fine, but please allow me to take another look; I will do so within this weekend.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2609/
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20146 Merged build finished. Test PASSed.
[GitHub] spark pull request #21100: [SPARK-24012][SQL] Union of map and other compati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21100#discussion_r183611758

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -896,6 +896,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24012 Union of map and other compatible columns") {
--- End diff --

OK, please remove this test and it's ready to go.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20146 **[Test build #89761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89761/testReport)** for PR 20146 at commit [`ed35d87`](https://github.com/apache/spark/commit/ed35d875414ba3cf8751a77463f61665e9c373b0).
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183611332

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.sources.v2
+
+import java.util.{List => JList, Optional}
+
+import org.apache.spark.sql.{AnalysisException, QueryTest, Row, SaveMode}
+import org.apache.spark.sql.execution.datasources.FileFormat
+import org.apache.spark.sql.execution.datasources.parquet.{ParquetFileFormat, ParquetTest}
+import org.apache.spark.sql.execution.datasources.v2.FileDataSourceV2
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.v2.reader.{DataReaderFactory, DataSourceReader}
+import org.apache.spark.sql.sources.v2.writer.{DataSourceWriter, DataWriterFactory, WriterCommitMessage}
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.StructType
+
+class DummyReadOnlyFileDataSourceV2 extends FileDataSourceV2 with ReadSupport {
+  class DummyFileReader extends DataSourceReader {
+    override def readSchema(): StructType = {
+      throw new AnalysisException("hehe")
+    }
+
+    override def createDataReaderFactories(): JList[DataReaderFactory[Row]] =
+      java.util.Arrays.asList()
+  }
+
+  override def createReader(options: DataSourceOptions): DataSourceReader = {
+    throw new AnalysisException("Dummy file reader")
+  }
+
+  override def fallBackFileFormat: Class[_ <: FileFormat] = classOf[ParquetFileFormat]
+
+  override def shortName(): String = "parquet"
+}
+
+class DummyWriteOnlyFileDataSourceV2 extends FileDataSourceV2 with WriteSupport {
+  override def fallBackFileFormat: Class[_ <: FileFormat] = classOf[ParquetFileFormat]
+
+  override def shortName(): String = "parquet"
+
+  override def createWriter(
+      jobId: String,
+      schema: StructType,
+      mode: SaveMode,
+      options: DataSourceOptions): Optional[DataSourceWriter] = {
+    throw new AnalysisException("Dummy file writer")
+  }
+}
+
+class SimpleFileDataSourceV2 extends SimpleDataSourceV2 with FileDataSourceV2 {
+  override def fallBackFileFormat: Class[_ <: FileFormat] = classOf[ParquetFileFormat]
+
+  override def shortName(): String = "parquet"
+}
+
+class FileDataSourceV2Suite extends QueryTest with ParquetTest with SharedSQLContext {
+  class DummyFileWriter extends DataSourceWriter {
+    override def createWriterFactory(): DataWriterFactory[Row] = {
+      throw new AnalysisException("hehe")
+    }
+
+    override def commit(messages: Array[WriterCommitMessage]): Unit = {}
+
+    override def abort(messages: Array[WriterCommitMessage]): Unit = {}
+  }
+
+  private val dummyParquetReaderV2 = classOf[DummyReadOnlyFileDataSourceV2].getName
+  private val dummyParquetWriterV2 = classOf[DummyWriteOnlyFileDataSourceV2].getName
+  private val simpleFileDataSourceV2 = classOf[SimpleFileDataSourceV2].getName
+  private val parquetV1 = classOf[ParquetFileFormat].getCanonicalName
+
+  test("Fall back to v1 when writing to file with read only FileDataSourceV2") {
+    val df = spark.range(1, 10).toDF()
+    withTempPath { file =>
+      val path = file.getCanonicalPath
+      // Writing file should fall back to v1 and succeed.
+      df.write.format(dummyParquetReaderV2).save(path)
+
+      // Validate write result with [[ParquetFileFormat]].
+      checkAnswer(spark.read.format(parquetV1).load(path), df)
+
+      // Dummy File reader should fail as expected.
+      val exception = intercept[AnalysisException] {
+        spark.read.format(dummyParquetReaderV2).load(path)
+      }
+      assert(exception.message.equals("Dummy file reader"))
+    }
+  }
+
+  test("Fall back
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183611071

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala ---
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 retest this please.
[GitHub] spark issue #21137: [SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow in Exter...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21137 @hvanhovell
[GitHub] spark pull request #21100: [SPARK-24012][SQL] Union of map and other compati...
Github user liutang123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21100#discussion_r183608838

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -896,6 +896,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24012 Union of map and other compatible columns") {
--- End diff --

@cloud-fan, yes, I am not familiar with TypeCoercionSuite. To save time, I think this PR can be merged first. Thanks a lot.
[GitHub] spark issue #20998: [SPARK-23888][CORE] correct the comment of hasAttemptOnH...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20998 Agreed, and thank you @squito. And thanks to all of you. @felixcheung @mridulm @jiangxb1987 @srowen
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21135 LGTM; I think it's broadly correct for query nodes to assume the session has been initialized.
[GitHub] spark issue #21137: [SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow in Exter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2608/
[GitHub] spark issue #21137: [SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow in Exter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21137 Merged build finished. Test PASSed.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21136 LGTM. TypedFilter and Filter share the FilterExec execution node, so this should just work. Ideally we would add a test to ContinuousSuite to ensure that TypedFilter does execute properly, but I understand this may not be sustainable. Eventually we'll just remove this whitelist and find some way to broadly execute all the streaming tests for continuous processing.
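[Editor's note] To make the whitelisting concrete, here is a minimal sketch of the kind of query this PR unblocks (illustrative only; the rate source and console sink are arbitrary choices, both of which have continuous-mode implementations). A typed filter, `Dataset[T].filter(func)`, produces a `TypedFilter` logical node that plans to the same `FilterExec` as an untyped `Filter`:

    import java.sql.Timestamp
    import org.apache.spark.sql.streaming.Trigger
    import spark.implicits._

    val query = spark.readStream
      .format("rate")
      .load()
      .as[(Timestamp, Long)]
      .filter { case (_, value) => value % 2 == 0 }  // becomes a TypedFilter node
      .writeStream
      .format("console")
      .trigger(Trigger.Continuous("1 second"))
      .start()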
[GitHub] spark issue #21137: [SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow in Exter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21137 **[Test build #89760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89760/testReport)** for PR 21137 at commit [`b39b6cd`](https://github.com/apache/spark/commit/b39b6cd5d1bfb47f3a796e57bd421f62aab507e5).
[GitHub] spark pull request #21126: [SPARK-24050][SS] Calculate input / processing ra...
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21126#discussion_r183605577

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ---
@@ -492,6 +492,77 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
     assert(progress.sources(0).numInputRows === 10)
   }
 
+  test("input row calculation with trigger having data for one of two V2 sources") {
+    val streamInput1 = MemoryStream[Int]
+    val streamInput2 = MemoryStream[Int]
+
+    testStream(streamInput1.toDF().union(streamInput2.toDF()), useV2Sink = true)(
+      AddData(streamInput1, 1, 2, 3),
+      CheckAnswer(1, 2, 3),
+      AssertOnQuery { q =>
+        val lastProgress = getLastProgressWithData(q)
+        assert(lastProgress.nonEmpty)
+        assert(lastProgress.get.numInputRows == 3)
+        assert(lastProgress.get.sources.length == 2)
+        assert(lastProgress.get.sources(0).numInputRows == 3)
+        assert(lastProgress.get.sources(1).numInputRows == 0)
+        true
+      }
--- End diff --

nit: I'd suggest doing an AddData() for the other stream after, to make sure there's not some weird order dependence.
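[Editor's note] The suggested follow-up step could look like this sketch, continuing the quoted `testStream` call (names are taken from the quoted test; the expected counts are assumptions):

    // Feed the second stream afterwards and re-check the per-source counts,
    // so the test also catches any accidental dependence on source order.
    AddData(streamInput2, 4, 5),
    CheckAnswer(1, 2, 3, 4, 5),
    AssertOnQuery { q =>
      val lastProgress = getLastProgressWithData(q)
      assert(lastProgress.nonEmpty)
      assert(lastProgress.get.numInputRows == 2)
      assert(lastProgress.get.sources(0).numInputRows == 0)
      assert(lastProgress.get.sources(1).numInputRows == 2)
      true
    }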
[GitHub] spark pull request #21137: [SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow i...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21137

[SPARK-23589][SQL][FOLLOW-UP] Reuse InternalRow in ExternalMapToCatalyst eval

## What changes were proposed in this pull request?
This PR is a follow-up of #20980 and fixes the code to reuse `InternalRow` for converting input keys/values in `ExternalMapToCatalyst` eval.

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/maropu/spark SPARK-23589-FOLLOWUP
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/21137.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #21137

commit b39b6cd5d1bfb47f3a796e57bd421f62aab507e5
Author: Takeshi Yamamuro
Date: 2018-04-24T04:57:59Z

    Fix
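[Editor's note] For context, the row-reuse pattern being applied is roughly the following sketch (the field and converter names here are hypothetical, not taken from the patch):

    import org.apache.spark.sql.catalyst.expressions.GenericInternalRow

    // Allocate one mutable row per side up front and update it in place for
    // each entry, instead of wrapping every key/value in a fresh InternalRow.
    private lazy val inputKeyRow = new GenericInternalRow(1)
    private lazy val inputValueRow = new GenericInternalRow(1)

    private def convertEntry(key: Any, value: Any): (Any, Any) = {
      inputKeyRow.update(0, key)        // reuse: no per-entry allocation
      inputValueRow.update(0, value)
      (keyConverter.eval(inputKeyRow), valueConverter.eval(inputValueRow))
    }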
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21070 Can we run TPC-DS and show that this upgrade doesn't cause a performance regression in Spark? I can see that this new version has no perf regression on the Parquet side; I just want to be sure the Spark-Parquet integration is also OK.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/21136 +1 for this. We found this when a continuous processing app used filter with functions; this can be supported by the current implementation. cc @jose-torres @zsxwing @tdas
[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21136#discussion_r183604217

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala ---
@@ -771,7 +778,16 @@ class UnsupportedOperationsSuite extends SparkFunSuite {
     }
   }
 
-  /**
+  /** Assert that the logical plan is supported for continuous processing mode */
+  def assertSupportedForContinuousProcessing(
+      name: String,
+      plan: LogicalPlan,
+      outputMode: OutputMode): Unit = {
+    test(s"continuous processing - $name: supported") {
+      UnsupportedOperationChecker.checkForContinuous(plan, outputMode)
+    }
+  }
+  /**
--- End diff --

nit: indent here
[GitHub] spark pull request #21126: [SPARK-24050][SS] Calculate input / processing ra...
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21126#discussion_r183604136

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala ---
@@ -207,62 +209,92 @@ trait ProgressReporter extends Logging {
       return ExecutionStats(Map.empty, stateOperators, watermarkTimestamp)
     }
 
-    // We want to associate execution plan leaves to sources that generate them, so that we match
-    // their metrics (e.g. numOutputRows) to the sources. To do this we do the following.
-    // Consider the translation from the streaming logical plan to the final executed plan.
-    //
-    // streaming logical plan (with sources) <==> trigger's logical plan <==> executed plan
-    //
-    // 1. We keep track of streaming sources associated with each leaf in the trigger's logical plan
-    //    - Each logical plan leaf will be associated with a single streaming source.
-    //    - There can be multiple logical plan leaves associated with a streaming source.
-    //    - There can be leaves not associated with any streaming source, because they were
-    //      generated from a batch source (e.g. stream-batch joins)
-    //
-    // 2. Assuming that the executed plan has same number of leaves in the same order as that of
-    //    the trigger logical plan, we associate executed plan leaves with corresponding
-    //    streaming sources.
-    //
-    // 3. For each source, we sum the metrics of the associated execution plan leaves.
-    //
-    val logicalPlanLeafToSource = newData.flatMap { case (source, logicalPlan) =>
-      logicalPlan.collectLeaves().map { leaf => leaf -> source }
+    val numInputRows = extractSourceToNumInputRows()
+
+    val eventTimeStats = lastExecution.executedPlan.collect {
+      case e: EventTimeWatermarkExec if e.eventTimeStats.value.count > 0 =>
+        val stats = e.eventTimeStats.value
+        Map(
+          "max" -> stats.max,
+          "min" -> stats.min,
+          "avg" -> stats.avg.toLong).mapValues(formatTimestamp)
+    }.headOption.getOrElse(Map.empty) ++ watermarkTimestamp
+
+    ExecutionStats(numInputRows, stateOperators, eventTimeStats)
+  }
+
+  /** Extract number of input rows for each streaming source in the plan */
+  private def extractSourceToNumInputRows(): Map[BaseStreamingSource, Long] = {
+
+    def sumRows(tuples: Seq[(BaseStreamingSource, Long)]): Map[BaseStreamingSource, Long] = {
+      tuples.groupBy(_._1).mapValues(_.map(_._2).sum) // sum up rows for each source
    }
-    val allLogicalPlanLeaves = lastExecution.logical.collectLeaves() // includes non-streaming
-    val allExecPlanLeaves = lastExecution.executedPlan.collectLeaves()
-    val numInputRows: Map[BaseStreamingSource, Long] =
+
+    val onlyDataSourceV2Sources = {
+      // Check whether the streaming query's logical plan has only V2 data sources
+      val allStreamingLeaves =
+        logicalPlan.collect { case s: StreamingExecutionRelation => s }
+      allStreamingLeaves.forall { _.source.isInstanceOf[MicroBatchReader] }
--- End diff --

A point fix here won't be sufficient - right now the metrics don't make it to the driver at all in continuous processing.
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user yanlin-Lynn commented on the issue: https://github.com/apache/spark/pull/21136 @xuanyuanking, please help review this patch. Thank you!
[GitHub] spark issue #21136: [SPARK-24061][SS]Add TypedFilter support for continuous ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21136 Can one of the admins verify this patch?
[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...
GitHub user yanlin-Lynn opened a pull request: https://github.com/apache/spark/pull/21136

[SPARK-24061][SS] Add TypedFilter support for continuous processing

## What changes were proposed in this pull request?
Add TypedFilter support for continuous processing applications.

## How was this patch tested?
Unit tests.

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/yanlin-Lynn/spark SPARK-24061
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/21136.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #21136

commit 87a169767ea2d495d3f51d5decf398946560d6d0
Author: wangyanlin01
Date: 2018-04-24T04:35:59Z

    [SPARK-24061][SS] Add TypedFilter support for continuous processing
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Merged build finished. Test PASSed.
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89751/
[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20937 **[Test build #89751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89751/testReport)** for PR 20937 at commit [`a7be182`](https://github.com/apache/spark/commit/a7be1821886cd1699981588c82af19c9031be399).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21100: [SPARK-24012][SQL] Union of map and other compati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21100#discussion_r183602450

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -896,6 +896,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24012 Union of map and other compatible columns") {
--- End diff --

Discussed with @gatorsmile: we should put end-to-end tests in a single place, and currently we encourage people to put SQL-related end-to-end tests in the SQL golden files. That is to say, we should remove this test from `SQLQuerySuite`. In the meantime, a bug fix should also have a unit test; for this case, we should add a test case in `TypeCoercionSuite`. @liutang123, if you are not familiar with that test suite, please let us know; we can merge your PR first and add the UT in `TypeCoercionSuite` in a follow-up.
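[Editor's note] For readers unfamiliar with `TypeCoercionSuite`, the unit-test counterpart would assert the coercion directly rather than run a query end-to-end. A rough sketch, assuming the case routes through `TypeCoercion.findWiderTypeForTwo`; the exact input types and expected widened type are assumptions about the fix, not taken from it:

    import org.apache.spark.sql.catalyst.analysis.TypeCoercion
    import org.apache.spark.sql.types._

    // Map types that differ only in value nullability should widen to a
    // common type rather than fail analysis when unioned.
    val left = MapType(IntegerType, IntegerType, valueContainsNull = false)
    val right = MapType(IntegerType, IntegerType, valueContainsNull = true)
    val widened = TypeCoercion.findWiderTypeForTwo(left, right)
    assert(widened.contains(MapType(IntegerType, IntegerType, valueContainsNull = true)))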
[GitHub] spark pull request #21100: [SPARK-24012][SQL] Union of map and other compati...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21100#discussion_r183601150

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -896,6 +896,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24012 Union of map and other compatible columns") {
--- End diff --

Yes, please add them to SQLQueryTestSuite.
[GitHub] spark pull request #21100: [SPARK-24012][SQL] Union of map and other compati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21100#discussion_r183601024

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -896,6 +896,25 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24012 Union of map and other compatible columns") {
--- End diff --

cc @gatorsmile, what's the policy for end-to-end tests? Shall we add it in both the SQL golden files and `SQLQuerySuite`?
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89754/
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test PASSed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89754/testReport)** for PR 21078 at commit [`492dc46`](https://github.com/apache/spark/commit/492dc4605b0779e5b337d0db41691e372adec45e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r183600236

--- Diff: python/pyspark/sql/functions.py ---
@@ -2186,6 +2186,29 @@ def map_values(col):
     return Column(sc._jvm.functions.map_values(_to_java_column(col)))
 
 
+@since(2.4)
+def map_concat(*cols):
+    """Returns the union of all the given maps. If a key is found in multiple given maps,
+    that key's value in the resulting map comes from the last one of those maps.
+
+    :param cols: list of column names (string) or list of :class:`Column` expressions
+
+    >>> from pyspark.sql.functions import map_concat
+    >>> df = spark.sql("SELECT map(1, 'a', 2, 'b') as map1, map(3, 'c', 1, 'd') as map2")
+    >>> df.select(map_concat("map1", "map2").alias("map3")).show(truncate=False)
+    +------------------------+
+    |map3                    |
+    +------------------------+
+    |[1 -> d, 2 -> b, 3 -> c]|
+    +------------------------+
+    """
+    sc = SparkContext._active_spark_context
+    if len(cols) == 1 and isinstance(cols[0], (list, set)):
+        cols = cols[0]
--- End diff --

> what's this for?

Excellent question. I don't know, except that it seems sometimes the first column is a list of columns. I used other functions as a template.
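[Editor's note] As a quick cross-check of the documented merge semantics (for a duplicated key, the last map wins), the same example can be driven from Scala through SQL; a sketch assuming the `map_concat` expression is registered as this PR proposes:

    // map1 = {1: 'a', 2: 'b'}, map2 = {3: 'c', 1: 'd'}; key 1 takes map2's value.
    val df = spark.sql(
      "SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c', 1, 'd')) AS map3")
    df.show(truncate = false)  // expect [1 -> d, 2 -> b, 3 -> c]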
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user bersprockets commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r18365

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
@@ -56,6 +58,26 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(MapValues(m2), null)
   }
 
+  test("Map Concat") {
+    val m0 = Literal.create(Map("a" -> "1", "b" -> "2"), MapType(StringType, StringType))
+    val m1 = Literal.create(Map("c" -> "3", "a" -> "4"), MapType(StringType, StringType))
+    val m2 = Literal.create(Map("d" -> "4", "e" -> "5"), MapType(StringType, StringType))
+    val mNull = Literal.create(null, MapType(StringType, StringType))
+
+    // overlapping maps
+    checkEvaluation(MapConcat(Seq(m0, m1)),
+      mutable.LinkedHashMap("a" -> "4", "b" -> "2", "c" -> "3"))
+    // maps with no overlap
+    checkEvaluation(MapConcat(Seq(m0, m2)),
+      mutable.LinkedHashMap("a" -> "1", "b" -> "2", "d" -> "4", "e" -> "5"))
+    // 3 maps
+    checkEvaluation(MapConcat(Seq(m0, m1, m2)),
+      mutable.LinkedHashMap("a" -> "4", "b" -> "2", "c" -> "3", "d" -> "4", "e" -> "5"))
+    // null map
+    checkEvaluation(MapConcat(Seq(m0, mNull)),
--- End diff --

Done!
[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21073 **[Test build #89759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89759/testReport)** for PR 21073 at commit [`13baf96`](https://github.com/apache/spark/commit/13baf96aa0a6087f288d45a7d57881744e187826).
[GitHub] spark pull request #21127: [SPARK-24052][CORE][UI] Add spark version informa...
Github user caneGuy closed the pull request at: https://github.com/apache/spark/pull/21127
[GitHub] spark issue #21127: [SPARK-24052][CORE][UI] Add spark version information on...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21127 OK, I will close this PR. Thanks for your time @srowen @vanzin
[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20923 Merged build finished. Test PASSed.
[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89748/
[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20923 **[Test build #89748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89748/testReport)** for PR 20923 at commit [`f6b9dc8`](https://github.com/apache/spark/commit/f6b9dc83d56c20d887166ddba7a7b876a57d65cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21135 Can one of the admins verify this patch?
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user pwoody commented on the issue: https://github.com/apache/spark/pull/21135 @ericl @gatorsmile @jose-torres
[GitHub] spark pull request #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHel...
GitHub user pwoody opened a pull request: https://github.com/apache/spark/pull/21135

[SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuite should initialize after SparkSession creation

## What changes were proposed in this pull request?
We should ensure that the SparkSession for this test suite has initialized before creating the LocalTableScanExecs.

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/pwoody/spark pw/initfix
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/21135.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #21135

commit dbaa00ca99161cd2820b9a74aebac9665c626581
Author: Patrick Woody
Date: 2018-04-24T03:40:10Z

    Initialize StreamingSymmetricHashJoinHelperSuite LocalTableScanExec after SparkSession initialization
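[Editor's note] The shape of the fix is roughly the following sketch (member names and schemas are assumed; the point is only the construction order). `SparkPlan`'s constructor captures the active SparkSession, so building the scans as eager class-level vals ran before the shared test session existed:

    import org.apache.spark.sql.catalyst.dsl.expressions._
    import org.apache.spark.sql.execution.LocalTableScanExec

    // Defer construction until the session exists, e.g. with lazy vals
    // (or by building the plans inside the test bodies).
    private lazy val leftTable = LocalTableScanExec(Seq('a.int, 'b.int), Nil)
    private lazy val rightTable = LocalTableScanExec(Seq('c.int, 'd.int), Nil)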
[GitHub] spark issue #21100: [SPARK-24012][SQL] Union of map and other compatible col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21100 **[Test build #89757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89757/testReport)** for PR 21100 at commit [`8cb240f`](https://github.com/apache/spark/commit/8cb240fbcab257c4151246c36725b6f1ee873d46).
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20940 **[Test build #89758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89758/testReport)** for PR 20940 at commit [`8ae0126`](https://github.com/apache/spark/commit/8ae012654b81040cd65ea9b4b5f6d9bec7acd718).
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20940 Jenkins, retest this please
[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2607/
[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21129 Merged build finished. Test PASSed.
[GitHub] spark issue #21129: [SPARK-7132][ML] Add fit with validation set to spark.ml...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21129 **[Test build #89756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89756/testReport)** for PR 21129 at commit [`170e08f`](https://github.com/apache/spark/commit/170e08f38ba4853087a3a3be19f8a1a69656de32).
[GitHub] spark issue #21127: [SPARK-24052][CORE][UI] Add spark version information on...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21127 Version info is already available, in the code and in the UI. "Compiled by" info has never struck me as useful. The rest is from the env. I don't think this adds anything.
[GitHub] spark issue #21127: [SPARK-24052][CORE][UI] Add spark version information on...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21127 What about the other information? As mentioned, the build info. @vanzin
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20633 **[Test build #4156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4156/testReport)** for PR 20633 at commit [`80b668a`](https://github.com/apache/spark/commit/80b668afb0303b67ead8aed8d4d1f1996fa02658).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21128: [SPARK-24053][CORE] Support add subdirectory name...
Github user caneGuy closed the pull request at: https://github.com/apache/spark/pull/21128
[GitHub] spark issue #21128: [SPARK-24053][CORE] Support add subdirectory named as us...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21128 Got it, thanks @vanzin
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21116 Merged build finished. Test PASSed.
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89747/
[GitHub] spark issue #21116: [SPARK-24038][SS] Refactor continuous writing to its own...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21116 **[Test build #89747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89747/testReport)** for PR 21116 at commit [`b676dc8`](https://github.com/apache/spark/commit/b676dc85d5ab74b9d3457d45a97c062fd5a51dd3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21100: [SPARK-24012][SQL] Union of map and other compatible col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21100 **[Test build #89755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89755/testReport)** for PR 21100 at commit [`670824f`](https://github.com/apache/spark/commit/670824fa5fc1f8aa72ca4047893104c3786a0295).
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2606/
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test PASSed.
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89754/testReport)** for PR 21078 at commit [`492dc46`](https://github.com/apache/spark/commit/492dc4605b0779e5b337d0db41691e372adec45e).
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20940 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89746/
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20940 Merged build finished. Test FAILed.
[GitHub] spark issue #20940: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20940 **[Test build #89746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89746/testReport)** for PR 20940 at commit [`8ae0126`](https://github.com/apache/spark/commit/8ae012654b81040cd65ea9b4b5f6d9bec7acd718).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21113: [MINOR][DOCS] Fix comments of SQLExecution#withExecution...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21113 **[Test build #89753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89753/testReport)** for PR 21113 at commit [`c8adec6`](https://github.com/apache/spark/commit/c8adec619f4337e29509a760e8ca84ddc3f851ec).
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589976

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala ---
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589841

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala ---
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589906 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589673 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589528 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala --- + // Dummy File reader should fail as expected. --- End diff -- `Dummy File reader should be picked and fail as expected` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589427 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala --- + test("Fall back to v1 when writing to file with read only FileDataSourceV2") { + val df = spark.range(1, 10).toDF() --- End diff -- nit: `val df = spark.range(10)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
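For reference, a quick side-by-side of the two forms (a sketch only, assuming a `SparkSession` named `spark` is in scope, and that the exact values, 1..9 versus 0..9, don't matter for this fallback test):

```scala
val before = spark.range(1, 10).toDF()  // DataFrame of java.lang.Long values 1..9
val after  = spark.range(10)            // Dataset[java.lang.Long] of values 0..9, no toDF() needed
```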
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589226 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/FileDataSourceV2Suite.scala --- +class FileDataSourceV2Suite extends QueryTest with ParquetTest with SharedSQLContext { --- End diff -- `FileDataSourceV2FallbackSuite` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21123: [SPARK-24045][SQL]Create base class for file data...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21123#discussion_r183589146 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala --- @@ -158,6 +158,7 @@ abstract class BaseSessionStateBuilder( override val extendedResolutionRules: Seq[Rule[LogicalPlan]] = new FindDataSourceTable(session) +: new ResolveSQLOnFile(session) +: +new FallBackFileDataSourceToV1(session) +: --- End diff -- We also need to add this rule to `HiveSessionStateBuilder.analyzer` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
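For context, a minimal sketch of that second registration point, assuming `HiveSessionStateBuilder`'s analyzer overrides `extendedResolutionRules` the same way the diff above does (the surrounding Hive rules shown here illustrate the existing list and are not part of this PR):

```scala
// Sketch only: inside HiveSessionStateBuilder's `analyzer` override.
override val extendedResolutionRules: Seq[Rule[LogicalPlan]] =
  new ResolveHiveSerdeTable(session) +:
  new FindDataSourceTable(session) +:
  new ResolveSQLOnFile(session) +:
  new FallBackFileDataSourceToV1(session) +:  // same rule as in BaseSessionStateBuilder
  customResolutionRules
```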
[GitHub] spark pull request #21072: [SPARK-23973][SQL] Remove consecutive Sorts
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21072 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21072: [SPARK-23973][SQL] Remove consecutive Sorts
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21072 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21127: [SPARK-24052][CORE][UI] Add spark version information on...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21127 The SHS shows the application's Spark version in the title (or the SHS version in the listing page). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21128: [SPARK-24053][CORE] Support add subdirectory named as us...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21128 See my last comment. You don't need code in Spark to achieve what you want to do. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21131: [SPARK-23433][CORE] Late zombie task completions update ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21131 @markhamstra @zsxwing @jiangxb1987 @Ngone51 would appreciate a review, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20923 +1 for @jerryshao's comment. Some of the Hive UTs will fail with the Hadoop 3 profile. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21113: [MINOR][DOCS] Fix comments of SQLExecution#withExecution...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21113 **[Test build #4155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4155/testReport)** for PR 21113 at commit [`6aceb43`](https://github.com/apache/spark/commit/6aceb43c56641a38b97b5e090379ef5143f17dd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21127: [SPARK-24052][CORE][UI] Add spark version information on...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21127 It is the version of the history server itself, @vanzin. Maybe I can reuse SparkBuildInfo? @srowen @vanzin Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
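For what it's worth, the build-time version is already surfaced as a public constant backed by `SparkBuildInfo`; a minimal sketch of reusing it (the page-title string is hypothetical, not the PR's actual change):

```scala
import org.apache.spark.SPARK_VERSION  // read from spark-version-info.properties via SparkBuildInfo

// Hypothetical: stamp the history server's own build version into its page title.
val historyServerTitle = s"Spark History Server ($SPARK_VERSION)"
```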
[GitHub] spark pull request #21113: [MINOR][DOCS] Fix comments of SQLExecution#withEx...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21113#discussion_r183582806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -88,7 +88,7 @@ object SQLExecution { /** * Wrap an action with a known executionId. When running a different action in a different * thread from the original one, this method can be used to connect the Spark jobs in this action - * with the known executionId, e.g., `BroadcastHashJoin.broadcastFuture`. + * with the known executionId, e.g., `BroadcastExchange.relationFuture`. --- End diff -- `BroadcastExchange` -> `BroadcastExchangeExec`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21081 **[Test build #4157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4157/testReport)** for PR 21081 at commit [`c4e1a51`](https://github.com/apache/spark/commit/c4e1a51551993008d7b082b112d2296cbc4eb97b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20923: [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevan...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20923 I would guess the tests here don't actually run with the Hadoop 3 profile, so we aren't actually testing anything. Also, we still cannot use Hadoop 3 even if we merge this, because of the Hive issue, unless we use some of the tricks mentioned above. So I'm not sure whether we should address the Hive issue first. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21128: [SPARK-24053][CORE] Support add subdirectory named as us...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/21128 @vanzin Thanks! Since we set the common staging dir to
```
hdfs://xxx/spark/xxx/staging
```
this patch will give us a per-user directory such as:
```
hdfs://xxx/spark/xxx/staging/u_zhoukang
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
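A rough sketch of the path construction being described, assuming the username comes from the Hadoop UGI (the variable names and the `u_` prefix are taken from the example paths above; this is illustrative, not the PR's actual code):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.UserGroupInformation

// Append a per-user "u_<shortUserName>" subdirectory to the shared staging dir.
val baseStagingDir = new Path("hdfs://xxx/spark/xxx/staging")
val user = UserGroupInformation.getCurrentUser.getShortUserName
val userStagingDir = new Path(baseStagingDir, s"u_$user")  // e.g. .../staging/u_zhoukang
```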
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89752/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21078 **[Test build #89752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89752/testReport)** for PR 21078 at commit [`5ee7e23`](https://github.com/apache/spark/commit/5ee7e233bcaf1e78afbbe6f9b877f48cecb03da6). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20535: [SPARK-23341][SQL] define some standard options f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20535#discussion_r183580907 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -193,10 +196,13 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { if (ds.isInstanceOf[ReadSupport] || ds.isInstanceOf[ReadSupportWithSchema]) { val sessionOptions = DataSourceV2Utils.extractSessionConfigs( ds = ds, conf = sparkSession.sessionState.conf) +val pathsOption = { + val objectMapper = new ObjectMapper() + DataSourceOptions.PATHS_KEY -> objectMapper.writeValueAsString(paths.toArray) +} Dataset.ofRows(sparkSession, DataSourceV2Relation.create( - ds, extraOptions.toMap ++ sessionOptions, + ds, extraOptions.toMap ++ sessionOptions + pathsOption, --- End diff -- Basically we may have duplicated entries in session configs and `DataFrameReader/Writer` options, not only the path. The rule is that `DataFrameReader/Writer` options should overwrite session configs. cc @jiangxb1987 can you submit a PR to explicitly document this in `SessionConfigSupport`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
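Since the stated rule hinges on Scala's `Map` concatenation semantics, where entries from the right-hand operand win on duplicate keys, a minimal self-contained illustration (the option keys are made up for the example):

```scala
// In `a ++ b`, values from `b` overwrite `a`'s values for duplicate keys, so
// placing reader/writer options on the right gives them precedence over session configs.
val sessionOptions = Map("compression" -> "snappy", "mode" -> "fromSession")
val userOptions    = Map("mode" -> "fromUser")

assert((sessionOptions ++ userOptions)("mode") == "fromUser")        // user option wins
assert((sessionOptions ++ userOptions)("compression") == "snappy")   // non-conflicting key kept
```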
[GitHub] spark pull request #21132: [SPARK-24029][core] Follow up: set SO_REUSEADDR o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21132 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2605/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21078: [SPARK-23990][ML] Instruments logging improvements - ML ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21078 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21132: [SPARK-24029][core] Follow up: set SO_REUSEADDR on the s...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21132 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21081 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89749/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21081: [SPARK-23975][ML]Allow Clustering to take Arrays of Doub...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21081 **[Test build #89749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89749/testReport)** for PR 21081 at commit [`c4e1a51`](https://github.com/apache/spark/commit/c4e1a51551993008d7b082b112d2296cbc4eb97b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org