[spark] branch master updated: [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 34f6066  [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference
34f6066 is described below

commit 34f606678a90e860711a5f9f9618cf00788c9eb0
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Mon Mar 4 10:45:48 2019 +0800

    [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference

    ## What changes were proposed in this pull request?

    This patch refactors the `serializerFor` method between `ScalaReflection` and `JavaTypeInference`, making it consistent with what was refactored for `deserializerFor` in #23854.

    This patch also extracts the logic for recording the walked type path, since that logic is duplicated across `serializerFor` and `deserializerFor` in both `ScalaReflection` and `JavaTypeInference`.

    ## How was this patch tested?

    Existing tests.

    Closes #23908 from HeartSaVioR/SPARK-27001.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/DeserializerBuildHelper.scala     |  32 ++-
 .../spark/sql/catalyst/JavaTypeInference.scala     | 143 +-
 .../spark/sql/catalyst/ScalaReflection.scala       | 220 +++--
 .../spark/sql/catalyst/SerializerBuildHelper.scala | 198 +++
 .../apache/spark/sql/catalyst/WalkedTypePath.scala |  57 ++
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   2 +-
 .../sql/catalyst/encoders/ExpressionEncoder.scala  |   2 +-
 .../spark/sql/catalyst/encoders/RowEncoder.scala   |   2 +-
 .../spark/sql/catalyst/expressions/Cast.scala      |   4 +-
 .../catalyst/expressions/CodeGenerationSuite.scala |   2 +-
 .../expressions/NullExpressionsSuite.scala         |   2 +-
 11 files changed, 394 insertions(+), 270 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala
index d75d3ca..e55c25c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/DeserializerBuildHelper.scala
@@ -29,7 +29,7 @@ object DeserializerBuildHelper {
       path: Expression,
       part: String,
       dataType: DataType,
-      walkedTypePath: Seq[String]): Expression = {
+      walkedTypePath: WalkedTypePath): Expression = {
     val newPath = UnresolvedExtractValue(path, expressions.Literal(part))
     upCastToExpectedType(newPath, dataType, walkedTypePath)
   }
@@ -39,40 +39,30 @@ object DeserializerBuildHelper {
       path: Expression,
       ordinal: Int,
       dataType: DataType,
-      walkedTypePath: Seq[String]): Expression = {
+      walkedTypePath: WalkedTypePath): Expression = {
     val newPath = GetStructField(path, ordinal)
     upCastToExpectedType(newPath, dataType, walkedTypePath)
   }

-  def deserializerForWithNullSafety(
-      expr: Expression,
-      dataType: DataType,
-      nullable: Boolean,
-      walkedTypePath: Seq[String],
-      funcForCreatingNewExpr: (Expression, Seq[String]) => Expression): Expression = {
-    val newExpr = funcForCreatingNewExpr(expr, walkedTypePath)
-    expressionWithNullSafety(newExpr, nullable, walkedTypePath)
-  }
-
   def deserializerForWithNullSafetyAndUpcast(
       expr: Expression,
       dataType: DataType,
       nullable: Boolean,
-      walkedTypePath: Seq[String],
-      funcForCreatingNewExpr: (Expression, Seq[String]) => Expression): Expression = {
+      walkedTypePath: WalkedTypePath,
+      funcForCreatingDeserializer: (Expression, WalkedTypePath) => Expression): Expression = {
     val casted = upCastToExpectedType(expr, dataType, walkedTypePath)
-    deserializerForWithNullSafety(casted, dataType, nullable, walkedTypePath,
-      funcForCreatingNewExpr)
+    expressionWithNullSafety(funcForCreatingDeserializer(casted, walkedTypePath),
+      nullable, walkedTypePath)
   }

-  private def expressionWithNullSafety(
+  def expressionWithNullSafety(
       expr: Expression,
       nullable: Boolean,
-      walkedTypePath: Seq[String]): Expression = {
+      walkedTypePath: WalkedTypePath): Expression = {
     if (nullable) {
       expr
     } else {
-      AssertNotNull(expr, walkedTypePath)
+      AssertNotNull(expr, walkedTypePath.getPaths)
     }
   }

@@ -167,10 +157,10 @@ object DeserializerBuildHelper {
   private def upCastToExpectedType(
       expr: Expression,
       expected: DataType,
-      walkedTypePath: Seq[String]): Expression = expected match {
+      walkedTypePath: WalkedTypePath): Expression = expected match {
     case _: StructType => expr
     case _: ArrayType => expr
     case _:
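The `WalkedTypePath` extracted by this commit is, at heart, an immutable log of the type positions visited while deriving a serializer or deserializer, so that null-safety failures can report where in a nested type the problem sits. A minimal sketch of the idea in Scala follows; only `getPaths` is confirmed by the diff above, and the `record*` method names are illustrative assumptions about the real org.apache.spark.sql.catalyst.WalkedTypePath:

    // Sketch of a walked-type-path recorder. Each record* call returns a new
    // instance, so recursive schema walks never share mutable state.
    class WalkedTypePath(private val walkedPaths: Seq[String] = Nil) {
      private def newInstance(record: String): WalkedTypePath =
        new WalkedTypePath(record +: walkedPaths)

      def recordRoot(className: String): WalkedTypePath =
        newInstance(s"- root class: $className")

      def recordField(className: String, fieldName: String): WalkedTypePath =
        newInstance(s"- field (class: $className, name: $fieldName)")

      def recordArray(elementClassName: String): WalkedTypePath =
        newInstance(s"- array element class: $elementClassName")

      // The diff above shows AssertNotNull(expr, walkedTypePath.getPaths):
      // the recorder owns the path strings, the expression only needs the
      // rendered sequence.
      def getPaths: Seq[String] = walkedPaths

      override def toString: String = walkedPaths.mkString("\n")
    }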
[spark] branch master updated: [SPARK-26893][SQL] Allow partition pruning with subquery filters on file source
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 82820f8  [SPARK-26893][SQL] Allow partition pruning with subquery filters on file source
82820f8 is described below

commit 82820f8e2574ca18faa00b54dff1a3a44d105359
Author: Peter Toth
AuthorDate: Mon Mar 4 13:38:22 2019 +0800

    [SPARK-26893][SQL] Allow partition pruning with subquery filters on file source

    ## What changes were proposed in this pull request?

    This PR introduces leveraging subquery filters for partition pruning in the file source. Currently, subquery expressions are not allowed to be used for partition pruning in `FileSourceStrategy`; instead, a `FilterExec` is added around the `FileSourceScanExec` to do the job. This PR optimizes the process by allowing partition-pruning subquery expressions as partition filters.

    ## How was this patch tested?

    Added a new UT and ran existing UTs, especially the SPARK-25482 and SPARK-24085 related ones.

    Closes #23802 from peter-toth/SPARK-26893.

    Authored-by: Peter Toth
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/execution/DataSourceScanExec.scala   | 22 +++---
 .../execution/datasources/DataSourceStrategy.scala |  2 +-
 .../execution/datasources/FileSourceStrategy.scala | 10 --
 .../datasources/PruneFileSourcePartitions.scala    | 11 ++-
 .../datasources/v2/DataSourceV2Strategy.scala      |  5 +++--
 .../org/apache/spark/sql/execution/subquery.scala  | 12
 .../scala/org/apache/spark/sql/SubquerySuite.scala | 21 +
 7 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index 3aed2ce..92f7d66 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -187,6 +187,14 @@ case class FileSourceScanExec(
     ret
   }

+  /**
+   * [[partitionFilters]] can contain subqueries whose results are available only at runtime so
+   * accessing [[selectedPartitions]] should be guarded by this method during planning
+   */
+  private def hasPartitionsAvailableAtRunTime: Boolean = {
+    partitionFilters.exists(ExecSubqueryExpression.hasSubquery)
+  }
+
   override lazy val (outputPartitioning, outputOrdering): (Partitioning, Seq[SortOrder]) = {
     val bucketSpec = if (relation.sparkSession.sessionState.conf.bucketingEnabled) {
       relation.bucketSpec
@@ -223,7 +231,7 @@ case class FileSourceScanExec(
       val sortColumns =
         spec.sortColumnNames.map(x => toAttribute(x)).takeWhile(x => x.isDefined).map(_.get)

-      val sortOrder = if (sortColumns.nonEmpty) {
+      val sortOrder = if (sortColumns.nonEmpty && !hasPartitionsAvailableAtRunTime) {
         // In case of bucketing, its possible to have multiple files belonging to the
         // same bucket in a given relation. Each of these files are locally sorted
         // but those files combined together are not globally sorted. Given that,
@@ -272,12 +280,12 @@ case class FileSourceScanExec(
       "PushedFilters" -> seqToString(pushedDownFilters),
       "DataFilters" -> seqToString(dataFilters),
       "Location" -> locationDesc)
-    val withOptPartitionCount =
-      relation.partitionSchemaOption.map { _ =>
-        metadata + ("PartitionCount" -> selectedPartitions.size.toString)
-      } getOrElse {
-        metadata
-      }
+    val withOptPartitionCount = if (relation.partitionSchemaOption.isDefined &&
+        !hasPartitionsAvailableAtRunTime) {
+      metadata + ("PartitionCount" -> selectedPartitions.size.toString)
+    } else {
+      metadata
+    }

     val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
       val numSelectedBuckets = optionalBucketSet.map { b =>
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
index b73dc30..a1252ee 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
@@ -434,7 +434,7 @@ object DataSourceStrategy {
   protected[sql] def normalizeFilters(
       filters: Seq[Expression],
       attributes: Seq[AttributeReference]): Seq[Expression] = {
-    filters.filterNot(SubqueryExpression.hasSubquery).map { e =>
+    filters.map { e =>
       e transform {
         case a: AttributeReference =>
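To make the behavior change concrete, here is a hypothetical spark-shell session (table name, path, and columns are illustrative, not from the PR). Before this change, a scalar-subquery partition filter was dropped from the scan and applied in a `FilterExec` above it, so every `day=...` directory was read; with this change the scan can prune to the matching partitions once the subquery result is available at runtime:

    // Write a small table partitioned by `day` (10 partition directories).
    spark.range(100)
      .selectExpr("id AS value", "CAST(id % 10 AS INT) AS day")
      .write.mode("overwrite").partitionBy("day").parquet("/tmp/events")

    spark.read.parquet("/tmp/events").createOrReplaceTempView("events")

    // The partition filter is a scalar subquery, evaluated at runtime.
    // With SPARK-26893 only the day=9 directory should be scanned.
    spark.sql(
      "SELECT count(*) FROM events WHERE day = (SELECT max(day) FROM events)"
    ).show()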
[spark] branch master updated: [SPARK-26956][SS] remove streaming output mode from data source v2 APIs
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 382d5a8  [SPARK-26956][SS] remove streaming output mode from data source v2 APIs
382d5a8 is described below

commit 382d5a82b0e8ef1dd01209e828eae67fe7993a56
Author: Wenchen Fan
AuthorDate: Sun Mar 3 22:20:31 2019 -0800

    [SPARK-26956][SS] remove streaming output mode from data source v2 APIs

    ## What changes were proposed in this pull request?

    Similar to `SaveMode`, we should remove streaming `OutputMode` from the data source v2 API and use operations with clear semantics. The changes are:

    1. append mode: create `StreamingWrite` directly. By default, the `WriteBuilder` will create a `Write` that appends data.
    2. complete mode: call `SupportsTruncate#truncate`. Complete mode means truncating all the old data and appending the new data of the current epoch. `SupportsTruncate` has exactly the same semantics.
    3. update mode: fail. The current streaming framework can't propagate the update keys, so v2 sinks are not able to implement update mode. In the future we can introduce a `SupportsUpdate` trait.

    The behavior changes:

    1. All the v2 sinks (foreach, console, memory, kafka, noop) no longer support update mode. In fact, all the v2 sinks previously implemented update mode incorrectly; none of them could really support it.
    2. The kafka sink no longer supports complete mode. In fact, the kafka sink can only append data.

    ## How was this patch tested?

    Existing tests.

    Closes #23859 from cloud-fan/update.

    Authored-by: Wenchen Fan
    Signed-off-by: gatorsmile
---
 .../spark/sql/kafka010/KafkaSourceProvider.scala   |  6 +--
 .../v2/writer/streaming/SupportsOutputMode.java    | 29
 .../datasources/noop/NoopDataSource.scala          |  7 ++-
 .../execution/streaming/MicroBatchExecution.scala  | 10 +
 .../sql/execution/streaming/StreamExecution.scala  | 34 +++
 .../spark/sql/execution/streaming/console.scala    | 10 ++---
 .../streaming/continuous/ContinuousExecution.scala | 10 +
 .../streaming/sources/ForeachWriterTable.scala     | 11 ++---
 .../sql/execution/streaming/sources/memoryV2.scala | 51 +-
 .../execution/streaming/MemorySinkV2Suite.scala    |  4 +-
 10 files changed, 75 insertions(+), 97 deletions(-)

diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
index a139573..4dc6955 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
@@ -34,7 +34,7 @@ import org.apache.spark.sql.sources.v2._
 import org.apache.spark.sql.sources.v2.reader.{Scan, ScanBuilder}
 import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousStream, MicroBatchStream}
 import org.apache.spark.sql.sources.v2.writer.WriteBuilder
-import org.apache.spark.sql.sources.v2.writer.streaming.{StreamingWrite, SupportsOutputMode}
+import org.apache.spark.sql.sources.v2.writer.streaming.StreamingWrite
 import org.apache.spark.sql.streaming.OutputMode
 import org.apache.spark.sql.types.StructType

@@ -362,7 +362,7 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
   }

   override def newWriteBuilder(options: DataSourceOptions): WriteBuilder = {
-    new WriteBuilder with SupportsOutputMode {
+    new WriteBuilder {
       private var inputSchema: StructType = _

       override def withInputDataSchema(schema: StructType): WriteBuilder = {
@@ -370,8 +370,6 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
         this
       }

-      override def outputMode(mode: OutputMode): WriteBuilder = this
-
       override def buildForStreaming(): StreamingWrite = {
         import scala.collection.JavaConverters._
diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/SupportsOutputMode.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/SupportsOutputMode.java
deleted file mode 100644
index 832dcfa..000
--- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/SupportsOutputMode.java
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may
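The mapping from output modes to builder operations described in points 1–3 above can be summarized in a small self-contained Scala sketch. The traits here are simplified stand-ins, not the real org.apache.spark.sql.sources.v2.writer interfaces (whose exact signatures differ), and `createStreamingWrite` is a hypothetical helper:

    object StreamingWriteSupport {
      trait StreamingWrite
      trait WriteBuilder { def buildForStreaming(): StreamingWrite }
      // A sink that can drop existing data mixes this in; truncate-then-append
      // has the same semantics as the old "complete" output mode.
      trait SupportsTruncate extends WriteBuilder { def truncate(): WriteBuilder }

      sealed trait OutputMode
      case object Append extends OutputMode
      case object Complete extends OutputMode
      case object Update extends OutputMode

      def createStreamingWrite(builder: WriteBuilder, mode: OutputMode): StreamingWrite =
        mode match {
          case Append =>
            // The default write appends, so just build it.
            builder.buildForStreaming()
          case Complete =>
            // Truncate the old data, then append the new epoch.
            builder match {
              case t: SupportsTruncate => t.truncate().buildForStreaming()
              case _ => throw new IllegalArgumentException(
                "sink cannot truncate; complete mode unsupported")
            }
          case Update =>
            // No v2 sink can receive update keys from the engine yet; a future
            // SupportsUpdate trait would slot in here.
            throw new IllegalArgumentException("v2 sinks do not support update mode")
        }
    }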
svn commit: r32737 - /dev/spark/KEYS
Author: dbtsai
Date: Mon Mar 4 04:33:15 2019
New Revision: 32737

Log:
Update KEYS

Modified:
    dev/spark/KEYS

Modified: dev/spark/KEYS
==============================================================================
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Mar 4 04:33:15 2019
@@ -888,30 +888,49 @@
 pqnGj9s6Uudh/FXfVN5MC0/pH/ySSACkXwCmKXAh
 =4noL
 -----END PGP PUBLIC KEY BLOCK-----
-pub   nistp384 2019-02-21 [SC]
-      F8E155E845606E350C4147710359BC9965359766
+pub   rsa4096 2019-03-04 [SCEA]
+      5566B502701B3A5F5ABD5B1F42E5B25A8F7A82C1
 uid           [ultimate] DB Tsai
 uid           [ultimate] DB Tsai
-sub   nistp384 2019-02-21 [E]
 -----BEGIN PGP PUBLIC KEY BLOCK-----
-mG8EXG3xSBMFK4EEACIDAwTFUItdmiASsQ34SfIfDCGtSSqpGQHlSfDB801cJRSK
-tAxO51Xu3E6BSpSTcImHzstwxGj7rkSDSHhiwZG18314+ykKVNHSuIFDdYEUi2aR
-UQ0RWSiGhVS+Eg51v07Zc2q0G0RCIFRzYWkgPGRidHNhaUBkYnRzYWkuY29tPoiz
-BBMTCQA7AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAFiEE+OFV6EVgbjUMQUdx
-A1m8mWU1l2YFAlxt8YECGQEACgkQA1m8mWU1l2Y+tgGAqzmo6XJZ9D0HxKGmvqu3
-HF/DHN2rEkutYfoGk8Wv3I3ALVVa8Qnh8zikO5y4hL4tAX90IL/NDXHoUR2sdsbJ
-tf+Eidnqc6BARrJiMXueH5KvlmtM3nmL8f28+ht8J09SPLm0G0RCIFRzYWkgPGRi
-dHNhaUBhcGFjaGUub3JnPoiwBBMTCQA4FiEE+OFV6EVgbjUMQUdxA1m8mWU1l2YF
-Alxt8ZcCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQA1m8mWU1l2bZFwF/
-b46arpjADavN31useQj7cEPPjaPLgTsTKmG51lYZaDZlOXZLFVXKp/S++2IqO3C+
-AX46fL0wL+yPZ1XJ8hz0vkDIzUstC6AH6EjtmD0x/JLOei4wFB/ke631GUVkHrQZ
-lsS4cwRcbfFIEgUrgQQAIgMDBGKOAw+vHLvIS0OQ+U8M4360AFE/anh7M/wn1bFW
-50IdrOOW0Ss19Rib9UzBBW9FHU1udMGJmVOH4aBFRTXr2VOTPcswI1ArG7DoB/bs
-R2uwKt67grmq3MqQj5EJNKUzFwMBCQmImAQYEwkAIBYhBPjhVehFYG41DEFHcQNZ
-vJllNZdmBQJcbfFIAhsMAAoJEANZvJllNZdmUowBf3uHuFhO3IVozWsAnEfB8wdN
-vGQkKNb5uKWrMUAAkeUuY7+AeFDiZFxIcdfrx7B7tgF/c/CS4sgVUAIpDbw/gsfl
-UwauMsN1CWpCIaMEFZYNo//B+auqTseNXMtqqGaIgTXv
-=NYIb
+mQINBFx8pR4BEADHjMDto5wSHrIng7vJn53qKeyaYb3evfppJGw1roUZFlV0USwx
+ADOpLgtFR3AxOJRrw9NzGcPtx+G+D7LQl6zD+uGZlgjbJfzfOjSvwHXhuf+MJbUp
+FU0e/IuT/7+FBo2C9j6ITVo8knWyOXDMIlCckgkXaQ3AyILeJpxeN5vJR5Q2PBal
+hCXreVY0Bny9+pJlpl4xQ3tCpBpBD9ouqdR9Zhhh4QFmxBgg2VsDYsbpixsGB8Ce
+VzL6yQkqDxTL6PcmnMspHF2rvlSA68ikK20au+PTY9VYIb9Iry84ZGrhVQSZatAx
+icTD5GLggC5fWV5lFV1INF3+syhFPXuDNJ64uLCMDzDXfMUUXAV1538XyWV2WNwR
+ZLEr73bYkTTfgmkuRlDvLqz7b9iJLrXqAPVu1Sv+NpEgowU/GRxWd/Ekt+u1XkL1
+OnzyvHoFKjntY+Z2tQEkr7Uk/4u8N2d10XuFgEc07FSKYxz0QvPiHkiQD8I2v7hL
+Fp0YHC6mSeoKuIXnYj6yf8V9vxVTBj0pWhQPFUWn5/vIEdpm8D/c1BONsh4/FZ3U
+QIwm58v6e/Fw84jFtAFBPoFj3M8bQk559ZOESaHFjM+ZD+AFsA8vRkV0jiz5+VAr
+8R1QkoAslZKD6XiTbLAkSvvvg2QUSW2sK2g4iXj0d6J3KbLq8KPN+eMnxwARAQAB
+tBtEQiBUc2FpIDxkYnRzYWlAZGJ0c2FpLmNvbT6JAlEEEwEIADsCGy8FCwkIBwIG
+FQoJCAsCBBYCAwECHgECF4AWIQRVZrUCcBs6X1q9Wx9C5bJaj3qCwQUCXHylkQIZ
+AQAKCRBC5bJaj3qCwWUjEADHFI07aYBkpAiddTQt++fuWOFDAcKTutT3LVxQmNjO
+7ADEcZhinKBKKYEehKiwmBGe5Gp1/nRQ/yGganO/NlghEyMU59Q+s07cipvYNJSk
+gO1OvcOyziDOFGvu1/1A/AeM9wJhKqVl/ELbtiwTozMv/hpG37ybFOn05oGKsjI
+JrZHiwAhPk9k5Z2KYarxaqEVu7Uon+0gvG+atORq7yF4zybKFhYmdXVy5pMJAmKs
+acFYO/bvwhMA/0zgjS96jD+CoisYkeRub6JOLTtGLAQKdUNdiCXEfIxpOUG0NRRR
+PH+mf+pmZfPZRKtksPoaNHfrDyEk2sFQe2TVCUEFGwSAw99u0y623ewwWatjoywg
+uALOIVwqzp/J1dWrN225zgXt7X93dHwooRcN64rhRQQkvtnhE4Wka3u7f2pJRDC9
+wwpsENvWYfJIgoZPvbLIBNKcAAWu94Uv+k3wgkxj9CITc5J7XG5in8ndM0ioSCDa
+zs5SCicLqaW1PLJgBJnZaKXTPtV7pNtIb3Qi03SVrsvPiDpJHCXxMlFo9uDnWfmB
+sTaylTga7slAZNp+DcAH2I849tamOopC1kvawocpEZI1dDOs+XR5VPo8Jw+2Loyr
+poHbA3GJrv1b6eA3Hlma/u3Hp5qOYxCpawDeRIMmTSJca43+6Od4IX/yJbaSIyzO
+S7QbREIgVHNhaSA8ZGJ0c2FpQGFwYWNoZS5vcmc+iQJOBBMBCAA4FiEEVWa1AnAb
+Ol9avVsfQuWyWo96gsEFAlx8pYkCGy8FCwkIBwIGFQoJCAsCBBYCAwECHgECF4AA
+CgkQQuWyWo96gsEL8w/8CMPZxMVia2kXthXN70ptIHCc9dFxXdSwfBbFj3lIx1Qn
+I3IZfxKqxi44UeUW9hYcSzgDyCP3IRv0yOPeqkOJtHkgctSkCKC0rWctH0EhvuED
+pqudTRUkpH+tYdcU1qlnOb/4altsvJHguBtxQF9LKi3N0PZDuMPYlHbW2EMsBvLF
+QfowjZ+sBS8gjoq2N7phLqnNjepuhXLJMWeIVZRUIzFMzJszv65wCaAA4u/qJ4ac
+lKUkMiqjupDJcFE6A/JMwr6xXCvPbCAKcaY09i2i1UuG+Q1MN9yzM9ipcYkgjcCg
+KLo7iT2cVKo9ISGbh1VILSJrWqnVzevmJDy3o0z0g9WzcVnVqynvvZ7/A+8dViMW
+K10F7u2vg8Y0BDmw+vcVTvF7uhPoxfqIeTDJjeZw0fCVJFVzUsZIa7AqtM/Gzi0J
+jdEF8LyZQUC4t3lgdUoiBf83cz0/JtEr8oZZ02rKWiXdvdKLcMy7qYEVF/5gtE/g
+sjrN9LcM3vhv1//pkn017uTeL6/aOosSxJFPO2YWGMlFZUfmGj1dWCXK+LBRi4JO
+a9HbpFNXEIRMvNI9f183iSWfyMtFXhLrxHETX87IHx3/SztViTl7D1ebnzBwEUR4
+QSiz4fEiq6agEwb78fyEEaBsMFNFM4seVz2V1WvKMemME5DRCzaPyqHMGXEzenk=
+=KULP
 -----END PGP PUBLIC KEY BLOCK-----
[spark] branch master updated: [SPARK-27032][TEST] De-flake org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.HDFSMetadataLog: metadata directory collision
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b76f262  [SPARK-27032][TEST] De-flake org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.HDFSMetadataLog: metadata directory collision
b76f262 is described below

commit b76f262fc84a1fbe52bbc6bf74573dec4d1ae4df
Author: Sean Owen
AuthorDate: Mon Mar 4 13:36:41 2019 +0900

    [SPARK-27032][TEST] De-flake org.apache.spark.sql.execution.streaming.HDFSMetadataLogSuite.HDFSMetadataLog: metadata directory collision

    ## What changes were proposed in this pull request?

    Reduce the work done in the HDFSMetadataLogSuite collision test (fewer writer threads and fewer batches per writer) to possibly de-flake it.

    ## How was this patch tested?

    Existing tests.

    Closes #23937 from srowen/SPARK-27032.

    Authored-by: Sean Owen
    Signed-off-by: Hyukjin Kwon
---
 .../spark/sql/execution/streaming/HDFSMetadataLogSuite.scala | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala
index 0e36e7f..3706835 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala
@@ -131,9 +131,10 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {

   testQuietly("HDFSMetadataLog: metadata directory collision") {
     withTempDir { temp =>
-      val waiter = new Waiter
-      val maxBatchId = 100
-      for (id <- 0 until 10) {
+      val waiter = new Waiter()
+      val maxBatchId = 10
+      val numThreads = 5
+      for (id <- 0 until numThreads) {
         new UninterruptibleThread(s"HDFSMetadataLog: metadata directory collision - thread $id") {
           override def run(): Unit = waiter {
             val metadataLog =
@@ -146,7 +147,7 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
                 nextBatchId += 1
               }
             } catch {
-              case e: ConcurrentModificationException =>
+              case _: ConcurrentModificationException =>
                 // This is expected since there are multiple writers
             } finally {
               waiter.dismiss()
@@ -155,7 +156,7 @@ class HDFSMetadataLogSuite extends SparkFunSuite with SharedSQLContext {
         }.start()
       }

-      waiter.await(timeout(10.seconds), dismissals(10))
+      waiter.await(timeout(10.seconds), dismissals(numThreads))
       val metadataLog = new HDFSMetadataLog[String](spark, temp.getAbsolutePath)
       assert(metadataLog.getLatest() === Some(maxBatchId -> maxBatchId.toString))
       assert(
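The invariant the test checks survives the reduction: however many writers race, batch ids are claimed monotonically, so the highest id must end up written by someone, and losing a race is expected contention rather than a failure. A self-contained toy model of that invariant in Scala, using plain JDK primitives as stand-ins for HDFSMetadataLog and ScalaTest's Waiter (nothing here is Spark API):

    import java.util.concurrent.CountDownLatch
    import java.util.concurrent.atomic.AtomicInteger

    object CollisionToy {
      def main(args: Array[String]): Unit = {
        val maxBatchId = 10
        val numThreads = 5
        val done = new CountDownLatch(numThreads)   // stands in for Waiter/dismissals
        val latest = new AtomicInteger(-1)          // stands in for the metadata log

        for (id <- 0 until numThreads) {
          new Thread(s"writer-$id") {
            override def run(): Unit = {
              try {
                var next = latest.get() + 1
                while (next <= maxBatchId) {
                  // compareAndSet plays the role of creating a batch file:
                  // only one writer can claim a given batch id.
                  latest.compareAndSet(next - 1, next)
                  next = latest.get() + 1
                }
              } finally done.countDown()
            }
          }.start()
        }

        done.await()
        // Regardless of who won each race, the highest batch id was written.
        assert(latest.get() == maxBatchId)
      }
    }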
[spark-website] branch asf-site updated: Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new c0f7dd2  Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
c0f7dd2 is described below

commit c0f7dd27612d14cfd57e19bfeeecf683246ba40f
Author: Sean Owen
AuthorDate: Sun Mar 3 09:40:37 2019 -0600

    Link to sigs on www.apache.org; provide direct link to release's sigs, checksums

    See https://issues.apache.org/jira/browse/SPARK-26274

    This also cleans up some weirdness in the JS. I tested it manually and all the links appear to work fine.

    Author: Sean Owen

    Closes #184 from srowen/SPARK-26274.
---
 js/downloads.js      | 36 +++-
 site/js/downloads.js | 36 +++-
 2 files changed, 30 insertions(+), 42 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index be6c29c..49ceb36 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -74,10 +74,8 @@ function onPackageSelect() {

 function onVersionSelect() {
   var versionSelect = document.getElementById("sparkVersionSelect");
   var packageSelect = document.getElementById("sparkPackageSelect");
-  var verifyLink = document.getElementById("sparkDownloadVerify");

   empty(packageSelect);
-  empty(verifyLink);

   var version = getSelectedValue(versionSelect);
   var packages = releases[version]["packages"];
@@ -88,10 +86,6 @@ function onVersionSelect() {
     append(packageSelect, option);
   }

-  var href = "https://archive.apache.org/dist/spark/spark-" + version + "/";
-  var link = "<a href=\"" + href + "\">" + versionShort(version) + " signatures and checksums</a>";
-  append(verifyLink, link);
-
   // Populate releases
   updateDownloadLink(releases[version].mirrored);
 }

@@ -100,32 +94,32 @@ function updateDownloadLink(isMirrored) {
   var versionSelect = document.getElementById("sparkVersionSelect");
   var packageSelect = document.getElementById("sparkPackageSelect");
   var downloadLink = document.getElementById("spanDownloadLink");
+  var verifyLink = document.getElementById("sparkDownloadVerify");

   empty(downloadLink);
+  empty(verifyLink);

   var version = getSelectedValue(versionSelect);
   var pkg = getSelectedValue(packageSelect);

-  var artifactName = "spark-$ver-bin-$pkg.tgz"
-    .replace(/\$ver/g, version)
-    .replace(/\$pkg/g, pkg)
+  var artifactName = "spark-" + version + "-bin-" + pkg + ".tgz"
     .replace(/-bin-sources/, ""); // special case for source packages

-  var link = "";
+  var downloadHref = "";
   if (isMirrored) {
-    link = "https://www.apache.org/dyn/closer.lua/spark/spark-$ver/$artifact";
+    downloadHref = "https://www.apache.org/dyn/closer.lua/spark/spark-" + version + "/" + artifactName;
   } else {
-    link = "https://archive.apache.org/dist/spark/spark-$ver/$artifact";
+    downloadHref = "https://archive.apache.org/dist/spark/spark-" + version + "/" + artifactName;
   }
-  link = link
-    .replace(/\$ver/, version)
-    .replace(/\$artifact/, artifactName);
-  var text = link.split("/").reverse()[0];
-
+  var text = downloadHref.split("/").reverse()[0];
   var onClick =
-    "trackOutboundLink(this, 'Release Download Links', 'apache_$artifact'); return false;"
-      .replace(/\$artifact/, artifactName);
-
-  var contents = "<a href=\"" + link + "\" onclick=\"" + onClick + "\">" + text + "</a>";
+    "trackOutboundLink(this, 'Release Download Links', 'apache_" + artifactName + "'); return false;";
+  var contents = "<a href=\"" + downloadHref + "\" onclick=\"" + onClick + "\">" + text + "</a>";
   append(downloadLink, contents);
+
+  var sigHref = "https://www.apache.org/dist/spark/spark-" + version + "/" + artifactName + ".asc";
+  var checksumHref = "https://www.apache.org/dist/spark/spark-" + version + "/" + artifactName + ".sha512";
+  var verifyLinks = versionShort(version) + " <a href=\"" + sigHref + "\">signatures</a>, <a href=\"" + checksumHref + "\">checksums</a>";
+  append(verifyLink, verifyLinks);
 }
diff --git a/site/js/downloads.js b/site/js/downloads.js
index be6c29c..49ceb36 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -74,10 +74,8 @@ function onPackageSelect() {

 function onVersionSelect() {
   var versionSelect = document.getElementById("sparkVersionSelect");
   var packageSelect = document.getElementById("sparkPackageSelect");
-  var verifyLink = document.getElementById("sparkDownloadVerify");

   empty(packageSelect);
-  empty(verifyLink);

   var version = getSelectedValue(versionSelect);
   var packages = releases[version]["packages"];
@@ -88,10 +86,6 @@ function onVersionSelect() {
     append(packageSelect, option);
   }

-  var href = "https://archive.apache.org/dist/spark/spark-" + version + "/";
-  var link = "<a href=\"" + href + "\">" + versionShort(version) + " signatures and checksums</a>";
-  append(verifyLink, link);
-
   // Populate releases
   updateDownloadLink(releases[version].mirrored);
 }

@@ -100,32 +94,32 @@ function updateDownloadLink(isMirrored) {
   var versionSelect =
[GitHub] [spark-website] srowen closed pull request #184: Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
srowen closed pull request #184: Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
URL: https://github.com/apache/spark-website/pull/184
[spark] branch master updated: [SPARK-27016][SQL][BUILD] Treat all antlr warnings as errors while generating parser from the sql grammar file.
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 04ad559  [SPARK-27016][SQL][BUILD] Treat all antlr warnings as errors while generating parser from the sql grammar file.
04ad559 is described below

commit 04ad559ab64df19ddfdf9e6fe0de3f8484daf86e
Author: Dilip Biswal
AuthorDate: Sun Mar 3 10:02:25 2019 -0600

    [SPARK-27016][SQL][BUILD] Treat all antlr warnings as errors while generating parser from the sql grammar file.

    ## What changes were proposed in this pull request?

    Use the maven plugin option `treatWarningsAsErrors` to make sure warnings are treated as errors while generating the parser file. In its absence, we may inadvertently introduce problems while making grammar changes. Please refer to [PR-23897](https://github.com/apache/spark/pull/23897) for more context.

    ## How was this patch tested?

    Spark can be built in two ways: 1) sbt and 2) Maven. This PR configures the maven antlr plugin to include a parameter that makes antlr4 report an error on warning. However, when Spark is built using sbt, we use the sbt-antlr plugin, which does not allow us to pass this additional compilation flag. More info on the sbt-antlr plugin can be found at [link](https://github.com/ihji/sbt-antlr4/blob/master/src/main/scala/com/simplytyped/Antlr4Plugin.scala). In summary, this fix is only applicable when we use maven to build.

    Closes #23925 from dilipbiswal/antlr_fix.

    Authored-by: Dilip Biswal
    Signed-off-by: Sean Owen
---
 sql/catalyst/pom.xml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/catalyst/pom.xml b/sql/catalyst/pom.xml
index 20cc5d0..15fc5e3 100644
--- a/sql/catalyst/pom.xml
+++ b/sql/catalyst/pom.xml
@@ -156,6 +156,7 @@
           <configuration>
             <visitor>true</visitor>
            <sourceDirectory>../catalyst/src/main/antlr4</sourceDirectory>
+            <treatWarningsAsErrors>true</treatWarningsAsErrors>
           </configuration>
[GitHub] [spark-website] srowen commented on issue #184: Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
srowen commented on issue #184: Link to sigs on www.apache.org; provide direct link to release's sigs, checksums
URL: https://github.com/apache/spark-website/pull/184#issuecomment-469034747

   Merged. The download page now has separate direct links for sigs and checksum files for the selected release.