svn commit: r28019 - in /dev/spark/2.4.0-SNAPSHOT-2018_07_10_00_01-4984f1a-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Tue Jul 10 07:16:39 2018
New Revision: 28019

Log: Apache Spark 2.4.0-SNAPSHOT-2018_07_10_00_01-4984f1a docs

[This commit notification would consist of 1467 parts, which exceeds the limit of 50, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-24706][SQL] ByteType and ShortType support pushdown to parquet
Repository: spark
Updated Branches: refs/heads/master 4984f1af7 -> a28900956

[SPARK-24706][SQL] ByteType and ShortType support pushdown to parquet

## What changes were proposed in this pull request?

`ByteType` and `ShortType` support pushdown to the Parquet data source. [Benchmark result](https://issues.apache.org/jira/browse/SPARK-24706?focusedCommentId=16528878&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16528878).

## How was this patch tested?

Unit tests.

Author: Yuming Wang

Closes #21682 from wangyum/SPARK-24706.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2890095
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2890095
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2890095
Branch: refs/heads/master
Commit: a289009567c1566a1df4bcdfdf0111e82ae3d81d
Parents: 4984f1a
Author: Yuming Wang
Authored: Tue Jul 10 15:58:14 2018 +0800
Committer: Wenchen Fan
Committed: Tue Jul 10 15:58:14 2018 +0800

--
 .../FilterPushdownBenchmark-results.txt      | 32 +--
 .../datasources/parquet/ParquetFilters.scala | 34 +++-
 .../parquet/ParquetFilterSuite.scala         | 56
 3 files changed, 94 insertions(+), 28 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/a2890095/sql/core/benchmarks/FilterPushdownBenchmark-results.txt
--
diff --git a/sql/core/benchmarks/FilterPushdownBenchmark-results.txt b/sql/core/benchmarks/FilterPushdownBenchmark-results.txt
index 29fe434..110669b 100644
--- a/sql/core/benchmarks/FilterPushdownBenchmark-results.txt
+++ b/sql/core/benchmarks/FilterPushdownBenchmark-results.txt
@@ -542,39 +542,39 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                  3726 / 3775    4.2  236.9   1.0X
-Parquet Vectorized (Pushdown)       3741 / 3789    4.2  237.9   1.0X
-Native ORC Vectorized               2793 / 2909    5.6  177.6   1.3X
-Native ORC Vectorized (Pushdown)     530 / 561    29.7   33.7   7.0X
+Parquet Vectorized                  3461 / 3997    4.5  220.1   1.0X
+Parquet Vectorized (Pushdown)        270 / 315    58.4   17.1  12.8X
+Native ORC Vectorized               4107 / 5372    3.8  261.1   0.8X
+Native ORC Vectorized (Pushdown)     778 / 1553   20.2   49.5   4.4X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
 Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

 Select 10% tinyint rows (value < CAST(12 AS tinyint)): Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                  4385 / 4406    3.6  278.8   1.0X
-Parquet Vectorized (Pushdown)       4398 / 4454    3.6  279.6   1.0X
-Native ORC Vectorized               3420 / 3501    4.6  217.4   1.3X
-Native ORC Vectorized (Pushdown)    1395 / 1432   11.3   88.7   3.1X
+Parquet Vectorized                  4771 / 6655    3.3  303.3   1.0X
+Parquet Vectorized (Pushdown)       1322 / 1606   11.9   84.0   3.6X
+Native ORC Vectorized               4437 / 4572    3.5  282.1   1.1X
+Native ORC Vectorized (Pushdown)    1781 / 1976    8.8  113.2   2.7X

 Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
 Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

 Select 50% tinyint rows (value < CAST(63 AS tinyint)): Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                  7307 / 7394    2.2  464.6   1.0X
-Parquet Vectorized (Pushdown)       7411 / 7461    2.1  471.2   1.0X
-Native ORC Vectorized               6501 / 7814    2.4  413.4   1.1X
-Native ORC Vectorized (Pushdown)    7341 / 8637    2.1  466.7   1.0X
+Parquet Vectorized                  7433 / 7752    2.1  472.6   1.0X
+Parquet Vectorized (Pushdown)       5863 / 5913    2.7  372.8
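The speedups in the benchmark above come from row-group skipping: Parquet physically stores tinyint/smallint columns as INT32, so a predicate on a `ByteType`/`ShortType` column can be widened to an int comparison against each row group's min/max statistics. The following is only an illustrative sketch of that idea (not Spark's `ParquetFilters` code); the function name and operator set are invented for the example.

```python
# Illustrative sketch (NOT Spark's actual code): deciding whether a Parquet row
# group can be skipped for a pushed-down predicate `column <op> value`, given the
# row group's min/max statistics. ByteType/ShortType values are widened to INT32
# before comparison, which is why their pushdown is possible at all.

def can_skip_row_group(stats_min: int, stats_max: int, op: str, value: int) -> bool:
    """Return True if no row in the group can satisfy `column <op> value`."""
    if op == "=":
        return value < stats_min or value > stats_max
    if op == "<":
        return stats_min >= value
    if op == ">":
        return stats_max <= value
    raise ValueError(f"unsupported op: {op}")

# A tinyint predicate `value = CAST(63 AS tinyint)`, widened to INT32:
# a row group whose byte column spans [-128, 50] can be skipped entirely.
print(can_skip_row_group(-128, 50, "=", 63))  # True
```

With pushdown disabled, every row group must be decoded and filtered row by row, which matches the near-identical "Vectorized" vs. "Vectorized (Pushdown)" timings in the old (`-`) benchmark rows.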
svn commit: r28023 - in /dev/spark/2.4.0-SNAPSHOT-2018_07_10_04_01-a289009-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Tue Jul 10 11:17:52 2018
New Revision: 28023

Log: Apache Spark 2.4.0-SNAPSHOT-2018_07_10_04_01-a289009 docs

[This commit notification would consist of 1467 parts, which exceeds the limit of 50, so it was shortened to the summary.]
spark git commit: [SPARK-24678][SPARK-STREAMING] Give priority in use of 'PROCESS_LOCAL' for spark-streaming
Repository: spark
Updated Branches: refs/heads/master a28900956 -> 6fe32869c

[SPARK-24678][SPARK-STREAMING] Give priority in use of 'PROCESS_LOCAL' for spark-streaming

## What changes were proposed in this pull request?

Currently, `BlockRDD.getPreferredLocations` only returns the host info of blocks, so the subsequent scheduling locality level can be no better than 'NODE_LOCAL'. With a small change, the locality level can be improved to 'PROCESS_LOCAL'.

## How was this patch tested?

Manual test.

Author: sharkdtu

Closes #21658 from sharkdtu/master.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fe32869
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fe32869
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fe32869
Branch: refs/heads/master
Commit: 6fe32869ccb17933e77a4dbe883e36d382fbeeec
Parents: a289009
Author: sharkdtu
Authored: Tue Jul 10 20:18:34 2018 +0800
Committer: jerryshao
Committed: Tue Jul 10 20:18:34 2018 +0800

--
 .../src/main/scala/org/apache/spark/rdd/BlockRDD.scala |  2 +-
 .../scala/org/apache/spark/storage/BlockManager.scala  |  7 +--
 .../org/apache/spark/storage/BlockManagerSuite.scala   | 13 +
 3 files changed, 19 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6fe32869/core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala b/core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala
index 4e036c2..23cf19d 100644
--- a/core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala
@@ -30,7 +30,7 @@ private[spark] class BlockRDD[T: ClassTag](sc: SparkContext, @transient val blockIds: Array[BlockId])
   extends RDD[T](sc, Nil) {

-  @transient lazy val _locations = BlockManager.blockIdsToHosts(blockIds, SparkEnv.get)
+  @transient lazy val _locations = BlockManager.blockIdsToLocations(blockIds, SparkEnv.get)

   @volatile private var _isValid = true

   override def getPartitions: Array[Partition] = {

http://git-wip-us.apache.org/repos/asf/spark/blob/6fe32869/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index df1a4be..0e1c7d5 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -45,6 +45,7 @@ import org.apache.spark.network.netty.SparkTransportConf
 import org.apache.spark.network.shuffle.{ExternalShuffleClient, TempFileManager}
 import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo
 import org.apache.spark.rpc.RpcEnv
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
 import org.apache.spark.serializer.{SerializerInstance, SerializerManager}
 import org.apache.spark.shuffle.ShuffleManager
 import org.apache.spark.storage.memory._
@@ -1554,7 +1555,7 @@ private[spark] class BlockManager(
 private[spark] object BlockManager {
   private val ID_GENERATOR = new IdGenerator

-  def blockIdsToHosts(
+  def blockIdsToLocations(
       blockIds: Array[BlockId],
       env: SparkEnv,
       blockManagerMaster: BlockManagerMaster = null): Map[BlockId, Seq[String]] = {
@@ -1569,7 +1570,9 @@ private[spark] object BlockManager {
     val blockManagers = new HashMap[BlockId, Seq[String]]
     for (i <- 0 until blockIds.length) {
-      blockManagers(blockIds(i)) = blockLocations(i).map(_.host)
+      blockManagers(blockIds(i)) = blockLocations(i).map { loc =>
+        ExecutorCacheTaskLocation(loc.host, loc.executorId).toString
+      }
     }
     blockManagers.toMap
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/6fe32869/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index b19d8eb..08172f0 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -1422,6 +1422,19 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(mockBlockTransferService.tempFileManager === store.remoteBlockTempFileManager)
   }

+  test("query locations of blockIds") {
+    val mockBlockManagerMaster = mock(classOf[BlockManagerMaster])
+    val blockLocations = Seq(BlockManagerId("1", "hos
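The key change above is that block locations are now encoded at the executor level rather than as bare hosts, so the scheduler can recognize that a task can run in the same JVM that caches the block. A minimal sketch of the transformation (not Spark's API; the string format mirrors what `ExecutorCacheTaskLocation.toString` produces):

```python
# Illustrative sketch: encode block locations as executor-level task location
# strings ("executor_<host>_<executorId>") instead of bare hosts, so the
# scheduler can achieve PROCESS_LOCAL rather than only NODE_LOCAL.

def block_ids_to_locations(block_locations):
    """Map each block id to executor-level preferred-location strings.

    block_locations: dict of block_id -> list of (host, executor_id) pairs.
    """
    return {
        block_id: [f"executor_{host}_{executor_id}" for host, executor_id in locs]
        for block_id, locs in block_locations.items()
    }

locations = block_ids_to_locations({"input-0-1": [("host1", "1"), ("host2", "2")]})
print(locations["input-0-1"])  # ['executor_host1_1', 'executor_host2_2']
```

A bare host string can only tell the scheduler "run on this node"; the executor-qualified form additionally says "run in this executor process", which is what unlocks the higher locality level.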
spark git commit: [SPARK-21743][SQL][FOLLOWUP] free aggregate map when task ends
Repository: spark
Updated Branches: refs/heads/master 6fe32869c -> e0559f238

[SPARK-21743][SQL][FOLLOWUP] free aggregate map when task ends

## What changes were proposed in this pull request?

This is the first follow-up of https://github.com/apache/spark/pull/21573, which was only merged to 2.3. This PR fixes the memory leak in another way: free the `UnsafeExternalMap` when the task ends. All the data buffers in Spark SQL use `UnsafeExternalMap` and `UnsafeExternalSorter` under the hood, e.g. sort, aggregate, window, SMJ, etc. `UnsafeExternalSorter` registers a task completion listener to free its resources; we should apply the same thing to `UnsafeExternalMap`. TODO in the next PR: do not consume all the inputs when there is a limit in whole-stage codegen.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan

Closes #21738 from cloud-fan/limit.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e0559f23
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e0559f23
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e0559f23
Branch: refs/heads/master
Commit: e0559f238009e02c40f65678fec691c07904e8c0
Parents: 6fe3286
Author: Wenchen Fan
Authored: Tue Jul 10 23:07:10 2018 +0800
Committer: Wenchen Fan
Committed: Tue Jul 10 23:07:10 2018 +0800

--
 .../UnsafeFixedWidthAggregationMap.java          | 17 +++-
 .../spark/sql/execution/SparkStrategies.scala    |  7 +--
 .../execution/aggregate/HashAggregateExec.scala  |  2 +-
 .../aggregate/TungstenAggregationIterator.scala  |  2 +-
 .../UnsafeFixedWidthAggregationMapSuite.scala    | 21
 5 files changed, 28 insertions(+), 21 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e0559f23/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java
--
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java
index c7c4c7b..c8cf44b 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeFixedWidthAggregationMap.java
@@ -20,8 +20,8 @@ package org.apache.spark.sql.execution;

 import java.io.IOException;

 import org.apache.spark.SparkEnv;
+import org.apache.spark.TaskContext;
 import org.apache.spark.internal.config.package$;
-import org.apache.spark.memory.TaskMemoryManager;
 import org.apache.spark.sql.catalyst.InternalRow;
 import org.apache.spark.sql.catalyst.expressions.UnsafeProjection;
 import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
@@ -82,7 +82,7 @@ public final class UnsafeFixedWidthAggregationMap {
    * @param emptyAggregationBuffer the default value for new keys (a "zero" of the agg. function)
    * @param aggregationBufferSchema the schema of the aggregation buffer, used for row conversion.
    * @param groupingKeySchema the schema of the grouping key, used for row conversion.
-   * @param taskMemoryManager the memory manager used to allocate our Unsafe memory structures.
+   * @param taskContext the current task context.
    * @param initialCapacity the initial capacity of the map (a sizing hint to avoid re-hashing).
    * @param pageSizeBytes the data page size, in bytes; limits the maximum record size.
    */
@@ -90,19 +90,26 @@ public final class UnsafeFixedWidthAggregationMap {
       InternalRow emptyAggregationBuffer,
       StructType aggregationBufferSchema,
       StructType groupingKeySchema,
-      TaskMemoryManager taskMemoryManager,
+      TaskContext taskContext,
       int initialCapacity,
       long pageSizeBytes) {
     this.aggregationBufferSchema = aggregationBufferSchema;
     this.currentAggregationBuffer = new UnsafeRow(aggregationBufferSchema.length());
     this.groupingKeyProjection = UnsafeProjection.create(groupingKeySchema);
     this.groupingKeySchema = groupingKeySchema;
-    this.map =
-      new BytesToBytesMap(taskMemoryManager, initialCapacity, pageSizeBytes, true);
+    this.map = new BytesToBytesMap(
+      taskContext.taskMemoryManager(), initialCapacity, pageSizeBytes, true);
     // Initialize the buffer for aggregation value
     final UnsafeProjection valueProjection = UnsafeProjection.create(aggregationBufferSchema);
     this.emptyAggregationBuffer = valueProjection.apply(emptyAggregationBuffer).getBytes();
+
+    // Register a cleanup task with TaskContext to ensure that memory is guaranteed to be freed at
+    // the end of the task. This is necessary to avoid memory leaks in when the downstream operator
+    // does not fully consume the aggregation m
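The pattern this patch applies is worth making concrete: a resource registers a completion listener with the task context at construction time, so cleanup runs even when a downstream operator (e.g. LIMIT) stops consuming output early. A minimal, self-contained sketch of that pattern follows (the class and method names are illustrative, not Spark's actual API):

```python
# Illustrative sketch of the task-completion-listener cleanup pattern
# (names invented for the example; this is not Spark's TaskContext API).

class TaskContext:
    def __init__(self):
        self._listeners = []

    def add_task_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_task_completed(self):
        # Invoked by the task runner at end-of-task, whether or not
        # downstream operators consumed all output.
        for fn in self._listeners:
            fn()

class AggregationMap:
    def __init__(self, task_context):
        self.freed = False
        # Guarantee cleanup at end-of-task, mirroring what
        # UnsafeExternalSorter already does for its own memory.
        task_context.add_task_completion_listener(self.free)

    def free(self):
        self.freed = True

ctx = TaskContext()
agg_map = AggregationMap(ctx)
ctx.mark_task_completed()  # downstream stopped early; memory is still freed
print(agg_map.freed)  # True
```

Without the listener, the map's memory would only be freed when its consumer drains it fully, which is exactly the leak this follow-up closes.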
svn commit: r28035 - in /dev/spark/2.4.0-SNAPSHOT-2018_07_10_08_01-6fe3286-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Tue Jul 10 15:16:05 2018
New Revision: 28035

Log: Apache Spark 2.4.0-SNAPSHOT-2018_07_10_08_01-6fe3286 docs

[This commit notification would consist of 1467 parts, which exceeds the limit of 50, so it was shortened to the summary.]
spark git commit: [SPARK-24662][SQL][SS] Support limit in structured streaming
Repository: spark
Updated Branches: refs/heads/master e0559f238 -> 32cb50835

[SPARK-24662][SQL][SS] Support limit in structured streaming

## What changes were proposed in this pull request?

Support the LIMIT operator in structured streaming. For streams in append or complete output mode, a stream with a LIMIT operator will return no more than the specified number of rows. LIMIT is still unsupported for the update output mode. This change reverts https://github.com/apache/spark/commit/e4fee395ecd93ad4579d9afbf0861f82a303e563 as part of it, because it is a better and more complete implementation.

## How was this patch tested?

New and existing unit tests.

Author: Mukul Murthy

Closes #21662 from mukulmurthy/SPARK-24662.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32cb5083
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32cb5083
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32cb5083
Branch: refs/heads/master
Commit: 32cb50835e7258625afff562939872be002232f2
Parents: e0559f2
Author: Mukul Murthy
Authored: Tue Jul 10 11:08:04 2018 -0700
Committer: Tathagata Das
Committed: Tue Jul 10 11:08:04 2018 -0700

--
 .../analysis/UnsupportedOperationChecker.scala  |   6 +-
 .../spark/sql/execution/SparkStrategies.scala   |  26 -
 .../streaming/IncrementalExecution.scala        |  11 +-
 .../streaming/StreamingGlobalLimitExec.scala    | 102 ++
 .../spark/sql/execution/streaming/memory.scala  |  70 ++--
 .../execution/streaming/sources/memoryV2.scala  |  44 ++--
 .../spark/sql/streaming/DataStreamWriter.scala  |   4 +-
 .../execution/streaming/MemorySinkSuite.scala   |  62 +--
 .../execution/streaming/MemorySinkV2Suite.scala |  80 +-
 .../spark/sql/streaming/StreamSuite.scala       | 108 +++
 .../apache/spark/sql/streaming/StreamTest.scala |   4 +-
 11 files changed, 272 insertions(+), 245 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/32cb5083/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
index 5ced1ca..f68df5d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
@@ -315,8 +315,10 @@ object UnsupportedOperationChecker {
         case GroupingSets(_, _, child, _) if child.isStreaming =>
           throwError("GroupingSets is not supported on streaming DataFrames/Datasets")

-        case GlobalLimit(_, _) | LocalLimit(_, _) if subPlan.children.forall(_.isStreaming) =>
-          throwError("Limits are not supported on streaming DataFrames/Datasets")
+        case GlobalLimit(_, _) | LocalLimit(_, _)
+            if subPlan.children.forall(_.isStreaming) && outputMode == InternalOutputModes.Update =>
+          throwError("Limits are not supported on streaming DataFrames/Datasets in Update " +
+            "output mode")

         case Sort(_, _, _) if !containsCompleteData(subPlan) =>
           throwError("Sorting is not supported on streaming DataFrames/Datasets, unless it is on " +

http://git-wip-us.apache.org/repos/asf/spark/blob/32cb5083/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
index cfbcb9a..02e095b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
@@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.planning._
 import org.apache.spark.sql.catalyst.plans._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.plans.physical._
+import org.apache.spark.sql.catalyst.streaming.InternalOutputModes
 import org.apache.spark.sql.execution.columnar.{InMemoryRelation, InMemoryTableScanExec}
 import org.apache.spark.sql.execution.command._
 import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
@@ -34,7 +35,7 @@ import org.apache.spark.sql.execution.joins.{BuildLeft, BuildRight, BuildSide}
 import org.apache.spark.sql.execution.streaming._
 import org.apache.spark.sql.execution.streaming.sources.MemoryPlanV2
 import org.apache.spark.sq
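A streaming global limit differs from a batch limit in that the row count must survive across micro-batches. The following is only a rough sketch of that stateful behavior (invented names; not `StreamingGlobalLimitExec` itself, which keeps the count in the state store):

```python
# Illustrative sketch of a streaming global LIMIT: a running count is carried
# across micro-batches, and each batch emits only the rows remaining under the
# limit. (Names invented for the example; not Spark's actual operator.)

class StreamingGlobalLimit:
    def __init__(self, limit: int):
        self.limit = limit
        self.seen = 0  # in Spark this would be persisted in the state store

    def process_batch(self, rows):
        remaining = max(self.limit - self.seen, 0)
        out = rows[:remaining]
        self.seen += len(out)
        return out

op = StreamingGlobalLimit(limit=3)
print(op.process_batch([1, 2]))     # [1, 2]
print(op.process_batch([3, 4, 5]))  # [3]  -- limit reached
print(op.process_batch([6]))        # []
```

This also suggests why LIMIT stays unsupported in update mode: once rows can be retracted or replaced, "the first N rows" is no longer well defined across batches.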
svn commit: r28045 - in /dev/spark/2.4.0-SNAPSHOT-2018_07_10_12_01-32cb508-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Tue Jul 10 19:16:13 2018
New Revision: 28045

Log: Apache Spark 2.4.0-SNAPSHOT-2018_07_10_12_01-32cb508 docs

[This commit notification would consist of 1467 parts, which exceeds the limit of 50, so it was shortened to the summary.]
spark git commit: [SPARK-24730][SS] Add policy to choose max as global watermark when streaming query has multiple watermarks
Repository: spark
Updated Branches: refs/heads/master 32cb50835 -> 6078b891d

[SPARK-24730][SS] Add policy to choose max as global watermark when streaming query has multiple watermarks

## What changes were proposed in this pull request?

Currently, when a streaming query has multiple watermarks, the policy is to choose the min of them as the global watermark. This is safe, as the global watermark moves with the slowest stream and therefore does not unexpectedly drop some data as late, etc. While this is indeed the safe thing to do, in some cases you may want the watermark to advance with the fastest stream, that is, take the max of the multiple watermarks. This PR adds that configuration. It makes the following changes.

- Adds a configuration to specify max as the policy.
- Saves the configuration in OffsetSeqMetadata, because changing it in the middle of a query can lead to unpredictable results.
- For old checkpoints without the configuration, it assumes the default policy of min (irrespective of the policy set in the session where the query is being restarted). This is to ensure that existing queries are not affected in any way.

TODO
- [ ] Add a test for recovery from existing checkpoints.

## How was this patch tested?

New unit test.

Author: Tathagata Das

Closes #21701 from tdas/SPARK-24730.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6078b891
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6078b891
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6078b891
Branch: refs/heads/master
Commit: 6078b891da8fe7fc36579699473168ae7443284c
Parents: 32cb508
Author: Tathagata Das
Authored: Tue Jul 10 18:03:40 2018 -0700
Committer: Tathagata Das
Committed: Tue Jul 10 18:03:40 2018 -0700

--
 .../org/apache/spark/sql/internal/SQLConf.scala |  15 ++
 .../streaming/MicroBatchExecution.scala         |   4 +-
 .../sql/execution/streaming/OffsetSeq.scala     |  37 -
 .../execution/streaming/WatermarkTracker.scala  |  90 ++--
 .../commits/0                                   |   2 +
 .../commits/1                                   |   2 +
 .../metadata                                    |   1 +
 .../offsets/0                                   |   4 +
 .../offsets/1                                   |   4 +
 .../sql/streaming/EventTimeWatermarkSuite.scala | 136 ++-
 10 files changed, 276 insertions(+), 19 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6078b891/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 50965c1..ae56cc9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -875,6 +875,21 @@ object SQLConf {
     .stringConf
     .createWithDefault("org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol")

+  val STREAMING_MULTIPLE_WATERMARK_POLICY =
+    buildConf("spark.sql.streaming.multipleWatermarkPolicy")
+      .doc("Policy to calculate the global watermark value when there are multiple watermark " +
+        "operators in a streaming query. The default value is 'min' which chooses " +
+        "the minimum watermark reported across multiple operators. Other alternative value is" +
+        "'max' which chooses the maximum across multiple operators." +
+        "Note: This configuration cannot be changed between query restarts from the same " +
+        "checkpoint location.")
+      .stringConf
+      .checkValue(
+        str => Set("min", "max").contains(str.toLowerCase),
+        "Invalid value for 'spark.sql.streaming.multipleWatermarkPolicy'. " +
+          "Valid values are 'min' and 'max'")
+      .createWithDefault("min") // must be same as MultipleWatermarkPolicy.DEFAULT_POLICY_NAME
+
   val OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD =
     buildConf("spark.sql.objectHashAggregate.sortBased.fallbackThreshold")
       .internal()

http://git-wip-us.apache.org/repos/asf/spark/blob/6078b891/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
index 17ffa2a..16651dd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
+++ b/sql/core/src/main/scala/org/apache/s
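The two policies reduce to a simple choice over the watermarks reported by each operator. A minimal sketch of that selection (illustrative only; Spark's `WatermarkTracker` additionally tracks per-operator state across batches):

```python
# Illustrative sketch of the global-watermark policies: 'min' advances with the
# slowest stream (safe: no data is unexpectedly dropped as late), while 'max'
# advances with the fastest stream. (Not Spark's WatermarkTracker code.)

def global_watermark(operator_watermarks, policy="min"):
    """Combine per-operator event-time watermarks (ms) into a global watermark."""
    if policy == "min":
        return min(operator_watermarks)
    if policy == "max":
        return max(operator_watermarks)
    raise ValueError("Valid values are 'min' and 'max'")

watermarks = [1000, 5000]  # two watermark operators in the same query
print(global_watermark(watermarks, "min"))  # 1000
print(global_watermark(watermarks, "max"))  # 5000
```

The policy would be selected via the configuration the patch adds, `spark.sql.streaming.multipleWatermarkPolicy`, which per the commit message cannot be changed between restarts from the same checkpoint.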
spark git commit: [SPARK-24530][PYTHON] Add a control to force Python version in Sphinx via environment variable, SPHINXPYTHON
Repository: spark
Updated Branches: refs/heads/master 6078b891d -> 1f94bf492

[SPARK-24530][PYTHON] Add a control to force Python version in Sphinx via environment variable, SPHINXPYTHON

## What changes were proposed in this pull request?

This PR proposes to add a `SPHINXPYTHON` environment variable to control the Python version used by Sphinx. The motivation for this environment variable is that Sphinx does not seem to render some signatures in the Python documentation properly when Python 2 is used. See the JIRA's case. Using Python 3 should be encouraged, but it looks like we will live with this problem for a long while in any event.

For the default case of `make html`, it keeps the previous behaviour and uses `SPHINXBUILD` as before. If `SPHINXPYTHON` is set, it forces Sphinx to use that specific Python version.

```
$ SPHINXPYTHON=python3 make html
python3 -msphinx -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

1. If `SPHINXPYTHON` is set, use that Python. If `SPHINXBUILD` is set, use sphinx-build.
2. If both are set, `SPHINXBUILD` has a higher priority than `SPHINXPYTHON`.
3. By default, `SPHINXBUILD` is used as 'sphinx-build'.

Probably, we could somehow work around this by explicitly setting `SPHINXBUILD`, but `sphinx-build` can't easily be distinguished, since it (at least in my environment and to my knowledge) is not replaced when a newer Sphinx is installed under a different Python version. It is confusing and does not warn about its Python version.

## How was this patch tested?

Manually tested:

**`python` (Python 2.7) in the path with Sphinx:**

```
$ make html
sphinx-build -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

**`python` (Python 2.7) in the path without Sphinx:**

```
$ make html
Makefile:8: *** The 'sphinx-build' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the 'sphinx-build' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/.  Stop.
```

**`SPHINXPYTHON` set to `python` (Python 2.7) with Sphinx:**

```
$ SPHINXPYTHON=python make html
Makefile:35: *** Note that Python 3 is required to generate PySpark documentation correctly for now. Current Python executable was less than Python 3. See SPARK-24530. To force Sphinx to use a specific Python executable, please set SPHINXPYTHON to point to the Python 3 executable..  Stop.
```

**`SPHINXPYTHON` set to `python` (Python 2.7) without Sphinx:**

```
$ SPHINXPYTHON=python make html
Makefile:35: *** Note that Python 3 is required to generate PySpark documentation correctly for now. Current Python executable was less than Python 3. See SPARK-24530. To force Sphinx to use a specific Python executable, please set SPHINXPYTHON to point to the Python 3 executable..  Stop.
```

**`SPHINXPYTHON` set to `python3` with Sphinx:**

```
$ SPHINXPYTHON=python3 make html
python3 -msphinx -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

**`SPHINXPYTHON` set to `python3` without Sphinx:**

```
$ SPHINXPYTHON=python3 make html
Makefile:39: *** Python executable 'python3' did not have Sphinx installed. Make sure you have Sphinx installed, then set the SPHINXPYTHON environment variable to point to the Python executable having Sphinx installed. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/.  Stop.
```

**`SPHINXBUILD` set:**

```
$ SPHINXBUILD=sphinx-build make html
sphinx-build -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

**Both `SPHINXPYTHON` and `SPHINXBUILD` are set:**

```
$ SPHINXBUILD=sphinx-build SPHINXPYTHON=python make html
sphinx-build -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

Author: hyukjinkwon

Closes #21659 from HyukjinKwon/SPARK-24530.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1f94bf49
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1f94bf49
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1f94bf49
Branch: refs/heads/master
Commit: 1f94bf492c3bce3b61f7fec6132b50e06dea94a8
Parents: 6078b89
Author: hyukjinkwon
Authored: Wed Jul 11 10:10:07 2018 +0800
Committer: hyukjinkwon
Committed: Wed Jul 11 10:10:07 2018 +0800

--
 python/docs/Makefile | 37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/1f94bf49/python/docs/Makefile
--
diff --git a/python/docs/Makefile b/python/docs/Makefile
index b8e0794..1ed1f33 100644
--- a/python/docs/Makefile
+++ b/python/docs/Makefile
@@ -1,19 +1,44 @@ # Ma
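The precedence rules described above can be summarized in a few lines. This is only a sketch of the documented behavior, not the Makefile's actual implementation:

```python
# Illustrative sketch of the SPHINXBUILD / SPHINXPYTHON precedence described in
# the commit message (assumption: simplified from the real python/docs/Makefile):
# SPHINXBUILD wins if both are set; SPHINXPYTHON drives "python -msphinx";
# otherwise the default "sphinx-build" is used.

def choose_sphinx_command(sphinxbuild=None, sphinxpython=None):
    if sphinxbuild:                          # SPHINXBUILD has the higher priority
        return sphinxbuild
    if sphinxpython:                         # fall back to "<python> -msphinx"
        return f"{sphinxpython} -msphinx"
    return "sphinx-build"                    # default

print(choose_sphinx_command(sphinxpython="python3"))        # python3 -msphinx
print(choose_sphinx_command("sphinx-build", "python"))      # sphinx-build
```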
spark git commit: [SPARK-24530][PYTHON] Add a control to force Python version in Sphinx via environment variable, SPHINXPYTHON
Repository: spark
Updated Branches: refs/heads/branch-2.3 72eb97ce9 -> 19542f5de

[SPARK-24530][PYTHON] Add a control to force Python version in Sphinx via environment variable, SPHINXPYTHON

## What changes were proposed in this pull request?

This PR proposes to add a `SPHINXPYTHON` environment variable to control the Python version used by Sphinx. The motivation for this environment variable is that Sphinx does not seem to render some signatures in the Python documentation properly when Python 2 is used. See the JIRA's case. Using Python 3 should be encouraged, but it looks like we will live with this problem for a long while in any event.

For the default case of `make html`, it keeps the previous behaviour and uses `SPHINXBUILD` as before. If `SPHINXPYTHON` is set, it forces Sphinx to use that specific Python version.

```
$ SPHINXPYTHON=python3 make html
python3 -msphinx -b html -d _build/doctrees . _build/html
Running Sphinx v1.7.5
...
```

1. If `SPHINXPYTHON` is set, use that Python. If `SPHINXBUILD` is set, use sphinx-build.
2. If both are set, `SPHINXBUILD` has a higher priority than `SPHINXPYTHON`.
3. By default, `SPHINXBUILD` is used as 'sphinx-build'.

Probably, we could somehow work around this by explicitly setting `SPHINXBUILD`, but `sphinx-build` can't easily be distinguished, since it (at least in my environment and to my knowledge) is not replaced when a newer Sphinx is installed under a different Python version. It is confusing and does not warn about its Python version.

## How was this patch tested?

Manually tested (see the identical list of cases in the master commit above).

Author: hyukjinkwon

Closes #21659 from HyukjinKwon/SPARK-24530.

(cherry picked from commit 1f94bf492c3bce3b61f7fec6132b50e06dea94a8)
Signed-off-by: hyukjinkwon

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/19542f5d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/19542f5d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/19542f5d
Branch: refs/heads/branch-2.3
Commit: 19542f5de390f6096745e478dd9339958db899e8
Parents: 72eb97c
Author: hyukjinkwon
Authored: Wed Jul 11 10:10:07 2018 +0800
Committer: hyukjinkwon
Committed: Wed Jul 11 10:10:29 2018 +0800

--
 python/docs/Makefile | 37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/19542f5d/python/docs/Makefile
--
diff --git a/python/docs/Makefile b/python/docs/Makefile i
spark-website git commit: Update release process to use Python 3 for Python API documentation
Repository: spark-website Updated Branches: refs/heads/asf-site 7b3e459e2 -> a6788714a Update release process to use Python 3 for Python API documentation Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/a6788714 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/a6788714 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/a6788714 Branch: refs/heads/asf-site Commit: a6788714a00098a0dffe491c1c4fc090c1087479 Parents: 7b3e459 Author: hyukjinkwon Authored: Tue Jul 3 00:10:35 2018 +0800 Committer: hyukjinkwon Committed: Wed Jul 11 10:19:16 2018 +0800 -- release-process.md| 1 + site/release-process.html | 1 + 2 files changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/a6788714/release-process.md -- diff --git a/release-process.md b/release-process.md index b2ee62b..de04539 100644 --- a/release-process.md +++ b/release-process.md @@ -86,6 +86,7 @@ Instead much of the same release logic can be accessed in `dev/create-release/re - R, for CRAN packaging tests, requires e1071 to be installed as part of the packaging tests. - In addition R uses LaTeX for some things, and requires some additional fonts. On Debian based systems you may wish to install `texlive-fonts-recommended` and `texlive-fonts-extra`. - Make sure you have the required Python packages for packaging (see `dev/requirements.txt`) +- Ensure you have Python 3 with Sphinx installed, and that the `SPHINXPYTHON` environment variable is set to point to your Python 3 executable (see SPARK-24530). - Tag the release candidate with `dev/create-release/release-tag.sh` (e.g. 
for creating 2.1.2 RC2 we did `ASF_USERNAME=holden ASF_PASSWORD=yoursecretgoeshere GIT_NAME="Holden Karau" GIT_BRANCH=branch-2.1 GIT_EMAIL="hol...@us.ibm.com" RELEASE_VERSION=2.1.2 RELEASE_TAG=v2.1.2-rc2 NEXT_VERSION=2.1.3-SNAPSHOT ./dev/create-release/release-tag.sh`) - Package the release binaries & sources with `dev/create-release/release-build.sh package` - Create the release docs with `dev/create-release/release-build.sh docs` http://git-wip-us.apache.org/repos/asf/spark-website/blob/a6788714/site/release-process.html -- diff --git a/site/release-process.html b/site/release-process.html index b69d1ef..b803a5d 100644 --- a/site/release-process.html +++ b/site/release-process.html @@ -287,6 +287,7 @@ At present the Jenkins jobs SHOULD NOT BE USED as they use a legacy sha R, for CRAN packaging tests, requires e1071 to be installed as part of the packaging tests. In addition R uses LaTeX for some things, and requires some additional fonts. On Debian based systems you may wish to install texlive-fonts-recommended and texlive-fonts-extra. Make sure you have the required Python packages for packaging (see dev/requirements.txt) + Ensure you have Python 3 with Sphinx installed, and that the SPHINXPYTHON environment variable is set to point to your Python 3 executable (see SPARK-24530). Tag the release candidate with dev/create-release/release-tag.sh (e.g. for creating 2.1.2 RC2 we did ASF_USERNAME=holden ASF_PASSWORD=yoursecretgoeshere GIT_NAME="Holden Karau" GIT_BRANCH=branch-2.1 GIT_EMAIL="hol...@us.ibm.com" RELEASE_VERSION=2.1.2 RELEASE_TAG=v2.1.2-rc2 NEXT_VERSION=2.1.3-SNAPSHOT ./dev/create-release/release-tag.sh) Package the release binaries & sources with dev/create-release/release-build.sh package Create the release docs with dev/create-release/release-build.sh docs - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r28048 - in /dev/spark/2.4.0-SNAPSHOT-2018_07_10_20_01-1f94bf4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Jul 11 03:15:48 2018 New Revision: 28048 Log: Apache Spark 2.4.0-SNAPSHOT-2018_07_10_20_01-1f94bf4 docs [This commit notification would consist of 1467 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-24165][SQL] Fixing conditional expressions to handle nullability of nested types
Repository: spark Updated Branches: refs/heads/master 1f94bf492 -> 74a8d6308 [SPARK-24165][SQL] Fixing conditional expressions to handle nullability of nested types ## What changes were proposed in this pull request? This PR proposes a fix for the output data type of ```If``` and ```CaseWhen``` expressions. Until now, the implementation of these expressions ignored the nullability of nested types coming from different execution branches and returned the type of the first branch. This could lead to an unwanted ```NullPointerException``` in other expressions that depend on an ```If```/```CaseWhen``` expression. Example: ``` val rows = new util.ArrayList[Row]() rows.add(Row(true, ("a", 1))) rows.add(Row(false, (null, 2))) val schema = StructType(Seq( StructField("cond", BooleanType, false), StructField("s", StructType(Seq( StructField("val1", StringType, true), StructField("val2", IntegerType, false) )), false) )) val df = spark.createDataFrame(rows, schema) df .select(when('cond, struct(lit("x").as("val1"), lit(10).as("val2"))).otherwise('s) as "res") .select('res.getField("val1")) .show() ``` Exception: ``` Exception in thread "main" java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44) at org.apache.spark.sql.execution.LocalTableScanExec$$anonfun$unsafeRows$1.apply(LocalTableScanExec.scala:44) ... ``` Output schema: ``` root |-- res.val1: string (nullable = false) ``` ## How was this patch tested? New test cases were added to - DataFrameSuite.scala - conditionalExpressions.scala Author: Marek Novotny Closes #21687 from mn-mikke/SPARK-24165. 
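The core of the fix — when reconciling the output types of different branches, a difference that lies only in nullability flags is resolved by taking the more permissive (nullable) flag at every level of the nested type — can be illustrated with a small standalone model. This is a hypothetical Python sketch of the idea behind `findCommonTypeDifferentOnlyInNullFlags`, not Spark's actual Scala implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    dtype: object          # a primitive type name (str) or a Struct
    nullable: bool

@dataclass(frozen=True)
class Struct:
    fields: tuple

def merge_types(t1, t2):
    """Find a common type for t1/t2 that differ only in nullability flags.

    Identical types merge to themselves; structs merge field-by-field,
    OR-ing the nullable flags; any other difference yields None.
    """
    if t1 == t2:
        return t1
    if isinstance(t1, Struct) and isinstance(t2, Struct):
        if len(t1.fields) != len(t2.fields):
            return None
        merged = []
        for f1, f2 in zip(t1.fields, t2.fields):
            if f1.name != f2.name:
                return None
            d = merge_types(f1.dtype, f2.dtype)
            if d is None:
                return None
            merged.append(Field(f1.name, d, f1.nullable or f2.nullable))
        return Struct(tuple(merged))
    return None

# The situation from the PR description: the `when(...)` branch produces a
# struct with a non-nullable val1, while the `otherwise('s)` branch's val1
# is nullable.
branch_then = Struct((Field("val1", "string", False), Field("val2", "int", False)))
branch_else = Struct((Field("val1", "string", True),  Field("val2", "int", False)))
merged = merge_types(branch_then, branch_else)
# merged marks val1 as nullable, so downstream code generation no longer
# assumes val1 can never be null.
```

Returning the merged type instead of the first branch's type is what removes the incorrect `nullable = false` from the output schema.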
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74a8d630 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/74a8d630 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/74a8d630 Branch: refs/heads/master Commit: 74a8d6308bfa6e7ed4c64e1175c77eb3114baed5 Parents: 1f94bf4 Author: Marek Novotny Authored: Wed Jul 11 12:21:03 2018 +0800 Committer: Wenchen Fan Committed: Wed Jul 11 12:21:03 2018 +0800 -- .../sql/catalyst/analysis/TypeCoercion.scala| 22 -- .../sql/catalyst/expressions/Expression.scala | 37 ++- .../expressions/conditionalExpressions.scala| 28 .../ConditionalExpressionSuite.scala| 70 .../org/apache/spark/sql/DataFrameSuite.scala | 58 5 files changed, 195 insertions(+), 20 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/74a8d630/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala index 72908c1..e8331c9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala @@ -173,6 +173,18 @@ object TypeCoercion { } /** + * The method finds a common type for data types that differ only in nullable, containsNull + * and valueContainsNull flags. If the input types are too different, None is returned. + */ + def findCommonTypeDifferentOnlyInNullFlags(t1: DataType, t2: DataType): Option[DataType] = { +if (t1 == t2) { + Some(t1) +} else { + findTypeForComplex(t1, t2, findCommonTypeDifferentOnlyInNullFlags) +} + } + + /** * Case 2 type widening (see the classdoc comment above for TypeCoercion). * * i.e. 
the main difference with [[findTightestCommonType]] is that here we allow some @@ -660,8 +672,8 @@ object TypeCoercion { object CaseWhenCoercion extends TypeCoercionRule { override protected def coerceTypes( plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { - case c: CaseWhen if c.childrenResolved && !c.valueTypesEqual => -val maybeCommonType = findWiderCommonType(c.valueTypes) + case c: CaseWhen if c.childrenResolved && !c.areInputTypesForMergingEqual => +val maybeCommonType = findWiderCommonType(c.inputTypesForMerging) maybeCommonType.map { commonType => var changed = false val newBranches = c.branches.map { case (condition, value) => @@ -693,10 +705,10 @@ object TypeCoercion { plan: LogicalPlan): LogicalPlan = plan transformAllExpress
svn commit: r28049 - in /dev/spark/2.3.3-SNAPSHOT-2018_07_10_22_01-19542f5-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell Date: Wed Jul 11 05:16:16 2018 New Revision: 28049 Log: Apache Spark 2.3.3-SNAPSHOT-2018_07_10_22_01-19542f5 docs [This commit notification would consist of 1443 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
[1/2] spark git commit: Preparing Spark release v2.3.2-rc2
Repository: spark Updated Branches: refs/heads/branch-2.3 19542f5de -> 86457a16d Preparing Spark release v2.3.2-rc2 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/307499e1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/307499e1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/307499e1 Branch: refs/heads/branch-2.3 Commit: 307499e1a99c6ad3ce0b978626894ea2c1e3807e Parents: 19542f5 Author: Saisai Shao Authored: Wed Jul 11 05:27:02 2018 + Committer: Saisai Shao Committed: Wed Jul 11 05:27:02 2018 + -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- 
http://git-wip-us.apache.org/repos/asf/spark/blob/307499e1/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6ec4966..8df2635 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.3 +Version: 2.3.2 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/307499e1/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index f8b15cc..57485fc 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.2 ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/307499e1/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index e412a47..53e58c2 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.2 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/307499e1/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index d8f9a3d..d05647c 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.3-SNAPSHOT +2.3.2 ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/307499e1/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index a1a4f87..8d46761 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml
[spark] Git Push Summary
Repository: spark Updated Tags: refs/tags/v2.3.2-rc2 [created] 307499e1a
[2/2] spark git commit: Preparing development version 2.3.3-SNAPSHOT
Preparing development version 2.3.3-SNAPSHOT Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86457a16 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86457a16 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86457a16 Branch: refs/heads/branch-2.3 Commit: 86457a16de20eb126d0569d12f51b2e427fd03c3 Parents: 307499e Author: Saisai Shao Authored: Wed Jul 11 05:27:12 2018 + Committer: Saisai Shao Committed: Wed Jul 11 05:27:12 2018 + -- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml| 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/86457a16/R/pkg/DESCRIPTION -- diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 
8df2635..6ec4966 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 2.3.2 +Version: 2.3.3 Title: R Frontend for Apache Spark Description: Provides an R Frontend for Apache Spark. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), http://git-wip-us.apache.org/repos/asf/spark/blob/86457a16/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 57485fc..f8b15cc 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.3.2 +2.3.3-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/86457a16/common/kvstore/pom.xml -- diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 53e58c2..e412a47 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2 +2.3.3-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/86457a16/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index d05647c..d8f9a3d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3.2 +2.3.3-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/86457a16/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 8d46761..a1a4f87 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.3
spark git commit: [SPARK-23529][K8S] Support mounting volumes
Repository: spark Updated Branches: refs/heads/master 74a8d6308 -> 5ff1b9ba1 [SPARK-23529][K8S] Support mounting volumes This PR continues #21095 and intersects with #21238. I've added volume mounts as a separate step and added PersistentVolumeClaim support. There is a fundamental problem with how we pass the options through spark conf to fabric8: for each volume type and all possible volume options we would have to implement some custom code to map config values to fabric8 calls. This will result in a big body of code that we would have to support, and it means that Spark will always be somewhat out of sync with k8s. I think there needs to be a discussion on how to proceed correctly (e.g. use PodPreset instead). Due to the complications of provisioning and managing actual resources, this PR addresses only volume mounting of already-present resources. - [x] emptyDir support - [x] Testing - [x] Documentation - [x] KubernetesVolumeUtils tests Author: Andrew Korzhuev Author: madanadit Closes #21260 from andrusha/k8s-vol. 
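The configuration scheme this PR documents — keys of the form `spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path`, `...mount.readOnly`, and `...options.[OptionName]` — can be sketched as a small parser. This is an illustrative Python sketch of the grouping step (the kind of work `KubernetesVolumeUtils` does), with hypothetical function and field names, not the actual Scala code:

```python
def parse_volume_specs(conf, prefix="spark.kubernetes.driver.volumes."):
    """Group spark.kubernetes.*.volumes.* entries into per-volume specs.

    Keys look like:  <prefix><volumeType>.<volumeName>.mount.path
                     <prefix><volumeType>.<volumeName>.mount.readOnly
                     <prefix><volumeType>.<volumeName>.options.<optionName>
    Entries sharing the same (volumeType, volumeName) pair are merged into
    one spec; keys outside the prefix are ignored.
    """
    volumes = {}
    for key, value in conf.items():
        if not key.startswith(prefix):
            continue
        # split off the first two dotted components; the remainder is the
        # mount/options sub-key
        vtype, vname, rest = key[len(prefix):].split(".", 2)
        spec = volumes.setdefault(
            (vtype, vname), {"type": vtype, "name": vname, "options": {}})
        if rest == "mount.path":
            spec["mountPath"] = value
        elif rest == "mount.readOnly":
            spec["readOnly"] = value == "true"
        elif rest.startswith("options."):
            spec["options"][rest[len("options."):]] = value
    return list(volumes.values())

# Hypothetical example configuration (the volume name "checkpointpvc" and
# option "claimName" follow the documented examples):
conf = {
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.path": "/checkpoint",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.readOnly": "false",
    "spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.claimName": "checkpoint-pvc",
    "spark.executor.memory": "4g",  # unrelated keys are ignored
}
specs = parse_volume_specs(conf)
```

Mapping every possible option through such per-key conventions is exactly the maintenance burden the PR description calls out: each new volume type or option needs another branch of mapping code.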
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ff1b9ba Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ff1b9ba Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ff1b9ba Branch: refs/heads/master Commit: 5ff1b9ba1983d5601add62aef64a3e87d07050eb Parents: 74a8d63 Author: Andrew Korzhuev Authored: Tue Jul 10 22:53:44 2018 -0700 Committer: Felix Cheung Committed: Tue Jul 10 22:53:44 2018 -0700 -- docs/running-on-kubernetes.md | 48 +++ .../org/apache/spark/deploy/k8s/Config.scala| 12 ++ .../spark/deploy/k8s/KubernetesConf.scala | 11 ++ .../spark/deploy/k8s/KubernetesUtils.scala | 2 - .../spark/deploy/k8s/KubernetesVolumeSpec.scala | 38 + .../deploy/k8s/KubernetesVolumeUtils.scala | 110 ++ .../k8s/features/BasicDriverFeatureStep.scala | 5 +- .../k8s/features/BasicExecutorFeatureStep.scala | 5 +- .../k8s/features/MountVolumesFeatureStep.scala | 79 ++ .../k8s/submit/KubernetesDriverBuilder.scala| 31 ++-- .../cluster/k8s/KubernetesExecutorBuilder.scala | 38 ++--- .../deploy/k8s/KubernetesVolumeUtilsSuite.scala | 106 ++ .../features/BasicDriverFeatureStepSuite.scala | 23 +-- .../BasicExecutorFeatureStepSuite.scala | 3 + ...rKubernetesCredentialsFeatureStepSuite.scala | 3 + .../DriverServiceFeatureStepSuite.scala | 6 + .../features/EnvSecretsFeatureStepSuite.scala | 1 + .../features/LocalDirsFeatureStepSuite.scala| 3 +- .../features/MountSecretsFeatureStepSuite.scala | 1 + .../features/MountVolumesFeatureStepSuite.scala | 144 +++ .../bindings/JavaDriverFeatureStepSuite.scala | 1 + .../bindings/PythonDriverFeatureStepSuite.scala | 2 + .../spark/deploy/k8s/submit/ClientSuite.scala | 1 + .../submit/KubernetesDriverBuilderSuite.scala | 45 +- .../k8s/KubernetesExecutorBuilderSuite.scala| 38 - 25 files changed, 705 insertions(+), 51 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5ff1b9ba/docs/running-on-kubernetes.md -- diff --git a/docs/running-on-kubernetes.md 
b/docs/running-on-kubernetes.md index 408e446..7149616 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -629,6 +629,54 @@ specific to Spark on Kubernetes. Add as an environment variable to the executor container with name EnvName (case sensitive), the value referenced by key key in the data of the referenced https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables";>Kubernetes Secret. For example, spark.kubernetes.executor.secrets.ENV_VAR=spark-secret:key. + + + spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.path + (none) + + Add the https://kubernetes.io/docs/concepts/storage/volumes/";>Kubernetes Volume named VolumeName of the VolumeType type to the driver pod on the path specified in the value. For example, + spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.path=/checkpoint. + + + + spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].mount.readOnly + (none) + + Specify if the mounted volume is read only or not. For example, + spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.mount.readOnly=false. + + + + spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].options.[OptionName] + (none) + + Configure https://kubernetes.io/docs/concepts/storage/volumes/";>Kubernetes Volume options passed to the Kubernetes with OptionName as key having specified value, must conform with Kubernetes option format. For example, + spark.kubernetes.
spark git commit: [SPARK-23461][R] vignettes should include model predictions for some ML models
Repository: spark Updated Branches: refs/heads/master 5ff1b9ba1 -> 006e798e4 [SPARK-23461][R] vignettes should include model predictions for some ML models ## What changes were proposed in this pull request? Add model predictions for Linear Support Vector Machine (SVM) Classifier, Logistic Regression, GBT, RF and DecisionTree in vignettes. ## How was this patch tested? Manually ran the test and checked the result. Author: Huaxin Gao Closes #21678 from huaxingao/spark-23461. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/006e798e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/006e798e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/006e798e Branch: refs/heads/master Commit: 006e798e477b6871ad3ba4417d354d23f45e4013 Parents: 5ff1b9b Author: Huaxin Gao Authored: Tue Jul 10 23:18:07 2018 -0700 Committer: Felix Cheung Committed: Tue Jul 10 23:18:07 2018 -0700 -- R/pkg/vignettes/sparkr-vignettes.Rmd | 5 + 1 file changed, 5 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/006e798e/R/pkg/vignettes/sparkr-vignettes.Rmd -- diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd index d4713de..68a18ab 100644 --- a/R/pkg/vignettes/sparkr-vignettes.Rmd +++ b/R/pkg/vignettes/sparkr-vignettes.Rmd @@ -590,6 +590,7 @@ summary(model) Predict values on training data ```{r} prediction <- predict(model, training) +head(select(prediction, "Class", "Sex", "Age", "Freq", "Survived", "prediction")) ``` Logistic Regression @@ -613,6 +614,7 @@ summary(model) Predict values on training data ```{r} fitted <- predict(model, training) +head(select(fitted, "Class", "Sex", "Age", "Freq", "Survived", "prediction")) ``` Multinomial logistic regression against three classes @@ -807,6 +809,7 @@ df <- createDataFrame(t) dtModel <- spark.decisionTree(df, Survived ~ ., type = "classification", maxDepth = 2) summary(dtModel) predictions <- predict(dtModel, df) 
+head(select(predictions, "Class", "Sex", "Age", "Freq", "Survived", "prediction")) ``` Gradient-Boosted Trees @@ -822,6 +825,7 @@ df <- createDataFrame(t) gbtModel <- spark.gbt(df, Survived ~ ., type = "classification", maxDepth = 2, maxIter = 2) summary(gbtModel) predictions <- predict(gbtModel, df) +head(select(predictions, "Class", "Sex", "Age", "Freq", "Survived", "prediction")) ``` Random Forest @@ -837,6 +841,7 @@ df <- createDataFrame(t) rfModel <- spark.randomForest(df, Survived ~ ., type = "classification", maxDepth = 2, numTrees = 2) summary(rfModel) predictions <- predict(rfModel, df) +head(select(predictions, "Class", "Sex", "Age", "Freq", "Survived", "prediction")) ``` Bisecting k-Means