date:20190820

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315812795
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2851,6 +2851,9 @@ object SparkContext extends Logging {
   memoryPerSlaveInt, sc.executorMemory))
 }
 
+// for local cluster mode the SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED 
defaults to false
 
 Review comment:
   The comment just repeats the code. The comment should say why this is needed 
/ wanted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315833901
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -358,12 +382,65 @@ final class ShuffleBlockFetcherIterator(
 }
   }
 
+
+  /**
+   * Fetch the host-local blocks while we are fetching remote blocks. This is 
ok because
+   * `ManagedBuffer`'s memory is allocated lazily when we create the input 
stream, so all we
+   * track in-memory are the ManagedBuffer references themselves.
+   */
+  private[this] def fetchHostLocalBlocks() {
+logDebug(s"Start fetching host-local blocks: ${hostLocalBlocks.mkString(", 
")}")
+val hostLocalExecutorIds = 
hostLocalBlocksByExecutor.keySet.map(_.executorId)
+val readsWithoutLocalDir = LinkedHashMap[BlockManagerId, Seq[(BlockId, 
Long)]]()
+val localDirsByExec = 
blockManager.getHostLocalDirs(hostLocalExecutorIds.toArray)
+hostLocalBlocksByExecutor.foreach { case (bmId, blockInfos) =>
+  val localDirs = localDirsByExec.get(bmId.executorId)
+  if (localDirs.isDefined) {
+blockInfos.foreach {  case (blockId, _) =>
+  try {
+val buf = blockManager
+  .getHostLocalShuffleData(blockId.asInstanceOf[ShuffleBlockId], 
localDirs.get)
+shuffleMetrics.incLocalBlocksFetched(1)
+shuffleMetrics.incLocalBytesRead(buf.size)
+buf.retain()
+results.put(SuccessFetchResult(blockId, 
blockManager.blockManagerId,
+  buf.size(), buf, isNetworkReqDone = false))
+  } catch {
+case e: Exception =>
+  // If we see an exception, stop immediately.
+  logError(s"Error occurred while fetching local blocks", e)
+  results.put(FailureFetchResult(blockId, 
blockManager.blockManagerId, e))
+  return
+  }
+}
+  } else {
+readsWithoutLocalDir += bmId -> blockInfos
+  }
+}
+
+if (readsWithoutLocalDir.nonEmpty) {
+  val collectedRemoteRequests = new ArrayBuffer[FetchRequest]
+  readsWithoutLocalDir.foreach( { case (bmId, blockInfos) =>
 
 Review comment:
   `.foreach { case (bmId, blockInfos) =>`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315826558
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##
 @@ -995,6 +1003,19 @@ private[spark] class BlockManager(
 None
   }
 
+  private[spark] def getHostLocalDirs(executorIds: Array[String])
+  : scala.collection.Map[String, Array[String]] = {
+val cachedItems = 
executorIdToLocalDirsCache.filterKeys(executorIds.contains(_))
+if (cachedItems.size < executorIds.length) {
+  val notCachedItems = master
+
.getHostLocalDirs(executorIds.filterNot(executorIdToLocalDirsCache.contains))
+  executorIdToLocalDirsCache ++= notCachedItems
 
 Review comment:
   Isn't this going to grow potentially unbounded?
   
   You added a cache to `BlockManagerMasterEndpoint` but that's something that 
only exists in the driver. So it doesn't look like there's anything limiting 
the number of entries cached in executors, at least directly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315829047
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -74,16 +76,22 @@ final class ShuffleBlockFetcherIterator(
 maxReqSizeShuffleToMem: Long,
 detectCorrupt: Boolean,
 detectCorruptUseExtraMemory: Boolean,
-shuffleMetrics: ShuffleReadMetricsReporter)
+shuffleMetrics: ShuffleReadMetricsReporter,
+enableHostLocalDiskReading: Boolean = true)
   extends Iterator[(BlockId, InputStream)] with DownloadFileManager with 
Logging {
 
   import ShuffleBlockFetcherIterator._
 
+  // Make remote requests at most maxBytesInFlight / 5 in length; the reason 
to keep them
+  // smaller than maxBytesInFlight is to allow multiple, parallel fetches from 
up to 5
+  // nodes, rather than blocking on reading output from one node.
+  val targetRemoteRequestSize = math.max(maxBytesInFlight / 5, 1L)
 
 Review comment:
   `private`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315833073
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
 ##
 @@ -358,12 +382,65 @@ final class ShuffleBlockFetcherIterator(
 }
   }
 
+
+  /**
+   * Fetch the host-local blocks while we are fetching remote blocks. This is 
ok because
+   * `ManagedBuffer`'s memory is allocated lazily when we create the input 
stream, so all we
+   * track in-memory are the ManagedBuffer references themselves.
+   */
+  private[this] def fetchHostLocalBlocks() {
 
 Review comment:
   missing return type


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315831188
 
 

 ##
 File path: project/MimaExcludes.scala
 ##
 @@ -36,6 +36,9 @@ object MimaExcludes {
 
   // Exclude rules for 3.0.x
   lazy val v30excludes = v24excludes ++ Seq(
+// [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched 
from the same host
+
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.LiveEntityHelpers.createMetrics"),
 
 Review comment:
   Is this still needed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315815432
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
 ##
 @@ -995,6 +1003,19 @@ private[spark] class BlockManager(
 None
   }
 
+  private[spark] def getHostLocalDirs(executorIds: Array[String])
+  : scala.collection.Map[String, Array[String]] = {
 
 Review comment:
   Just `Map`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-20 Thread GitBox

vanzin commented on a change in pull request #25299: [SPARK-27651][Core] Avoid 
the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r315851620
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala
 ##
 @@ -96,6 +99,39 @@ class ExternalShuffleServiceSuite extends ShuffleSuite with 
BeforeAndAfterAll wi
 e.getMessage should include ("Fetch failure will not retry stage due to 
testing config")
   }
 
+  test("SPARK-27651: host local disk reading avoids external shuffle service 
on the same node") {
+val confWithHostLocalRead =
+  conf.clone.set(config.SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED, true)
+sc = new SparkContext("local-cluster[2,1,1024]", "test", 
confWithHostLocalRead)
+sc.getConf.get(config.SHUFFLE_HOST_LOCAL_DISK_READING_ENABLED) should 
equal(true)
+sc.env.blockManager.externalShuffleServiceEnabled should equal(true)
+sc.env.blockManager.externalShuffleServiceEnabled should equal(true)
 
 Review comment:
   Duplicate


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25514: [SPARK-28784]StreamExecution 
and StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514#issuecomment-523150908
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dlstadther commented on issue #25513: [MINOR] [DOC] Fix Dockerfile path reference

2019-08-20 Thread GitBox

dlstadther commented on issue #25513: [MINOR] [DOC] Fix Dockerfile path 
reference
URL: https://github.com/apache/spark/pull/25513#issuecomment-523152020
 
 
   @dongjoon-hyun what `Apache Spark 2.4.3` directory are you referring to? The 
source code download from Apache's website includes the same directory 
structure (includes `resource-managers`, but no `kubernetes` directories).
   
   BUT the `running-on-kubernetes.md` file contained in that release does not 
include these two lines.
   
   Thus, it still appears the version in `master` is not up-to-date.
   
   Could you please direct me to where you see this `kubernetes` dir in 2.4.3? 
Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-20 Thread GitBox

SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-523151807
 
 
   **[Test build #109429 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109429/testReport)**
 for PR 25354 at commit 
[`17fe1fd`](https://github.com/apache/spark/commit/17fe1fd4439adf821d0ae0fa5ac8b0387fb0a32a).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

SparkQA commented on issue #25514: [SPARK-28784]StreamExecution and 
StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514#issuecomment-523151801
 
 
   **[Test build #109428 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109428/testReport)**
 for PR 25514 at commit 
[`97bcf5b`](https://github.com/apache/spark/commit/97bcf5b810c4c312b9d8aedd90b0d8abd492c27c).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25354: [SPARK-28612][SQL] Add 
DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-523151011
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25354: [SPARK-28612][SQL] Add 
DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-523151020
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14492/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25514: [SPARK-28784]StreamExecution 
and StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514#issuecomment-523150670
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25354: [SPARK-28612][SQL] Add 
DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-523151020
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14492/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25514: [SPARK-28784]StreamExecution and 
StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514#issuecomment-523150908
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25354: [SPARK-28612][SQL] Add 
DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-523151011
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25514: [SPARK-28784]StreamExecution and 
StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514#issuecomment-523150670
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] shrutig opened a new pull request #25514: [SPARK-28784]StreamExecution and StreamingQueryManager should utilize CheckpointFileManager to interact with checkpoint directories

2019-08-20 Thread GitBox

shrutig opened a new pull request #25514: [SPARK-28784]StreamExecution and 
StreamingQueryManager should utilize CheckpointFileManager to interact with 
checkpoint directories
URL: https://github.com/apache/spark/pull/25514
 
 
   ### What changes were proposed in this pull request?
   After PR https://github.com/apache/spark/pull/21048, the 
CheckpointFileManager interface was created to handle all structured streaming 
checkpointing operations and helps users to choose how they wish to write 
checkpointing files atomically.
   StreamExecution and StreamingQueryManager still uses some FileSystem 
operations without using the CheckpointFileManager.
   For instance,
   
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L137
   
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L392
   
   Instead, StreamExecution and StreamingQueryManager should use 
CheckpointFileManager for these operations.
   
   ### Why are the changes needed?
   This change will allow users to use CheckpointFileManager for structured 
streaming checkpointing files without need for a separate FileSystem 
implementation for the same.
   
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   Existing tests


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] brkyvz commented on a change in pull request #25348: [SPARK-28554][SQL] Adds a v1 fallback writer implementation for v2 data source codepaths

2019-08-20 Thread GitBox

brkyvz commented on a change in pull request #25348: [SPARK-28554][SQL] Adds a 
v1 fallback writer implementation for v2 data source codepaths
URL: https://github.com/apache/spark/pull/25348#discussion_r315851494
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala
 ##
 @@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.SparkException
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Dataset, SaveMode}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.sources.{AlwaysTrue, CreatableRelationProvider, 
Filter, InsertableRelation}
+import org.apache.spark.sql.sources.v2.Table
+import org.apache.spark.sql.sources.v2.writer._
+import org.apache.spark.sql.util.CaseInsensitiveStringMap
+
+/**
+ * Physical plan node for append into a v2 table using V1 write interfaces.
+ *
+ * Rows in the output data set are appended.
+ */
+case class AppendDataExecV1(
+writeBuilder: V1WriteBuilder,
+plan: LogicalPlan) extends V1FallbackWriters {
 
 Review comment:
   I think that'd be great to have when looking at explain plans


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

2019-08-20 Thread GitBox

zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add 
support for Kafka headers in Structured Streaming
URL: https://github.com/apache/spark/pull/22282#discussion_r315845844
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ##
 @@ -433,4 +441,42 @@ private[kafka010] object KafkaOffsetReader {
 StructField("timestamp", TimestampType),
 StructField("timestampType", IntegerType)
   ))
+
+  val schemaWithHeaders = {
+new StructType(schemaWithoutHeaders.fields :+ StructField("headers", 
headersType))
+  }
+
+  def kafkaSchema(includeHeaders: Boolean): StructType = {
+if (includeHeaders) schemaWithHeaders else schemaWithoutHeaders
+  }
+
+  def toInternalRowWithoutHeaders: Record => InternalRow =
+(cr: Record) => InternalRow(
+  cr.key, cr.value, UTF8String.fromString(cr.topic), cr.partition, 
cr.offset,
+  DateTimeUtils.fromJavaTimestamp(new java.sql.Timestamp(cr.timestamp)), 
cr.timestampType.id
+)
+
+  def toInternalRowWithHeaders: Record => InternalRow =
+(cr: Record) => InternalRow(
+  cr.key, cr.value, UTF8String.fromString(cr.topic), cr.partition, 
cr.offset,
+  DateTimeUtils.fromJavaTimestamp(new java.sql.Timestamp(cr.timestamp)), 
cr.timestampType.id,
+  if (cr.headers.iterator().hasNext) {
+new GenericArrayData(cr.headers.iterator().asScala
+  .map(header =>
+InternalRow(UTF8String.fromString(header.key()), header.value())
+  ).toArray)
+  } else {
+null
+  }
+)
+
+  def toUnsafeRowWithoutHeadersProjector: Record => UnsafeRow =
+(cr: Record) => 
UnsafeProjection.create(schemaWithoutHeaders)(toInternalRowWithoutHeaders(cr))
 
 Review comment:
   This will create a `UnsafeProjection` for each record.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

2019-08-20 Thread GitBox

zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add 
support for Kafka headers in Structured Streaming
URL: https://github.com/apache/spark/pull/22282#discussion_r312267713
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ##
 @@ -432,4 +441,42 @@ private[kafka010] object KafkaOffsetReader {
 StructField("timestamp", TimestampType),
 StructField("timestampType", IntegerType)
   ))
+
+  val schemaWithHeaders = {
+new StructType(schemaWithoutHeaders.fields :+ StructField("headers", 
headersType))
+  }
+
+  def kafkaSchema(includeHeaders: Boolean): StructType = {
+if (includeHeaders) schemaWithHeaders else schemaWithoutHeaders
+  }
+
+  def toInternalRowWithoutHeaders: Record => InternalRow =
 
 Review comment:
   This should be a `val`. Otherwise, it will be called once for each record in 
the closure returned by `toUnsafeRowWithoutHeadersProjector`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

2019-08-20 Thread GitBox

zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add 
support for Kafka headers in Structured Streaming
URL: https://github.com/apache/spark/pull/22282#discussion_r312267723
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ##
 @@ -432,4 +441,42 @@ private[kafka010] object KafkaOffsetReader {
 StructField("timestamp", TimestampType),
 StructField("timestampType", IntegerType)
   ))
+
+  val schemaWithHeaders = {
+new StructType(schemaWithoutHeaders.fields :+ StructField("headers", 
headersType))
+  }
+
+  def kafkaSchema(includeHeaders: Boolean): StructType = {
+if (includeHeaders) schemaWithHeaders else schemaWithoutHeaders
+  }
+
+  def toInternalRowWithoutHeaders: Record => InternalRow =
+(cr: Record) => InternalRow(
+  cr.key, cr.value, UTF8String.fromString(cr.topic), cr.partition, 
cr.offset,
+  DateTimeUtils.fromJavaTimestamp(new java.sql.Timestamp(cr.timestamp)), 
cr.timestampType.id
+)
+
+  def toInternalRowWithHeaders: Record => InternalRow =
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

2019-08-20 Thread GitBox

zsxwing commented on a change in pull request #22282: [SPARK-23539][SS] Add 
support for Kafka headers in Structured Streaming
URL: https://github.com/apache/spark/pull/22282#discussion_r315845897
 
 

 ##
 File path: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ##
 @@ -433,4 +441,42 @@ private[kafka010] object KafkaOffsetReader {
 StructField("timestamp", TimestampType),
 StructField("timestampType", IntegerType)
   ))
+
+  val schemaWithHeaders = {
+new StructType(schemaWithoutHeaders.fields :+ StructField("headers", 
headersType))
+  }
+
+  def kafkaSchema(includeHeaders: Boolean): StructType = {
+if (includeHeaders) schemaWithHeaders else schemaWithoutHeaders
+  }
+
+  def toInternalRowWithoutHeaders: Record => InternalRow =
+(cr: Record) => InternalRow(
+  cr.key, cr.value, UTF8String.fromString(cr.topic), cr.partition, 
cr.offset,
+  DateTimeUtils.fromJavaTimestamp(new java.sql.Timestamp(cr.timestamp)), 
cr.timestampType.id
+)
+
+  def toInternalRowWithHeaders: Record => InternalRow =
+(cr: Record) => InternalRow(
+  cr.key, cr.value, UTF8String.fromString(cr.topic), cr.partition, 
cr.offset,
+  DateTimeUtils.fromJavaTimestamp(new java.sql.Timestamp(cr.timestamp)), 
cr.timestampType.id,
+  if (cr.headers.iterator().hasNext) {
+new GenericArrayData(cr.headers.iterator().asScala
+  .map(header =>
+InternalRow(UTF8String.fromString(header.key()), header.value())
+  ).toArray)
+  } else {
+null
+  }
+)
+
+  def toUnsafeRowWithoutHeadersProjector: Record => UnsafeRow =
+(cr: Record) => 
UnsafeProjection.create(schemaWithoutHeaders)(toInternalRowWithoutHeaders(cr))
+
+  def toUnsafeRowWithHeadersProjector: Record => UnsafeRow =
+(cr: Record) => 
UnsafeProjection.create(schemaWithHeaders)(toInternalRowWithHeaders(cr))
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen closed pull request #25513: [MINOR] [DOC] Fix Dockerfile path reference

2019-08-20 Thread GitBox

srowen closed pull request #25513: [MINOR] [DOC] Fix Dockerfile path reference
URL: https://github.com/apache/spark/pull/25513
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on issue #25513: [MINOR] [DOC] Fix Dockerfile path reference

2019-08-20 Thread GitBox

srowen commented on issue #25513: [MINOR] [DOC] Fix Dockerfile path reference
URL: https://github.com/apache/spark/pull/25513#issuecomment-523144518
 
 
   Oh, right, sorry I made the same mistake!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24785: [SPARK-27937][CORE] Revert 
partial logic for auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523143751
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109420/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24785: [SPARK-27937][CORE] Revert 
partial logic for auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523143747
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24785: [SPARK-27937][CORE] Revert partial 
logic for auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523143751
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109420/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24785: [SPARK-27937][CORE] Revert partial 
logic for auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523143747
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #24785: [SPARK-27937][CORE] Revert partial 
logic for auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523077607
 
 
   **[Test build #109420 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109420/testReport)**
 for PR 24785 at commit 
[`25240e5`](https://github.com/apache/spark/commit/25240e5c9d9a5a6825e4bbfac53267e75df8f52f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24785: [SPARK-27937][CORE] Revert partial logic for auto namespace discovery

2019-08-20 Thread GitBox

SparkQA commented on issue #24785: [SPARK-27937][CORE] Revert partial logic for 
auto namespace discovery
URL: https://github.com/apache/spark/pull/24785#issuecomment-523142938
 
 
   **[Test build #109420 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109420/testReport)**
 for PR 24785 at commit 
[`25240e5`](https://github.com/apache/spark/commit/25240e5c9d9a5a6825e4bbfac53267e75df8f52f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523132993
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14491/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523132989
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523132993
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14491/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523132989
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523131216
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109427/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] ueshin commented on a change in pull request #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

2019-08-20 Thread GitBox

ueshin commented on a change in pull request #24232: [SPARK-27297] [SQL] Add 
higher order functions to scala API
URL: https://github.com/apache/spark/pull/24232#discussion_r315829924
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
 ##
 @@ -1917,19 +1921,33 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSparkSession {
   null
 ).toDF("i")
 
+// transform(i, x -> x + 1)
+val resA = Seq(
+  Row(Seq(2, 10, 9, 8)),
+  Row(Seq(6, 9, 10, 8, 3)),
+  Row(Seq.empty),
+  Row(null))
+
+// transform(i, (x, i) -> x + i)
+val resB = Seq(
+  Row(Seq(1, 10, 10, 10)),
+  Row(Seq(5, 9, 11, 10, 6)),
+  Row(Seq.empty),
+  Row(null))
+
 def testArrayOfPrimitiveTypeNotContainsNull(): Unit = {
-  checkAnswer(df.selectExpr("transform(i, x -> x + 1)"),
-Seq(
-  Row(Seq(2, 10, 9, 8)),
-  Row(Seq(6, 9, 10, 8, 3)),
-  Row(Seq.empty),
-  Row(null)))
-  checkAnswer(df.selectExpr("transform(i, (x, i) -> x + i)"),
-Seq(
-  Row(Seq(1, 10, 10, 10)),
-  Row(Seq(5, 9, 11, 10, 6)),
-  Row(Seq.empty),
-  Row(null)))
+  checkAnswer(df.selectExpr("transform(i, x -> x + 1)"), resA)
+  checkAnswer(df.selectExpr("transform(i, (x, i) -> x + i)"), resB)
+
+  checkAnswer(df.select(transform(col("i"), x => x + 1)), resA)
+  checkAnswer(df.select(transform(col("i"), (x, i) => x + i)), resB)
+
+  checkAnswer(df.select(transform(col("i"), new JFunc {
+def call(x: Column) = x + 1
+  })), resA)
+  checkAnswer(df.select(transform(col("i"), new JFunc2 {
+def call(x: Column, i: Column) = x + i
+  })), resB)
 
 Review comment:
   Could you move Java tests to `JavaDataFrameSuite` or create 
`JavaDataFrameFunctionsSuite`?
   
   Seems like Java compiler claims `error: reference to transform is ambiguous` 
because there are two overloaded functions `def transform(column: Column, f: 
Column => Column): Column` and `def transform(column: Column, f: 
JavaFunction[Column, Column])`, and the same for the others.
   
   I guess Java compiler can handle `Column => Column` so we don't need the 
overloaded function with `JavaFunction[Column, Column]`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523131203
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523130550
 
 
   **[Test build #109427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109427/testReport)**
 for PR 25507 at commit 
[`6f02ddc`](https://github.com/apache/spark/commit/6f02ddc718a4ec5513d482237e0a1f694796e379).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523131203
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523131216
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109427/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523131193
 
 
   **[Test build #109427 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109427/testReport)**
 for PR 25507 at commit 
[`6f02ddc`](https://github.com/apache/spark/commit/6f02ddc718a4ec5513d482237e0a1f694796e379).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class RpcAbortException(message: String) extends Exception(message)`
 * `class CatalogManager(conf: SQLConf) extends Logging `
 * `case class ReuseAdaptiveSubquery(`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523130550
 
 
   **[Test build #109427 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109427/testReport)**
 for PR 25507 at commit 
[`6f02ddc`](https://github.com/apache/spark/commit/6f02ddc718a4ec5513d482237e0a1f694796e379).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523129733
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109418/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523129728
 
 
   Build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523129733
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109418/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523129728
 
 
   Build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #25507: [SPARK-28667][SQL] Support 
InsertInto through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523077563
 
 
   **[Test build #109418 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109418/testReport)**
 for PR 25507 at commit 
[`a911ee3`](https://github.com/apache/spark/commit/a911ee3e609c0888fc4a7e96dd8aa9cb0d649ca5).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog

2019-08-20 Thread GitBox

SparkQA commented on issue #25507: [SPARK-28667][SQL] Support InsertInto 
through the V2SessionCatalog 
URL: https://github.com/apache/spark/pull/25507#issuecomment-523129172
 
 
   **[Test build #109418 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109418/testReport)**
 for PR 25507 at commit 
[`a911ee3`](https://github.com/apache/spark/commit/a911ee3e609c0888fc4a7e96dd8aa9cb0d649ca5).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523123057
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109415/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523123049
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523123057
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109415/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523123049
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523031483
 
 
   **[Test build #109415 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109415/testReport)**
 for PR 24715 at commit 
[`cc32b48`](https://github.com/apache/spark/commit/cc32b486a02abe0f1f077cf15765006906071dae).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

SparkQA commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523122339
 
 
   **[Test build #109415 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109415/testReport)**
 for PR 24715 at commit 
[`cc32b48`](https://github.com/apache/spark/commit/cc32b486a02abe0f1f077cf15765006906071dae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge 
the two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#discussion_r315820103
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -38,8 +37,7 @@ class DataSourceV2SQLSuite extends QueryTest with 
SharedSparkSession with Before
 
   import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
 
-  private val orc2 = classOf[OrcDataSourceV2].getName
-  private val v2Source = classOf[FakeV2Provider].getName
+  private val v2Format = classOf[FakeV2Provider].getName
 
 Review comment:
   Why rename `v2Source` to `v2Format`? That introduces several unnecessary 
changes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge 
the two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#discussion_r315819485
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -347,9 +346,16 @@ class DataSourceV2SQLSuite extends QueryTest with 
SharedSparkSession with Before
 
   test("ReplaceTableAsSelect: CREATE OR REPLACE new table has same behavior as 
CTAS.") {
 Seq("testcat", "testcat_atomic").foreach { catalogName =>
-  spark.sql(s"CREATE TABLE $catalogName.created USING $orc2 AS SELECT id, 
data FROM source")
   spark.sql(
-s"CREATE OR REPLACE TABLE $catalogName.replaced USING $orc2 AS SELECT 
id, data FROM source")
+s"""
+   |CREATE TABLE $catalogName.created USING $v2Format
+   |AS SELECT id, data FROM source
+ """.stripMargin)
+  spark.sql(
+s"""
+   |CREATE TABLE $catalogName.replaced USING $v2Format
 
 Review comment:
   This changed the SQL. The original query was `CREATE OR REPLACE`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge 
the two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#discussion_r315816400
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
 ##
 @@ -251,64 +251,62 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
 
 assertNotBucketed("save")
 
-val session = df.sparkSession
-val cls = DataSource.lookupDataSource(source, session.sessionState.conf)
-val canUseV2 = canUseV2Source(session, cls) && partitioningColumns.isEmpty
-
-// In Data Source V2 project, partitioning is still under development.
-// Here we fallback to V1 if partitioning columns are specified.
-// TODO(SPARK-26778): use V2 implementations when partitioning feature is 
supported.
-if (canUseV2) {
-  val provider = 
cls.getConstructor().newInstance().asInstanceOf[TableProvider]
-  val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
-provider, session.sessionState.conf)
-  val options = sessionOptions ++ extraOptions
-  val dsOptions = new CaseInsensitiveStringMap(options.asJava)
-
-  import 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
-  provider.getTable(dsOptions) match {
-// TODO (SPARK-27815): To not break existing tests, here we treat file 
source as a special
-// case, and pass the save mode to file source directly. This hack 
should be removed.
-case table: FileTable =>
-  val write = 
table.newWriteBuilder(dsOptions).asInstanceOf[FileWriteBuilder]
-.mode(modeForDSV1) // should not change default mode for file 
source.
-.withQueryId(UUID.randomUUID().toString)
-.withInputDataSchema(df.logicalPlan.schema)
-.buildForBatch()
-  // The returned `Write` can be null, which indicates that we can 
skip writing.
-  if (write != null) {
+lookupV2Provider() match {
+  // TODO(SPARK-26778): use V2 implementations when partition columns are 
specified
+  case Some(provider) if partitioningColumns.isEmpty =>
+saveToV2Source(provider)
+
+  case _ => saveToV1Source()
+}
+  }
+
+  private def saveToV2Source(provider: TableProvider): Unit = {
+val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+  provider, df.sparkSession.sessionState.conf)
+val options = sessionOptions ++ extraOptions
+val dsOptions = new CaseInsensitiveStringMap(options.asJava)
+
+import 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
+provider.getTable(dsOptions) match {
+  // TODO (SPARK-27815): To not break existing tests, here we treat file 
source as a special
+  // case, and pass the save mode to file source directly. This hack 
should be removed.
+  case table: FileTable =>
 
 Review comment:
   Is this case still needed, now that `lookupV2Provider` will not return 
`FileDataSourceV2`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

rdblue commented on a change in pull request #25465: [SPARK-28747][SQL] merge 
the two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#discussion_r315814833
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
 ##
 @@ -251,64 +251,62 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
 
 assertNotBucketed("save")
 
-val session = df.sparkSession
-val cls = DataSource.lookupDataSource(source, session.sessionState.conf)
-val canUseV2 = canUseV2Source(session, cls) && partitioningColumns.isEmpty
-
-// In Data Source V2 project, partitioning is still under development.
-// Here we fallback to V1 if partitioning columns are specified.
-// TODO(SPARK-26778): use V2 implementations when partitioning feature is 
supported.
-if (canUseV2) {
-  val provider = 
cls.getConstructor().newInstance().asInstanceOf[TableProvider]
-  val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
-provider, session.sessionState.conf)
-  val options = sessionOptions ++ extraOptions
-  val dsOptions = new CaseInsensitiveStringMap(options.asJava)
-
-  import 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
-  provider.getTable(dsOptions) match {
-// TODO (SPARK-27815): To not break existing tests, here we treat file 
source as a special
-// case, and pass the save mode to file source directly. This hack 
should be removed.
-case table: FileTable =>
-  val write = 
table.newWriteBuilder(dsOptions).asInstanceOf[FileWriteBuilder]
-.mode(modeForDSV1) // should not change default mode for file 
source.
-.withQueryId(UUID.randomUUID().toString)
-.withInputDataSchema(df.logicalPlan.schema)
-.buildForBatch()
-  // The returned `Write` can be null, which indicates that we can 
skip writing.
-  if (write != null) {
+lookupV2Provider() match {
+  // TODO(SPARK-26778): use V2 implementations when partition columns are 
specified
+  case Some(provider) if partitioningColumns.isEmpty =>
+saveToV2Source(provider)
+
+  case _ => saveToV1Source()
+}
+  }
+
+  private def saveToV2Source(provider: TableProvider): Unit = {
+val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+  provider, df.sparkSession.sessionState.conf)
+val options = sessionOptions ++ extraOptions
+val dsOptions = new CaseInsensitiveStringMap(options.asJava)
+
+import 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Implicits._
+provider.getTable(dsOptions) match {
+  // TODO (SPARK-27815): To not break existing tests, here we treat file 
source as a special
+  // case, and pass the save mode to file source directly. This hack 
should be removed.
+  case table: FileTable =>
+val write = 
table.newWriteBuilder(dsOptions).asInstanceOf[FileWriteBuilder]
+  .mode(modeForDSV1) // should not change default mode for file source.
+  .withQueryId(UUID.randomUUID().toString)
+  .withInputDataSchema(df.logicalPlan.schema)
+  .buildForBatch()
+// The returned `Write` can be null, which indicates that we can skip 
writing.
+if (write != null) {
+  runCommand(df.sparkSession, "save") {
+WriteToDataSourceV2(write, df.logicalPlan)
+  }
+}
+
+  case table: SupportsWrite if table.supports(BATCH_WRITE) =>
+lazy val relation = DataSourceV2Relation.create(table, dsOptions)
+modeForDSV2 match {
+  case SaveMode.Append =>
 runCommand(df.sparkSession, "save") {
-  WriteToDataSourceV2(write, df.logicalPlan)
+  AppendData.byName(relation, df.logicalPlan)
 }
-  }
 
-case table: SupportsWrite if table.supports(BATCH_WRITE) =>
-  lazy val relation = DataSourceV2Relation.create(table, dsOptions)
-  modeForDSV2 match {
-case SaveMode.Append =>
-  runCommand(df.sparkSession, "save") {
-AppendData.byName(relation, df.logicalPlan)
-  }
-
-case SaveMode.Overwrite if table.supportsAny(TRUNCATE, 
OVERWRITE_BY_FILTER) =>
-  // truncate the table
-  runCommand(df.sparkSession, "save") {
-OverwriteByExpression.byName(relation, df.logicalPlan, 
Literal(true))
-  }
-
-case other =>
-  throw new AnalysisException(s"TableProvider implementation 
$source cannot be " +
-s"written with $other mode, please use Append or Overwrite " +
-"modes instead.")
-  }
+  case SaveMode.Overwrite if table.supportsAny(TRUNCATE, 
OVERWRITE_BY_FILTER) =>
+// truncate the table
+runCommand(df.sparkSession, "save") {
+  OverwriteByExpression.byName(relation,

[GitHub] [spark] viirya commented on a change in pull request #25499: [SPARK-28774] [SQL] Fix exchange reuse for columnar data

2019-08-20 Thread GitBox

viirya commented on a change in pull request #25499: [SPARK-28774] [SQL] Fix 
exchange reuse for columnar data
URL: https://github.com/apache/spark/pull/25499#discussion_r315813283
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -105,6 +120,17 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSparkSession {
 assert(exchange5 sameResult exchange4)
   }
 
+  test ("Columnar exchange works") {
 
 Review comment:
   Remove space after `test`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #25499: [SPARK-28774] [SQL] Fix exchange reuse for columnar data

2019-08-20 Thread GitBox

viirya commented on a change in pull request #25499: [SPARK-28774] [SQL] Fix 
exchange reuse for columnar data
URL: https://github.com/apache/spark/pull/25499#discussion_r315813454
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala
 ##
 @@ -105,6 +120,17 @@ class ExchangeSuite extends SparkPlanTest with 
SharedSparkSession {
 assert(exchange5 sameResult exchange4)
   }
 
+  test ("Columnar exchange works") {
+val df = spark.range(10)
+val plan = df.queryExecution.executedPlan
+assert(plan sameResult plan)
 
 Review comment:
   looks like a redundant assert?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523110224
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523110230
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109413/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523110224
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523110230
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109413/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #24715: [SPARK-25474][SQL] Data source 
tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523010743
 
 
   **[Test build #109413 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109413/testReport)**
 for PR 24715 at commit 
[`1e1378d`](https://github.com/apache/spark/commit/1e1378d4e1542e9bbd73d419e79ba4470365b7ec).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-20 Thread GitBox

SparkQA commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-523109554
 
 
   **[Test build #109413 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109413/testReport)**
 for PR 24715 at commit 
[`1e1378d`](https://github.com/apache/spark/commit/1e1378d4e1542e9bbd73d419e79ba4470365b7ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class DetermineTableStats(session: SparkSession) extends 
Rule[LogicalPlan] `


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the 
two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523108004
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the 
two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523108017
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14490/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315804479
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ##
 @@ -682,6 +699,28 @@ abstract class BaseSubqueryExec extends SparkPlan {
   override def outputPartitioning: Partitioning = child.outputPartitioning
 
   override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  override def generateTreeString(
+depth: Int,
 
 Review comment:
   @cloud-fan oops.. some how need to set up my ide to catch this :-).. will 
change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data 
source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523108004
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data 
source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523108017
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14490/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315804272
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala
 ##
 @@ -19,6 +19,8 @@ package org.apache.spark.sql.execution.aggregate
 
 import java.util.concurrent.TimeUnit._
 
+import scala.collection.mutable
 
 Review comment:
   @cloud-fan will remove.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315804244
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
 ##
 @@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression}
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+
+object ExplainUtils {
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generate the plan -> operator id map.
+   *   2. Generate the plan -> codegen id map
+   *   3. Generate the two part explain output for this plan.
+   *  1. First part explains the operator tree with each operator tagged 
with an unique
+   * identifier.
+   *  2. Second part explans each operator in a verbose manner.
+   *
+   * Note : This function skips over subqueries. They are handled by its 
caller.
+   */
+  private def processPlanSkippingSubqueries[T <: QueryPlan[T]](
+  plan: => QueryPlan[T],
+  append: String => Unit,
+  startOperatorID: Int): Int = {
+
+// ReusedSubqueryExecs are skipped over
+if (plan.isInstanceOf[BaseSubqueryExec]) {
+  return startOperatorID
+}
+
+val operationIDs = new mutable.ArrayBuffer[(Int, QueryPlan[_])]()
+var currentOperatorID = startOperatorID
+try {
+  currentOperatorID = generateOperatorIDs(plan, currentOperatorID, 
operationIDs)
+  generateWholeStageCodegenIdMap(plan)
+
+  QueryPlan.append(
+plan,
+append,
+verbose = false,
+addSuffix = false,
+printOperatorId = true)
+
+  append("\n")
+  var i: Integer = 0
+  for ((opId, curPlan) <- operationIDs) {
+append(curPlan.verboseStringWithOperatorId())
+  }
+} catch {
+  case e: AnalysisException => append(e.toString)
+}
+currentOperatorID
+  }
+
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generates the explain output for the input plan excluding the 
subquery plans.
+   *   2. Generates the explain output for each subquery referenced in the 
plan.
+   */
+  def processPlan[T <: QueryPlan[T]](
+  plan: => QueryPlan[T],
+  append: String => Unit): Unit = {
+try {
+  val subqueries = ArrayBuffer.empty[(SparkPlan, Expression, SparkPlan)]
+  var currentOperatorID = 0
+  currentOperatorID = processPlanSkippingSubqueries(plan, append, 
currentOperatorID)
+  getSubqueries(plan, subqueries)
+  var i = 0
+
+  // Process all the subqueries in the plan.
+  for (sub <- subqueries) {
+if (i == 0) {
+  append("\n= Subqueries =\n\n")
+}
+i = i + 1
+append(s"Subquery:$i Hosting operator id = " +
+  s"${getOpId(sub._1)} Hosting Expression = ${sub._2}\n")
+
+// For each subquery expression in the parent plan, process its child 
plan to compute
+// the explain output.
+currentOperatorID = processPlanSkippingSubqueries(
+  sub._3,
+  append,
+  currentOperatorID)
+append("\n")
+  }
+} finally {
+  removeTags(plan)
+}
+  }
+
+  /**
+   * Traverses the supplied input plan in a bottem-up fashion to produce the 
following two maps :
+   *1. operator -> operator identifier
+   *2. operator identifier -> operator
+   * Note :
+   *1. Operator such as WholeStageCodegenExec and InputAdapter are skipped 
as they don't
+   *   appear in the explain output.
+   *2. operator identifier starts at startIdx + 1
+   */
+  private def generateOperatorIDs(
+  plan: QueryPlan[_],
+  startOperatorID: Int,
+  operatorIDs: mutable.ArrayBuffer[(Int, QueryPlan[_])]): Int = {
+var currentOperationID = startOperatorID
+// Skip the subqueries as they are not printed as part of main query block.
+if

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315803314
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
 ##
 @@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression}
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+
+object ExplainUtils {
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generate the plan -> operator id map.
+   *   2. Generate the plan -> codegen id map
+   *   3. Generate the two part explain output for this plan.
+   *  1. First part explains the operator tree with each operator tagged 
with an unique
+   * identifier.
+   *  2. Second part explans each operator in a verbose manner.
+   *
+   * Note : This function skips over subqueries. They are handled by its 
caller.
+   */
+  private def processPlanSkippingSubqueries[T <: QueryPlan[T]](
+  plan: => QueryPlan[T],
+  append: String => Unit,
+  startOperatorID: Int): Int = {
 
 Review comment:
   @cloud-fan OK.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315803402
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
 ##
 @@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression}
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+
+object ExplainUtils {
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generate the plan -> operator id map.
+   *   2. Generate the plan -> codegen id map
+   *   3. Generate the two part explain output for this plan.
+   *  1. First part explains the operator tree with each operator tagged 
with an unique
+   * identifier.
+   *  2. Second part explans each operator in a verbose manner.
+   *
+   * Note : This function skips over subqueries. They are handled by its 
caller.
+   */
+  private def processPlanSkippingSubqueries[T <: QueryPlan[T]](
+  plan: => QueryPlan[T],
+  append: String => Unit,
+  startOperatorID: Int): Int = {
+
+// ReusedSubqueryExecs are skipped over
+if (plan.isInstanceOf[BaseSubqueryExec]) {
+  return startOperatorID
+}
+
+val operationIDs = new mutable.ArrayBuffer[(Int, QueryPlan[_])]()
+var currentOperatorID = startOperatorID
+try {
+  currentOperatorID = generateOperatorIDs(plan, currentOperatorID, 
operationIDs)
+  generateWholeStageCodegenIdMap(plan)
+
+  QueryPlan.append(
+plan,
+append,
+verbose = false,
+addSuffix = false,
+printOperatorId = true)
+
+  append("\n")
+  var i: Integer = 0
+  for ((opId, curPlan) <- operationIDs) {
+append(curPlan.verboseStringWithOperatorId())
+  }
+} catch {
+  case e: AnalysisException => append(e.toString)
+}
+currentOperatorID
+  }
+
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generates the explain output for the input plan excluding the 
subquery plans.
+   *   2. Generates the explain output for each subquery referenced in the 
plan.
+   */
+  def processPlan[T <: QueryPlan[T]](
+  plan: => QueryPlan[T],
+  append: String => Unit): Unit = {
+try {
+  val subqueries = ArrayBuffer.empty[(SparkPlan, Expression, SparkPlan)]
+  var currentOperatorID = 0
+  currentOperatorID = processPlanSkippingSubqueries(plan, append, 
currentOperatorID)
+  getSubqueries(plan, subqueries)
+  var i = 0
+
+  // Process all the subqueries in the plan.
+  for (sub <- subqueries) {
+if (i == 0) {
+  append("\n= Subqueries =\n\n")
+}
+i = i + 1
+append(s"Subquery:$i Hosting operator id = " +
+  s"${getOpId(sub._1)} Hosting Expression = ${sub._2}\n")
+
+// For each subquery expression in the parent plan, process its child 
plan to compute
+// the explain output.
+currentOperatorID = processPlanSkippingSubqueries(
+  sub._3,
+  append,
+  currentOperatorID)
+append("\n")
+  }
+} finally {
+  removeTags(plan)
+}
+  }
+
+  /**
+   * Traverses the supplied input plan in a bottem-up fashion to produce the 
following two maps :
+   *1. operator -> operator identifier
+   *2. operator identifier -> operator
+   * Note :
+   *1. Operator such as WholeStageCodegenExec and InputAdapter are skipped 
as they don't
+   *   appear in the explain output.
+   *2. operator identifier starts at startIdx + 1
+   */
+  private def generateOperatorIDs(
+  plan: QueryPlan[_],
+  startOperatorID: Int,
+  operatorIDs: mutable.ArrayBuffer[(Int, QueryPlan[_])]): Int = {
+var currentOperationID = startOperatorID
+// Skip the subqueries as they are not printed as part of main query block.
+if

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315803233
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
 ##
 @@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression}
+import org.apache.spark.sql.catalyst.plans.QueryPlan
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
+
+object ExplainUtils {
+  /**
+   * Given a input physical plan, performs the following tasks.
+   *   1. Generate the plan -> operator id map.
+   *   2. Generate the plan -> codegen id map
+   *   3. Generate the two part explain output for this plan.
+   *  1. First part explains the operator tree with each operator tagged 
with an unique
+   * identifier.
+   *  2. Second part explans each operator in a verbose manner.
+   *
+   * Note : This function skips over subqueries. They are handled by its 
caller.
+   */
+  private def processPlanSkippingSubqueries[T <: QueryPlan[T]](
 
 Review comment:
   @cloud-fan No.. don't need it.. i probably copied from somewhere :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315802736
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
 ##
 @@ -273,6 +288,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] 
extends TreeNode[PlanT
 }
 
 object QueryPlan extends PredicateHelper {
+  val opidTag = TreeNodeTag[Int]("operatorId")
 
 Review comment:
   @cloud-fan Will change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315802822
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
 ##
 @@ -273,6 +288,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] 
extends TreeNode[PlanT
 }
 
 object QueryPlan extends PredicateHelper {
+  val opidTag = TreeNodeTag[Int]("operatorId")
+  val codegenTag = new TreeNodeTag[Int]("wholeStageCodegenId")
 
 Review comment:
   @cloud-fan Will change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command

2019-08-20 Thread GitBox

dilipbiswal commented on a change in pull request #24759: 
[SPARK-27395][SQL][WIP] Improve EXPLAIN command
URL: https://github.com/apache/spark/pull/24759#discussion_r315803003
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1512,6 +1512,16 @@ object SQLConf {
 "When this conf is not set, the value from 
`spark.redaction.string.regex` is used.")
   .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN)
 
+  val SQL_EXPLAIN_LEGACY_FORMAT =
 
 Review comment:
   @cloud-fan Sounds good. Will remove the property.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

SparkQA commented on issue #25465: [SPARK-28747][SQL] merge the two data source 
v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523105687
 
 
   **[Test build #109426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109426/testReport)**
 for PR 25465 at commit 
[`f93a8eb`](https://github.com/apache/spark/commit/f93a8eb609ebe12c7bbda2539c4574b1c9702050).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25136: [SPARK-28322][SQL] Add support 
to Decimal type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523101240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109411/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #25136: [SPARK-28322][SQL] Add support to 
Decimal type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523001086
 
 
   **[Test build #109411 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109411/testReport)**
 for PR 25136 at commit 
[`8caf4a9`](https://github.com/apache/spark/commit/8caf4a95376623bd6f1e3e999c49be69e726852e).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25136: [SPARK-28322][SQL] Add support 
to Decimal type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523101230
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25136: [SPARK-28322][SQL] Add support to 
Decimal type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523101240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109411/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

SparkQA removed a comment on issue #25465: [SPARK-28747][SQL] merge the two 
data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523004203
 
 
   **[Test build #109412 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109412/testReport)**
 for PR 25465 at commit 
[`830550d`](https://github.com/apache/spark/commit/830550d944ab47d0d7b7ff02228c11bfbc39658c).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25136: [SPARK-28322][SQL] Add support to 
Decimal type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523101230
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the 
two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523100802
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109412/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #25465: [SPARK-28747][SQL] merge the 
two data source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523100793
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] rdblue commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-20 Thread GitBox

rdblue commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to 
track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-523101070
 
 
   @cloud-fan, sorry for the delay reviewing this. +1 from me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data 
source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523100802
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109412/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #25465: [SPARK-28747][SQL] merge the two data 
source v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523100793
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25136: [SPARK-28322][SQL] Add support to Decimal type for integral divide

2019-08-20 Thread GitBox

SparkQA commented on issue #25136: [SPARK-28322][SQL] Add support to Decimal 
type for integral divide
URL: https://github.com/apache/spark/pull/25136#issuecomment-523100439
 
 
   **[Test build #109411 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109411/testReport)**
 for PR 25136 at commit 
[`8caf4a9`](https://github.com/apache/spark/commit/8caf4a95376623bd6f1e3e999c49be69e726852e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #25465: [SPARK-28747][SQL] merge the two data source v2 fallback configs

2019-08-20 Thread GitBox

SparkQA commented on issue #25465: [SPARK-28747][SQL] merge the two data source 
v2 fallback configs
URL: https://github.com/apache/spark/pull/25465#issuecomment-523100075
 
 
   **[Test build #109412 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109412/testReport)**
 for PR 25465 at commit 
[`830550d`](https://github.com/apache/spark/commit/830550d944ab47d0d7b7ff02228c11bfbc39658c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs

2019-08-20 Thread GitBox

SparkQA commented on issue #24981: [SPARK-27463][PYTHON] Support Dataframe 
Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#issuecomment-523099461
 
 
   **[Test build #109425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109425/testReport)**
 for PR 24981 at commit 
[`dd1ffaf`](https://github.com/apache/spark/commit/dd1ffaf86f5d1a44fd812540b405e2eb8a21e6cf).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24981: [SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#issuecomment-523098659
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs

2019-08-20 Thread GitBox

AmplabJenkins removed a comment on issue #24981: [SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#issuecomment-523098665
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14489/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #24981: [SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs

2019-08-20 Thread GitBox

AmplabJenkins commented on issue #24981: [SPARK-27463][PYTHON] Support 
Dataframe Cogroup via Pandas UDFs
URL: https://github.com/apache/spark/pull/24981#issuecomment-523098665
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14489/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 8 9 10 >

501 - 600 of 1119 matches

Mail list logo