[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658900416










[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r455238390



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala
##
@@ -508,6 +548,9 @@ object JsonBenchmark extends SqlBasedBenchmark {
   jsonInDS(50 * 1000 * 1000, numIters)
   jsonInFile(50 * 1000 * 1000, numIters)
   datetimeBenchmark(rowsNum = 10 * 1000 * 1000, numIters)
+  // Benchmark pushdown filters that refer to top-level columns.
+  // TODO: Add benchmarks for filters with nested column attributes.

Review comment:
   I created the sub-task https://issues.apache.org/jira/browse/SPARK-32325








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-658914488










[GitHub] [spark] AmplabJenkins commented on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658914448










[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-658914488










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658914448










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [WIP][SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-658914370










[GitHub] [spark] AmplabJenkins commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29126:
URL: https://github.com/apache/spark/pull/29126#issuecomment-658914030


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


cloud-fan commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455237329



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
##
@@ -715,7 +715,8 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
   accumUpdates: Array[(Long, Seq[AccumulatorV2[_, _]])],
   blockManagerId: BlockManagerId,
   executorUpdates: Map[(Int, Int), ExecutorMetrics]): Boolean = true
-  override def executorDecommission(executorId: String): Unit = {}
+  override def executorDecommission(executorId: String,

Review comment:
   ditto: indentation

##
File path: 
core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala
##
@@ -90,7 +90,8 @@ private class DummyTaskScheduler extends TaskScheduler {
   override def notifyPartitionCompletion(stageId: Int, partitionId: Int): Unit 
= {}
   override def setDAGScheduler(dagScheduler: DAGScheduler): Unit = {}
   override def defaultParallelism(): Int = 2
-  override def executorDecommission(executorId: String): Unit = {}
+  override def executorDecommission(executorId: String,

Review comment:
   ditto








[GitHub] [spark] AmplabJenkins commented on pull request #29101: [WIP][SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-658914370










[GitHub] [spark] cloud-fan commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


cloud-fan commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455237815



##
File path: core/src/main/scala/org/apache/spark/scheduler/DecommissionInfo.scala
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+/**
+ * Provides more detail about a decommissioning event.
+ * @param message Human readable reason for why the decommissioning is happening.
+ * @param isWorkerDecommissioned Whether the worker is being decommissioned too.
+ *   Used to know if the shuffle data might be lost too.
+ */
+private[spark]
+case class DecommissionInfo(message: String, isWorkerDecommissioned: Boolean)

Review comment:
   so this PR is just a refactor and doesn't actually use the 
`isWorkerDecommissioned` flag?








[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


aokolnychyi commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658932378


   @dongjoon-hyun @viirya @hvanhovell @maropu, what do you think?






[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-658940338










[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r455259788



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/StructFiltersSuite.scala
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.sources.{AlwaysFalse, AlwaysTrue, Filter}
+import org.apache.spark.sql.types.{IntegerType, StructType}
+import org.apache.spark.unsafe.types.UTF8String
+
+abstract class StructFiltersSuite extends SparkFunSuite {
+
+  def createFilters(filters: Seq[sources.Filter], schema: StructType): 
StructFilters

Review comment:
   You are mixing two things: scope and what must be implemented in child classes. `protected` doesn't indicate that a method must be implemented in a child class, because it can have an implementation in the parent class.
   
   > You had better change your point of view to become a committer.
   
   Thank you, now I know what blocks me ;-)
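
   A minimal Scala sketch of that distinction, with made-up `Parent`/`Child` names (not code from this PR): abstractness, not the `protected` modifier, is what forces a child class to provide an implementation.

   ```
   abstract class Parent {
     // Abstract member: every concrete subclass must implement it,
     // regardless of whether it is public or protected.
     protected def createFilters(): Seq[String]

     // Protected member with a default body: subclasses may override it,
     // but are not required to.
     protected def defaultFilters(): Seq[String] = Seq.empty
   }

   class Child extends Parent {
     override protected def createFilters(): Seq[String] = Seq("a", "b")
   }
   ```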








[GitHub] [spark] venkata91 commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-07-15 Thread GitBox


venkata91 commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658957953


   > yes we have a test in TaskSchedulerImplSuite that checks to make sure it 
aborted, but I don't think it covers when dynamic allocation on, so it doesn't 
hit your new code. So we would want to add a test where it can't acquire a new 
executor and aborts.
   
   I think the new test I added is just a duplicate. Do you think it's better to just add the config that enables dynamic allocation to the other test itself, in order to avoid the duplication?






[GitHub] [spark] dongjoon-hyun commented on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658958309


   It would be great if you mention that in the PR title and PR description. 
Otherwise, the PR title is misleading.
   > By suppressing it,






[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


MaxGekk commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-658966760


   jenkins, retest this, please






[GitHub] [spark] dongjoon-hyun commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658966540


   Hi, @jiangxb1987, could you explicitly ping someone you have in mind, like I did at https://github.com/apache/spark/pull/28708#issuecomment-658965320 ?
   > Please wait for a couple of days (maybe until the end of this week ?) to 
allow other committers to review and post +1, thanks!
   
   Thanks!






[GitHub] [spark] tgravescs commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-07-15 Thread GitBox


tgravescs commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658974291


   You can make a common function that holds most of the code and gets called from 2 separate tests: one test runs with dynamic allocation on, the other with it off. That will reduce the code duplication.
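
   A rough sketch of that structure, with hypothetical suite and helper names rather than the actual `TaskSchedulerImplSuite` code: the shared body takes the dynamic-allocation setting as a parameter, and the two tests differ only in the flag they pass.

   ```
   import org.apache.spark.{SparkConf, SparkFunSuite}

   class UnschedulableTaskAbortSuite extends SparkFunSuite {

     // Shared body: builds a conf with the given dynamic-allocation setting,
     // then (in the real test) excludes the only executor and asserts that
     // the unschedulable task set is aborted.
     private def testUnschedulableTaskSetAborts(dynamicAllocation: Boolean): Unit = {
       val conf = new SparkConf()
         .setMaster("local")
         .setAppName("test")
         .set("spark.dynamicAllocation.enabled", dynamicAllocation.toString)
       // ... scheduler setup and assertions shared by both tests ...
     }

     test("unschedulable task set aborts with dynamic allocation off") {
       testUnschedulableTaskSetAborts(dynamicAllocation = false)
     }

     test("unschedulable task set aborts with dynamic allocation on") {
       testUnschedulableTaskSetAborts(dynamicAllocation = true)
     }
   }
   ```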






[GitHub] [spark] SparkQA commented on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


SparkQA commented on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-658983163


   **[Test build #125880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125880/testReport)**
 for PR 29123 at commit 
[`45d1e43`](https://github.com/apache/spark/commit/45d1e4341ecab8d5271e17f9ae13072c71c46e32).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-658857296


   **[Test build #125880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125880/testReport)**
 for PR 29123 at commit 
[`45d1e43`](https://github.com/apache/spark/commit/45d1e4341ecab8d5271e17f9ae13072c71c46e32).






[GitHub] [spark] dongjoon-hyun commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658998288


   Hi, @srowen. Your last commit passed the GitHub Actions run. Please see here.
   - 
https://github.com/apache/spark/pull/29111/commits/6390b6c46f5bf35e0c92b140bfbe12f98c35cd8f






[GitHub] [spark] dongjoon-hyun commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658998506


   Also, here.
   ![Screen Shot 2020-07-15 at 1 41 21 
PM](https://user-images.githubusercontent.com/9700541/87593815-e4910e00-c6a0-11ea-9e09-1c8b68fc8ed2.png)
   






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659020484


   **[Test build #125909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125909/testReport)**
 for PR 29032 at commit 
[`090eecd`](https://github.com/apache/spark/commit/090eecd7a9c0293aeb270f154d000da123e602aa).






[GitHub] [spark] AmplabJenkins commented on pull request #29127: [SPARK-32327][SQL] Introduce UnresolvedTableOrPermanentView for commands that support a table and permanent view, but not a temporary v

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29127:
URL: https://github.com/apache/spark/pull/29127#issuecomment-659047837










[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


karuppayya commented on a change in pull request #28804:
URL: https://github.com/apache/spark/pull/28804#discussion_r455402409



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2196,6 +2196,25 @@ object SQLConf {
   .checkValue(bit => bit >= 10 && bit <= 30, "The bit value must be in 
[10, 30].")
   .createWithDefault(16)
 
+  val SKIP_PARTIAL_AGGREGATE_ENABLED =
+buildConf("spark.sql.aggregate.partialaggregate.skip.enabled")
+  .internal()
+  .doc("Avoid sort/spill to disk during partial aggregation")
+  .booleanConf
+  .createWithDefault(true)
+
+  val SKIP_PARTIAL_AGGREGATE_THRESHOLD =
+buildConf("spark.sql.aggregate.partialaggregate.skip.threshold")
+  .internal()
+  .longConf
+  .createWithDefault(10)

Review comment:
   @cloud-fan we skip partial aggregation only when the aggregation was not able to cut down the records by 50% (defined by `spark.sql.aggregate.partialaggregate.skip.ratio`). In this case it will not kick in.
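
   For context, the `skip.ratio` entry mentioned above is not part of the diff shown here, so the following is only a sketch of how such a config could be declared in the same `SQLConf` style (name, wording, and default are assumptions):

   ```
     // Hypothetical declaration mirroring the entries above: partial aggregation
     // is skipped only if it fails to reduce the input rows by this ratio.
     val SKIP_PARTIAL_AGGREGATE_RATIO =
       buildConf("spark.sql.aggregate.partialaggregate.skip.ratio")
         .internal()
         .doc("Minimum fraction by which partial aggregation must reduce the " +
           "number of records; below this ratio the partial aggregate is skipped.")
         .doubleConf
         .createWithDefault(0.5)
   ```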








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-659047829










[GitHub] [spark] AmplabJenkins commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-659047829










[GitHub] [spark] SparkQA removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658857875


   **[Test build #125892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125892/testReport)**
 for PR 29045 at commit 
[`cf68729`](https://github.com/apache/spark/commit/cf6872989fdcb5396357c0e4cd3b3529e1334e6a).






[GitHub] [spark] SparkQA commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


SparkQA commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659047383


   **[Test build #125892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125892/testReport)**
 for PR 29045 at commit 
[`cf68729`](https://github.com/apache/spark/commit/cf6872989fdcb5396357c0e4cd3b3529e1334e6a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658900416










[GitHub] [spark] huaxingao commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

2020-07-15 Thread GitBox


huaxingao commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658901689


   cc @srowen @viirya @zhengruifeng 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-654419342


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


cloud-fan commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-658915498


   ok to test






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-658936909










[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-658936909










[GitHub] [spark] tgravescs commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-07-15 Thread GitBox


tgravescs commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658951229


   yes we have a test in TaskSchedulerImplSuite that checks to make sure it 
aborted, but I don't think it covers when dynamic allocation on, so it doesn't 
hit your new code.  So we would want to add a test where it can't acquire a new 
executor and aborts.






[GitHub] [spark] dongjoon-hyun commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658958999


   Thank you for pinging me, @cloud-fan .






[GitHub] [spark] dongjoon-hyun commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658959140


   Retest this please.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

2020-07-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658958309


   It would be great if you mention `suppressing` in the PR title and PR 
description. Otherwise, the PR title is misleading.






[GitHub] [spark] venkata91 commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's bl

2020-07-15 Thread GitBox


venkata91 commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658977388


   > you can make a common function that has most of the code that gets called 
from 2 separate tests. one test passes with dynamic allocation on, the other 
with it off. that will reduce code duplication.
   
   Never mind, I made some changes to the test so that it goes to the `None` block, where we check whether dynamic allocation is enabled and request accordingly.






[GitHub] [spark] SparkQA commented on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-15 Thread GitBox


SparkQA commented on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-658987615


   **[Test build #125881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125881/testReport)**
 for PR 29120 at commit 
[`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #29120: [SPARK-32291][SQL] COALESCE should not reduce the child parallelism if it contains a Join

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29120:
URL: https://github.com/apache/spark/pull/29120#issuecomment-658857353


   **[Test build #125881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125881/testReport)**
 for PR 29120 at commit 
[`e56f5d4`](https://github.com/apache/spark/commit/e56f5d4936fc8105d672fea5fe8ae441b7de0f2b).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-658995807


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125893/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-659007597


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125896/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29090: [SPARK-32293] Fix inconsistency between Spark memory configs and JVM option

2020-07-15 Thread GitBox


SparkQA commented on pull request #29090:
URL: https://github.com/apache/spark/pull/29090#issuecomment-659008171


   **[Test build #125885 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125885/testReport)**
 for PR 29090 at commit 
[`cc495c1`](https://github.com/apache/spark/commit/cc495c1c45ac0648156b662fdc308287c79f3fdc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-659007765


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658910627


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/30513/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-659007979


   **[Test build #125905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125905/testReport)**
 for PR 28708 at commit 
[`eb43f20`](https://github.com/apache/spark/commit/eb43f2055a38067c63f925526f91d435d7c90aaa).






[GitHub] [spark] viirya commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


viirya commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-659008506


   Jenkins does not seem to be working on this, but the GitHub Actions checks passed.






[GitHub] [spark] tgravescs commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


tgravescs commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455369019



##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##
@@ -0,0 +1,330 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.ExecutorService
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.shuffle.{MigratableResolver, ShuffleBlockInfo}
+import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Class to handle block manager decommissioning retries.
+ * It creates a Thread to retry offloading all RDD cache and Shuffle blocks
+ */
+private[storage] class BlockManagerDecommissioner(
+  conf: SparkConf,

Review comment:
   nit, these should be 4 space indented

##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##
@@ -0,0 +1,330 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.ExecutorService
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.shuffle.{MigratableResolver, ShuffleBlockInfo}
+import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Class to handle block manager decommissioning retries.
+ * It creates a Thread to retry offloading all RDD cache and Shuffle blocks

Review comment:
   This creates a thread per RDD and shuffle block migration, correct? And possibly another pool for the actual migration. I wonder if we can just clarify or generalize this.








[GitHub] [spark] imback82 opened a new pull request #29127: [SPARK-32327][SQL] Introduce UnresolvedTableOrPermanentView for commands that support a table and permanent view, but not a temporary view

2020-07-15 Thread GitBox


imback82 opened a new pull request #29127:
URL: https://github.com/apache/spark/pull/29127


   
   
   ### What changes were proposed in this pull request?
   
   This PR proposes to introduce `UnresolvedTableOrPermanentView` for commands 
that support a table and a permanent view, but not a temporary view as 
discussed here: 
https://github.com/apache/spark/pull/28375#discussion_r416343587.
   
   This new logical plan is now used for `SHOW TBLPROPERTIES`.
   
   ### Why are the changes needed?
   
   There are commands that support both a table and a permanent view, but not a temporary view. Using `UnresolvedTableOrPermanentView` makes it possible for the analyzer to resolve only the relation that's needed for those commands.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes,
   Before:
   ```
   scala> sql("CREATE TEMPORARY VIEW tv TBLPROPERTIES('p1'='v1') AS SELECT 1 AS 
c1")
   res0: org.apache.spark.sql.DataFrame = []
   
   scala> sql("SHOW TBLPROPERTIES tv").show
   +---+-+
   |key|value|
   +---+-+
   +---+-+
   ```
   After:
   ```
   scala> sql("CREATE TEMPORARY VIEW tv TBLPROPERTIES('p1'='v1') AS SELECT 1 AS 
c1")
   res0: org.apache.spark.sql.DataFrame = []
   
   scala> sql("SHOW TBLPROPERTIES tv").show
   org.apache.spark.sql.AnalysisException: tv is a temp view, not a table or 
permanent view.; line 1 pos 0
 at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.$anonfun$applyOrElse$42(Analyzer.scala:863)
 at scala.Option.foreach(Option.scala:407)
   ...
   ```
   
   ### How was this patch tested?
   
   Updated existing tests






[GitHub] [spark] GuoPhilipse commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-15 Thread GitBox


GuoPhilipse commented on a change in pull request #29056:
URL: https://github.com/apache/spark/pull/29056#discussion_r455220561



##
File path: docs/sql-ref-syntax-qry-select-groupby.md
##
@@ -38,6 +38,8 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
 While aggregate functions are defined as
 ```sql
 aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE 
boolean_expression ) ]
+
+[ FIRST | LAST ] ( expression [ IGNORE NULLS ] ) ]

Review comment:
   I just tried; it is not working. Even aggregate functions do not support `FILTER` in v2.4.5. I will test other versions tomorrow.
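
   For reference, a self-contained Scala sketch of the aggregate `FILTER` clause under discussion (Spark 3.0+ accepts this syntax; the table, columns, and values are made up for illustration):

   ```
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder().master("local[*]").appName("filter-clause").getOrCreate()
   import spark.implicits._

   Seq(("eng", 1200), ("eng", 900), ("sales", 1500))
     .toDF("dept", "salary")
     .createOrReplaceTempView("employees")

   // FILTER restricts the rows fed into that one aggregate only.
   spark.sql("""
     SELECT dept,
            count(*)                              AS all_rows,
            count(*) FILTER (WHERE salary > 1000) AS high_paid
     FROM employees
     GROUP BY dept
   """).show()
   ```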








[GitHub] [spark] srowen commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

2020-07-15 Thread GitBox


srowen commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658908251


   So in theory this shouldn't change behavior, or if it does, it's fixing an incompatibility that's likely more a bug than anything, right?






[GitHub] [spark] cloud-fan commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


cloud-fan commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455235801



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -912,7 +912,8 @@ private[spark] class TaskSchedulerImpl(
 }
   }
 
-  override def executorDecommission(executorId: String): Unit = {
+  override def executorDecommission(executorId: String,

Review comment:
   nit: code style should be
   ```
   override def ...(
       para1: T, para2: T): ...
   ```
   
   4 space indentation for the parameter list.
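   
   Applied to the signature under review, the suggested layout would look roughly like this (a sketch; the name and type of the second parameter are taken from the `DecommissionInfo` discussion in this PR and may not match the final code):
   
   ```
   // Parameter list on its own line, indented 4 spaces from `override def`.
   override def executorDecommission(
       executorId: String, decommissionInfo: DecommissionInfo): Unit = {
     // ...
   }
   ```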








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


dongjoon-hyun commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r455251187



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonFilters.scala
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import org.apache.spark.sql.catalyst.{InternalRow, StructFilters}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The class provides API for applying pushed down source filters to rows with
+ * a struct schema parsed from JSON records. The class should be used in this 
way:
+ * 1. Before processing of the next row, `JacksonParser` (parser for short) 
resets the internal
+ *state of `JsonFilters` by calling the `reset()` method.
+ * 2. The parser reads JSON fields one-by-one in streaming fashion. It 
converts an incoming
+ *field value to the desired type from the schema. After that, it sets the 
value to an instance
+ *of `InternalRow` at the position according to the schema. Order of 
parsed JSON fields can
+ *be different from the order in the schema.
+ * 3. For every JSON field of the top-level JSON object, the parser calls 
`skipRow` by passing
+ *an `InternalRow` in which some of the fields may already be set, and the 
position of the JSON
+ *field according to the schema.
+ *3.1 `skipRow` finds a group of predicates that refers to this JSON field.
+ *3.2 Per each predicate from the group, `skipRow` decrements its 
reference counter.
+ *3.2.1 If predicate reference counter becomes 0, it means that all 
predicate attributes have
+ *  been already set in the internal row, and the predicate can be 
applied to it. `skipRow`
+ *  invokes the predicate for the row.
+ *3.3 `skipRow` applies predicates until one of them returns `false`. In 
that case, the method
+ *returns `true` to the parser.
+ *3.4 If all predicates with zero reference counter return `true`, the 
final result of
+ *the method is `false` which tells the parser to not skip the row.
+ * 4. If the parser gets `true` from `JsonFilters.skipRow`, it must not call 
the method anymore
+ *for this internal row, and should go to step 1.
+ *
+ * `JsonFilters` assumes that:
+ *   - `reset()` is called before any `skipRow()` calls for new row.
+ *   - `skipRow()` can be called for any valid index of the struct fields,
+ *  and in any order.
+ *   - After `skipRow()` returns `true`, the internal state of `JsonFilters` 
can be inconsistent,
+ * so, `skipRow()` must not be called for the current row anymore without 
`reset()`.
+ *
+ * @param pushedFilters The pushed down source filters. The filters should 
refer to
+ *  the fields of the provided schema.
+ * @param schema The required schema of records from datasource files.
+ */
+class JsonFilters(pushedFilters: Seq[sources.Filter], schema: StructType)
+  extends StructFilters(pushedFilters, schema) {
+
+  /**
+   * Stateful JSON predicate that keeps track of its dependent references in 
the
+   * current row via `refCount`.
+   *
+   * @param predicate The predicate compiled from pushed down source filters.
+   * @param totalRefs The total amount of all filters references which the 
predicate
+   *  compiled from.
+   */
+  case class JsonPredicate(predicate: BasePredicate, totalRefs: Int) {
+// The current number of predicate references in the row that have not been 
set yet.
+// When `refCount` reaches zero, all of the predicate's dependencies are 
set, and it can
+// be applied to the row.
+var refCount: Int = totalRefs
+
+def reset(): Unit = {
+  refCount = totalRefs
+}
+  }
+
+  // Predicates compiled from the pushed down filters. The predicates are 
grouped by their
+  // attributes. The i-th group contains predicates that refer to the i-th 
field of the given
+  // schema. A predicate can be placed in many groups if it has many 
attributes. For example:
+  //  schema: i INTEGER, s STRING
+  //  

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29115: [SPARK-32315][ML] Provide an explanation error message when calling require

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29115:
URL: https://github.com/apache/spark/pull/29115#issuecomment-658929884







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658960595







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658960503







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29124: [WIP][SPARK-31168][BUILD] Upgrade Scala to 2.12.12

2020-07-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #29124:
URL: https://github.com/apache/spark/pull/29124#issuecomment-658956530


   The failure looks consistent. Could you take a look at that, @wangyum ?
   ```
   [info] org.apache.spark.serializer.KryoSerializerSuite *** ABORTED *** (324 
milliseconds)
   [info]   java.lang.NoSuchFieldError: numNonEmptyBlocks
   [info]   at 
org.apache.spark.scheduler.HighlyCompressedMapStatus.<init>(MapStatus.scala:174)
   ```
   
   That might be another Scala bug.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658960503







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658896338







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Fokko edited a comment on pull request #29121: [SPARK-32319][PYSPARK] Remove unused imports

2020-07-15 Thread GitBox


Fokko edited a comment on pull request #29121:
URL: https://github.com/apache/spark/pull/29121#issuecomment-658967694


   Good point @dongjoon-hyun, I was focusing on getting the CI green again. 
I've updated the PR description and title.
   
   While rereading it, technically the title is correct: if we suppress the 
error, the import serves a purpose. Feel free to update if there is something 
that covers the content better in your opinion.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29124: [WIP][SPARK-31168][BUILD] Upgrade Scala to 2.12.12

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29124:
URL: https://github.com/apache/spark/pull/29124#issuecomment-658857295


   **[Test build #125879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125879/testReport)**
 for PR 29124 at commit 
[`3adc82a`](https://github.com/apache/spark/commit/3adc82a2c4f9dc4f4ae418efba885ad713d8ee26).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29124: [WIP][SPARK-31168][BUILD] Upgrade Scala to 2.12.12

2020-07-15 Thread GitBox


SparkQA commented on pull request #29124:
URL: https://github.com/apache/spark/pull/29124#issuecomment-658988384


   **[Test build #125879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125879/testReport)**
 for PR 29124 at commit 
[`3adc82a`](https://github.com/apache/spark/commit/3adc82a2c4f9dc4f4ae418efba885ad713d8ee26).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


holdenk commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455337945



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -420,6 +420,29 @@ package object config {
   .booleanConf
   .createWithDefault(false)
 
+  private[spark] val STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED =
+ConfigBuilder("spark.storage.decommission.shuffleBlocks.enabled")

Review comment:
   I was planning on saving that for once we've agreed it's ready for 
general usage. I know the SPIP is approved, but I still view this as more of a 
developer feature (e.g. one we would expect a cloud vendor to build on top of) 
than a feature that is ready for end users. What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-658874516


   **[Test build #125896 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125896/testReport)**
 for PR 28977 at commit 
[`9600708`](https://github.com/apache/spark/commit/96007086d18db5838fb57e7cd298709f26f1f088).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #28977: [WIP] Add all hive.execution suite in the parallel test group

2020-07-15 Thread GitBox


SparkQA commented on pull request #28977:
URL: https://github.com/apache/spark/pull/28977#issuecomment-659006088


   **[Test build #125896 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125896/testReport)**
 for PR 28977 at commit 
[`9600708`](https://github.com/apache/spark/commit/96007086d18db5838fb57e7cd298709f26f1f088).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659014189


   **[Test build #125883 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125883/testReport)**
 for PR 29114 at commit 
[`465fd8a`](https://github.com/apache/spark/commit/465fd8a5f4773c3fee69df9c5cf8d3ad57160d03).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658857494


   **[Test build #125883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125883/testReport)**
 for PR 29114 at commit 
[`465fd8a`](https://github.com/apache/spark/commit/465fd8a5f4773c3fee69df9c5cf8d3ad57160d03).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


tgravescs commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455352852



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##
@@ -44,9 +47,9 @@ import org.apache.spark.util.Utils
 // 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver#getSortBasedShuffleBlockData().
 private[spark] class IndexShuffleBlockResolver(
 conf: SparkConf,
-_blockManager: BlockManager = null)
+var _blockManager: BlockManager = null)

Review comment:
   this is a var for testing?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


tgravescs commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455388677



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -420,6 +420,29 @@ package object config {
   .booleanConf
   .createWithDefault(false)
 
+  private[spark] val STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED =
+ConfigBuilder("spark.storage.decommission.shuffleBlocks.enabled")

Review comment:
   ok, that is fine with me. just wanted to make sure we thought about it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659036401







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] karuppayya closed pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


karuppayya closed pull request #28804:
URL: https://github.com/apache/spark/pull/28804


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] karuppayya commented on a change in pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


karuppayya commented on a change in pull request #28804:
URL: https://github.com/apache/spark/pull/28804#discussion_r455403785



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2196,6 +2196,25 @@ object SQLConf {
   .checkValue(bit => bit >= 10 && bit <= 30, "The bit value must be in 
[10, 30].")
   .createWithDefault(16)
 
+  val SKIP_PARTIAL_AGGREGATE_ENABLED =
+buildConf("spark.sql.aggregate.partialaggregate.skip.enabled")
+  .internal()
+  .doc("Avoid sort/spill to disk during partial aggregation")
+  .booleanConf
+  .createWithDefault(true)
+
+  val SKIP_PARTIAL_AGGREGATE_THRESHOLD =
+buildConf("spark.sql.aggregate.partialaggregate.skip.threshold")
+  .internal()
+  .longConf
+  .createWithDefault(10)
+
+  val SKIP_PARTIAL_AGGREGATE_RATIO =
+buildConf("spark.sql.aggregate.partialaggregate.skip.ratio")
+  .internal()
+  .doubleConf
+  .createWithDefault(0.5)

Review comment:
   @maropu I have borrowed this heuristic from Hive. We can merge them into 
one. Any suggestions here? 
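   
   For discussion, here is one plausible way such threshold and ratio knobs could 
interact; this is purely an illustrative assumption, not necessarily what this PR 
(or Hive) actually implements:
   ```
   // Illustrative only: skip partial aggregation when, after a warm-up of
   // `threshold` rows, the observed number of distinct groups stays close to the
   // number of rows processed (i.e. partial aggregation barely reduces the data).
   def shouldSkipPartialAgg(
       skipEnabled: Boolean,
       rowsProcessed: Long,
       distinctGroups: Long,
       threshold: Long,
       ratio: Double): Boolean = {
     skipEnabled &&
       rowsProcessed >= threshold &&
       distinctGroups.toDouble / rowsProcessed >= ratio
   }
   ```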
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] karuppayya opened a new pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


karuppayya opened a new pull request #28804:
URL: https://github.com/apache/spark/pull/28804


   ### What changes were proposed in this pull request?
   In the case of hash aggregation, a partial aggregation (update) is done, followed 
by a final aggregation (merge).
   
   During partial aggregation we sort and spill to disk every time the fast map 
(when enabled) and UnsafeFixedWidthAggregationMap get exhausted.
   
   **When the cardinality of the grouping columns is close to the total number of 
records being processed, sorting the data and spilling it to disk is not required, 
since it is effectively a no-op and we can directly use the rows in the final 
aggregation.**
   
   Even when the user is aware of the nature of the data, there is currently no way 
to disable this sort/spill operation.
   
   This is similar to the following issues in Hive:
   https://issues.apache.org/jira/browse/HIVE-223
   https://issues.apache.org/jira/browse/HIVE-291
   
   In this PR, the ability to disable sort/spill during partial aggregation is 
added
   
   ### Benchmark
   spark.executor.memory = 12G
   #### Init code
   ```
   // init code
   case class Data(name: String, value1: String, value2: String, value3: Long, 
random: Int)
   val numRecords = Seq(6000)
   val tblName = "tbl"
   ```
   #### Generate data
   
   ```
   // init code
   case class Data(name: String, value1: String, value2: String, value3: Long, 
random: Int)
   val numRecords = Seq(3000, 6000)
   
   val basePath = "s3://qubole-spar/karuppayya/SPAR-4477/benchmark/"
   val rand = scala.util.Random
   // write
   numRecords.foreach {
 recordCount =>
   val dataLocation = s"$basePath/$recordCount"
   val dataDF = spark.range(recordCount).map {
 x =>
   if (x < 10) Data(s"name1", s"value1", s"value1", 10, 
rand.nextInt(100))
   else Data(s"name$x", s"value$x", s"value$x", 1, rand.nextInt(100))
   }
   // creating data to be processed by one task (also gzip-ing to ensure 
spark doesn't
   // create multiple splits)
   val randomDF = dataDF.orderBy("random")
 randomDF.drop("random").repartition(1)
 .write
 .mode("overwrite")
 .option("compression", "gzip")
 .parquet(dataLocation)
   }
   ```
   #### Query
   ```
   val query =
 s"""
   |SELECT name, value1, value2, SUM(value3) s
   |FROM $tblName
   |GROUP BY name, value1, value2
   |"""
   ```
   
   #### Benchmark code
   ```
   import org.apache.spark.sql.types._
   val userSpecifiedSchema = new StructType()
 .add(StructField("name", StringType))
 .add(StructField("value1", StringType))
 .add(StructField("value2", StringType))
 .add(StructField("value3", LongType))
   val query =
 """
   |SELECT name, value1, value2, SUM(value3) s
   |FROM tbl
   |GROUP BY name, value1, value2
   |"""
   
   case class Metric(recordCount: Long, partialAggregateEnabled: Boolean, 
timeTaken: Long)
   val metrics = Seq(true, false).flatMap {
 enabled =>
   sql(s"set 
spark.sql.aggregate.partialaggregate.skip.enabled=$enabled").collect
   numRecords.map {
 recordCount =>
   import java.util.concurrent.TimeUnit.NANOSECONDS
   val dataLocation = s"$basePath/$recordCount"
   spark.read
 .option("inferTimestamp", "false")
 .schema(userSpecifiedSchema)
 .json(dataLocation)
 .createOrReplaceTempView("tbl")
   val start = System.nanoTime()
   spark.sql(query).filter("s > 10").collect
   val end = System.nanoTime()
   val diff = end - start
   Metric(recordCount, enabled, NANOSECONDS.toMillis(diff))
   }
   }
   ```
   ### Results
   ```
   val df = metrics.toDF
   df.createOrReplaceTempView("a")
   val df = sql("select * from a order by recordcount desc, 
partialAggregateEnabled")
   df.show()
   scala> df.show
   +-----------+-----------------------+---------+
   |recordCount|partialAggregateEnabled|timeTaken|
   +-----------+-----------------------+---------+
   |       9000|                  false|   593844|
   |       9000|                   true|   412958|
   |       6000|                  false|   377054|
   |       6000|                   true|   276363|
   +-----------+-----------------------+---------+
   ```
   ### Percent improvement: 
   9000 → 30.46%, 6000 → 26.70%
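   
   The percentages follow directly from the `timeTaken` values above; a quick, 
purely illustrative check:
   ```
   // Sanity-check of the reported speedups from the timings in the table above.
   val timings = Seq((9000, 593844L, 412958L), (6000, 377054L, 276363L))
   timings.foreach { case (records, disabledMs, enabledMs) =>
     val improvementPct = (disabledMs - enabledMs).toDouble / disabledMs * 100
     println(f"$records%d: $improvementPct%.2f%% faster with skip enabled")
   }
   // prints ~30.46% and ~26.70%, matching the numbers quoted above
   ```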
   

   ### Why are the changes needed?
   This improvement can improve the performance of queries
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   This patch was tested manually



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] karuppayya commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-15 Thread GitBox


karuppayya commented on pull request #28804:
URL: https://github.com/apache/spark/pull/28804#issuecomment-659049598


   Updated the description with the benchmarks, after the latest changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


SparkQA commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659048554


   **[Test build #125912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)**
 for PR 27366 at commit 
[`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan opened a new pull request #29125: [SPARK-32018][SQL] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


cloud-fan opened a new pull request #29125:
URL: https://github.com/apache/spark/pull/29125


   partially backport https://github.com/apache/spark/pull/29026



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a change in pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


MaxGekk commented on a change in pull request #27366:
URL: https://github.com/apache/spark/pull/27366#discussion_r455208221



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonFilters.scala
##
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import org.apache.spark.sql.catalyst.{InternalRow, StructFilters}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The class provides API for applying pushed down source filters to rows with
+ * a struct schema parsed from JSON records. The class should be used in this 
way:
+ * 1. Before processing of the next row, `JacksonParser` (parser for short) 
resets the internal
+ *state of `JsonFilters` by calling the `reset()` method.
+ * 2. The parser reads JSON fields one-by-one in streaming fashion. It 
converts an incoming
+ *field value to the desired type from the schema. After that, it sets the 
value to an instance
+ *of `InternalRow` at the position according to the schema. Order of 
parsed JSON fields can
+ *be different from the order in the schema.
+ * 3. For every JSON field of the top-level JSON object, the parser calls 
`skipRow` by passing
+ *an `InternalRow` in which some of the fields may already be set, and the 
position of the JSON
+ *field according to the schema.
+ *3.1 `skipRow` finds a group of predicates that refers to this JSON field.
+ *3.2 Per each predicate from the group, `skipRow` decrements its 
reference counter.
+ *3.2.1 If predicate reference counter becomes 0, it means that all 
predicate attributes have
+ *  been already set in the internal row, and the predicate can be 
applied to it. `skipRow`
+ *  invokes the predicate for the row.
+ *3.3 `skipRow` applies predicates until one of them returns `false`. In 
that case, the method
+ *returns `true` to the parser.
+ *3.4 If all predicates with zero reference counter return `true`, the 
final result of
+ *the method is `false` which tells the parser to not skip the row.
+ * 4. If the parser gets `true` from `JsonFilters.skipRow`, it must not call 
the method anymore
+ *for this internal row, and should go to step 1.
+ *
+ * `JsonFilters` assumes that:
+ *   - `reset()` is called before any `skipRow()` calls for new row.
+ *   - `skipRow()` can be called for any valid index of the struct fields,
+ *  and in any order.
+ *   - After `skipRow()` returns `true`, the internal state of `JsonFilters` 
can be inconsistent,
+ * so, `skipRow()` must not be called for the current row anymore without 
`reset()`.

Review comment:
   Actually, only the first one is applicable to `StructFilters` in 
general. The other two assumptions are `JsonFilters`-specific.
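   
   To make the documented contract concrete, here is a minimal sketch of a caller 
that honors it (a hypothetical driver; `row`, `parsedFields` and `emit` are made-up 
names, and this is not the actual `JacksonParser` code):
   ```
   // Step 1: reset the per-row state before processing a new record.
   filters.reset()
   var skipped = false
   // Steps 2/3: fields may arrive in any order; set each converted value, then ask
   // whether the row can already be skipped based on the predicates that became ready.
   parsedFields.foreach { case (ordinal, value) =>
     if (!skipped) {
       row.update(ordinal, value)
       skipped = filters.skipRow(row, ordinal)
     }
   }
   // Step 4: once skipRow has returned true, the row is dropped without further calls.
   if (!skipped) emit(row)
   ```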





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-07-15 Thread GitBox


cloud-fan commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-658894290


   cc @dongjoon-hyun @viirya 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity part 1

2020-07-15 Thread GitBox


viirya commented on pull request #29112:
URL: https://github.com/apache/spark/pull/29112#issuecomment-658922003


   "classification, regression, clustering and fpm" instead of "part 1" in the 
title? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


cloud-fan commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r455246869



##
File path: core/src/main/scala/org/apache/spark/internal/config/UI.scala
##
@@ -191,4 +191,14 @@ private[spark] object UI {
 .version("3.0.0")
 .stringConf
 .createOptional
+
+  val MASTER_UI_DECOMMISSION_ALLOW_MODE = 
ConfigBuilder("spark.master.ui.decommission.allow.mode")
+.doc("Specifies the behavior of the Master Web UI's /workers/kill 
endpoint. Possible choices" +
+  " are: `local` means allow this endpoint from IP's that are local to the 
machine running" +
+  " the Master, `deny` means to completely disable this endpoint, `allow` 
means to allow" +
+  " calling this endpoint from any IP.")
+.internal()
+.version("3.1.0")
+.stringConf
+.createWithDefault("deny")

Review comment:
   shall we use `local` as default? looks safe enough.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


cloud-fan commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r455247627



##
File path: core/src/main/scala/org/apache/spark/internal/config/UI.scala
##
@@ -191,4 +191,14 @@ private[spark] object UI {
 .version("3.0.0")
 .stringConf
 .createOptional
+
+  val MASTER_UI_DECOMMISSION_ALLOW_MODE = 
ConfigBuilder("spark.master.ui.decommission.allow.mode")
+.doc("Specifies the behavior of the Master Web UI's /workers/kill 
endpoint. Possible choices" +
+  " are: `local` means allow this endpoint from IP's that are local to the 
machine running" +
+  " the Master, `deny` means to completely disable this endpoint, `allow` 
means to allow" +
+  " calling this endpoint from any IP.")
+.internal()
+.version("3.1.0")
+.stringConf

Review comment:
   it's common to always upper case the config value, as it should be case 
insensitive. e.g.
   ```
   ...
 .stringConf
 .transform(_.toUpperCase(Locale.ROOT))
   ```
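   
   Putting this and the earlier default-value suggestion together, the definition 
could end up roughly like the sketch below (illustrative only; the `checkValues` 
guard is an assumption, and `java.util.Locale` is assumed to be imported):
   ```
   val MASTER_UI_DECOMMISSION_ALLOW_MODE =
     ConfigBuilder("spark.master.ui.decommission.allow.mode")
       .doc("Specifies the behavior of the Master Web UI's /workers/kill endpoint...")
       .internal()
       .version("3.1.0")
       .stringConf
       .transform(_.toUpperCase(Locale.ROOT))   // normalize so the value is case-insensitive
       .checkValues(Set("LOCAL", "DENY", "ALLOW"))
       .createWithDefault("LOCAL")              // `local` as the safer default
   ```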





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


aokolnychyi commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658931940


   Yes, my proposal is to optimize cases when we sort the data after the 
repartition like in the examples I gave above. In those cases, sorts below seem 
to be redundant. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


holdenk commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658944693


   All checks pass, I'm going to merge this to our current development branch.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


holdenk commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658954395


   The SPIP has been voted on, this has been reviewed extensively, the original 
design is from 2017, I'm not waiting unless someone wishes to -1 for a valid 
technical reason.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] frankyin-factual commented on pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #28898:
URL: https://github.com/apache/spark/pull/28898#issuecomment-658954329


   @dongjoon-hyun friendly bump



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-658986674







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-658986674







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-15 Thread GitBox


dongjoon-hyun edited a comment on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658998506


   Also, here. The green checkbox at the commit id.
   ![Screen Shot 2020-07-15 at 1 41 21 
PM](https://user-images.githubusercontent.com/9700541/87593815-e4910e00-c6a0-11ea-9e09-1c8b68fc8ed2.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29090: [SPARK-32293] Fix inconsistency between Spark memory configs and JVM option

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29090:
URL: https://github.com/apache/spark/pull/29090#issuecomment-659010229







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659015242


   Build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659015242







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659026999







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-659026829







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-659026829


   Build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659026999


   Build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


SparkQA commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659027181


   **[Test build #125911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125911/testReport)**
 for PR 29015 at commit 
[`d8e241f`](https://github.com/apache/spark/commit/d8e241fc492a6a626d6cd00ef1f666fa62ffd178).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-659026841


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125897/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


