[GitHub] [spark] jiangxb1987 commented on a change in pull request #28839: [SPARK-32000][CORE][TESTS] Fix the flaky testcase for partially launched task in barrier-mode.

2020-06-16 Thread GitBox


jiangxb1987 commented on a change in pull request #28839:
URL: https://github.com/apache/spark/pull/28839#discussion_r441298122



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
##
@@ -275,7 +275,8 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with
   }
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are launched") {
-initLocalClusterSparkContext(2)
+val conf = new SparkConf().set(LOCALITY_WAIT_PROCESS.key, "10s")

Review comment:
   Now that we wait until all the executors have been launched before we submit any jobs, the first time we offer resources to the pending tasks we should expect exactly one task to be launched, and this shouldn't change with a different locality wait time. cc @Ngone51 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645168383










[GitHub] [spark] AmplabJenkins commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645168383










[GitHub] [spark] SparkQA removed a comment on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645116392


   **[Test build #124151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124151/testReport)**
 for PR 28847 at commit 
[`c195a7f`](https://github.com/apache/spark/commit/c195a7fb321a646e2d016b173b51983c2bcda4c8).






[GitHub] [spark] cloud-fan commented on a change in pull request #28842: [SPARK-32006][SQL] Create date/timestamp formatters once before collect in `hiveResultString()`

2020-06-16 Thread GitBox


cloud-fan commented on a change in pull request #28842:
URL: https://github.com/apache/spark/pull/28842#discussion_r441297086



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
##
@@ -72,47 +90,32 @@ object HiveResult {
 }
   }
 
-  // We can create the date formatter only once because it does not depend on Spark's
-  // session time zone controlled by the SQL config `spark.sql.session.timeZone`.
-  // The `zoneId` parameter is used only in parsing of special date values like `now`,
-  // `yesterday` and etc. but not in date formatting. While formatting of:
-  // - `java.time.LocalDate`, zone id is not used by `DateTimeFormatter` at all.
-  // - `java.sql.Date`, the date formatter delegates formatting to the legacy formatter
-  //   which uses the default system time zone `TimeZone.getDefault`. This works correctly
-  //   due to `DateTimeUtils.toJavaDate` which is based on the system time zone too.
-  private val dateFormatter = DateFormatter(
-format = DateFormatter.defaultPattern,
-// We can set any time zone id. UTC was taken for simplicity.
-zoneId = ZoneOffset.UTC,
-locale = DateFormatter.defaultLocale,
-// Use `FastDateFormat` as the legacy formatter because it is thread-safe.
-legacyFormat = LegacyDateFormats.FAST_DATE_FORMAT,
-isParsing = false)
-  private def timestampFormatter = TimestampFormatter.getFractionFormatter(
-DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone))
-
   /** Formats a datum (based on the given data type) and returns the string representation. */
-  def toHiveString(a: (Any, DataType), nested: Boolean = false): String = a match {
+  def toHiveString(
+a: (Any, DataType),

Review comment:
   nit: 4-space indentation
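A minimal sketch of the idea in this PR's title, under stated assumptions: the formatters are built once (for example, before collecting the result) and passed into the per-value formatting function instead of living in a global `val`/`def`. It uses plain `java.time` formatters rather than Spark's internal `DateFormatter`/`TimestampFormatter`, and `HiveResultSketch`, `Formatters` and `makeFormatters` are hypothetical names; the real `toHiveString` also handles many more types and the `nested` flag. The parameter list uses the 4-space continuation indentation the nit asks for.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.time.format.DateTimeFormatter

// Hypothetical, simplified stand-in for HiveResult (not Spark's actual code).
object HiveResultSketch {

  // The formatters are bundled so they can be built once and reused for every value.
  final case class Formatters(date: DateTimeFormatter, timestamp: DateTimeFormatter)

  // Build the formatters once, e.g. before collecting the query result.
  // The timestamp formatter needs a zone (in Spark, the session time zone).
  def makeFormatters(zoneId: ZoneId): Formatters = Formatters(
    date = DateTimeFormatter.ofPattern("yyyy-MM-dd"),
    timestamp = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(zoneId))

  // Continuation parameters use 4-space indentation (Spark's Scala style).
  def toHiveString(
      value: Any,
      formatters: Formatters): String = value match {
    case null => "NULL"
    case d: LocalDate => formatters.date.format(d)
    case t: Instant => formatters.timestamp.format(t)
    case other => other.toString
  }

  def main(args: Array[String]): Unit = {
    val formatters = makeFormatters(ZoneId.of("UTC")) // created once, outside the loop
    val values: Seq[Any] =
      Seq(LocalDate.of(2020, 6, 16), Instant.parse("2020-06-16T12:30:00Z"), 42, null)
    values.map(toHiveString(_, formatters)).foreach(println)
  }
}
```

Passing the formatters explicitly makes their lifetime visible at the call site, so a session-dependent setting such as the time zone is read once per result rather than once per value.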








[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645167728










[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645167728










[GitHub] [spark] SparkQA commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


SparkQA commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645167716


   **[Test build #124151 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124151/testReport)**
 for PR 28847 at commit 
[`c195a7f`](https://github.com/apache/spark/commit/c195a7fb321a646e2d016b173b51983c2bcda4c8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645093212


   **[Test build #124146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124146/testReport)**
 for PR 27066 at commit 
[`c2f9216`](https://github.com/apache/spark/commit/c2f9216e39152fe978eb13dfdf8fc1ac75f7063c).






[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645167153


   **[Test build #124146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124146/testReport)**
 for PR 27066 at commit 
[`c2f9216`](https://github.com/apache/spark/commit/c2f9216e39152fe978eb13dfdf8fc1ac75f7063c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on pull request #28778: [SPARK-31949][SQL] Add spark.default.parallelism in SQLConf for isolated across session

2020-06-16 Thread GitBox


cloud-fan commented on pull request #28778:
URL: https://github.com/apache/spark/pull/28778#issuecomment-645167188


   After more thought, I think the file partition split logic itself is problematic. It aims to make the number of partitions equal to the total number of cores, which doesn't make sense because the cluster may have only a few free cores.
   
   I think a better approach is to set an expected size for each partition, such as 64 MB. This is also what we do when coalescing shuffle partitions in AQE. Can we add such a config?
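A minimal sketch of the size-based alternative suggested above, assuming a hypothetical `targetBytes` setting (64 MB in the example): files are packed greedily into the current partition until adding the next one would exceed the target, so the partition count follows the input size rather than the core count. `SizeBasedPartitioning` and `FileChunk` are illustrative names, the file sizes are made up, and this is not Spark's actual split logic (it ignores splitting large files, for instance).

```scala
// Hypothetical sketch: pack files into partitions by a target size instead of core count.
object SizeBasedPartitioning {

  final case class FileChunk(path: String, sizeBytes: Long)

  /** Greedy "next fit" packing: start a new partition when the target would be exceeded. */
  def pack(files: Seq[FileChunk], targetBytes: Long): Seq[Seq[FileChunk]] = {
    require(targetBytes > 0, "targetBytes must be positive")
    val partitions = Vector.newBuilder[Seq[FileChunk]]
    var current = Vector.empty[FileChunk]
    var currentSize = 0L
    // Place larger files first, a common heuristic for tighter packing.
    for (file <- files.sortBy(-_.sizeBytes)) {
      if (current.nonEmpty && currentSize + file.sizeBytes > targetBytes) {
        partitions += current
        current = Vector.empty
        currentSize = 0L
      }
      current :+= file
      currentSize += file.sizeBytes
    }
    if (current.nonEmpty) partitions += current
    partitions.result()
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val files = Seq(
      FileChunk("a", 60 * mb), FileChunk("b", 30 * mb),
      FileChunk("c", 30 * mb), FileChunk("d", 10 * mb))
    val parts = pack(files, targetBytes = 64 * mb)
    parts.zipWithIndex.foreach { case (p, i) =>
      println(s"partition $i: ${p.map(_.path).mkString(",")} (${p.map(_.sizeBytes).sum / mb} MB)")
    }
  }
}
```

With these sample sizes, the 130 MB of input lands in three partitions ([a], [b, c], [d]) regardless of how many cores the cluster has.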






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645157060


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645156028


   **[Test build #124154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124154/testReport)**
 for PR 28818 at commit 
[`9bb0293`](https://github.com/apache/spark/commit/9bb0293125af3dcde87b52be5bf0ba4dcc5f4a0f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645157066


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124154/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645157060










[GitHub] [spark] SparkQA commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


SparkQA commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645157055


   **[Test build #124154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124154/testReport)**
 for PR 28818 at commit 
[`9bb0293`](https://github.com/apache/spark/commit/9bb0293125af3dcde87b52be5bf0ba4dcc5f4a0f).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645156370










[GitHub] [spark] AmplabJenkins commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645156370










[GitHub] [spark] SparkQA commented on pull request #28818: [WIP][SPARK-31198][CORE] Use graceful decommissioning as part of dynamic scaling

2020-06-16 Thread GitBox


SparkQA commented on pull request #28818:
URL: https://github.com/apache/spark/pull/28818#issuecomment-645156028


   **[Test build #124154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124154/testReport)**
 for PR 28818 at commit 
[`9bb0293`](https://github.com/apache/spark/commit/9bb0293125af3dcde87b52be5bf0ba4dcc5f4a0f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-64515


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124153/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645152215


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645151419


   **[Test build #124153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124153/testReport)**
 for PR 28848 at commit 
[`4e976ab`](https://github.com/apache/spark/commit/4e976ab16e922dfe28125798461e45afaa1d62a7).






[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645152215










[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645152202


   **[Test build #124153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124153/testReport)**
 for PR 28848 at commit 
[`4e976ab`](https://github.com/apache/spark/commit/4e976ab16e922dfe28125798461e45afaa1d62a7).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645149561










[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645151419


   **[Test build #124153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124153/testReport)**
 for PR 28848 at commit 
[`4e976ab`](https://github.com/apache/spark/commit/4e976ab16e922dfe28125798461e45afaa1d62a7).






[GitHub] [spark] wypoon commented on a change in pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


wypoon commented on a change in pull request #28848:
URL: https://github.com/apache/spark/pull/28848#discussion_r441280737



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
##
@@ -540,6 +540,46 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
 assert(mapStatus2(2).location.host === "hostB")
   }
 
+  test("[SPARK-32003] All shuffle files for executor should be cleaned up on fetch failure") {
+// reset the test context with the right shuffle service config
+afterEach()
+val conf = new SparkConf()
+conf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
+init(conf)
+
+val shuffleMapRdd = new MyRDD(sc, 3, Nil)
+val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(3))
+val shuffleId = shuffleDep.shuffleId
+val reduceRdd = new MyRDD(sc, 3, List(shuffleDep), tracker = mapOutputTracker)
+
+submit(reduceRdd, Array(0, 1, 2))
+// Map stage completes successfully,
+// two tasks are run on an executor on hostA and one on an executor on hostB
+complete(taskSets(0), Seq(
+  (Success, makeMapStatus("hostA", 3)),
+  (Success, makeMapStatus("hostA", 3)),
+  (Success, makeMapStatus("hostB", 3))))
+// Now the executor on hostA is lost
+runEvent(ExecutorLost("hostA-exec", ExecutorExited(-100, false, "Container marked as failed")))
+
+// The MapOutputTracker has all the shuffle files
+val initialMapStatuses = mapOutputTracker.shuffleStatuses(shuffleId).mapStatuses
+assert(initialMapStatuses.count(_ != null) == 3)
+assert(initialMapStatuses(0).location.executorId === "hostA-exec")
+assert(initialMapStatuses(1).location.executorId === "hostA-exec")
+assert(initialMapStatuses(2).location.executorId === "hostB-exec")
+
+// Now a fetch failure from the lost executor occurs
+complete(taskSets(1), Seq(
+  (FetchFailed(makeBlockManagerId("hostA"), shuffleId, 0L, 0, 0, "ignored"), null)
+))
+
+// Shuffle files for hostA-exec should be lost
+val mapStatuses = mapOutputTracker.shuffleStatuses(shuffleId).mapStatuses
+assert(mapStatuses.count(_ != null) == 1)

Review comment:
   Without the change, this part fails.








[GitHub] [spark] wypoon commented on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


wypoon commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645149861


   @attilapiros 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645149534










[GitHub] [spark] AmplabJenkins commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645149534










[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-645149561










[GitHub] [spark] SparkQA commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


SparkQA commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645149223


   **[Test build #124152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124152/testReport)**
 for PR 28846 at commit 
[`523e1d5`](https://github.com/apache/spark/commit/523e1d592beddf90331f77f57aff50af9dfea12b).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL] Nested column predicate pushdown for ORC

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-645149128










[GitHub] [spark] wypoon opened a new pull request #28848: [SPARK-32003][CORE] Unregister outputs for executor on fetch failure …

2020-06-16 Thread GitBox


wypoon opened a new pull request #28848:
URL: https://github.com/apache/spark/pull/28848


   …after executor is lost
   
   ### What changes were proposed in this pull request?
   
   If an executor is lost, the `DAGScheduler` handles the executor loss by 
removing the executor but does not unregister its outputs if the external 
shuffle service is used. However, if the node on which the executor runs is 
lost, the shuffle service may not be able to serve the shuffle files.
   In such a case, when fetches from the executor's outputs fail in the same stage, the `DAGScheduler` again removes the executor and, by rights, should unregister its outputs. It doesn't, because the epoch used to track the executor failure has not increased.
   
   We track the epoch for failed executors that result in lost file output 
separately, so we can unregister the outputs in this scenario. The idea to 
track a second epoch is due to Attila Zsolt Piros.
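The two-epoch idea can be illustrated with a simplified, self-contained model. This is a hypothetical sketch, not the `DAGScheduler` code from this PR: `EpochTracker`, `EpochDemo`, `executorFailureEpoch`, `fileLostEpoch` and `registeredOutputs` are illustrative names, the epoch is held fixed, and removing the executor from the block manager is omitted. It shows why a single epoch map ignores the later fetch failure: the executor-loss event has already recorded the current epoch, so the fetch failure in the same epoch is treated as a duplicate.

```scala
import scala.collection.mutable

// Hypothetical toy model of epoch-based failure handling; not the DAGScheduler code.
class EpochTracker(separateFileEpoch: Boolean) {
  private val executorFailureEpoch = mutable.Map.empty[String, Long]
  private val fileLostEpoch = mutable.Map.empty[String, Long]
  // Executor id -> number of map outputs registered on that executor.
  val registeredOutputs = mutable.Map.empty[String, Int]
  // Held fixed in this model; a real scheduler advances it over time.
  var currentEpoch = 0L

  /** Handles a failure of `execId`; `fileLost` means its shuffle files cannot be served. */
  def onExecutorFailure(execId: String, fileLost: Boolean): Unit = {
    val newExecutorFailure = executorFailureEpoch.get(execId).forall(_ < currentEpoch)
    if (newExecutorFailure) {
      executorFailureEpoch(execId) = currentEpoch
    }
    val unregister =
      if (separateFileEpoch) {
        // Second epoch: a file loss is handled once per epoch, independently of
        // whether the executor failure itself was already recorded.
        fileLost && fileLostEpoch.get(execId).forall(_ < currentEpoch)
      } else {
        // Single epoch: outputs can only be dropped on the first failure in this epoch.
        fileLost && newExecutorFailure
      }
    if (unregister) {
      if (separateFileEpoch) fileLostEpoch(execId) = currentEpoch
      registeredOutputs.remove(execId)
    }
  }
}

object EpochDemo {
  // Replays the scenario from the PR and returns how many map outputs remain registered.
  def run(separateFileEpoch: Boolean): Int = {
    val tracker = new EpochTracker(separateFileEpoch)
    tracker.registeredOutputs ++= Seq("hostA-exec" -> 2, "hostB-exec" -> 1)
    // Executor on hostA is lost; with the external shuffle service its files are kept.
    tracker.onExecutorFailure("hostA-exec", fileLost = false)
    // A fetch failure from hostA in the same epoch now reports the files as lost.
    tracker.onExecutorFailure("hostA-exec", fileLost = true)
    tracker.registeredOutputs.values.sum
  }

  def main(args: Array[String]): Unit = {
    println(s"single epoch:    ${run(separateFileEpoch = false)} outputs left")
    println(s"separate epochs: ${run(separateFileEpoch = true)} outputs left")
  }
}
```

In this model, `run(separateFileEpoch = false)` leaves all 3 outputs registered, while `run(separateFileEpoch = true)` leaves only hostB's 1, mirroring the 3-then-1 counts asserted in the new unit test.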
   
   ### Why are the changes needed?
   
   Without the changes, the loss of a node could require two stage attempts to 
recover instead of one.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New unit test. This test fails without the change and passes with it.
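To make the two-epoch idea concrete, here is a toy model in Scala. It is a sketch under stated assumptions, not Spark's `DAGScheduler` code: `EpochTracker`, `onExecutorLost`, `fileLost`, `EpochTrackerDemo` and the executor id `exec-1` are hypothetical names, and the model only tracks whether an executor's outputs are still registered. It replays the scenario from the description: the executor is lost while the external shuffle service is assumed to still serve its files, and then a fetch failure in the same stage, at the same epoch, shows that the files are gone.

```scala
import scala.collection.mutable

// Toy model (hypothetical names, not Spark's DAGScheduler API) of tracking
// executor-failure epochs, optionally with a second epoch for file loss.
final class EpochTracker(useSecondEpoch: Boolean) {
  // Last epoch at which each executor was seen to fail.
  private val executorFailureEpoch = mutable.Map.empty[String, Long]
  // Last epoch at which each executor's shuffle files were treated as lost.
  private val shuffleFileLostEpoch = mutable.Map.empty[String, Long]
  // Executors whose map outputs are still registered.
  val registeredOutputs: mutable.Set[String] = mutable.Set.empty[String]

  // True if `execId` has no recorded epoch in `epochs`, or only an older one.
  private def isNewer(epochs: mutable.Map[String, Long], execId: String, epoch: Long): Boolean =
    epochs.get(execId).forall(_ < epoch)

  // fileLost = true when the shuffle files are known to be unavailable
  // (for example after a fetch failure). With an external shuffle service,
  // a plain executor loss is assumed to leave the files servable.
  def onExecutorLost(execId: String, fileLost: Boolean, epoch: Long): Unit = {
    val newFailure = isNewer(executorFailureEpoch, execId, epoch)
    if (newFailure) executorFailureEpoch(execId) = epoch
    if (fileLost) {
      val shouldUnregister =
        if (useSecondEpoch) isNewer(shuffleFileLostEpoch, execId, epoch)
        else newFailure // single epoch: a repeat at the same epoch is ignored
      if (shouldUnregister) {
        shuffleFileLostEpoch(execId) = epoch
        registeredOutputs -= execId
      }
    }
  }
}

object EpochTrackerDemo {
  // Returns true if exec-1's outputs are (wrongly) still registered afterwards.
  def outputsSurvive(useSecondEpoch: Boolean): Boolean = {
    val tracker = new EpochTracker(useSecondEpoch)
    tracker.registeredOutputs += "exec-1"
    // 1. Executor lost; the shuffle service is assumed to still have its files.
    tracker.onExecutorLost("exec-1", fileLost = false, epoch = 5L)
    // 2. Fetch failure in the same stage, same epoch: the files are gone.
    tracker.onExecutorLost("exec-1", fileLost = true, epoch = 5L)
    tracker.registeredOutputs.contains("exec-1")
  }
}
```

With a single epoch, the second event looks like a duplicate of the first and the stale outputs stay registered, so another stage attempt is needed before they are cleaned up; tracking the file-loss epoch separately lets the second event unregister them.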
   






[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL] Nested column predicate pushdown for ORC

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-645149128










[GitHub] [spark] SparkQA removed a comment on pull request #28761: [SPARK-25557][SQL] Nested column predicate pushdown for ORC

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-645067909


   **[Test build #124143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124143/testReport)**
 for PR 28761 at commit 
[`bd691ed`](https://github.com/apache/spark/commit/bd691ed16eade2e63c0fdd8d2bbd88282f6c4662).






[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL] Nested column predicate pushdown for ORC

2020-06-16 Thread GitBox


SparkQA commented on pull request #28761:
URL: https://github.com/apache/spark/pull/28761#issuecomment-645148461


   **[Test build #124143 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124143/testReport)**
 for PR 28761 at commit 
[`bd691ed`](https://github.com/apache/spark/commit/bd691ed16eade2e63c0fdd8d2bbd88282f6c4662).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645146303


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124149/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645146298


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645102759


   **[Test build #124149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124149/testReport)**
 for PR 28846 at commit 
[`e171a6c`](https://github.com/apache/spark/commit/e171a6cf65a6d29ed6bdc2d961effded185f9cbd).






[GitHub] [spark] AmplabJenkins commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645146298










[GitHub] [spark] SparkQA commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


SparkQA commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645146152


   **[Test build #124149 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124149/testReport)**
 for PR 28846 at commit 
[`e171a6c`](https://github.com/apache/spark/commit/e171a6cf65a6d29ed6bdc2d961effded185f9cbd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-16 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r441273689



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1238,6 +1239,18 @@ package object config {
         s"The value must be in allowed range [1,048,576, ${MAX_BUFFER_SIZE_BYTES}].")
       .createWithDefault(1024 * 1024)
 
+  private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO =
+    ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size.ratio")
+      .doc("The multiplication ratio is the parameter that controls the initial read buffer " +
+        "size. The multiplication ratio value range is from 1 to 1024. This parameter changes "  +
+        "the initial read buffer size in 1KB increments. It will result in the initial buffer " +
+        "size in the range from 1KB to 1MB. The read buffer size is dynamically adjusted " +
+        "afterward based on data length read from the spilled file.")
+      .intConf
+      .checkValue(v => 1 <= v && v <= DEFAULT_BUFFER_SIZE_RATIO,
+        s"The value must be in allowed range [1, ${DEFAULT_BUFFER_SIZE_RATIO}].")
+      .createWithDefault(DEFAULT_BUFFER_SIZE_RATIO)

Review comment:
   I just need a clarification. SQLConf is a member of the catalyst package/project. 
I looked at all the classes involved in this change and I do not see any usage of 
SQLConf. I would expect the new parameter UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO 
to sit together with the existing 
package$.MODULE$.UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE parameter in 
package.scala. Also, UnsafeSorterSpillReader is in the core package/project. I just 
want to be sure that we want UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO to be 
in SQLConf. Could you please confirm?
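The `.doc` text in the diff describes a simple mapping: a ratio `r` between 1 and 1024 selects an initial read buffer of `r` KB, so the buffer starts somewhere between 1KB and 1MB. A minimal Scala sketch of that arithmetic follows. `SpillReaderBufferSizing`, `initialBufferSize` and `grow` are hypothetical names; the 1024 upper bound is taken from the `.doc` text rather than from `DEFAULT_BUFFER_SIZE_RATIO`, whose value the diff does not show; and the doubling growth policy is an assumption standing in for the unspecified "dynamically adjusted" behavior.

```scala
// Hypothetical sketch of the sizing described in the config's .doc text;
// not the actual UnsafeSorterSpillReader implementation.
object SpillReaderBufferSizing {
  val BytesPerRatioUnit: Int = 1024 // each ratio step adds 1KB
  val MaxRatio: Int = 1024          // upper bound stated in the .doc text

  // Initial read buffer size in bytes: ratio * 1KB, i.e. 1KB .. 1MB.
  def initialBufferSize(ratio: Int): Int = {
    require(1 <= ratio && ratio <= MaxRatio,
      s"The value must be in allowed range [1, $MaxRatio].")
    ratio * BytesPerRatioUnit
  }

  // Assumed growth policy: double the buffer until a record of
  // `recordLength` bytes fits; never shrink below the current size.
  def grow(currentSize: Long, recordLength: Long): Long = {
    require(currentSize > 0, "buffer size must be positive")
    var size = currentSize
    while (size < recordLength) size *= 2
    size
  }
}
```

For example, `initialBufferSize(1)` gives 1024 bytes and `initialBufferSize(1024)` gives 1,048,576 bytes, matching the 1KB to 1MB range in the description.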








[GitHub] [spark] siknezevic commented on a change in pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-16 Thread GitBox


siknezevic commented on a change in pull request #27246:
URL: https://github.com/apache/spark/pull/27246#discussion_r441273689



##
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##
@@ -1238,6 +1239,18 @@ package object config {
         s"The value must be in allowed range [1,048,576, ${MAX_BUFFER_SIZE_BYTES}].")
       .createWithDefault(1024 * 1024)
 
+  private[spark] val UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO =
+    ConfigBuilder("spark.unsafe.sorter.spill.reader.buffer.size.ratio")
+      .doc("The multiplication ratio is the parameter that controls the initial read buffer " +
+        "size. The multiplication ratio value range is from 1 to 1024. This parameter changes "  +
+        "the initial read buffer size in 1KB increments. It will result in the initial buffer " +
+        "size in the range from 1KB to 1MB. The read buffer size is dynamically adjusted " +
+        "afterward based on data length read from the spilled file.")
+      .intConf
+      .checkValue(v => 1 <= v && v <= DEFAULT_BUFFER_SIZE_RATIO,
+        s"The value must be in allowed range [1, ${DEFAULT_BUFFER_SIZE_RATIO}].")
+      .createWithDefault(DEFAULT_BUFFER_SIZE_RATIO)

Review comment:
   I just need a clarification. SQLConf is a member of the catalyst package/project. 
I looked at all the classes involved in this change and I do not see any usage of 
SQLConf. I would expect the new parameter UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO 
to sit together with the existing 
package$.MODULE$.UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE parameter in 
package.scala. I just want to be sure that we want 
UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE_RATIO to be in SQLConf. Could you please 
confirm?








[GitHub] [spark] cchighman closed pull request #28841: [SPARK-31962][SQL] Provide option to load files after a specified date when reading from a folder path

2020-06-16 Thread GitBox


cchighman closed pull request #28841:
URL: https://github.com/apache/spark/pull/28841


   






[GitHub] [spark] dongjoon-hyun closed pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


dongjoon-hyun closed pull request #28847:
URL: https://github.com/apache/spark/pull/28847


   






[GitHub] [spark] dongjoon-hyun commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


dongjoon-hyun commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645134832


   Merged to master.






[GitHub] [spark] dongjoon-hyun commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


dongjoon-hyun commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645122517


   Thank you, @HyukjinKwon .






[GitHub] [spark] HyukjinKwon closed pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon closed pull request #28845:
URL: https://github.com/apache/spark/pull/28845


   






[GitHub] [spark] HyukjinKwon commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645120469


   Merged to master and branch-3.0.






[GitHub] [spark] HyukjinKwon commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645120390


   Thanks @WeichenXu123 for the review.






[GitHub] [spark] HyukjinKwon commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645120346


   Let me just merge and go ahead, since in the end this is only about 
documentation and a warning, which we can change without many restrictions.
   
   Let me know if you guys have any concerns about this.






[GitHub] [spark] HyukjinKwon closed pull request #28827: [SPARK-31989][SQL] Generate JSON rebasing files w/ 30 minutes step

2020-06-16 Thread GitBox


HyukjinKwon closed pull request #28827:
URL: https://github.com/apache/spark/pull/28827


   






[GitHub] [spark] HyukjinKwon commented on pull request #28827: [SPARK-31989][SQL] Generate JSON rebasing files w/ 30 minutes step

2020-06-16 Thread GitBox


HyukjinKwon commented on pull request #28827:
URL: https://github.com/apache/spark/pull/28827#issuecomment-645119748


   Merged to master.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645114911










[GitHub] [spark] SparkQA commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


SparkQA commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645116392


   **[Test build #124151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124151/testReport)**
 for PR 28847 at commit 
[`c195a7f`](https://github.com/apache/spark/commit/c195a7fb321a646e2d016b173b51983c2bcda4c8).






[GitHub] [spark] dongjoon-hyun commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


dongjoon-hyun commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645115062


   cc @tgravescs 






[GitHub] [spark] AmplabJenkins commented on pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28847:
URL: https://github.com/apache/spark/pull/28847#issuecomment-645114911










[GitHub] [spark] dongjoon-hyun opened a new pull request #28847: [SPARK-29148][CORE][FOLLOWUP] Fix warning message to show a correct executor id

2020-06-16 Thread GitBox


dongjoon-hyun opened a new pull request #28847:
URL: https://github.com/apache/spark/pull/28847


   ### What changes were proposed in this pull request?
   
   This aims to replace `executorIdsToBeRemoved` with `executorIdToBeRemoved`.
   
   ### Why are the changes needed?
   
   Because the wrong variable is currently used, `ArrayBuffer()` is always 
displayed instead of the executor id.
   ```
   20/06/16 19:33:31 WARN ExecutorAllocationManager: Not removing executor ArrayBuffer() because the ResourceProfile was UNKNOWN!
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manual.
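The bug is easy to reproduce in isolation: interpolating an empty buffer into the message prints `ArrayBuffer()` instead of an id. The Scala sketch below is a hypothetical reduction, not the actual `ExecutorAllocationManager` code; only the two variable names come from the PR description, while `WarningMessageDemo`, `warningsFor` and the `useFix` flag are made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical reduction of the logging bug; not Spark's actual code.
object WarningMessageDemo {
  def warningsFor(executorIds: Seq[String], useFix: Boolean): Seq[String] = {
    // Ids that will actually be removed. In this reduction every executor has an
    // UNKNOWN ResourceProfile, so nothing is ever added and the buffer stays empty.
    val executorIdsToBeRemoved = ArrayBuffer.empty[String]
    val warnings = ArrayBuffer.empty[String]
    executorIds.foreach { executorIdToBeRemoved =>
      // Before the fix the message used the (empty) buffer; after it, the single id.
      val shown: Any = if (useFix) executorIdToBeRemoved else executorIdsToBeRemoved
      warnings += s"Not removing executor $shown because the ResourceProfile was UNKNOWN!"
    }
    warnings.toSeq
  }
}
```

With `useFix = false` the message reads `Not removing executor ArrayBuffer() ...`, matching the log line above; with `useFix = true` it names the executor.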






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28835:
URL: https://github.com/apache/spark/pull/28835#issuecomment-645106680










[GitHub] [spark] fqaiser94 edited a comment on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


fqaiser94 edited a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645047815


   @cloud-fan I've made fairly significant changes to the implementation in 
order to address the issue where poorly optimized physical plans resulted in 
massive amounts of generated Java code. 
   
   Firstly, I've changed the `WithFields` Expression so it only supports 
adding/replacing top-level fields in a given struct. To add/replace a nested 
field with this expression, you need to nest `WithFields` Expressions 
(similar to what is being done in the `withFieldHelper` function). This 
change is necessary to keep the optimization simple. 
   
   I've added 2 new optimization rules: 
   - In `SimplifyExtractValueOps` I've added another case statement to simplify 
`GetStructField` Expressions that are operating on `WithFields` Expressions
   - I've added a new rule `CombineWithFields` to collapse adjacent 
`WithFields` Expressions into a single `WithFields` Expression
   
   These 2 rules are executed until we reach a fixed point. Only after that do 
we execute a third rule to transform `WithFields` Expressions into 
`CreateNamedStruct` Expressions. This appears to be enough to remove all 
unnecessary nested if-null-else statements. 
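The two rewrites described above can be illustrated on a toy expression tree. This Scala sketch is not Catalyst code: `Attr`, `Lit`, `GetField`, `WithFields` and `ToyOptimizer` are hypothetical stand-ins for `AttributeReference`, `Literal`, `GetStructField` and the PR's `WithFields`, and the sketch ignores null handling (the `if (isnull(...)) null else` wrapping) as well as the final conversion to `CreateNamedStruct`. It applies a `CombineWithFields`-style collapse and a `SimplifyExtractValueOps`-style field read-through in one bottom-up pass, and repeats that pass until a fixed point is reached.

```scala
// Toy expression tree: hypothetical stand-ins, not Catalyst's classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class GetField(child: Expr, name: String) extends Expr
// Adds or replaces top-level fields only; later entries win.
final case class WithFields(struct: Expr, fields: Seq[(String, Expr)]) extends Expr

object ToyOptimizer {
  // One bottom-up pass applying both rewrites.
  def rewrite(e: Expr): Expr = e match {
    case WithFields(struct, fields) =>
      val newFields = fields.map { case (name, value) => (name, rewrite(value)) }
      rewrite(struct) match {
        // CombineWithFields-style: merge adjacent WithFields into one.
        case WithFields(inner, innerFields) => WithFields(inner, innerFields ++ newFields)
        case other => WithFields(other, newFields)
      }
    case GetField(child, name) =>
      rewrite(child) match {
        // SimplifyExtractValueOps-style: read the field from its last write,
        // or extract it from the underlying struct if it was never written.
        case WithFields(struct, fields) =>
          val lastWrite = fields.reverse.collectFirst { case (n, value) if n == name => value }
          lastWrite.getOrElse(GetField(struct, name))
        case other => GetField(other, name)
      }
    case other => other
  }

  // Repeat the pass until nothing changes (a fixed point).
  @annotation.tailrec
  def toFixedPoint(e: Expr): Expr = {
    val next = rewrite(e)
    if (next == e) e else toFixedPoint(next)
  }
}
```

Because later writes win, a field set by an outer `WithFields` overrides the same name from an inner one, which is why the lookup scans the combined field list from the end.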
   
   Below is a deliberately contrived example of a Spark SQL query where we're 
adding/replacing different fields at different levels of nesting in a struct 
column. It shows how these optimizer rules simplify the parsed logical plan 
into an optimized logical plan: 
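   ```
   // Assumes a SparkSession `spark` and its `sparkContext` are in scope (as in
   // Spark's SQL test suites); `withField` is the Column method added by this PR.
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.functions.lit
   import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
   import spark.implicits._
   
   val structLevel3 = StructType(Seq(
     StructField("a3", IntegerType, nullable = false),
     StructField("b3", IntegerType, nullable = false)))
   
   val structLevel2 = StructType(Seq(
     StructField("a2", structLevel3, nullable = true),
     StructField("b2", structLevel3, nullable = true)))
   
   val structLevel1 = StructType(Seq(
     StructField("a1", structLevel2, nullable = true),
     StructField("b1", structLevel2, nullable = true)))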
   
   val df = spark.createDataFrame(
     sparkContext.parallelize(Row(Row(Row(Row(1, 2), Row(3, 4)), Row(Row(3, 4), Row(5, 6)))) ::
       Row(Row(Row(null, Row(3, 4)), Row(Row(3, 4), Row(5, 6)))) ::
       Row(Row(null, Row(Row(3, 4), Row(5, 6)))) ::
       Row(Row(null, null)) :: Nil),
     StructType(Seq(StructField("a", structLevel1, nullable = false)))
   )
   
   val result = df.withColumn("a", $"a"
     .withField("a1", $"a.a1".withField("c1", lit("hello")))
     .withField("b1.a2", lit(1000))
     .withField("b1.b2.d3", $"a.b1.b2.a3" * 10)
     .withField("b1Original", $"a.b1")
   )
   
   result.explain(true)
   
   // == Parsed Logical Plan ==
   // 'Project [with_fields(with_fields(with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello)), b1, with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello))[b1], a2, 1000)), b1, with_fields(with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello)), b1, with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello))[b1], a2, 1000))[b1], b2, with_fields(with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello)), b1, with_fields(with_fields('a, a1, with_fields('a.a1, c1, hello))[b1], a2, 1000))[b1][b2], d3, ('a.b1.b2.a3 * 10)))), b1Original, 'a.b1) AS a#14]
   // +- LogicalRDD [a#12], false
   // 
   // == Analyzed Logical Plan ==
   // a: struct<a1:struct<a2:struct<a3:int,b3:int>,b2:struct<a3:int,b3:int>,c1:string>,b1:struct<a2:int,b2:struct<a3:int,b3:int,d3:int>>,b1Original:struct<a2:struct<a3:int,b3:int>,b2:struct<a3:int,b3:int>>>
   // Project [with_fields(with_fields(with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)), b1, with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)).b1, a2, 1000)), b1, with_fields(with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)), b1, with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)).b1, a2, 1000)).b1, b2, with_fields(with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)), b1, with_fields(with_fields(a#12, a1, with_fields(a#12.a1, c1, hello)).b1, a2, 1000)).b1.b2, d3, (a#12.b1.b2.a3 * 10)))), b1Original, a#12.b1) AS a#14]
   // +- LogicalRDD [a#12], false
   // 
   // == Optimized Logical Plan ==
   // Project [named_struct(a1, if (isnull(a#12.a1)) null else named_struct(a2, a#12.a1.a2, b2, a#12.a1.b2, c1, hello), b1, if (isnull(a#12.b1)) null else named_struct(a2, 1000, b2, if (isnull(a#12.b1.b2)) null else named_struct(a3, a#12.b1.b2.a3, b3, a#12.b1.b2.b3, d3, (a#12.b1.b2.a3 * 10))), b1Original, a#12.b1) AS a#14]
   // +- LogicalRDD [a#12], false
   // 
   // == Physical Plan ==
   // *(1) Project [named_struct(a1, if (isnull(a#12.a1)) null else named_struct(a2, a#12.a1.a2, b2, a#12.a1.b2, c1, hello), b1, if (isnull(a#12.b1)) null else named_struct(a2, 1000, b2, if (isnull(a#12.b1.b2)) null else named_struct(a3, a#12.b1.b2.a3, b3, a#12.b1.b2.b3, d3, (a#12.b1.b2.a3 * 10))), b1Original, a#12.b1) AS a#14]
   // +- *(1) Scan ExistingRDD[a#12]
   
   result.show(false)
   
   // 

[GitHub] [spark] AmplabJenkins commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28835:
URL: https://github.com/apache/spark/pull/28835#issuecomment-645106680










[GitHub] [spark] SparkQA commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore

2020-06-16 Thread GitBox


SparkQA commented on pull request #28835:
URL: https://github.com/apache/spark/pull/28835#issuecomment-645106412


   **[Test build #124150 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124150/testReport)**
 for PR 28835 at commit 
[`5ce343a`](https://github.com/apache/spark/commit/5ce343aef6e17eaedb258f19f21823015e8066ce).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645105255










[GitHub] [spark] AmplabJenkins commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645105255










[GitHub] [spark] SparkQA removed a comment on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645095283


   **[Test build #124147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124147/testReport)**
 for PR 28845 at commit 
[`5fee577`](https://github.com/apache/spark/commit/5fee577a5f298763e17770b6b0e4e3fb08ecb9d8).






[GitHub] [spark] SparkQA commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


SparkQA commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645104956


   **[Test build #124147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124147/testReport)**
 for PR 28845 at commit 
[`5fee577`](https://github.com/apache/spark/commit/5fee577a5f298763e17770b6b0e4e3fb08ecb9d8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] yaooqinn commented on pull request #28835: [WIP][SPARK-31926][TESTS][FOLLOWUP][test-maven] Cleanup the thread local variable of hive metastore

2020-06-16 Thread GitBox


yaooqinn commented on pull request #28835:
URL: https://github.com/apache/spark/pull/28835#issuecomment-645104925


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645102991










[GitHub] [spark] AmplabJenkins commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645102991










[GitHub] [spark] SparkQA commented on pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


SparkQA commented on pull request #28846:
URL: https://github.com/apache/spark/pull/28846#issuecomment-645102759


   **[Test build #124149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124149/testReport)**
 for PR 28846 at commit 
[`e171a6c`](https://github.com/apache/spark/commit/e171a6cf65a6d29ed6bdc2d961effded185f9cbd).






[GitHub] [spark] viirya opened a new pull request #28846: [SPARK-32012][SQL] Incrementally create and materialize query stage to avoid unnecessary local shuffle

2020-06-16 Thread GitBox


viirya opened a new pull request #28846:
URL: https://github.com/apache/spark/pull/28846


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch changes how AQE creates query stages. Instead of creating query stages in a batch, it creates them incrementally, so that re-optimization can be applied earlier. This can avoid an unnecessary local shuffle.
   
   ### Why are the changes needed?
   
   
   Currently AQE creates query stages in a batch. For example, both children of a sort merge join are materialized as query stages together. AQE then re-optimizes the plan and converts the sort merge join into a broadcast join. At that point only the broadcast side needs an exchange; the other side of the join needs none, yet its exchange has already been materialized. AQE currently wraps that materialized exchange with a local reader, but this still incurs unnecessary I/O. Creating query stages incrementally avoids this unnecessary local shuffle.
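   A toy model may make the batch-versus-incremental difference concrete. This is not AQE code: `Side`, its byte sizes, and `BroadcastThreshold` are hypothetical stand-ins (the threshold loosely mirrors the idea behind `spark.sql.autoBroadcastJoinThreshold`), and the sketch assumes join children are materialized in the order given. It only reports which children end up shuffled under each strategy.

```scala
object AqeStageCreationSketch {
  /** Hypothetical join child with an estimated output size in bytes. */
  final case class Side(name: String, bytes: Long)

  /** Hypothetical broadcast limit (10 MiB), standing in for the real config. */
  val BroadcastThreshold: Long = 10L * 1024 * 1024

  /** Batch creation: every join child is shuffled before AQE re-optimizes. */
  def batchShuffles(children: Seq[Side]): Seq[String] =
    children.map(_.name)

  /**
   * Incremental creation: materialize one child at a time and re-optimize after
   * each. As soon as a materialized child fits under the threshold, the join can
   * become a broadcast join, so the remaining children need no shuffle.
   */
  def incrementalShuffles(children: Seq[Side]): Seq[String] = {
    val firstSmall = children.indexWhere(_.bytes <= BroadcastThreshold)
    val materialized = if (firstSmall < 0) children.size else firstSmall + 1
    children.take(materialized).map(_.name)
  }

  def main(args: Array[String]): Unit = {
    val children = Seq(Side("dim", 1L << 20), Side("fact", 50L << 30))
    println(batchShuffles(children))       // List(dim, fact)
    println(incrementalShuffles(children)) // List(dim)
  }
}
```

   In this model, if the large side happened to be materialized first, both sides are still shuffled, so the benefit depends on the order in which stages are created.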
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit tests.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645096303


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124148/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


SparkQA removed a comment on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645095273


   **[Test build #124148 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124148/testReport)**
 for PR 28844 at commit 
[`a010071`](https://github.com/apache/spark/commit/a01007151bd1e047130757ad8a09c30744cc443d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645096296


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645096296










[GitHub] [spark] SparkQA commented on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


SparkQA commented on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645096285


   **[Test build #124148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124148/testReport)**
 for PR 28844 at commit 
[`a010071`](https://github.com/apache/spark/commit/a01007151bd1e047130757ad8a09c30744cc443d).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645095627










[GitHub] [spark] AmplabJenkins commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645095627










[GitHub] [spark] SparkQA commented on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


SparkQA commented on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645095273


   **[Test build #124148 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124148/testReport)**
 for PR 28844 at commit 
[`a010071`](https://github.com/apache/spark/commit/a01007151bd1e047130757ad8a09c30744cc443d).






[GitHub] [spark] SparkQA commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


SparkQA commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645095283


   **[Test build #124147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124147/testReport)**
 for PR 28845 at commit 
[`5fee577`](https://github.com/apache/spark/commit/5fee577a5f298763e17770b6b0e4e3fb08ecb9d8).






[GitHub] [spark] HyukjinKwon commented on pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon commented on pull request #28845:
URL: https://github.com/apache/spark/pull/28845#issuecomment-645094746


   cc @squito too FYI






[GitHub] [spark] HyukjinKwon opened a new pull request #28845: [SPARK-32011][PYTHON][CORE] Remove warnings about pin-thread modes and guide to use collectWithJobGroup

2020-06-16 Thread GitBox


HyukjinKwon opened a new pull request #28845:
URL: https://github.com/apache/spark/pull/28845


   ### What changes were proposed in this pull request?
   
   This PR proposes to remove the warning about using local properties with multiple threads, and to change the guide to recommend `collectWithJobGroup` for multi-threaded code for now, because:
   - The warning is too noisy for users who don't use multiple threads, and the single-thread case is arguably the more common one.
   - A critical issue was found in pin-thread mode (SPARK-32010), which will be fixed in Spark 3.1.
   - To allow a smooth migration, `RDD.collectWithJobGroup` was added; it will be deprecated in Spark 3.1 once SPARK-32010 is fixed.
   
   I plan to deprecate `RDD.collectWithJobGroup` and make pin-thread mode stable in Spark 3.1. In future releases, I plan to make this mode the default and remove `RDD.collectWithJobGroup`.
   
   ### Why are the changes needed?
   
   To avoid steering users toward a feature with a critical issue, and to provide a proper workaround for now.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, warning message and documentation.
   
   ### How was this patch tested?
   
   Manually tested:
   
   Before:
   
   ```
   >>> spark.sparkContext.setLocalProperty("a", "b")
   /.../spark/python/pyspark/util.py:141: UserWarning: Currently, 'setLocalProperty' (set to local
   properties) with multiple threads does not properly work.
   Internally threads on PVM and JVM are not synced, and JVM thread can be reused for multiple
   threads on PVM, which fails to isolate local properties for each thread on PVM.
   To work around this, you can set PYSPARK_PIN_THREAD to true (see SPARK-22340). However,
   note that it cannot inherit the local properties from the parent thread although it isolates each
   thread on PVM and JVM with its own local properties.
   To work around this, you should manually copy and set the local properties from the parent thread
   to the child thread when you create another thread.
   ```
   
   After:
   ```
   >>> spark.sparkContext.setLocalProperty("a", "b")
   ```
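   The workaround in the old warning (copying local properties from the parent thread into the child) can be illustrated with plain JVM threads. This is an analogy only, not PySpark code: a non-inheritable `java.lang.ThreadLocal` stands in for the per-thread isolation of pin-thread mode, and `LocalPropertiesSketch`, `setProperty`, and `readInChild` are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicReference

object LocalPropertiesSketch {
  // A plain (non-inheritable) ThreadLocal: a new thread starts with an empty map,
  // mimicking how pinned threads do not inherit the parent's local properties.
  private val props = new ThreadLocal[Map[String, String]] {
    override protected def initialValue(): Map[String, String] = Map.empty
  }

  def setProperty(key: String, value: String): Unit =
    props.set(props.get + (key -> value))

  def getProperty(key: String): Option[String] = props.get.get(key)

  /** Starts a child thread and returns what it sees for `key`. */
  def readInChild(key: String, copyFromParent: Boolean): Option[String] = {
    val parentSnapshot = props.get // captured on the parent thread
    val seen = new AtomicReference[Option[String]](None)
    val child = new Thread(new Runnable {
      override def run(): Unit = {
        // The manual workaround: re-set the parent's properties in the child.
        if (copyFromParent) props.set(parentSnapshot)
        seen.set(getProperty(key))
      }
    })
    child.start()
    child.join()
    seen.get
  }

  def main(args: Array[String]): Unit = {
    setProperty("a", "b")
    println(readInChild("a", copyFromParent = false)) // None
    println(readInChild("a", copyFromParent = true))  // Some(b)
  }
}
```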
   






[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r441230512



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
##
@@ -452,4 +453,91 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
     checkEvaluation(GetMapValue(mb0, Literal(Array[Byte](2, 1), BinaryType)), "2")
     checkEvaluation(GetMapValue(mb0, Literal(Array[Byte](3, 4))), null)
   }
+
+  test("simplify GetStructField on WithFields that is not changing the attribute being extracted") {
+    val query = relation.select(
+      GetStructField(
+        WithFields('id, Seq("c"), Seq(Literal(1))),
+        0,
+        Some("a")) as "outerAtt")
+    val expected = relation.select(GetStructField('id, 0, Some("a")) as "outerAtt")
+
+    checkRule(query, expected)
+  }
+
+  test("simplify GetStructField on WithFields that is changing the attribute being extracted") {
+    val query = relation.select(
+      GetStructField(
+        WithFields('id, Seq("c"), Seq(Literal(1))),
+        0,
+        Some("c")) as "outerAtt")
+    val expected = relation.select(Literal(1) as "outerAtt")
+
+    checkRule(query, expected)
+  }
+
+  test(
+    "simplify GetStructField on WithFields that is changing the attribute being extracted twice") {
+    val query = relation.select(
+      GetStructField(
+        WithFields('id, Seq("c", "c"), Seq(Literal(1), Literal(2))),
+        0,
+        Some("c")) as "outerAtt")
+    val expected = relation.select(Literal(2) as "outerAtt")
+
+    checkRule(query, expected)
+  }
+
+  test("collapse multiple GetStructField on the same WithFields") {
+    val query = relation
+      .select(CreateNamedStruct(Seq("att1", 'id, "att2", 'id * 'id)) as "struct1")
+      .select(WithFields('struct1, Seq("att3"), Seq(Literal(3))) as "struct2")
+      .select(
+        GetStructField('struct2, 0, Some("att1")) as "struct1Att1",
+        GetStructField('struct2, 1, Some("att2")) as "struct1Att2",
+        GetStructField('struct2, 2, Some("att3")) as "struct1Att3")
+
+    val expected = relation
+      .select(
+        'id as "struct1Att1",
+        ('id * 'id) as "struct1Att2",
+        Literal(3) as "struct1Att3")
+
+    checkRule(query, expected)
+  }
+
+  test("collapse multiple GetStructField on different WithFields") {
+    val query = relation
+      .select(CreateNamedStruct(Seq("att1", 'id)) as "struct1")
+      .select(
+        WithFields('struct1, Seq("att2"), Seq(Literal(2))) as "struct2",
+        WithFields('struct1, Seq("att2"), Seq(Literal(3))) as "struct3")
+      .select(
+        GetStructField('struct2, 0, Some("att1")) as "struct2Att1",
+        GetStructField('struct2, 1, Some("att2")) as "struct2Att2",
+        GetStructField('struct3, 0, Some("att1")) as "struct3Att1",
+        GetStructField('struct3, 1, Some("att2")) as "struct3Att2")
+
+    val expected = relation
+      .select(
+        'id as "struct2Att1",
+        Literal(2) as "struct2Att2",
+        'id as "struct3Att1",
+        Literal(3) as "struct3Att2")
+
+    checkRule(query, expected)
+  }
+
+  test("WIP write tests for ensuring case sensitivity is respected") {

Review comment:
   Please bear with me: I still need to finish writing the unit tests for this rule, hopefully later this week, otherwise over the weekend.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-645093592










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


AmplabJenkins removed a comment on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645093544










[GitHub] [spark] AmplabJenkins commented on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645093544










[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


AmplabJenkins commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-645093592










[GitHub] [spark] maropu commented on a change in pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


maropu commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r441229539



##
File path: docs/sql-ref-syntax-aux-refresh-function.md
##
@@ -0,0 +1,59 @@
+---
+layout: global
+title: REFRESH FUNCTION
+displayTitle: REFRESH FUNCTION
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description

Review comment:
   cc: @huaxingao 








[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r441229325



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
##
@@ -39,7 +41,18 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] {
   // Remove redundant field extraction.
   case GetStructField(createNamedStruct: CreateNamedStruct, ordinal, _) =>
 createNamedStruct.valExprs(ordinal)
-
+  case GetStructField(WithFields(struct, nameExprs, valExprs), ordinal, maybeName) =>
+val extractFieldName = maybeName.getOrElse(
+  struct.dataType.asInstanceOf[StructType](ordinal).name)
+val resolver = SQLConf.get.resolver
+val names = nameExprs.map(e => e.eval().toString)
+if (names.exists(n => resolver(n, extractFieldName))) {

Review comment:
   You're right, this is more concise. To keep the same behaviour, though, I had to change it slightly:
   ```
   val matches = names.zip(valExprs).filter { case (name, _) => resolver(name, extractFieldName) }
   if (matches.nonEmpty) {
     matches.last._2
   } else {
     GetStructField(struct, ordinal, Some(extractFieldName))
   }
   ```
   Reason: there can be multiple matches, and we only want the last one.
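   The "last match wins" behaviour, and how the resolver makes it case-sensitive or not, can be modeled without Catalyst. This is a hypothetical plain-Scala model, not the optimizer rule itself: the original struct is a `Vector` of values, the `WithFields` updates are `(name, value)` pairs, and the two resolvers are local stand-ins assumed to behave like Spark's case-sensitive and case-insensitive name resolution (selected by `spark.sql.caseSensitive`). The cases mirror the unit tests in this PR, plus the pending case-sensitivity tests.

```scala
object SimplifyWithFieldsSketch {
  type Resolver = (String, String) => Boolean

  // Local stand-ins for case-sensitive and case-insensitive name resolution.
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  /**
   * Models GetStructField(WithFields(struct, names, values), ordinal, Some(fieldName)):
   * scan the updates from the end so the LAST matching name wins; if no update
   * matches, fall back to reading the original field at `ordinal`.
   */
  def extract[A](
      original: IndexedSeq[A],
      ordinal: Int,
      fieldName: String,
      updates: Seq[(String, A)],
      resolver: Resolver): A =
    updates.reverseIterator
      .collectFirst { case (name, value) if resolver(name, fieldName) => value }
      .getOrElse(original(ordinal))

  def main(args: Array[String]): Unit = {
    val struct = Vector(42) // original value of the field being extracted (ordinal 0)
    println(extract(struct, 0, "a", Seq("c" -> 1), caseSensitive))            // 42
    println(extract(struct, 0, "c", Seq("c" -> 1, "c" -> 2), caseSensitive)) // 2
    println(extract(struct, 0, "c", Seq("C" -> 1), caseSensitive))           // 42
    println(extract(struct, 0, "c", Seq("C" -> 1), caseInsensitive))         // 1
  }
}
```

   Scanning from the end gives the same answer as `matches.last._2` without building the intermediate list.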








[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-645093212


   **[Test build #124146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124146/testReport)**
 for PR 27066 at commit 
[`c2f9216`](https://github.com/apache/spark/commit/c2f9216e39152fe978eb13dfdf8fc1ac75f7063c).






[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


SparkQA commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-645093211


   **[Test build #124145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124145/testReport)**
 for PR 28840 at commit 
[`3fc807e`](https://github.com/apache/spark/commit/3fc807e3f4ae62d516b00f7d55ea48919b039754).






[GitHub] [spark] huaxingao commented on pull request #28844: [SPARK-32009][ML][PySpark] Remove deprecated method BisectingKMeansModel.computeCost

2020-06-16 Thread GitBox


huaxingao commented on pull request #28844:
URL: https://github.com/apache/spark/pull/28844#issuecomment-645093196


   retest this please






[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


ulysses-you commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r441228944



##
File path: docs/sql-ref-syntax-aux-refresh-function.md
##
@@ -0,0 +1,59 @@
+---
+layout: global
+title: REFRESH FUNCTION
+displayTitle: REFRESH FUNCTION
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+`REFRESH FUNCTION` statement invalidates the cached entries, which include class name
+and resource location of the given function. The invalidated cache is populated right now.

Review comment:
   A small difference from `refresh table`: populating the function cache is lightweight, so it is done right away.
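   The doc text and the comment above contrast two refresh strategies. Per the Spark SQL docs, `REFRESH TABLE` repopulates its invalidated cache lazily, on the next use, while this PR's `REFRESH FUNCTION` repopulates the entry immediately because reloading a class name and resource location is cheap. Below is a minimal Scala sketch of that timing difference under those assumptions. It is not Spark's catalog code: `FunctionInfo`, `FunctionCache`, and the `lookup` callback are hypothetical names, and `lookup` stands in for a metastore call.

```scala
import scala.collection.mutable

// Hypothetical record of what is cached per function:
// the implementing class name and its resource locations (e.g. JAR paths).
final case class FunctionInfo(className: String, resources: Seq[String])

// Hypothetical cache; `lookup` stands in for a catalog/metastore lookup.
final class FunctionCache(lookup: String => Option[FunctionInfo]) {
  private val entries = mutable.Map.empty[String, FunctionInfo]

  // Read through the cache, loading from the catalog on a miss.
  def get(name: String): Option[FunctionInfo] =
    entries.get(name).orElse {
      val loaded = lookup(name)
      loaded.foreach(info => entries.update(name, info))
      loaded
    }

  // Lazy style (like REFRESH TABLE): only drop the entry;
  // the next get() pays for the reload.
  def invalidate(name: String): Unit = {
    entries.remove(name)
    ()
  }

  // Eager style (like the REFRESH FUNCTION doc text): drop the entry
  // and repopulate it immediately from the catalog.
  def refresh(name: String): Option[FunctionInfo] = {
    entries.remove(name)
    get(name)
  }

  def isCached(name: String): Boolean = entries.contains(name)
}
```

   In both paths a later `get` sees the current catalog state; the difference is only whether the lookup runs at refresh time or at first use.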








[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-06-16 Thread GitBox


fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r441228301



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -22,7 +22,9 @@ import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion}
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.{FUNC_ALIAS, FunctionBuilder}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

Review comment:
   Fixed. 

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -539,3 +541,82 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
+ */
+case class WithFields(
+  structExpr: Expression,

Review comment:
   Fixed
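   The `WithFields` doc comment above describes add-or-replace-by-name semantics for a struct. As an illustration only (the real Catalyst expression rewrites `Expression` trees, which this does not model), here is a Scala sketch over a plain ordered list of fields. It assumes exact, case-sensitive name matching, that a replaced field keeps its position, and that a new field is appended at the end; `WithFieldsSketch` and its `Struct` alias are hypothetical names.

```scala
// Hypothetical, simplified model: a struct value as ordered (fieldName, value) pairs.
object WithFieldsSketch {
  type Struct = Vector[(String, Any)]

  // Add or replace one field by name.
  // Assumption: a matching field keeps its position; otherwise the field is appended.
  def withField(struct: Struct, name: String, value: Any): Struct =
    if (struct.exists(_._1 == name))
      struct.map { case (n, v) => if (n == name) (n, value) else (n, v) }
    else
      struct :+ (name -> value)

  // Apply several updates left to right, like chaining withField calls.
  def withFields(struct: Struct, updates: Seq[(String, Any)]): Struct =
    updates.foldLeft(struct) { case (acc, (n, v)) => withField(acc, n, v) }
}
```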








[GitHub] [spark] vanzin closed pull request #28635: [SPARK-31337][SQL]Support MS SQL Kerberos login in JDBC connector

2020-06-16 Thread GitBox


vanzin closed pull request #28635:
URL: https://github.com/apache/spark/pull/28635


   






[GitHub] [spark] vanzin commented on pull request #28635: [SPARK-31337][SQL]Support MS SQL Kerberos login in JDBC connector

2020-06-16 Thread GitBox


vanzin commented on pull request #28635:
URL: https://github.com/apache/spark/pull/28635#issuecomment-645091699


   Merging to master.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #28831: [SPARK-31993][SQL] Evaluate children code whenever needed in both varargCounts/varargBuilds for 'concat_ws' for mixed string/array

2020-06-16 Thread GitBox


HeartSaVioR edited a comment on pull request #28831:
URL: https://github.com/apache/spark/pull/28831#issuecomment-644538409


   cc. @kiszk @viirya as initial reviewers, since I can see they contributed this part of the code.
   cc. @bersprockets, who helped review the initial patch.






[GitHub] [spark] ulysses-you commented on pull request #28840: [SPARK-31999][SQL] Add refresh function command

2020-06-16 Thread GitBox


ulysses-you commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-645083564


   Hive supports `reload functions`, which reloads all functions.
   
   `refresh function` works like `refresh table`: it invalidates the cache for a single function.





