[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629982682


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122785/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629982670










[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629982536


   **[Test build #122785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122785/testReport)** for PR 27066 at commit [`c99c086`](https://github.com/apache/spark/commit/c99c086a2796aef8727328d15f13f9ecb0dc2977).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629943931


   **[Test build #122785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122785/testReport)** for PR 27066 at commit [`c99c086`](https://github.com/apache/spark/commit/c99c086a2796aef8727328d15f13f9ecb0dc2977).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629982670


   Merged build finished. Test FAILed.






[GitHub] [spark] Fokko commented on a change in pull request #28554: [SPARK-31735][CORE] Include date/timestamp in the summary report

2020-05-17 Thread GitBox


Fokko commented on a change in pull request #28554:
URL: https://github.com/apache/spark/pull/28554#discussion_r426402521



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
##
@@ -264,7 +264,10 @@ object StatFunctions extends Logging {
 }
 
 val selectedCols = ds.logicalPlan.output
-  .filter(a => a.dataType.isInstanceOf[NumericType] || a.dataType.isInstanceOf[StringType])
+  .filter(a => a.dataType.isInstanceOf[NumericType]
+|| a.dataType.isInstanceOf[StringType]
+|| a.dataType.isInstanceOf[DateType]

Review comment:
   I'm working on getting the test suite running on my machine, so I need some time. I don't think that the `mean` will be the issue, since that is just the element in the middle of the sorted collection; the `stddev`, however, will be tricky. For `StringType` this is just `null`.
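   A plain-Java sketch (not Spark code; the class and method names are hypothetical) of the convention discussed above: a sample standard deviation is well defined for a numeric column, while for a string column it is simply reported as `null`:

```java
import java.util.List;

public class SummarySketch {
    // Hypothetical helper mirroring the convention discussed above: stddev is
    // only defined for numeric columns, so anything else (e.g. strings) yields
    // null. Sample standard deviation (n - 1 denominator) is assumed.
    static Double stddev(List<?> col) {
        if (col.size() < 2 || !col.stream().allMatch(v -> v instanceof Number)) {
            return null;
        }
        double mean = col.stream()
                .mapToDouble(v -> ((Number) v).doubleValue())
                .average()
                .orElse(0.0);
        double sumSq = col.stream()
                .mapToDouble(v -> {
                    double d = ((Number) v).doubleValue() - mean;
                    return d * d;
                })
                .sum();
        return Math.sqrt(sumSq / (col.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(stddev(List.of(1, 2, 3)));       // numeric column -> 1.0
        System.out.println(stddev(List.of("a", "b", "c"))); // string column  -> null
    }
}
```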








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-629978159










[GitHub] [spark] AmplabJenkins commented on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-629978159










[GitHub] [spark] Ngone51 commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-17 Thread GitBox


Ngone51 commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r426398285



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1901,58 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+@volatile private var stopped = false
+private val sleepInterval = conf.get(
+  config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+
+private val blockReplicationThread = new Thread {
+  override def run(): Unit = {
+var failures = 0
+while (blockManagerDecommissioning
+  && !stopped
+  && !Thread.interrupted()

Review comment:
   I believe that in normal cases like `wait` and `sleep`, the interrupted status will be cleared, according to the JDK doc:
   
   ```
   * @throws  InterruptedException
   *  if any thread has interrupted the current thread. The
   *  interrupted status of the current thread is
   *  cleared when this exception is thrown.
   ```
   
   And even if the status is not cleared in other cases, the following `Thread.sleep(sleepInterval)` will throw `InterruptedException` first and set `stopped` to true, so `Thread.interrupted()` still does not take effect.
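   The clearing behavior quoted from the JDK doc can be observed directly with a small plain-Java sketch (hypothetical class name; not Spark code): after `sleep` throws `InterruptedException`, `isInterrupted()` reports `false`.

```java
public class InterruptStatusDemo {
    // Returns the thread's interrupted status as observed inside the catch
    // block for InterruptedException; per the JDK doc quoted above, the status
    // is cleared when the exception is thrown, so this returns false.
    public static boolean statusAfterSleepInterrupt() throws InterruptedException {
        final boolean[] status = new boolean[1];
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // long sleep; interrupted from below
            } catch (InterruptedException e) {
                status[0] = Thread.currentThread().isInterrupted();
            }
        });
        worker.start();
        worker.interrupt(); // either interrupts the sleep or makes it throw immediately
        worker.join();
        return status[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted status after catch: " + statusAfterSleepInterrupt());
        // prints: interrupted status after catch: false
    }
}
```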








[GitHub] [spark] SparkQA removed a comment on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-629890268


   **[Test build #122771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122771/testReport)** for PR 28208 at commit [`2f9522f`](https://github.com/apache/spark/commit/2f9522f4da807260b133085dd70011785f7823c5).






[GitHub] [spark] SparkQA commented on pull request #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-05-17 Thread GitBox


SparkQA commented on pull request #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-629977464


   **[Test build #122771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122771/testReport)** for PR 28208 at commit [`2f9522f`](https://github.com/apache/spark/commit/2f9522f4da807260b133085dd70011785f7823c5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same

2020-05-17 Thread GitBox


yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r426394163



##
File path: sql/hive/benchmarks/InsertIntoHiveTableBenchmark-results.txt
##
@@ -0,0 +1,11 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
+Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+insert hive table benchmark:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------
+INSERT INTO DYNAMIC                     7346           7470         175        0.0      717423.0       1.0X
+INSERT INTO HYBRID                      1179           1188          13        0.0      115184.2       6.2X
+INSERT INTO STATIC                       344            367          48        0.0       33585.1      21.4X
+INSERT OVERWRITE DYNAMIC                7656           7714          82        0.0      747622.7       1.0X
+INSERT OVERWRITE HYBRID                 1179           1183           6        0.0      115163.3       6.2X
+INSERT OVERWRITE STATIC                  400            408          10        0.0       39014.2      18.4X

Review comment:
   ```
   -INSERT INTO DYNAMIC          7742   7918   248   0.0   756044.0    1.0X
   -INSERT INTO HYBRID           1289   1307    26   0.0   125866.3    6.0X
   -INSERT INTO STATIC            371    393    38   0.0    36219.4   20.9X
   -INSERT OVERWRITE DYNAMIC     8456   8554   138   0.0   825790.3    0.9X
   -INSERT OVERWRITE HYBRID      1303   1311    12   0.0   127198.4    5.9X
   -INSERT OVERWRITE STATIC       434    447    13   0.0    42373.8   17.8X
   +INSERT INTO DYNAMIC          7382   7456   105   0.0   720904.8    1.0X
   +INSERT INTO HYBRID           1128   1129     1   0.0   110169.4    6.5X
   +INSERT INTO STATIC            349    370    39   0.0    34095.4   21.1X
   +INSERT OVERWRITE DYNAMIC     8149   8362   301   0.0   795821.8    0.9X
   +INSERT OVERWRITE HYBRID      1317   1318     2   0.0   128616.7    5.6X
   +INSERT OVERWRITE STATIC       387    408    37   0.0    37804.1   19.1X
   ```
   
   `+` for master, `-` for this PR; both using Hive 2.3.7.
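   For reading the benchmark output above: the `Relative` column appears to be the first case's best time divided by the given case's best time, so larger means faster. A minimal sketch, using values taken from the table above:

```java
public class RelativeSpeed {
    // Relative speed of a case versus the baseline (first) case, assuming the
    // benchmark's Relative column is baselineBestMs / caseBestMs.
    public static double relative(double baselineBestMs, double caseBestMs) {
        return baselineBestMs / caseBestMs;
    }

    public static void main(String[] args) {
        // INSERT INTO DYNAMIC (7346 ms best) is the 1.0X baseline;
        // INSERT INTO HYBRID (1179 ms best) comes out at about 6.2X.
        System.out.printf("%.1fX%n", relative(7346, 1179));
    }
}
```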








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-629976971










[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-629976971










[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp

2020-05-17 Thread GitBox


SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-629976178


   **[Test build #122765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122765/testReport)** for PR 28534 at commit [`7d562f4`](https://github.com/apache/spark/commit/7d562f4810608100e7b852b55d3d1c49690512ae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28534: [SPARK-31710][SQL]Fix millisecond and microsecond convert to timestamp in to_timestamp

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-629885392


   **[Test build #122765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122765/testReport)** for PR 28534 at commit [`7d562f4`](https://github.com/apache/spark/commit/7d562f4810608100e7b852b55d3d1c49690512ae).






[GitHub] [spark] viirya commented on a change in pull request #28560: [SPARK-27217][SQL] Nested column aliasing for more operators which can prune nested column

2020-05-17 Thread GitBox


viirya commented on a change in pull request #28560:
URL: https://github.com/apache/spark/pull/28560#discussion_r426391779



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -68,10 +76,23 @@ object NestedColumnAliasing {
*/
   def replaceChildrenWithAliases(
   plan: LogicalPlan,
+  nestedFieldToAlias: Map[ExtractValue, Alias],
   attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
 plan.withNewChildren(plan.children.map { plan =>
   Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan)
-})
+}).transformExpressions {
+  case f: ExtractValue if nestedFieldToAlias.contains(f) =>
+nestedFieldToAlias(f).toAttribute
+}
+  }
+
+  /**
+   * Returns true for those operators that we can prune nested column on it.
+   */
+  private def canPruneOn(plan: LogicalPlan) = plan match {
+case _: Aggregate => true
+case _: Expand => true
+case _ => false

Review comment:
   I think I was wrong. Re-checking `FlatMapGroupsInPandas`'s Python API, 
it looks like
   
   ```python
   df.groupby("id").apply(udf).show()
   ```
   
   So basically the Python UDF takes no nested column selection but the full set of columns of the DataFrame, so it doesn't do nested column pruning.
   
   `MapInPandas` is also the same.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629968873










[GitHub] [spark] AmplabJenkins commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629968873










[GitHub] [spark] SparkQA removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629964194


   **[Test build #122790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)**
 for PR 28558 at commit 
[`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306).






[GitHub] [spark] SparkQA commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


SparkQA commented on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629968746


   **[Test build #122790 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)**
 for PR 28558 at commit 
[`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28511:
URL: https://github.com/apache/spark/pull/28511#issuecomment-629966971










[GitHub] [spark] SparkQA commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table

2020-05-17 Thread GitBox


SparkQA commented on pull request #28511:
URL: https://github.com/apache/spark/pull/28511#issuecomment-629966651


   **[Test build #122791 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122791/testReport)**
 for PR 28511 at commit 
[`f7c6b51`](https://github.com/apache/spark/commit/f7c6b5126b601a87a62ac6eb50217f03f3c1b27a).






[GitHub] [spark] maropu commented on a change in pull request #28544: [SPARK-31387][test-maven] Handle unknown operation/session ID in HiveThriftServer2Listener

2020-05-17 Thread GitBox


maropu commented on a change in pull request #28544:
URL: https://github.com/apache/spark/pull/28544#discussion_r426387696



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
##
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
 updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
= {
-val session = sessionList.get(e.sessionId)
-session.finishTimestamp = e.finishTime
-updateStoreWithTriggerEnabled(session)
-sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
=
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onSessionClosed called with unknown session 
id: ${e.sessionId}")
+  case Some(sessionData) =>
+val session = sessionData
+session.finishTimestamp = e.finishTime
+updateStoreWithTriggerEnabled(session)
+sessionList.remove(e.sessionId)
+}
 
-  private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit = {
-val info = getOrCreateExecution(
-  e.id,
-  e.statement,
-  e.sessionId,
-  e.startTime,
-  e.userName)
-
-info.state = ExecutionState.STARTED
-executionList.put(e.id, info)
-sessionList.get(e.sessionId).totalExecution += 1
-executionList.get(e.id).groupId = e.groupId
-updateLiveStore(executionList.get(e.id))
-updateLiveStore(sessionList.get(e.sessionId))
-  }
+  private def onOperationStart(e: SparkListenerThriftServerOperationStart): 
Unit =
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onOperationStart called with unknown session 
id: ${e.sessionId}")

Review comment:
   We can keep processing queries even in this case?
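The change under review replaces an unguarded map lookup with an `Option(...) match` so that an unknown session id is logged instead of throwing. In Python terms, a hypothetical sketch of the same defensive-lookup pattern (names here are illustrative, not Spark's API):

```python
import logging

logger = logging.getLogger("thriftserver.listener")

def on_session_closed(session_list, event):
    """Defensive analogue of `Option(sessionList.get(id)) match`:
    an unknown session id is logged and skipped instead of raising."""
    session = session_list.get(event["sessionId"])
    if session is None:
        logger.warning("onSessionClosed called with unknown session id: %s",
                       event["sessionId"])
        return
    session["finishTimestamp"] = event["finishTime"]
    del session_list[event["sessionId"]]

sessions = {"s1": {"totalExecution": 0}}
# Unknown id: warns, does not crash the listener.
on_session_closed(sessions, {"sessionId": "unknown", "finishTime": 123})
# Known id: updates and removes the session.
on_session_closed(sessions, {"sessionId": "s1", "finishTime": 456})
```

The design question in the thread (whether processing should continue after the warning) corresponds to the early `return` above.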








[GitHub] [spark] AmplabJenkins commented on pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28511:
URL: https://github.com/apache/spark/pull/28511#issuecomment-629966971










[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


maropu commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629965293


   Looks nice! Thanks for re-triggering the tests, @HyukjinKwon.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629964663










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629964658










[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629964658










[GitHub] [spark] AmplabJenkins commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629964663










[GitHub] [spark] SparkQA commented on pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


SparkQA commented on pull request #28558:
URL: https://github.com/apache/spark/pull/28558#issuecomment-629964194


   **[Test build #122790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122790/testReport)**
 for PR 28558 at commit 
[`8a43b8c`](https://github.com/apache/spark/commit/8a43b8c61cb356d8d94985ad499ed8a233a3e306).






[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


SparkQA commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629964220


   **[Test build #122789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122789/testReport)**
 for PR 28565 at commit 
[`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).






[GitHub] [spark] maropu commented on pull request #28138: [SPARK-31366][DOCS][SQL] Add doc for the aggregation in SQL reference guide

2020-05-17 Thread GitBox


maropu commented on pull request #28138:
URL: https://github.com/apache/spark/pull/28138#issuecomment-629964286


   Ah, I forgot to close this. Thanks, @srowen 






[GitHub] [spark] maropu commented on a change in pull request #28544: [SPARK-31387][test-maven] Handle unknown operation/session ID in HiveThriftServer2Listener

2020-05-17 Thread GitBox


maropu commented on a change in pull request #28544:
URL: https://github.com/apache/spark/pull/28544#discussion_r426385130



##
File path: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala
##
@@ -131,60 +132,81 @@ private[thriftserver] class HiveThriftServer2Listener(
 updateLiveStore(session)
   }
 
-  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
= {
-val session = sessionList.get(e.sessionId)
-session.finishTimestamp = e.finishTime
-updateStoreWithTriggerEnabled(session)
-sessionList.remove(e.sessionId)
-  }
+  private def onSessionClosed(e: SparkListenerThriftServerSessionClosed): Unit 
=
+Option(sessionList.get(e.sessionId)) match {
+  case None => logWarning(s"onSessionClosed called with unknown session 
id: ${e.sessionId}")

Review comment:
   Could you move the `None` pattern so that it comes after the `Some` 
pattern?








[GitHub] [spark] HyukjinKwon commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


HyukjinKwon commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629963202


   retest this please






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


HyukjinKwon commented on a change in pull request #28566:
URL: https://github.com/apache/spark/pull/28566#discussion_r426384293



##
File path: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
##
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
 // goal is to create enough requests for localized containers (so there 
should be many
 // tasks on several hosts that have no allocated containers).
 
-val resource = Resource.newInstance(8 * 1024, 4)

Review comment:
   Should be fixed now. Thanks for pointing it out so quickly.








[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


MaxGekk commented on a change in pull request #28558:
URL: https://github.com/apache/spark/pull/28558#discussion_r426383975



##
File path: docs/sql-ref-datetime-pattern.md
##
@@ -76,6 +76,57 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which 
padding is used. If the count of letters is two, then a reduced two digit form 
is used. For printing, this outputs the rightmost two digits. For parsing, this 
will parse using the base value of 2000, resulting in a year within the range 
2000 to 2099 inclusive. If the count of letters is less than four (but not 
two), then the sign is only output for negative years. Otherwise, the sign is 
output if the pad width is exceeded when 'G' is not present.
 
+- Month: If the number of pattern letters is 3 or more, the month is 
interpreted as text; otherwise, it is interpreted as a number. The text form 
depends on the letter: 'M' denotes the 'standard' form, and 'L' the 
'stand-alone' form. The difference between the 'standard' and 'stand-alone' 
forms is tricky to describe, as there is no difference in English. In other 
languages, however, the word used for a month on its own (for example, in a 
date picker) differs from the word used for a month together with a day and 
year in a complete date. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year, starting from 1. There is no 
difference between 'M' and 'L'. Months 1 to 9 are printed without padding.
+```sql
+spark-sql> select date_format(date '1970-01-01', "M");
+1
+spark-sql> select date_format(date '1970-12-01', "L");
+12
+```
+  - `'MM'` or `'LL'`: Month number in a year, starting from 1. Zero padding is 
added for months 1-9.
+  ```sql
+  spark-sql> select date_format(date '1970-1-01', "LL");
+  01
+  spark-sql> select date_format(date '1970-09-01', "MM");
+  09
+  ```
+  - `'MMM'`: Short textual representation in the standard form. The month 
pattern should be part of a date pattern, not just a stand-alone month, except 
in locales where there is no difference between the standard and stand-alone 
forms, as in English.
+```sql
+spark-sql> select date_format(date '1970-01-01', "d MMM");
+1 Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'dd MMM', 'locale', 'RU'));
+01 янв.
+```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should 
be used to format/parse only months without any other date fields.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLL");
+Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'LLL', 'locale', 'RU'));
+янв.
+```
+  - `'MMMM'`: Full textual month representation in the standard form. It is 
used for parsing/formatting months as a part of dates/timestamps.
+```sql
+spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+January 1970
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'd MMMM', 'locale', 'RU'));
+1 января
+```
+  - `'LLLL'`: Full textual month representation in the stand-alone form. The 
pattern can be used to format/parse only months.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLLL");
+January
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'LLLL', 'locale', 'RU'));
+январь
+```
+  - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the standard or 
stand-alone forms. Typically it is a single letter.

Review comment:
   Added
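As a rough cross-check of the padding and textual month forms described in the quoted doc, here is a small Python sketch (not Spark; Python's `strftime` has no 'standard' vs 'stand-alone' distinction like 'M' vs 'L', so this only mirrors the progression from number to padded number to short and full names):

```python
import locale
from datetime import date

# Pin the C locale so the month names below are deterministic.
locale.setlocale(locale.LC_TIME, "C")

d = date(1970, 1, 1)

# Rough strftime analogues of the month pattern letters:
month_unpadded = str(d.month)     # ~ 'M'   : unpadded month number
month_padded = d.strftime("%m")   # ~ 'MM'  : zero-padded month number
month_short = d.strftime("%b")    # ~ 'MMM' : short textual form
month_full = d.strftime("%B")     # ~ 'MMMM': full textual form
```

The stand-alone forms ('L', 'LLL', and so on) only differ in languages such as Russian, where a month name inflects when combined with a day and year.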








[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


maropu commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629962468


   It seems the failure above is not related to this PR. See: 
https://github.com/apache/spark/pull/28566






[GitHub] [spark] MaxGekk commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


MaxGekk commented on a change in pull request #28558:
URL: https://github.com/apache/spark/pull/28558#discussion_r426383919



##
File path: docs/sql-ref-datetime-pattern.md
##
@@ -76,6 +76,57 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which 
padding is used. If the count of letters is two, then a reduced two digit form 
is used. For printing, this outputs the rightmost two digits. For parsing, this 
will parse using the base value of 2000, resulting in a year within the range 
2000 to 2099 inclusive. If the count of letters is less than four (but not 
two), then the sign is only output for negative years. Otherwise, the sign is 
output if the pad width is exceeded when 'G' is not present.
 
+- Month: If the number of pattern letters is 3 or more, the month is 
interpreted as text; otherwise, it is interpreted as a number. The text form 
depends on the letter: 'M' denotes the 'standard' form, and 'L' the 
'stand-alone' form. The difference between the 'standard' and 'stand-alone' 
forms is tricky to describe, as there is no difference in English. In other 
languages, however, the word used for a month on its own (for example, in a 
date picker) differs from the word used for a month together with a day and 
year in a complete date. Here are examples for all supported pattern letters:

Review comment:
   Added








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


HyukjinKwon commented on a change in pull request #28566:
URL: https://github.com/apache/spark/pull/28566#discussion_r426383725



##
File path: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
##
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
 // goal is to create enough requests for localized containers (so there 
should be many
 // tasks on several hosts that have no allocated containers).
 
-val resource = Resource.newInstance(8 * 1024, 4)

Review comment:
   Ah, yeah. Seems it's used in branch-3.0. I will fix.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956










[GitHub] [spark] AmplabJenkins commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-629961956










[GitHub] [spark] maropu commented on a change in pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


maropu commented on a change in pull request #28566:
URL: https://github.com/apache/spark/pull/28566#discussion_r426383379



##
File path: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala
##
@@ -57,7 +62,6 @@ class LocalityPlacementStrategySuite extends SparkFunSuite {
 // goal is to create enough requests for localized containers (so there 
should be many
 // tasks on several hosts that have no allocated containers).
 
-val resource = Resource.newInstance(8 * 1024, 4)

Review comment:
   https://github.com/apache/spark/pull/28566#issuecomment-629961204








[GitHub] [spark] maropu edited a comment on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


maropu edited a comment on pull request #28566:
URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204


   @HyukjinKwon It seems branch-3.0 is broken?
   ```
   [info] Done packaging.
   [error] 
/home/jenkins/workspace/SparkPullRequestBuilder@4/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/LocalityPlacementStrategySuite.scala:65:
 not found: value resource
   [error]   yarnConf, resource, new MockResolver())
   [error] ^
   [info] Packaging 
/home/jenkins/workspace/SparkPullRequestBuilder@4/external/kafka-0-10-token-provider/target/scala-2.12/spark-token-provider-kafka-0-10_2.12-3.0.1-SNAPSHOT-tests.jar
 ...
   ```






[GitHub] [spark] SparkQA removed a comment on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-629879870


   **[Test build #122764 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)**
 for PR 28527 at commit 
[`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f).






[GitHub] [spark] maropu commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


maropu commented on pull request #28566:
URL: https://github.com/apache/spark/pull/28566#issuecomment-629961204


   @HyukjinKwon It seems branch-3.0 is broken?
   ```
   LocalityPlacementStrategySuite
   ```






[GitHub] [spark] SparkQA commented on pull request #28527: [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path

2020-05-17 Thread GitBox


SparkQA commented on pull request #28527:
URL: https://github.com/apache/spark/pull/28527#issuecomment-629961162


   **[Test build #122764 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122764/testReport)**
 for PR 28527 at commit 
[`7d50c17`](https://github.com/apache/spark/commit/7d50c17ceca0051e455ec1faf17f3c9ad05a206f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629960072


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122788/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


SparkQA commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629960051


   **[Test build #122788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066










[GitHub] [spark] SparkQA removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300


   **[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629960066


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28563:
URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28563:
URL: https://github.com/apache/spark/pull/28563#issuecomment-629959652










[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same

2020-05-17 Thread GitBox


yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r426381332



##
File path: sql/hive/benchmarks/InsertIntoHiveTableBenchmark-results.txt
##
@@ -0,0 +1,11 @@
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.4
+Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
+insert hive table benchmark:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+INSERT INTO DYNAMIC                                7346           7470         175          0.0      717423.0       1.0X
+INSERT INTO HYBRID                                 1179           1188          13          0.0      115184.2       6.2X
+INSERT INTO STATIC                                  344            367          48          0.0       33585.1      21.4X
+INSERT OVERWRITE DYNAMIC                           7656           7714          82          0.0      747622.7       1.0X
+INSERT OVERWRITE HYBRID                            1179           1183           6          0.0      115163.3       6.2X
+INSERT OVERWRITE STATIC                             400            408          10          0.0       39014.2      18.4X

Review comment:
   Let me run this benchmark on the master branch and update the result 
later in the PR description.
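For context when comparing updated numbers: in Spark's benchmark output quoted above, the Relative column is derived from the Best Time(ms) column, with the first case as the 1.0X baseline. A quick sketch of that computation (Python used here purely for illustration):

```python
# Sketch: reproducing the "Relative" column of the benchmark results above
# from the "Best Time(ms)" column. The first case is the 1.0X baseline;
# larger values mean faster.
best_ms = {
    "INSERT INTO DYNAMIC": 7346,  # first case, baseline
    "INSERT INTO HYBRID": 1179,
    "INSERT INTO STATIC": 344,
}

def relative(case: str, baseline: str = "INSERT INTO DYNAMIC") -> float:
    # Relative = best time of the baseline case / best time of this case
    return round(best_ms[baseline] / best_ms[case], 1)

print(relative("INSERT INTO HYBRID"))  # 6.2, matching 6.2X in the table
print(relative("INSERT INTO STATIC"))  # 21.4, matching 21.4X in the table
```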








[GitHub] [spark] SparkQA removed a comment on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28563:
URL: https://github.com/apache/spark/pull/28563#issuecomment-629892529


   **[Test build #122773 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e).






[GitHub] [spark] SparkQA commented on pull request #28563: [SPARK-31743][CORE] Add spark_info metric into PrometheusResource

2020-05-17 Thread GitBox


SparkQA commented on pull request #28563:
URL: https://github.com/apache/spark/pull/28563#issuecomment-629958887


   **[Test build #122773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122773/testReport)** for PR 28563 at commit [`a0aff8a`](https://github.com/apache/spark/commit/a0aff8af880f6d64e43a7229f44fa7237dfd718e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28562:
URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629957757


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122775/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28562:
URL: https://github.com/apache/spark/pull/28562#issuecomment-629958185










[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same

2020-05-17 Thread GitBox


yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r426380145



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class <this class>
+ *      --jars <spark core test jar>,<spark catalyst test jar>,<spark hive test jar>
+ *      --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *   2. build/sbt "hive/test:runMain <this class>" -Phive-1.2 or
+ *      build/sbt "hive/test:runMain <this class>" -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain <this class>"
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempTable = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    val ds = spark.range(numRows)
+    tableNames.foreach { name =>
+      ds.createOrReplaceTempView(name)
+    }
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
+      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO HYBRID") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO STATIC") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempTable(tempTable) {
+      val t1 = "t1"

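The `withTempTable` and `withTable` helpers in the file quoted above follow the loan pattern: register the resource, run the test body `f`, and tear the resource down in a `finally` block even when the body throws. A minimal Python sketch of the same shape, using a hypothetical in-memory `catalog` dict in place of a real SparkSession:

```python
catalog = {}  # hypothetical stand-in for Spark's temp-view catalog

def with_temp_table(*names, body):
    # Register each temp view, run the body, then always drop the views,
    # mirroring the Scala helper's `try f finally ...` structure.
    for name in names:
        catalog[name] = range(10)
    try:
        body()
    finally:
        for name in names:
            catalog.pop(name, None)

with_temp_table("temp", body=lambda: print(sorted(catalog)))  # prints ['temp']
print(sorted(catalog))  # prints [] -> views dropped after the body ran
```

The point of the pattern is that cleanup cannot be forgotten by a benchmark case: even a failing `INSERT` leaves no stale temp views behind.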
[GitHub] [spark] yaooqinn commented on a change in pull request #28511: [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same

2020-05-17 Thread GitBox


yaooqinn commented on a change in pull request #28511:
URL: https://github.com/apache/spark/pull/28511#discussion_r426379977



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.hive.HiveUtils
+import org.apache.spark.sql.hive.test.TestHive
+
+/**
+ * Benchmark to measure hive table write performance.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class <this class>
+ *      --jars <spark core test jar>,<spark catalyst test jar>,<spark hive test jar>
+ *      --packages org.spark-project.hive:hive-exec:1.2.1.spark2
+ *   2. build/sbt "hive/test:runMain <this class>" -Phive-1.2 or
+ *      build/sbt "hive/test:runMain <this class>" -Phive-2.3
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "hive/test:runMain <this class>"
+ *      Results will be written to "benchmarks/InsertIntoHiveTableBenchmark-results.txt".
+ *   4. -Phive-1.2 does not work for JDK 11
+ * }}}
+ */
+object InsertIntoHiveTableBenchmark extends SqlBasedBenchmark {
+
+  override def getSparkSession: SparkSession = TestHive.sparkSession
+
+  val tempTable = "temp"
+  val numRows = 1024 * 10
+  val sql = spark.sql _
+
+  // scalastyle:off hadoopconfiguration
+  private val hadoopConf = spark.sparkContext.hadoopConfiguration
+  // scalastyle:on hadoopconfiguration
+  hadoopConf.set("hive.exec.dynamic.partition", "true")
+  hadoopConf.set("hive.exec.dynamic.partition.mode", "nonstrict")
+  hadoopConf.set("hive.exec.max.dynamic.partitions", numRows.toString)
+
+  def withTempTable(tableNames: String*)(f: => Unit): Unit = {
+    val ds = spark.range(numRows)
+    tableNames.foreach { name =>
+      ds.createOrReplaceTempView(name)
+    }
+    try f finally tableNames.foreach(spark.catalog.dropTempView)
+  }
+
+  def withTable(tableNames: String*)(f: => Unit): Unit = {
+    tableNames.foreach { name =>
+      sql(s"CREATE TABLE $name(a INT) STORED AS TEXTFILE PARTITIONED BY (b INT, c INT)")
+    }
+    try f finally {
+      tableNames.foreach { name =>
+        spark.sql(s"DROP TABLE IF EXISTS $name")
+      }
+    }
+  }
+
+  def insertOverwriteDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE DYNAMIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE HYBRID") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertOverwriteStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT OVERWRITE STATIC") { _ =>
+      sql(s"INSERT OVERWRITE TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoDynamic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO DYNAMIC") { _ =>
+      sql(s"INSERT INTO TABLE $table SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS b, CAST(id % 100 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoHybrid(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO HYBRID") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c) SELECT CAST(id AS INT) AS a," +
+        s" CAST(id % 10 AS INT) AS c FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  def insertIntoStatic(table: String, benchmark: Benchmark): Unit = {
+    benchmark.addCase("INSERT INTO STATIC") { _ =>
+      sql(s"INSERT INTO TABLE $table partition(b=1, c=10) SELECT CAST(id AS INT) AS a" +
+        s" FROM $tempTable DISTRIBUTE BY a")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempTable(tempTable) {
+      val t1 = "t1"

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629957755










[GitHub] [spark] AmplabJenkins commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600










[GitHub] [spark] SparkQA removed a comment on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28562:
URL: https://github.com/apache/spark/pull/28562#issuecomment-629891408


   **[Test build #122772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151).






[GitHub] [spark] SparkQA removed a comment on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629900457


   **[Test build #122775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629957600










[GitHub] [spark] SparkQA commented on pull request #27066: [SPARK-31317][SQL] Add withFields method to Column

2020-05-17 Thread GitBox


SparkQA commented on pull request #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-629957411


   **[Test build #122775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122775/testReport)** for PR 27066 at commit [`7f76539`](https://github.com/apache/spark/commit/7f76539ac50f59264cc443cfec93e4a8f4e495ab).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28562: [SPARK-31742][TESTS] Increase the eventually time limit for Mino kdc in tests to fix flakiness

2020-05-17 Thread GitBox


SparkQA commented on pull request #28562:
URL: https://github.com/apache/spark/pull/28562#issuecomment-629957428


   **[Test build #122772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122772/testReport)** for PR 28562 at commit [`8bc6df2`](https://github.com/apache/spark/commit/8bc6df2c2e86d917e48c4debd78e87f714a27151).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


SparkQA commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629957300


   **[Test build #122788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122788/testReport)** for PR 28565 at commit [`49fa2a8`](https://github.com/apache/spark/commit/49fa2a87254238eb42e2801321dc50fb94cc50bb).






[GitHub] [spark] HyukjinKwon closed pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


HyukjinKwon closed pull request #28566:
URL: https://github.com/apache/spark/pull/28566


   






[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


HyukjinKwon commented on pull request #28566:
URL: https://github.com/apache/spark/pull/28566#issuecomment-629956109


   Merged to master and branch-3.0.






[GitHub] [spark] HyukjinKwon commented on pull request #28566: [SPARK-31746][YARN][TESTS] Show the actual error message in LocalityPlacementStrategySuite

2020-05-17 Thread GitBox


HyukjinKwon commented on pull request #28566:
URL: https://github.com/apache/spark/pull/28566#issuecomment-629955913


   I am going to merge this, see 
https://github.com/apache/spark/pull/28463#issuecomment-629955825.






[GitHub] [spark] HyukjinKwon commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


HyukjinKwon commented on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629955825


   `LocalityPlacementStrategySuite` failed again. Potentially related. I am going to merge https://github.com/apache/spark/pull/28566 together.






[GitHub] [spark] maropu commented on pull request #28565: [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment

2020-05-17 Thread GitBox


maropu commented on pull request #28565:
URL: https://github.com/apache/spark/pull/28565#issuecomment-629955864


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629955676










[GitHub] [spark] cloud-fan closed pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


cloud-fan closed pull request #28463:
URL: https://github.com/apache/spark/pull/28463


   






[GitHub] [spark] SparkQA commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


SparkQA commented on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629955235


   **[Test build #122781 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)**
 for PR 28463 at commit 
[`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `//   starting closure (in class T)`
 * `// we need to track calls from "inner closure" to outer classes relative to it (class T, A, B)`
 * `logDebug(s"found inner class $ownerExternalName")`






[GitHub] [spark] cloud-fan commented on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


cloud-fan commented on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629955260


   We don't have many critical changes since the last successful build: 
https://github.com/apache/spark/pull/28463#issuecomment-624694820
   
   The flaky test failures are unrelated to this PR, and we need to unblock 3.0 
ASAP. I'm merging it first and will monitor the Jenkins builds later. Thanks!






[GitHub] [spark] SparkQA removed a comment on pull request #28463: [SPARK-31399][CORE][test-hadoop3.2][test-java11] Support indylambda Scala closure in ClosureCleaner

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28463:
URL: https://github.com/apache/spark/pull/28463#issuecomment-629923433


   **[Test build #122781 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122781/testReport)**
 for PR 28463 at commit 
[`978e60e`](https://github.com/apache/spark/commit/978e60e171e35b01ee166e00c4f63da3db877aad).






[GitHub] [spark] AmplabJenkins commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28561:
URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597










[GitHub] [spark] SparkQA commented on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link

2020-05-17 Thread GitBox


SparkQA commented on pull request #28561:
URL: https://github.com/apache/spark/pull/28561#issuecomment-629954432


   **[Test build #122786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)**
 for PR 28561 at commit 
[`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link

2020-05-17 Thread GitBox


SparkQA removed a comment on pull request #28561:
URL: https://github.com/apache/spark/pull/28561#issuecomment-629949398


   **[Test build #122786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122786/testReport)**
 for PR 28561 at commit 
[`77c2e14`](https://github.com/apache/spark/commit/77c2e14669f64c19d6068dbec695287b08f54205).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28561:
URL: https://github.com/apache/spark/pull/28561#issuecomment-629954597










[GitHub] [spark] cloud-fan commented on a change in pull request #28558: [SPARK-31738][SQL][DOCS] Describe 'L' and 'M' month pattern letters

2020-05-17 Thread GitBox


cloud-fan commented on a change in pull request #28558:
URL: https://github.com/apache/spark/pull/28558#discussion_r426375094



##
File path: docs/sql-ref-datetime-pattern.md
##
@@ -76,6 +76,57 @@ The count of pattern letters determines the format.
 
 - Year: The count of letters determines the minimum field width below which 
padding is used. If the count of letters is two, then a reduced two digit form 
is used. For printing, this outputs the rightmost two digits. For parsing, this 
will parse using the base value of 2000, resulting in a year within the range 
2000 to 2099 inclusive. If the count of letters is less than four (but not 
two), then the sign is only output for negative years. Otherwise, the sign is 
output if the pad width is exceeded when 'G' is not present.
 
+- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number. The text form depends on the letters used: 'M' denotes the 'standard' form and 'L' the 'stand-alone' form. The difference between the 'standard' and 'stand-alone' forms is tricky to describe, as there is no difference in English. In other languages, however, the word used for a month on its own (for example, in a date picker) differs from the word used for the month within a complete date alongside a day and year. Here are examples for all supported pattern letters:
+  - `'M'` or `'L'`: Month number in a year starting from 1. There is no difference between 'M' and 'L'. Months from 1 to 9 are printed without padding.
+```sql
+spark-sql> select date_format(date '1970-01-01', "M");
+1
+spark-sql> select date_format(date '1970-12-01', "L");
+12
+```
+  - `'MM'` or `'LL'`: Month number in a year starting from 1. Zero padding is 
added for month 1-9.
+  ```sql
+  spark-sql> select date_format(date '1970-1-01', "LL");
+  01
+  spark-sql> select date_format(date '1970-09-01', "MM");
+  09
+  ```
+  - `'MMM'`: Short textual representation in the standard form. The month 
pattern should be a part of a date pattern, not just a stand-alone month, except 
in locales where there is no difference between the standard and stand-alone 
forms, such as English.
+```sql
+spark-sql> select date_format(date '1970-01-01', "d MMM");
+1 Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'dd MMM', 'locale', 'RU'));
+01 янв.
+```
+  - `'LLL'`: Short textual representation in the stand-alone form. It should 
be used to format/parse only months without any other date fields.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLL");
+Jan
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'LLL', 'locale', 'RU'));
+янв.
+```
+  - `'MMMM'`: full textual month representation in the standard form. It is 
used for parsing/formatting months as a part of dates/timestamps.
+```sql
+spark-sql> select date_format(date '1970-01-01', "MMMM yyyy");
+January 1970
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'd MMMM', 'locale', 'RU'));
+1 января
+```
+  - `'LLLL'`: full textual month representation in the stand-alone form. The 
pattern can be used to format/parse only months.
+```sql
+spark-sql> select date_format(date '1970-01-01', "LLLL");
+January
+spark-sql> select to_csv(named_struct('date', date '1970-01-01'), 
map('dateFormat', 'LLLL', 'locale', 'RU'));
+январь
+```
+  - `'LLLLL'` or `'MMMMM'`: Narrow textual representation of the standard or 
stand-alone form. Typically it is a single letter.

Review comment:
   how about
   ```
   Here are examples for all supported pattern letters (more than 5 letters is invalid):
   ```
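
   The standard vs stand-alone distinction comes from `java.time.format.DateTimeFormatter`, which Spark 3.0's datetime patterns are based on, so it can be observed outside Spark. A minimal sketch (the exact strings depend on the JDK's CLDR locale data, so the printed values are illustrative rather than guaranteed):

   ```java
   import java.time.LocalDate;
   import java.time.format.DateTimeFormatter;
   import java.util.Locale;

   public class MonthForms {
       // Format 1970-01-01 with the given pattern in the Russian locale,
       // a language where the two month forms actually differ.
       static String format(String pattern) {
           Locale ru = Locale.forLanguageTag("ru");
           return DateTimeFormatter.ofPattern(pattern, ru)
                   .format(LocalDate.of(1970, 1, 1));
       }

       public static void main(String[] args) {
           System.out.println(format("d MMMM")); // standard form: month inside a full date
           System.out.println(format("LLLL"));   // stand-alone form: month on its own
       }
   }
   ```

   With CLDR locale data the two calls print different month words (genitive vs nominative in Russian), which is exactly the difference the doc text is describing.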








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-17 Thread GitBox


AmplabJenkins removed a comment on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647










[GitHub] [spark] AmplabJenkins commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-17 Thread GitBox


AmplabJenkins commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-629951647










[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-17 Thread GitBox


holdenk commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r426374454



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1901,58 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {

Review comment:
   So if you look at the parent issue, you can see there is another sub-issue 
that covers migrating shuffle blocks. It’s ok to ask for a follow-up even if 
there is one (we all miss things in reading), but attempting to veto with a -1 
has a higher bar than just asking for something.








[GitHub] [spark] SparkQA commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-17 Thread GitBox


SparkQA commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-629951351


   **[Test build #122787 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122787/testReport)**
 for PR 28523 at commit 
[`1955f01`](https://github.com/apache/spark/commit/1955f01fa870cd180f66a22070ee1b0ca9a73ca3).






[GitHub] [spark] dongjoon-hyun closed pull request #28561: [SPARK-31740][K8S][TESTS] Use github URL instead of a broken link

2020-05-17 Thread GitBox


dongjoon-hyun closed pull request #28561:
URL: https://github.com/apache/spark/pull/28561


   






[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-17 Thread GitBox


holdenk commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r426373154



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1829,7 +1901,58 @@ private[spark] class BlockManager(
 data.dispose()
   }
 
+  /**
+   * Class to handle block manager decommissioning retries
+   * It creates a Thread to retry offloading all RDD cache blocks
+   */
+  private class BlockManagerDecommissionManager(conf: SparkConf) {
+@volatile private var stopped = false
+private val sleepInterval = conf.get(
+  config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)
+
+private val blockReplicationThread = new Thread {
+  override def run(): Unit = {
+var failures = 0
+while (blockManagerDecommissioning
+  && !stopped
+  && !Thread.interrupted()

Review comment:
   If an interrupt exception is caught, the thread would still be marked as 
interrupted
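
   The subtlety behind this exchange is that the JVM clears a thread's interrupt status both when `Thread.interrupted()` is called and when a blocking call throws `InterruptedException`, so a guard loop like the one quoted above only keeps observing the interrupt if the catch block restores the flag. A small self-contained sketch (not Spark code):

   ```java
   public class InterruptFlagDemo {
       static volatile boolean flagInsideCatch = true;

       // Start a worker that parks in sleep(), interrupt it, and report what
       // the interrupt status looked like inside the catch block.
       static boolean run() throws InterruptedException {
           Thread worker = new Thread(() -> {
               try {
                   Thread.sleep(60_000); // parks until interrupted
               } catch (InterruptedException e) {
                   // The interrupt status was cleared when the exception was thrown.
                   flagInsideCatch = Thread.currentThread().isInterrupted();
                   // Restore it so outer checks such as Thread.interrupted() can see it.
                   Thread.currentThread().interrupt();
               }
           });
           worker.start();
           Thread.sleep(200); // crude: give the worker time to reach sleep()
           worker.interrupt();
           worker.join();
           return flagInsideCatch;
       }

       public static void main(String[] args) throws InterruptedException {
           System.out.println(run()); // false: the flag was cleared inside the catch
       }
   }
   ```

   Re-interrupting in the catch block is the usual idiom when a surrounding loop polls the interrupt status, which is why reviewers look closely at `!Thread.interrupted()` guards like the one in this diff.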








[GitHub] [spark] holdenk commented on a change in pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-05-17 Thread GitBox


holdenk commented on a change in pull request #28370:
URL: https://github.com/apache/spark/pull/28370#discussion_r426372989



##
File path: 
core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionSuite.scala
##
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.Semaphore
+
+import scala.collection.mutable.ArrayBuffer
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, 
SparkFunSuite, Success}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd, 
SparkListenerTaskStart}
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{ResetSystemProperties, ThreadUtils}
+
+class BlockManagerDecommissionSuite extends SparkFunSuite with 
LocalSparkContext
+with ResetSystemProperties {
+
+  override def beforeEach(): Unit = {
+val conf = new SparkConf().setAppName("test")
+  .set(config.Worker.WORKER_DECOMMISSION_ENABLED, true)
+  .set(config.STORAGE_DECOMMISSION_ENABLED, true)
+
+sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)
+  }
+
+  test(s"verify that an already running task which is going to cache data 
succeeds " +
+s"on a decommissioned executor") {
+// Create input RDD with 10 partitions
+val input = sc.parallelize(1 to 10, 10)
+val accum = sc.longAccumulator("mapperRunAccumulator")
+// Do a count to wait for the executors to be registered.

Review comment:
   That’s ok for this test. But no harm in changing to the utility function
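
   The utility function alluded to here is presumably Spark's test helper for waiting on executor registration (something like `TestUtils.waitUntilExecutorsUp`; treat that name as an assumption). The general shape of such a helper is a poll-with-timeout loop, sketched here without any Spark dependency:

   ```java
   import java.util.function.BooleanSupplier;

   public class WaitUntil {
       // Poll `cond` every `pollMs` until it holds or `timeoutMs` elapses;
       // return whether the condition ever held.
       static boolean waitUntil(BooleanSupplier cond, long timeoutMs, long pollMs)
               throws InterruptedException {
           long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
           boolean ok = cond.getAsBoolean();
           while (!ok && System.nanoTime() < deadline) {
               Thread.sleep(pollMs);
               ok = cond.getAsBoolean();
           }
           return ok;
       }

       public static void main(String[] args) throws InterruptedException {
           // A real test would wait on a condition like "all executors registered";
           // here a counter stands in for that condition.
           int[] attempts = {0};
           boolean held = waitUntil(() -> ++attempts[0] >= 3, 2000, 50);
           System.out.println(held); // true: the condition held within the timeout
       }
   }
   ```

   Compared with running `input.count()` as a registration barrier, an explicit wait makes the test's intent obvious and fails with a timeout rather than a confusing scheduling error.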







