[GitHub] [spark] dongjoon-hyun opened a new pull request #29346: [SPARK-32524][SQL][TESTS] CachedBatchSerializerSuite should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun opened a new pull request #29346:
URL: https://github.com/apache/spark/pull/29346


   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29259:
URL: https://github.com/apache/spark/pull/29259#discussion_r464816729



##
File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java
##
@@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception {
   public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception {
     int numFields = 1;

+    long row1Data = 11L;
+    long row2Data = 11L + Integer.MAX_VALUE;
+
+    // BinaryComparator compares longs in big-endian byte order.
+    if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) {
+      row1Data = Long.reverseBytes(row1Data);
+      row2Data = Long.reverseBytes(row2Data);
+    }
+
     UnsafeRow row1 = new UnsafeRow(numFields);
     byte[] data1 = new byte[100];
     row1.pointTo(data1, computeSizeInBytes(numFields * 8));
-    row1.setLong(0, 11);
+    row1.setLong(0, row1Data);

     UnsafeRow row2 = new UnsafeRow(numFields);
     byte[] data2 = new byte[100];
     row2.pointTo(data2, computeSizeInBytes(numFields * 8));
-    row2.setLong(0, 11L + Integer.MAX_VALUE);
+    row2.setLong(0, row2Data);

     insertRow(row1);
     insertRow(row2);

-    Assert.assertTrue(compare(0, 1) > 0);
+    Assert.assertTrue(compare(0, 1) < 0);

Review comment:
   My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte order leads to different comparison results.
   
   I think a simple way to fix the test is:
   ```
   if (LITTLE_ENDIAN) {
     Assert.assertTrue(compare(0, 1) < 0);
   } else {
     Assert.assertTrue(compare(0, 1) > 0);
   }
   ```
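
   To make the byte-order effect concrete, here is a small standalone Scala sketch (an illustration added for this archive, not part of the PR) that compares two longs byte-by-byte, the way a byte-wise binary comparator would, under both byte orders. It uses the same values as the test: `11L` and `11L + Integer.MAX_VALUE`.
   ```scala
   import java.nio.{ByteBuffer, ByteOrder}

   object EndianCompareSketch {
     // Compare two longs by their serialized bytes, left to right (unsigned),
     // mimicking a byte-wise binary comparator.
     def compareBytes(a: Long, b: Long, order: ByteOrder): Int = {
       val bytesA = ByteBuffer.allocate(8).order(order).putLong(a).array()
       val bytesB = ByteBuffer.allocate(8).order(order).putLong(b).array()
       bytesA.zip(bytesB)
         .map { case (x, y) => Integer.compare(x & 0xFF, y & 0xFF) }
         .find(_ != 0)
         .getOrElse(0)
     }

     def main(args: Array[String]): Unit = {
       val row1 = 11L                 // 0x000000000000000B
       val row2 = 11L + Int.MaxValue  // 0x000000008000000A
       // Big-endian bytes preserve numeric order, so row1 < row2.
       println(compareBytes(row1, row2, ByteOrder.BIG_ENDIAN))    // negative
       // Little-endian puts the low-order byte first: 0x0B vs 0x0A, so row1 > row2.
       println(compareBytes(row1, row2, ByteOrder.LITTLE_ENDIAN)) // positive
     }
   }
   ```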








[GitHub] [spark] HeartSaVioR commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap

2020-08-03 Thread GitBox


HeartSaVioR commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668395101


   Thanks for the quick reviews & reaction!






[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668394801










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668394801










[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


huaxingao commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464697692



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
##
@@ -106,4 +106,84 @@ class JDBCTableCatalogSuite extends QueryTest with SharedSparkSession {
       Seq(Row("test", "people"), Row("test", "new_table")))
     }
   }
+
+  test("alter table ... add column") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _")
+      sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C1 INTEGER, C2 STRING)")
+      var t = spark.table("h2.test.alt_table")
+      var expectedSchema = new StructType()
+        .add("ID", IntegerType)
+        .add("C1", IntegerType)
+        .add("C2", StringType)
+      assert(t.schema === expectedSchema)
+      sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C3 DOUBLE)")
+      t = spark.table("h2.test.alt_table")
+      expectedSchema = new StructType()
+        .add("ID", IntegerType)
+        .add("C1", IntegerType)
+        .add("C2", StringType)
+        .add("C3", DoubleType)
+      assert(t.schema === expectedSchema)
+    }
+  }
+
+  test("alter table ... rename column") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _")
+      sql("ALTER TABLE h2.test.alt_table RENAME COLUMN ID TO C")
+      val t = spark.table("h2.test.alt_table")
+      val expectedSchema = new StructType().add("C", IntegerType)
+      assert(t.schema === expectedSchema)
+    }
+  }
+
+  test("alter table ... drop column") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (C1 INTEGER, C2 INTEGER) USING _")
+      sql("ALTER TABLE h2.test.alt_table DROP COLUMN C1")
+      val t = spark.table("h2.test.alt_table")
+      val expectedSchema = new StructType().add("C2", IntegerType)
+      assert(t.schema === expectedSchema)
+    }
+  }
+
+  test("alter table ... update column type") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _")
+      sql("ALTER TABLE h2.test.alt_table ALTER COLUMN id TYPE DOUBLE")
+      val t = spark.table("h2.test.alt_table")
+      val expectedSchema = new StructType().add("ID", DoubleType)
+      assert(t.schema === expectedSchema)
+    }
+  }
+
+  test("alter table ... update column nullability") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (ID INTEGER NOT NULL) USING _")
+      var t = spark.table("h2.test.alt_table")
+      var expectedSchema = new StructType().add("ID", IntegerType, nullable = false)
+      // Todo: find out why the nullable for ID INTEGER NOT NULL is true
+      // assert(t.schema === expectedSchema)

Review comment:
   Somehow ```t.schema``` has ```StructType().add("ID", IntegerType, nullable = true)```. I have commented out this assert for now and will find out what is wrong.
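
   One way to dig into the nullability question (a debugging sketch, assuming a plain JDBC connection to the same in-memory H2 database; the URL and identifiers are illustrative) is to ask the JDBC driver directly what it reports for the column, to see whether the `NOT NULL` constraint is lost on the H2 side or in the catalog's schema mapping:
   ```scala
   import java.sql.DriverManager

   val conn = DriverManager.getConnection("jdbc:h2:mem:testdb") // assumed URL
   try {
     // JDBC metadata reports IS_NULLABLE = "NO" for columns declared NOT NULL.
     val cols = conn.getMetaData.getColumns(null, "TEST", "ALT_TABLE", "ID")
     while (cols.next()) {
       println(s"${cols.getString("COLUMN_NAME")} nullable=${cols.getString("IS_NULLABLE")}")
     }
   } finally {
     conn.close()
   }
   ```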








[GitHub] [spark] sarutak opened a new pull request #29345: [SPARK-32525][DOCS] The layout of monitoring.html is broken

2020-08-03 Thread GitBox


sarutak opened a new pull request #29345:
URL: https://github.com/apache/spark/pull/29345


   ### What changes were proposed in this pull request?
   
   This PR fixes the layout of monitoring.html broken after SPARK-31566(#28354).
   The cause is that there are 2 `<table>` tags not closed in `monitoring.md`.
   
   ### Why are the changes needed?
   
   This is a bug.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Build docs and the following screenshots are before/after.
   
   * Before the fix
   
![broken-doc](https://user-images.githubusercontent.com/4736016/89257873-fba09b80-d661-11ea-90da-06cbc0783011.png)
   
   * After the fix
   
![fixed-doc2](https://user-images.githubusercontent.com/4736016/89257910-0fe49880-d662-11ea-9a85-7a1ecb1d38d6.png)
   
   
   Of course, the table is still rendered correctly.
   
![fixed-doc1](https://user-images.githubusercontent.com/4736016/89257948-225ed200-d662-11ea-80fd-d9254b44d4a0.png)
   






[GitHub] [spark] huaxingao commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


huaxingao commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668394577


   Thanks for reviewing. I have addressed the comments. 






[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


SparkQA commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668394491


   **[Test build #127029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127029/testReport)** for PR 29324 at commit [`db266fa`](https://github.com/apache/spark/commit/db266fa7e07e3e665f254bb4b1ff8a39c8666752).






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464814771



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   First of all, @HyukjinKwon 's idea is not helpful because `InMemoryRelation.ser` is a singleton.
   > What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase?
   
   I'm moving to @cloud-fan 's proposal.








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


HyukjinKwon commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464815045



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Thanks @dongjoon-hyun!








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464814771



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   First of all, @HyukjinKwon 's idea is unable to remove the failure because `InMemoryRelation.ser` is a singleton.
   > What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase?
   
   I'm moving to @cloud-fan 's proposal.








[GitHub] [spark] HyukjinKwon closed pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.time

2020-08-03 Thread GitBox


HyukjinKwon closed pull request #29343:
URL: https://github.com/apache/spark/pull/29343


   






[GitHub] [spark] HyukjinKwon commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap

2020-08-03 Thread GitBox


HyukjinKwon commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668393902


   I am going to merge this to unblock other PRs.






[GitHub] [spark] HyukjinKwon commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap

2020-08-03 Thread GitBox


HyukjinKwon commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668393962


   Merged to master.






[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


huaxingao commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464814466



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
+    val updateClause = ArrayBuilder.make[String]
+    for (change <- changes) {
+      change match {
+        case add: AddColumn =>
+          add.fieldNames match {
+            case Array(name) =>

Review comment:
   Seems to me `fieldNames` always has only one element.
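
   For context, a sketch (not the Spark source) of why the `Array(name)` pattern above covers top-level columns: `TableChange` column changes carry a `fieldNames` path so that nested fields can be addressed, and a top-level column has exactly one path element.
   ```scala
   // Illustrative only: how a fieldNames path distinguishes top-level columns
   // from nested struct fields.
   def describe(fieldNames: Array[String]): String = fieldNames match {
     case Array(name) => s"top-level column '$name'"              // e.g. Array("c1")
     case path        => s"nested field '${path.mkString(".")}'"  // e.g. Array("point", "x")
   }
   ```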








[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


huaxingao commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464814358



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
       newTable: String,
       options: JDBCOptions): Unit = {
     val dialect = JdbcDialects.get(options.url)
+    executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+      conn: Connection,
+      tableName: String,
+      changes: Seq[TableChange],
+      options: JDBCOptions): Unit = {
+    val dialect = JdbcDialects.get(options.url)
+    if (changes.length == 1) {
+      executeStatement(conn, options, dialect.alterTable(tableName, changes)(0))
+    } else {
+      val metadata = conn.getMetaData
+      if (!metadata.supportsTransactions) {
+        throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
+          s" support multiple alter table changes simultaneously for database server" +
+          s" that doesn't have transaction support.")
+      } else {
+        conn.setAutoCommit(false)
+        val statement = conn.createStatement
+        try {
+          statement.setQueryTimeout(options.queryTimeout)
+          for (sql <- dialect.alterTable(tableName, changes)) {
+            statement.executeUpdate(sql)
+          }
+          conn.commit()
+        } catch {
+          case e: SQLException =>
+            if (conn != null) conn.rollback()
+            throw e
+        } finally {
+          statement.close()
+          conn.setAutoCommit(true)

Review comment:
   I guess we don't need to record the original auto commit status. The default value of autocommit is true.
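
   For comparison, a hedged sketch of the alternative being discussed (illustrative names, not the PR's code): capture the connection's current autoCommit flag up front and restore it afterwards, instead of assuming the JDBC default of `true`.
   ```scala
   import java.sql.Connection

   // Run `body` inside a transaction, restoring the caller's autoCommit setting.
   def withTransaction[T](conn: Connection)(body: => T): T = {
     val originalAutoCommit = conn.getAutoCommit
     conn.setAutoCommit(false)
     try {
       val result = body
       conn.commit()
       result
     } catch {
       case e: Throwable =>
         conn.rollback()
         throw e
     } finally {
       conn.setAutoCommit(originalAutoCommit) // restore rather than hardcode true
     }
   }
   ```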








[GitHub] [spark] prakharjain09 commented on pull request #29318: [SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization

2020-08-03 Thread GitBox


prakharjain09 commented on pull request #29318:
URL: https://github.com/apache/spark/pull/29318#issuecomment-668390649


   @cloud-fan Thanks!






[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-03 Thread GitBox


HyukjinKwon edited a comment on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-668348425


   Just to share the current status,
   
   The [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) plugin (and all other similar plugins) leverages `GITHUB_TOKEN`, which is set by default in GitHub Actions. It uses the GitHub API to create [status checks](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) via [here](https://github.com/ScaCap/action-surefire-report/blob/master/action.js#L42-L43) - this requires write permission to the repo. However, the permissions of `GITHUB_TOKEN` [do not cover the case when a PR was raised from a fork](https://docs.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token).
   
   There are many similar issues and questions, for example, in [codecov](https://github.com/codecov/codecov-action/issues/29) or the [GitHub community](https://github.community/t/make-secrets-available-to-builds-of-forks/16166). In the case of Codecov, they managed to remove the requirement of `GITHUB_TOKEN` at [here](https://github.com/codecov/codecov-action/issues/29#issuecomment-595062189): basically, they used existing GitHub Actions environment variables for verification in their service. This is not feasible in our case because the plugin depends on the GitHub API to create the status checks directly.
   
   I investigated this issue yesterday and concluded there's no clean workaround to make this work out of the box. I am currently investigating the feasibility of _potential_ alternatives; I am not yet sure if all of them work:
   
   - Use one environment variable, for example `TEST_REPORT_GITHUB_TOKEN`, as a GitHub secret, and then guide committers to set `TEST_REPORT_GITHUB_TOKEN` as a GitHub secret in their forks so that their PRs report test results. Note that contributors would _not_ be able to report the test results, as their tokens don't have write access to the repo.
   
   - Only run the test reports on the commits of the repo and don't run them in PRs until GitHub provides an alternative to work around this. There look to be many requests such as [this](https://github.community/t/make-secrets-available-to-builds-of-forks/16166).
   
   - Generate a token that only has the permission to change the status checks, and hardcode it in the repo. In the worst case, if people abuse this token, the status checks of PRs or commits can be changed. This does not affect the code, and the Jenkins runs act as a safeguard, so it might be fine. I wonder what people could gain by abusing the status checks.
   
   I opened an INFRA ticket in ASF and a GitHub Actions ticket in GitHub, and am discussing the options. Once I verify the feasible options, we will be able to discuss further which one to pick (or just drop the PR at the worst case).
   






[GitHub] [spark] gaborgsomogyi commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.

2020-08-03 Thread GitBox


gaborgsomogyi commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668388396


   Had a look; the explanation and solution make sense. Thank you guys for helping me out!






[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668388062










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668388062










[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti

2020-08-03 Thread GitBox


SparkQA commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668387822


   **[Test build #127028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127028/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).






[GitHub] [spark] HeartSaVioR commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap

2020-08-03 Thread GitBox


HeartSaVioR commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668386032


   retest this, please






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464805656



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Let me create another PR for @HyukjinKwon or @cloud-fan to compare.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464805656



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Let me create another PR for @HyukjinKwon's or @cloud-fan's idea to compare with this.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464805252



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   For this question, yes. The root cause is that `InMemoryRelation.ser` is a kind of singleton. Since the new configuration is a static conf, this matches the semantics of `InMemoryRelation.ser`. So, the problem is only the testing.
   > Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.
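
   To illustrate the singleton semantics being described, here is a hedged sketch with assumed names (not the actual Spark source): the serializer is created lazily from a static conf on first use and then cached for the lifetime of the JVM, so every suite in the same JVM sees whatever the first suite created.
   ```scala
   object SerializerHolder {
     private var ser: Option[AnyRef] = None

     // First caller wins for the whole JVM; later confs are ignored.
     def getOrCreate(createFromStaticConf: => AnyRef): AnyRef = synchronized {
       if (ser.isEmpty) ser = Some(createFromStaticConf)
       ser.get
     }

     // The test hook in this diff: reset the cached instance between suites.
     def clearSerializer(): Unit = synchronized { ser = None }
   }
   ```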








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464804706



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Yes. Since this is a new feature and new test coverage, I'm also not sure what is best here.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668384049










[GitHub] [spark] AmplabJenkins commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668384049










[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29303:
URL: https://github.com/apache/spark/pull/29303#discussion_r464803900



##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
##
@@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer {
       }
     }
   }
+
+  test("check results from get columns operation from thrift server") {
+    val schemaName = "default"
+    val tableName = "spark_get_col_operation"
+    val schema = new StructType()
+      .add("c0", "boolean", nullable = false, "0")
+      .add("c1", "tinyint", nullable = true, "1")
+      .add("c2", "smallint", nullable = false, "2")
+      .add("c3", "int", nullable = true, "3")
+      .add("c4", "long", nullable = false, "4")
+      .add("c5", "float", nullable = true, "5")
+      .add("c6", "double", nullable = false, "6")
+      .add("c7", "decimal(38, 20)", nullable = true, "7")
+      .add("c8", "decimal(10, 2)", nullable = false, "8")
+      .add("c9", "string", nullable = true, "9")
+      .add("c10", "array", nullable = false, "10")
+      .add("c11", "array", nullable = true, "11")
+      .add("c12", "map", nullable = false, "12")
+      .add("c13", "date", nullable = true, "13")
+      .add("c14", "timestamp", nullable = false, "14")
+      .add("c15", "struct", nullable = true, "15")
+      .add("c16", "binary", nullable = false, "16")
+
+    val ddl =
+      s"""
+         |CREATE TABLE $schemaName.$tableName (
+         |  ${schema.toDDL}
+         |)
+         |using parquet""".stripMargin
+
+    withCLIServiceClient { client =>
+      val sessionHandle = client.openSession(user, "")
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay)
+      var status = client.getOperationStatus(opHandle)
+      while (!status.getState.isTerminal) {
+        Thread.sleep(10)
+        status = client.getOperationStatus(opHandle)
+      }
+      val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)

Review comment:
   BTW, it's up to you, @cloud-fan. I'll not revert this since PRBuilder is still working.

##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
##
@@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer {
       }
     }
   }
+
+  test("check results from get columns operation from thrift server") {
+    val schemaName = "default"
+    val tableName = "spark_get_col_operation"
+    val schema = new StructType()
+      .add("c0", "boolean", nullable = false, "0")
+      .add("c1", "tinyint", nullable = true, "1")
+      .add("c2", "smallint", nullable = false, "2")
+      .add("c3", "int", nullable = true, "3")
+      .add("c4", "long", nullable = false, "4")
+      .add("c5", "float", nullable = true, "5")
+      .add("c6", "double", nullable = false, "6")
+      .add("c7", "decimal(38, 20)", nullable = true, "7")
+      .add("c8", "decimal(10, 2)", nullable = false, "8")
+      .add("c9", "string", nullable = true, "9")
+      .add("c10", "array", nullable = false, "10")
+      .add("c11", "array", nullable = true, "11")
+      .add("c12", "map", nullable = false, "12")
+      .add("c13", "date", nullable = true, "13")
+      .add("c14", "timestamp", nullable = false, "14")
+      .add("c15", "struct", nullable = true, "15")
+      .add("c16", "binary", nullable = false, "16")
+
+    val ddl =
+      s"""
+         |CREATE TABLE $schemaName.$tableName (
+         |  ${schema.toDDL}
+         |)
+         |using parquet""".stripMargin
+
+    withCLIServiceClient { client =>
+      val sessionHandle = client.openSession(user, "")
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay)
+      var status = client.getOperationStatus(opHandle)
+      while (!status.getState.isTerminal) {
+        Thread.sleep(10)
+        status = client.getOperationStatus(opHandle)
+      }
+      val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)

Review comment:
   BTW, it's up to you, @cloud-fan. I'll not revert this by myself since PRBuilder is still working.








[GitHub] [spark] stczwd commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2

2020-08-03 Thread GitBox


stczwd commented on a change in pull request #29339:
URL: https://github.com/apache/spark/pull/29339#discussion_r464803576



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala
##
@@ -62,4 +74,31 @@ object DataSourceV2Implicits {
       new CaseInsensitiveStringMap(options.asJava)
     }
   }
+
+  def convertPartitionIndentifers(
+      partSpec: TablePartitionSpec,
+      partSchema: StructType): InternalRow = {

Review comment:
   Hm, it's a little ugly to me if it's defined with implicits. I can change it if you think it's better with implicits.

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, TableCatalog}
+
+/**
+ * Physical plan node for adding partitions of table.
+ */
+case class AlterTableAddPartitionExec(
+    catalog: TableCatalog,
+    ident: Identifier,
+    partitionSpecsAndLocs: Seq[(TablePartitionSpec, Option[String])],
+    ignoreIfExists: Boolean) extends V2CommandExec {
+  import DataSourceV2Implicits._
+  import CatalogV2Implicits._
+
+  override def output: Seq[Attribute] = Seq.empty
+
+  override protected def run(): Seq[InternalRow] = {
+    val table = catalog.loadTable(ident).asPartitionable
+    val partNames = table.partitionSchema().map(_.name)
+
+    partitionSpecsAndLocs.foreach { case (spec, location) =>
+      val partParams = new java.util.HashMap[String, String](table.properties())
+      location.foreach(locationUri =>
+        partParams.put("location", locationUri))
+      partParams.put("ignoreIfExists", ignoreIfExists.toString)

Review comment:
   Fine, I'll change it.

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala
##
@@ -62,4 +74,31 @@ object DataSourceV2Implicits {
       new CaseInsensitiveStringMap(options.asJava)
     }
   }
+
+  def convertPartitionIndentifers(
+      partSpec: TablePartitionSpec,
+      partSchema: StructType): InternalRow = {
+    val partValues = partSchema.map { part =>
+      part.dataType match {
+        case _: ByteType =>
+          partSpec.getOrElse(part.name, "0").toByte

Review comment:
   Sure, sounds reasonable to me.
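
   For reference, a minimal sketch (assumed names, with the `"0"` default mirroring the diff above; not the PR's final code) of the conversion being discussed: each partition value arrives as a string in the `TablePartitionSpec` map and is cast to the partition column's Catalyst type.
   ```scala
   import org.apache.spark.sql.types._

   def convertPartitionValues(
       partSpec: Map[String, String],
       partSchema: StructType): Seq[Any] = {
     partSchema.map { field =>
       val raw = partSpec.getOrElse(field.name, "0")
       field.dataType match {
         case ByteType    => raw.toByte
         case ShortType   => raw.toShort
         case IntegerType => raw.toInt
         case LongType    => raw.toLong
         case StringType  => raw
         case other       =>
           throw new IllegalArgumentException(s"Unsupported partition type: $other")
       }
     }
   }
   ```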

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, TableCatalog}
+
+/**
+ * Physical plan node for adding partitions of table.
+ */
+case class AlterTableAddPartitionExec(
+    catalog: TableCatalog,

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29303:
URL: https://github.com/apache/spark/pull/29303#discussion_r464803705



##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
##
@@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer {
       }
     }
   }
+
+  test("check results from get columns operation from thrift server") {
+    val schemaName = "default"
+    val tableName = "spark_get_col_operation"
+    val schema = new StructType()
+      .add("c0", "boolean", nullable = false, "0")
+      .add("c1", "tinyint", nullable = true, "1")
+      .add("c2", "smallint", nullable = false, "2")
+      .add("c3", "int", nullable = true, "3")
+      .add("c4", "long", nullable = false, "4")
+      .add("c5", "float", nullable = true, "5")
+      .add("c6", "double", nullable = false, "6")
+      .add("c7", "decimal(38, 20)", nullable = true, "7")
+      .add("c8", "decimal(10, 2)", nullable = false, "8")
+      .add("c9", "string", nullable = true, "9")
+      .add("c10", "array", nullable = false, "10")
+      .add("c11", "array", nullable = true, "11")
+      .add("c12", "map", nullable = false, "12")
+      .add("c13", "date", nullable = true, "13")
+      .add("c14", "timestamp", nullable = false, "14")
+      .add("c15", "struct", nullable = true, "15")
+      .add("c16", "binary", nullable = false, "16")
+
+    val ddl =
+      s"""
+         |CREATE TABLE $schemaName.$tableName (
+         |  ${schema.toDDL}
+         |)
+         |using parquet""".stripMargin
+
+    withCLIServiceClient { client =>
+      val sessionHandle = client.openSession(user, "")
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay)
+      var status = client.getOperationStatus(opHandle)
+      while (!status.getState.isTerminal) {
+        Thread.sleep(10)
+        status = client.getOperationStatus(opHandle)
+      }
+      val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)

Review comment:
   BTW, this is not a flaky failure. This broke some profiles consistently.








[GitHub] [spark] SparkQA commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


SparkQA commented on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668383772


   **[Test build #127027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127027/testReport)** for PR 29020 at commit [`2115007`](https://github.com/apache/spark/commit/211500761b442dd5fd493064d3ef88ac49225379).






[GitHub] [spark] cloud-fan commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464803563



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   A more complicated fix is to put the serializer instance in `SharedState`. This brings one benefit: users can stop the current Spark context and create a new one with a different cache plugin. But I'm not sure if this is a useful feature.








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29303:
URL: https://github.com/apache/spark/pull/29303#discussion_r464803567



##
File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
##
@@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer {
       }
     }
   }
+
+  test("check results from get columns operation from thrift server") {
+    val schemaName = "default"
+    val tableName = "spark_get_col_operation"
+    val schema = new StructType()
+      .add("c0", "boolean", nullable = false, "0")
+      .add("c1", "tinyint", nullable = true, "1")
+      .add("c2", "smallint", nullable = false, "2")
+      .add("c3", "int", nullable = true, "3")
+      .add("c4", "long", nullable = false, "4")
+      .add("c5", "float", nullable = true, "5")
+      .add("c6", "double", nullable = false, "6")
+      .add("c7", "decimal(38, 20)", nullable = true, "7")
+      .add("c8", "decimal(10, 2)", nullable = false, "8")
+      .add("c9", "string", nullable = true, "9")
+      .add("c10", "array", nullable = false, "10")
+      .add("c11", "array", nullable = true, "11")
+      .add("c12", "map", nullable = false, "12")
+      .add("c13", "date", nullable = true, "13")
+      .add("c14", "timestamp", nullable = false, "14")
+      .add("c15", "struct", nullable = true, "15")
+      .add("c16", "binary", nullable = false, "16")
+
+    val ddl =
+      s"""
+         |CREATE TABLE $schemaName.$tableName (
+         |  ${schema.toDDL}
+         |)
+         |using parquet""".stripMargin
+
+    withCLIServiceClient { client =>
+      val sessionHandle = client.openSession(user, "")
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay)
+      var status = client.getOperationStatus(opHandle)
+      while (!status.getState.isTerminal) {
+        Thread.sleep(10)
+        status = client.getOperationStatus(opHandle)
+      }
+      val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null)

Review comment:
   Ya. It seems that Hive 2.3 is affected. And the JDK11 Maven build seems not to run those test suites.
   Do we have a follow-up idea, @yaooqinn and @cloud-fan?
   The error message is clear: `java.lang.ClassCastException`. If there is no idea, I'd like to recommend reverting first so the Maven Jenkins build passes properly.








[GitHub] [spark] gengliangwang commented on pull request #29148: [SPARK-32349][UI] Reduce unnecessary allexecutors call when render stage page executor summary

2020-08-03 Thread GitBox


gengliangwang commented on pull request #29148:
URL: https://github.com/apache/spark/pull/29148#issuecomment-668382888


   The downside is storing duplicated info in the kvstore. I don't think reducing an API call and simplifying the frontend logic is a good reason for such changes.






[GitHub] [spark] cloud-fan commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464802349



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Does it really help? `InMemoryRelation.ser` doesn't belong to any session and is global.
   
   I think a simpler fix is to clear it in `CachedBatchSerializerSuite.beforeAll` and `afterAll`.
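
   A minimal sketch of that simpler fix, assuming the `clearSerializer()` test hook from this diff (the override structure is illustrative, not the final patch):
   ```scala
   class CachedBatchSerializerSuite extends QueryTest with SharedSparkSession {

     override def beforeAll(): Unit = {
       super.beforeAll()
       // Drop any serializer cached by suites that ran earlier in this JVM.
       InMemoryRelation.clearSerializer()
     }

     override def afterAll(): Unit = {
       // Don't leak this suite's serializer to later suites.
       InMemoryRelation.clearSerializer()
       super.afterAll()
     }
   }
   ```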








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464802058



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   BTW, I'm also open to your idea. Let's see the original author's and committer's opinion. Thanks.

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
     ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   BTW, I'm still open to your idea. Let's see the original author's and committer's opinion. Thanks.








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668381517


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127014/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464801731



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
 ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Also, the test resource clean-up had better be centralized at 
`SharedSparkSession`.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
 ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Also, the test resource clean-up had better be centralized at 
`SharedSparkSession` so that it is not forgotten.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668381516


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668381516







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464801416



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
 ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Ya. I thought that way first, but this is a more general way because 
SPARK-32274 made SQL cache serialization pluggable. We may have other test 
suites in the future.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29334: [RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-03 Thread GitBox


dongjoon-hyun commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-668381120


   I also agree with @srowen 's opinion. However, if we have a clean patch 
passing all UTs, we may reconsider.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


SparkQA removed a comment on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668312504


   **[Test build #127014 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127014/testReport)**
 for PR 29324 at commit 
[`114b7a3`](https://github.com/apache/spark/commit/114b7a3fa1ecae64f1cd21339289039a69e1829d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


SparkQA commented on pull request #29324:
URL: https://github.com/apache/spark/pull/29324#issuecomment-668380508


   **[Test build #127014 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127014/testReport)**
 for PR 29324 at commit 
[`114b7a3`](https://github.com/apache/spark/commit/114b7a3fa1ecae64f1cd21339289039a69e1829d).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


HyukjinKwon commented on a change in pull request #29344:
URL: https://github.com/apache/spark/pull/29344#discussion_r464799986



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##
@@ -277,6 +277,9 @@ object InMemoryRelation {
 ser.get
   }
 
+  /* Visible for testing */
+  private[spark] def clearSerializer(): Unit = synchronized { ser = None }

Review comment:
   Nice, @dongjoon-hyun. What about just fixing 
`CachedBatchSerializerSuite` not to extend `SharedSparkSessionBase`? For 
example, like `ExecutorSideSQLConfSuite` or `SparkSessionExtensionSuite`. I 
think that would be simpler and a more isolated fix.
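   
   For illustration, a rough sketch of that alternative (hypothetical; the config 
key and the serializer class below are placeholders, not the exact ones in the PR):
   ```
   import org.apache.spark.SparkFunSuite
   import org.apache.spark.sql.SparkSession
   
   class CachedBatchSerializerSuite extends SparkFunSuite {
     private var spark: SparkSession = _
   
     override def beforeAll(): Unit = {
       super.beforeAll()
       // The suite owns its session, so serializer state cannot leak into
       // (or from) suites sharing the common test session.
       spark = SparkSession.builder()
         .master("local[1]")
         .config("spark.sql.cache.serializer", "org.example.TestCachedBatchSerializer")
         .getOrCreate()
     }
   
     override def afterAll(): Unit = {
       try {
         if (spark != null) { spark.stop(); spark = null }
       } finally {
         super.afterAll()
       }
     }
   }
   ```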





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668380028







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668380028







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29340: [SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors.

2020-08-03 Thread GitBox


dongjoon-hyun commented on pull request #29340:
URL: https://github.com/apache/spark/pull/29340#issuecomment-668379660


   Please note that there is a PR for the 170 `Cache`-related failures.
   - https://github.com/apache/spark/pull/29344
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


SparkQA commented on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668379760


   **[Test build #127026 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127026/testReport)**
 for PR 29344 at commit 
[`8cfe79b`](https://github.com/apache/spark/commit/8cfe79bc81e48b8a990dcad5d08fa9e4af706d5b).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-03 Thread GitBox


beliefer commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-668379005


   @cloud-fan Thanks for your review and good idea.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun commented on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668378067


   Could you review this PR, @revans2 , @tgravescs , @cloud-fan ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-03 Thread GitBox


dongjoon-hyun commented on a change in pull request #29331:
URL: https://github.com/apache/spark/pull/29331#discussion_r464797425



##
File path: core/src/main/scala/org/apache/spark/storage/StorageLevel.scala
##
@@ -172,6 +173,7 @@ object StorageLevel {
 case "NONE" => NONE
 case "DISK_ONLY" => DISK_ONLY
 case "DISK_ONLY_2" => DISK_ONLY_2
+case "DISK_ONLY_3" => DISK_ONLY_3
 case "MEMORY_ONLY" => MEMORY_ONLY
 case "MEMORY_ONLY_2" => MEMORY_ONLY_2
 case "MEMORY_ONLY_SER" => MEMORY_ONLY_SER

Review comment:
   Thank you for the review. For the question: no, @viirya. We can consider 
this as a kind of replacement for HDFS. This PR is not aiming to support 
`MEMORY_AND_DISK_3` or `MEMORY_AND_DISK_SER_3`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29338: [SPARK-32521][SQL] Bug-fix: WithFields Expression should not be foldable

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29338:
URL: https://github.com/apache/spark/pull/29338#discussion_r464797067



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
##
@@ -1420,4 +1420,48 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSparkSession {
   }.getMessage should include("No such struct field b in a, B")
 }
   }
+
+  test("withField user-facing examples") {
+checkAnswer(
+  sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+.select($"struct_col".withField("c", lit(3))),
+  Row(Row(1, 2, 3)))
+
+checkAnswer(
+  sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
+.select($"struct_col".withField("b", lit(3))),
+  Row(Row(1, 3)))
+
+checkAnswer(
+  sql("SELECT CAST(NULL AS struct) struct_col")
+.select($"struct_col".withField("c", lit(3))),
+  Row(null))
+
+checkAnswer(
+  sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
+.select($"struct_col".withField("b", lit(100))),
+  Row(Row(1, 100, 100)))
+
+checkAnswer(
+  sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
+.select($"struct_col".withField("a.c", lit(3))),
+  Row(Row(Row(1, 2, 3))))
+
+intercept[AnalysisException] {
+  sql("SELECT named_struct('a', named_struct('b', 1), 'a', 
named_struct('c', 2)) struct_col")
+.select($"struct_col".withField("a.c", lit(3)))
+}.getMessage should include("Ambiguous reference to fields")
+  }
+
+  test("SPARK-32521: withField with statically evaluable arguments should 
succeed") {

Review comment:
   Do we still need this test case? It's included in `withField user-facing 
examples`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-03 Thread GitBox


dongjoon-hyun commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-668377008


   I found the root cause of random failures in the master branch. Here is the 
PR.
   - https://github.com/apache/spark/pull/29344



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable

2020-08-03 Thread GitBox


dongjoon-hyun commented on pull request #29067:
URL: https://github.com/apache/spark/pull/29067#issuecomment-668376771


   Hi, @revans2 and @cloud-fan .
   
   Unfortunately, this PR introduced flaky UT failures: as shown below, the 
tests fail or pass depending on the suite execution order.
   ```
   $ build/sbt "sql/testOnly *.CachedBatchSerializerSuite *.CachedTableSuite"
   ...
   [info] *** 30 TESTS FAILED ***
   [error] Failed: Total 51, Failed 30, Errors 0, Passed 21
   [error] Failed tests:
   [error]  org.apache.spark.sql.CachedTableSuite
   [error] (sql/test:testOnly) sbt.TestsFailedException: Tests unsuccessful
   ```
   
   I made a PR to fix that. Currently, this causes random Jenkins failures on 
the other PRs.
   - https://github.com/apache/spark/pull/29344



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668376280







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668376280







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


SparkQA commented on pull request #29344:
URL: https://github.com/apache/spark/pull/29344#issuecomment-668376057


   **[Test build #127025 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127025/testReport)**
 for PR 29344 at commit 
[`b3bed27`](https://github.com/apache/spark/commit/b3bed277ab0c64d2352a0fb76457a3815bdc6534).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-03 Thread GitBox


cloud-fan commented on pull request #29291:
URL: https://github.com/apache/spark/pull/29291#issuecomment-668375534


   thanks, merging to master!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser

2020-08-03 Thread GitBox


dongjoon-hyun opened a new pull request #29344:
URL: https://github.com/apache/spark/pull/29344


   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-08-03 Thread GitBox


cloud-fan closed pull request #29291:
URL: https://github.com/apache/spark/pull/29291


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464794637



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
   newTable: String,
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
+executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+  conn: Connection,
+  tableName: String,
+  changes: Seq[TableChange],
+  options: JDBCOptions): Unit = {
+val dialect = JdbcDialects.get(options.url)
+if (changes.length == 1) {
+  executeStatement(conn, options, dialect.alterTable(tableName, 
changes)(0))
+} else {
+  val metadata = conn.getMetaData
+  if (!metadata.supportsTransactions) {
+throw new 
SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
+  s" support multiple alter table changes simultaneously for database 
server" +
+s" that doesn't have transaction support.")
+  } else {
+conn.setAutoCommit(false)
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  for (sql <- dialect.alterTable(tableName, changes)) {
+statement.executeUpdate(sql)
+  }
+  conn.commit()
+} catch {
+  case e: SQLException =>
+if (conn != null) conn.rollback()
+throw e

Review comment:
   +1





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29339:
URL: https://github.com/apache/spark/pull/29339#discussion_r464794249



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, 
TableCatalog}
+
+/**
+ * Physical plan node for dropping partitions of table.
+ */
+case class AlterTableDropPartitionExec(
+catalog: TableCatalog,
+ident: Identifier,
+specs: Seq[TablePartitionSpec],
+ignoreIfNotExists: Boolean,
+purge: Boolean,
+retainData: Boolean) extends V2CommandExec {

Review comment:
   We should think about how to deal with `purge` and `retainData`. Shall 
we just put them into partition properties?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29339:
URL: https://github.com/apache/spark/pull/29339#discussion_r464794249



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, 
TableCatalog}
+
+/**
+ * Physical plan node for dropping partitions of table.
+ */
+case class AlterTableDropPartitionExec(
+catalog: TableCatalog,
+ident: Identifier,
+specs: Seq[TablePartitionSpec],
+ignoreIfNotExists: Boolean,
+purge: Boolean,
+retainData: Boolean) extends V2CommandExec {

Review comment:
   We should think about how to deal with `purge` and `retainData`. Shall 
we just put them into partition properties? Or pass them as `dropPartition` 
parameters?
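   
   For illustration, a sketch of the two options (all names and signatures below 
are hypothetical, not the actual DSv2 API):
   ```
   import org.apache.spark.sql.catalyst.InternalRow
   
   // Option (a): hide the flags in partition properties; the partition API
   // stays minimal but the contract becomes implicit, e.g.
   //   catalog.dropPartition(partIdent)  // properties: Map("purge" -> "true", ...)
   
   // Option (b): make the flags explicit parameters of the call.
   trait SupportsPartitionManagementSketch {
     def dropPartition(partIdent: InternalRow, purge: Boolean, retainData: Boolean): Boolean
   }
   ```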





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29339:
URL: https://github.com/apache/spark/pull/29339#discussion_r464793949



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, 
TableCatalog}
+
+/**
+ * Physical plan node for dropping partitions of table.
+ */
+case class AlterTableDropPartitionExec(

Review comment:
   I think the physical plan should be general
   ```
   case class ...(
       table: SupportsPartition,
       partitionIdent: InternalRow,
       ignoreIfNotExists: Boolean) ...
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29339:
URL: https://github.com/apache/spark/pull/29339#discussion_r464794051



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala
##
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, 
TableCatalog}
+
+/**
+ * Physical plan node for dropping partitions of table.
+ */
+case class AlterTableDropPartitionExec(

Review comment:
   All the checks and conversions should be done in the analyzer/planner, 
instead of at runtime.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ScrapCodes commented on a change in pull request #29334: [RFC][SPARK-32495] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-03 Thread GitBox


ScrapCodes commented on a change in pull request #29334:
URL: https://github.com/apache/spark/pull/29334#discussion_r464793132



##
File path: core/pom.xml
##
@@ -224,6 +224,10 @@
   org.scala-lang
   scala-library
 
+
+  org.scala-lang
+  scala-reflect

Review comment:
   This is required because jackson-module-scala underwent this change: 
   
   
https://github.com/FasterXML/jackson-module-scala/commit/d6071ac4597e582b1a59b5a492ac38009e17eeb6#diff-fdc3abdfd754eeb24090dbd90aeec2ce
   
   And spark-core uses the scala-reflect library in one place.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464791662



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
   newTable: String,
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
+executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+  conn: Connection,
+  tableName: String,
+  changes: Seq[TableChange],
+  options: JDBCOptions): Unit = {
+val dialect = JdbcDialects.get(options.url)
+if (changes.length == 1) {
+  executeStatement(conn, options, dialect.alterTable(tableName, 
changes)(0))
+} else {
+  val metadata = conn.getMetaData
+  if (!metadata.supportsTransactions) {
+throw new 
SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
+  s" support multiple alter table changes simultaneously for database 
server" +
+s" that doesn't have transaction support.")
+  } else {
+conn.setAutoCommit(false)
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  for (sql <- dialect.alterTable(tableName, changes)) {
+statement.executeUpdate(sql)
+  }
+  conn.commit()
+} catch {
+  case e: SQLException =>
+if (conn != null) conn.rollback()
+throw e

Review comment:
   Do we only need to roll back on SQLException? Isn't it safer to roll back 
for all exceptions?
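   
   For illustration, a hedged variant of the catch block quoted above, rolling 
back on any non-fatal error rather than only SQLException:
   ```
   import scala.util.control.NonFatal
   
   try {
     statement.setQueryTimeout(options.queryTimeout)
     for (sql <- dialect.alterTable(tableName, changes)) {
       statement.executeUpdate(sql)
     }
     conn.commit()
   } catch {
     case NonFatal(e) =>
       conn.rollback()  // undo any partially applied ALTER TABLE statements
       throw e
   } finally {
     statement.close()
     conn.setAutoCommit(true)
   }
   ```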





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464791154



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
   newTable: String,
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
+executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+  conn: Connection,
+  tableName: String,
+  changes: Seq[TableChange],
+  options: JDBCOptions): Unit = {
+val dialect = JdbcDialects.get(options.url)
+if (changes.length == 1) {
+  executeStatement(conn, options, dialect.alterTable(tableName, 
changes)(0))
+} else {
+  val metadata = conn.getMetaData
+  if (!metadata.supportsTransactions) {
+throw new 
SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
+  s" support multiple alter table changes simultaneously for database 
server" +
+s" that doesn't have transaction support.")

Review comment:
   No need s"" here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464790560



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
* Rename an existing table.
*
-   * TODO (SPARK-32382): Override this method in the dialects that don't 
support such syntax.
-   *
* @param oldTable The existing table.
* @param newTable New name of the table.
* @return The SQL statement to use for renaming the table.
*/
   def renameTable(oldTable: String, newTable: String): String = {
 s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have 
different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] 
= {
+val updateClause = ArrayBuilder.make[String]
+for (change <- changes) {
+  change match {
+case add: AddColumn =>
+  add.fieldNames match {
+case Array(name) =>
+  val dataType = JdbcUtils.getJdbcType(add.dataType(), 
this).databaseTypeDefinition
+  updateClause += s"ALTER TABLE $tableName ADD COLUMN $name 
$dataType"
+  }
+case rename: RenameColumn =>
+  rename.fieldNames match {
+case Array(name) =>
+  updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO 
${rename.newName}"
+  }
+case delete: DeleteColumn =>
+  delete.fieldNames match {
+case Array(name) =>
+  updateClause += s"ALTER TABLE $tableName DROP COLUMN $name"
+  }
+case update: UpdateColumnType =>
+  update.fieldNames match {
+case Array(name) =>
+  val dataType = JdbcUtils.getJdbcType(update.newDataType(), this)
+.databaseTypeDefinition
+  updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name 
$dataType"
+  }
+case update: UpdateColumnNullability =>
+  update.fieldNames match {
+case Array(name) =>
+  val nullable = if (update.nullable()) "NULL" else "NOT NULL"
+  updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name SET 
" + nullable

Review comment:
   Since you already use s"", just put `nullable` into it too.
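   
   i.e., presumably something like:
   ```
   updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name SET $nullable"
   ```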





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a change in pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page

2020-08-03 Thread GitBox


gengliangwang commented on a change in pull request #29082:
URL: https://github.com/apache/spark/pull/29082#discussion_r464789218



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -421,6 +420,51 @@ private[spark] class AppStatusStore(
 constructTaskDataList(taskDataWrapperIter)
   }
 
+  def exceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] 
= {
+val (stageData, _) = stageAttempt(stageId, attemptId)
+val key = Array(stageId, attemptId, stageData.numFailedTasks)
+asOption(store.read(classOf[CachedExceptionSummary], key))

Review comment:
   If the exception summary is cached when the stage is in progress, is it 
possible that we will always see the partial results even after the stage is 
finished?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


viirya commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464788259



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
   newTable: String,
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
+executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+  conn: Connection,
+  tableName: String,
+  changes: Seq[TableChange],
+  options: JDBCOptions): Unit = {
+val dialect = JdbcDialects.get(options.url)
+if (changes.length == 1) {
+  executeStatement(conn, options, dialect.alterTable(tableName, 
changes)(0))
+} else {
+  val metadata = conn.getMetaData
+  if (!metadata.supportsTransactions) {
+throw new 
SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
+  s" support multiple alter table changes simultaneously for database 
server" +
+s" that doesn't have transaction support.")
+  } else {
+conn.setAutoCommit(false)
+val statement = conn.createStatement
+try {
+  statement.setQueryTimeout(options.queryTimeout)
+  for (sql <- dialect.alterTable(tableName, changes)) {
+statement.executeUpdate(sql)
+  }
+  conn.commit()
+} catch {
+  case e: SQLException =>
+if (conn != null) conn.rollback()
+throw e
+} finally {
+  statement.close()
+  conn.setAutoCommit(true)

Review comment:
   Should we record the auto-commit status and restore it afterwards?
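   
   For illustration, a sketch of that idea using the standard JDBC accessor 
(a variant of the quoted block, not the PR's actual code):
   ```
   val originalAutoCommit = conn.getAutoCommit  // remember the caller's setting
   conn.setAutoCommit(false)
   val statement = conn.createStatement
   try {
     statement.setQueryTimeout(options.queryTimeout)
     dialect.alterTable(tableName, changes).foreach(statement.executeUpdate)
     conn.commit()
   } finally {
     statement.close()
     conn.setAutoCommit(originalAutoCommit)  // restore instead of hard-coding true
   }
   ```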





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "defaul

2020-08-03 Thread GitBox


SparkQA removed a comment on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668360492


   **[Test build #127020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127020/testReport)**
 for PR 29343 at commit 
[`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668368990







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668368990







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti

2020-08-03 Thread GitBox


SparkQA commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668368901


   **[Test build #127020 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127020/testReport)**
 for PR 29343 at commit 
[`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668365539


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127019/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464784760



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
##
@@ -106,4 +106,75 @@ class JDBCTableCatalogSuite extends QueryTest with 
SharedSparkSession {
 Seq(Row("test", "people"), Row("test", "new_table")))
 }
   }
+
+  test("alter table ... add column") {
+withTable("h2.test.alt_table") {
+  sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _")
+  sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C1 INTEGER, C2 STRING)")
+  var t = spark.table("h2.test.alt_table")
+  var expectedSchema = new StructType()
+.add("ID", IntegerType)
+.add("C1", IntegerType)
+.add("C2", StringType)
+  assert(t.schema === expectedSchema)
+  sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C3 DOUBLE)")
+  t = spark.table("h2.test.alt_table")
+  expectedSchema = new StructType()

Review comment:
   nit: `expectedSchema = expectedSchema.add("C3", DoubleType)`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668365757







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464784559



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
* Rename an existing table.
*
-   * TODO (SPARK-32382): Override this method in the dialects that don't 
support such syntax.
-   *
* @param oldTable The existing table.
* @param newTable New name of the table.
* @return The SQL statement to use for renaming the table.
*/
   def renameTable(oldTable: String, newTable: String): String = {
 s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have 
different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] 
= {
+val updateClause = ArrayBuilder.make[String]
+for (change <- changes) {
+  change match {
+case add: AddColumn =>
+  add.fieldNames match {
+case Array(name) =>
+  val dataType = JdbcUtils.getJdbcType(add.dataType(), 
this).databaseTypeDefinition
+  updateClause += s"ALTER TABLE $tableName ADD COLUMN $name 
$dataType"
+  }
+case rename: RenameColumn =>
+  rename.fieldNames match {
+case Array(name) =>
+  updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO 
${rename.newName}"
+  }
+case delete: DeleteColumn =>
+  delete.fieldNames match {
+case Array(name) =>
+  updateClause += s"ALTER TABLE $tableName DROP COLUMN $name"
+  }
+case update: UpdateColumnType =>
+  update.fieldNames match {
+case Array(name) =>
+  val dataType = JdbcUtils.getJdbcType(update.newDataType(), this)
+.databaseTypeDefinition
+  updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name 
$dataType"
+  }
+case update: UpdateColumnNullability =>
+  update.fieldNames match {
+case Array(name) =>
+  val nullable = if (update.nullable()) "NULL" else "NOT NULL"
+  updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name SET 
" + nullable
+  }
+case _ =>
+  throw new 
SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable" +

Review comment:
   ditto, `this.getClass.getName` is not useful here. Just say `Unsupported 
TableChange: $change`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668365535


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "defaul

2020-08-03 Thread GitBox


SparkQA removed a comment on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668356773


   **[Test build #127018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127018/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668365757







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti

2020-08-03 Thread GitBox


SparkQA commented on pull request #29343:
URL: https://github.com/apache/spark/pull/29343#issuecomment-668365602


   **[Test build #127018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127018/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions

2020-08-03 Thread GitBox


HyukjinKwon edited a comment on pull request #29333:
URL: https://github.com/apache/spark/pull/29333#issuecomment-668348425


   Just to share the current status:
   
   The [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) plugin (like all other similar plugins) leverages the `GITHUB_TOKEN` that is set by default in GitHub Actions. It uses the GitHub API to create [status checks](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) via [here](https://github.com/ScaCap/action-surefire-report/blob/master/action.js#L42-L43), which requires write permission to the repo. However, the permissions of `GITHUB_TOKEN` [do not cover the case when a PR is raised from a fork](https://docs.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token).
   
   There are many similar issues and questions, for example, in [codecov](https://github.com/codecov/codecov-action/issues/29) or the [GitHub community](https://github.community/t/make-secrets-available-to-builds-of-forks/16166). In the case of Codecov, they managed to remove the requirement of `GITHUB_TOKEN` [here](https://github.com/codecov/codecov-action/issues/29#issuecomment-595062189): basically, they used existing GitHub Actions environment variables to verify requests in their service. This is not feasible in our case because the plugin depends on the GitHub API to create the status checks directly.
   
   I investigated this issue yesterday and concluded there is no clean workaround to make this work out of the box.
   I am currently investigating the feasibility of _potential_ alternatives. I am not yet sure whether all of them work:
   
   - Use one environment variable, for example `TEST_REPORT_GITHUB_TOKEN`, as a GitHub secret. Then guide committers to set `TEST_REPORT_GITHUB_TOKEN` as a GitHub secret in their forks so that their PRs report test results. Note that contributors would _not_ be able to report test results, as their tokens don't have write access to the repo.
   
   - Run the test reports only on commits to the repo, and don't run them in PRs until GitHub provides a way to work around this. There are many requests for one, such as [this](https://github.community/t/make-secrets-available-to-builds-of-forks/16166).
   
   - Generate a token that only has permission to change the status checks, and hardcode it in the repo. In the worst case, if people abuse this token, the status checks of PRs or commits could be changed. This does not affect the code, and the Jenkins runs act as a safeguard, so it might be fine; I wonder what anyone could gain by abusing these status checks.
   
   I opened an INFRA ticket at ASF and a ticket with GitHub Actions, and am discussing the options. Once I verify the feasible options, we can discuss which one to pick (or, in the worst case, just drop the PR).
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


gengliangwang commented on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668365579


   @imback82 I saw this error in the GitHub Actions tests:
   ```
   ERROR: this R is version 3.4.4, package 'SparkR' requires R >= 3.5
   [error] running /home/runner/work/spark/spark/R/install-dev.sh
   ```
   Could you rebase the PR onto the latest master? I will merge it once the tests pass. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-03 Thread GitBox


SparkQA removed a comment on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668356800


   **[Test build #127019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127019/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464784185



##
File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
##
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
+    val updateClause = ArrayBuilder.make[String]
+    for (change <- changes) {
+      change match {
+        case add: AddColumn =>
+          add.fieldNames match {
+            case Array(name) =>

Review comment:
   We should have a better error message if the field name has more than one part.
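   A hedged sketch of one way to do that (a hypothetical helper, not code from the PR):
   ```scala
   import java.sql.SQLFeatureNotSupportedException

   // Hypothetical sketch: fail with a clear message instead of a MatchError
   // when the field name has more than one part (e.g. a nested column).
   def singleColumnName(fieldNames: Array[String]): String = fieldNames match {
     case Array(name) => name
     case _ => throw new SQLFeatureNotSupportedException(
       s"Altering nested fields is not supported: ${fieldNames.mkString(".")}")
   }
   ```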





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-03 Thread GitBox


SparkQA commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668365424


   **[Test build #127019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127019/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29031:
URL: https://github.com/apache/spark/pull/29031#issuecomment-668365535







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog

2020-08-03 Thread GitBox


cloud-fan commented on a change in pull request #29324:
URL: https://github.com/apache/spark/pull/29324#discussion_r464783902



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
       newTable: String,
       options: JDBCOptions): Unit = {
     val dialect = JdbcDialects.get(options.url)
+    executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+      conn: Connection,
+      tableName: String,
+      changes: Seq[TableChange],
+      options: JDBCOptions): Unit = {
+    val dialect = JdbcDialects.get(options.url)
+    if (changes.length == 1) {
+      executeStatement(conn, options, dialect.alterTable(tableName, changes)(0))
+    } else {
+      val metadata = conn.getMetaData
+      if (!metadata.supportsTransactions) {
+        throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
Review comment:
   `this.getClass.getName` is `JdbcUtils`; I don't think it's useful in the error message. How about
   ```
   The target JDBC server does not support transactions and can only support
   ALTER TABLE with a single action.
   ```
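   For illustration, a minimal sketch of the check with the suggested wording (a hypothetical helper; `conn` stands for the JDBC connection that `alterTable` would use):
   ```scala
   import java.sql.{Connection, SQLFeatureNotSupportedException}

   // Hypothetical sketch: reject multi-action ALTER TABLE when the server
   // cannot wrap the generated statements in a transaction.
   def requireTransactionSupport(conn: Connection): Unit = {
     if (!conn.getMetaData.supportsTransactions) {
       throw new SQLFeatureNotSupportedException(
         "The target JDBC server does not support transactions and can only " +
           "support ALTER TABLE with a single action.")
     }
   }
   ```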





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ted-Jiang commented on a change in pull request #26095: [SPARK-29435][Core]MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=

2020-08-03 Thread GitBox


Ted-Jiang commented on a change in pull request #26095:
URL: https://github.com/apache/spark/pull/26095#discussion_r464783504



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -905,15 +905,8 @@ private[spark] object MapOutputTracker extends Logging {
     for (part <- startPartition until endPartition) {
       val size = status.getSizeForBlock(part)
       if (size != 0) {
-        if (useOldFetchProtocol) {

Review comment:
   Sorry, could you explain why having `Integer` parse the string directly does not work for big, long-running applications?
   Is it a performance problem?
   Looking forward to your reply.
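   One possible explanation, offered here as an assumption rather than something confirmed in this thread: with the new fetch protocol the mapId is a task attempt id (a `Long`), so in a big, long-running application it can grow past `Int.MaxValue`, and parsing such a value with `Integer.parseInt` fails outright rather than merely being slow. A minimal Scala sketch:
   ```scala
   // Assumption for illustration: mapId is a Long task attempt id that can
   // outgrow the Int range over the lifetime of a long-running application.
   val mapId: Long = Int.MaxValue.toLong + 1  // 2147483648
   val encoded = mapId.toString

   val decodedAsLong = encoded.toLong  // fine: 2147483648L
   // encoded.toInt                    // would throw java.lang.NumberFormatException
   ```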





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


AmplabJenkins removed a comment on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668364199







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-03 Thread GitBox


cloud-fan commented on pull request #29125:
URL: https://github.com/apache/spark/pull/29125#issuecomment-668364306


   It's a case of "one bug hides another bug". I don't think the right choice is to leave the bug there. `UnsafeRow` is a very fundamental component, and I think it's better to fix all the bugs we know about. Aggregation is not the only place that uses `UnsafeRow`; it can even be used by external data sources.
   
   If we think the decimal sum overflow is serious enough, we should consider backporting the actual fix and evaluating the impact on streaming backward compatibility.
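   For context, a minimal sketch of the overflow case under discussion (an illustration only, not the actual patch; `Decimal` is Spark's `org.apache.spark.sql.types.Decimal`):
   ```scala
   import org.apache.spark.sql.types.Decimal

   // Decimal.changePrecision returns false when the value does not fit the
   // target precision; that is the "overflowed value" case mentioned above.
   val d = Decimal("12345678901234567890")  // 20 digits
   val fits = d.changePrecision(10, 0)      // false: the value does not fit precision 10
   ```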



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API

2020-08-03 Thread GitBox


AmplabJenkins commented on pull request #29020:
URL: https://github.com/apache/spark/pull/29020#issuecomment-668364199







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


