[GitHub] [spark] dongjoon-hyun opened a new pull request #29346: [SPARK-32524][SQL][TESTS] CachedBatchSerializerSuite should clean up InMemoryRelation.ser
dongjoon-hyun opened a new pull request #29346: URL: https://github.com/apache/spark/pull/29346 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite
cloud-fan commented on a change in pull request #29259: URL: https://github.com/apache/spark/pull/29259#discussion_r464816729 ## File path: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java ## @@ -261,40 +263,58 @@ public void testBinaryComparatorForNullColumns() throws Exception { public void testBinaryComparatorWhenSubtractionIsDivisibleByMaxIntValue() throws Exception { int numFields = 1; +long row1Data = 11L; +long row2Data = 11L + Integer.MAX_VALUE; + +// BinaryComparator compares longs in big-endian byte order. +if (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)) { + row1Data = Long.reverseBytes(row1Data); + row2Data = Long.reverseBytes(row2Data); +} + UnsafeRow row1 = new UnsafeRow(numFields); byte[] data1 = new byte[100]; row1.pointTo(data1, computeSizeInBytes(numFields * 8)); -row1.setLong(0, 11); +row1.setLong(0, row1Data); UnsafeRow row2 = new UnsafeRow(numFields); byte[] data2 = new byte[100]; row2.pointTo(data2, computeSizeInBytes(numFields * 8)); -row2.setLong(0, 11L + Integer.MAX_VALUE); +row2.setLong(0, row2Data); insertRow(row1); insertRow(row2); -Assert.assertTrue(compare(0, 1) > 0); +Assert.assertTrue(compare(0, 1) < 0); Review comment: My understanding is: the comparator compares bytes from left to right. It doesn't assume the byte ordering of the data. It's expected that different byte orders lead to different comparison results. I think a simple way to fix the test is:
```
if (LITTLE_ENDIAN) {
  Assert.assertTrue(compare(0, 1) < 0);
} else {
  Assert.assertTrue(compare(0, 1) > 0);
}
```
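The point above, that a byte-wise left-to-right comparison gives different results for the same two longs depending on the native byte order, can be sketched outside Spark. This is a standalone illustration (not Spark's actual RecordBinaryComparator), using the same values as the test, 11 and 11 + Integer.MAX_VALUE:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianCompare {
    // Byte-wise, left-to-right, unsigned comparison of two equal-length byte arrays,
    // mimicking how a binary comparator walks memory.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) return x - y;
        }
        return 0;
    }

    static byte[] toBytes(long v, ByteOrder order) {
        return ByteBuffer.allocate(8).order(order).putLong(v).array();
    }

    public static void main(String[] args) {
        long v1 = 11L;                      // 0x000000000000000B
        long v2 = 11L + Integer.MAX_VALUE;  // 0x000000008000000A
        // Big-endian layout starts with the high-order byte, so the byte-wise
        // result agrees with the numeric order for these non-negative values.
        System.out.println(compareBytes(toBytes(v1, ByteOrder.BIG_ENDIAN),
                                        toBytes(v2, ByteOrder.BIG_ENDIAN)) < 0);    // true
        // Little-endian layout starts with the low-order byte (0x0B vs 0x0A),
        // so the byte-wise result flips.
        System.out.println(compareBytes(toBytes(v1, ByteOrder.LITTLE_ENDIAN),
                                        toBytes(v2, ByteOrder.LITTLE_ENDIAN)) > 0); // true
    }
}
```

This is why the test either has to reverse the bytes up front (as the diff does) or branch the assertion on the platform's byte order (as proposed in the comment).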
[GitHub] [spark] HeartSaVioR commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap
HeartSaVioR commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668395101 Thanks for the quick reviews & reaction!
[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668394801
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins removed a comment on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668394801
[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
huaxingao commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464697692 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala ## @@ -106,4 +106,84 @@ class JDBCTableCatalogSuite extends QueryTest with SharedSparkSession { Seq(Row("test", "people"), Row("test", "new_table"))) } } + + test("alter table ... add column") { +withTable("h2.test.alt_table") { + sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _") + sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C1 INTEGER, C2 STRING)") + var t = spark.table("h2.test.alt_table") + var expectedSchema = new StructType() +.add("ID", IntegerType) +.add("C1", IntegerType) +.add("C2", StringType) + assert(t.schema === expectedSchema) + sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C3 DOUBLE)") + t = spark.table("h2.test.alt_table") + expectedSchema = new StructType() +.add("ID", IntegerType) +.add("C1", IntegerType) +.add("C2", StringType) +.add("C3", DoubleType) + assert(t.schema === expectedSchema) +} + } + + test("alter table ... rename column") { +withTable("h2.test.alt_table") { + sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _") + sql("ALTER TABLE h2.test.alt_table RENAME COLUMN ID TO C") + val t = spark.table("h2.test.alt_table") + val expectedSchema = new StructType().add("C", IntegerType) + assert(t.schema === expectedSchema) +} + } + + test("alter table ... drop column") { +withTable("h2.test.alt_table") { + sql("CREATE TABLE h2.test.alt_table (C1 INTEGER, C2 INTEGER) USING _") + sql("ALTER TABLE h2.test.alt_table DROP COLUMN C1") + val t = spark.table("h2.test.alt_table") + val expectedSchema = new StructType().add("C2", IntegerType) + assert(t.schema === expectedSchema) +} + } + + test("alter table ... 
update column type") { +withTable("h2.test.alt_table") { + sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _") + sql("ALTER TABLE h2.test.alt_table ALTER COLUMN id TYPE DOUBLE") + val t = spark.table("h2.test.alt_table") + val expectedSchema = new StructType().add("ID", DoubleType) + assert(t.schema === expectedSchema) +} + } + + test("alter table ... update column nullability") { +withTable("h2.test.alt_table") { + sql("CREATE TABLE h2.test.alt_table (ID INTEGER NOT NULL) USING _") + var t = spark.table("h2.test.alt_table") + var expectedSchema = new StructType().add("ID", IntegerType, nullable = false) + // Todo: find out why the nullable for ID INTEGER NOT NULL is true + // assert(t.schema === expectedSchema) Review comment: Somehow ```t.schema``` has ```StructType().add("ID", IntegerType, nullable = true)```. I commented out this assert for now and will find out what is wrong.
[GitHub] [spark] sarutak opened a new pull request #29345: [SPARK-32525][DOCS] The layout of monitoring.html is broken
sarutak opened a new pull request #29345: URL: https://github.com/apache/spark/pull/29345 ### What changes were proposed in this pull request? This PR fixes the layout of monitoring.html, which was broken after SPARK-31566 (#28354). The cause is that there are 2 `` tags not closed in `monitoring.md`. ### Why are the changes needed? This is a bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Built the docs; the following screenshots are before/after. * Before the fix ![broken-doc](https://user-images.githubusercontent.com/4736016/89257873-fba09b80-d661-11ea-90da-06cbc0783011.png) * After the fix ![fixed-doc2](https://user-images.githubusercontent.com/4736016/89257910-0fe49880-d662-11ea-9a85-7a1ecb1d38d6.png) Of course, the table is still rendered correctly. ![fixed-doc1](https://user-images.githubusercontent.com/4736016/89257948-225ed200-d662-11ea-80fd-d9254b44d4a0.png)
[GitHub] [spark] huaxingao commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
huaxingao commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668394577 Thanks for reviewing. I have addressed the comments.
[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
SparkQA commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668394491 **[Test build #127029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127029/testReport)** for PR 29324 at commit [`db266fa`](https://github.com/apache/spark/commit/db266fa7e07e3e665f254bb4b1ff8a39c8666752).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
HyukjinKwon commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464815045 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Thanks @dongjoon-hyun!
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464814771 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: First of all, @HyukjinKwon's idea does not fix the failure because `InMemoryRelation.ser` is a singleton. > What about just fixing CachedBatchSerializerSuite not to extend SharedSparkSessionBase? I'm moving to @cloud-fan's proposal.
[GitHub] [spark] HyukjinKwon closed pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.time
HyukjinKwon closed pull request #29343: URL: https://github.com/apache/spark/pull/29343
[GitHub] [spark] HyukjinKwon commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap
HyukjinKwon commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668393902 I am going to merge this to unblock other PRs.
[GitHub] [spark] HyukjinKwon commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap
HyukjinKwon commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668393962 Merged to master.
[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
huaxingao commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464814466 ## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ## @@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable { /** * Rename an existing table. * - * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax. - * * @param oldTable The existing table. * @param newTable New name of the table. * @return The SQL statement to use for renaming the table. */ def renameTable(oldTable: String, newTable: String): String = { s"ALTER TABLE $oldTable RENAME TO $newTable" } + + /** + * Alter an existing table. + * TODO (SPARK-32523): Override this method in the dialects that have different syntax. + * + * @param tableName The name of the table to be altered. + * @param changes Changes to apply to the table. + * @return The SQL statements to use for altering the table. */ + def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = { +val updateClause = ArrayBuilder.make[String] +for (change <- changes) { + change match { +case add: AddColumn => + add.fieldNames match { +case Array(name) => Review comment: It seems to me that fieldNames always has only one element.
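The `alterTable` shape in the diff above, mapping a list of table changes to one SQL statement each, can be sketched as plain string building. The `AddColumn` type below is a simplified stand-in for Spark's `TableChange.AddColumn` (single-part names only), and the syntax follows the generic dialect shown in the diff rather than any specific database:

```java
import java.util.ArrayList;
import java.util.List;

class AlterTableSqlSketch {
    // Simplified stand-in for TableChange.AddColumn: a single-part column name plus a type.
    static final class AddColumn {
        final String name;
        final String dataType;
        AddColumn(String name, String dataType) {
            this.name = name;
            this.dataType = dataType;
        }
    }

    // Emit one ALTER TABLE statement per change; a dialect with different
    // syntax would override this mapping.
    static List<String> alterTable(String tableName, List<AddColumn> changes) {
        List<String> statements = new ArrayList<>();
        for (AddColumn add : changes) {
            statements.add("ALTER TABLE " + tableName + " ADD COLUMN " + add.name + " " + add.dataType);
        }
        return statements;
    }
}
```

Returning one statement per change is what makes the multi-change transactional path in `JdbcUtils.alterTable` necessary: each statement executes separately, so atomicity has to come from the connection.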
[GitHub] [spark] huaxingao commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
huaxingao commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464814358 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -900,10 +889,52 @@ object JdbcUtils extends Logging { newTable: String, options: JDBCOptions): Unit = { val dialect = JdbcDialects.get(options.url) +executeStatement(conn, options, dialect.renameTable(oldTable, newTable)) + } + + /** + * Update a table from the JDBC database. + */ + def alterTable( + conn: Connection, + tableName: String, + changes: Seq[TableChange], + options: JDBCOptions): Unit = { +val dialect = JdbcDialects.get(options.url) +if (changes.length == 1) { + executeStatement(conn, options, dialect.alterTable(tableName, changes)(0)) +} else { + val metadata = conn.getMetaData + if (!metadata.supportsTransactions) { +throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" + + s" support multiple alter table changes simultaneously for database server" + +s" that doesn't have transaction support.") + } else { +conn.setAutoCommit(false) +val statement = conn.createStatement +try { + statement.setQueryTimeout(options.queryTimeout) + for (sql <- dialect.alterTable(tableName, changes)) { +statement.executeUpdate(sql) + } + conn.commit() +} catch { + case e: SQLException => +if (conn != null) conn.rollback() +throw e +} finally { + statement.close() + conn.setAutoCommit(true) Review comment: I guess we don't need to record the original auto-commit status. The default value of autocommit is true.
[GitHub] [spark] prakharjain09 commented on pull request #29318: [SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization
prakharjain09 commented on pull request #29318: URL: https://github.com/apache/spark/pull/29318#issuecomment-668390649 @cloud-fan Thanks!
[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon edited a comment on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-668348425 Just to share the current status: the [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) plugin (like all other similar plugins) leverages the `GITHUB_TOKEN` that is set by default in GitHub Actions. It uses the GitHub API to create [status checks](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) via [here](https://github.com/ScaCap/action-surefire-report/blob/master/action.js#L42-L43) - this requires write permission to the repo. However, the permissions of `GITHUB_TOKEN` [do not cover the case when a PR is raised from a fork](https://docs.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token). There are many similar issues and questions, for example, in [codecov](https://github.com/codecov/codecov-action/issues/29) or the [GitHub community](https://github.community/t/make-secrets-available-to-builds-of-forks/16166). In the case of Codecov, they managed to remove the requirement of `GITHUB_TOKEN` at [here](https://github.com/codecov/codecov-action/issues/29#issuecomment-595062189): basically, they used existing GitHub Actions environment variables to verify in their service. This is not feasible in our case because the plugin depends on the GitHub API to create the status checks directly. I investigated this issue yesterday and concluded there's no clean workaround to make this work out of the box. I am currently investigating the feasibility of _potential_ alternatives. I am not yet sure whether all of them work: - Use one environment variable, for example, `TEST_REPORT_GITHUB_TOKEN`, as a GitHub secret. Then guide committers to set `TEST_REPORT_GITHUB_TOKEN` as a GitHub secret in their forks so that their PRs report test results.
Note that contributors would _not_ be able to report the test results, as their tokens don't have write access to the repo. - Just run the test reports only on commits to the repo and don't run them in PRs until GitHub provides an alternative to work around this. There are many requests such as [this](https://github.community/t/make-secrets-available-to-builds-of-forks/16166). - Just generate a token that only has permission to change the status checks, and hardcode it in the repo. In the worst case, if people abuse this token, the status checks of PRs or commits can be changed. This does not affect the code, and Jenkins runs as a safeguard, so it might be fine; I wonder what people could gain by abusing status checks. I opened an INFRA ticket in ASF and a GitHub Actions ticket in GitHub, and am discussing the options. Once I verify the feasible options, we will be able to discuss further which one to pick (or just drop the PR in the worst case).
[GitHub] [spark] gaborgsomogyi commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.
gaborgsomogyi commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668388396 Had a look; the explanation and solution make sense. Thank you guys for helping me out!
[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.
AmplabJenkins commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668388062
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "
AmplabJenkins removed a comment on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668388062
[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti
SparkQA commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668387822 **[Test build #127028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127028/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
[GitHub] [spark] HeartSaVioR commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.ap
HeartSaVioR commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668386032 retest this, please
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464805656 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Let me create another PR for @HyukjinKwon's or @cloud-fan's idea to compare with this.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464805252 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: For this question, yes. The root cause is that InMemoryRelation.ser is a kind of singleton. Since the new configuration is a static conf, this matches the semantics of `InMemoryRelation.ser`; so the problem is only in testing. > Does it really help? InMemoryRelation.ser doesn't belong to any session and is global.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464804706 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Yes. Since this is a new feature and new test coverage, I'm also not sure what is best here.
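The pattern under discussion, a lazily initialized, process-wide singleton plus a test-only reset, can be sketched independently of Spark. `SerializerHolder` below is a hypothetical stand-in for the `InMemoryRelation` companion object, with `Object` standing in for the cached serializer:

```java
// Hypothetical sketch of the singleton pattern discussed above: the first get()
// caches one instance for the whole JVM, so a test suite that needs a different
// serializer (configured via a static conf) must clear the cache first.
final class SerializerHolder {
    private static Object ser = null;

    static synchronized Object get() {
        if (ser == null) {
            // In Spark this would instantiate the class named by the static conf.
            ser = new Object();
        }
        return ser;
    }

    /* Visible for testing */
    static synchronized void clearSerializer() {
        ser = null;
    }
}
```

Without `clearSerializer()`, whichever test initializes the singleton first pins its configuration for every later test in the same JVM, which is exactly the cross-suite leakage the PR is trying to clean up.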
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
AmplabJenkins removed a comment on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668384049
[GitHub] [spark] AmplabJenkins commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
AmplabJenkins commented on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668384049
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for
dongjoon-hyun commented on a change in pull request #29303: URL: https://github.com/apache/spark/pull/29303#discussion_r464803900 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala ## @@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer { } } } + + test("check results from get columns operation from thrift server") { +val schemaName = "default" +val tableName = "spark_get_col_operation" +val schema = new StructType() + .add("c0", "boolean", nullable = false, "0") + .add("c1", "tinyint", nullable = true, "1") + .add("c2", "smallint", nullable = false, "2") + .add("c3", "int", nullable = true, "3") + .add("c4", "long", nullable = false, "4") + .add("c5", "float", nullable = true, "5") + .add("c6", "double", nullable = false, "6") + .add("c7", "decimal(38, 20)", nullable = true, "7") + .add("c8", "decimal(10, 2)", nullable = false, "8") + .add("c9", "string", nullable = true, "9") + .add("c10", "array", nullable = false, "10") + .add("c11", "array", nullable = true, "11") + .add("c12", "map", nullable = false, "12") + .add("c13", "date", nullable = true, "13") + .add("c14", "timestamp", nullable = false, "14") + .add("c15", "struct", nullable = true, "15") + .add("c16", "binary", nullable = false, "16") + +val ddl = + s""" + |CREATE TABLE $schemaName.$tableName ( + | ${schema.toDDL} + |) + |using parquet""".stripMargin + +withCLIServiceClient { client => + val sessionHandle = client.openSession(user, "") + val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String] + val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay) + var status = client.getOperationStatus(opHandle) + while (!status.getState.isTerminal) { +Thread.sleep(10) +status = client.getOperationStatus(opHandle) + } + val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null) Review comment: BTW, it's up to you, @cloud-fan. I'll not revert this by myself since PRBuilder is still working.
[GitHub] [spark] stczwd commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2
stczwd commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r464803576 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala ## @@ -62,4 +74,31 @@ object DataSourceV2Implicits { new CaseInsensitiveStringMap(options.asJava) } } + + def convertPartitionIndentifers( + partSpec: TablePartitionSpec, + partSchema: StructType): InternalRow = { Review comment: Hm, it looks a little ugly to me if it's defined with implicits. I can change it if you think it's better with implicits. ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.connector.catalog.{CatalogV2Implicits, Identifier, TableCatalog} + +/** + * Physical plan node for adding partitions of table. + */ +case class AlterTableAddPartitionExec( +catalog: TableCatalog, +ident: Identifier, +partitionSpecsAndLocs: Seq[(TablePartitionSpec, Option[String])], +ignoreIfExists: Boolean) extends V2CommandExec { + import DataSourceV2Implicits._ + import CatalogV2Implicits._ + + override def output: Seq[Attribute] = Seq.empty + + override protected def run(): Seq[InternalRow] = { +val table = catalog.loadTable(ident).asPartitionable +val partNames = table.partitionSchema().map(_.name) + +partitionSpecsAndLocs.foreach { case (spec, location) => + val partParams = new java.util.HashMap[String, String](table.properties()) + location.foreach(locationUri => +partParams.put("location", locationUri)) + partParams.put("ignoreIfExists", ignoreIfExists.toString) Review comment: Fine, I'll change it. ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala ## @@ -62,4 +74,31 @@ object DataSourceV2Implicits { new CaseInsensitiveStringMap(options.asJava) } } + + def convertPartitionIndentifers( + partSpec: TablePartitionSpec, + partSchema: StructType): InternalRow = { +val partValues = partSchema.map { part => + part.dataType match { +case _: ByteType => + partSpec.getOrElse(part.name, "0").toByte Review comment: Sure, sounds reasonable to me.
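For context, the implicit-class variant the reviewers are debating might look like the sketch below. Only `convertPartitionIndentifers`'s shape comes from the PR; the wrapper object, method name `toInternalRow`, and the set of handled types are illustrative assumptions.

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical implicit-class form of the conversion: each partition column's
// string value from the spec is parsed according to its declared data type.
object PartitionSpecImplicits {
  implicit class PartitionSpecHelper(partSpec: TablePartitionSpec) {
    def toInternalRow(partSchema: StructType): InternalRow = {
      val values = partSchema.map { field =>
        // Mirrors the PR's getOrElse(name, "0") default for missing values.
        val raw = partSpec.getOrElse(field.name, "0")
        field.dataType match {
          case ByteType    => raw.toByte
          case ShortType   => raw.toShort
          case IntegerType => raw.toInt
          case LongType    => raw.toLong
          case StringType  => UTF8String.fromString(raw)
          case other =>
            throw new AnalysisException(s"Unsupported partition column type: $other")
        }
      }
      InternalRow.fromSeq(values)
    }
  }
}
```

With the implicit in scope, a caller would write `spec.toInternalRow(table.partitionSchema())` instead of calling a free-standing helper, which is the stylistic trade-off under discussion.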
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for
dongjoon-hyun commented on a change in pull request #29303: URL: https://github.com/apache/spark/pull/29303#discussion_r464803705 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala ## @@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer { } } } + + test("check results from get columns operation from thrift server") { +val schemaName = "default" +val tableName = "spark_get_col_operation" +val schema = new StructType() + .add("c0", "boolean", nullable = false, "0") + .add("c1", "tinyint", nullable = true, "1") + .add("c2", "smallint", nullable = false, "2") + .add("c3", "int", nullable = true, "3") + .add("c4", "long", nullable = false, "4") + .add("c5", "float", nullable = true, "5") + .add("c6", "double", nullable = false, "6") + .add("c7", "decimal(38, 20)", nullable = true, "7") + .add("c8", "decimal(10, 2)", nullable = false, "8") + .add("c9", "string", nullable = true, "9") + .add("c10", "array", nullable = false, "10") + .add("c11", "array", nullable = true, "11") + .add("c12", "map", nullable = false, "12") + .add("c13", "date", nullable = true, "13") + .add("c14", "timestamp", nullable = false, "14") + .add("c15", "struct", nullable = true, "15") + .add("c16", "binary", nullable = false, "16") + +val ddl = + s""" + |CREATE TABLE $schemaName.$tableName ( + | ${schema.toDDL} + |) + |using parquet""".stripMargin + +withCLIServiceClient { client => + val sessionHandle = client.openSession(user, "") + val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String] + val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay) + var status = client.getOperationStatus(opHandle) + while (!status.getState.isTerminal) { +Thread.sleep(10) +status = client.getOperationStatus(opHandle) + } + val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null) Review comment: BTW, this is not a flaky failure. 
This broke some profiles consistently.
[GitHub] [spark] SparkQA commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
SparkQA commented on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668383772 **[Test build #127027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127027/testReport)** for PR 29020 at commit [`2115007`](https://github.com/apache/spark/commit/211500761b442dd5fd493064d3ef88ac49225379).
[GitHub] [spark] cloud-fan commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
cloud-fan commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464803563 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: A more complicated fix is to put the serializer instance in `SharedState`. This brings one benefit: users can stop the current spark context and create a new one with a different cache plugin. But I'm not sure if this is a useful feature.
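The "more complicated fix" could take roughly this shape. This is a hypothetical sketch, not code from any PR: the field name, the conf key string, and the default-class fallback are all assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.columnar.CachedBatchSerializer
import org.apache.spark.util.Utils

// Hypothetical: tie the serializer's lifetime to SharedState instead of a
// JVM-global var, so stopping the SparkContext and creating a new one with a
// different cache plugin yields a fresh serializer instance.
class SharedState(val conf: SparkConf) {
  lazy val cachedBatchSerializer: CachedBatchSerializer = {
    // Conf key and default class name are assumed for illustration.
    val clsName = conf.get("spark.sql.cache.serializer",
      "org.apache.spark.sql.execution.columnar.DefaultCachedBatchSerializer")
    Utils.classForName(clsName).getConstructor().newInstance()
      .asInstanceOf[CachedBatchSerializer]
  }
}
```

The trade-off cloud-fan notes: this gains per-context pluggability at the cost of duplicating a conceptually global object per `SharedState`, which only matters if anyone actually restarts a context with a different plugin.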
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29303: [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for
dongjoon-hyun commented on a change in pull request #29303: URL: https://github.com/apache/spark/pull/29303#discussion_r464803567 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala ## @@ -101,6 +104,135 @@ trait ThriftServerWithSparkContextSuite extends SharedThriftServer { } } } + + test("check results from get columns operation from thrift server") { +val schemaName = "default" +val tableName = "spark_get_col_operation" +val schema = new StructType() + .add("c0", "boolean", nullable = false, "0") + .add("c1", "tinyint", nullable = true, "1") + .add("c2", "smallint", nullable = false, "2") + .add("c3", "int", nullable = true, "3") + .add("c4", "long", nullable = false, "4") + .add("c5", "float", nullable = true, "5") + .add("c6", "double", nullable = false, "6") + .add("c7", "decimal(38, 20)", nullable = true, "7") + .add("c8", "decimal(10, 2)", nullable = false, "8") + .add("c9", "string", nullable = true, "9") + .add("c10", "array", nullable = false, "10") + .add("c11", "array", nullable = true, "11") + .add("c12", "map", nullable = false, "12") + .add("c13", "date", nullable = true, "13") + .add("c14", "timestamp", nullable = false, "14") + .add("c15", "struct", nullable = true, "15") + .add("c16", "binary", nullable = false, "16") + +val ddl = + s""" + |CREATE TABLE $schemaName.$tableName ( + | ${schema.toDDL} + |) + |using parquet""".stripMargin + +withCLIServiceClient { client => + val sessionHandle = client.openSession(user, "") + val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String] + val opHandle = client.executeStatement(sessionHandle, ddl, confOverlay) + var status = client.getOperationStatus(opHandle) + while (!status.getState.isTerminal) { +Thread.sleep(10) +status = client.getOperationStatus(opHandle) + } + val getCol = client.getColumns(sessionHandle, "", schemaName, tableName, null) Review comment: Ya. It seems that Hive 2.3 is affected. 
And the JDK11 Maven build seems not to run those test suites. Do we have a follow-up idea, @yaooqinn and @cloud-fan ? The error message is clear: `java.lang.ClassCastException`. If there is no idea, I'd recommend reverting first so the Maven Jenkins build passes properly.
[GitHub] [spark] gengliangwang commented on pull request #29148: [SPARK-32349][UI] Reduce unnecessary allexecutors call when render stage page executor summary
gengliangwang commented on pull request #29148: URL: https://github.com/apache/spark/pull/29148#issuecomment-668382888 The downside is storing duplicated info in the kvstore. I don't think reducing an API call and simplifying the frontend logic is a good enough reason for such changes.
[GitHub] [spark] cloud-fan commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
cloud-fan commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464802349 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Does it really help? `InMemoryRelation.ser` doesn't belong to any session and is global. I think a simpler fix is to clear it in `CachedBatchSerializerSuite.beforeAll` and `afterAll`.
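The simpler fix suggested here would look roughly like the sketch below, assuming the `clearSerializer` hook added in this PR; `QueryTest` and `SharedSparkSession` are the usual Spark SQL test mixins.

```scala
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.execution.columnar.InMemoryRelation
import org.apache.spark.sql.test.SharedSparkSession

// Sketch: reset the JVM-global serializer around the whole suite, so the
// custom test plugin neither sees a stale instance from an earlier suite
// nor leaks its own instance into later ones.
class CachedBatchSerializerSuite extends QueryTest with SharedSparkSession {

  override def beforeAll(): Unit = {
    super.beforeAll()
    InMemoryRelation.clearSerializer() // drop anything a previous suite cached
  }

  override def afterAll(): Unit = {
    InMemoryRelation.clearSerializer() // don't leak the custom plugin
    super.afterAll()
  }
}
```

Compared with centralizing the clean-up in `SharedSparkSession`, this keeps the reset local to the one suite that installs a non-default serializer.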
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464802058 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: BTW, I'm still open to your idea. Let's see the original author's and committer's opinions. Thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins removed a comment on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668381517 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127014/ Test FAILed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464801731 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Also, the test resource clean-up had better be centralized at `SharedSparkSession` so it isn't forgotten.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins removed a comment on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668381516 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
AmplabJenkins commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668381516
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464801416 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Ya. I thought that way first, but this is a more general way because SPARK-32274 makes SQL cache serialization pluggable. We may have other test suites in the future.
[GitHub] [spark] dongjoon-hyun commented on pull request #29334: [RFC][SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.
dongjoon-hyun commented on pull request #29334: URL: https://github.com/apache/spark/pull/29334#issuecomment-668381120 I also agree with @srowen 's opinion. However, if we have a clean patch passing all UTs, we may reconsider.
[GitHub] [spark] SparkQA removed a comment on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
SparkQA removed a comment on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668312504 **[Test build #127014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127014/testReport)** for PR 29324 at commit [`114b7a3`](https://github.com/apache/spark/commit/114b7a3fa1ecae64f1cd21339289039a69e1829d).
[GitHub] [spark] SparkQA commented on pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
SparkQA commented on pull request #29324: URL: https://github.com/apache/spark/pull/29324#issuecomment-668380508 **[Test build #127014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127014/testReport)** for PR 29324 at commit [`114b7a3`](https://github.com/apache/spark/commit/114b7a3fa1ecae64f1cd21339289039a69e1829d). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
HyukjinKwon commented on a change in pull request #29344: URL: https://github.com/apache/spark/pull/29344#discussion_r464799986 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ## @@ -277,6 +277,9 @@ object InMemoryRelation { ser.get } + /* Visible for testing */ + private[spark] def clearSerializer(): Unit = synchronized { ser = None } Review comment: Nice, @dongjoon-hyun. What about just fixing `CachedBatchSerializerSuite` not to extend `SharedSparkSessionBase`? For example, like `ExecutorSideSQLConfSuite` or `SparkSessionExtensionSuite`. I think that would be simpler and a more isolated fix.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
AmplabJenkins removed a comment on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668380028
[GitHub] [spark] AmplabJenkins commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
AmplabJenkins commented on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668380028
[GitHub] [spark] dongjoon-hyun commented on pull request #29340: [SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors.
dongjoon-hyun commented on pull request #29340: URL: https://github.com/apache/spark/pull/29340#issuecomment-668379660 Please note that there is a PR for the 170 `Cache`-related failures. - https://github.com/apache/spark/pull/29344
[GitHub] [spark] SparkQA commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
SparkQA commented on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668379760 **[Test build #127026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127026/testReport)** for PR 29344 at commit [`8cfe79b`](https://github.com/apache/spark/commit/8cfe79bc81e48b8a990dcad5d08fa9e4af706d5b).
[GitHub] [spark] beliefer commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
beliefer commented on pull request #29291: URL: https://github.com/apache/spark/pull/29291#issuecomment-668379005 @cloud-fan Thanks for your review and good idea.
[GitHub] [spark] dongjoon-hyun commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun commented on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668378067 Could you review this PR, @revans2, @tgravescs, @cloud-fan?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3
dongjoon-hyun commented on a change in pull request #29331: URL: https://github.com/apache/spark/pull/29331#discussion_r464797425

## File path: core/src/main/scala/org/apache/spark/storage/StorageLevel.scala

```
@@ -172,6 +173,7 @@ object StorageLevel {
     case "NONE" => NONE
     case "DISK_ONLY" => DISK_ONLY
     case "DISK_ONLY_2" => DISK_ONLY_2
+    case "DISK_ONLY_3" => DISK_ONLY_3
     case "MEMORY_ONLY" => MEMORY_ONLY
     case "MEMORY_ONLY_2" => MEMORY_ONLY_2
     case "MEMORY_ONLY_SER" => MEMORY_ONLY_SER
```

Review comment: Thank you for the review. For the question, no, @viirya. We can consider this as a kind of replacement of HDFS. This PR is not aiming to support `MEMORY_AND_DISK_3` or `MEMORY_AND_DISK_SER_3`.
[GitHub] [spark] cloud-fan commented on a change in pull request #29338: [SPARK-32521][SQL] Bug-fix: WithFields Expression should not be foldable
cloud-fan commented on a change in pull request #29338: URL: https://github.com/apache/spark/pull/29338#discussion_r464797067

## File path: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala

```scala
@@ -1420,4 +1420,48 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
    }.getMessage should include("No such struct field b in a, B")
  }

  test("withField user-facing examples") {
    checkAnswer(
      sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
        .select($"struct_col".withField("c", lit(3))),
      Row(Row(1, 2, 3)))

    checkAnswer(
      sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
        .select($"struct_col".withField("b", lit(3))),
      Row(Row(1, 3)))

    checkAnswer(
      sql("SELECT CAST(NULL AS struct) struct_col")
        .select($"struct_col".withField("c", lit(3))),
      Row(null))

    checkAnswer(
      sql("SELECT named_struct('a', 1, 'b', 2, 'b', 3) struct_col")
        .select($"struct_col".withField("b", lit(100))),
      Row(Row(1, 100, 100)))

    checkAnswer(
      sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col")
        .select($"struct_col".withField("a.c", lit(3))),
      Row(Row(Row(1, 2, 3))))

    intercept[AnalysisException] {
      sql("SELECT named_struct('a', named_struct('b', 1), 'a', named_struct('c', 2)) struct_col")
        .select($"struct_col".withField("a.c", lit(3)))
    }.getMessage should include("Ambiguous reference to fields")
  }

  test("SPARK-32521: withField with statically evaluable arguments should succeed") {
```

Review comment: Do we still need this test case? It's included in `withField user-facing examples`.
[GitHub] [spark] dongjoon-hyun commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3
dongjoon-hyun commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-668377008 I found the root cause of random failures in the master branch. Here is the PR. - https://github.com/apache/spark/pull/29344
[GitHub] [spark] dongjoon-hyun commented on pull request #29067: [SPARK-32274][SQL] Make SQL cache serialization pluggable
dongjoon-hyun commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-668376771 Hi, @revans2 and @cloud-fan. Unfortunately, this PR introduced a flaky test failure: depending on the order in which the suites run, tests fail like the following.

```
$ build/sbt "sql/testOnly *.CachedBatchSerializerSuite *.CachedTableSuite"
...
[info] *** 30 TESTS FAILED ***
[error] Failed: Total 51, Failed 30, Errors 0, Passed 21
[error] Failed tests:
[error] 	org.apache.spark.sql.CachedTableSuite
[error] (sql/test:testOnly) sbt.TestsFailedException: Tests unsuccessful
```

I made a PR to fix that. Currently, this causes random Jenkins failures on the other PRs. - https://github.com/apache/spark/pull/29344
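The fix in #29344 cleans up `InMemoryRelation.ser` between suites. As a hedged illustration of the general failure mode and cleanup pattern (the names below are hypothetical and are not Spark's actual internals):

```java
// Hypothetical sketch (not Spark's actual internals) of the failure mode
// behind the flaky runs: a lazily initialized, process-wide singleton that
// one test suite sets and never resets, so a later suite inherits stale
// state depending on suite order. The fix is an explicit cleanup hook
// invoked between suites, analogous to what #29344 does.
import java.util.Optional;

public class SerializerRegistry {
    private static Optional<String> ser = Optional.empty();

    // First caller wins: later suites silently observe the earlier choice.
    public static String getOrInit(String defaultName) {
        if (!ser.isPresent()) {
            ser = Optional.of(defaultName);
        }
        return ser.get();
    }

    // The cleanup hook a shared test harness should call between suites.
    public static void reset() {
        ser = Optional.empty();
    }
}
```

Without `reset()`, running one suite that installs a custom serializer before another suite leaves the custom value in place, which matches the order-dependent failures shown in the sbt output above.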
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
AmplabJenkins removed a comment on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668376280
[GitHub] [spark] AmplabJenkins commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
AmplabJenkins commented on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668376280
[GitHub] [spark] SparkQA commented on pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
SparkQA commented on pull request #29344: URL: https://github.com/apache/spark/pull/29344#issuecomment-668376057 **[Test build #127025 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127025/testReport)** for PR 29344 at commit [`b3bed27`](https://github.com/apache/spark/commit/b3bed277ab0c64d2352a0fb76457a3815bdc6534).
[GitHub] [spark] cloud-fan commented on pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan commented on pull request #29291: URL: https://github.com/apache/spark/pull/29291#issuecomment-668375534 thanks, merging to master!
[GitHub] [spark] dongjoon-hyun opened a new pull request #29344: [SPARK-32524][SQL][TESTS] SharedSparkSession should clean up InMemoryRelation.ser
dongjoon-hyun opened a new pull request #29344: URL: https://github.com/apache/spark/pull/29344 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] cloud-fan closed pull request #29291: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT
cloud-fan closed pull request #29291: URL: https://github.com/apache/spark/pull/29291
[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
cloud-fan commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464794637

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

```scala
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
      newTable: String,
      options: JDBCOptions): Unit = {
    val dialect = JdbcDialects.get(options.url)
    executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
  }

  /**
   * Update a table from the JDBC database.
   */
  def alterTable(
      conn: Connection,
      tableName: String,
      changes: Seq[TableChange],
      options: JDBCOptions): Unit = {
    val dialect = JdbcDialects.get(options.url)
    if (changes.length == 1) {
      executeStatement(conn, options, dialect.alterTable(tableName, changes)(0))
    } else {
      val metadata = conn.getMetaData
      if (!metadata.supportsTransactions) {
        throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
          s" support multiple alter table changes simultaneously for database server" +
          s" that doesn't have transaction support.")
      } else {
        conn.setAutoCommit(false)
        val statement = conn.createStatement
        try {
          statement.setQueryTimeout(options.queryTimeout)
          for (sql <- dialect.alterTable(tableName, changes)) {
            statement.executeUpdate(sql)
          }
          conn.commit()
        } catch {
          case e: SQLException =>
            if (conn != null) conn.rollback()
            throw e
```

Review comment: +1
[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2
cloud-fan commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r464794249

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala

```scala
/**
 * Physical plan node for dropping partitions of table.
 */
case class AlterTableDropPartitionExec(
    catalog: TableCatalog,
    ident: Identifier,
    specs: Seq[TablePartitionSpec],
    ignoreIfNotExists: Boolean,
    purge: Boolean,
    retainData: Boolean) extends V2CommandExec {
```

Review comment: We should think about how to deal with `purge` and `retainData`. Shall we just put them into partition properties? Or put it into `dropPartition` parameters?
[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2
cloud-fan commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r464793949

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala

```scala
/**
 * Physical plan node for dropping partitions of table.
 */
case class AlterTableDropPartitionExec(
```

Review comment: I think the physical plan should be general:

```
case class ...(
    table: SupportsPartition,
    partitionIdent: InternalRow,
    ignoreIfNotExists: Boolean)
...
```
[GitHub] [spark] cloud-fan commented on a change in pull request #29339: [Spark-32512][SQL][WIP] add alter table add/drop partition command for datasourcev2
cloud-fan commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r464794051

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala

```scala
case class AlterTableDropPartitionExec(
```

Review comment: All the checks and conversions should be done in the analyzer/planner, instead of at runtime.
[GitHub] [spark] ScrapCodes commented on a change in pull request #29334: [RFC][SPARK-32495] Update jackson versions to a maintained release, to fix various security vulnerabilities.
ScrapCodes commented on a change in pull request #29334: URL: https://github.com/apache/spark/pull/29334#discussion_r464793132

## File path: core/pom.xml

```xml
@@ -224,6 +224,10 @@
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
    </dependency>
+    <dependency>
+      <groupId>org.scala-lang</groupId>
+      <artifactId>scala-reflect</artifactId>
```

Review comment: This is required because jackson-module-scala underwent this change: https://github.com/FasterXML/jackson-module-scala/commit/d6071ac4597e582b1a59b5a492ac38009e17eeb6#diff-fdc3abdfd754eeb24090dbd90aeec2ce And spark-core uses the scala-reflect library in one place.
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464791662

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

```scala
        } catch {
          case e: SQLException =>
            if (conn != null) conn.rollback()
            throw e
```

Review comment: Do we only need to roll back on SQLException? Isn't it safer to roll back for all exceptions?
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464791154

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

```scala
        throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
          s" support multiple alter table changes simultaneously for database server" +
          s" that doesn't have transaction support.")
```

Review comment: No need for s"" here.
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464790560

## File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

```scala
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
  /**
   * Rename an existing table.
   *
   * @param oldTable The existing table.
   * @param newTable New name of the table.
   * @return The SQL statement to use for renaming the table.
   */
  def renameTable(oldTable: String, newTable: String): String = {
    s"ALTER TABLE $oldTable RENAME TO $newTable"
  }

  /**
   * Alter an existing table.
   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
   *
   * @param tableName The name of the table to be altered.
   * @param changes Changes to apply to the table.
   * @return The SQL statements to use for altering the table.
   */
  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
    val updateClause = ArrayBuilder.make[String]
    for (change <- changes) {
      change match {
        case add: AddColumn =>
          add.fieldNames match {
            case Array(name) =>
              val dataType = JdbcUtils.getJdbcType(add.dataType(), this).databaseTypeDefinition
              updateClause += s"ALTER TABLE $tableName ADD COLUMN $name $dataType"
          }
        case rename: RenameColumn =>
          rename.fieldNames match {
            case Array(name) =>
              updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO ${rename.newName}"
          }
        case delete: DeleteColumn =>
          delete.fieldNames match {
            case Array(name) =>
              updateClause += s"ALTER TABLE $tableName DROP COLUMN $name"
          }
        case update: UpdateColumnType =>
          update.fieldNames match {
            case Array(name) =>
              val dataType = JdbcUtils.getJdbcType(update.newDataType(), this)
                .databaseTypeDefinition
              updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name $dataType"
          }
        case update: UpdateColumnNullability =>
          update.fieldNames match {
            case Array(name) =>
              val nullable = if (update.nullable()) "NULL" else "NOT NULL"
              updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name SET " + nullable
```

Review comment: Since you already use s"", just put `nullable` into it too.
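The quoted `alterTable` is essentially a pure mapping from change descriptions to ALTER TABLE statements. A simplified, self-contained sketch of that pattern (the string-array change encoding here is an illustrative stand-in, not Spark's `TableChange` API):

```java
// Hedged sketch of the string-generation pattern in JdbcDialect.alterTable:
// each change description maps to exactly one ALTER TABLE statement.
// Changes are encoded as {kind, columnName, extra} triples for brevity.
import java.util.ArrayList;
import java.util.List;

public class AlterSqlBuilder {
    public static List<String> alterTable(String table, List<String[]> changes) {
        List<String> sql = new ArrayList<>();
        for (String[] c : changes) {
            switch (c[0]) {
                case "ADD":          // extra = column data type
                    sql.add("ALTER TABLE " + table + " ADD COLUMN " + c[1] + " " + c[2]);
                    break;
                case "RENAME":       // extra = new column name
                    sql.add("ALTER TABLE " + table + " RENAME COLUMN " + c[1] + " TO " + c[2]);
                    break;
                case "DROP":
                    sql.add("ALTER TABLE " + table + " DROP COLUMN " + c[1]);
                    break;
                case "NULLABILITY":  // extra = "true"/"false" for nullable
                    String nullable = Boolean.parseBoolean(c[2]) ? "NULL" : "NOT NULL";
                    sql.add("ALTER TABLE " + table + " ALTER COLUMN " + c[1] + " SET " + nullable);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported change: " + c[0]);
            }
        }
        return sql;
    }
}
```

Dialects that deviate from this default syntax would override the mapping per statement kind, which is exactly what the TODO (SPARK-32523) in the quoted code anticipates.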
[GitHub] [spark] gengliangwang commented on a change in pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page
gengliangwang commented on a change in pull request #29082: URL: https://github.com/apache/spark/pull/29082#discussion_r464789218

## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala

```scala
@@ -421,6 +420,51 @@ private[spark] class AppStatusStore(
    constructTaskDataList(taskDataWrapperIter)
  }

  def exceptionSummary(stageId: Int, attemptId: Int): Seq[v1.ExceptionSummary] = {
    val (stageData, _) = stageAttempt(stageId, attemptId)
    val key = Array(stageId, attemptId, stageData.numFailedTasks)
    asOption(store.read(classOf[CachedExceptionSummary], key))
```

Review comment: If the exception summary is cached when the stage is in progress, is it possible that we will always see the partial results even after the stage is finished?
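Note that in the quoted code, `numFailedTasks` is part of the cache key, so a summary cached mid-stage is only reused while the failure count is unchanged; once more tasks fail, the key changes and the summary is recomputed. A toy sketch of this count-keyed invalidation (hypothetical names, not the `AppStatusStore` API):

```java
// Sketch of count-keyed cache invalidation: embedding a monotonically
// growing counter (here, the failed-task count) in the key means stale
// partial summaries are simply never hit again once the count changes.
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class SummaryCache {
    private final Map<String, String> cache = new HashMap<>();

    public String get(int stageId, int attemptId, int numFailedTasks, Supplier<String> compute) {
        String key = stageId + "/" + attemptId + "/" + numFailedTasks;
        // Recompute only on a cache miss, i.e. when the key has changed.
        return cache.computeIfAbsent(key, k -> compute.get());
    }
}
```

The reviewer's concern still applies in one case: if the stage finishes with the same failure count as when the summary was cached, the cached entry keeps being served, so the cached computation must already be correct for that count.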
[GitHub] [spark] viirya commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
viirya commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464788259

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

```scala
        } finally {
          statement.close()
          conn.setAutoCommit(true)
```

Review comment: Should we record the auto-commit status and restore it afterwards?
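Both review points on this hunk (roll back on any failure, not only `SQLException`, and restore rather than assume the previous auto-commit setting) fit into one small pattern. A hedged sketch against a stand-in `Tx` interface rather than `java.sql.Connection`, so it stays self-contained and runnable:

```java
// Minimal sketch of the transaction discipline under discussion: save the
// connection's previous auto-commit flag, roll back on *any* failure, and
// restore the flag in `finally`. `Tx` stands in for the small slice of
// java.sql.Connection used here (whose methods throw checked SQLException).
import java.util.List;

interface Tx {
    boolean getAutoCommit();
    void setAutoCommit(boolean on);
    void execute(String sql);
    void commit();
    void rollback();
}

public class TransactionalRunner {
    public static void runAll(Tx conn, List<String> statements) {
        boolean previous = conn.getAutoCommit();  // record, don't assume it was true
        conn.setAutoCommit(false);
        try {
            for (String sql : statements) {
                conn.execute(sql);
            }
            conn.commit();
        } catch (RuntimeException e) {
            conn.rollback();  // undo partial work on any failure, not just SQL errors
            throw e;
        } finally {
            conn.setAutoCommit(previous);  // restore the caller's setting
        }
    }
}
```

With the real `java.sql.Connection`, the same shape applies, except the methods throw the checked `SQLException` and a broader catch (e.g. `Throwable` with rethrow) covers non-SQL failures as viirya suggests.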
[GitHub] [spark] SparkQA removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "defaul
SparkQA removed a comment on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668360492 **[Test build #127020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127020/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "
AmplabJenkins removed a comment on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668368990
[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.
AmplabJenkins commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668368990
[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti
SparkQA commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668368901 **[Test build #127020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127020/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668365539 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127019/ Test FAILed.
[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
cloud-fan commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464784760

File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala

```
@@ -106,4 +106,75 @@ class JDBCTableCatalogSuite extends QueryTest with SharedSparkSession {
       Seq(Row("test", "people"), Row("test", "new_table")))
     }
   }
+
+  test("alter table ... add column") {
+    withTable("h2.test.alt_table") {
+      sql("CREATE TABLE h2.test.alt_table (ID INTEGER) USING _")
+      sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C1 INTEGER, C2 STRING)")
+      var t = spark.table("h2.test.alt_table")
+      var expectedSchema = new StructType()
+        .add("ID", IntegerType)
+        .add("C1", IntegerType)
+        .add("C2", StringType)
+      assert(t.schema === expectedSchema)
+      sql("ALTER TABLE h2.test.alt_table ADD COLUMNS (C3 DOUBLE)")
+      t = spark.table("h2.test.alt_table")
+      expectedSchema = new StructType()
```

Review comment: nit: `expectedSchema = expectedSchema.add("C3", DoubleType)`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "
AmplabJenkins removed a comment on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668365757
[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
cloud-fan commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464784559

File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

```
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
+    val updateClause = ArrayBuilder.make[String]
+    for (change <- changes) {
+      change match {
+        case add: AddColumn =>
+          add.fieldNames match {
+            case Array(name) =>
+              val dataType = JdbcUtils.getJdbcType(add.dataType(), this).databaseTypeDefinition
+              updateClause += s"ALTER TABLE $tableName ADD COLUMN $name $dataType"
+          }
+        case rename: RenameColumn =>
+          rename.fieldNames match {
+            case Array(name) =>
+              updateClause += s"ALTER TABLE $tableName RENAME COLUMN $name TO ${rename.newName}"
+          }
+        case delete: DeleteColumn =>
+          delete.fieldNames match {
+            case Array(name) =>
+              updateClause += s"ALTER TABLE $tableName DROP COLUMN $name"
+          }
+        case update: UpdateColumnType =>
+          update.fieldNames match {
+            case Array(name) =>
+              val dataType = JdbcUtils.getJdbcType(update.newDataType(), this)
+                .databaseTypeDefinition
+              updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name $dataType"
+          }
+        case update: UpdateColumnNullability =>
+          update.fieldNames match {
+            case Array(name) =>
+              val nullable = if (update.nullable()) "NULL" else "NOT NULL"
+              updateClause += s"ALTER TABLE $tableName ALTER COLUMN $name SET " + nullable
+          }
+        case _ =>
+          throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable" +
```

Review comment: ditto, `this.getClass.getName` is not useful here. Just say `Unsupported TableChange: $change`
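The default dialect above maps each `TableChange` to one `ALTER TABLE` statement. To make that mapping concrete, here is a hedged, self-contained sketch of the same rendering for three of the change kinds; `SimpleChange` and its case classes are illustrative stand-ins for Spark's `TableChange` hierarchy, not the real API.

```scala
// Illustrative stand-ins for TableChange.AddColumn / RenameColumn / DeleteColumn.
sealed trait SimpleChange
case class AddCol(name: String, dataType: String) extends SimpleChange
case class RenameCol(name: String, newName: String) extends SimpleChange
case class DropCol(name: String) extends SimpleChange

// One ALTER TABLE statement per change, mirroring the SQL strings the
// default dialect builds in the hunk above.
def alterTableSql(table: String, changes: Seq[SimpleChange]): Seq[String] =
  changes.map {
    case AddCol(n, t)    => s"ALTER TABLE $table ADD COLUMN $n $t"
    case RenameCol(n, m) => s"ALTER TABLE $table RENAME COLUMN $n TO $m"
    case DropCol(n)      => s"ALTER TABLE $table DROP COLUMN $n"
  }
```

Because one logical alter produces one statement per change, the caller (`JdbcUtils.alterTable` above) needs a transaction to apply multiple changes atomically.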
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668365535 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "defaul
SparkQA removed a comment on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668356773 **[Test build #127018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127018/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
[GitHub] [spark] AmplabJenkins commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.
AmplabJenkins commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668365757
[GitHub] [spark] SparkQA commented on pull request #29343: [SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" as well when specifying "request.timeout.ms" on replacing "default.api.ti
SparkQA commented on pull request #29343: URL: https://github.com/apache/spark/pull/29343#issuecomment-668365602 **[Test build #127018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127018/testReport)** for PR 29343 at commit [`4929439`](https://github.com/apache/spark/commit/49294395f30887be847825e0c4315e9fe291f55c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [WIP][SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon edited a comment on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-668348425

Just to share the current status: the [ScaCap/action-surefire-report](https://github.com/ScaCap/action-surefire-report) plugin (and all other similar plugins) leverages the `GITHUB_TOKEN` that is set by default in GitHub Actions. It uses the GitHub API to create [status checks](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-status-checks) via [here](https://github.com/ScaCap/action-surefire-report/blob/master/action.js#L42-L43), which requires write permission to the repo. However, the permissions of `GITHUB_TOKEN` [do not cover the case when a PR is raised from a fork](https://docs.github.com/en/actions/configuring-and-managing-workflows/authenticating-with-the-github_token#permissions-for-the-github_token). There are many similar issues and questions, for example in [codecov](https://github.com/codecov/codecov-action/issues/29) or the [GitHub community](https://github.community/t/make-secrets-available-to-builds-of-forks/16166). In the case of Codecov, they managed to remove the requirement of `GITHUB_TOKEN` [here](https://github.com/codecov/codecov-action/issues/29#issuecomment-595062189): basically, they used existing GitHub Actions environment variables to do the verification on their service side. This is not feasible in our case because the plugin depends on the GitHub API to create the status checks directly.

I investigated this issue yesterday and concluded there is no clean workaround to make this work out of the box. I am currently investigating the feasibility of _potential_ alternatives; I am not yet sure whether all of them work:

- Use one environment variable, for example `TEST_REPORT_GITHUB_TOKEN`, as a GitHub secret, and guide committers to set `TEST_REPORT_GITHUB_TOKEN` as a GitHub secret in their forks so that their PRs report test results. Note that contributors would _not_ be able to report test results, as their tokens don't have write access to the repo.
- Only run the test reports on commits to the repo, and don't run them in PRs until GitHub provides a way to work around this. There are many requests for this, such as [this one](https://github.community/t/make-secrets-available-to-builds-of-forks/16166).
- Generate a token that only has permission to change the status checks, and hardcode it in the repo. In the worst case, if people abuse this token, the status checks of PRs or commits could be changed. This does not affect the code, and Jenkins runs as a safeguard, so it might be fine; I wonder what anyone would gain by abusing the status checks.

I opened an INFRA ticket with ASF and a ticket with GitHub Actions, and am discussing the options. Once I verify the feasible options, we will be able to discuss further which one to pick (or just drop the PR in the worst case).
[GitHub] [spark] gengliangwang commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
gengliangwang commented on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668365579

@imback82 I saw this error from the GitHub Actions tests:
```
ERROR: this R is version 3.4.4, package 'SparkR' requires R >= 3.5
[error] running /home/runner/work/spark/spark/R/install-dev.sh
```
Could you rebase the PR onto the latest master? I will merge it once the tests pass. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
SparkQA removed a comment on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668356800 **[Test build #127019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127019/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
cloud-fan commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464784185

File path: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala

```
@@ -184,15 +189,62 @@ abstract class JdbcDialect extends Serializable {
   /**
    * Rename an existing table.
    *
-   * TODO (SPARK-32382): Override this method in the dialects that don't support such syntax.
-   *
    * @param oldTable The existing table.
    * @param newTable New name of the table.
    * @return The SQL statement to use for renaming the table.
    */
   def renameTable(oldTable: String, newTable: String): String = {
     s"ALTER TABLE $oldTable RENAME TO $newTable"
   }
+
+  /**
+   * Alter an existing table.
+   * TODO (SPARK-32523): Override this method in the dialects that have different syntax.
+   *
+   * @param tableName The name of the table to be altered.
+   * @param changes Changes to apply to the table.
+   * @return The SQL statements to use for altering the table.
+   */
+  def alterTable(tableName: String, changes: Seq[TableChange]): Array[String] = {
+    val updateClause = ArrayBuilder.make[String]
+    for (change <- changes) {
+      change match {
+        case add: AddColumn =>
+          add.fieldNames match {
+            case Array(name) =>
```

Review comment: we should have a better error message if the field name has more than one part.
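cloud-fan's point can be sketched as a small helper: fail fast with a descriptive error when a change targets a nested field (a field name with more than one part, like `a.b`), which a plain JDBC `ALTER TABLE` statement cannot address. The helper name and the exact message below are assumptions for illustration, not Spark's actual code.

```scala
// Hypothetical helper: extract the single column name from a TableChange's
// field-name parts, or raise a clear error for a nested field like `a.b`.
def requireSingleName(fieldNames: Array[String], change: String): String =
  fieldNames match {
    case Array(name) => name
    case parts => throw new IllegalArgumentException(
      s"$change does not support nested field: ${parts.mkString(".")}")
  }
```

Each `case add: AddColumn => add.fieldNames match { ... }` block in the patch could then call this helper instead of relying on a `MatchError` from the bare `case Array(name)` pattern.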
[GitHub] [spark] SparkQA commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
SparkQA commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668365424 **[Test build #127019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127019/testReport)** for PR 29031 at commit [`4585a04`](https://github.com/apache/spark/commit/4585a04ef4548022865ea0978040d9dd56df8252).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29031: [SPARK-32216][SQL] Remove redundant ProjectExec
AmplabJenkins commented on pull request #29031: URL: https://github.com/apache/spark/pull/29031#issuecomment-668365535
[GitHub] [spark] cloud-fan commented on a change in pull request #29324: [SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
cloud-fan commented on a change in pull request #29324: URL: https://github.com/apache/spark/pull/29324#discussion_r464783902

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

```
@@ -900,10 +889,52 @@ object JdbcUtils extends Logging {
       newTable: String,
       options: JDBCOptions): Unit = {
     val dialect = JdbcDialects.get(options.url)
+    executeStatement(conn, options, dialect.renameTable(oldTable, newTable))
+  }
+
+  /**
+   * Update a table from the JDBC database.
+   */
+  def alterTable(
+      conn: Connection,
+      tableName: String,
+      changes: Seq[TableChange],
+      options: JDBCOptions): Unit = {
+    val dialect = JdbcDialects.get(options.url)
+    if (changes.length == 1) {
+      executeStatement(conn, options, dialect.alterTable(tableName, changes)(0))
+    } else {
+      val metadata = conn.getMetaData
+      if (!metadata.supportsTransactions) {
+        throw new SQLFeatureNotSupportedException(s"${this.getClass.getName}.alterTable doesn't" +
```

Review comment: `this.getClass.getName` is `JdbcUtils`, so I don't think it's useful in the error message. How about:
```
The target JDBC server does not support transactions and can only support ALTER TABLE with a single action.
```
[GitHub] [spark] Ted-Jiang commented on a change in pull request #26095: [SPARK-29435][Core]MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=
Ted-Jiang commented on a change in pull request #26095: URL: https://github.com/apache/spark/pull/26095#discussion_r464783504

File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala

```
@@ -905,15 +905,8 @@ private[spark] object MapOutputTracker extends Logging {
     for (part <- startPartition until endPartition) {
       val size = status.getSizeForBlock(part)
       if (size != 0) {
-        if (useOldFetchProtocol) {
```

Review comment: Sorry, could you explain why having Integer parse the string directly does not work for big, long-running applications? Is it a performance problem? Looking forward to your reply.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
AmplabJenkins removed a comment on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668364199
[GitHub] [spark] cloud-fan commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value
cloud-fan commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-668364306

It's a case of "one bug hides another bug", but I don't think the right choice is to leave the bug there. `UnsafeRow` is a very fundamental component, and I think it's better to fix all the bugs we know about. Aggregate is not the only place that uses `UnsafeRow`; it can even be used by external data sources. If we think the decimal sum overflow is serious enough, we should consider backporting the actual fix and evaluating the streaming backward-compatibility impact.
[GitHub] [spark] AmplabJenkins commented on pull request #29020: [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API
AmplabJenkins commented on pull request #29020: URL: https://github.com/apache/spark/pull/29020#issuecomment-668364199