[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238256611 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,42 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) -try { + protected def withTableLocation(tableNames: String)(f: File => Unit): Unit = { +val tableLoc = + new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier(tableNames))) +try --- End diff -- nit `try {`
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23194#discussion_r238256563 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -377,41 +377,42 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - test("CTAS a managed table with the existing empty directory") { -val tableLoc = new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier("tab1"))) -try { + protected def withTableLocation(tableNames: String)(f: File => Unit): Unit = { --- End diff -- How about `withEmptyDirInTablePath(dirName: String)`?
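For context, with both review suggestions applied the helper would look roughly like this (a minimal sketch; the `mkdir`/cleanup body is assumed from the original test this refactor replaces, and `Utils.deleteRecursively` is assumed to be the cleanup used elsewhere in the suite):

    protected def withEmptyDirInTablePath(dirName: String)(f: File => Unit): Unit = {
      val tableLoc =
        new File(spark.sessionState.catalog.defaultTablePath(TableIdentifier(dirName)))
      try {
        // Create the empty directory the CTAS tests rely on.
        tableLoc.mkdir()
        f(tableLoc)
      } finally {
        // Clean up so later tests see a fresh default table path.
        Utils.deleteRecursively(tableLoc)
      }
    }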
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99602/ Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Merged build finished. Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23152 **[Test build #99602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99602/testReport)** for PR 23152 at commit [`b90a3d4`](https://github.com/apache/spark/commit/b90a3d4d627f288313ce3c11b8fccab23fa0e868). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23205 Merged build finished. Test PASSed.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99600/ Test PASSed.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23205 **[Test build #99600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99600/testReport)** for PR 23205 at commit [`e5f5774`](https://github.com/apache/spark/commit/e5f5774a12e2e1b67c6089269900fd4f3cbe8ca1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user golovan commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238246140 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """ --- End diff -- > I like a simpler API design; how about splitting an identifier into the two parts (db and table names) outside `JdbcDialect`? Then, how about applying `quoteIdentifier` to each name part? I'll review tonight. If in the meantime you can point me to the best place, that would be welcome!
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test FAILed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99604/ Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99601/ Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #99604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99604/testReport)** for PR 20433 at commit [`c1912a4`](https://github.com/apache/spark/commit/c1912a4cb27fbd54c20c5f57d93e050d78fb9bdf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #99601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99601/testReport)** for PR 22512 at commit [`4cdc504`](https://github.com/apache/spark/commit/4cdc5040feb3da1e4cf9efcf434138d5873fae04). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20935: [SPARK-23819][SQL] Fix InMemoryTableScanExec complex typ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20935 Can one of the admins verify this patch?
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22318 In the example @viirya described above (https://github.com/apache/spark/pull/22318#issuecomment-426317617), I think the interpretation is unclear to most users and I'm fairly concerned that it could be error-prone...
[GitHub] spark pull request #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, v...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23186
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23186 thanks, merging to master!
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238239581 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """ --- End diff -- btw, I feel we need more general name handling here, like `UnresolvedAttribute.parseAttributeName` https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala#L151 cc: @gatorsmile
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238239195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """ --- End diff -- I like a simpler API design; how about splitting an identifier into the two parts (db and table names) outside `JdbcDialect`? Then, how about applying `quoteIdentifier` to each name part?
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user golovan commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238235137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """ --- End diff -- Yes, as we also need to support cases where we have not only a table but a schema as well. E.g., for MySQL, DB.TABLE becomes `DB`.`TABLE`
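The split-and-quote approach proposed above could be sketched as follows (a hypothetical helper, not code from the PR); for MySQL it turns DB.TABLE into `DB`.`TABLE`:

    // Quote each dot-separated part of a possibly schema-qualified name.
    def quoteTableName(dialect: JdbcDialect, name: String): String =
      name.split("\\.").map(dialect.quoteIdentifier).mkString(".")

A real implementation would need the more general name parsing maropu mentions above (`UnresolvedAttribute.parseAttributeName`), since a naive split mishandles identifiers that themselves contain dots.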
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99599/ Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Merged build finished. Test PASSed.
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23186 Merged build finished. Test PASSed.
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99598/ Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23203 **[Test build #99599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99599/testReport)** for PR 23203 at commit [`44c622b`](https://github.com/apache/spark/commit/44c622bf17ab642ef372d9a534b5bfc18c98a0da). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23186 **[Test build #99598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99598/testReport)** for PR 23186 at commit [`88d8f0a`](https://github.com/apache/spark/commit/88d8f0a0bb9b408df6c4135448035336e329f3e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22318: [SPARK-25150][SQL] Rewrite condition when deduplicate Jo...
Github user peter-toth commented on the issue: https://github.com/apache/spark/pull/22318 @maropu, I don't think we can. Actually, this is how we deal with [simpler joins](https://github.com/apache/spark/pull/22318#issuecomment-427080091). Do you think changing the behaviour is unacceptable?
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23203 I haven't looked closely at the changes yet, but I think it should be very useful. Thanks @HyukjinKwon
[GitHub] spark pull request #23094: [SPARK-26077][SQL] Reserved SQL words are not esc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23094#discussion_r238232050 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -87,11 +87,32 @@ abstract class JdbcDialect extends Serializable { def getJDBCType(dt: DataType): Option[JdbcType] = None /** - * Quotes the identifier. This is used to put quotes around the identifier in case the column - * name is a reserved keyword, or in case it contains characters that require quotes (e.g. space). + * Gets the character used for identifier quoting. + */ + def getIdentifierQuoteCharacter: String = """ --- End diff -- We need this new API instead of `quoteIdentifier`?
[GitHub] spark issue #23010: [SPARK-26012][SQL]Null and '' values should not cause dy...
Github user eatoncys commented on the issue: https://github.com/apache/spark/pull/23010 But we may forget to filter null values when we write SQL. The following function guards against this situation and writes the value of null partitions as __HIVE_DEFAULT_PARTITION__:

    def getPartitionPathString(col: String, value: String): String = {
      val partitionString = if (value == null || value.isEmpty) {
        DEFAULT_PARTITION_NAME
      } else {
        escapePathName(value)
      }
      escapePathName(col) + "=" + partitionString
    }
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 Also, can you add tests in MySQLIntegrationSuite, too?
[GitHub] spark issue #23010: [SPARK-26012][SQL]Null and '' values should not cause dy...
Github user eatoncys commented on the issue: https://github.com/apache/spark/pull/23010 @cloud-fan, thanks for the review. Do you mean we should filter out invalid partitions in SQL before writing?
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 @golovan thanks for your work. Can you make the title complete (... -> for table names)?
[GitHub] spark issue #23094: [SPARK-26077][SQL] Reserved SQL words are not escaped by...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23094 ok to test
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/23195 @HeartSaVioR thanks for the review! cc @steveloughran, who may also be interested
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/22683 @srowen Yes, I agree with you! These places should be consistent; otherwise it is easy to get confused. I will try to modify the log statements and docs. Should I modify it in this PR or a new one?
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23203 I used to run pyspark test via `python python/pyspark/sql/dataframe.py`, after setting `export PYTHONPATH="$(find "${SPARK_HOME}"/python/lib/ -name 'py4j-*-src.zip' -type f | uniq)":"${SPARK_HOME}"/python`. I'm happy to see an easier way to do it, though I'm not very familiar with these scripts. Thanks for doing it!
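With this PR, running one test module or class is meant to become a one-liner along these lines (hedged sketch: the `--testnames` flag and the module names shown here are assumptions about the PR, not confirmed from this thread):

    # Run one module, then one test class within a module.
    $ python/run-tests --testnames 'pyspark.sql.tests.test_dataframe'
    $ python/run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'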
[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...
Github user httfighter commented on a diff in the pull request: https://github.com/apache/spark/pull/22683#discussion_r238215152 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1164,17 +1164,17 @@ private[spark] object Utils extends Logging { } else { val (value, unit) = { if (size >= 2 * EB) { - (BigDecimal(size) / EB, "EB") + (BigDecimal(size) / EB, "EiB") --- End diff -- OK! I have submitted a modification.
[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23120
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22468 retest this please
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 thanks, merging to master!
[GitHub] spark issue #23010: [SPARK-26012][SQL]Null and '' values should not cause dy...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23010 The root cause is, `DynamicPartitionDataWriter` treats null and empty string as different partition values, and creates new files. However, null and empty string are converted to `__HIVE_DEFAULT_PARTITION__` at the end. I think we should deal with invalid partition values ahead, so that we don't need to worry about them during writing.
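A minimal sketch of that "deal with invalid values ahead" idea (hypothetical helper, not code from the PR; `__HIVE_DEFAULT_PARTITION__` is the sentinel mentioned in the comment above):

    // Normalize null and "" to one sentinel before rows reach the writer, so
    // DynamicPartitionDataWriter never sees two distinct "invalid" values.
    def normalizePartitionValue(value: String): String =
      if (value == null || value.isEmpty) "__HIVE_DEFAULT_PARTITION__" else value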
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238211153 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. + - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticate +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After delegation token successfully obtained Spark spreads it across nodes and renews it accordingly. --- End diff -- Yeah, it's better that way, and I changed it.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238210009 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. + - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticate +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After delegation token successfully obtained Spark spreads it across nodes and renews it accordingly. + Delegation token uses `SCRAM` login module for authentication. + + When delegation token is available for example on an executor it can be overridden with JAAS login + configuration. +- **JAAS login configuration**: JAAS login configuration must be created and transferred to all --- End diff -- Yeah, I was focusing on transferring JAAS files, which made the description more technical than it should be. Nice catch; fixed.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238209243 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. + - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticate --- End diff -- Fixed.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238209492 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. + - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticate +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After delegation token successfully obtained Spark spreads it across nodes and renews it accordingly. + Delegation token uses `SCRAM` login module for authentication. + + When delegation token is available for example on an executor it can be overridden with JAAS login --- End diff -- Yeah, removed.
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/23194 LGTM, cc @HyukjinKwon.
[GitHub] spark issue #22204: [SPARK-25196][SQL] Extends Analyze commands for cached t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22204 retest this please
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23206 Can one of the admins verify this patch?
[GitHub] spark pull request #23206: [SPARK-26249][SQL] Add ability to inject a rule i...
GitHub user skambha opened a pull request: https://github.com/apache/spark/pull/23206

[SPARK-26249][SQL] Add ability to inject a rule in order and to add a batch via the Spark Extension Points API

## What changes were proposed in this pull request?

Add two new APIs to SparkSessionExtensions:
- Inject an optimizer rule in a specific order
- Inject an optimizer batch.

Design details are here: https://drive.google.com/file/d/1m7rQZ9OZFl0MH5KS12CiIg3upLJSYfsA/view?usp=sharing

## How was this patch tested?

- New unit tests have been added to the SparkSessionExtensionSuite
- Ran the sql, hive, catalyst unit tests without any issues.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/skambha/spark ext1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23206.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23206

commit b25c31666c0136e3d98962b51a8637a21e63278a
Author: Sunitha Kambhampati
Date: 2018-11-30T11:42:59Z

    Add 2 api to the extension points and add unit tests
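For orientation, the existing extension-points API is used like the sketch below; the PR's two additions would be sibling methods on the same `SparkSessionExtensions` object (exact method names and signatures are in the linked design doc, not shown here, and `MyRule` is a placeholder for a user-defined `Rule[LogicalPlan]`):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .withExtensions { extensions =>
        // Today: inject an optimizer rule; Spark decides where it runs.
        extensions.injectOptimizerRule(session => MyRule(session))
        // This PR: additionally inject a rule at a specific position, or a whole batch.
      }
      .getOrCreate()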
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5660/ Test PASSed.
[GitHub] spark pull request #23202: [SPARK-26248][SQL] Infer date type from CSV
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23202#discussion_r238199960 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala --- @@ -98,6 +100,7 @@ class CSVInferSchema(options: CSVOptions) extends Serializable { compatibleType(typeSoFar, tryParseDecimal(field)).getOrElse(StringType) case DoubleType => tryParseDouble(field) case TimestampType => tryParseTimestamp(field) +case DateType => tryParseDate(field) --- End diff -- I see. Can we try date first above? I was wondering if there was a reason not to try date first.
[GitHub] spark issue #23155: [MINOR][K8S] add missing docs for podTemplateContainerNa...
Github user aditanase commented on the issue: https://github.com/apache/spark/pull/23155 @dongjoon-hyun is this ok to merge? Should I squash it and ping you again?
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5661/ Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20433 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20433 **[Test build #99604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99604/testReport)** for PR 20433 at commit [`c1912a4`](https://github.com/apache/spark/commit/c1912a4cb27fbd54c20c5f57d93e050d78fb9bdf).
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #99603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99603/testReport)** for PR 22198 at commit [`9a6da27`](https://github.com/apache/spark/commit/9a6da2750554cd03ce468d6bfe0aeba8b75883b8).
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22198 retest this please
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 retest this please
[GitHub] spark issue #23186: [SPARK-26230][SQL]FileIndex: if case sensitive, validate...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23186 LGTM
[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/23150#discussion_r238195206 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1618,6 +1618,13 @@ object SQLConf { "a SparkConf entry.") .booleanConf .createWithDefault(true) + + val LEGACY_TIME_PARSER_ENABLED = buildConf("spark.sql.legacy.timeParser.enabled") +.doc("When set to true, java.text.SimpleDateFormat is using for formatting and parsing " + + " dates/timestamps in a locale-sensitive manner. When set to false, classes from " + + "java.time.* packages are using for the same purpose.") --- End diff -- ditto
[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/23150#discussion_r238195153 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1618,6 +1618,13 @@ object SQLConf { "a SparkConf entry.") .booleanConf .createWithDefault(true) + + val LEGACY_TIME_PARSER_ENABLED = buildConf("spark.sql.legacy.timeParser.enabled") +.doc("When set to true, java.text.SimpleDateFormat is using for formatting and parsing " + --- End diff -- nit `using` -> `used`
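A flag like this would typically be consumed at the formatter factory, along these lines (a hedged sketch, not the PR's code; only `LEGACY_TIME_PARSER_ENABLED` comes from the diff above):

    // Pick the parsing implementation based on the legacy flag.
    def buildFormatter(pattern: String, locale: java.util.Locale): AnyRef =
      if (SQLConf.get.getConf(SQLConf.LEGACY_TIME_PARSER_ENABLED)) {
        new java.text.SimpleDateFormat(pattern, locale)               // legacy, locale-sensitive path
      } else {
        java.time.format.DateTimeFormatter.ofPattern(pattern, locale) // java.time path
      }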
[GitHub] spark pull request #23202: [SPARK-26248][SQL] Infer date type from CSV
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23202#discussion_r238194142 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala --- @@ -98,6 +100,7 @@ class CSVInferSchema(options: CSVOptions) extends Serializable { compatibleType(typeSoFar, tryParseDecimal(field)).getOrElse(StringType) case DoubleType => tryParseDouble(field) case TimestampType => tryParseTimestamp(field) +case DateType => tryParseDate(field) --- End diff -- Just in case, I did an exact match here as well; see https://github.com/apache/spark/pull/23202/files#diff-17719da188b2c15129f848f654a0e6feR174. If the date parser doesn't consume all of the input (`pos.getIndex != field.length`), it fails. If I move it up in the type-inference pipeline, it should work.
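The exact-match check described above amounts to something like this (a hedged sketch of the linked code; `options.dateFormat` and the fall-through to `tryParseTimestamp` are assumptions based on the surrounding inference code):

    import java.text.ParsePosition

    private def tryParseDate(field: String): DataType = {
      val pos = new ParsePosition(0)
      val parsed = options.dateFormat.parse(field, pos)
      // Only infer DateType if the parser consumed the entire field.
      if (parsed != null && pos.getIndex == field.length) DateType
      else tryParseTimestamp(field)
    }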
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Merged build finished. Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5659/ Test PASSed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23152 **[Test build #99602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99602/testReport)** for PR 23152 at commit [`b90a3d4`](https://github.com/apache/spark/commit/b90a3d4d627f288313ce3c11b8fccab23fa0e868).
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/23152 jenkins retest this please
[GitHub] spark pull request #23197: [SPARK-26165][Optimizer] Filter Query Date and Ti...
Github user sujith71955 closed the pull request at: https://github.com/apache/spark/pull/23197
[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/23197 Closing the PR as per Li's comment. Thanks for the suggestion, Li.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23205 Thanks @pgandhi999 :+1:
[GitHub] spark issue #23190: [SPARK-26117][FOLLOW-UP][SQL]throw SparkOutOfMemoryError...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/23190 thanks, @cloud-fan @gatorsmile @kiszk.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5658/ Test PASSed.
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user JkSelf commented on the issue: https://github.com/apache/spark/pull/23204

**Cluster info:**

| | Master Node | Worker Nodes |
| -- | -- | -- |
| Node | 1x | 4x |
| Processor | Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz |
| Memory | 192 GB | 384 GB |
| Storage Main | 8 x 960G SSD | 8 x 960G SSD |
| Network | 10Gbe | |
| Role | CM Management, NameNode, Secondary NameNode, Resource Manager, Hive Metastore Server | DataNode, NodeManager |
| OS Version | CentOS 7.2 | CentOS 7.2 |
| Hadoop | Apache Hadoop 2.7.5 | Apache Hadoop 2.7.5 |
| Hive | Apache Hive 2.2.0 | |
| Spark | Apache Spark 2.1.0 & Apache Spark 2.3.0 | |
| JDK version | 1.8.0_112 | 1.8.0_112 |

**Related parameters setting:**

| Component | Parameter | Value |
| -- | -- | -- |
| Yarn Resource Manager | yarn.scheduler.maximum-allocation-mb | 40GB |
| | yarn.scheduler.minimum-allocation-mb | 1GB |
| | yarn.scheduler.maximum-allocation-vcores | 121 |
| | yarn.resourcemanager.scheduler.class | Fair Scheduler |
| Yarn Node Manager | yarn.nodemanager.resource.memory-mb | 40GB |
| | yarn.nodemanager.resource.cpu-vcores | 121 |
| Spark | spark.executor.memory | 34GB |
| | spark.executor.cores | 40 |

In the above test environment, we found a serious performance degradation in Spark 2.3 when running TPC-DS on SKX 8180. We investigated the problem and found that the root cause is the community patch SPARK-21052, which adds metrics to the hash join process. The impacted code is [L486](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486) and [L487](https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487). Following are the results of TPC-DS Q19 on Spark 2.1, Spark 2.3 with L486-487 removed, Spark 2.3 with L486-487 present, and Spark 2.4:

| spark2.1 | spark2.3 remove L486&487 | spark2.3 add L486&487 | spark2.4 |
| -- | -- | -- | -- |
| 49s | 47s | 307s | 270s |
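For reference, the two reverted lines are per-lookup metric increments on the join's probe path, along the lines of this paraphrased sketch (illustrative only; the real code updates `SQLMetric` accumulators inside `HashedRelation`'s lookup methods):

    // Illustrative stand-in for the hot-path cost: two counter updates per probe row.
    class LookupWithMetricsSketch(map: java.util.HashMap[Long, AnyRef]) {
      private var numKeyLookups = 0L // stands in for a SQLMetric
      private var numProbes = 0L     // stands in for a SQLMetric

      def getValue(key: Long): AnyRef = {
        numKeyLookups += 1 // roughly what L486 does, once per probe row
        numProbes += 1     // roughly what L487 does
        map.get(key)       // the actual hash lookup
      }
    }

Executed once per joined row, even these cheap updates add measurable overhead to the innermost loop, which is consistent with the 47s-to-307s swing in the table above.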
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test FAILed.
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5657/ Test FAILed.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23205 **[Test build #99600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99600/testReport)** for PR 23205 at commit [`e5f5774`](https://github.com/apache/spark/commit/e5f5774a12e2e1b67c6089269900fd4f3cbe8ca1).
[GitHub] spark issue #22512: [SPARK-25498][SQL] InterpretedMutableProjection should h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #99601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99601/testReport)** for PR 22512 at commit [`4cdc504`](https://github.com/apache/spark/commit/4cdc5040feb3da1e4cf9efcf434138d5873fae04).
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238177033 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -148,12 +156,21 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { }) // When we are regenerating the golden files we don't need to run all the configs as they // all need to return the same result - if (regenerateGoldenFiles && configs.nonEmpty) { + if (regenerateGoldenFiles) { configs.take(1) --- End diff -- For better readability, fixed in https://github.com/apache/spark/pull/22512/commits/4cdc5040feb3da1e4cf9efcf434138d5873fae04
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23205 Can one of the admins verify this patch?
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/23088 PR: https://github.com/apache/spark/pull/23205 for displaying a message when the task summary metrics table has no data.
[GitHub] spark issue #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table on Sta...
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/23205 cc @tgravescs @abellina
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238176184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala --- @@ -64,7 +85,7 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable i = 0 while (i < validExprs.length) { val (_, ordinal) = validExprs(i) - mutableRow(ordinal) = buffer(ordinal) + fieldWriters(i)(buffer(ordinal)) --- End diff -- fixed in https://github.com/apache/spark/pull/22512/commits/95411c8b8b76503dff93756482642083a694b0b7
[GitHub] spark pull request #23205: [SPARK-26253][WebUI] : Task Summary Metrics Table...
GitHub user pgandhi999 opened a pull request: https://github.com/apache/spark/pull/23205

[SPARK-26253][WebUI] : Task Summary Metrics Table on Stage Page shows empty table when no data is present

The Task Summary Metrics Table on the Stage Page shows an empty table when no data is present, instead of showing a message.

## What changes were proposed in this pull request?

Added a custom message to show on the task summary metrics table, as well as the executor summary table, when no data is present.

## How was this patch tested?

**Before:**

![49335550-29277d00-f615-11e8-8e62-a953e76bcebf](https://user-images.githubusercontent.com/8190/49361520-425a2780-f702-11e8-8df4-08862ab6ceb8.png)

**After:**

![](https://user-images.githubusercontent.com/8190/49362019-8699f780-f703-11e8-93e1-d02df6572923.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pgandhi999/spark SPARK-26253

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23205.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23205

commit e5f5774a12e2e1b67c6089269900fd4f3cbe8ca1
Author: pgandhi
Date: 2018-12-03T08:11:59Z

    [SPARK-26253] : Task Summary Metrics Table on Stage Page shows empty table when no data is present

    Added a custom message to show on the task summary metrics table as well as executor summary table when no data is present.
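To make the intent of the change concrete: the idea is that a table with no rows should render a single explanatory message row rather than an empty body. Below is a generic Scala sketch of that idea using XML literals (assuming the scala-xml module is on the classpath); it is not the stage page's actual rendering code, and every name in it is invented.

```scala
import scala.xml.Node

// Generic sketch: render a placeholder row when there is no data,
// so the user sees a message instead of a blank table.
object EmptyTableSketch {
  def summaryTable(rows: Seq[Seq[String]], emptyMessage: String): Node =
    <table>
      <tbody>
        {
          if (rows.isEmpty)
            <tr><td colspan="3">{emptyMessage}</td></tr>
          else
            rows.map(r => <tr>{r.map(c => <td>{c}</td>)}</tr>)
        }
      </tbody>
    </table>

  def main(args: Array[String]): Unit = {
    println(summaryTable(Nil, "No data to show yet"))
    println(summaryTable(Seq(Seq("min", "0 ms", "1 ms")), "No data to show yet"))
  }
}
```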
[GitHub] spark pull request #22512: [SPARK-25498][SQL] InterpretedMutableProjection s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22512#discussion_r238175630

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```diff
@@ -148,12 +156,21 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
       })
       // When we are regenerating the golden files we don't need to run all the configs as they
       // all need to return the same result
-      if (regenerateGoldenFiles && configs.nonEmpty) {
+      if (regenerateGoldenFiles) {
         configs.take(1)
```
--- End diff --

Actually, it returns an empty array?

```
scala> Array.empty.take(1)
res0: Array[Nothing] = Array()

scala> Seq.empty.take(1)
res1: Seq[Nothing] = List()
```
[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23204

Can one of the admins verify this patch?
[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...
GitHub user JkSelf opened a pull request: https://github.com/apache/spark/pull/23204

Revert "[SPARK-21052][SQL] Add hash map metrics to join"

Because of the performance degradation discussed in [SPARK-26155](https://issues.apache.org/jira/browse/SPARK-26155), this reverts [SPARK-21052](https://issues.apache.org/jira/browse/SPARK-21052).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JkSelf/spark revert-21052

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23204.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23204

commit 7d5008d11e37086f4a8206276791b9424bcf60b7
Author: jiake
Date: 2018-12-03T08:18:44Z

    Revert "[SPARK-21052][SQL] Add hash map metrics to join"

    This reverts commit 18066f2e61f430b691ed8a777c9b4e5786bf9dbc.

    Conflicts:
        sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
        sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala
        sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
        sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
        sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
        sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5656/

Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203

Merged build finished. Test PASSed.
[GitHub] spark pull request #23190: [SPARK-26117][FOLLOW-UP][SQL]throw SparkOutOfMemo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23190
[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23203#discussion_r238173745

--- Diff: python/run-tests-with-coverage ---
```diff
@@ -50,8 +50,6 @@ export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
 # This environment variable enables the coverage.
 export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
 
-# If you'd like to run a specific unittest class, you could do such as
-# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
 ./run-tests "$@"
```
--- End diff --

BTW, it works with the coverage script as well; manually tested.
[GitHub] spark issue #23190: [SPARK-26117][FOLLOW-UP][SQL]throw SparkOutOfMemoryError...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23190

thanks, merging to master!