[GitHub] [spark-website] Ngone51 commented on a change in pull request #356: Improve the guideline of Preparing gpg key
Ngone51 commented on a change in pull request #356: URL: https://github.com/apache/spark-website/pull/356#discussion_r691834308 ## File path: release-process.md ## @@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the process from the follow You can skip this section if you have already uploaded your key. -After generating the gpg key, you need to upload your key to a public key server. Please refer to -https://www.apache.org/dev/openpgp.html#generate-key";>https://www.apache.org/dev/openpgp.html#generate-key -for details. +Generate Key -If you want to do the release on another machine, you can transfer your secret key to that machine -via the `gpg --export-secret-keys` and `gpg --import` commands. +Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to https://www.apache.org/dev/openpgp.html#generate-key";>generate-key for details. + +``` +:::console +$ gpg --full-gen-key +gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc. +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. + +Please select what kind of key you want: + (1) RSA and RSA (default) + (2) DSA and Elgamal + (3) DSA (sign only) + (4) RSA (sign only) +Your selection? 1 +RSA keys may be between 1024 and 4096 bits long. +What keysize do you want? (2048) 4096 +Requested keysize is 4096 bits +Please specify how long the key should be valid. + 0 = key does not expire += key expires in n days + w = key expires in n weeks + m = key expires in n months + y = key expires in n years +Key is valid for? (0) +Key does not expire at all +Is this correct? (y/N) y + +GnuPG needs to construct a user ID to identify your key. + +Real name: Robert Burrell Donkin +Email address: rdon...@apache.org +Comment: CODE SIGNING KEY +You selected this USER-ID: +"Robert Burrell Donkin (CODE SIGNING KEY) " + +Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? 
O +You need a Passphrase to protect your secret key. +``` + +Upload Key Review comment: These are necessary steps that are abstracted from https://infra.apache.org/openpgp.html, which includes massive and too much irrelevant information. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (013f2b7 -> 1235bd2)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 013f2b7  [SPARK-36512][UI][TESTS] Fix UISeleniumSuite in sql/hive-thriftserver
add 1235bd2  [SPARK-36536][SQL] Use CAST for datetime in CSV/JSON by default

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/csv/CSVInferSchema.scala    |  2 +-
 .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 12 +--
 .../sql/catalyst/csv/UnivocityGenerator.scala      |  6 ++--
 .../spark/sql/catalyst/csv/UnivocityParser.scala   |  6 ++--
 .../spark/sql/catalyst/json/JSONOptions.scala      | 12 +--
 .../spark/sql/catalyst/json/JacksonGenerator.scala |  6 ++--
 .../spark/sql/catalyst/json/JacksonParser.scala    |  6 ++--
 .../spark/sql/catalyst/json/JsonInferSchema.scala  |  2 +-
 .../spark/sql/catalyst/util/DateFormatter.scala    |  8 +
 .../sql/catalyst/util/TimestampFormatter.scala     | 24 ++
 .../sql/catalyst/csv/UnivocityParserSuite.scala    |  4 +--
 .../sql/execution/datasources/csv/CSVSuite.scala   | 37 +-
 .../sql/execution/datasources/json/JsonSuite.scala | 37 --
 13 files changed, 139 insertions(+), 23 deletions(-)
[GitHub] [spark-website] gengliangwang commented on a change in pull request #356: Improve the guideline of Preparing gpg key
gengliangwang commented on a change in pull request #356: URL: https://github.com/apache/spark-website/pull/356#discussion_r691803255 ## File path: release-process.md ## @@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the process from the follow You can skip this section if you have already uploaded your key. -After generating the gpg key, you need to upload your key to a public key server. Please refer to -https://www.apache.org/dev/openpgp.html#generate-key";>https://www.apache.org/dev/openpgp.html#generate-key -for details. +Generate Key -If you want to do the release on another machine, you can transfer your secret key to that machine -via the `gpg --export-secret-keys` and `gpg --import` commands. +Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to https://www.apache.org/dev/openpgp.html#generate-key";>generate-key for details. + +``` +:::console +$ gpg --full-gen-key +gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc. +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. + +Please select what kind of key you want: + (1) RSA and RSA (default) + (2) DSA and Elgamal + (3) DSA (sign only) + (4) RSA (sign only) +Your selection? 1 +RSA keys may be between 1024 and 4096 bits long. +What keysize do you want? (2048) 4096 +Requested keysize is 4096 bits +Please specify how long the key should be valid. + 0 = key does not expire += key expires in n days + w = key expires in n weeks + m = key expires in n months + y = key expires in n years +Key is valid for? (0) +Key does not expire at all +Is this correct? (y/N) y + +GnuPG needs to construct a user ID to identify your key. + +Real name: Robert Burrell Donkin +Email address: rdon...@apache.org +Comment: CODE SIGNING KEY +You selected this USER-ID: +"Robert Burrell Donkin (CODE SIGNING KEY) " + +Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? 
O +You need a Passphrase to protect your secret key. +``` + +Upload Key Review comment: This new section seems duplicated with https://infra.apache.org/openpgp.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (559fe96 -> 013f2b7)
gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 559fe96  [SPARK-35991][SQL] Add PlanStability suite for TPCH
add 013f2b7  [SPARK-36512][UI][TESTS] Fix UISeleniumSuite in sql/hive-thriftserver

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/UISeleniumSuite.scala | 31 ++
 1 file changed, 26 insertions(+), 5 deletions(-)
[GitHub] [spark-website] cloud-fan commented on a change in pull request #356: Improve the guideline of Preparing gpg key
cloud-fan commented on a change in pull request #356: URL: https://github.com/apache/spark-website/pull/356#discussion_r691793523 ## File path: release-process.md ## @@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the process from the follow You can skip this section if you have already uploaded your key. -After generating the gpg key, you need to upload your key to a public key server. Please refer to -https://www.apache.org/dev/openpgp.html#generate-key";>https://www.apache.org/dev/openpgp.html#generate-key -for details. +Generate Key -If you want to do the release on another machine, you can transfer your secret key to that machine -via the `gpg --export-secret-keys` and `gpg --import` commands. +Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to https://www.apache.org/dev/openpgp.html#generate-key";>generate-key for details. + +``` +:::console +$ gpg --full-gen-key +gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc. +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. + +Please select what kind of key you want: + (1) RSA and RSA (default) + (2) DSA and Elgamal + (3) DSA (sign only) + (4) RSA (sign only) +Your selection? 1 +RSA keys may be between 1024 and 4096 bits long. +What keysize do you want? (2048) 4096 +Requested keysize is 4096 bits +Please specify how long the key should be valid. + 0 = key does not expire += key expires in n days + w = key expires in n weeks + m = key expires in n months + y = key expires in n years +Key is valid for? (0) +Key does not expire at all +Is this correct? (y/N) y + +GnuPG needs to construct a user ID to identify your key. + +Real name: Robert Burrell Donkin +Email address: rdon...@apache.org +Comment: CODE SIGNING KEY +You selected this USER-ID: +"Robert Burrell Donkin (CODE SIGNING KEY) " + +Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? 
O +You need a Passphrase to protect your secret key. +``` + +Upload Key -The last step is to update the KEYS file with your code signing key -https://www.apache.org/dev/openpgp.html#export-public-key";>https://www.apache.org/dev/openpgp.html#export-public-key +After generating the key, we should upload the public key to a https://infra.apache.org/release-signing.html#keyserver";>public key server. +Upload the public key either by: + +(Recommended) +First, export all public keys to ASCII-armored public key by +``` +:::console +$ gpg --export --armor +``` +or export the specific public key if you know the https://infra.apache.org/release-signing.html#key-id";>key ID, e.g., +``` +:::console +$ gpg --export --armor AD741727 +``` +(Please refer to https://infra.apache.org/openpgp.html#export-public-key";>export-public-key for details.) + +Second, copy-paste your ASCII-armored public key to http://keyserver.ubuntu.com:11371/#submitKey";>OpenPGP Keyserver and submit. + +or + +Use gpg command to upload, e.g., + +``` +$ gpg --send-key B13131DE2 Review comment: Can we use the same id in the previous example `AD741727`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] cloud-fan commented on a change in pull request #356: Improve the guideline of Preparing gpg key
cloud-fan commented on a change in pull request #356: URL: https://github.com/apache/spark-website/pull/356#discussion_r691793389 ## File path: release-process.md ## @@ -39,15 +39,82 @@ If you are a new Release Manager, you can read up on the process from the follow You can skip this section if you have already uploaded your key. -After generating the gpg key, you need to upload your key to a public key server. Please refer to -https://www.apache.org/dev/openpgp.html#generate-key";>https://www.apache.org/dev/openpgp.html#generate-key -for details. +Generate Key -If you want to do the release on another machine, you can transfer your secret key to that machine -via the `gpg --export-secret-keys` and `gpg --import` commands. +Here's an example of gpg 2.0.12. If you use gpg version 1 series, please refer to https://www.apache.org/dev/openpgp.html#generate-key";>generate-key for details. + +``` +:::console +$ gpg --full-gen-key +gpg (GnuPG) 2.0.12; Copyright (C) 2009 Free Software Foundation, Inc. +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. + +Please select what kind of key you want: + (1) RSA and RSA (default) + (2) DSA and Elgamal + (3) DSA (sign only) + (4) RSA (sign only) +Your selection? 1 +RSA keys may be between 1024 and 4096 bits long. +What keysize do you want? (2048) 4096 +Requested keysize is 4096 bits +Please specify how long the key should be valid. + 0 = key does not expire += key expires in n days + w = key expires in n weeks + m = key expires in n months + y = key expires in n years +Key is valid for? (0) +Key does not expire at all +Is this correct? (y/N) y + +GnuPG needs to construct a user ID to identify your key. + +Real name: Robert Burrell Donkin +Email address: rdon...@apache.org +Comment: CODE SIGNING KEY +You selected this USER-ID: +"Robert Burrell Donkin (CODE SIGNING KEY) " + +Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? 
O +You need a Passphrase to protect your secret key. +``` + +Upload Key -The last step is to update the KEYS file with your code signing key -https://www.apache.org/dev/openpgp.html#export-public-key";>https://www.apache.org/dev/openpgp.html#export-public-key +After generating the key, we should upload the public key to a https://infra.apache.org/release-signing.html#keyserver";>public key server. +Upload the public key either by: + +(Recommended) +First, export all public keys to ASCII-armored public key by +``` +:::console +$ gpg --export --armor +``` +or export the specific public key if you know the https://infra.apache.org/release-signing.html#key-id";>key ID, e.g., +``` +:::console +$ gpg --export --armor AD741727 Review comment: is there a way to find the id of the newly created CODE SIGNING KEY? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
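On the reviewer's question of how to find the ID of a newly created key: with GnuPG 2.x, `gpg --list-secret-keys --keyid-format SHORT` prints a `pub`/`sec` line where the short key ID follows the slash. A small sketch parsing a hypothetical sample line (the exact output format varies by GnuPG version):

```shell
# Hedged sketch: extract the short key ID from a sample line as printed
# by `gpg --list-secret-keys --keyid-format SHORT` (hypothetical key).
sample_pub_line='pub   rsa4096/AD741727 2021-08-19 [SC]'
key_id=${sample_pub_line#*/}   # drop everything through the slash
key_id=${key_id%% *}           # keep the first token: the key ID
echo "$key_id"                 # AD741727
```

The same ID is what `gpg --export --armor` and `gpg --send-keys` expect.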
[GitHub] [spark-website] Ngone51 commented on pull request #356: Improve the guideline of Preparing gpg key
Ngone51 commented on pull request #356: URL: https://github.com/apache/spark-website/pull/356#issuecomment-901591270 cc @HyukjinKwon @cloud-fan @gengliangwang
[GitHub] [spark-website] Ngone51 opened a new pull request #356: Improve the guideline of Preparing gpg key
Ngone51 opened a new pull request #356: URL: https://github.com/apache/spark-website/pull/356

This PR proposes to improve the guideline of the `Preparing gpg key` section in the release process. This is what it looks like before and after:

### Before
https://user-images.githubusercontent.com/16397174/130005588-5e1f6b54-b996-410c-bd81-0e1885e742a2.png

### After
https://user-images.githubusercontent.com/16397174/130005226-760a6901-c603-4532-b5e1-ee2f5e1e47b8.png
https://user-images.githubusercontent.com/16397174/130005256-7a616564-7484-4dc3-9bc0-ce67948e88cf.png
[spark] branch master updated (07d173a -> c458edb)
wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 07d173a  [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES
add c458edb  [SPARK-36371][SQL] Support raw string literal

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-literals.md                           | 13 +++-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  2 +
 .../spark/sql/catalyst/parser/ParserUtils.scala    | 72 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  9 +++
 4 files changed, 61 insertions(+), 35 deletions(-)
[GitHub] [spark-website] yutoacts closed pull request #350: [SPARK-36335] Add local-cluster docs to developer-tools.md
yutoacts closed pull request #350: URL: https://github.com/apache/spark-website/pull/350
[GitHub] [spark-website] yutoacts commented on pull request #350: [SPARK-36335] Add local-cluster docs to developer-tools.md
yutoacts commented on pull request #350: URL: https://github.com/apache/spark-website/pull/350#issuecomment-901576855 It ended up as https://github.com/apache/spark/pull/33537.
[spark] branch branch-3.2 updated: [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 8f3b4c4 [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES 8f3b4c4 is described below commit 8f3b4c4b7d717c5cfc922ce160a1da42303d5304 Author: Wenchen Fan AuthorDate: Thu Aug 19 11:04:05 2021 +0800 [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/30648 ANALYZE TABLE and TABLES are essentially the same command, it's weird to put them in 2 different doc pages. This PR proposes to merge them into one doc page. ### Why are the changes needed? simplify the doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #33781 from cloud-fan/doc. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan (cherry picked from commit 07d173a8b0a19a2912905387bcda10e9db3c43c6) Signed-off-by: Wenchen Fan --- docs/sql-ref-syntax-aux-analyze-table.md | 85 +++ docs/sql-ref-syntax-aux-analyze-tables.md | 110 -- docs/sql-ref-syntax-aux-analyze.md| 23 --- docs/sql-ref-syntax.md| 1 - 4 files changed, 70 insertions(+), 149 deletions(-) diff --git a/docs/sql-ref-syntax-aux-analyze-table.md b/docs/sql-ref-syntax-aux-analyze-table.md index da53385..0e65de1 100644 --- a/docs/sql-ref-syntax-aux-analyze-table.md +++ b/docs/sql-ref-syntax-aux-analyze-table.md @@ -21,7 +21,8 @@ license: | ### Description -The `ANALYZE TABLE` statement collects statistics about the table to be used by the query optimizer to find a better query execution plan. 
+The `ANALYZE TABLE` statement collects statistics about one specific table or all the tables in one specified database, +that are to be used by the query optimizer to find a better query execution plan. ### Syntax @@ -30,6 +31,10 @@ ANALYZE TABLE table_identifier [ partition_spec ] COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col [ , ... ] | FOR ALL COLUMNS ] ``` +```sql +ANALYZE TABLES [ { FROM | IN } database_name ] COMPUTE STATISTICS [ NOSCAN ] +``` + ### Parameters * **table_identifier** @@ -45,22 +50,31 @@ ANALYZE TABLE table_identifier [ partition_spec ] **Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )` -* **[ NOSCAN `|` FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS ]** +* **{ FROM `|` IN } database_name** + + Specifies the name of the database to be analyzed. Without a database name, `ANALYZE` collects all tables in the current database that the current user has permission to analyze. + +* **NOSCAN** + + Collects only the table's size in bytes (which does not require scanning the entire table). - * If no analyze option is specified, `ANALYZE TABLE` collects the table's number of rows and size in bytes. - * **NOSCAN** +* **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS** - Collects only the table's size in bytes (which does not require scanning the entire table). - * **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS** + Collects column statistics for each column specified, or alternatively for every column, as well as table statistics. - Collects column statistics for each column specified, or alternatively for every column, as well as table statistics. +If no analyze option is specified, both number of rows and size in bytes are collected. 
### Examples ```sql +CREATE DATABASE school_db; +USE school_db; + +CREATE TABLE teachers (name STRING, teacher_id INT); +INSERT INTO teachers VALUES ('Tom', 1), ('Jerry', 2); + CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id); -INSERT INTO students PARTITION (student_id = 11) VALUES ('Mark'); -INSERT INTO students PARTITION (student_id = 22) VALUES ('John'); +INSERT INTO students VALUES ('Mark', 11), ('John', 22); ANALYZE TABLE students COMPUTE STATISTICS NOSCAN; @@ -73,7 +87,6 @@ DESC EXTENDED students; | ...| ...|...| | Statistics| 864 bytes| | | ...| ...|...| -| Partition Provider| Catalog| | +++---+ ANALYZE TABLE students COMPUTE STATISTICS; @@ -87,7 +100,6 @@ DESC EXTENDED students; | ...| ...|...| | Statistics| 864 bytes, 2 rows| | | ...| ...|...| -| Partition Provider| Catalog| | +++---
[spark] branch master updated: [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 07d173a [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES 07d173a is described below commit 07d173a8b0a19a2912905387bcda10e9db3c43c6 Author: Wenchen Fan AuthorDate: Thu Aug 19 11:04:05 2021 +0800 [SPARK-33687][SQL][DOC][FOLLOWUP] Merge the doc pages of ANALYZE TABLE and ANALYZE TABLES ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/30648 ANALYZE TABLE and TABLES are essentially the same command, it's weird to put them in 2 different doc pages. This PR proposes to merge them into one doc page. ### Why are the changes needed? simplify the doc ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #33781 from cloud-fan/doc. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- docs/sql-ref-syntax-aux-analyze-table.md | 85 +++ docs/sql-ref-syntax-aux-analyze-tables.md | 110 -- docs/sql-ref-syntax-aux-analyze.md| 23 --- docs/sql-ref-syntax.md| 1 - 4 files changed, 70 insertions(+), 149 deletions(-) diff --git a/docs/sql-ref-syntax-aux-analyze-table.md b/docs/sql-ref-syntax-aux-analyze-table.md index da53385..0e65de1 100644 --- a/docs/sql-ref-syntax-aux-analyze-table.md +++ b/docs/sql-ref-syntax-aux-analyze-table.md @@ -21,7 +21,8 @@ license: | ### Description -The `ANALYZE TABLE` statement collects statistics about the table to be used by the query optimizer to find a better query execution plan. +The `ANALYZE TABLE` statement collects statistics about one specific table or all the tables in one specified database, +that are to be used by the query optimizer to find a better query execution plan. 
### Syntax @@ -30,6 +31,10 @@ ANALYZE TABLE table_identifier [ partition_spec ] COMPUTE STATISTICS [ NOSCAN | FOR COLUMNS col [ , ... ] | FOR ALL COLUMNS ] ``` +```sql +ANALYZE TABLES [ { FROM | IN } database_name ] COMPUTE STATISTICS [ NOSCAN ] +``` + ### Parameters * **table_identifier** @@ -45,22 +50,31 @@ ANALYZE TABLE table_identifier [ partition_spec ] **Syntax:** `PARTITION ( partition_col_name [ = partition_col_val ] [ , ... ] )` -* **[ NOSCAN `|` FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS ]** +* **{ FROM `|` IN } database_name** + + Specifies the name of the database to be analyzed. Without a database name, `ANALYZE` collects all tables in the current database that the current user has permission to analyze. + +* **NOSCAN** + + Collects only the table's size in bytes (which does not require scanning the entire table). - * If no analyze option is specified, `ANALYZE TABLE` collects the table's number of rows and size in bytes. - * **NOSCAN** +* **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS** - Collects only the table's size in bytes (which does not require scanning the entire table). - * **FOR COLUMNS col [ , ... ] `|` FOR ALL COLUMNS** + Collects column statistics for each column specified, or alternatively for every column, as well as table statistics. - Collects column statistics for each column specified, or alternatively for every column, as well as table statistics. +If no analyze option is specified, both number of rows and size in bytes are collected. 
### Examples ```sql +CREATE DATABASE school_db; +USE school_db; + +CREATE TABLE teachers (name STRING, teacher_id INT); +INSERT INTO teachers VALUES ('Tom', 1), ('Jerry', 2); + CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id); -INSERT INTO students PARTITION (student_id = 11) VALUES ('Mark'); -INSERT INTO students PARTITION (student_id = 22) VALUES ('John'); +INSERT INTO students VALUES ('Mark', 11), ('John', 22); ANALYZE TABLE students COMPUTE STATISTICS NOSCAN; @@ -73,7 +87,6 @@ DESC EXTENDED students; | ...| ...|...| | Statistics| 864 bytes| | | ...| ...|...| -| Partition Provider| Catalog| | +++---+ ANALYZE TABLE students COMPUTE STATISTICS; @@ -87,7 +100,6 @@ DESC EXTENDED students; | ...| ...|...| | Statistics| 864 bytes, 2 rows| | | ...| ...|...| -| Partition Provider| Catalog| | +++---+ ANALYZE TABLE students PARTITION (student_id = 11) COMPUTE STATISTICS; @@ -101,7 +113,6 @@ DESC EXTENDED
[spark] branch master updated: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2fc9c0b [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker 2fc9c0b is described below commit 2fc9c0bfb5d43a6ee1dbbf941b84e4c3dd74d8ef Author: tooptoop4 <33283496+toopto...@users.noreply.github.com> AuthorDate: Thu Aug 19 10:31:10 2021 +0900 [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker ### What changes were proposed in this pull request? This log should at least be WARN not INFO (in org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala ) "Expected $numSubmittedFiles files, but only saw $numFiles." ### Why are the changes needed? INFO logs don't indicate possible issue but WARN logs should ### Does this PR introduce any user-facing change? Yes, Log level changed. ### How was this patch tested? manual, trivial change Closes #2 from tooptoop4/warn. Authored-by: tooptoop4 <33283496+toopto...@users.noreply.github.com> Signed-off-by: Hyukjin Kwon --- .../apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala index f3815ab..a8fae66 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala @@ -169,7 +169,7 @@ class BasicWriteTaskStatsTracker( } if (numSubmittedFiles != numFiles) { - logInfo(s"Expected $numSubmittedFiles files, but only saw $numFiles. 
" + + logWarning(s"Expected $numSubmittedFiles files, but only saw $numFiles. " + "This could be due to the output format not writing empty files, " + "or files being not immediately visible in the filesystem.") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] HyukjinKwon commented on pull request #355: Use ASF mail archives not defunct nabble links
HyukjinKwon commented on pull request #355: URL: https://github.com/apache/spark-website/pull/355#issuecomment-901519830 lgtm2
[spark] branch branch-3.0 updated (ae07b63 -> 8862657)
dongjoon pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git.

from ae07b63  [HOTFIX] Add missing deps update for commit protocol change
add 8862657  [SPARK-34949][CORE][3.0] Prevent BlockManager reregister when Executor is shutting down

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/executor/Executor.scala |  2 +-
 .../org/apache/spark/executor/ExecutorSuite.scala  | 68 +-
 2 files changed, 53 insertions(+), 17 deletions(-)
[spark] branch branch-3.1 updated (31d771d -> 79ea014)
dongjoon pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git.

from 31d771d  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
add 79ea014  [SPARK-35011][CORE][3.1] Avoid Block Manager registrations when StopExecutor msg is in-flight

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/HeartbeatReceiver.scala |   4 +-
 .../spark/storage/BlockManagerMasterEndpoint.scala | 118 +++--
 .../main/scala/org/apache/spark/util/Utils.scala   |   7 ++
 .../apache/spark/storage/BlockManagerSuite.scala   |  20 +++-
 4 files changed, 109 insertions(+), 40 deletions(-)
[spark] branch master updated: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f2e593b [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3 f2e593b is described below

commit f2e593bcf1a1aa8dde9f73b77e4863ceed5a7e28
Author: itholic
AuthorDate: Wed Aug 18 11:38:59 2021 -0700

    [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

    ### What changes were proposed in this pull request?
    This PR proposes to fix the behavior of `astype` for `CategoricalDtype` to follow pandas 1.3.

    **Before:**
    ```python
    >>> pcat
    0    a
    1    b
    2    c
    dtype: category
    Categories (3, object): ['a', 'b', 'c']

    >>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
    0    a
    1    b
    2    c
    dtype: category
    Categories (3, object): ['b', 'c', 'a']
    ```

    **After:**
    ```python
    >>> pcat
    0    a
    1    b
    2    c
    dtype: category
    Categories (3, object): ['a', 'b', 'c']

    >>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
    0    a
    1    b
    2    c
    dtype: category
    Categories (3, object): ['a', 'b', 'c']  # CategoricalDtype is not updated if dtype is the same
    ```

    `CategoricalDtype` is treated as a same `dtype` if the unique values are the same.

    ```python
    >>> pcat1 = pser.astype(CategoricalDtype(["b", "c", "a"]))
    >>> pcat2 = pser.astype(CategoricalDtype(["a", "b", "c"]))
    >>> pcat1.dtype == pcat2.dtype
    True
    ```

    ### Why are the changes needed?
    We should follow the latest pandas as much as possible.

    ### Does this PR introduce _any_ user-facing change?
    Yes, the behavior is changed as example in the PR description.

    ### How was this patch tested?
    Unittest

    Closes #33757 from itholic/SPARK-36368.
Authored-by: itholic
Signed-off-by: Takuya UESHIN
---
 python/pyspark/pandas/categorical.py                   |  3 ++-
 python/pyspark/pandas/data_type_ops/categorical_ops.py |  4 +++-
 .../pandas/tests/data_type_ops/test_categorical_ops.py |  6 ++
 python/pyspark/pandas/tests/indexes/test_category.py   | 16 +++-
 python/pyspark/pandas/tests/test_categorical.py        | 16 +++-
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/python/pyspark/pandas/categorical.py b/python/pyspark/pandas/categorical.py
index 77a3cee..fa11228 100644
--- a/python/pyspark/pandas/categorical.py
+++ b/python/pyspark/pandas/categorical.py
@@ -22,6 +22,7 @@
 from pandas.api.types import CategoricalDtype, is_dict_like, is_list_like
 from pyspark.pandas.internal import InternalField
 from pyspark.pandas.spark import functions as SF
+from pyspark.pandas.data_type_ops.categorical_ops import _to_cat
 from pyspark.sql import functions as F
 from pyspark.sql.types import StructField
@@ -735,7 +736,7 @@ class CategoricalAccessor(object):
             return self._data.copy()
         else:
             dtype = CategoricalDtype(categories=new_categories, ordered=ordered)
-            psser = self._data.astype(dtype)
+            psser = _to_cat(self._data).astype(dtype)
             if inplace:
                 internal = self._data._psdf._internal.with_new_spark_column(

diff --git a/python/pyspark/pandas/data_type_ops/categorical_ops.py b/python/pyspark/pandas/data_type_ops/categorical_ops.py
index b524cdd..c1be683 100644
--- a/python/pyspark/pandas/data_type_ops/categorical_ops.py
+++ b/python/pyspark/pandas/data_type_ops/categorical_ops.py
@@ -57,7 +57,9 @@ class CategoricalOps(DataTypeOps):
     def astype(self, index_ops: IndexOpsLike, dtype: Union[str, type, Dtype]) -> IndexOpsLike:
         dtype, _ = pandas_on_spark_type(dtype)
-        if isinstance(dtype, CategoricalDtype) and cast(CategoricalDtype, dtype).categories is None:
+        if isinstance(dtype, CategoricalDtype) and (
+            (dtype.categories is None) or (index_ops.dtype == dtype)
+        ):
             return index_ops.copy()
         return _to_cat(index_ops).astype(dtype)

diff --git a/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py b/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
index 11871ea..5e79eb3 100644
--- a/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
+++ b/python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py
@@ -192,13 +192,11 @@ class CategoricalOpsTest(PandasOnSparkTestCase, TestCasesUtils):
         self.assert_eq(pser.astype("category"), psser.astype("category"))
         cat_type = CategoricalDtype(categories=[3, 1, 2])
+        # CategoricalDtype is not updated if the dtype is same from pandas 1.3.
         if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-            # TODO(SPARK-36367): Fix the b
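The patched `astype` above short-circuits when the target `CategoricalDtype` compares equal to the current one, and the PR description shows that two unordered categorical dtypes with the same category values (in any order) compare equal. As a rough stand-alone model of that check (plain Python, not the pyspark.pandas implementation; `ToyDtype` and its helpers are hypothetical names):

```python
class ToyDtype:
    """Toy model of a categorical dtype: unordered equality ignores category order."""
    def __init__(self, categories=None, ordered=False):
        self.categories = None if categories is None else list(categories)
        self.ordered = ordered

    def __eq__(self, other):
        if self.categories is None or other.categories is None:
            return self.categories == other.categories and self.ordered == other.ordered
        if self.ordered or other.ordered:
            # Ordered dtypes must also agree on category order.
            return self.ordered == other.ordered and self.categories == other.categories
        return set(self.categories) == set(other.categories)

def astype(current, target):
    """Mirror of the patched check: same dtype (or no categories) -> no-op copy."""
    if target.categories is None or current == target:
        return "copy"          # index_ops.copy() in the real code
    return "recategorize"      # _to_cat(index_ops).astype(dtype) in the real code

print(astype(ToyDtype(["a", "b", "c"]), ToyDtype(["b", "c", "a"])))  # copy
print(astype(ToyDtype(["a", "b", "c"]), ToyDtype(["a", "b"])))       # recategorize
```

This is only a sketch of the equality rule the fix relies on; the real comparison is pandas' `CategoricalDtype.__eq__`.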
[spark] branch master updated: [SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c91ae54 [SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3 c91ae54 is described below

commit c91ae544fdd44c67fe1e4c73825570dbe71a3206
Author: itholic
AuthorDate: Wed Aug 18 11:17:01 2021 -0700

    [SPARK-36388][SPARK-36386][PYTHON][FOLLOWUP] Fix DataFrame groupby-rolling and groupby-expanding to follow pandas 1.3

    ### What changes were proposed in this pull request?
    This PR is followup for https://github.com/apache/spark/pull/33646 to add missing tests.

    ### Why are the changes needed?
    Some tests are missing

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Unittest

    Closes #33776 from itholic/SPARK-36388-followup.

    Authored-by: itholic
    Signed-off-by: Takuya UESHIN
---
 .../pandas/tests/test_ops_on_diff_frames_groupby_expanding.py | 9 ++---
 .../pandas/tests/test_ops_on_diff_frames_groupby_rolling.py   | 9 ++---
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
index 223adea..634cbd7 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py
@@ -52,14 +52,17 @@ class OpsOnDiffFramesGroupByExpandingTest(PandasOnSparkTestCase, TestUtils):
         psdf = ps.from_pandas(pdf)
         kkey = ps.from_pandas(pkey)
+        # The behavior of GroupBy.expanding is changed from pandas 1.3.
         if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-            # TODO(SPARK-36367): Fix the behavior to follow pandas >= 1.3
-            pass
-        else:
             self.assert_eq(
                 getattr(psdf.groupby(kkey).expanding(2), f)().sort_index(),
                 getattr(pdf.groupby(pkey).expanding(2), f)().sort_index(),
             )
+        else:
+            self.assert_eq(
+                getattr(psdf.groupby(kkey).expanding(2), f)().sort_index(),
+                getattr(pdf.groupby(pkey).expanding(2), f)().drop("a", axis=1).sort_index(),
+            )
         self.assert_eq(
             getattr(psdf.groupby(kkey)["b"].expanding(2), f)().sort_index(),

diff --git a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
index 4f97769..04ea448 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py
@@ -50,14 +50,17 @@ class OpsOnDiffFramesGroupByRollingTest(PandasOnSparkTestCase, TestUtils):
         psdf = ps.from_pandas(pdf)
         kkey = ps.from_pandas(pkey)
+        # The behavior of GroupBy.rolling is changed from pandas 1.3.
         if LooseVersion(pd.__version__) >= LooseVersion("1.3"):
-            # TODO(SPARK-36367): Fix the behavior to follow pandas >= 1.3
-            pass
-        else:
             self.assert_eq(
                 getattr(psdf.groupby(kkey).rolling(2), f)().sort_index(),
                 getattr(pdf.groupby(pkey).rolling(2), f)().sort_index(),
             )
+        else:
+            self.assert_eq(
+                getattr(psdf.groupby(kkey).rolling(2), f)().sort_index(),
+                getattr(pdf.groupby(pkey).rolling(2), f)().drop("a", axis=1).sort_index(),
+            )
         self.assert_eq(
             getattr(psdf.groupby(kkey)["b"].rolling(2), f)().sort_index(),
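The tests above gate their expectations on the installed pandas version via `LooseVersion`. A minimal stdlib-only sketch of that kind of version gate (the helper names are mine, not part of the Spark test suite, and the parser is simplified to purely numeric components):

```python
def version_tuple(version: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints.
    (Simplified: assumes purely numeric components, unlike LooseVersion.)"""
    return tuple(int(part) for part in version.split("."))

def follows_pandas_13(pandas_version: str) -> bool:
    """True when groupby-rolling/expanding results drop the grouping
    column, i.e. the pandas >= 1.3 behavior the tests branch on."""
    return version_tuple(pandas_version) >= (1, 3)

print(follows_pandas_13("1.2.5"))  # False
print(follows_pandas_13("1.3.0"))  # True
```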
[GitHub] [spark-website] srowen closed pull request #355: Use ASF mail archives not defunct nabble links
srowen closed pull request #355: URL: https://github.com/apache/spark-website/pull/355
[spark-website] branch asf-site updated: Use ASF mail archives not defunct nabble links
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new dc9faff Use ASF mail archives not defunct nabble links dc9faff is described below commit dc9faff4a121070d58fc0f145d8f0a3521074fb3 Author: Sean Owen AuthorDate: Wed Aug 18 12:58:05 2021 -0500 Use ASF mail archives not defunct nabble links Nabble archive links appear to not work anymore. Use ASF pony mail links instead for archives. Author: Sean Owen Closes #355 from srowen/Nabble. --- community.md| 16 faq.md | 2 +- site/community.html | 16 site/faq.html | 2 +- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/community.md b/community.md index e8f2cf7..ebc438a 100644 --- a/community.md +++ b/community.md @@ -24,8 +24,8 @@ Some quick tips when using StackOverflow: - Search StackOverflow's https://stackoverflow.com/questions/tagged/apache-spark";>`apache-spark` tag to see if your question has already been answered - - Search the nabble archive for - http://apache-spark-user-list.1001560.n3.nabble.com/";>u...@spark.apache.org + - Search the ASF archive for + https://lists.apache.org/list.html?u...@spark.apache.org";>u...@spark.apache.org - Please follow the StackOverflow https://stackoverflow.com/help/how-to-ask";>code of conduct - Always use the `apache-spark` tag when asking questions - Please also use a secondary tag to specify components so subject matter experts can more easily find them. @@ -42,16 +42,16 @@ project, and scenarios, it is recommended you use the u...@spark.apache.org mail -http://apache-spark-user-list.1001560.n3.nabble.com";>u...@spark.apache.org is for usage questions, help, and announcements. +https://lists.apache.org/list.html?u...@spark.apache.org";>u...@spark.apache.org is for usage questions, help, and announcements. 
mailto:user-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)">(subscribe) mailto:user-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)">(unsubscribe) -http://apache-spark-user-list.1001560.n3.nabble.com";>(archives) +https://lists.apache.org/list.html?u...@spark.apache.org";>(archives) -http://apache-spark-developers-list.1001551.n3.nabble.com";>d...@spark.apache.org is for people who want to contribute code to Spark. +https://lists.apache.org/list.html?d...@spark.apache.org";>d...@spark.apache.org is for people who want to contribute code to Spark. mailto:dev-subscr...@spark.apache.org?subject=(send%20this%20email%20to%20subscribe)">(subscribe) mailto:dev-unsubscr...@spark.apache.org?subject=(send%20this%20email%20to%20unsubscribe)">(unsubscribe) -http://apache-spark-developers-list.1001551.n3.nabble.com";>(archives) +https://lists.apache.org/list.html?d...@spark.apache.org";>(archives) @@ -60,8 +60,8 @@ Some quick tips when using email: - Prior to asking submitting questions, please: - Search StackOverflow at https://stackoverflow.com/questions/tagged/apache-spark";>`apache-spark` to see if your question has already been answered - - Search the nabble archive for - http://apache-spark-user-list.1001560.n3.nabble.com/";>u...@spark.apache.org + - Search the ASF archive for + https://lists.apache.org/list.html?u...@spark.apache.org";>u...@spark.apache.org - Tagging the subject line of your email will help you get a faster response, e.g. `[Spark SQL]: Does Spark SQL support LEFT SEMI JOIN?` - Tags may help identify a topic by: diff --git a/faq.md b/faq.md index af57f26..0275c18 100644 --- a/faq.md +++ b/faq.md @@ -71,4 +71,4 @@ Please also refer to our Where can I get more help? -Please post on StackOverflow's https://stackoverflow.com/questions/tagged/apache-spark";>apache-spark tag or http://apache-spark-user-list.1001560.n3.nabble.com";>Spark Users mailing list. 
For more information, please refer to https://spark.apache.org/community.html#have-questions";>Have Questions?. We'll be glad to help! +Please post on StackOverflow's https://stackoverflow.com/questions/tagged/apache-spark";>apache-spark tag or https://lists.apache.org/list.html?u...@spark.apache.org";>Spark Users mailing list. For more information, please refer to https://spark.apache.org/community.html#have-questions";>Have Questions?. We'll be glad to help! diff --git a/site/community.html b/site/community.html index b779d37..f4e1fcf 100644 --- a/site/community.html +++ b/site/community.html @@ -219,8 +219,8 @@ as it is an active forum for Spark users’ questions and answers. Search StackOverflow’s https://stackoverflow.com/questions/tagged/apache-spark";>apache-spark tag to see if your question has already been answered -
[GitHub] [spark-website] zero323 edited a comment on pull request #355: Use ASF mail archives not defunct nabble links
zero323 edited a comment on pull request #355: URL: https://github.com/apache/spark-website/pull/355#issuecomment-901305574 I was looking into it and found [the following](https://support.nabble.com/Downsizing-Nabble-td7609715.html): > Forum owners who want their forum preserved can post to this support forum to let us know to move their forum to that one server. Note that we no longer do mailing list archiving, so if you own an old mailing list archive, there is no point to preserving it. So it seems like it's definitely not coming back in our case.
[GitHub] [spark-website] zero323 commented on pull request #355: Use ASF mail archives not defunct nabble links
zero323 commented on pull request #355: URL: https://github.com/apache/spark-website/pull/355#issuecomment-901305574 I was looking into it and [found this](https://support.nabble.com/Downsizing-Nabble-td7609715.html): > Forum owners who want their forum preserved can post to this support forum to let us know to move their forum to that one server. Note that we no longer do mailing list archiving, so if you own an old mailing list archive, there is no point to preserving it.
[GitHub] [spark-website] srowen opened a new pull request #355: Use ASF mail archives not defunct nabble links
srowen opened a new pull request #355: URL: https://github.com/apache/spark-website/pull/355 Nabble archive links appear to not work anymore. Use ASF pony mail links instead for archives.
[spark] branch master updated (707eefa -> 1859d9b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 707eefa  [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp
 add 1859d9b  [SPARK-36407][CORE][SQL] Convert int to long to avoid potential integer multiplications overflow risk

No new revisions were added by this update.

Summary of changes:
 .../shuffle/checksum/ShuffleChecksumHelper.java       |  2 +-
 .../org/apache/spark/util/collection/TestTimSort.java | 18 +-
 .../spark/sql/execution/UnsafeKVExternalSorter.java   |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)
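SPARK-36407 widens `int` operands to `long` before multiplying so the product cannot silently wrap. The failure mode can be sketched in pure Python by emulating Java's 32-bit wrap-around (the helper names `mul_int32`/`mul_int64` are mine, not Spark's):

```python
INT32_MASK = 0xFFFFFFFF
INT32_SIGN = 0x80000000

def mul_int32(a: int, b: int) -> int:
    """Multiply as Java 32-bit ints: the product silently wraps modulo 2^32."""
    r = (a * b) & INT32_MASK
    return r - 0x100000000 if r & INT32_SIGN else r

def mul_int64(a: int, b: int) -> int:
    """Multiply after widening to long; Python ints stand in for Java long
    here since both operands fit comfortably in 64 bits."""
    return a * b

# 100_000 * 100_000 = 10^10 overflows a 32-bit int ...
print(mul_int32(100_000, 100_000))  # 1410065408 (wrapped, wrong)
# ... but is exact once the operands are widened first:
print(mul_int64(100_000, 100_000))  # 10000000000
```

In the Java sources touched by the commit the equivalent fix is casting one operand to `long` (e.g. `(long) a * b`) before the multiplication.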
[spark] branch branch-3.2 updated: [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 3d69d0d [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp 3d69d0d is described below

commit 3d69d0d0038a1065bb5d24430bf30da9a3463184
Author: gengjiaan
AuthorDate: Wed Aug 18 22:57:06 2021 +0800

    [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

    ### What changes were proposed in this pull request?
    The implement of https://github.com/apache/spark/pull/33665 make `make_timestamp` could accepts integer type as the seconds parameter. This PR let `make_timestamp` accepts `decimal(16,6)` type as the seconds parameter and cast integer to `decimal(16,6)` is safe, so we can simplify the code.

    ### Why are the changes needed?
    Simplify `make_timestamp`.

    ### Does this PR introduce _any_ user-facing change?
    'No'.

    ### How was this patch tested?
    New tests.

    Closes #33775 from beliefer/SPARK-36428-followup.
Lead-authored-by: gengjiaan
Co-authored-by: Jiaan Geng
Signed-off-by: Gengliang Wang
(cherry picked from commit 707eefa3c706561f904dad65f3e347028dafb6ea)
Signed-off-by: Gengliang Wang
---
 .../catalyst/expressions/datetimeExpressions.scala | 33 -
 .../expressions/DateExpressionsSuite.scala         | 43 +-
 .../test/resources/sql-tests/inputs/timestamp.sql  |  3 ++
 .../sql-tests/results/ansi/timestamp.sql.out       | 28 +-
 .../sql-tests/results/datetime-legacy.sql.out      | 26 -
 .../resources/sql-tests/results/timestamp.sql.out  | 26 -
 .../results/timestampNTZ/timestamp-ansi.sql.out    | 28 +-
 .../results/timestampNTZ/timestamp.sql.out         | 26 -
 8 files changed, 157 insertions(+), 56 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 84dfb41..0e74eff 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2557,22 +2557,16 @@ case class MakeTimestamp(
   override def children: Seq[Expression] = Seq(year, month, day, hour, min, sec) ++ timezone

   // Accept `sec` as DecimalType to avoid loosing precision of microseconds while converting
-  // them to the fractional part of `sec`.
+  // them to the fractional part of `sec`. For accepts IntegerType as `sec` and integer can be
+  // casted into decimal safely, we use DecimalType(16, 6) which is wider than DecimalType(10, 0).
   override def inputTypes: Seq[AbstractDataType] =
-    Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType,
-      TypeCollection(DecimalType(8, 6), IntegerType, NullType)) ++ timezone.map(_ => StringType)
+    Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType, DecimalType(16, 6)) ++
+      timezone.map(_ => StringType)

   override def nullable: Boolean = if (failOnError) children.exists(_.nullable) else true

   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
     copy(timeZoneId = Option(timeZoneId))

-  private lazy val toDecimal = sec.dataType match {
-    case DecimalType() =>
-      (secEval: Any) => secEval.asInstanceOf[Decimal]
-    case IntegerType =>
-      (secEval: Any) => Decimal(BigDecimal(secEval.asInstanceOf[Int]), 8, 6)
-  }
-
   private def toMicros(
       year: Int,
       month: Int,
@@ -2585,8 +2579,6 @@ case class MakeTimestamp(
     assert(secAndMicros.scale == 6,
       s"Seconds fraction must have 6 digits for microseconds but got ${secAndMicros.scale}")
     val unscaledSecFrac = secAndMicros.toUnscaledLong
-    assert(secAndMicros.precision <= 8,
-      s"Seconds and fraction cannot have more than 8 digits but got ${secAndMicros.precision}")
     val totalMicros = unscaledSecFrac.toInt // 8 digits cannot overflow Int
     val seconds = Math.floorDiv(totalMicros, MICROS_PER_SECOND.toInt)
     val nanos = Math.floorMod(totalMicros, MICROS_PER_SECOND.toInt) * NANOS_PER_MICROS.toInt
@@ -2627,7 +2619,7 @@ case class MakeTimestamp(
       day.asInstanceOf[Int],
       hour.asInstanceOf[Int],
       min.asInstanceOf[Int],
-      toDecimal(sec),
+      sec.asInstanceOf[Decimal],
       zid)
   }

@@ -2635,7 +2627,6 @@ case class MakeTimestamp(
     val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
     val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
     val d = Decimal.getClass.getName.stripSuffix("$")
-    val decimalValue = ctx.freshName("decimalValue")
     val fa
[spark] branch master updated: [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 707eefa [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp 707eefa is described below

commit 707eefa3c706561f904dad65f3e347028dafb6ea
Author: gengjiaan
AuthorDate: Wed Aug 18 22:57:06 2021 +0800

    [SPARK-36428][SQL][FOLLOWUP] Simplify the implementation of make_timestamp

    ### What changes were proposed in this pull request?
    The implement of https://github.com/apache/spark/pull/33665 make `make_timestamp` could accepts integer type as the seconds parameter. This PR let `make_timestamp` accepts `decimal(16,6)` type as the seconds parameter and cast integer to `decimal(16,6)` is safe, so we can simplify the code.

    ### Why are the changes needed?
    Simplify `make_timestamp`.

    ### Does this PR introduce _any_ user-facing change?
    'No'.

    ### How was this patch tested?
    New tests.

    Closes #33775 from beliefer/SPARK-36428-followup.
Lead-authored-by: gengjiaan
Co-authored-by: Jiaan Geng
Signed-off-by: Gengliang Wang
---
 .../catalyst/expressions/datetimeExpressions.scala | 33 -
 .../expressions/DateExpressionsSuite.scala         | 43 +-
 .../test/resources/sql-tests/inputs/timestamp.sql  |  3 ++
 .../sql-tests/results/ansi/timestamp.sql.out       | 28 +-
 .../sql-tests/results/datetime-legacy.sql.out      | 26 -
 .../resources/sql-tests/results/timestamp.sql.out  | 26 -
 .../results/timestampNTZ/timestamp-ansi.sql.out    | 28 +-
 .../results/timestampNTZ/timestamp.sql.out         | 26 -
 8 files changed, 157 insertions(+), 56 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 84dfb41..0e74eff 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2557,22 +2557,16 @@ case class MakeTimestamp(
   override def children: Seq[Expression] = Seq(year, month, day, hour, min, sec) ++ timezone

   // Accept `sec` as DecimalType to avoid loosing precision of microseconds while converting
-  // them to the fractional part of `sec`.
+  // them to the fractional part of `sec`. For accepts IntegerType as `sec` and integer can be
+  // casted into decimal safely, we use DecimalType(16, 6) which is wider than DecimalType(10, 0).
   override def inputTypes: Seq[AbstractDataType] =
-    Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType,
-      TypeCollection(DecimalType(8, 6), IntegerType, NullType)) ++ timezone.map(_ => StringType)
+    Seq(IntegerType, IntegerType, IntegerType, IntegerType, IntegerType, DecimalType(16, 6)) ++
+      timezone.map(_ => StringType)

   override def nullable: Boolean = if (failOnError) children.exists(_.nullable) else true

   override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
     copy(timeZoneId = Option(timeZoneId))

-  private lazy val toDecimal = sec.dataType match {
-    case DecimalType() =>
-      (secEval: Any) => secEval.asInstanceOf[Decimal]
-    case IntegerType =>
-      (secEval: Any) => Decimal(BigDecimal(secEval.asInstanceOf[Int]), 8, 6)
-  }
-
   private def toMicros(
       year: Int,
       month: Int,
@@ -2585,8 +2579,6 @@ case class MakeTimestamp(
     assert(secAndMicros.scale == 6,
       s"Seconds fraction must have 6 digits for microseconds but got ${secAndMicros.scale}")
     val unscaledSecFrac = secAndMicros.toUnscaledLong
-    assert(secAndMicros.precision <= 8,
-      s"Seconds and fraction cannot have more than 8 digits but got ${secAndMicros.precision}")
     val totalMicros = unscaledSecFrac.toInt // 8 digits cannot overflow Int
     val seconds = Math.floorDiv(totalMicros, MICROS_PER_SECOND.toInt)
     val nanos = Math.floorMod(totalMicros, MICROS_PER_SECOND.toInt) * NANOS_PER_MICROS.toInt
@@ -2627,7 +2619,7 @@ case class MakeTimestamp(
       day.asInstanceOf[Int],
       hour.asInstanceOf[Int],
       min.asInstanceOf[Int],
-      toDecimal(sec),
+      sec.asInstanceOf[Decimal],
       zid)
   }

@@ -2635,7 +2627,6 @@ case class MakeTimestamp(
     val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
     val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
     val d = Decimal.getClass.getName.stripSuffix("$")
-    val decimalValue = ctx.freshName("decimalValue")
     val failOnErrorBranch = if (failOnError) "throw e;" else s"${ev.isNull} = true;"
     nullSafeCodeGen(ctx, ev, (year, mon
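The `toMicros` logic in the patch above splits a scale-6 decimal seconds value into whole seconds and microseconds with `floorDiv`/`floorMod` on the unscaled value. A Python sketch of that split using the stdlib `decimal` module (the helper name `split_seconds` is mine, not Spark's):

```python
from decimal import Decimal

MICROS_PER_SECOND = 1_000_000

def split_seconds(sec: Decimal) -> tuple:
    """Split a seconds value with a 6-digit fraction (decimal scale 6) into
    (whole_seconds, microseconds), mirroring the floorDiv/floorMod step in
    MakeTimestamp.toMicros."""
    # Rescale to 6 fractional digits, then take the unscaled integer,
    # analogous to Decimal.toUnscaledLong on a scale-6 value.
    unscaled = int(sec.quantize(Decimal("0.000001")) * MICROS_PER_SECOND)
    seconds, micros = divmod(unscaled, MICROS_PER_SECOND)  # floorDiv / floorMod
    return seconds, micros

# An integer seconds value also fits losslessly in decimal(16, 6),
# which is why the integer overload can simply be cast:
print(split_seconds(Decimal(59)))           # (59, 0)
print(split_seconds(Decimal("58.123456")))  # (58, 123456)
```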
[spark] branch branch-3.2 updated: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 181d33e [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang 181d33e is described below

commit 181d33e16edfb6fa5abde29de87634bdf4ce7e61
Author: yi.wu
AuthorDate: Wed Aug 18 22:46:48 2021 +0800

    [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

    ### What changes were proposed in this pull request?
    Instead of exiting the executor within the RpcEnv's thread, exit the executor in a separate thread.

    ### Why are the changes needed?
    The current exit way in `onDisconnected` can cause the deadlock, which has the exact same root cause with https://github.com/apache/spark/pull/12012:

    * `onDisconnected` -> `System.exit` are called in sequence in the thread of `MessageLoop.threadpool`
    * `System.exit` triggers shutdown hooks and `executor.stop` is one of the hooks.
    * `executor.stop` stops the `Dispatcher`, which waits for the `MessageLoop.threadpool` to shutdown further.
    * Thus, the thread which runs `System.exit` waits for hooks to be done, but the `MessageLoop.threadpool` in the hook waits that thread to finish. Finally, this mutual dependence results in the deadlock.

    ### Does this PR introduce _any_ user-facing change?
    Yes, the executor shutdown won't hang.

    ### How was this patch tested?
    Pass existing tests.

    Closes #33759 from Ngone51/fix-executor-shutdown-hang.
Authored-by: yi.wu
Signed-off-by: Wenchen Fan
(cherry picked from commit 996551fecee8c3549438c4f536f8ab9607c644c5)
Signed-off-by: Wenchen Fan
---
 .../spark/executor/CoarseGrainedExecutorBackend.scala | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index d18ffaa..ffcb30d 100644
--- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -202,11 +202,17 @@ private[spark] class CoarseGrainedExecutorBackend(
     stopping.set(true)
     new Thread("CoarseGrainedExecutorBackend-stop-executor") {
       override def run(): Unit = {
-        // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
-        // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
-        // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
-        // Therefore, we put this line in a new thread.
-        executor.stop()
+        // `executor` can be null if there's any error in `CoarseGrainedExecutorBackend.onStart`
+        // or fail to create `Executor`.
+        if (executor == null) {
+          System.exit(1)
+        } else {
+          // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
+          // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
+          // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
+          // Therefore, we put this line in a new thread.
+          executor.stop()
+        }
       }
     }.start()

@@ -282,8 +288,7 @@ private[spark] class CoarseGrainedExecutorBackend(
       if (notifyDriver && driver.nonEmpty) {
         driver.get.send(RemoveExecutor(executorId, new ExecutorLossReason(reason)))
       }
-
-      System.exit(code)
+      self.send(Shutdown)
     } else {
       logInfo("Skip exiting executor since it's been already asked to exit before.")
     }
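The essence of the fix is that a message-loop thread must never run a blocking stop that waits on its own pool; the stop is handed off to a dedicated thread. A minimal Python analogy of that handoff pattern (the names and the toy pool are mine; this is not Spark code):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single-worker pool stands in for the RpcEnv message loop.
message_loop = ThreadPoolExecutor(max_workers=1, thread_name_prefix="message-loop")
stopped = threading.Event()

def stop_everything():
    # Safe here: this runs on its own thread, outside the pool being stopped,
    # so waiting for the pool's workers cannot wait on the current thread.
    message_loop.shutdown(wait=True)
    stopped.set()

def on_disconnected():
    # We're on the message-loop thread. Calling shutdown(wait=True) here
    # would wait for this very thread to finish -- the SPARK-36532 hang.
    # Hand the blocking stop to a dedicated thread instead and return.
    threading.Thread(target=stop_everything, name="stop-executor").start()

message_loop.submit(on_disconnected)
stopped.wait(timeout=10)
print("shutdown completed:", stopped.is_set())
```

The same idea appears twice in the patch: the stop runs on `CoarseGrainedExecutorBackend-stop-executor`, and the exit path sends a `Shutdown` message instead of calling `System.exit` inline.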
[spark] branch master updated (a1ecf83 -> 996551f)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from a1ecf83  [SPARK-36451][BUILD] Ivy skips looking for source and doc pom
 add 996551f  [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

No new revisions were added by this update.

Summary of changes:
 .../spark/executor/CoarseGrainedExecutorBackend.scala | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)
[spark] branch master updated (281b00a -> a1ecf83)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 281b00a  [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2
 add a1ecf83  [SPARK-36451][BUILD] Ivy skips looking for source and doc pom

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 6 ++
 1 file changed, 6 insertions(+)