[spark] branch branch-3.3 updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new aa39b06462a [MINOR][TEST][SQL] Add a CTE subquery scope test case
aa39b06462a is described below

commit aa39b06462a98f37be59e239d12edd9f09a25b88
Author: Reynold Xin
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

    [MINOR][TEST][SQL] Add a CTE subquery scope test case

    ### What changes were proposed in this pull request?
    I noticed we were missing a test case for this in SQL tests, so I added one.

    ### Why are the changes needed?
    To ensure we scope CTEs properly in subqueries.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    This is a test case change.

    Closes #39189 from rxin/cte_test.

    Authored-by: Reynold Xin
    Signed-off-by: Reynold Xin
    (cherry picked from commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b)
    Signed-off-by: Reynold Xin
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        | 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 264b64ffe96..ebdd64c3ac8 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index 2c622de3f36..b6e1793f7d7 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 283f5a54a42..546ab7ecb95 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -36,6 +36,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisE
[spark] branch master updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 24edf8ecb5e [MINOR][TEST][SQL] Add a CTE subquery scope test case
24edf8ecb5e is described below

commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b
Author: Reynold Xin
AuthorDate: Fri Dec 23 14:55:14 2022 -0800

    [MINOR][TEST][SQL] Add a CTE subquery scope test case

    ### What changes were proposed in this pull request?
    I noticed we were missing a test case for this in SQL tests, so I added one.

    ### Why are the changes needed?
    To ensure we scope CTEs properly in subqueries.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    This is a test case change.

    Closes #39189 from rxin/cte_test.

    Authored-by: Reynold Xin
    Signed-off-by: Reynold Xin
---
 .../test/resources/sql-tests/inputs/cte-nested.sql | 10
 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++
 .../resources/sql-tests/results/cte-nested.sql.out | 28 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        | 28 ++
 4 files changed, 94 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
index 5f12388b9cb..e5ef2443417 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql
@@ -17,6 +17,16 @@ SELECT (
   SELECT * FROM t
 );
 
+-- Make sure CTE in subquery is scoped to that subquery rather than global
+-- the 2nd half of the union should fail because the cte is scoped to the first half
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte;
+
 -- CTE in CTE definition shadows outer
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
index 013c5f27b50..65000471c75 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
index ed6d69b233e..2c67f2db56a 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "relationName" : "`cte`"
+  },
+  "queryContext" : [ {
+    "objectType" : "",
+    "objectName" : "",
+    "startIndex" : 120,
+    "stopIndex" : 122,
+    "fragment" : "cte"
+  } ]
+}
+
+
 -- !query
 WITH
   t AS (SELECT 1),

diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
index 6a48e1bec43..154ebd20223 100644
--- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out
@@ -33,6 +33,34 @@ struct
 1
 
+-- !query
+SELECT * FROM
+  (
+   WITH cte AS (SELECT * FROM range(10))
+   SELECT * FROM cte WHERE id = 8
+  ) a
+UNION
+SELECT * FROM cte
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "TABLE_OR_VIEW_NOT_FOUND",
+  "sqlState" : "42000",
+  "messageParameters" : {
svn commit: r46414 - /dev/spark/v3.1.1-rc3-bin/ /release/spark/spark-3.1.1/
Author: rxin
Date: Tue Mar 2 11:00:12 2021
New Revision: 46414

Log:
Moving Apache Spark 3.1.1 RC3 to Apache Spark 3.1.1

Added:
    release/spark/spark-3.1.1/
      - copied from r46413, dev/spark/v3.1.1-rc3-bin/
Removed:
    dev/spark/v3.1.1-rc3-bin/
svn commit: r46413 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/
Author: rxin
Date: Tue Mar 2 10:55:39 2021
New Revision: 46413

Log:
Recover 3.1.1 RC3

Added:
    dev/spark/v3.1.1-rc3-bin/
      - copied from r46410, dev/spark/v3.1.1-rc3-bin/
    dev/spark/v3.1.1-rc3-docs/
      - copied from r46410, dev/spark/v3.1.1-rc3-docs/
svn commit: r46411 - in /dev/spark: v3.1.1-rc3-bin/ v3.1.1-rc3-docs/
Author: rxin
Date: Tue Mar 2 10:39:38 2021
New Revision: 46411

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc3-bin/
    dev/spark/v3.1.1-rc3-docs/
svn commit: r46412 - in /dev/spark: v3.1.0-rc1-bin/ v3.1.0-rc1-docs/
Author: rxin
Date: Tue Mar 2 10:39:58 2021
New Revision: 46412

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.0-rc1-bin/
    dev/spark/v3.1.0-rc1-docs/
svn commit: r46410 - in /dev/spark: v3.1.1-rc2-bin/ v3.1.1-rc2-docs/
Author: rxin
Date: Tue Mar 2 10:39:32 2021
New Revision: 46410

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc2-bin/
    dev/spark/v3.1.1-rc2-docs/
svn commit: r46409 - in /dev/spark: v3.1.1-rc1-bin/ v3.1.1-rc1-docs/
Author: rxin
Date: Tue Mar 2 10:39:25 2021
New Revision: 46409

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.1.1-rc1-bin/
    dev/spark/v3.1.1-rc1-docs/
svn commit: r40088 - in /dev/spark: v3.0.0-rc1-bin/ v3.0.0-rc1-docs/ v3.0.0-rc2-bin/ v3.0.0-rc2-docs/ v3.0.0-rc3-docs/
Author: rxin
Date: Thu Jun 18 16:41:27 2020
New Revision: 40088

Log:
Removing RC artifacts.

Removed:
    dev/spark/v3.0.0-rc1-bin/
    dev/spark/v3.0.0-rc1-docs/
    dev/spark/v3.0.0-rc2-bin/
    dev/spark/v3.0.0-rc2-docs/
    dev/spark/v3.0.0-rc3-docs/
svn commit: r40050 - /dev/spark/v3.0.0-rc3-bin/ /release/spark/spark-3.0.0/
Author: rxin
Date: Tue Jun 16 09:18:02 2020
New Revision: 40050

Log:
release 3.0.0

Added:
    release/spark/spark-3.0.0/
      - copied from r40049, dev/spark/v3.0.0-rc3-bin/
Removed:
    dev/spark/v3.0.0-rc3-bin/
[spark] tag v3.0.0 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

        at 3fdfce3 (commit)

No new revisions were added by this update.
svn commit: r39960 - in /dev/spark/v3.0.0-rc3-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin
Date: Sat Jun 6 14:03:25 2020
New Revision: 39960

Log:
Apache Spark v3.0.0-rc3 docs

[This commit notification would consist of 1920 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r39959 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin Date: Sat Jun 6 13:35:40 2020 New Revision: 39959 Log: Apache Spark v3.0.0-rc3 Added: dev/spark/v3.0.0-rc3-bin/ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Sat Jun 6 13:35:40 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3gQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZjGIEACG3gsdARN8puRHS2YL+brOmjbrS4wVY/Av +l+ZR59moZ7QuwjYoixyqNnztIKgIyleYJq9DL5TqqMxFgGpuoDrnuWVqI+8MngVA +gau/QDmYINabZsJxFfDn1IjxxSQBsgf6pwfqQbB+fGSjLSPnDq+u3DIWr3fRMh4X +DrTuATNewKiiBIwQHUKAtPMAbsdDvXv0DRL7CGTiIJri43opAntQzHec3sP9hgRU +J5J2HnjOlamgv58S7zrUw/Wo1xPLmz2PGIsP0aq9DRRw0bLnesrtEaWAKFp2HL5E +QlbjfboaDQz/X+meruW57/sO/DDwA90/XvF44z4Gu6kbS8nRuTsU5wVfZ/1iyWZk +PLP2nFoWl7O85k/DLB5ADYgce3e6k2qD2obKxzsEx0nr0Wu13cxCR2+IBQmv05jb +4Kwi7iE0iKIxt3cESDH6j9GqZoTrcxt6Jb88KSQ+YM2TBNUr1ZZNmkjgYdmLvm7a +wH6vLtdpZzUKIGd6bt1grEwoQJBMnQjkoDYxhx+ugjbs8CwwxcdUNd2Q5xz0WaSn +p443ZlMR5lbGf6D6U4PUigaIrdD8d+ef/rRTDtXdoDqC+FdNuepyS9+2+dUZGErx +N2IMNunKIdKw57GZGcILey1hY45SSuQFw5JAe+nWqCAzCmFX72ulkv9The7rLdlE +YdLu6XQIBA== +=HhHH +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 == --- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Sat Jun 6 13:35:40 2020 @@ -0,0 +1,3 @@ +SparkR_3.0.0.tar.gz: 394DCFEB 4E202A8E 5C58BF94 A77548FD 79A00F92 34538535 + B0242E1B 96068E3E 80F78188 D71831F8 4A350224 41AA14B1 + D72AE704 F2390842 DBEAB41F 5AC9859A Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Sat Jun 6 13:35:40 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7bh3oQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZvhPD/9Vyrywk4kYFaUo36Kg6nickTvWMm+yDfGZ +tKrUo3dAoMta7OwUjT6W0Roo8a4BBgumaDv47Dm6mlquF2DuLuBrFsqFo8c5VNA/ +jT1tdSdHiTzjq7LfY9GQDn8Wkgp1gyIKON70XFdZifduW0gcFDkJ+FjhPYWcA6jy +GGOGK5qboCdi9C+KowUVj4VB9bbxPbWvW7FVF3+VlcrKvkmNx+EmqmIrqsh72w8O +EL70za2uBRUUiFcaOpY/wpmEN1raCAkMzQ+dPl7p1PFgmLFrMN9RaRXJ1stF+fXO +rDLBLNPqb85TvvOOHpcr4PSP38GrdZvDAvljCOEbBzacF719bewu/IVRcNi9lPZE +HDPUcZLgnocNIF6kafykrm3JhagzmPIhQ8d4DFTuH6ePxgWqdUa9lWKQL54z3mjU +LT2CJ8gMDY0Wz5zSKc/sI/ZwL+Q6U8xiIGYSzQgT9yPztbhDd5AM2DgohJkZSD4b +jOrEsSyNRJiwwRAHlbeOOVPb4UNYzsx1USPbPEBeXTt8X8VUb8jsU84o/RhXexk9 +EMJjxz/aChB+NefbmUjBZmXSaa/zYubprJrWnUgPw7hFxAnmtgIUdjSWSNIOJ6bp +EV1M6xwuvrmGhOa3D0C+lYyAuYZca2FQrcAtzNiL6iOMQ6USFZvzjxGWQiV2CDGQ +O8CNfkwOGA
svn commit: r39958 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin
Date: Sat Jun 6 11:18:32 2020
New Revision: 39958

Log:
remove 3.0 rc3 binary

Removed:
    dev/spark/v3.0.0-rc3-bin/
[spark] branch branch-3.0 updated (fa608b9 -> 3ea461d)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
     add 3fdfce3  Preparing Spark release v3.0.0-rc3
     new 3ea461d  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails.

The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 3ea461d61e635835c07bacb5a0c403ae2a3099a0 Author: Reynold Xin AuthorDate: Sat Jun 6 02:57:41 2020 + Preparing development version 3.0.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 3bad429..21f3eaa 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.0 +Version: 3.0.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 0a52a00..8bef9d8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fa4fcb1f..fc1441d 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 14a1b7d..de2a6fb 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index e75a843..6c0c016 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 004af0a..b8df191 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a35156a..8119709 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --g
[spark] 01/01: Preparing Spark release v3.0.0-rc3
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to tag v3.0.0-rc3 in repository https://gitbox.apache.org/repos/asf/spark.git commit 3fdfce3120f307147244e5eaf46d61419a723d50 Author: Reynold Xin AuthorDate: Sat Jun 6 02:57:35 2020 + Preparing Spark release v3.0.0-rc3 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 21f3eaa..3bad429 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.1 +Version: 3.0.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 8bef9d8..0a52a00 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fc1441d..fa4fcb1f 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index de2a6fb..14a1b7d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 6c0c016..e75a843 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index b8df191..004af0a 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 8119709..a35156a 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/ta
[spark] tag v3.0.0-rc3 created (now 3fdfce3)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git.

        at 3fdfce3 (commit)

This tag includes the following new commits:

     new 3fdfce3  Preparing Spark release v3.0.0-rc3

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails.

The revisions listed as "add" were already present in the repository and have only been added to this reference.
svn commit: r39951 - /dev/spark/v3.0.0-rc3-bin/
Author: rxin Date: Fri Jun 5 19:08:09 2020 New Revision: 39951 Log: Apache Spark v3.0.0-rc3 Added: dev/spark/v3.0.0-rc3-bin/ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc3-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.asc Fri Jun 5 19:08:09 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4gQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZpBZD/9vSiD946kwdMWalYM01Zw2yjKK60eakhLY +jxHRy1T6Yipspyh2idCrzd2MaGJFqUwRZjs1mpA/mKZUGRSzYFjlWWoaSc/T19MD +3q/zg6glgoKquzxHcAqum/OCc1C1MJTcsMic2+LIelXRoJ2GPCeECq91JGX4xpD4 +09sDElvooqfMCLb05gaaF8Eyrpm+7WSyAEVpb1Fjpp/gtdG1YQyiW3o3WzNSJgeA +dewZaSoI58lx3Rfs1jZN1M4Gyj1aKh4Yqw21+CDoHAhtkeOp5oGPgrWef4fZAE4D +4xKoz1I/5C1s0wIZEhUI2IUJLeGyCR117QhIO/bQFR1XEOO22auQaPppGJKUa5bb +bwpx6TARNP13fe2R48G+yZ9Em0uC3P1CucGYCRlY22umzkbalrVFeZ77n/FWRB7E +nC29bso/R2VwmDRI6yWXiCPLMyQy/PukniWRJZiU7Ath1930cORAlqFC7EOBHgHu +k3AVX/3h2qZBFuYu/wIsd89rgeiwrf4fksiuMhp8YXJh3xCLLSl4uT+q3flutJ3H +nsOLYkuie/r4qx+M2J7rfezTzTeYr+SN8mn4CTsGRznHhb0amqlZE6yNFWVatr6D +LEYWe9L3DK92Kj0Jtl5QyPXQlKSoBQriketgZXKxzeBScKeFd6acGxOhM5LpZRCo +ngKbsgfcoQ== +=bwFz +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 == --- dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-rc3-bin/SparkR_3.0.0.tar.gz.sha512 Fri Jun 5 19:08:09 2020 @@ -0,0 +1,3 @@ +SparkR_3.0.0.tar.gz: 37496F1A C5BD0DFF 0F6B08B9 05CB55B7 DAA6397A 8C377126 + C6887AEB CB05F172 0E4A9754 9ED4B6B4 68E9266A 6459229F + 48D58F7C 9C0A58B1 183CC6D0 A18ACE18 Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc3-bin/pyspark-3.0.0.tar.gz.asc Fri Jun 5 19:08:09 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7ag4kQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZlwHD/9tPwfyzwQkl6qkYp27AgZexy5k15gjJ/Bi +MWWwv3bMhJiRlZN3hCyGC0QTTkRG+AJTd3SflbUhHzw9ttFAnt3VqZ7RZBB4UBDI +5W85jUaF5bOMu7K4hW2iZdcLLLbq7/sXNNqRhomQStL4j6TerZjgP8IytCGEmLX4 +Qt894N7+MunZxbPXKkUqZfO0cWlxY53+zNGqXKJdwDhQUrrH0i+2fs3gd97OJs42 +83l+pE27C7+aTr6fSRWIS55nw9GzKrDOr0N47wtfCs0mqIW+dI+cVjZh8W/Gf9Dl +EifAsLIpahNRpQLu0PqiWrsJ3meertha4DLWRPS0esYyZAGFK+DjD9Zm1cOovA9v +ywjQVWCkmaqaozvm2RTKxwvS7kkBB2dJPUJJ8YeCBr0A7wHBAIeA0vvWe9q7u0KW +O78uGswTF4EKz85ZMhuo8IjdjKjzTumzdFws4akeTzv60t+439zFdyhUghfQ71om +biS1Fgopz1QLqCb3eaqhMBM0ZB4JVMTtMKb2/gqH/8qaQq91CEkLTpOOsRK+xdeg +A8XoFCWEsBbHzLT3Y3FKsHC7ipo2FYXCcn/n/67bRuFFBwhLZzOyEISH72nKIk4k +YOU5wZnsykG2oiV3ZysRlYewtU0mIIuUINrMVRZB69CUk9Q2fnDyuT02OEGIoNZC +LohvgOFbqQ
svn commit: r39657 - in /dev/spark/v3.0.0-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin
Date: Mon May 18 16:11:38 2020
New Revision: 39657

Log:
Apache Spark v3.0.0-rc2 docs

[This commit notification would consist of 1921 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r39656 - /dev/spark/v3.0.0-rc2-bin/
Author: rxin Date: Mon May 18 15:42:56 2020 New Revision: 39656 Log: Apache Spark v3.0.0-rc2 Added: dev/spark/v3.0.0-rc2-bin/ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc2-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.asc Mon May 18 15:42:56 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHgQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZllrEACaCgpeO1qK4uJLQC00J1iU2970iVn9Aqh/ +gZnikK7mBClXekg2Q8+poAhueXS1XfGoJfOCwTeOp8iMvD0BcLhIxftKBg7CxmOa +yKrtL/dehNyYMTWofxluZzolPR4O0DDNva2W6ExKPhrUAAOTPjPkMx9ty0C57IqO +Pwblsr6iI3BWrmRdN2Dpfo+enxJ1rd6H/0kYCmXEFgyW8lBbGiN23KrjkriZOJxo +6Ad8zFIEI+rSmmgvy6lkXdlJFduCmRFFZguRtWq48rYEY3pu6geIUetPMsosBnDW +mb5ywNMuqZomeEes1JoWp96E65K3HUO8LxPrP3wJY9TfUGduAAwwBX8nGsa0r+mz +JJq2f4zwvINM2eQGXIfcpg21K3ijqdkqylAKuBGiil5QcHABGQIQ6N1M+1ruKjKp +zHeXh6tac2IM3dvpyh12mC7ZhKPBAC1sUZD8qzvB6sjaHgvv3uSUc2xTW7kzs8l2 +mwNT8SmCscR6+PAm29dY6CoRtVtDEygt+oOMhRkturaDQ9vtYgduKo+p6PiqffUE +7SUKwk7a3Cqe46uxHabHdi+6NedFuX7/bPSAX51Q4MpeHC8l4HpgHDPodtfRcEQm +VDSeLBfhs3WHi+OrqZ2et/EYaGFxiZTTi2PfpeMBPmC4d4k+yymZEenJcXVps7+G +fFFeOvCfyQ== +=2zdl +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 == --- dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-rc2-bin/SparkR_3.0.0.tar.gz.sha512 Mon May 18 15:42:56 2020 @@ -0,0 +1,3 @@ +SparkR_3.0.0.tar.gz: B50B8062 8C2158C5 5931EB47 275FB32D 52EFF715 F3B39524 + 29C03A21 583459D5 32EC2135 D27AB970 0F345B7A 620E4281 + 950CC383 58231D1D BB08817C 4EDC6A05 Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc2-bin/pyspark-3.0.0.tar.gz.asc Mon May 18 15:42:56 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl7CmHoQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9Zn7ED/9Ujdr6jmTAFbtJtJiaDCevVGDhoND+9wca +4MEaUYecgrYWSx12YBZe+d4nIbTuVWK6X29C76E/wbwREWFqG1fA17P7ZpBh8x3W +xHSfzyYAP6G63I6IC+7jiHkOIOYBScGKj9h6z5j39eqt05HGAv088YEeTMpAC32B +GbACEglWGgrE3JsrKXf77hIU8AizcE6rhS5OapqWdxFoqTHbxgjg3uJjsxVKsMXG +wchOtedVfcDZihoqrPoO+pwjP8LIt+iv53luaUJowosC8K62OcjL1ay9Gw4a8KMQ +9pEr9HgjAj9abel0q+ic4reLcCh+bjFSBzXR8/uJHjmSsWHNlwyXJq5Ymff7T2xJ +s75vYuHI9bcOqqb2X1r5TY6v34p13PzKuzL7Y5la1ZCPo0nXjCne5NcSTxu9sQY5 +jl9BsVwWONGSZHsNlW6dy3XeXRaAFAPDCHJvqEsP8cgxMd9ryLG2niITVBGrs3jV +Q3ylNTsM5G7/As6PR5hYYmTqCBBXJWizJmENMJq0zXinNe83ycWmKikACUXtBDlO +qfRr3op3DAxdcNWbfCG7l9Ifoyr6w7HYDHEA6mMSsZ0MSSaiWcnhBc4ul5P4JUN8 +1p9/4o2WV6lfT2c6VmCfx4W4d5w3pgEVRHakvGzXE59datTZs1AQREG9G87jEd7R +wv/RT1q+dA
[spark] branch branch-3.0 updated (740da34 -> f6053b9)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 740da34  [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax
     add 29853ec  Preparing Spark release v3.0.0-rc2
     new f6053b9  Preparing development version 3.0.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails.

The revisions listed as "add" were already present in the repository and have only been added to this reference.

Summary of changes:
[spark] 01/01: Preparing Spark release v3.0.0-rc2
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to tag v3.0.0-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 29853eca69bceefd227cbe8421a09c116b7b753a Author: Reynold Xin AuthorDate: Mon May 18 13:21:37 2020 + Preparing Spark release v3.0.0-rc2 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 21f3eaa..3bad429 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.1 +Version: 3.0.0 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 8bef9d8..0a52a00 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fc1441d..fa4fcb1f 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index de2a6fb..14a1b7d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 6c0c016..e75a843 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index b8df191..004af0a 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 8119709..a35156a 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.1-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/ta
[spark] tag v3.0.0-rc2 created (now 29853ec)
This is an automated email from the ASF dual-hosted git repository.

rxin pushed a change to tag v3.0.0-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.

        at 29853ec (commit)

This tag includes the following new commits:

     new 29853ec  Preparing Spark release v3.0.0-rc2

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails.

The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit f6053b94f874c62856baa7bfa35df14c78bebc9f Author: Reynold Xin AuthorDate: Mon May 18 13:21:43 2020 + Preparing development version 3.0.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 3bad429..21f3eaa 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.0 +Version: 3.0.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 0a52a00..8bef9d8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fa4fcb1f..fc1441d 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 14a1b7d..de2a6fb 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index e75a843..6c0c016 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 004af0a..b8df191 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a35156a..8119709 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --g
svn commit: r38759 - in /dev/spark/v3.0.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu
Author: rxin
Date: Tue Mar 31 13:45:27 2020
New Revision: 38759

Log:
Apache Spark v3.0.0-rc1 docs

[This commit notification would consist of 1911 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
svn commit: r38754 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin Date: Tue Mar 31 09:57:10 2020 New Revision: 38754 Log: Apache Spark v3.0.0-rc1 Added: dev/spark/v3.0.0-rc1-bin/ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0sQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZtCiD/9GtNXfxGR9oh2B4k+fg38uCrloGUYo3Dx9 +eJU6G55fbKtXK24dKlxZQCVDpwLihycnLULcV+/D75vWa4tSoG6n/FTHimCnUJWQ +UkEsxqhWuGi25rUx4VsOQeHPYIP9/2pVGVyanFzRp+yAyldATGG36u3Xv5lqox6b +6pARVwC6FZWKuk1b47xbRfYKUoNTkObhGjcKKyigexqx/nZOp99NP+sVlEqRD/l/ +B7l3kgAVq3XlZKUCkMhWgAHT6rPNkvwBdYZFce9gJHuG75Zw5rQ2hHesEqDOVlC1 +kqJPtpmb2U93ItBF6ArlmXcm+60rLa++B8cyrEsKLIyYxRpHH1bQmLB9TTzDeFpz +e+WWlUiDpC1Lorzvg+44MeOXSj9EhNgqsYypGKhlh6WTN8A+BRzvJRMpDMLElRz6 +lHaceqn9NC4eE5tzcyXAFL+8Y644nCTIZQuND72LvIv7rO0YXq/6yeudM+SDeANU +vscR4LiQ7/a3oSpxoIuA0MjKz6gWUaYFgsb8OuUC4VQPJKQZG+57SOazq1VTlB6/ +Ur8pePIUxU52EmzmIp08ws8v+NOo9pMxw7lyBwpmGX0/ax6p9v1xVcCeXqH4HYvA +9d7a7hZy9yoguAGsVkibSym8e6XITCDoXLb9/HPEhfdyxFgi87DVjKZ84HkyFw9/ +OzHhumSp/Q== +=zl/N +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 == --- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Tue Mar 31 09:57:10 2020 @@ -0,0 +1,3 @@ +SparkR_3.0.0.tar.gz: C2D9C0A5 E71C5B56 48AC15AA 998ABD06 2FDB4D5C D2B7C344 + B1949A7B 28508364 A9A45767 F2642F17 7EBFF4B0 55823EBD + BE76A2CE 5604660F 62D1654D 8271287B Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Tue Mar 31 09:57:10 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6C/0wQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZkfTD/4zQ5FuCr+giluZHaBnaZy7PAtSkoTjAWKX +8zObXESsoTlIIjHEpBUmUU6O0tZODFOF7Zau9HkftroGurYxpTWE5nX0e//71JuC +smBWLCgAeOlNEdeZUd2zm7pPWJfwRpsOcEfexb+RvaFQriw559Erxb5NoWHFIkg/ +tsjtjitMqLxcMlzZW7A/89zqmrnzBu1vhh/q8STzA0Ub6Jq+JzD4e6yatYAzjRj3 ++Um7+NL+g/2tmweH8f9TtYzQFcowm6DdXi53fWZX55oVc1xBRTNuSnAdCJlkgEPg +nUxEcuXUvHn/NbNNHPBwP6xMKyKqJu8+4vNLzr2ZxaxArPYF2FqTl8sFNxwVBM1Y +PnKun7iZiLq5JqC2OopiDa8FJP0JQkYVyBWAx3BOscsAELfdlZHlPdekcLE6YHHV +pde79YJ0tzUFIdH/Ulw4Jag4Ixunrg+ajmLS8n9ncpX0I81Zv8IJDaBf0cBboFw8 +kTqAvNkcsoGdRn1OiQnlE2IUib/R0fk7MktOyoZpfKzbCzxBZgLTO4FKTbRCydQX +I8UhuRhELHCI7YXJHwbk0Swp6+h36dUQtLxFfD/OZdDQABOK+nEVjNsBIHb7ULDB +pCckj8HBHwaynvNLogS1KJHThW8LEXAmVQFCD39XTNMnhfCUePyzlAC4RPByIFR4 +yD6VQ7bJDA
svn commit: r38753 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin
Date: Tue Mar 31 07:25:15 2020
New Revision: 38753

Log:
retry

Removed:
    dev/spark/v3.0.0-rc1-bin/
svn commit: r38740 - /dev/spark/v3.0.0-rc1-bin/
Author: rxin Date: Mon Mar 30 16:00:46 2020 New Revision: 38740 Log: Apache Spark v3.0.0-rc1 Added: dev/spark/v3.0.0-rc1-bin/ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz (with props) dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7-hive1.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz (with props) dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.asc dev/spark/v3.0.0-rc1-bin/spark-3.0.0.tgz.sha512 Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPMQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9Zr8LD/9WOO4mDufkmhhXk78zWAyhRjJpG0Kjuvla +KEnx8MK4MUtr77cQsmVLgj+FXFwmUvtZTZXHJX704Jk6xAAFXzii4EwIfk46wka0 +CY0arEleHJ6MBohLbOVW3sp86LduQBBd+dmBbIh7spJjd054RRqsAe8sVx0uqezD +y4Fv+LM0B7kQhHdhsYymVClAwgwKOwecdks0l9PonE9YwyJixMEOZwxxk4aaRNwR +VUH6X4mHlpWiQ+zHWTAmE7aOvjOwxQqciqtmgzLLRlDjuTtz160XLthUneoOVoDw +spphs7pMpj8r4T9BZQCeIiuRvE5VeT6037Uz03X56xhzEvna9+0/frHR/Vb88gW8 +U5YJio4p8h286vLwb0X48K7lyfd60VM0kyfh31xl1ZppdAFXhV9qA7435wn6R4NU +1zi/oXnHOgAWW037C+QFXpPnKzCY3BpmLw3uAGMgYRA+2NqrAT2HE8vmnlxJkrBS +JT3OlJCCkIw2yitPN5zZaWZLpbvT07wFEH8KFoh7Wgs4FBl1mDeyGT53RhbSHjy1 ++i85E6g9366CZNoD3bSUlPlY9iOtP4QK4Qp+VOn1j13Bu3BE9Fpuprani1ESsGME +16qzwf5It3TVWK9czXqa8HBJvlrjaEInloWThmSysYFweKIRT+8CEu9+KyakTKVL +fnGKXfbXzQ== +=0ZBt +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 == --- dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-rc1-bin/SparkR_3.0.0.tar.gz.sha512 Mon Mar 30 16:00:46 2020 @@ -0,0 +1,3 @@ +SparkR_3.0.0.tar.gz: A4828C8D BA3BA1AA 116EEA62 D7028B85 85FF87AE 8AE9F0B5 + 421F1A3E E5E04F19 F1D4F0A6 144CEF29 8D690FC8 D9836830 + 4518FF9E 96004114 1083326B 84B5C0EC Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc == --- dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc (added) +++ dev/spark/v3.0.0-rc1-bin/pyspark-3.0.0.tar.gz.asc Mon Mar 30 16:00:46 2020 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJEBAABCgAuFiEESovaSObiEqc0YyUC3qlj4uk0fWYFAl6CCPUQHHJ4aW5AYXBh +Y2hlLm9yZwAKCRDeqWPi6TR9ZmRGD/9UkePDo4IawkYALJoaqpwnjp1Md3RP5dbK +l/x1VLfHzAkbYQo+tKe692koHo45tE0izt+99humvZT7SjP4sVPHuR16Ik0gE6h0 +Yn8CG4Qsof30Se9feg6EllACBDEvueGlcchHN+aPyYJoLjajAzfH/5P6fC9rHe5Z +d3aYd93cqYtIKbDtQ6fxnI387wTmWkVKAXWNB7K5iEB8KFjzCjGeyac5JbnYBC6G +Y9uWcxqQ+3XV2SIfDQuxFuj421RBx2IIu56qJLgVEzcs8yLh4APM29DfYv7YcRGg +ILex3j8SWjgqG1rdDhc2U/SeakR/rErJ+oebxD9dTC19wMTnp37cgS0HgtWLHaU2 +RvxaMdAvF3GjN2LFhSRht/uZV350O3EI+L6ye9WauXzaK4iD7Mi5x7BIBN1csNWn +MW0B+goqTpzvC78h5R2ETCw1xmAarjKmdLKf3AUuqGeobv/7+4sLuwq+PSyrTgUi +BHPIgkYYk+EhHryB6wLkKYRXWKKmMyGCl+5HLYPuY4GyZm4rwc2et8v1pX3RvcCF +NoOcg/TZgn6+Tz0OjUm4TARs9RkbJEhKk1EWKCFvPalhenLbHHOvDJJPoqp3LNVT +/HQ1f1JRWqXWfc/O1BR9CRFNbZTxKorPxMXIEYn583lufZyvWiyAnYKD6ev0UAdB +/iwwQeeM/Q
[spark] 01/01: Preparing development version 3.0.1-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit fc5079841907443369af98b17c20f1ac24b3727d Author: Reynold Xin AuthorDate: Mon Mar 30 08:42:27 2020 + Preparing development version 3.0.1-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index c8cb1c3..3eff30b 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.0.0 +Version: 3.0.1 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), diff --git a/assembly/pom.xml b/assembly/pom.xml index 0a52a00..8bef9d8 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index fa4fcb1f..fc1441d 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 14a1b7d..de2a6fb 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index e75a843..6c0c016 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 004af0a..b8df191 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index a35156a..8119709 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0 +3.0.1-SNAPSHOT ../../pom.xml diff --g
[spark] branch branch-3.0 updated (5687b31 -> fc50798)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 5687b31 [SPARK-30532] DataFrameStatFunctions to work with TABLE.COLUMN syntax add 6550d0d Preparing Spark release v3.0.0-rc1 new fc50798 Preparing development version 3.0.1-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 39 files changed, 40 insertions(+), 40 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] tag v3.0.0-rc1 created (now 6550d0d)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to tag v3.0.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. at 6550d0d (commit) This tag includes the following new commits: new 6550d0d Preparing Spark release v3.0.0-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/01: Preparing Spark release v3.0.0-rc1
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to tag v3.0.0-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git commit 6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1 Author: Reynold Xin AuthorDate: Mon Mar 30 08:42:10 2020 + Preparing Spark release v3.0.0-rc1 --- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 2 +- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 38 files changed, 38 insertions(+), 38 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 193ad3d..0a52a00 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index a1c8a8e..fa4fcb1f 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 163c250..14a1b7d 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index a6d9981..e75a843 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 76a402b..004af0a 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 3c3c0d2..a35156a 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index 883b73a..dedc7df 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml index 93a4f67..ebb0525 100644 --- a/common/unsafe/pom.xml +++ b/common/unsafe/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.0.0-SNAPSHOT +3.0.0 ../../pom.xml diff --git
svn commit: r38725 - /dev/spark/KEYS
Author: rxin Date: Mon Mar 30 07:26:00 2020 New Revision: 38725 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Mon Mar 30 07:26:00 2020 @@ -1167,3 +1167,61 @@ rMA+YcuC9o2K7dKjVv3KinQ2Tiv4TVxyTjcyZurg 0TbepIdiQlc= =wdlY -END PGP PUBLIC KEY BLOCK- + +pub rsa4096 2020-03-30 [SC] + 4A8BDA48E6E212A734632502DEA963E2E9347D66 +uid [ultimate] Reynold Xin (CODE SIGNING KEY) +sub rsa4096 2020-03-30 [E] + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBF6BkJkBEACmRKcV6c575E6jOyZBwLteV7hJsETNYx9jMkENiyeyTFJ3A8Hg ++gPAmoU6jvzugR98qgVSH0uj/HZH1zEkJx049+OHwBcZ48mGJakIaKcg3k1CPRTL +VDRWg7M4P7nQisMHsPHrdGPJFVBE7Mn6pafuRZ46gtnXf2Ec1EsvMBOYjRNt6nSg +GvoQdiv5SjUuwxfrw7CICj1agxwLarBcWpIF6PMU7yG+XjTIrSM63KuuV+fOZvKM +AdjwwUNNj2aOkprPHfmFIgSnEMsxvoJQNqYTaWzwT8WAyW1qTd0LhYYDTnb4J+j2 +BxgG5ASHYpsLQ1Moy+lYsTxWsoZMvqTqv/h+Mlb8fiUTiYppeMnLzxtI/t8Trvt8 +rXNGSkNd8dM5uqJ9Ba2MS6UB6EZUd5e7aPy8z5ThlhygRjLk0527O4BYAWlZw5F8 +egq/X0liCeRHoFUsyNnuQYSqo2spdTIV2ExKo/hEF1FgbXF6s1v/TcfzS0PkSYEH +5yhKYoEkYOXIneIjUasy8xM9O2578NsVu1GH0n+E29KDA0w+QKwpbjgb9VWKCjk1 +CPvK7oi3DKA4A28w/h5jI9Xzb343L0gb+IhdgL5lNWp2HoSy+y7Smnbz6IchjAP7 +zCtQ9ZJCLdXgCtDlXUeF+TXzEfKUYwa0jnha/fArM3PVGvQlWdpVhe/oLQARAQAB +tDBSZXlub2xkIFhpbiAoQ09ERSBTSUdOSU5HIEtFWSkgPHJ4aW5AYXBhY2hlLm9y +Zz6JAk4EEwEIADgWIQRKi9pI5uISpzRjJQLeqWPi6TR9ZgUCXoGQmQIbAwULCQgH +AgYVCgkICwIEFgIDAQIeAQIXgAAKCRDeqWPi6TR9ZrBJEACW92VdruNL+dYYH0Cu +9oxZx0thCE1twc/6rvgvIj//0kZ4ZA6RoDId8vSmKSkB0GwMT7daIoeIvRTiEdMQ +Wai7zqvNEdT1qdNn7MfN1rveN1tBNVndzbZ8S8Nz4sqZ/8R3wG90c2XLwno3joXA +FhFRfVa+TWI1Ux84/ZXuzD14f54dorVo0CT51CnU67ERBAijl7UugPM3Fs7ApU/o +SWCMq7ScPde81jmgMqBDLcj/hueCOTU5m8irOGGY439qEF+H41I+IB60yzAS4Gez +xZl55Mv7ZKdwWtCcwtUYIm4R8NNu4alTxUpxw4ttRW3Kzue78TOIMTWTwRKrP5t2 +yq9bMT1fSO7h/Ntn8dXUL0EM/h+6k5py5Kr0+mrV/s0Z530Fit6AC/ReWV6hSGdk +F1Z1ECa4AoUHqtoQKL+CNgO2qlJn/sKj3g10NiSwqUdUuxCSOpsY72udRLG9tfkB +OwW3lTKLp66gYYE3nYaHzJKGdRs7aJ8RRALMQkadsyqpdVMp+Yvbj/3Hn3uB3jTt +S+RolH545toeuhXaiIWlm2434oHW6QjzpPwaNp5AiWm+vMfPkhhCX6WT0jv9nEtM +kJJVgwlWNKYEW9nLaIRMWWONSy9aJapZfLW0XDiKidibPHqNFih9z49eDVLobi5e +mzmOFkKFxs9D4sg9oVmId6Y9SbkCDQRegZCZARAA5ZMv1ki5mKJVpASRGfTHVH5o +9HixwJOinkHjSK3zFpuvh0bs+rKZL2+TUXci9Em64xXuYbiGH3YgH061H9tgAMaN +iSIFGPlbBPbduJjdiUALqauOjjCIoWJLyuAC25zSGCeAwzQiRXN6VJUYwjQnDMDG +8iUyL+IdXjq2T6vFVZGR/uVteRqqvEcg9km6IrFmXefqfry4hZ5a7SbmThCHqGxx +5Oy+VkWw1IP7fHIUdC9ie45X6n08yC2BfWI4+RBny8906pSXEN/ag0Yw7vWkiyuK +wZsoe0pRczV8mx6QF2+oJjRMtziKYW72jKE9a/DXXzQ3Luq5gyZeq0cluYNGHVdj +ijA2ORNLloAfGjVGRKVznUFN8LMkcxm4jiiHKRkZEcjgm+1tRzGPufFidyhQIYO2 +YCOpnPQh5IXznb3RZ0JqJcXdne+7Nge85URTEMmMyx5kXvD03ZmUObshDL12YoM3 +bGzObo6jYg+h38Xlx9+9QAwGkf+gApIPI8KqPAVyP6s60AR4iR6iehEOciz7h6/b +T9bKMw0w9cvyJzY1IJsy2sQYFwNyHYWQkyDciRAmIwriHhBDfXdBodF95V3uGbIp +DZw3jVxcgJWKZ3y65N1aCguEI1fyy9JU12++GMBa+wuv9kdhSoj2qgInFB1VXGC7 +bBlRnHB44tsFTBEqqOcAEQEAAYkCNgQYAQgAIBYhBEqL2kjm4hKnNGMlAt6pY+Lp +NH1mBQJegZCZAhsMAAoJEN6pY+LpNH1mwIYQAIRqbhEjL6uMxM19OMPDydbhiWoI +8BmoqzsvRNF9VidjPRicYJ5JL5FFvvTyT6g87L8aRhiAdX/la92PdJ9DTS3sfIKF +pIcUDFybKgk4pmGWl0fNIwEjHewf6HlndCFmVuPe32V/ZkCwb58dro15xzxblckB +kgsqb0Xbfz/3Iwlqr5eTKH5iPrDFcYKy1ODcFmXS+udMm5uwn+d/RNmj8B3kgwrw +brs53264qdWbfsxGPC1ZkDNNSRyIy6wGvc/diRm4TSV/Lmd5OoDX4UkPJ++JhGoO +cYKxc2KzrEZxzMgJ3xFRs3zeymOwtgXUU1GBCuD7uxr1vacFwUV+9ymTeyUdTxB3 ++/DzxYOJGQL/3IXlyQ2azoCWUpCjW0MFM1OolragOFJeQ+V0xrlOiXXAFfHo0KPG +y0QdK810Ok+XYR6U9Y7yb6tYDgi+w9r46XjurdiZnUxxLUpFG++tSgBQ5X4y2UGw +C4n0T8/jn6KIUZ0kx51ZZ6CEChjBt+AU+HCnw2sZfgq8Nlos95tw2MT6kn8BrY68 +n297ev/1T6B0OasQaw3Itw29+T+FdzdU4c6XW/rC6VAlBikWIS5zCT//vAeBacxL 
+HYoqwKL52HzG121lfWXhx5vNF4bg/fKrFEOy2Wp1fMG6nRcuUUROvieD6ZU4ZrLA +NjpTIP+lOkfxRwUi +=rggH +-END PGP PUBLIC KEY BLOCK- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch test-branch deleted (was 0f8b07e)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git. was 0f8b07e test This change permanently discards the following revisions: discard 0f8b07e test - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch test-branch created (now 0f8b07e)
This is an automated email from the ASF dual-hosted git repository. rxin pushed a change to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git. at 0f8b07e test This branch includes the following new commits: new 0f8b07e test The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/01: test
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch test-branch in repository https://gitbox.apache.org/repos/asf/spark.git commit 0f8b07e5034af2819b75b53aadffda82ae0c31b8 Author: Reynold Xin AuthorDate: Fri Feb 1 13:28:18 2019 -0800 test --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 271f2f5..2c1e02a 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ For general development tips, including info on developing Spark using an IDE, s The easiest way to start using Spark is through the Scala shell: -./bin/spark-shell +./bin/spark-shella Try the following command, which should return 1000: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter
Repository: spark Updated Branches: refs/heads/master 9fdc7a840 -> cb368f2c2 [SPARK-26142] followup: Move sql shuffle read metrics relatives to SQLShuffleMetricsReporter ## What changes were proposed in this pull request? Follow up for https://github.com/apache/spark/pull/23128, move sql read metrics relatives to `SQLShuffleMetricsReporter`, in order to put sql shuffle read metrics relatives closer and avoid possible problem about forgetting update SQLShuffleMetricsReporter while new metrics added by others. ## How was this patch tested? Existing tests. Closes #23175 from xuanyuanking/SPARK-26142-follow. Authored-by: Yuanjian Li Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb368f2c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb368f2c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb368f2c Branch: refs/heads/master Commit: cb368f2c2964797d7313d3a4151e2352ff7847a9 Parents: 9fdc7a8 Author: Yuanjian Li Authored: Thu Nov 29 12:09:30 2018 -0800 Committer: Reynold Xin Committed: Thu Nov 29 12:09:30 2018 -0800 -- .../exchange/ShuffleExchangeExec.scala | 4 +- .../org/apache/spark/sql/execution/limit.scala | 6 +-- .../spark/sql/execution/metric/SQLMetrics.scala | 20 .../metric/SQLShuffleMetricsReporter.scala | 50 .../execution/UnsafeRowSerializerSuite.scala| 4 +- 5 files changed, 47 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala index 8938d93..c9ca395 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala @@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, BoundReference, Uns import org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering import org.apache.spark.sql.catalyst.plans.physical._ import org.apache.spark.sql.execution._ -import org.apache.spark.sql.execution.metric.SQLMetrics +import org.apache.spark.sql.execution.metric.{SQLMetrics, SQLShuffleMetricsReporter} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.StructType import org.apache.spark.util.MutablePair @@ -49,7 +49,7 @@ case class ShuffleExchangeExec( override lazy val metrics = Map( "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size") - ) ++ SQLMetrics.getShuffleReadMetrics(sparkContext) + ) ++ SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) override def nodeName: String = { val extraInfo = coordinator match { http://git-wip-us.apache.org/repos/asf/spark/blob/cb368f2c/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala index ea845da..e9ab7cd 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe import org.apache.spark.sql.catalyst.plans.physical._ import 
org.apache.spark.sql.catalyst.util.truncatedString import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec -import org.apache.spark.sql.execution.metric.SQLMetrics +import org.apache.spark.sql.execution.metric.SQLShuffleMetricsReporter /** * Take the first `limit` elements and collect them to a single partition. @@ -38,7 +38,7 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode override def outputPartitioning: Partitioning = SinglePartition override def executeCollect(): Array[InternalRow] = child.executeTake(limit) private val serializer: Serializer = new UnsafeRowSerializer(child.output.size) - override lazy val metrics = SQLMetrics.getShuffleReadMetrics(sparkContext) + override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext) protected override def doExecute(): RDD[InternalRow] = { val locallyLimited = child.execute().mapPartitionsInternal(_.take(limit)) val shuffled = new ShuffledRowRDD( @@ -154,7 +154,7 @@ case class TakeOrderedAndProjectExec(
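The motivation above, keeping the creation and the updating of the SQL shuffle read metrics in one class so that a newly added metric cannot be created in one place but forgotten in the other, can be sketched in a few lines of self-contained Scala. All names below are illustrative only, not the actual Spark classes:

```scala
// A mutable counter standing in for Spark's SQLMetric (illustrative only).
final class Metric(val name: String) {
  private var value = 0L
  def add(v: Long): Unit = value += v
  def get: Long = value
}

object ShuffleReadMetricsSketch {
  private val names = Seq("recordsRead", "localBlocksFetched", "remoteBytesRead")

  // The single place where every shuffle read metric is created.
  def create(): Map[String, Metric] =
    names.map(n => n -> new Metric(n)).toMap
}

// The reporter lives next to the creation logic above, so adding a metric
// forces the author to touch both the list and the reporter in one file.
final class ShuffleReadMetricsReporterSketch(metrics: Map[String, Metric]) {
  def incRecordsRead(v: Long): Unit = metrics("recordsRead").add(v)
  def incLocalBlocksFetched(v: Long): Unit = metrics("localBlocksFetched").add(v)
  def incRemoteBytesRead(v: Long): Unit = metrics("remoteBytesRead").add(v)
}
```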
spark git commit: [SPARK-26141] Enable custom metrics implementation in shuffle write
Repository: spark Updated Branches: refs/heads/master 85383d29e -> 6a064ba8f [SPARK-26141] Enable custom metrics implementation in shuffle write ## What changes were proposed in this pull request? This is the write side counterpart to https://github.com/apache/spark/pull/23105 ## How was this patch tested? No behavior change expected, as it is a straightforward refactoring. Updated all existing test cases. Closes #23106 from rxin/SPARK-26141. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a064ba8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a064ba8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a064ba8 Branch: refs/heads/master Commit: 6a064ba8f271d5f9d04acd41d0eea50a5b0f5018 Parents: 85383d2 Author: Reynold Xin Authored: Mon Nov 26 22:35:52 2018 -0800 Committer: Reynold Xin Committed: Mon Nov 26 22:35:52 2018 -0800 -- .../sort/BypassMergeSortShuffleWriter.java| 11 +-- .../spark/shuffle/sort/ShuffleExternalSorter.java | 18 -- .../spark/shuffle/sort/UnsafeShuffleWriter.java | 9 + .../spark/storage/TimeTrackingOutputStream.java | 7 --- .../spark/executor/ShuffleWriteMetrics.scala | 13 +++-- .../apache/spark/scheduler/ShuffleMapTask.scala | 3 ++- .../org/apache/spark/shuffle/ShuffleManager.scala | 6 +- .../spark/shuffle/sort/SortShuffleManager.scala | 10 ++ .../org/apache/spark/storage/BlockManager.scala | 7 +++ .../spark/storage/DiskBlockObjectWriter.scala | 4 ++-- .../spark/util/collection/ExternalSorter.scala| 4 ++-- .../shuffle/sort/UnsafeShuffleWriterSuite.java| 6 -- .../scala/org/apache/spark/ShuffleSuite.scala | 12 .../sort/BypassMergeSortShuffleWriterSuite.scala | 16 project/MimaExcludes.scala| 7 ++- 15 files changed, 79 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a064ba8/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java -- diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java index b020a6d..fda33cd 100644 --- a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java +++ b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java @@ -37,12 +37,11 @@ import org.slf4j.LoggerFactory; import org.apache.spark.Partitioner; import org.apache.spark.ShuffleDependency; import org.apache.spark.SparkConf; -import org.apache.spark.TaskContext; -import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.scheduler.MapStatus; import org.apache.spark.scheduler.MapStatus$; import org.apache.spark.serializer.Serializer; import org.apache.spark.serializer.SerializerInstance; +import org.apache.spark.shuffle.ShuffleWriteMetricsReporter; import org.apache.spark.shuffle.IndexShuffleBlockResolver; import org.apache.spark.shuffle.ShuffleWriter; import org.apache.spark.storage.*; @@ -79,7 +78,7 @@ final class BypassMergeSortShuffleWriter extends ShuffleWriter { private final int numPartitions; private final BlockManager blockManager; private final Partitioner partitioner; - private final ShuffleWriteMetrics writeMetrics; + private final ShuffleWriteMetricsReporter writeMetrics; private final int shuffleId; private final int mapId; private final Serializer serializer; @@ -103,8 +102,8 @@ final class BypassMergeSortShuffleWriter extends ShuffleWriter { IndexShuffleBlockResolver shuffleBlockResolver, 
BypassMergeSortShuffleHandle handle, int mapId, - TaskContext taskContext, - SparkConf conf) { + SparkConf conf, + ShuffleWriteMetricsReporter writeMetrics) { // Use getSizeAsKb (not bytes) to maintain backwards compatibility if no units are provided this.fileBufferSize = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024; this.transferToEnabled = conf.getBoolean("spark.file.transferTo", true); @@ -114,7 +113,7 @@ final class BypassMergeSortShuffleWriter extends ShuffleWriter { this.shuffleId = dep.shuffleId(); this.partitioner = dep.partitioner(); this.numPartitions = partitioner.numPartitions(); -this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics(); +this.writeMetrics = writeMetrics; this.serializer = dep.serializer(); this.shuffleBlockResolver = shuffleBlockResolver; } http://git-wip-us.apache.org/repos/asf/spark/blob/6a064
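The shape of this refactoring, a writer that accepts whatever metrics reporter its caller hands it instead of pulling a concrete metrics object out of TaskContext, is easy to see in reduced form. This is a sketch with made-up names, not the real Spark interfaces:

```scala
// The writer depends only on this small interface, so callers decide how
// metrics are recorded (task metrics, a test double, a logger, ...).
trait WriteMetricsReporter {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

final class RecordWriter(reporter: WriteMetricsReporter) {
  def write(records: Iterator[Array[Byte]]): Unit =
    records.foreach { rec =>
      // ... append rec to the shuffle output file here ...
      reporter.incBytesWritten(rec.length.toLong)
      reporter.incRecordsWritten(1L)
    }
}

// A trivial implementation, e.g. for a unit test.
final class CountingReporter extends WriteMetricsReporter {
  var bytes = 0L
  var records = 0L
  def incBytesWritten(v: Long): Unit = bytes += v
  def incRecordsWritten(v: Long): Unit = records += v
}

val reporter = new CountingReporter
new RecordWriter(reporter).write(Iterator(Array[Byte](1, 2, 3)))
assert(reporter.bytes == 3 && reporter.records == 1)
```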
spark git commit: [SPARK-26129][SQL] Instrumentation for per-query planning time
Repository: spark Updated Branches: refs/heads/master 6bbdf34ba -> 07a700b37 [SPARK-26129][SQL] Instrumentation for per-query planning time ## What changes were proposed in this pull request? We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases. ## How was this patch tested? Added unit tests and end-to-end integration tests. Closes #23096 from rxin/SPARK-26129. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07a700b3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07a700b3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07a700b3 Branch: refs/heads/master Commit: 07a700b3711057553dfbb7b047216565726509c7 Parents: 6bbdf34 Author: Reynold Xin Authored: Wed Nov 21 16:41:12 2018 +0100 Committer: Reynold Xin Committed: Wed Nov 21 16:41:12 2018 +0100 -- .../sql/catalyst/QueryPlanningTracker.scala | 127 +++ .../spark/sql/catalyst/analysis/Analyzer.scala | 22 ++-- .../spark/sql/catalyst/rules/RuleExecutor.scala | 19 ++- .../catalyst/QueryPlanningTrackerSuite.scala| 78 .../sql/catalyst/analysis/AnalysisTest.scala| 3 +- .../ResolveGroupingAnalyticsSuite.scala | 3 +- .../analysis/ResolvedUuidExpressionsSuite.scala | 10 +- .../scala/org/apache/spark/sql/Dataset.scala| 9 ++ .../org/apache/spark/sql/SparkSession.scala | 6 +- .../spark/sql/execution/QueryExecution.scala| 21 ++- .../QueryPlanningTrackerEndToEndSuite.scala | 52 .../apache/spark/sql/hive/test/TestHive.scala | 16 ++- 12 files changed, 338 insertions(+), 28 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/07a700b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala new file mode 100644 index 000..420f2a1 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst + +import scala.collection.JavaConverters._ + +import org.apache.spark.util.BoundedPriorityQueue + + +/** + * A simple utility for tracking runtime and associated stats in query planning. + * + * There are two separate concepts we track: + * + * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis, + * optimizationm and physical planning (just planning). + * + * 2. Rules: These are the individual Catalyst rules that we track. 
In addition to time, we also + * track the number of invocations and effective invocations. + */ +object QueryPlanningTracker { + + // Define a list of common phases here. + val PARSING = "parsing" + val ANALYSIS = "analysis" + val OPTIMIZATION = "optimization" + val PLANNING = "planning" + + class RuleSummary( +var totalTimeNs: Long, var numInvocations: Long, var numEffectiveInvocations: Long) { + +def this() = this(totalTimeNs = 0, numInvocations = 0, numEffectiveInvocations = 0) + +override def toString: String = { + s"RuleSummary($totalTimeNs, $numInvocations, $numEffectiveInvocations)" +} + } + + /** + * A thread local variable to implicitly pass the tracker around. This assumes the query planner + * is single-threaded, and avoids passing the same tracker context in every function call. + */ + private val localTracker = new ThreadLocal[QueryPlanningTracker]() { +override def initialValue: QueryPlanningTracker = null + } + + /** Returns the current tra
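Stripped of the thread-local plumbing and per-rule bookkeeping shown above, the core mechanism is a map from phase name to accumulated wall-clock time. A compressed, self-contained sketch (illustrative API, not Spark's):

```scala
import scala.collection.mutable

final class PhaseTracker {
  private val phaseTimesNs = mutable.Map.empty[String, Long]

  // Runs `f` and charges its wall-clock time to `phase`; repeated calls
  // for the same phase accumulate.
  def measurePhase[T](phase: String)(f: => T): T = {
    val start = System.nanoTime()
    try f
    finally {
      val elapsed = System.nanoTime() - start
      phaseTimesNs(phase) = phaseTimesNs.getOrElse(phase, 0L) + elapsed
    }
  }

  def summary(): String =
    phaseTimesNs.map { case (p, ns) => f"$p: ${ns / 1e6}%.1f ms" }.mkString(", ")
}

val t = new PhaseTracker
val plan = t.measurePhase("analysis") { "resolved plan" } // stand-in for real work
t.measurePhase("optimization") { plan.toUpperCase }
println(t.summary()) // e.g. "analysis: 0.1 ms, optimization: 0.0 ms"
```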
spark-website git commit: Use Heilmeier Catechism for SPIP template.
Repository: spark-website Updated Branches: refs/heads/asf-site e4b87718d -> 005a2a0d1 Use Heilmeier Catechism for SPIP template. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/005a2a0d Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/005a2a0d Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/005a2a0d Branch: refs/heads/asf-site Commit: 005a2a0d1d88c893518d98cddcb7d373a562b339 Parents: e4b8771 Author: Reynold Xin Authored: Wed Oct 24 11:51:43 2018 -0700 Committer: Reynold Xin Committed: Thu Oct 25 11:25:30 2018 -0700 -- improvement-proposals.md| 34 ++ site/improvement-proposals.html | 32 2 files changed, 42 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark-website/blob/005a2a0d/improvement-proposals.md -- diff --git a/improvement-proposals.md b/improvement-proposals.md index 8fab696..55d57d9 100644 --- a/improvement-proposals.md +++ b/improvement-proposals.md @@ -11,7 +11,7 @@ navigation: The purpose of an SPIP is to inform and involve the user community in major improvements to the Spark codebase throughout the development process, to increase the likelihood that user needs are met. -SPIPs should be used for significant user-facing or cross-cutting changes, not small incremental improvements. When in doubt, if a committer thinks a change needs an SPIP, it does. +SPIPs should be used for significant user-facing or cross-cutting changes, not small incremental improvements. When in doubt, if a committer thinks a change needs an SPIP, it does. What is a SPIP? @@ -48,30 +48,40 @@ Any community member can help by discussing whether an SPIP is SPIP Process Proposing an SPIP -Anyone may propose an SPIP, using the template below. Please only submit an SPIP if you are willing to help, at least with discussion. +Anyone may propose an SPIP, using the document template below. Please only submit an SPIP if you are willing to help, at least with discussion. After a SPIP is created, the author should email d...@spark.apache.org to notify the community of the SPIP, and discussions should ensue on the JIRA ticket. If an SPIP is too small or incremental and should have been done through the normal JIRA process, a committer should remove the SPIP label. -Template for an SPIP +SPIP Document Template - -Background and Motivation: What problem is this solving? +A SPIP document is a short document with a few questions, inspired by the Heilmeier Catechism: -Target Personas: Examples include data scientists, data engineers, library developers, devops. A single SPIP can have multiple target personas. +Q1. What are you trying to do? Articulate your objectives using absolutely no jargon. -Goals: What must this allow users to do, that they can't currently? +Q2. What problem is this proposal NOT designed to solve? -Non-Goals: What problem is this proposal not designed to solve? +Q3. How is it done today, and what are the limits of current practice? -Proposed API Changes: Optional section defining APIs changes, if any. Backward and forward compatibility must be taken into account. +Q4. What is new in your approach and why do you think it will be successful? + +Q5. Who cares? If you are successful, what difference will it make? + +Q6. What are the risks? + +Q7. How long will it take? + +Q8. What are the mid-term and final “exams” to check for success? + +Appendix A. Proposed API Changes. Optional section defining APIs changes, if any.
Backward and forward compatibility must be taken into account. + +Appendix B. Optional Design Sketch: How are the goals going to be accomplished? Give sufficient technical detail to allow a contributor to judge whether it's likely to be feasible. Note that this is not a full design document. + +Appendix C. Optional Rejected Designs: What alternatives were considered? Why were they rejected? If no alternatives have been considered, the problem needs more thought. -Optional Design Sketch: How are the goals going to be accomplished? Give sufficient technical detail to allow a contributor to judge whether it's likely to be feasible. This is not a full design document. -Optional Rejected Designs: What alternatives were considered? Why were they rejected? If no alternatives have been considered, the problem needs more thought. - Discussing an SPIP http://git-wip-us.apache.org/repos/asf/spark-website/blob/005a2a0d/site/improvement-proposals.html -- diff --git a/site/improvement-proposals.html
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/branch-2.4 99ae693b3 -> 535bf1cc9 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin (cherry picked from commit 936c920347e196381b48bc3656ca81a06f2ff46d) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/535bf1cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/535bf1cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/535bf1cc Branch: refs/heads/branch-2.4 Commit: 535bf1cc9e6b54df7059ac3109b8cba30057d040 Parents: 99ae693 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:31 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/535bf1cc/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 3e9cde4..8b82fe1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1056,7 +1056,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled
Repository: spark Updated Branches: refs/heads/master 90e3955f3 -> 936c92034 [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option `spark.sql.streaming.noDataMicroBatchesEnabled` to `spark.sql.streaming.noDataMicroBatches.enabled` to be more consistent with rest of the configs. Unfortunately there is one streaming config called `spark.sql.streaming.metricsEnabled`. For that one we should just use a fallback config and change it in a separate patch. ## How was this patch tested? Made sure no other references to this config are in the code base: ``` > git grep "noDataMicro" sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: buildConf("spark.sql.streaming.noDataMicroBatches.enabled") ``` Closes #22476 from rxin/SPARK-24157. Authored-by: Reynold Xin Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/936c9203 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/936c9203 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/936c9203 Branch: refs/heads/master Commit: 936c920347e196381b48bc3656ca81a06f2ff46d Parents: 90e3955 Author: Reynold Xin Authored: Wed Sep 19 18:51:20 2018 -0700 Committer: Reynold Xin Committed: Wed Sep 19 18:51:20 2018 -0700 -- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/936c9203/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b1e9b17..c3328a6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -1076,7 +1076,7 @@ object SQLConf { .createWithDefault(1L) val STREAMING_NO_DATA_MICRO_BATCHES_ENABLED = -buildConf("spark.sql.streaming.noDataMicroBatchesEnabled") +buildConf("spark.sql.streaming.noDataMicroBatches.enabled") .doc( "Whether streaming micro-batch engine will execute batches without data " + "for eager state management for stateful streaming queries.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
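For the `spark.sql.streaming.metricsEnabled` case mentioned above, a fallback config means the old key keeps working while a new dotted key takes precedence. A minimal sketch of that lookup rule, using a hypothetical new key name and a plain Map rather than Spark's internal ConfigBuilder:

```scala
// Resolution order: new key, then deprecated key, then the default.
def lookupWithFallback(
    settings: Map[String, String],
    newKey: String,
    deprecatedKey: String,
    default: Boolean): Boolean =
  settings.get(newKey)
    .orElse(settings.get(deprecatedKey))
    .map(_.toBoolean)
    .getOrElse(default)

val settings = Map("spark.sql.streaming.metricsEnabled" -> "true") // old name only
val enabled = lookupWithFallback(
  settings,
  newKey = "spark.sql.streaming.metrics.enabled",       // hypothetical new name
  deprecatedKey = "spark.sql.streaming.metricsEnabled", // existing name
  default = false)
assert(enabled) // the deprecated key is still honored
```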
spark git commit: add one supported type missing from the javadoc
Repository: spark Updated Branches: refs/heads/master e4fee395e -> c7c0b086a add one supported type missing from the javadoc ## What changes were proposed in this pull request? The supported java.math.BigInteger type is not mentioned in the javadoc of Encoders.bean() ## How was this patch tested? only Javadoc fix Please review http://spark.apache.org/contributing.html before opening a pull request. Author: James Yu Closes #21544 from yuj/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c7c0b086 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c7c0b086 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c7c0b086 Branch: refs/heads/master Commit: c7c0b086a0b18424725433ade840d5121ac2b86e Parents: e4fee39 Author: James Yu Authored: Fri Jun 15 21:04:04 2018 -0700 Committer: Reynold Xin Committed: Fri Jun 15 21:04:04 2018 -0700 -- sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c7c0b086/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala index 0b95a88..b47ec0b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala @@ -132,7 +132,7 @@ object Encoders { * - primitive types: boolean, int, double, etc. * - boxed types: Boolean, Integer, Double, etc. * - String - * - java.math.BigDecimal + * - java.math.BigDecimal, java.math.BigInteger * - time related: java.sql.Date, java.sql.Timestamp * - collection types: only array and java.util.List currently, map support is in progress * - nested java bean. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
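A short usage sketch for the type this doc fix mentions (the bean and its fields are invented for illustration; `Encoders.bean` itself is the real API):

```scala
import java.math.BigInteger
import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

// A Java-style bean: public no-arg constructor plus getter/setter pairs,
// which @BeanProperty generates for each var.
class LedgerEntry {
  @BeanProperty var id: Integer = _
  @BeanProperty var amount: BigInteger = _ // the newly documented type
}

val encoder = Encoders.bean(classOf[LedgerEntry])
// With a SparkSession in scope, e.g.:
//   spark.createDataset(java.util.Collections.singletonList(entry))(encoder)
println(encoder.schema) // the BigInteger field maps to DecimalType(38,0)
```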
[1/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.
Repository: spark-website Updated Branches: refs/heads/asf-site 91b561749 -> 658467248 http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/strata-exercises-now-available-online.html -- diff --git a/site/news/strata-exercises-now-available-online.html b/site/news/strata-exercises-now-available-online.html index 916f242..4f250a3 100644 --- a/site/news/strata-exercises-now-available-online.html +++ b/site/news/strata-exercises-now-available-online.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2014.html -- diff --git a/site/news/submit-talks-to-spark-summit-2014.html b/site/news/submit-talks-to-spark-summit-2014.html index 4f43c23..18f2642 100644 --- a/site/news/submit-talks-to-spark-summit-2014.html +++ b/site/news/submit-talks-to-spark-summit-2014.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-2016.html -- diff --git a/site/news/submit-talks-to-spark-summit-2016.html b/site/news/submit-talks-to-spark-summit-2016.html index 3163bab..3766932 100644 --- a/site/news/submit-talks-to-spark-summit-2016.html +++ b/site/news/submit-talks-to-spark-summit-2016.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-east-2016.html -- diff --git a/site/news/submit-talks-to-spark-summit-east-2016.html b/site/news/submit-talks-to-spark-summit-east-2016.html index 1984db7..b4a51a7 100644 --- a/site/news/submit-talks-to-spark-summit-east-2016.html +++ b/site/news/submit-talks-to-spark-summit-east-2016.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/submit-talks-to-spark-summit-eu-2016.html -- diff --git a/site/news/submit-talks-to-spark-summit-eu-2016.html b/site/news/submit-talks-to-spark-summit-eu-2016.html index 8e33a17..940bc6f 100644 --- a/site/news/submit-talks-to-spark-summit-eu-2016.html +++ b/site/news/submit-talks-to-spark-summit-eu-2016.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/two-weeks-to-spark-summit-2014.html -- diff --git a/site/news/two-weeks-to-spark-summit-2014.html b/site/news/two-weeks-to-spark-summit-2014.html index 3863298..d4e993a 100644 --- a/site/news/two-weeks-to-spark-summit-2014.html +++ b/site/news/two-weeks-to-spark-summit-2014.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/news/video-from-first-spark-development-meetup.html -- diff --git a/site/news/video-from-first-spark-development-meetup.html b/site/news/video-from-first-spark-development-meetup.html index 2be7f50..04151a8 100644 --- a/site/news/video-from-first-spark-development-meetup.html +++ b/site/news/video-from-first-spark-development-meetup.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/powered-by.html -- diff --git 
a/site/powered-by.html b/site/powered-by.html index 3449782..b303df0 100644 --- a/site/powered-by.html +++ b/site/powered-by.html @@ -66,7 +66,7 @@ - Lightning-fast cluster computing + Lightning-fast unified analytics engine http://git-wip-us.apache.org/repos/asf/spark-website/blob/65846724/site/release-process.html -- diff --git a/site/release-process.html b/site/release-process.html index
[2/2] spark-website git commit: Update text/wording to more "modern" Spark and more consistent.
Update text/wording to more "modern" Spark and more consistent. 1. Use DataFrame examples. 2. Reduce explicit comparison with MapReduce, since the topic does not really come up. 3. More focus on analytics rather than "cluster compute". 4. Update committer affiliation. 5. Make it more clear Spark runs in diverse environments (especially on MLlib page). There are a lot that needs to be done that I don't have time today, e.g. refer to Structured Streaming. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/65846724 Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/65846724 Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/65846724 Branch: refs/heads/asf-site Commit: 658467248b278b109bc3d2594b0ef08ff0c727cb Parents: 91b5617 Author: Reynold XinAuthored: Thu Apr 12 12:56:05 2018 -0700 Committer: Reynold Xin Committed: Thu Apr 12 12:56:05 2018 -0700 -- _layouts/global.html| 2 +- committers.md | 22 +- index.md| 34 +-- mllib/index.md | 18 +- site/committers.html| 24 +- site/community.html | 2 +- site/contributing.html | 2 +- site/developer-tools.html | 2 +- site/documentation.html | 2 +- site/downloads.html | 2 +- site/examples.html | 2 +- site/faq.html | 2 +- site/history.html | 2 +- site/improvement-proposals.html | 2 +- site/index.html | 36 +-- site/mailing-lists.html | 4 +- site/mllib/index.html | 18 +- site/news/amp-camp-2013-registration-ope.html | 2 +- .../news/announcing-the-first-spark-summit.html | 2 +- .../news/fourth-spark-screencast-published.html | 2 +- site/news/index.html| 2 +- site/news/nsdi-paper.html | 2 +- site/news/one-month-to-spark-summit-2015.html | 2 +- .../proposals-open-for-spark-summit-east.html | 2 +- ...registration-open-for-spark-summit-east.html | 2 +- .../news/run-spark-and-shark-on-amazon-emr.html | 2 +- site/news/spark-0-6-1-and-0-5-2-released.html | 2 +- site/news/spark-0-6-2-released.html | 2 +- site/news/spark-0-7-0-released.html | 2 +- site/news/spark-0-7-2-released.html | 2 +- site/news/spark-0-7-3-released.html | 2 +- site/news/spark-0-8-0-released.html | 2 +- site/news/spark-0-8-1-released.html | 2 +- site/news/spark-0-9-0-released.html | 2 +- site/news/spark-0-9-1-released.html | 2 +- site/news/spark-0-9-2-released.html | 2 +- site/news/spark-1-0-0-released.html | 2 +- site/news/spark-1-0-1-released.html | 2 +- site/news/spark-1-0-2-released.html | 2 +- site/news/spark-1-1-0-released.html | 2 +- site/news/spark-1-1-1-released.html | 2 +- site/news/spark-1-2-0-released.html | 2 +- site/news/spark-1-2-1-released.html | 2 +- site/news/spark-1-2-2-released.html | 2 +- site/news/spark-1-3-0-released.html | 2 +- site/news/spark-1-4-0-released.html | 2 +- site/news/spark-1-4-1-released.html | 2 +- site/news/spark-1-5-0-released.html | 2 +- site/news/spark-1-5-1-released.html | 2 +- site/news/spark-1-5-2-released.html | 2 +- site/news/spark-1-6-0-released.html | 2 +- site/news/spark-1-6-1-released.html | 2 +- site/news/spark-1-6-2-released.html | 2 +- site/news/spark-1-6-3-released.html | 2 +- site/news/spark-2-0-0-released.html | 2 +- site/news/spark-2-0-1-released.html | 2 +- site/news/spark-2-0-2-released.html | 2 +- site/news/spark-2-1-0-released.html | 2 +- site/news/spark-2-1-1-released.html | 2 +- site/news/spark-2-1-2-released.html | 2 +- site/news/spark-2-2-0-released.html | 2 +- site/news/spark-2-2-1-released.html | 2 +- site/news/spark-2-3-0-released.html | 2 +- site/news/spark-2.0.0-preview.html | 2 +- 
.../spark-accepted-into-apache-incubator.html | 2 +- site/news/spark-and-shark-in-the-news.html | 2 +- site/news/spark-becomes-tlp.html| 2 +-
[2/2] spark-website git commit: Squashed commit of the following:
Squashed commit of the following: commit 8e2dd71cf5613be6f019bb76b46226771422a40e Merge: 8bd24fb6d 01f0b4e0c Author: Reynold XinDate: Fri Mar 16 10:24:54 2018 -0700 Merge pull request #104 from mateiz/history Add a project history page commit 01f0b4e0c1fe77781850cf994058980664201bce Author: Matei Zaharia Date: Wed Mar 14 23:29:01 2018 -0700 Add a project history page Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/a1d84bcb Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/a1d84bcb Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/a1d84bcb Branch: refs/heads/asf-site Commit: a1d84bcbf53099be51c39914528bea3f4e2735a0 Parents: 8bd24fb Author: Reynold Xin Authored: Fri Mar 16 10:26:14 2018 -0700 Committer: Reynold Xin Committed: Fri Mar 16 10:26:14 2018 -0700 -- _layouts/global.html| 1 + community.md| 24 +- history.md | 29 +++ index.md| 16 +- site/committers.html| 1 + site/community.html | 24 +- site/contributing.html | 1 + site/developer-tools.html | 1 + site/documentation.html | 1 + site/downloads.html | 1 + site/examples.html | 1 + site/faq.html | 1 + site/graphx/index.html | 1 + site/history.html | 235 +++ site/improvement-proposals.html | 1 + site/index.html | 17 +- site/mailing-lists.html | 1 + site/mllib/index.html | 1 + site/news/amp-camp-2013-registration-ope.html | 1 + .../news/announcing-the-first-spark-summit.html | 1 + .../news/fourth-spark-screencast-published.html | 1 + site/news/index.html| 1 + site/news/nsdi-paper.html | 1 + site/news/one-month-to-spark-summit-2015.html | 1 + .../proposals-open-for-spark-summit-east.html | 1 + ...registration-open-for-spark-summit-east.html | 1 + .../news/run-spark-and-shark-on-amazon-emr.html | 1 + site/news/spark-0-6-1-and-0-5-2-released.html | 1 + site/news/spark-0-6-2-released.html | 1 + site/news/spark-0-7-0-released.html | 1 + site/news/spark-0-7-2-released.html | 1 + site/news/spark-0-7-3-released.html | 1 + site/news/spark-0-8-0-released.html | 1 + site/news/spark-0-8-1-released.html | 1 + site/news/spark-0-9-0-released.html | 1 + site/news/spark-0-9-1-released.html | 1 + site/news/spark-0-9-2-released.html | 1 + site/news/spark-1-0-0-released.html | 1 + site/news/spark-1-0-1-released.html | 1 + site/news/spark-1-0-2-released.html | 1 + site/news/spark-1-1-0-released.html | 1 + site/news/spark-1-1-1-released.html | 1 + site/news/spark-1-2-0-released.html | 1 + site/news/spark-1-2-1-released.html | 1 + site/news/spark-1-2-2-released.html | 1 + site/news/spark-1-3-0-released.html | 1 + site/news/spark-1-4-0-released.html | 1 + site/news/spark-1-4-1-released.html | 1 + site/news/spark-1-5-0-released.html | 1 + site/news/spark-1-5-1-released.html | 1 + site/news/spark-1-5-2-released.html | 1 + site/news/spark-1-6-0-released.html | 1 + site/news/spark-1-6-1-released.html | 1 + site/news/spark-1-6-2-released.html | 1 + site/news/spark-1-6-3-released.html | 1 + site/news/spark-2-0-0-released.html | 1 + site/news/spark-2-0-1-released.html | 1 + site/news/spark-2-0-2-released.html | 1 + site/news/spark-2-1-0-released.html | 1 + site/news/spark-2-1-1-released.html | 1 + site/news/spark-2-1-2-released.html | 1 + site/news/spark-2-2-0-released.html | 1 + site/news/spark-2-2-1-released.html | 1 + site/news/spark-2-3-0-released.html | 1 + site/news/spark-2.0.0-preview.html | 1 + .../spark-accepted-into-apache-incubator.html | 1 + site/news/spark-and-shark-in-the-news.html | 1 + site/news/spark-becomes-tlp.html| 1 +
[1/2] spark-website git commit: Squashed commit of the following:
Repository: spark-website Updated Branches: refs/heads/asf-site 8bd24fb6d -> a1d84bcbf http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2016-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2016-agenda-posted.html b/site/news/spark-summit-june-2016-agenda-posted.html index ce68829..7947354 100644 --- a/site/news/spark-summit-june-2016-agenda-posted.html +++ b/site/news/spark-summit-june-2016-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2017-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2017-agenda-posted.html b/site/news/spark-summit-june-2017-agenda-posted.html index 5d2df4b..e4055c3 100644 --- a/site/news/spark-summit-june-2017-agenda-posted.html +++ b/site/news/spark-summit-june-2017-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-summit-june-2018-agenda-posted.html -- diff --git a/site/news/spark-summit-june-2018-agenda-posted.html b/site/news/spark-summit-june-2018-agenda-posted.html index 17c284f..9b2f739 100644 --- a/site/news/spark-summit-june-2018-agenda-posted.html +++ b/site/news/spark-summit-june-2018-agenda-posted.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-tips-from-quantifind.html -- diff --git a/site/news/spark-tips-from-quantifind.html b/site/news/spark-tips-from-quantifind.html index bfbac1d..00c71c2 100644 --- a/site/news/spark-tips-from-quantifind.html +++ b/site/news/spark-tips-from-quantifind.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-user-survey-and-powered-by-page.html -- diff --git a/site/news/spark-user-survey-and-powered-by-page.html b/site/news/spark-user-survey-and-powered-by-page.html index 67935a9..c015e5c 100644 --- a/site/news/spark-user-survey-and-powered-by-page.html +++ b/site/news/spark-user-survey-and-powered-by-page.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-version-0-6-0-released.html -- diff --git a/site/news/spark-version-0-6-0-released.html b/site/news/spark-version-0-6-0-released.html index 3f670d7..d9120b0 100644 --- a/site/news/spark-version-0-6-0-released.html +++ b/site/news/spark-version-0-6-0-released.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-cloudsort-100tb-benchmark.html -- diff --git a/site/news/spark-wins-cloudsort-100tb-benchmark.html b/site/news/spark-wins-cloudsort-100tb-benchmark.html index b498034..8bef605 100644 --- a/site/news/spark-wins-cloudsort-100tb-benchmark.html +++ b/site/news/spark-wins-cloudsort-100tb-benchmark.html @@ -123,6 +123,7 @@ https://issues.apache.org/jira/browse/SPARK;>Issue Tracker 
Powered By Project Committers + Project History http://git-wip-us.apache.org/repos/asf/spark-website/blob/a1d84bcb/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html -- diff --git a/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html b/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html index 18646f4..32f53e9 100644 ---
spark git commit: [SPARK-22648][K8S] Spark on Kubernetes - Documentation
Repository: spark Updated Branches: refs/heads/master 7beb375bf -> 7ab165b70 [SPARK-22648][K8S] Spark on Kubernetes - Documentation What changes were proposed in this pull request? This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by https://github.com/apache/spark/pull/19717 and https://github.com/apache/spark/pull/19468 which have merged already. How was this patch tested? The script has been in use for releases on our fork. Rest is documentation. cc rxin mateiz (shepherd) k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko reviewers: vanzin felixcheung jiangxb1987 mridulm TODO: - [x] Add dockerfiles directory to built distribution. (https://github.com/apache/spark/pull/20007) - [x] Change references to docker to instead say "container" (https://github.com/apache/spark/pull/19995) - [x] Update configuration table. - [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032) Author: foxish <ramanath...@google.com> Closes #19946 from foxish/update-k8s-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7ab165b7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7ab165b7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7ab165b7 Branch: refs/heads/master Commit: 7ab165b7061d9acc26523227076056e94354d204 Parents: 7beb375 Author: foxish <ramanath...@google.com> Authored: Thu Dec 21 17:21:11 2017 -0800 Committer: Reynold Xin <r...@databricks.com> Committed: Thu Dec 21 17:21:11 2017 -0800 -- docs/_layouts/global.html| 1 + docs/building-spark.md | 6 +- docs/cluster-overview.md | 7 +- docs/configuration.md| 2 + docs/img/k8s-cluster-mode.png| Bin 0 -> 55538 bytes docs/index.md| 3 +- docs/running-on-kubernetes.md| 578 ++ docs/running-on-yarn.md | 4 +- docs/submitting-applications.md | 16 + sbin/build-push-docker-images.sh | 68 10 files changed, 677 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/_layouts/global.html -- diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html index 67b05ec..e5af5ae 100755 --- a/docs/_layouts/global.html +++ b/docs/_layouts/global.html @@ -99,6 +99,7 @@ Spark Standalone Mesos YARN +Kubernetes http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/building-spark.md -- diff --git a/docs/building-spark.md b/docs/building-spark.md index 98f7df1..c391255 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured with Maven profile settings and so on like the direct Maven build. Example: -./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn +./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes This will build Spark distribution along with Python pip and R packages. For more information on usage, run `./dev/make-distribution.sh --help` @@ -90,6 +90,10 @@ like ZooKeeper and Hadoop itself. 
## Building with Mesos support ./build/mvn -Pmesos -DskipTests clean package + +## Building with Kubernetes support + +./build/mvn -Pkubernetes -DskipTests clean package ## Building with Kafka 0.8 support http://git-wip-us.apache.org/repos/asf/spark/blob/7ab165b7/docs/cluster-overview.md -- diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md index c42bb4b..658e67f 100644 --- a/docs/cluster-overview.md +++ b/docs/cluster-overview.md @@ -52,11 +52,8 @@ The system currently supports three cluster managers: * [Apache Mesos](running-on-mesos.html) -- a general cluster manager that can also run Hadoop MapReduce and service applications. * [Hadoop YARN](running-on-yarn.html) -- the resource manager in Hadoop 2. -* [Kubernetes (experimental)](https://github.com/apac
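As a rough illustration of the configuration surface the new documentation page covers, the snippet below sets a few of the Kubernetes options named in this commit on a SparkConf. This is a hypothetical client-side setup: in practice these are normally passed to spark-submit with --conf, and the namespace, image name, and app name here are placeholders.

```scala
import org.apache.spark.SparkConf

// Placeholder values; the image would be one built by the new
// sbin/build-push-docker-images.sh script described above.
val conf = new SparkConf()
  .setAppName("spark-pi-on-k8s")
  .set("spark.kubernetes.namespace", "default")
  .set("spark.kubernetes.container.image", "myrepo/spark:v2.3.0")
  .set("spark.kubernetes.allocation.batch.delay", "1s") // a time value, per the TODO above
```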
[1/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend
Repository: spark Updated Branches: refs/heads/master 475a29f11 -> e9b2070ab http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala -- diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala new file mode 100644 index 000..3febb2f --- /dev/null +++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import java.util.concurrent.{ExecutorService, ScheduledExecutorService, TimeUnit} + +import io.fabric8.kubernetes.api.model.{DoneablePod, Pod, PodBuilder, PodList} +import io.fabric8.kubernetes.client.{KubernetesClient, Watch, Watcher} +import io.fabric8.kubernetes.client.Watcher.Action +import io.fabric8.kubernetes.client.dsl.{FilterWatchListDeletable, MixedOperation, NonNamespaceOperation, PodResource} +import org.mockito.{AdditionalAnswers, ArgumentCaptor, Mock, MockitoAnnotations} +import org.mockito.Matchers.{any, eq => mockitoEq} +import org.mockito.Mockito.{doNothing, never, times, verify, when} +import org.scalatest.BeforeAndAfter +import org.scalatest.mockito.MockitoSugar._ +import scala.collection.JavaConverters._ +import scala.concurrent.Future + +import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite} +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.rpc._ +import org.apache.spark.scheduler.{ExecutorExited, LiveListenerBus, SlaveLost, TaskSchedulerImpl} +import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages.{RegisterExecutor, RemoveExecutor} +import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend +import org.apache.spark.util.ThreadUtils + +class KubernetesClusterSchedulerBackendSuite extends SparkFunSuite with BeforeAndAfter { + + private val APP_ID = "test-spark-app" + private val DRIVER_POD_NAME = "spark-driver-pod" + private val NAMESPACE = "test-namespace" + private val SPARK_DRIVER_HOST = "localhost" + private val SPARK_DRIVER_PORT = 7077 + private val POD_ALLOCATION_INTERVAL = 60L + private val DRIVER_URL = RpcEndpointAddress( +SPARK_DRIVER_HOST, SPARK_DRIVER_PORT, CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString + private val FIRST_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod1") + .endMetadata() +.withNewSpec() + .withNodeName("node1") + 
.endSpec() +.withNewStatus() + .withHostIP("192.168.99.100") + .endStatus() +.build() + private val SECOND_EXECUTOR_POD = new PodBuilder() +.withNewMetadata() + .withName("pod2") + .endMetadata() +.withNewSpec() + .withNodeName("node2") + .endSpec() +.withNewStatus() + .withHostIP("192.168.99.101") + .endStatus() +.build() + + private type PODS = MixedOperation[Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]] + private type LABELED_PODS = FilterWatchListDeletable[ +Pod, PodList, java.lang.Boolean, Watch, Watcher[Pod]] + private type IN_NAMESPACE_PODS = NonNamespaceOperation[ +Pod, PodList, DoneablePod, PodResource[Pod, DoneablePod]] + + @Mock + private var sparkContext: SparkContext = _ + + @Mock + private var listenerBus: LiveListenerBus = _ + + @Mock + private var taskSchedulerImpl: TaskSchedulerImpl = _ + + @Mock + private var allocatorExecutor: ScheduledExecutorService = _ + + @Mock + private var requestExecutorsService: ExecutorService = _ + + @Mock + private var executorPodFactory: ExecutorPodFactory = _ + + @Mock + private var kubernetesClient: KubernetesClient = _ + + @Mock + private var podOperations: PODS = _ + + @Mock + private var podsWithLabelOperations: LABELED_PODS = _ + +
[2/2] spark git commit: [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend
[SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend ## What changes were proposed in this pull request? This is a stripped down version of the `KubernetesClusterSchedulerBackend` for Spark with the following components: - Static Allocation of Executors - Executor Pod Factory - Executor Recovery Semantics It's step 1 from the step-wise plan documented [here](https://github.com/apache-spark-on-k8s/spark/issues/441#issuecomment-330802935). This addition is covered by the [SPIP vote](http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-Spark-on-Kubernetes-td22147.html) which passed on Aug 31 . ## How was this patch tested? - The patch contains unit tests which are passing. - Manual testing: `./build/mvn -Pkubernetes clean package` succeeded. - It is a **subset** of the entire changelist hosted in http://github.com/apache-spark-on-k8s/spark which is in active use in several organizations. - There is integration testing enabled in the fork currently [hosted by PepperData](spark-k8s-jenkins.pepperdata.org:8080) which is being moved over to RiseLAB CI. - Detailed documentation on trying out the patch in its entirety is in: https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html cc rxin felixcheung mateiz (shepherd) k8s-big-data SIG members & contributors: mccheah ash211 ssuchter varunkatta kimoonkim erikerlandson liyinan926 tnachen ifilonenko Author: Yinan Li <liyinan...@gmail.com> Author: foxish <ramanath...@google.com> Author: mcheah <mch...@palantir.com> Closes #19468 from foxish/spark-kubernetes-3. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9b2070a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e9b2070a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e9b2070a Branch: refs/heads/master Commit: e9b2070ab2d04993b1c0c1d6c6aba249e6664c8d Parents: 475a29f Author: Yinan Li <liyinan...@gmail.com> Authored: Tue Nov 28 23:02:09 2017 -0800 Committer: Reynold Xin <r...@databricks.com> Committed: Tue Nov 28 23:02:09 2017 -0800 -- .travis.yml | 2 +- NOTICE | 6 + .../cluster/SchedulerBackendUtils.scala | 47 ++ dev/sparktestsupport/modules.py | 8 + docs/configuration.md | 4 +- pom.xml | 7 + project/SparkBuild.scala| 8 +- resource-managers/kubernetes/core/pom.xml | 100 + .../org/apache/spark/deploy/k8s/Config.scala| 123 ++ .../spark/deploy/k8s/ConfigurationUtils.scala | 41 ++ .../org/apache/spark/deploy/k8s/Constants.scala | 50 +++ .../k8s/SparkKubernetesClientFactory.scala | 102 + .../cluster/k8s/ExecutorPodFactory.scala| 219 + .../cluster/k8s/KubernetesClusterManager.scala | 70 +++ .../k8s/KubernetesClusterSchedulerBackend.scala | 442 +++ .../core/src/test/resources/log4j.properties| 31 ++ .../cluster/k8s/ExecutorPodFactorySuite.scala | 135 ++ ...KubernetesClusterSchedulerBackendSuite.scala | 440 ++ .../spark/deploy/yarn/YarnAllocator.scala | 3 +- .../spark/deploy/yarn/YarnSparkHadoopUtil.scala | 24 - .../cluster/YarnClientSchedulerBackend.scala| 2 +- .../cluster/YarnClusterSchedulerBackend.scala | 2 +- 22 files changed, 1832 insertions(+), 34 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/.travis.yml -- diff --git a/.travis.yml b/.travis.yml index d7e9f8c..05b94ade 100644 --- a/.travis.yml +++ b/.travis.yml @@ -43,7 +43,7 @@ notifications: # 5. Run maven install before running lint-java. 
install: - export MAVEN_SKIP_RC=1 - - build/mvn -T 4 -q -DskipTests -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver install + - build/mvn -T 4 -q -DskipTests -Pkubernetes -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver install # 6. Run lint-java. script: http://git-wip-us.apache.org/repos/asf/spark/blob/e9b2070a/NOTICE -- diff --git a/NOTICE b/NOTICE index f4b64b5..6ec240e 100644 --- a/NOTICE +++ b/NOTICE @@ -448,6 +448,12 @@ Copyright (C) 2011 Google Inc. Apache Commons Pool Copyright 1999-2009 The Apache Software Foundation +This product includes/uses Kubernetes & OpenShift 3 Java Client (https://github.com/fabric8io/kubernetes-client) +Copyright (C) 2015 Red Hat, Inc. + +This product includes/uses OkHttp (https://github.com/square/okhttp) +Copyright (C) 2012 The Android Open Source Project + = == NO
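As an aside on the "Static Allocation of Executors" component described above: the backend requests executor pods in periodic batches until the target count is reached, governed by spark.kubernetes.allocation.batch.size and spark.kubernetes.allocation.batch.delay. A minimal standalone sketch of that polling pattern, using only the JDK scheduler; the counters and the pod-creation step are stand-ins, not the actual backend:

```scala
import java.util.concurrent.{Executors, TimeUnit}

object BatchAllocatorSketch {
  def main(args: Array[String]): Unit = {
    val targetExecutors = 5
    val batchSize = 2          // cf. spark.kubernetes.allocation.batch.size
    val batchDelaySeconds = 1L // cf. spark.kubernetes.allocation.batch.delay
    var running = 0            // stand-in for the backend's registered-executor count

    val allocator = Executors.newSingleThreadScheduledExecutor()
    allocator.scheduleWithFixedDelay(new Runnable {
      override def run(): Unit = {
        // Request at most one batch of pods per tick until the target is reached.
        val toRequest = math.min(batchSize, targetExecutors - running)
        (1 to toRequest).foreach(_ => running += 1) // the real backend builds pods via ExecutorPodFactory
        println(s"running executors: $running / $targetExecutors")
        if (running >= targetExecutors) allocator.shutdown()
      }
    }, 0L, batchDelaySeconds, TimeUnit.SECONDS)
  }
}
```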
spark git commit: [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark
Repository: spark Updated Branches: refs/heads/master b2463fad7 -> 41b60125b

[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark

## What changes were proposed in this pull request?

This PR proposes to add a link from `spark.catalog(..)` to `Catalog` and expose the Catalog APIs in PySpark, as shown in these screenshots:

https://user-images.githubusercontent.com/6477701/32135863-f8e9b040-bc40-11e7-92ad-09c8043a1295.png
https://user-images.githubusercontent.com/6477701/32135849-bb257b86-bc40-11e7-9eda-4d58fc1301c2.png

Note that this is not shown in the list at the top of https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql:

https://user-images.githubusercontent.com/6477701/32135854-d50fab16-bc40-11e7-9181-812c56fd22f5.png

This is basically the same treatment that `DataFrameReader` and `DataFrameWriter` already get.

## How was this patch tested?

Manually built the doc.

Author: hyukjinkwon

Closes #19596 from HyukjinKwon/SPARK-22369.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41b60125 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41b60125 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41b60125 Branch: refs/heads/master Commit: 41b60125b673bad0c133cd5c825d353ac2e6dfd6 Parents: b2463fa Author: hyukjinkwon Authored: Thu Nov 2 15:22:52 2017 +0100 Committer: Reynold Xin Committed: Thu Nov 2 15:22:52 2017 +0100

python/pyspark/sql/__init__.py | 3 ++-
python/pyspark/sql/session.py  | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/__init__.py

diff --git a/python/pyspark/sql/__init__.py b/python/pyspark/sql/__init__.py
index 22ec416..c3c06c8 100644
--- a/python/pyspark/sql/__init__.py
+++ b/python/pyspark/sql/__init__.py
@@ -46,6 +46,7 @@ from pyspark.sql.types import Row
 from pyspark.sql.context import SQLContext, HiveContext, UDFRegistration
 from pyspark.sql.session import SparkSession
 from pyspark.sql.column import Column
+from pyspark.sql.catalog import Catalog
 from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
 from pyspark.sql.group import GroupedData
 from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter
@@ -54,7 +55,7 @@ from pyspark.sql.window import Window, WindowSpec
 __all__ = [
 'SparkSession', 'SQLContext', 'HiveContext', 'UDFRegistration',
-'DataFrame', 'GroupedData', 'Column', 'Row',
+'DataFrame', 'GroupedData', 'Column', 'Catalog', 'Row',
 'DataFrameNaFunctions', 'DataFrameStatFunctions',
 'Window', 'WindowSpec',
 'DataFrameReader', 'DataFrameWriter'
 ]

http://git-wip-us.apache.org/repos/asf/spark/blob/41b60125/python/pyspark/sql/session.py

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index 2cc0e2d..c3dc1a46 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -271,6 +271,8 @@ class SparkSession(object):
 def catalog(self):
 """Interface through which the user may create, drop, alter or query underlying
 databases, tables, functions etc.
+
+:return: :class:`Catalog`
 """
 if not hasattr(self, "_catalog"):
 self._catalog = Catalog(self)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
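The `Catalog` interface being documented here is the same one exposed on the Scala side, so the newly linked methods can be sanity-checked from a Scala shell as well. A short sketch; the temp-view name is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("catalog-docs-demo").getOrCreate()

// spark.catalog is the Catalog instance the PySpark docs now link to.
spark.range(3).createOrReplaceTempView("nums") // illustrative temp view
spark.catalog.listDatabases().show()           // e.g. the "default" database
spark.catalog.listTables().show()              // includes the "nums" temp view
println(spark.catalog.listFunctions().count()) // built-in functions are listed too
spark.stop()
```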
spark git commit: [SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages
Repository: spark Updated Branches: refs/heads/master 849b465bb -> 277b1924b

[SPARK-22408][SQL] RelationalGroupedDataset's distinct pivot value calculation launches unnecessary stages

## What changes were proposed in this pull request?

Adding a global limit on top of the distinct values before sorting and collecting reduces the overall work in the case where we have more distinct values than the configured maximum. We also eagerly perform a collect rather than a take because we know we have at most (maxValues + 1) rows.

## How was this patch tested?

Existing tests cover the sorted order.

Author: Patrick Woody

Closes #19629 from pwoody/SPARK-22408.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/277b1924 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/277b1924 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/277b1924 Branch: refs/heads/master Commit: 277b1924b46a70ab25414f5670eb784906dbbfdf Parents: 849b465 Author: Patrick Woody Authored: Thu Nov 2 14:19:21 2017 +0100 Committer: Reynold Xin Committed: Thu Nov 2 14:19:21 2017 +0100

.../scala/org/apache/spark/sql/RelationalGroupedDataset.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/277b1924/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
index 21e94fa..3e4edd4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
@@ -321,10 +321,10 @@ class RelationalGroupedDataset protected[sql](
 // Get the distinct values of the column and sort them so its consistent
 val values = df.select(pivotColumn)
 .distinct()
+.limit(maxValues + 1)
 .sort(pivotColumn)  // ensure that the output columns are in a consistent logical order
-.rdd
+.collect()
 .map(_.get(0))
-.take(maxValues + 1)
 .toSeq
 if (values.length > maxValues) {

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
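For reference, the patched code path is only hit by the `pivot` overload that has to discover the distinct values itself; supplying the values explicitly skips the extra job. A sketch with illustrative data; the value count is capped by `spark.sql.pivotMaxValues` (the `maxValues` in the diff):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().master("local[*]").appName("pivot-demo").getOrCreate()
import spark.implicits._

val df = Seq(
  (2012, "dotNET", 10000),
  (2012, "Java", 20000),
  (2013, "dotNET", 5000)
).toDF("year", "course", "earnings")

// No value list supplied, so Spark runs the distinct + sort + collect job patched above,
// now bounded by a global limit of (maxValues + 1) rows.
df.groupBy("year").pivot("course").agg(sum("earnings")).show()

// Supplying the values explicitly avoids the distinct-value job entirely.
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).agg(sum("earnings")).show()
spark.stop()
```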
spark git commit: [MINOR] Data source v2 docs update.
Repository: spark Updated Branches: refs/heads/master 1ffe03d9e -> d43e1f06b [MINOR] Data source v2 docs update. ## What changes were proposed in this pull request? This patch includes some doc updates for data source API v2. I was reading the code and noticed some minor issues. ## How was this patch tested? This is a doc only change. Author: Reynold Xin <r...@databricks.com> Closes #19626 from rxin/dsv2-update. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d43e1f06 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d43e1f06 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d43e1f06 Branch: refs/heads/master Commit: d43e1f06bd545d00bfcaf1efb388b469effd5d64 Parents: 1ffe03d Author: Reynold Xin <r...@databricks.com> Authored: Wed Nov 1 18:39:15 2017 +0100 Committer: Reynold Xin <r...@databricks.com> Committed: Wed Nov 1 18:39:15 2017 +0100 -- .../org/apache/spark/sql/sources/v2/DataSourceV2.java| 9 - .../org/apache/spark/sql/sources/v2/WriteSupport.java| 4 ++-- .../spark/sql/sources/v2/reader/DataSourceV2Reader.java | 10 +- .../v2/reader/SupportsPushDownCatalystFilters.java | 2 -- .../sql/sources/v2/reader/SupportsScanUnsafeRow.java | 2 -- .../spark/sql/sources/v2/writer/DataSourceV2Writer.java | 11 +++ .../apache/spark/sql/sources/v2/writer/DataWriter.java | 10 +- 7 files changed, 19 insertions(+), 29 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java index dbcbe32..6234071 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java @@ -20,12 +20,11 @@ package org.apache.spark.sql.sources.v2; import org.apache.spark.annotation.InterfaceStability; /** - * The base interface for data source v2. Implementations must have a public, no arguments - * constructor. + * The base interface for data source v2. Implementations must have a public, 0-arg constructor. * - * Note that this is an empty interface, data source implementations should mix-in at least one of - * the plug-in interfaces like {@link ReadSupport}. Otherwise it's just a dummy data source which is - * un-readable/writable. + * Note that this is an empty interface. Data source implementations should mix-in at least one of + * the plug-in interfaces like {@link ReadSupport} and {@link WriteSupport}. Otherwise it's just + * a dummy data source which is un-readable/writable. */ @InterfaceStability.Evolving public interface DataSourceV2 {} http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java index a8a9615..8fdfdfd 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java @@ -36,8 +36,8 @@ public interface WriteSupport { * sources can return None if there is no writing needed to be done according to the save mode. * * @param jobId A unique string for the writing job. 
It's possible that there are many writing - * jobs running at the same time, and the returned {@link DataSourceV2Writer} should - * use this job id to distinguish itself with writers of other jobs. + * jobs running at the same time, and the returned {@link DataSourceV2Writer} can + * use this job id to distinguish itself from other jobs. * @param schema the schema of the data to be written. * @param mode the save mode which determines what to do when the data are already in this data * source, please refer to {@link SaveMode} for more details. http://git-wip-us.apache.org/repos/asf/spark/blob/d43e1f06/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java index 5989a4a..88c3219 100644 --- a/sql/core/src/main/java/org/apache/spark/
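To make the javadoc's "mix-in" point concrete, here is a schematic Scala sketch of a write-capable v2 source. It follows the signatures as they stood around this commit; the options-carrying class and some names were still shifting during the 2.3 development cycle, so treat the exact signature as approximate. Returning an empty `Optional` is the "nothing to write for this save mode" case the javadoc describes:

```scala
import java.util.Optional
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.sources.v2.{DataSourceV2, DataSourceV2Options, WriteSupport}
import org.apache.spark.sql.sources.v2.writer.DataSourceV2Writer
import org.apache.spark.sql.types.StructType

// The DataSourceV2 tag is empty; all capability comes from mixed-in interfaces.
class MyWritableSource extends DataSourceV2 with WriteSupport {
  override def createWriter(
      jobId: String,
      schema: StructType,
      mode: SaveMode,
      options: DataSourceV2Options): Optional[DataSourceV2Writer] = {
    // Empty Optional signals there is nothing to write for this mode.
    Optional.empty()
  }
}
```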
spark git commit: [SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100
Repository: spark Updated Branches: refs/heads/master d29d1e879 -> 323806e68 [SPARK-22160][SQL] Make sample points per partition (in range partitioner) configurable and bump the default value up to 100 ## What changes were proposed in this pull request? Spark's RangePartitioner hard codes the number of sampling points per partition to be 20. This is sometimes too low. This ticket makes it configurable, via spark.sql.execution.rangeExchange.sampleSizePerPartition, and raises the default in Spark SQL to be 100. ## How was this patch tested? Added a pretty sophisticated test based on chi square test ... Author: Reynold Xin <r...@databricks.com> Closes #19387 from rxin/SPARK-22160. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/323806e6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/323806e6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/323806e6 Branch: refs/heads/master Commit: 323806e68f91f3c7521327186a37ddd1436267d0 Parents: d29d1e8 Author: Reynold Xin <r...@databricks.com> Authored: Thu Sep 28 21:07:12 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Thu Sep 28 21:07:12 2017 -0700 -- .../scala/org/apache/spark/Partitioner.scala| 15 - .../org/apache/spark/sql/internal/SQLConf.scala | 10 +++ .../exchange/ShuffleExchangeExec.scala | 7 ++- .../apache/spark/sql/ConfigBehaviorSuite.scala | 66 4 files changed, 95 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/323806e6/core/src/main/scala/org/apache/spark/Partitioner.scala -- diff --git a/core/src/main/scala/org/apache/spark/Partitioner.scala b/core/src/main/scala/org/apache/spark/Partitioner.scala index 1484f29..debbd8d 100644 --- a/core/src/main/scala/org/apache/spark/Partitioner.scala +++ b/core/src/main/scala/org/apache/spark/Partitioner.scala @@ -108,11 +108,21 @@ class HashPartitioner(partitions: Int) extends Partitioner { class RangePartitioner[K : Ordering : ClassTag, V]( partitions: Int, rdd: RDD[_ <: Product2[K, V]], -private var ascending: Boolean = true) +private var ascending: Boolean = true, +val samplePointsPerPartitionHint: Int = 20) extends Partitioner { + // A constructor declared in order to maintain backward compatibility for Java, when we add the + // 4th constructor parameter samplePointsPerPartitionHint. See SPARK-22160. + // This is added to make sure from a bytecode point of view, there is still a 3-arg ctor. + def this(partitions: Int, rdd: RDD[_ <: Product2[K, V]], ascending: Boolean) = { +this(partitions, rdd, ascending, samplePointsPerPartitionHint = 20) + } + // We allow partitions = 0, which happens when sorting an empty RDD under the default settings. require(partitions >= 0, s"Number of partitions cannot be negative but found $partitions.") + require(samplePointsPerPartitionHint > 0, +s"Sample points per partition must be greater than 0 but found $samplePointsPerPartitionHint") private var ordering = implicitly[Ordering[K]] @@ -122,7 +132,8 @@ class RangePartitioner[K : Ordering : ClassTag, V]( Array.empty } else { // This is the sample size we need to have roughly balanced output partitions, capped at 1M. - val sampleSize = math.min(20.0 * partitions, 1e6) + // Cast to double to avoid overflowing ints or longs + val sampleSize = math.min(samplePointsPerPartitionHint.toDouble * partitions, 1e6) // Assume the input partitions are roughly balanced and over-sample a little bit. 
val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.length).toInt val (numItems, sketched) = RangePartitioner.sketch(rdd.map(_._1), sampleSizePerPartition) http://git-wip-us.apache.org/repos/asf/spark/blob/323806e6/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 358cf62..1a73d16 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -907,6 +907,14 @@ object SQLConf { .booleanConf .createWithDefault(false) + val RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION = +buildConf("spark.sql.execution.rangeExchange.sampleSizePerPartition") + .internal() + .doc("Number of points to sample per partition in order to determine the range boundaries" + + &
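A small sketch of the new constructor parameter in use on the RDD side; the SQL planner reads the equivalent value from spark.sql.execution.rangeExchange.sampleSizePerPartition. The local SparkContext is built here purely for illustration:

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("range-sample-demo"))
val pairs = sc.parallelize(1 to 100000).map(i => (i % 1000, i)) // illustrative keys

// samplePointsPerPartitionHint is the new 4th parameter; 100 matches the new SQL default,
// versus the previously hard-coded 20 sample points per partition.
val partitioner = new RangePartitioner(8, pairs, ascending = true, samplePointsPerPartitionHint = 100)
println(pairs.partitionBy(partitioner).partitions.length) // 8
sc.stop()
```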
spark git commit: [MINOR][TYPO] Fix typos: runnning and Excecutors
Repository: spark Updated Branches: refs/heads/master 7880909c4 -> a2db5c576

[MINOR][TYPO] Fix typos: runnning and Excecutors

## What changes were proposed in this pull request?

Fix the typos "runnning" and "Excecutors" in a YarnAllocator log message.

## How was this patch tested?

Existing tests.

Author: Andrew Ash

Closes #18996 from ash211/patch-2.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a2db5c57 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a2db5c57 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a2db5c57 Branch: refs/heads/master Commit: a2db5c5761b0c72babe48b79859d3b208ee8e9f6 Parents: 7880909 Author: Andrew Ash Authored: Fri Aug 18 13:43:42 2017 -0700 Committer: Reynold Xin Committed: Fri Aug 18 13:43:42 2017 -0700

.../main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/a2db5c57/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index f73e7dc..7052fb3 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -551,8 +551,8 @@ private[yarn] class YarnAllocator(
 updateInternalState()
 }
 } else {
-logInfo(("Skip launching executorRunnable as runnning Excecutors count: %d " +
- "reached target Executors count: %d.").format(
+logInfo(("Skip launching executorRunnable as running executors count: %d " +
+ "reached target executors count: %d.").format(
 numExecutorsRunning.get, targetNumExecutors))
 }
 }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog
Repository: spark Updated Branches: refs/heads/branch-2.2 3ca55eaaf -> c90949698 [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog ## What changes were proposed in this pull request? This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog. getTableOption. ## How was this patch tested? Removed the test case. Author: Reynold Xin <r...@databricks.com> Closes #18912 from rxin/remove-getTableOption. (cherry picked from commit 584c7f14370cdfafdc6cd554b2760b7ce7709368) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9094969 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9094969 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9094969 Branch: refs/heads/branch-2.2 Commit: c909496983314b48dd4d8587e586b553b04ff0ce Parents: 3ca55ea Author: Reynold Xin <r...@databricks.com> Authored: Thu Aug 10 18:56:25 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Thu Aug 10 18:56:43 2017 -0700 -- .../sql/catalyst/catalog/ExternalCatalog.scala | 2 -- .../sql/catalyst/catalog/InMemoryCatalog.scala | 4 .../sql/catalyst/catalog/SessionCatalog.scala | 17 +++-- .../sql/catalyst/catalog/SessionCatalogSuite.scala | 11 --- .../spark/sql/hive/HiveExternalCatalog.scala | 4 5 files changed, 3 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala index 974ef90..18644b0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala @@ -162,8 +162,6 @@ abstract class ExternalCatalog def getTable(db: String, table: String): CatalogTable - def getTableOption(db: String, table: String): Option[CatalogTable] - def tableExists(db: String, table: String): Boolean def listTables(db: String): Seq[String] http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala index 864ee48..bf8542c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala @@ -315,10 +315,6 @@ class InMemoryCatalog( catalog(db).tables(table).table } - override def getTableOption(db: String, table: String): Option[CatalogTable] = synchronized { -if (!tableExists(db, table)) None else Option(catalog(db).tables(table).table) - } - override def tableExists(db: String, table: String): Boolean = synchronized { requireDbExists(db) catalog(db).tables.contains(table) http://git-wip-us.apache.org/repos/asf/spark/blob/c9094969/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 
57006bf..8d9fb4c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -388,9 +388,10 @@ class SessionCatalog( /** * Retrieve the metadata of an existing permanent table/view. If no database is specified, - * assume the table/view is in the current database. If the specified table/view is not found - * in the database then a [[NoSuchTableException]] is thrown. + * assume the table/view is in the current database. */ + @throws[NoSuchDatabaseException] + @throws[NoSuchTableException] def getTableMetadata(name: TableIdentifier): CatalogTable = { val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase)) val table = formatTableName(name.table) @@ -400,18 +401,6 @@ class SessionCatalog( } /** - * Retrieve the metadata of an existing metastore
spark git commit: [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog
Repository: spark Updated Branches: refs/heads/master ca6955858 -> 584c7f143 [SPARK-21699][SQL] Remove unused getTableOption in ExternalCatalog ## What changes were proposed in this pull request? This patch removes the unused SessionCatalog.getTableMetadataOption and ExternalCatalog. getTableOption. ## How was this patch tested? Removed the test case. Author: Reynold Xin <r...@databricks.com> Closes #18912 from rxin/remove-getTableOption. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/584c7f14 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/584c7f14 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/584c7f14 Branch: refs/heads/master Commit: 584c7f14370cdfafdc6cd554b2760b7ce7709368 Parents: ca69558 Author: Reynold Xin <r...@databricks.com> Authored: Thu Aug 10 18:56:25 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Thu Aug 10 18:56:25 2017 -0700 -- .../sql/catalyst/catalog/ExternalCatalog.scala | 2 -- .../sql/catalyst/catalog/InMemoryCatalog.scala | 4 .../sql/catalyst/catalog/SessionCatalog.scala | 17 +++-- .../sql/catalyst/catalog/SessionCatalogSuite.scala | 11 --- .../spark/sql/hive/HiveExternalCatalog.scala | 4 5 files changed, 3 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala index 68644f4..d4c58db 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala @@ -167,8 +167,6 @@ abstract class ExternalCatalog def getTable(db: String, table: String): CatalogTable - def getTableOption(db: String, table: String): Option[CatalogTable] - def tableExists(db: String, table: String): Boolean def listTables(db: String): Seq[String] http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala index 37e9eea..98370c1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala @@ -326,10 +326,6 @@ class InMemoryCatalog( catalog(db).tables(table).table } - override def getTableOption(db: String, table: String): Option[CatalogTable] = synchronized { -if (!tableExists(db, table)) None else Option(catalog(db).tables(table).table) - } - override def tableExists(db: String, table: String): Boolean = synchronized { requireDbExists(db) catalog(db).tables.contains(table) http://git-wip-us.apache.org/repos/asf/spark/blob/584c7f14/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index b44d2ee..e3237a8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -387,9 +387,10 @@ class SessionCatalog( /** * Retrieve the metadata of an existing permanent table/view. If no database is specified, - * assume the table/view is in the current database. If the specified table/view is not found - * in the database then a [[NoSuchTableException]] is thrown. + * assume the table/view is in the current database. */ + @throws[NoSuchDatabaseException] + @throws[NoSuchTableException] def getTableMetadata(name: TableIdentifier): CatalogTable = { val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase)) val table = formatTableName(name.table) @@ -399,18 +400,6 @@ class SessionCatalog( } /** - * Retrieve the metadata of an existing metastore table. - * If no database is specified, assume the table is in the current database. - * If the specified table is not found in the
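With the Option-returning lookups gone, callers that previously probed with `getTableMetadataOption` use one of two idioms against `SessionCatalog`. A sketch of both; the `catalog` parameter is assumed to be an already-constructed SessionCatalog:

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, SessionCatalog}

// Idiom 1: explicit existence check before the throwing accessor.
def metadataIfExists(catalog: SessionCatalog, name: TableIdentifier): Option[CatalogTable] =
  if (catalog.tableExists(name)) Some(catalog.getTableMetadata(name)) else None

// Idiom 2: rely on the @throws[NoSuchTableException] contract documented in this patch.
def metadataOrNone(catalog: SessionCatalog, name: TableIdentifier): Option[CatalogTable] =
  try Some(catalog.getTableMetadata(name))
  catch { case _: NoSuchTableException => None }
```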
spark git commit: [SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs
Repository: spark Updated Branches: refs/heads/master 84454d7d3 -> 95ad960ca

[SPARK-21669] Internal API for collecting metrics/stats during FileFormatWriter jobs

## What changes were proposed in this pull request?

This patch introduces an internal interface for tracking metrics and/or statistics on data on the fly, as it is being written to disk during a `FileFormatWriter` job, and partially reimplements SPARK-20703 in terms of it.

The interface basically consists of 3 traits:

- `WriteTaskStats`: just a tag for classes that represent statistics collected during a `WriteTask`. The only constraint it adds is that the class should be `Serializable`, as instances of it will be collected on the driver from all executors at the end of the `WriteJob`.
- `WriteTaskStatsTracker`: a trait for classes that can actually compute statistics based on tuples that are processed by a given `WriteTask` and eventually produce a `WriteTaskStats` instance.
- `WriteJobStatsTracker`: a trait for classes that act as containers of `Serializable` state that's necessary for instantiating `WriteTaskStatsTracker` on executors and finally process the resulting collection of `WriteTaskStats`, once they're gathered back on the driver.

Potential future use of this interface is e.g. CBO stats maintenance during `INSERT INTO table ...` operations.

## How was this patch tested?

Existing tests for SPARK-20703 exercise the new code: `hive/SQLMetricsSuite`, `sql/JavaDataFrameReaderWriterSuite`, etc.

Author: Adrian Ionescu

Closes #18884 from adrian-ionescu/write-stats-tracker-api.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95ad960c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/95ad960c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/95ad960c Branch: refs/heads/master Commit: 95ad960caf009d843ec700ee41cbccc2fa3a68a5 Parents: 84454d7 Author: Adrian Ionescu Authored: Thu Aug 10 12:37:10 2017 -0700 Committer: Reynold Xin Committed: Thu Aug 10 12:37:10 2017 -0700

.../execution/command/DataWritingCommand.scala  |  34 +--
.../datasources/BasicWriteStatsTracker.scala    | 133 ++
.../datasources/FileFormatWriter.scala          | 245 ++-
.../InsertIntoHadoopFsRelationCommand.scala     |  43 ++--
.../datasources/WriteStatsTracker.scala         | 121 +
.../execution/streaming/FileStreamSink.scala    |   2 +-
.../hive/execution/InsertIntoHiveTable.scala    |   4 +-
7 files changed, 420 insertions(+), 162 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/95ad960c/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
index 700f7f8..4e1c5e4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
@@ -17,10 +17,13 @@
 package org.apache.spark.sql.execution.command

+import org.apache.hadoop.conf.Configuration
+
 import org.apache.spark.SparkContext
-import org.apache.spark.sql.execution.SQLExecution
-import org.apache.spark.sql.execution.datasources.ExecutedWriteSummary
+import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.util.SerializableConfiguration
+
 /**
 * A special `RunnableCommand` which writes data
out and updates metrics. @@ -37,29 +40,8 @@ trait DataWritingCommand extends RunnableCommand { ) } - /** - * Callback function that update metrics collected from the writing operation. - */ - protected def updateWritingMetrics(writeSummaries: Seq[ExecutedWriteSummary]): Unit = { -val sparkContext = SparkContext.getActive.get -var numPartitions = 0 -var numFiles = 0 -var totalNumBytes: Long = 0L -var totalNumOutput: Long = 0L - -writeSummaries.foreach { summary => - numPartitions += summary.updatedPartitions.size - numFiles += summary.numOutputFile - totalNumBytes += summary.numOutputBytes - totalNumOutput += summary.numOutputRows -} - -metrics("numFiles").add(numFiles) -metrics("numOutputBytes").add(totalNumBytes) -metrics("numOutputRows").add(totalNumOutput) -metrics("numParts").add(numPartitions) - -val executionId = sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) -SQLMetrics.postDriverMetricUpdates(sparkContext,
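The three traits are easiest to see with a toy analogue. The sketch below mirrors the shape described above: a Serializable stats value, a per-task tracker fed one row at a time on executors, and a job-level container that mints task trackers and folds the collected results on the driver. It is a standalone illustration, not the actual `WriteStatsTracker.scala` interfaces, whose exact callback signatures the diff does not show:

```scala
// Plays the role of WriteTaskStats: just Serializable data.
case class RowCountStats(rows: Long) extends Serializable

// Plays the role of WriteTaskStatsTracker: lives on an executor, sees each tuple.
class RowCountTaskTracker {
  private var rows = 0L
  def newRow(): Unit = rows += 1            // invoked once per written tuple
  def finalStats: RowCountStats = RowCountStats(rows)
}

// Plays the role of WriteJobStatsTracker: Serializable driver-side container.
class RowCountJobTracker extends Serializable {
  def newTaskInstance(): RowCountTaskTracker = new RowCountTaskTracker
  def processStats(stats: Seq[RowCountStats]): Unit =
    println(s"total rows written: ${stats.map(_.rows).sum}") // folded on the driver
}

// Schematic wiring: each task would call newTaskInstance()/newRow()/finalStats;
// here we fake the stats "gathered back" from two tasks.
val job = new RowCountJobTracker
job.processStats(Seq(RowCountStats(3), RowCountStats(4))) // prints: total rows written: 7
```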
spark git commit: [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator
Repository: spark Updated Branches: refs/heads/master 0fb73253f -> c06f3f5ac [SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator ## What changes were proposed in this pull request? This modification increases the timeout for `serveIterator` (which is not dynamically configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple seconds to connect. See https://issues.apache.org/jira/browse/SPARK-21551 ## How was this patch tested? Ran the tests. cc rxin Author: peay <p...@protonmail.com> Closes #18752 from peay/spark-21551. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c06f3f5a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c06f3f5a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c06f3f5a Branch: refs/heads/master Commit: c06f3f5ac500b02d38ca7ec5fcb33085e07f2f75 Parents: 0fb7325 Author: peay <p...@protonmail.com> Authored: Wed Aug 9 14:03:18 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Wed Aug 9 14:03:18 2017 -0700 -- .../src/main/scala/org/apache/spark/api/python/PythonRDD.scala | 6 +++--- python/pyspark/rdd.py | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c06f3f5a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala index 6a81752..3377101 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala @@ -683,7 +683,7 @@ private[spark] object PythonRDD extends Logging { * Create a socket server and a background thread to serve the data in `items`, * * The socket server can only accept one connection, or close if no connection - * in 3 seconds. + * in 15 seconds. * * Once a connection comes in, it tries to serialize all the data in `items` * and send them into this connection. @@ -692,8 +692,8 @@ private[spark] object PythonRDD extends Logging { */ def serveIterator[T](items: Iterator[T], threadName: String): Int = { val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost")) -// Close the socket if no connection in 3 seconds -serverSocket.setSoTimeout(3000) +// Close the socket if no connection in 15 seconds +serverSocket.setSoTimeout(15000) new Thread(threadName) { setDaemon(true) http://git-wip-us.apache.org/repos/asf/spark/blob/c06f3f5a/python/pyspark/rdd.py -- diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 3325b65..ea993c5 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -127,7 +127,7 @@ def _load_from_socket(port, serializer): af, socktype, proto, canonname, sa = res sock = socket.socket(af, socktype, proto) try: -sock.settimeout(3) +sock.settimeout(15) sock.connect(sa) except socket.error: sock.close() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
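The underlying mechanism is a plain `java.net.ServerSocket` accept-with-timeout. A self-contained sketch of the pattern with the new 15-second value; run in isolation it simply times out:

```scala
import java.net.{InetAddress, ServerSocket, SocketTimeoutException}

// A one-shot local server that gives the client a bounded window to connect,
// mirroring what serveIterator does for the Python side.
val server = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
server.setSoTimeout(15000) // the new value; was 3000 before this patch
try {
  val sock = server.accept() // blocks until a client connects or the timeout fires
  sock.close()
} catch {
  case _: SocketTimeoutException => println("client never connected within 15s")
} finally {
  server.close()
}
```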
spark git commit: [SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions
Repository: spark Updated Branches: refs/heads/master cf29828d7 -> 60472dbfd

[SPARK-21485][SQL][DOCS] Spark SQL documentation generation for built-in functions

## What changes were proposed in this pull request?

This generates documentation for the Spark SQL built-in functions. One drawback is that this requires a proper build to generate the built-in function list. Once it is built, it only takes a few seconds via `sql/create-docs.sh`. Please see https://spark-test.github.io/sparksqldoc/, which I hosted to show the output documentation.

There is a little more work to be done to make the documentation pretty, for example, separating `Arguments:` and `Examples:`, but I guess this should be done within `ExpressionDescription` and `ExpressionInfo` rather than by manually parsing it. I will fix these in a follow-up.

This requires `pip install mkdocs` to generate HTMLs from markdown files.

## How was this patch tested?

Manually tested:

```
cd docs
jekyll build
```

```
cd docs
jekyll serve
```

and

```
cd sql
create-docs.sh
```

Author: hyukjinkwon

Closes #18702 from HyukjinKwon/SPARK-21485.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60472dbf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60472dbf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60472dbf Branch: refs/heads/master Commit: 60472dbfd97acfd6c4420a13f9b32bc9d84219f3 Parents: cf29828 Author: hyukjinkwon Authored: Wed Jul 26 09:38:51 2017 -0700 Committer: Reynold Xin Committed: Wed Jul 26 09:38:51 2017 -0700

.gitignore | 2 +
docs/README.md | 6 +-
docs/_layouts/global.html | 1 +
docs/_plugins/copy_api_dirs.rb | 27 ++
docs/api.md | 1 +
docs/index.md | 1 +
sql/README.md | 2 +
.../spark/sql/api/python/PythonSQLUtils.scala | 7 ++
sql/create-docs.sh | 49 +++
sql/gen-sql-markdown.py | 91
sql/mkdocs.yml | 19
11 files changed, 203 insertions(+), 3 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/.gitignore

diff --git a/.gitignore b/.gitignore
index cf9780d..903297d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -47,6 +47,8 @@ dev/pr-deps/
 dist/
 docs/_site
 docs/api
+sql/docs
+sql/site
 lib_managed/
 lint-r-report.log
 log/

http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/README.md

diff --git a/docs/README.md b/docs/README.md
index 90e10a1..0090dd0 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -68,6 +68,6 @@ jekyll plugin to run `build/sbt unidoc` before building the site so if you haven
 may take some time as it generates all of the scaladoc. The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-doc.org/).
-NOTE: To skip the step of building and copying over the Scala, Python, R API docs, run `SKIP_API=1
-jekyll`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, and `SKIP_RDOC=1` can be used to skip a single
-step of the corresponding language.
+NOTE: To skip the step of building and copying over the Scala, Python, R and SQL API docs, run `SKIP_API=1
+jekyll`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
+to skip a single step of the corresponding language.
http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/_layouts/global.html -- diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html index 570483c..67b05ec 100755 --- a/docs/_layouts/global.html +++ b/docs/_layouts/global.html @@ -86,6 +86,7 @@ Java Python R +SQL, Built-in Functions http://git-wip-us.apache.org/repos/asf/spark/blob/60472dbf/docs/_plugins/copy_api_dirs.rb -- diff --git a/docs/_plugins/copy_api_dirs.rb b/docs/_plugins/copy_api_dirs.rb index 95e3ba3..00366f8 100644 --- a/docs/_plugins/copy_api_dirs.rb +++ b/docs/_plugins/copy_api_dirs.rb @@ -150,4 +150,31 @@ if not (ENV['SKIP_API'] == '1') cp("../R/pkg/DESCRIPTION", "api") end + if not (ENV['SKIP_SQLDOC'] == '1') +# Build SQL API docs + +puts
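The markdown generator works off `ExpressionInfo`, which is populated from the `@ExpressionDescription` annotation on each registered built-in expression; those `usage`/`extended` strings are what `gen-sql-markdown.py` renders. A sketch of what such an annotated expression looks like; `PassThrough` is invented for illustration and is not a real built-in:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.DataType

// Illustrative only: the annotation strings below are what would show up in the
// generated SQL function docs if this expression were registered as a built-in.
@ExpressionDescription(
  usage = "_FUNC_(expr) - Returns `expr` unchanged (illustrative, not a real built-in).",
  extended = " Examples: > SELECT _FUNC_(1); 1")
case class PassThrough(child: Expression) extends UnaryExpression with CodegenFallback {
  override def dataType: DataType = child.dataType
  override def eval(input: InternalRow): Any = child.eval(input)
}
```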
spark git commit: [SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.
Repository: spark Updated Branches: refs/heads/master 2cbfc975b -> 24367f23f

[SPARK-21382] The note about Scala 2.10 in building-spark.md is wrong.

Per [https://issues.apache.org/jira/browse/SPARK-21382](https://issues.apache.org/jira/browse/SPARK-21382), the note should read "Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be removed in Spark 2.3.0": the removal target is 2.3.0, not 2.2.0.

Author: liuzhaokun

Closes #18606 from liu-zhaokun/new07120923.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24367f23 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24367f23 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24367f23 Branch: refs/heads/master Commit: 24367f23f77349a864da340573e39ab2168c5403 Parents: 2cbfc97 Author: liuzhaokun Authored: Tue Jul 11 23:02:20 2017 -0700 Committer: Reynold Xin Committed: Tue Jul 11 23:02:20 2017 -0700

docs/building-spark.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/24367f23/docs/building-spark.md

diff --git a/docs/building-spark.md b/docs/building-spark.md
index 777635a..815843c 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -97,7 +97,7 @@ To produce a Spark package compiled with Scala 2.10, use the `-Dscala-2.10` prop
 ./dev/change-scala-version.sh 2.10
 ./build/mvn -Pyarn -Dscala-2.10 -DskipTests clean package

-Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be removed in Spark 2.2.0.
+Note that support for Scala 2.10 is deprecated as of Spark 2.1.0 and may be removed in Spark 2.3.0.

 ## Building submodules individually

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark
Repository: spark Updated Branches: refs/heads/master d03aebbe6 -> c3713fde8

[SPARK-21358][EXAMPLES] Argument of repartitionandsortwithinpartitions at pyspark

## What changes were proposed in this pull request?

In the `repartitionAndSortWithinPartitions` example in rdd.py, the third argument should be True or False. This patch fixes the example code.

## How was this patch tested?

- Renamed test_repartitionAndSortWithinPartitions to test_repartitionAndSortWithinPartitions_asc to make the boolean argument explicit.
- Added test_repartitionAndSortWithinPartitions_desc to cover False as the third argument.

Author: chie8842

Closes #18586 from chie8842/SPARK-21358.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c3713fde Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c3713fde Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c3713fde Branch: refs/heads/master Commit: c3713fde86204bf3f027483914ff9e60e7aad261 Parents: d03aebb Author: chie8842 Authored: Mon Jul 10 18:56:54 2017 -0700 Committer: Reynold Xin Committed: Mon Jul 10 18:56:54 2017 -0700

python/pyspark/rdd.py | 2 +-
python/pyspark/tests.py | 12 ++-- 
2 files changed, 11 insertions(+), 3 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/rdd.py

diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 7dfa17f..3325b65 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -608,7 +608,7 @@ class RDD(object):
 sort records by their keys.
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]) ->>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, 2) +>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True) >>> rdd2.glom().collect() [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]] """ http://git-wip-us.apache.org/repos/asf/spark/blob/c3713fde/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py index bb13de5..73ab442 100644 --- a/python/pyspark/tests.py +++ b/python/pyspark/tests.py @@ -1019,14 +1019,22 @@ class RDDTests(ReusedPySparkTestCase): self.assertEqual((["ab", "ef"], [5]), rdd.histogram(1)) self.assertRaises(TypeError, lambda: rdd.histogram(2)) -def test_repartitionAndSortWithinPartitions(self): +def test_repartitionAndSortWithinPartitions_asc(self): rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2) -repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2) +repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, True) partitions = repartitioned.glom().collect() self.assertEqual(partitions[0], [(0, 5), (0, 8), (2, 6)]) self.assertEqual(partitions[1], [(1, 3), (3, 8), (3, 8)]) +def test_repartitionAndSortWithinPartitions_desc(self): +rdd = self.sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)], 2) + +repartitioned = rdd.repartitionAndSortWithinPartitions(2, lambda key: key % 2, False) +partitions = repartitioned.glom().collect() +self.assertEqual(partitions[0], [(2, 6), (0, 5), (0, 8)]) +self.assertEqual(partitions[1], [(3, 8), (3, 8), (1, 3)]) + def test_repartition_no_skewed(self): num_partitions = 20 a = self.sc.parallelize(range(int(1000)), 2) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
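The Scala RDD API draws the same distinction differently: ordering comes from the implicit `Ordering[K]` in scope rather than a boolean flag. A sketch of the Scala counterpart of both fixed tests, assuming an active SparkContext `sc`; the relative order of ties on equal keys is not guaranteed:

```scala
import org.apache.spark.HashPartitioner

val rdd = sc.parallelize(Seq((0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)))

// Ascending: the default implicit Ordering[Int] is used; keys ascend within each partition.
rdd.repartitionAndSortWithinPartitions(new HashPartitioner(2)).glom().collect()

// "Descending" is expressed by putting a reversed Ordering in scope instead of passing a flag.
locally {
  implicit val descending: Ordering[Int] = Ordering[Int].reverse
  rdd.repartitionAndSortWithinPartitions(new HashPartitioner(2)).glom().collect()
}
```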
spark git commit: [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval
Repository: spark Updated Branches: refs/heads/master 48e44b24a -> bf66335ac

[SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

## What changes were proposed in this pull request?

Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to ValueInterval. The current naming is identical to the logical operator "range"; refactoring it to ValueInterval is more accurate.

## How was this patch tested?

Unit tests.

Author: Wang Gengliang

Closes #18549 from gengliangwang/ValueInterval.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf66335a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bf66335a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bf66335a Branch: refs/heads/master Commit: bf66335acab3c0c188f6c378eb8aa6948a259cb2 Parents: 48e44b2 Author: Wang Gengliang Authored: Thu Jul 6 13:58:27 2017 -0700 Committer: Reynold Xin Committed: Thu Jul 6 13:58:27 2017 -0700

.../statsEstimation/FilterEstimation.scala      | 36
.../statsEstimation/JoinEstimation.scala        | 14 +--
.../plans/logical/statsEstimation/Range.scala   | 88 ---
.../logical/statsEstimation/ValueInterval.scala | 91
4 files changed, 117 insertions(+), 112 deletions(-)

http://git-wip-us.apache.org/repos/asf/spark/blob/bf66335a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
index 5a3bee7..e13db85 100755
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
@@ -316,8 +316,8 @@ case class FilterEstimation(plan: Filter) extends Logging {
 // decide if the value is in [min, max] of the column.
 // We currently don't store min/max for binary/string type.
 // Hence, we assume it is in boundary for binary/string type.
-val statsRange = Range(colStat.min, colStat.max, attr.dataType) -if (statsRange.contains(literal)) { +val statsInterval = ValueInterval(colStat.min, colStat.max, attr.dataType) +if (statsInterval.contains(literal)) { if (update) { // We update ColumnStat structure after apply this equality predicate: // Set distinctCount to 1, nullCount to 0, and min/max values (if exist) to the literal @@ -388,9 +388,10 @@ case class FilterEstimation(plan: Filter) extends Logging { // use [min, max] to filter the original hSet dataType match { case _: NumericType | BooleanType | DateType | TimestampType => -val statsRange = Range(colStat.min, colStat.max, dataType).asInstanceOf[NumericRange] +val statsInterval = + ValueInterval(colStat.min, colStat.max, dataType).asInstanceOf[NumericValueInterval] val validQuerySet = hSet.filter { v => - v != null && statsRange.contains(Literal(v, dataType)) + v != null && statsInterval.contains(Literal(v, dataType)) } if (validQuerySet.isEmpty) { @@ -440,12 +441,13 @@ case class FilterEstimation(plan: Filter) extends Logging { update: Boolean): Option[BigDecimal] = { val colStat = colStatsMap(attr) -val statsRange = Range(colStat.min, colStat.max, attr.dataType).asInstanceOf[NumericRange] -val max = statsRange.max.toBigDecimal -val min = statsRange.min.toBigDecimal +val statsInterval = + ValueInterval(colStat.min, colStat.max, attr.dataType).asInstanceOf[NumericValueInterval] +val max = statsInterval.max.toBigDecimal +val min = statsInterval.min.toBigDecimal val ndv = BigDecimal(colStat.distinctCount) -// determine the overlapping degree between predicate range and column's range +// determine the overlapping degree between predicate interval and column's interval val numericLiteral = if (literal.dataType == BooleanType) { if (literal.value.asInstanceOf[Boolean]) BigDecimal(1) else BigDecimal(0) } else { @@ -566,18 +568,18 @@ case class FilterEstimation(plan: Filter) extends Logging { } val colStatLeft = colStatsMap(attrLeft) -val statsRangeLeft = Range(colStatLeft.min, colStatLeft.max, attrLeft.dataType) - .asInstanceOf[NumericRange] -val maxLeft =
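For intuition about what the renamed class participates in: FilterEstimation clips the literal against the column's [min, max] interval and converts the overlap into a selectivity under a uniform-distribution assumption. A standalone toy sketch of that idea, not the actual Spark classes:

```scala
// Simplified model of the interval arithmetic behind FilterEstimation:
// the fraction of a uniformly distributed column expected to satisfy `col <= c`.
final case class NumericInterval(min: BigDecimal, max: BigDecimal) {
  def contains(v: BigDecimal): Boolean = v >= min && v <= max
  def selectivityOfLessOrEqual(c: BigDecimal): BigDecimal =
    if (c < min) 0
    else if (c >= max) 1
    else (c - min) / (max - min) // overlap of [min, c] with the column interval
}

val interval = NumericInterval(BigDecimal(0), BigDecimal(100))
println(interval.contains(BigDecimal(25)))               // true
println(interval.selectivityOfLessOrEqual(BigDecimal(25))) // 0.25
```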
spark git commit: [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan
Repository: spark Updated Branches: refs/heads/master e862dc904 -> b6b108826 [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan ## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan, since the constraint framework is only used for query plan rewriting and not for physical planning. ## How was this patch tested? Should be covered by existing tests, since it is a simple refactoring. Author: Reynold Xin <r...@databricks.com> Closes #18310 from rxin/SPARK-21103. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6b10882 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6b10882 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6b10882 Branch: refs/heads/master Commit: b6b108826a5dd5c889a70180365f9320452557fc Parents: e862dc9 Author: Reynold Xin <r...@databricks.com> Authored: Tue Jun 20 11:34:22 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Tue Jun 20 11:34:22 2017 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala| 5 +- .../catalyst/plans/QueryPlanConstraints.scala | 195 -- .../catalyst/plans/logical/LogicalPlan.scala| 2 +- .../plans/logical/QueryPlanConstraints.scala| 196 +++ 4 files changed, 198 insertions(+), 200 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b6b10882/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index 9130b14..1f6d05b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -22,10 +22,7 @@ import org.apache.spark.sql.catalyst.trees.TreeNode import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.{DataType, StructType} -abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] - extends TreeNode[PlanType] - with QueryPlanConstraints[PlanType] { - +abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] { self: PlanType => def conf: SQLConf = SQLConf.get http://git-wip-us.apache.org/repos/asf/spark/blob/b6b10882/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala deleted file mode 100644 index b08a009..000 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala +++ /dev/null @@ -1,195 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
- * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.plans - -import org.apache.spark.sql.catalyst.expressions._ - - -trait QueryPlanConstraints[PlanType <: QueryPlan[PlanType]] { self: QueryPlan[PlanType] => - - /** - * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For - * example, if this set contains the expression `a = 2` then that expression is guaranteed to - * evaluate to `true` for all rows produced. - */ - lazy val constraints: ExpressionSet = { -if (conf.constraintPropagationEnabled) { - ExpressionSet( -validConstraints - .union(inferAdditionalConstraints(validConstraints)) - .union(constructIsNotNullConstraints(validConstraints)) - .filter { c => -c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic - } - ) -} else { - ExpressionSet(Set.e
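Mechanically, the move comes down to where a self-typed Scala mixin is attached. A rough before/after sketch, with heavily simplified stand-ins for the Catalyst classes (the real trait computes an ExpressionSet, not a Set[String]):

```scala
abstract class QueryPlanLike {
  def validConstraints: Set[String] = Set.empty
}

// Self-typed mixin: it can only be mixed into a QueryPlanLike, and anything
// that does mix it in gains the `constraints` member.
trait ConstraintsMixin { self: QueryPlanLike =>
  lazy val constraints: Set[String] = self.validConstraints
}

// Before this patch the mixin sat on QueryPlan itself, so physical plans
// carried the constraint machinery too. After the patch only the logical
// plan mixes it in, matching how constraints are actually used.
abstract class LogicalPlanLike extends QueryPlanLike with ConstraintsMixin
```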
spark git commit: [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions
Repository: spark Updated Branches: refs/heads/master 292467440 -> fffeb6d7c [SPARK-21092][SQL] Wire SQLConf in logical plan and expressions ## What changes were proposed in this pull request? It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf explicitly in functions). This patch exposes SQLConf in logical plan, using a thread local variable and a getter closure that's set once there is an active SparkSession. The implementation is a bit of a hack, since we didn't anticipate this need in the beginning (config was only exposed in physical plan). The implementation is described in `SQLConf.get`. In terms of future work, we should follow up to clean up CBO (remove the need for passing in config). ## How was this patch tested? Updated relevant tests for constraint propagation. Author: Reynold Xin <r...@databricks.com> Closes #18299 from rxin/SPARK-21092. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fffeb6d7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fffeb6d7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fffeb6d7 Branch: refs/heads/master Commit: fffeb6d7c37ee673a32584f3b2fd3afe86af793a Parents: 2924674 Author: Reynold Xin <r...@databricks.com> Authored: Wed Jun 14 22:11:41 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Wed Jun 14 22:11:41 2017 -0700 -- .../sql/catalyst/optimizer/Optimizer.scala | 25 ++-- .../spark/sql/catalyst/optimizer/joins.scala| 5 +-- .../spark/sql/catalyst/plans/QueryPlan.scala| 3 ++ .../catalyst/plans/QueryPlanConstraints.scala | 33 +-- .../org/apache/spark/sql/internal/SQLConf.scala | 42 .../BinaryComparisonSimplificationSuite.scala | 2 +- .../optimizer/BooleanSimplificationSuite.scala | 2 +- .../InferFiltersFromConstraintsSuite.scala | 24 +-- .../optimizer/OuterJoinEliminationSuite.scala | 37 - .../optimizer/PropagateEmptyRelationSuite.scala | 4 +- .../catalyst/optimizer/PruneFiltersSuite.scala | 36 +++-- .../catalyst/optimizer/SetOperationSuite.scala | 2 +- .../plans/ConstraintPropagationSuite.scala | 29 +- .../org/apache/spark/sql/SparkSession.scala | 5 +++ 14 files changed, 141 insertions(+), 108 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fffeb6d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index d16689a..3ab70fb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -77,12 +77,12 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf) // Operator push down PushProjectionThroughUnion, ReorderJoin(conf), - EliminateOuterJoin(conf), + EliminateOuterJoin, PushPredicateThroughJoin, PushDownPredicate, LimitPushDown(conf), ColumnPruning, - InferFiltersFromConstraints(conf), + InferFiltersFromConstraints, // Operator combine CollapseRepartition, CollapseProject, @@ -102,7 +102,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: SQLConf) SimplifyConditionals, RemoveDispensableExpressions, SimplifyBinaryComparison, - PruneFilters(conf), + PruneFilters, EliminateSorts, SimplifyCasts, SimplifyCaseConversionExpressions, @@ -619,14 +619,15 @@ object CollapseWindow extends 
Rule[LogicalPlan] { * Note: While this optimization is applicable to all types of join, it primarily benefits Inner and * LeftSemi joins. */ -case class InferFiltersFromConstraints(conf: SQLConf) -extends Rule[LogicalPlan] with PredicateHelper { - def apply(plan: LogicalPlan): LogicalPlan = if (conf.constraintPropagationEnabled) { -inferFilters(plan) - } else { -plan - } +object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper { + def apply(plan: LogicalPlan): LogicalPlan = { +if (SQLConf.get.constraintPropagationEnabled) { + inferFilters(plan) +} else { + plan +} + } private def inferFilters(plan: LogicalPlan): LogicalPlan = plan transform { case filter @ Filter(condition, child) => @@ -717,7 +718,7 @@ object EliminateSorts extends Rule[LogicalPlan] { * 2) by substituting a dummy empty relation whe
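The thread-local getter pattern described above can be sketched in a few lines. This mirrors the idea behind SQLConf.get rather than its actual implementation; all names below are invented for illustration.

```scala
final class Conf(val constraintPropagationEnabled: Boolean)

object Conf {
  private val fallback = new Conf(constraintPropagationEnabled = true)

  // Each thread can have its own getter closure, installed once a session
  // becomes active on that thread.
  private val getter = new ThreadLocal[() => Conf] {
    override def initialValue(): () => Conf = () => fallback
  }

  def setGetter(g: () => Conf): Unit = getter.set(g)

  // Rules call Conf.get instead of threading a config constructor parameter
  // around, which is why InferFiltersFromConstraints above could become a
  // plain object.
  def get: Conf = getter.get()()
}

object InferFiltersSketch {
  def apply(plan: String): String =
    if (Conf.get.constraintPropagationEnabled) plan + " with inferred filters" else plan
}
```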
spark git commit: [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints
Repository: spark Updated Branches: refs/heads/master 77a2fc5b5 -> e254e868f [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints ## What changes were proposed in this pull request? This patch moves constraint related code into a separate trait QueryPlanConstraints, so we don't litter QueryPlan with a lot of constraint private functions. ## How was this patch tested? This is a simple move refactoring and should be covered by existing tests. Author: Reynold Xin <r...@databricks.com> Closes #18298 from rxin/SPARK-21091. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e254e868 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e254e868 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e254e868 Branch: refs/heads/master Commit: e254e868f1e640b59d8d3e0e01a5e0c488dd6e70 Parents: 77a2fc5 Author: Reynold Xin <r...@databricks.com> Authored: Wed Jun 14 14:28:21 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Wed Jun 14 14:28:21 2017 -0700 -- .../spark/sql/catalyst/plans/QueryPlan.scala| 187 + .../catalyst/plans/QueryPlanConstraints.scala | 206 +++ 2 files changed, 210 insertions(+), 183 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e254e868/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala index 5ba043e..8bc462e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala @@ -21,194 +21,15 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.trees.TreeNode import org.apache.spark.sql.types.{DataType, StructType} -abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanType] { +abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] + extends TreeNode[PlanType] + with QueryPlanConstraints[PlanType] { + self: PlanType => def output: Seq[Attribute] /** - * Extracts the relevant constraints from a given set of constraints based on the attributes that - * appear in the [[outputSet]]. - */ - protected def getRelevantConstraints(constraints: Set[Expression]): Set[Expression] = { -constraints - .union(inferAdditionalConstraints(constraints)) - .union(constructIsNotNullConstraints(constraints)) - .filter(constraint => -constraint.references.nonEmpty && constraint.references.subsetOf(outputSet) && - constraint.deterministic) - } - - /** - * Infers a set of `isNotNull` constraints from null intolerant expressions as well as - * non-nullable attributes. For e.g., if an expression is of the form (`a > 5`), this - * returns a constraint of the form `isNotNull(a)` - */ - private def constructIsNotNullConstraints(constraints: Set[Expression]): Set[Expression] = { -// First, we propagate constraints from the null intolerant expressions. 
-var isNotNullConstraints: Set[Expression] = constraints.flatMap(inferIsNotNullConstraints) - -// Second, we infer additional constraints from non-nullable attributes that are part of the -// operator's output -val nonNullableAttributes = output.filterNot(_.nullable) -isNotNullConstraints ++= nonNullableAttributes.map(IsNotNull).toSet - -isNotNullConstraints -- constraints - } - - /** - * Infer the Attribute-specific IsNotNull constraints from the null intolerant child expressions - * of constraints. - */ - private def inferIsNotNullConstraints(constraint: Expression): Seq[Expression] = -constraint match { - // When the root is IsNotNull, we can push IsNotNull through the child null intolerant - // expressions - case IsNotNull(expr) => scanNullIntolerantAttribute(expr).map(IsNotNull(_)) - // Constraints always return true for all the inputs. That means, null will never be returned. - // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child - // null intolerant expressions. - case _ => scanNullIntolerantAttribute(constraint).map(IsNotNull(_)) -} - - /** - * Recursively explores the expressions which are null intolerant and returns all attributes - * in these expressions. - */ - private def scanNullIntolerantAttribute(expr: Expression): Seq[Attribute] = expr match { -case a: Attribute => Seq(a) -case _: NullIntolerant => expr.children.flatMap
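The IsNotNull inference being moved here is a small recursive walk. A toy version on a simplified expression tree (all types below are stand-ins for the Catalyst classes):

```scala
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(v: Int) extends Expr
// GreaterThan is null intolerant: any null input makes the result null.
final case class GreaterThan(left: Expr, right: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr

object NotNullInference {
  // Collect attributes reachable through null-intolerant expressions only.
  def scanNullIntolerantAttrs(e: Expr): Seq[Attr] = e match {
    case a: Attr => Seq(a)
    case GreaterThan(l, r) => scanNullIntolerantAttrs(l) ++ scanNullIntolerantAttrs(r)
    case _ => Seq.empty
  }

  // A constraint such as `a > 5` must evaluate to true, so a cannot be null.
  def inferIsNotNull(constraint: Expr): Seq[Expr] =
    scanNullIntolerantAttrs(constraint).map(IsNotNull.apply)

  def main(args: Array[String]): Unit = {
    println(inferIsNotNull(GreaterThan(Attr("a"), Lit(5))))
    // List(IsNotNull(Attr(a)))
  }
}
```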
spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position
Repository: spark Updated Branches: refs/heads/branch-2.2 869af5bcb -> 815a0820b [SPARK-21042][SQL] Document Dataset.union is resolution by position ## What changes were proposed in this pull request? Document Dataset.union is resolution by position, not by name, since this has been a confusing point for a lot of users. ## How was this patch tested? N/A - doc only change. Author: Reynold Xin <r...@databricks.com> Closes #18256 from rxin/SPARK-21042. (cherry picked from commit b78e3849b20d0d09b7146efd7ce8f203ef67b890) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/815a0820 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/815a0820 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/815a0820 Branch: refs/heads/branch-2.2 Commit: 815a0820b1808118ae198a44f4aa0f0f2b6511e6 Parents: 869af5b Author: Reynold Xin <r...@databricks.com> Authored: Fri Jun 9 18:29:33 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Jun 9 18:29:39 2017 -0700 -- R/pkg/R/DataFrame.R | 1 + python/pyspark/sql/dataframe.py | 13 + .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 -- 3 files changed, 18 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index a7b1e3b..b606f1f 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -2642,6 +2642,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) { #' Input SparkDataFrames can have different schemas (names and data types). #' #' Note: This does not remove duplicate rows across the two SparkDataFrames. +#' Also as standard in SQL, this function resolves columns by position (not by name). #' #' @param x A SparkDataFrame #' @param y A SparkDataFrame http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/python/pyspark/sql/dataframe.py -- diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index b1eb80e..d1b336d 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1166,18 +1166,23 @@ class DataFrame(object): @since(2.0) def union(self, other): -""" Return a new :class:`DataFrame` containing union of rows in this -frame and another frame. +""" Return a new :class:`DataFrame` containing union of rows in this and another frame. This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct. + +Also as standard in SQL, this function resolves columns by position (not by name). """ return DataFrame(self._jdf.union(other._jdf), self.sql_ctx) @since(1.3) def unionAll(self, other): -""" Return a new :class:`DataFrame` containing union of rows in this -frame and another frame. +""" Return a new :class:`DataFrame` containing union of rows in this and another frame. + +This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union +(that does deduplication of elements), use this function followed by a distinct. + +Also as standard in SQL, this function resolves columns by position (not by name). .. note:: Deprecated in 2.0, use union instead. 
""" http://git-wip-us.apache.org/repos/asf/spark/blob/815a0820/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index f37d433..3658890 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1630,10 +1630,11 @@ class Dataset[T] private[sql]( /** * Returns a new Dataset containing union of rows in this Dataset and another Dataset. - * This is equivalent to `UNION ALL` in SQL. * - * To do a SQL-style set union (that does deduplication of elements), use this function followed - * by a [[distinct]]. + * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does + * deduplication of elements), use this function followed by a [[distinct]]. + * + * Also as standard in SQL, this function resolves co
spark git commit: [SPARK-21042][SQL] Document Dataset.union is resolution by position
Repository: spark Updated Branches: refs/heads/master 571635488 -> b78e3849b [SPARK-21042][SQL] Document Dataset.union is resolution by position ## What changes were proposed in this pull request? Document Dataset.union is resolution by position, not by name, since this has been a confusing point for a lot of users. ## How was this patch tested? N/A - doc only change. Author: Reynold Xin <r...@databricks.com> Closes #18256 from rxin/SPARK-21042. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b78e3849 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b78e3849 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b78e3849 Branch: refs/heads/master Commit: b78e3849b20d0d09b7146efd7ce8f203ef67b890 Parents: 5716354 Author: Reynold Xin <r...@databricks.com> Authored: Fri Jun 9 18:29:33 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Fri Jun 9 18:29:33 2017 -0700 -- R/pkg/R/DataFrame.R | 1 + python/pyspark/sql/dataframe.py | 13 + .../src/main/scala/org/apache/spark/sql/Dataset.scala | 14 -- 3 files changed, 18 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 166b398..3b9d42d 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -2646,6 +2646,7 @@ generateAliasesForIntersectedCols <- function (x, intersectedColNames, suffix) { #' Input SparkDataFrames can have different schemas (names and data types). #' #' Note: This does not remove duplicate rows across the two SparkDataFrames. +#' Also as standard in SQL, this function resolves columns by position (not by name). #' #' @param x A SparkDataFrame #' @param y A SparkDataFrame http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/python/pyspark/sql/dataframe.py -- diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 99abfcc..8541403 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1175,18 +1175,23 @@ class DataFrame(object): @since(2.0) def union(self, other): -""" Return a new :class:`DataFrame` containing union of rows in this -frame and another frame. +""" Return a new :class:`DataFrame` containing union of rows in this and another frame. This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by a distinct. + +Also as standard in SQL, this function resolves columns by position (not by name). """ return DataFrame(self._jdf.union(other._jdf), self.sql_ctx) @since(1.3) def unionAll(self, other): -""" Return a new :class:`DataFrame` containing union of rows in this -frame and another frame. +""" Return a new :class:`DataFrame` containing union of rows in this and another frame. + +This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union +(that does deduplication of elements), use this function followed by a distinct. + +Also as standard in SQL, this function resolves columns by position (not by name). .. note:: Deprecated in 2.0, use union instead. 
""" http://git-wip-us.apache.org/repos/asf/spark/blob/b78e3849/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index f7637e0..d28ff78 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1734,10 +1734,11 @@ class Dataset[T] private[sql]( /** * Returns a new Dataset containing union of rows in this Dataset and another Dataset. - * This is equivalent to `UNION ALL` in SQL. * - * To do a SQL-style set union (that does deduplication of elements), use this function followed - * by a [[distinct]]. + * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does + * deduplication of elements), use this function followed by a [[distinct]]. + * + * Also as standard in SQL, this function resolves columns by position (not by name). * * @group typedrel * @since 2.0.0 @@ -1747,10 +1748,11 @@ class Dataset[T
spark git commit: [SPARK-20854][TESTS] Removing duplicate test case
Repository: spark Updated Branches: refs/heads/branch-2.2 421d8ecb8 -> 3f93d076b [SPARK-20854][TESTS] Removing duplicate test case ## What changes were proposed in this pull request? Removed a duplicate case in "SPARK-20854: select hint syntax with expressions" ## How was this patch tested? Existing tests. Author: Bogdan Raducanu Closes #18217 from bogdanrdc/SPARK-20854-2. (cherry picked from commit cb83ca1433c865cb0aef973df2b872a83671acfd) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3f93d076 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3f93d076 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3f93d076 Branch: refs/heads/branch-2.2 Commit: 3f93d076b8c4a932bace2ebef400abe60ad5927c Parents: 421d8ec Author: Bogdan Raducanu Authored: Tue Jun 6 22:51:10 2017 -0700 Committer: Reynold Xin Committed: Tue Jun 6 22:51:18 2017 -0700 -- .../apache/spark/sql/catalyst/parser/PlanParserSuite.scala | 8 1 file changed, 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3f93d076/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala index 954f6da..77ae843 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala @@ -545,14 +545,6 @@ class PlanParserSuite extends PlanTest { ) comparePlans( - parsePlan("SELECT /*+ HINT1(a, array(1, 2, 3)) */ * from t"), - UnresolvedHint("HINT1", Seq($"a", -UnresolvedFunction("array", Literal(1) :: Literal(2) :: Literal(3) :: Nil, false)), -table("t").select(star()) - ) -) - -comparePlans( parsePlan("SELECT /*+ HINT1(a, 5, 'a', b) */ * from t"), UnresolvedHint("HINT1", Seq($"a", Literal(5), Literal("a"), $"b"), table("t").select(star())
spark git commit: [SPARK-20854][TESTS] Removing duplicate test case
Repository: spark Updated Branches: refs/heads/master c92949ac2 -> cb83ca143 [SPARK-20854][TESTS] Removing duplicate test case ## What changes were proposed in this pull request? Removed a duplicate case in "SPARK-20854: select hint syntax with expressions" ## How was this patch tested? Existing tests. Author: Bogdan Raducanu Closes #18217 from bogdanrdc/SPARK-20854-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb83ca14 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb83ca14 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb83ca14 Branch: refs/heads/master Commit: cb83ca1433c865cb0aef973df2b872a83671acfd Parents: c92949a Author: Bogdan Raducanu Authored: Tue Jun 6 22:51:10 2017 -0700 Committer: Reynold Xin Committed: Tue Jun 6 22:51:10 2017 -0700 -- .../apache/spark/sql/catalyst/parser/PlanParserSuite.scala | 8 1 file changed, 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb83ca14/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala index d004d04..fef39a5 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala @@ -576,14 +576,6 @@ class PlanParserSuite extends PlanTest { ) comparePlans( - parsePlan("SELECT /*+ HINT1(a, array(1, 2, 3)) */ * from t"), - UnresolvedHint("HINT1", Seq($"a", -UnresolvedFunction("array", Literal(1) :: Literal(2) :: Literal(3) :: Nil, false)), -table("t").select(star()) - ) -) - -comparePlans( parsePlan("SELECT /*+ HINT1(a, 5, 'a', b) */ * from t"), UnresolvedHint("HINT1", Seq($"a", Literal(5), Literal("a"), $"b"), table("t").select(star())
spark git commit: [SPARK-8184][SQL] Add additional function description for weekofyear
Repository: spark Updated Branches: refs/heads/branch-2.2 26640a269 -> 3b79e4cda [SPARK-8184][SQL] Add additional function description for weekofyear ## What changes were proposed in this pull request? Add additional function description for weekofyear. ## How was this patch tested? manual tests ![weekofyear](https://cloud.githubusercontent.com/assets/5399861/26525752/08a1c278-4394-11e7-8988-7cbf82c3a999.gif) Author: Yuming Wang Closes #18132 from wangyum/SPARK-8184. (cherry picked from commit 1c7db00c74ec6a91c7eefbdba85cbf41fbe8634a) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3b79e4cd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3b79e4cd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3b79e4cd Branch: refs/heads/branch-2.2 Commit: 3b79e4cda74e0bf82ec55e673beb8f84e7cfaca4 Parents: 26640a2 Author: Yuming Wang Authored: Mon May 29 16:10:22 2017 -0700 Committer: Reynold Xin Committed: Mon May 29 16:10:29 2017 -0700 -- .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3b79e4cd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 6a76058..0ab7207 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -402,13 +402,15 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa } } +// scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(date) - Returns the week of the year of the given date.", + usage = "_FUNC_(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.", extended = """ Examples: > SELECT _FUNC_('2008-02-20'); 8 """) +// scalastyle:on line.size.limit case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
spark git commit: [SPARK-8184][SQL] Add additional function description for weekofyear
Repository: spark Updated Branches: refs/heads/master c9749068e -> 1c7db00c7 [SPARK-8184][SQL] Add additional function description for weekofyear ## What changes were proposed in this pull request? Add additional function description for weekofyear. ## How was this patch tested? manual tests ![weekofyear](https://cloud.githubusercontent.com/assets/5399861/26525752/08a1c278-4394-11e7-8988-7cbf82c3a999.gif) Author: Yuming Wang Closes #18132 from wangyum/SPARK-8184. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1c7db00c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1c7db00c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1c7db00c Branch: refs/heads/master Commit: 1c7db00c74ec6a91c7eefbdba85cbf41fbe8634a Parents: c974906 Author: Yuming Wang Authored: Mon May 29 16:10:22 2017 -0700 Committer: Reynold Xin Committed: Mon May 29 16:10:22 2017 -0700 -- .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1c7db00c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala index 43ca2cf..4098300 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala @@ -402,13 +402,15 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa } } +// scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(date) - Returns the week of the year of the given date.", + usage = "_FUNC_(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.", extended = """ Examples: > SELECT _FUNC_('2008-02-20'); 8 """) +// scalastyle:on line.size.limit case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
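The documented semantics, tried in spark-shell (assumes a SparkSession named `spark`; the output column is aliased for readability):

```scala
spark.sql("SELECT weekofyear('2008-02-20') AS wk").show()
// +---+
// | wk|
// +---+
// |  8|
// +---+
// 2008-02-20 falls in week 8 under the documented rule: weeks start on a
// Monday, and week 1 is the first week with more than 3 days of the year.
```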
spark git commit: [SPARK-20857][SQL] Generic resolved hint node
Repository: spark Updated Branches: refs/heads/branch-2.2 dbb068f4f -> d20c64695 [SPARK-20857][SQL] Generic resolved hint node ## What changes were proposed in this pull request? This patch renames BroadcastHint to ResolvedHint (and Hint to UnresolvedHint) so the hint framework is more generic and would allow us to introduce other hint types in the future without introducing new hint nodes. ## How was this patch tested? Updated test cases. Author: Reynold Xin <r...@databricks.com> Closes #18072 from rxin/SPARK-20857. (cherry picked from commit 0d589ba00b5d539fbfef5174221de046a70548cd) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d20c6469 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d20c6469 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d20c6469 Branch: refs/heads/branch-2.2 Commit: d20c6469565c4f7687f9af14a6f12a775b0c6e62 Parents: dbb068f Author: Reynold Xin <r...@databricks.com> Authored: Tue May 23 18:44:49 2017 +0200 Committer: Reynold Xin <r...@databricks.com> Committed: Tue May 23 18:45:08 2017 +0200 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../sql/catalyst/analysis/ResolveHints.scala| 12 ++--- .../sql/catalyst/optimizer/Optimizer.scala | 2 +- .../sql/catalyst/optimizer/expressions.scala| 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +- .../spark/sql/catalyst/planning/patterns.scala | 4 +- .../sql/catalyst/plans/logical/Statistics.scala | 5 ++ .../plans/logical/basicLogicalOperators.scala | 22 + .../sql/catalyst/plans/logical/hints.scala | 49 .../catalyst/analysis/ResolveHintsSuite.scala | 41 .../catalyst/optimizer/ColumnPruningSuite.scala | 5 +- .../optimizer/FilterPushdownSuite.scala | 4 +- .../optimizer/JoinOptimizationSuite.scala | 4 +- .../sql/catalyst/parser/PlanParserSuite.scala | 15 +++--- .../BasicStatsEstimationSuite.scala | 2 +- .../scala/org/apache/spark/sql/Dataset.scala| 2 +- .../spark/sql/execution/SparkStrategies.scala | 2 +- .../scala/org/apache/spark/sql/functions.scala | 5 +- .../execution/joins/BroadcastJoinSuite.scala| 14 +++--- 20 files changed, 118 insertions(+), 80 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 5be67ac..9979642 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1311,7 +1311,7 @@ class Analyzer( // Category 1: // BroadcastHint, Distinct, LeafNode, Repartition, and SubqueryAlias -case _: BroadcastHint | _: Distinct | _: LeafNode | _: Repartition | _: SubqueryAlias => +case _: ResolvedHint | _: Distinct | _: LeafNode | _: Repartition | _: SubqueryAlias => // Category 2: // These operators can be anywhere in a correlated subquery. 
http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index ea4560a..2e3ac3e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -399,7 +399,7 @@ trait CheckAnalysis extends PredicateHelper { |in operator ${operator.simpleString} """.stripMargin) - case _: Hint => + case _: UnresolvedHint => throw new IllegalStateException( "Internal error: logical hint operator should have been removed during analysis") http://git-wip-us.apache.org/repos/asf/spark/blob/d20c6469/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala b/sq
spark git commit: [SPARK-20857][SQL] Generic resolved hint node
Repository: spark Updated Branches: refs/heads/master ad09e4ca0 -> 0d589ba00 [SPARK-20857][SQL] Generic resolved hint node ## What changes were proposed in this pull request? This patch renames BroadcastHint to ResolvedHint (and Hint to UnresolvedHint) so the hint framework is more generic and would allow us to introduce other hint types in the future without introducing new hint nodes. ## How was this patch tested? Updated test cases. Author: Reynold Xin <r...@databricks.com> Closes #18072 from rxin/SPARK-20857. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d589ba0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0d589ba0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0d589ba0 Branch: refs/heads/master Commit: 0d589ba00b5d539fbfef5174221de046a70548cd Parents: ad09e4c Author: Reynold Xin <r...@databricks.com> Authored: Tue May 23 18:44:49 2017 +0200 Committer: Reynold Xin <r...@databricks.com> Committed: Tue May 23 18:44:49 2017 +0200 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../sql/catalyst/analysis/ResolveHints.scala| 12 ++--- .../sql/catalyst/optimizer/Optimizer.scala | 2 +- .../sql/catalyst/optimizer/expressions.scala| 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 4 +- .../spark/sql/catalyst/planning/patterns.scala | 4 +- .../sql/catalyst/plans/logical/Statistics.scala | 5 ++ .../plans/logical/basicLogicalOperators.scala | 22 + .../sql/catalyst/plans/logical/hints.scala | 49 .../catalyst/analysis/ResolveHintsSuite.scala | 41 .../catalyst/optimizer/ColumnPruningSuite.scala | 5 +- .../optimizer/FilterPushdownSuite.scala | 4 +- .../optimizer/JoinOptimizationSuite.scala | 4 +- .../sql/catalyst/parser/PlanParserSuite.scala | 15 +++--- .../BasicStatsEstimationSuite.scala | 2 +- .../scala/org/apache/spark/sql/Dataset.scala| 2 +- .../spark/sql/execution/SparkStrategies.scala | 2 +- .../scala/org/apache/spark/sql/functions.scala | 5 +- .../execution/joins/BroadcastJoinSuite.scala| 14 +++--- 20 files changed, 118 insertions(+), 80 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index d58b8ac..d130962 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1336,7 +1336,7 @@ class Analyzer( // Category 1: // BroadcastHint, Distinct, LeafNode, Repartition, and SubqueryAlias -case _: BroadcastHint | _: Distinct | _: LeafNode | _: Repartition | _: SubqueryAlias => +case _: ResolvedHint | _: Distinct | _: LeafNode | _: Repartition | _: SubqueryAlias => // Category 2: // These operators can be anywhere in a correlated subquery. 
http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index ea4560a..2e3ac3e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -399,7 +399,7 @@ trait CheckAnalysis extends PredicateHelper { |in operator ${operator.simpleString} """.stripMargin) - case _: Hint => + case _: UnresolvedHint => throw new IllegalStateException( "Internal error: logical hint operator should have been removed during analysis") http://git-wip-us.apache.org/repos/asf/spark/blob/0d589ba0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala index df688fa..9dfd84c 100644 --- a/sql/cata
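The design is easiest to see on a toy model: a single parser-level node carries an arbitrary hint name, and analysis rewrites recognized names into a resolved marker, rather than requiring one node class per hint type. A sketch (simplified; unknown hints are silently dropped here purely for brevity, and the real ResolveHints handles full Catalyst plans):

```scala
sealed trait Plan
final case class Table(name: String) extends Plan
final case class UnresolvedHint(name: String, parameters: Seq[String], child: Plan) extends Plan
final case class ResolvedHint(child: Plan, broadcast: Boolean) extends Plan

object ResolveHintsSketch {
  // The broadcast hint names ResolveHints recognizes.
  private val broadcastNames = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")

  def resolve(plan: Plan): Plan = plan match {
    case UnresolvedHint(name, _, child) if broadcastNames(name.toUpperCase) =>
      ResolvedHint(resolve(child), broadcast = true)
    case UnresolvedHint(_, _, child) => resolve(child) // simplification
    case other => other
  }

  def main(args: Array[String]): Unit = {
    println(resolve(UnresolvedHint("broadcast", Nil, Table("t"))))
    // ResolvedHint(Table(t),true)
  }
}
```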
spark git commit: Revert "[SPARK-12297][SQL] Hive compatibility for Parquet Timestamps"
Repository: spark Updated Branches: refs/heads/master 1b85bcd92 -> ac1ab6b9d Revert "[SPARK-12297][SQL] Hive compatibility for Parquet Timestamps" This reverts commit 22691556e5f0dfbac81b8cc9ca0a67c70c1711ca. See JIRA ticket for more information. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac1ab6b9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac1ab6b9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac1ab6b9 Branch: refs/heads/master Commit: ac1ab6b9db188ac54c745558d57dd0a031d0b162 Parents: 1b85bcd Author: Reynold Xin Authored: Tue May 9 11:35:59 2017 -0700 Committer: Reynold Xin Committed: Tue May 9 11:35:59 2017 -0700 -- .../spark/sql/catalyst/catalog/interface.scala | 4 +- .../spark/sql/catalyst/util/DateTimeUtils.scala | 5 - .../parquet/VectorizedColumnReader.java | 28 +- .../parquet/VectorizedParquetRecordReader.java | 6 +- .../spark/sql/execution/command/tables.scala| 8 +- .../datasources/parquet/ParquetFileFormat.scala | 2 - .../parquet/ParquetReadSupport.scala| 3 +- .../parquet/ParquetRecordMaterializer.scala | 9 +- .../parquet/ParquetRowConverter.scala | 53 +-- .../parquet/ParquetWriteSupport.scala | 25 +- .../spark/sql/hive/HiveExternalCatalog.scala| 11 +- .../spark/sql/hive/HiveMetastoreCatalog.scala | 12 +- .../hive/ParquetHiveCompatibilitySuite.scala| 379 +-- 13 files changed, 29 insertions(+), 516 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index c39017e..cc0cbba 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -132,10 +132,10 @@ case class CatalogTablePartition( /** * Given the partition schema, returns a row with that schema holding the partition values. */ - def toRow(partitionSchema: StructType, defaultTimeZoneId: String): InternalRow = { + def toRow(partitionSchema: StructType, defaultTimeZondId: String): InternalRow = { val caseInsensitiveProperties = CaseInsensitiveMap(storage.properties) val timeZoneId = caseInsensitiveProperties.getOrElse( - DateTimeUtils.TIMEZONE_OPTION, defaultTimeZoneId) + DateTimeUtils.TIMEZONE_OPTION, defaultTimeZondId) InternalRow.fromSeq(partitionSchema.map { field => val partValue = if (spec(field.name) == ExternalCatalogUtils.DEFAULT_PARTITION_NAME) { null http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index bf596fa..6c1592f 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -498,11 +498,6 @@ object DateTimeUtils { false } - lazy val validTimezones = TimeZone.getAvailableIDs().toSet - def isValidTimezone(timezoneId: String): Boolean = { -validTimezones.contains(timezoneId) - } - /** * Returns the microseconds since year zero (-17999) from microseconds since epoch.
*/ http://git-wip-us.apache.org/repos/asf/spark/blob/ac1ab6b9/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java index dabbc2b..9d641b5 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java @@ -18,9 +18,7 @@ package org.apache.spark.sql.execution.datasources.parquet; import java.io.IOException; -import java.util.TimeZone; -import org.apache.hadoop.conf.Configuration; import org.apache.parquet.bytes.BytesUtils; import
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/master b31648c08 -> 5d75b14bf [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff against the initial plan, not against the plan at the start of the batch. ## How was this patch tested? Verified that the debug message now prints the diff between the start and the end of the batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d75b14b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d75b14b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d75b14b Branch: refs/heads/master Commit: 5d75b14bf0f4c1f0813287efaabf49797908ed55 Parents: b31648c Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:06 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5d75b14b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.")
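The one-line fix is clearer with the surrounding control flow spelled out. A toy sketch with strings standing in for plans; `sideBySide` below is a simplified stand-in for the Catalyst utility of the same name:

```scala
object BatchDiffSketch extends App {
  // Pads the left column so the two plans line up side by side.
  def sideBySide(left: String, right: String): Seq[String] =
    left.split("\n").zip(right.split("\n")).map { case (l, r) => f"$l%-40s$r" }.toSeq

  val plan = "initial plan"          // the plan as it entered the executor
  var curPlan = plan
  val batchStartPlan = curPlan       // captured before this batch's rules run
  curPlan = "plan after batch rules" // ...the batch's rules transform curPlan...

  // Before the fix the log diffed `plan` against curPlan, i.e. the cumulative
  // effect of every batch so far; diffing batchStartPlan isolates this batch.
  println(sideBySide(batchStartPlan, curPlan).mkString("\n"))
}
```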
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/branch-2.2 f59c74a94 -> 1d9b7a74a [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff against the initial plan, not against the plan at the start of the batch. ## How was this patch tested? Verified that the debug message now prints the diff between the start and the end of the batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. (cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1d9b7a74 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1d9b7a74 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1d9b7a74 Branch: refs/heads/branch-2.2 Commit: 1d9b7a74a839021814ab28d3eba3636c64483130 Parents: f59c74a Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:13 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1d9b7a74/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.")
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/branch-2.1 704b249b6 -> a1112c615 [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff against the initial plan, not against the plan at the start of the batch. ## How was this patch tested? Verified that the debug message now prints the diff between the start and the end of the batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. (cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1112c61 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1112c61 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1112c61 Branch: refs/heads/branch-2.1 Commit: a1112c615b05d615048159c9d324aa10a4391d4e Parents: 704b249 Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:23 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a1112c61/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.")
spark git commit: [SPARK-20584][PYSPARK][SQL] Python generic hint support
Repository: spark Updated Branches: refs/heads/master 13eb37c86 -> 02bbe7311 [SPARK-20584][PYSPARK][SQL] Python generic hint support ## What changes were proposed in this pull request? Adds `hint` method to PySpark `DataFrame`. ## How was this patch tested? Unit tests, doctests. Author: zero323 Closes #17850 from zero323/SPARK-20584. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02bbe731 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02bbe731 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02bbe731 Branch: refs/heads/master Commit: 02bbe73118a39e2fb378aa2002449367a92f6d67 Parents: 13eb37c Author: zero323 Authored: Wed May 3 19:15:28 2017 -0700 Committer: Reynold Xin Committed: Wed May 3 19:15:28 2017 -0700 -- python/pyspark/sql/dataframe.py | 29 + python/pyspark/sql/tests.py | 16 2 files changed, 45 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/02bbe731/python/pyspark/sql/dataframe.py -- diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index ab6d35b..7b67985 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -380,6 +380,35 @@ class DataFrame(object): jdf = self._jdf.withWatermark(eventTime, delayThreshold) return DataFrame(jdf, self.sql_ctx) +@since(2.2) +def hint(self, name, *parameters): +"""Specifies some hint on the current DataFrame. + +:param name: A name of the hint. +:param parameters: Optional parameters. +:return: :class:`DataFrame` + +>>> df.join(df2.hint("broadcast"), "name").show() +++---+--+ +|name|age|height| +++---+--+ +| Bob| 5|85| +++---+--+ +""" +if len(parameters) == 1 and isinstance(parameters[0], list): +parameters = parameters[0] + +if not isinstance(name, str): +raise TypeError("name should be provided as str, got {0}".format(type(name))) + +for p in parameters: +if not isinstance(p, str): +raise TypeError( +"all parameters should be str, got {0} of type {1}".format(p, type(p))) + +jdf = self._jdf.hint(name, self._jseq(parameters)) +return DataFrame(jdf, self.sql_ctx) + @since(1.3) def count(self): """Returns the number of rows in this :class:`DataFrame`. http://git-wip-us.apache.org/repos/asf/spark/blob/02bbe731/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index ce4abf8..f644624 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -1906,6 +1906,22 @@ class SQLTests(ReusedPySparkTestCase): # planner should not crash without a join broadcast(df1)._jdf.queryExecution().executedPlan() +def test_generic_hints(self): +from pyspark.sql import DataFrame + +df1 = self.spark.range(10e10).toDF("id") +df2 = self.spark.range(10e10).toDF("id") + +self.assertIsInstance(df1.hint("broadcast"), DataFrame) +self.assertIsInstance(df1.hint("broadcast", []), DataFrame) + +# Dummy rules +self.assertIsInstance(df1.hint("broadcast", "foo", "bar"), DataFrame) +self.assertIsInstance(df1.hint("broadcast", ["foo", "bar"]), DataFrame) + +plan = df1.join(df2.hint("broadcast"), "id")._jdf.queryExecution().executedPlan() +self.assertEqual(1, plan.toString().count("BroadcastHashJoin")) + def test_toDF_with_schema_string(self): data = [Row(key=i, value=str(i)) for i in range(100)] rdd = self.sc.parallelize(data, 5)
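The new Python method is a thin wrapper: it validates its arguments and then delegates to the JVM Dataset through `self._jdf.hint(...)`. The equivalent calls on the Scala side, runnable in spark-shell (assumes the usual `spark` session), ending with the plan check the new test performs:

```scala
val df1 = spark.range(100).toDF("id")
val df2 = spark.range(100).toDF("id")

// Same effect as df2.hint("broadcast") in the Python example above.
val joined = df1.join(df2.hint("broadcast"), "id")

// The added test asserts the executed plan contains one BroadcastHashJoin.
joined.explain()
```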
spark git commit: [SPARK-20584][PYSPARK][SQL] Python generic hint support
Repository: spark Updated Branches: refs/heads/branch-2.2 a3a5fcfef -> d8bd213f1 [SPARK-20584][PYSPARK][SQL] Python generic hint support ## What changes were proposed in this pull request? Adds `hint` method to PySpark `DataFrame`. ## How was this patch tested? Unit tests, doctests. Author: zero323 Closes #17850 from zero323/SPARK-20584. (cherry picked from commit 02bbe73118a39e2fb378aa2002449367a92f6d67) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d8bd213f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d8bd213f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d8bd213f Branch: refs/heads/branch-2.2 Commit: d8bd213f13279664d50ffa57c1814d0b16fc5d23 Parents: a3a5fcf Author: zero323 Authored: Wed May 3 19:15:28 2017 -0700 Committer: Reynold Xin Committed: Wed May 3 19:15:42 2017 -0700 -- python/pyspark/sql/dataframe.py | 29 + python/pyspark/sql/tests.py | 16 2 files changed, 45 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d8bd213f/python/pyspark/sql/dataframe.py -- diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index f567cc4..d62ba96 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -371,6 +371,35 @@ class DataFrame(object): jdf = self._jdf.withWatermark(eventTime, delayThreshold) return DataFrame(jdf, self.sql_ctx) +@since(2.2) +def hint(self, name, *parameters): +"""Specifies some hint on the current DataFrame. + +:param name: A name of the hint. +:param parameters: Optional parameters. +:return: :class:`DataFrame` + +>>> df.join(df2.hint("broadcast"), "name").show() +++---+--+ +|name|age|height| +++---+--+ +| Bob| 5|85| +++---+--+ +""" +if len(parameters) == 1 and isinstance(parameters[0], list): +parameters = parameters[0] + +if not isinstance(name, str): +raise TypeError("name should be provided as str, got {0}".format(type(name))) + +for p in parameters: +if not isinstance(p, str): +raise TypeError( +"all parameters should be str, got {0} of type {1}".format(p, type(p))) + +jdf = self._jdf.hint(name, self._jseq(parameters)) +return DataFrame(jdf, self.sql_ctx) + @since(1.3) def count(self): """Returns the number of rows in this :class:`DataFrame`. http://git-wip-us.apache.org/repos/asf/spark/blob/d8bd213f/python/pyspark/sql/tests.py -- diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py index cd92148..2aa2d23 100644 --- a/python/pyspark/sql/tests.py +++ b/python/pyspark/sql/tests.py @@ -1906,6 +1906,22 @@ class SQLTests(ReusedPySparkTestCase): # planner should not crash without a join broadcast(df1)._jdf.queryExecution().executedPlan() +def test_generic_hints(self): +from pyspark.sql import DataFrame + +df1 = self.spark.range(10e10).toDF("id") +df2 = self.spark.range(10e10).toDF("id") + +self.assertIsInstance(df1.hint("broadcast"), DataFrame) +self.assertIsInstance(df1.hint("broadcast", []), DataFrame) + +# Dummy rules +self.assertIsInstance(df1.hint("broadcast", "foo", "bar"), DataFrame) +self.assertIsInstance(df1.hint("broadcast", ["foo", "bar"]), DataFrame) + +plan = df1.join(df2.hint("broadcast"), "id")._jdf.queryExecution().executedPlan() +self.assertEqual(1, plan.toString().count("BroadcastHashJoin")) + def test_toDF_with_schema_string(self): data = [Row(key=i, value=str(i)) for i in range(100)] rdd = self.sc.parallelize(data, 5)
spark git commit: [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!=
Repository: spark Updated Branches: refs/heads/master 6b9e49d12 -> 13eb37c86 [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!= ## What changes were proposed in this pull request? This PR proposes three things as below: - This test looks not testing `<=>` and identical with the test above, `===`. So, it removes the test. ```diff - test("<=>") { - checkAnswer( - testData2.filter($"a" === 1), - testData2.collect().toSeq.filter(r => r.getInt(0) == 1)) - -checkAnswer( - testData2.filter($"a" === $"b"), - testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1))) - } ``` - Replace the test title from `=!=` to `<=>`. It looks the test actually testing `<=>`. ```diff + private lazy val nullData = Seq( +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b") + ... - test("=!=") { + test("<=>") { -val nullData = spark.createDataFrame(sparkContext.parallelize( - Row(1, 1) :: - Row(1, 2) :: - Row(1, null) :: - Row(null, null) :: Nil), - StructType(Seq(StructField("a", IntegerType), StructField("b", IntegerType - checkAnswer( nullData.filter($"b" <=> 1), ... ``` - Add the tests for `=!=` which looks not existing. ```diff + test("=!=") { +checkAnswer( + nullData.filter($"b" =!= 1), + Row(1, 2) :: Nil) + +checkAnswer(nullData.filter($"b" =!= null), Nil) + +checkAnswer( + nullData.filter($"a" =!= $"b"), + Row(1, 2) :: Nil) + } ``` ## How was this patch tested? Manually running the tests. Author: hyukjinkwonCloses #17842 from HyukjinKwon/minor-test-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/13eb37c8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/13eb37c8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/13eb37c8 Branch: refs/heads/master Commit: 13eb37c860c8f672d0e9d9065d0333f981db71e3 Parents: 6b9e49d Author: hyukjinkwon Authored: Wed May 3 13:08:25 2017 -0700 Committer: Reynold Xin Committed: Wed May 3 13:08:25 2017 -0700 -- .../spark/sql/ColumnExpressionSuite.scala | 31 +--- 1 file changed, 14 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/13eb37c8/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala index b0f398d..bc708ca 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala @@ -39,6 +39,9 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext { StructType(Seq(StructField("a", BooleanType), StructField("b", BooleanType } + private lazy val nullData = Seq( +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b") + test("column names with space") { val df = Seq((1, "a")).toDF("name with space", "name.with.dot") @@ -284,23 +287,6 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext { test("<=>") { checkAnswer( - testData2.filter($"a" === 1), - testData2.collect().toSeq.filter(r => r.getInt(0) == 1)) - -checkAnswer( - testData2.filter($"a" === $"b"), - testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1))) - } - - test("=!=") { -val nullData = spark.createDataFrame(sparkContext.parallelize( - Row(1, 1) :: - Row(1, 2) :: - Row(1, null) :: - Row(null, null) :: Nil), - StructType(Seq(StructField("a", IntegerType), StructField("b", 
IntegerType))))
-
-    checkAnswer(
       nullData.filter($"b" <=> 1),
       Row(1, 1) :: Nil)
@@ -321,7 +307,18 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     checkAnswer(
       nullData2.filter($"a" <=> null),
       Row(null) :: Nil)
+  }

+  test("=!=") {
+    checkAnswer(
+      nullData.filter($"b" =!= 1),
+      Row(1, 2) :: Nil)
+
+    checkAnswer(nullData.filter($"b" =!= null), Nil)
+
+    checkAnswer(
+      nullData.filter($"a" =!= $"b"),
+      Row(1, 2) :: Nil)
   }

   test(">") {
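Since the whole point of the renamed test is how the three operators treat nulls, a one-screen reminder may help; a short sketch, with column values mirroring `nullData` above and assuming an ambient `SparkSession` named `spark`:

```scala
import spark.implicits._

val df = Seq[(Integer, Integer)]((1, 1), (1, 2), (1, null), (null, null)).toDF("a", "b")

df.filter($"a" === $"b").show()  // only (1, 1): `===` yields null (filtered out) when either side is null
df.filter($"a" <=> $"b").show()  // (1, 1) and (null, null): `<=>` is null-safe, null <=> null is true
df.filter($"a" =!= $"b").show()  // only (1, 2): the negation is null-rejecting as well
```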
spark git commit: [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!=
Repository: spark Updated Branches: refs/heads/branch-2.2 36d807906 -> 2629e7c7a [MINOR][SQL] Fix the test title from =!= to <=>, remove a duplicated test and add a test for =!= ## What changes were proposed in this pull request? This PR proposes three things as below: - This test looks not testing `<=>` and identical with the test above, `===`. So, it removes the test. ```diff - test("<=>") { - checkAnswer( - testData2.filter($"a" === 1), - testData2.collect().toSeq.filter(r => r.getInt(0) == 1)) - -checkAnswer( - testData2.filter($"a" === $"b"), - testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1))) - } ``` - Replace the test title from `=!=` to `<=>`. It looks the test actually testing `<=>`. ```diff + private lazy val nullData = Seq( +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b") + ... - test("=!=") { + test("<=>") { -val nullData = spark.createDataFrame(sparkContext.parallelize( - Row(1, 1) :: - Row(1, 2) :: - Row(1, null) :: - Row(null, null) :: Nil), - StructType(Seq(StructField("a", IntegerType), StructField("b", IntegerType - checkAnswer( nullData.filter($"b" <=> 1), ... ``` - Add the tests for `=!=` which looks not existing. ```diff + test("=!=") { +checkAnswer( + nullData.filter($"b" =!= 1), + Row(1, 2) :: Nil) + +checkAnswer(nullData.filter($"b" =!= null), Nil) + +checkAnswer( + nullData.filter($"a" =!= $"b"), + Row(1, 2) :: Nil) + } ``` ## How was this patch tested? Manually running the tests. Author: hyukjinkwonCloses #17842 from HyukjinKwon/minor-test-fix. (cherry picked from commit 13eb37c860c8f672d0e9d9065d0333f981db71e3) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2629e7c7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2629e7c7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2629e7c7 Branch: refs/heads/branch-2.2 Commit: 2629e7c7a1dacfb267d866cf825fa8a078612462 Parents: 36d8079 Author: hyukjinkwon Authored: Wed May 3 13:08:25 2017 -0700 Committer: Reynold Xin Committed: Wed May 3 13:08:31 2017 -0700 -- .../spark/sql/ColumnExpressionSuite.scala | 31 +--- 1 file changed, 14 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2629e7c7/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala index b0f398d..bc708ca 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala @@ -39,6 +39,9 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext { StructType(Seq(StructField("a", BooleanType), StructField("b", BooleanType } + private lazy val nullData = Seq( +(Some(1), Some(1)), (Some(1), Some(2)), (Some(1), None), (None, None)).toDF("a", "b") + test("column names with space") { val df = Seq((1, "a")).toDF("name with space", "name.with.dot") @@ -284,23 +287,6 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext { test("<=>") { checkAnswer( - testData2.filter($"a" === 1), - testData2.collect().toSeq.filter(r => r.getInt(0) == 1)) - -checkAnswer( - testData2.filter($"a" === $"b"), - testData2.collect().toSeq.filter(r => r.getInt(0) == r.getInt(1))) - } - - test("=!=") { -val nullData = spark.createDataFrame(sparkContext.parallelize( - Row(1, 1) :: - Row(1, 2) :: - 
Row(1, null) ::
-      Row(null, null) :: Nil),
-      StructType(Seq(StructField("a", IntegerType), StructField("b", IntegerType))))
-
-    checkAnswer(
       nullData.filter($"b" <=> 1),
       Row(1, 1) :: Nil)
@@ -321,7 +307,18 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     checkAnswer(
       nullData2.filter($"a" <=> null),
       Row(null) :: Nil)
+  }

+  test("=!=") {
+    checkAnswer(
+      nullData.filter($"b" =!= 1),
+      Row(1, 2) :: Nil)
+
+    checkAnswer(nullData.filter($"b" =!= null), Nil)
+
+    checkAnswer(
+      nullData.filter($"a" =!= $"b"),
+      Row(1, 2) :: Nil)
   }

   test(">") {
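For completeness, the same null-safe comparison is reachable from SQL, where the operator is also spelled `<=>`; a small sketch through `spark.sql`, assuming the ambient session and a temp view:

```scala
import spark.implicits._

Seq[(Integer, Integer)]((1, 1), (1, null), (null, null)).toDF("a", "b")
  .createOrReplaceTempView("pairs")

// `=` drops rows where either side is null; `<=>` keeps the (null, null) row.
spark.sql("SELECT * FROM pairs WHERE a = b").show()    // (1, 1)
spark.sql("SELECT * FROM pairs WHERE a <=> b").show()  // (1, 1) and (null, null)
```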
spark git commit: [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame
Repository: spark Updated Branches: refs/heads/branch-2.2 b1a732fea -> f0e80aa2d [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame ## What changes were proposed in this pull request? We allow users to specify hints (currently only "broadcast" is supported) in SQL and DataFrame. However, while SQL has a standard hint format (/*+ ... */), DataFrame doesn't have one and sometimes users are confused that they can't find how to apply a broadcast hint. This ticket adds a generic hint function on DataFrame that allows using the same hint on DataFrames as well as SQL. As an example, after this patch, the following will apply a broadcast hint on a DataFrame using the new hint function: ``` df1.join(df2.hint("broadcast")) ``` ## How was this patch tested? Added a test case in DataFrameJoinSuite. Author: Reynold Xin <r...@databricks.com> Closes #17839 from rxin/SPARK-20576. (cherry picked from commit 527fc5d0c990daaacad4740f62cfe6736609b77b) Signed-off-by: Reynold Xin <r...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f0e80aa2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f0e80aa2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f0e80aa2 Branch: refs/heads/branch-2.2 Commit: f0e80aa2ddee80819ef33ee24eb6a15a73bc02d5 Parents: b1a732f Author: Reynold Xin <r...@databricks.com> Authored: Wed May 3 09:22:25 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Wed May 3 09:22:41 2017 -0700 -- .../sql/catalyst/analysis/ResolveHints.scala | 8 +++- .../main/scala/org/apache/spark/sql/Dataset.scala | 16 .../org/apache/spark/sql/DataFrameJoinSuite.scala | 18 +- 3 files changed, 40 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala index c4827b8..df688fa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala @@ -86,7 +86,13 @@ object ResolveHints { def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { case h: Hint if BROADCAST_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) => -applyBroadcastHint(h.child, h.parameters.toSet) +if (h.parameters.isEmpty) { + // If there is no table alias specified, turn the entire subtree into a BroadcastHint. + BroadcastHint(h.child) +} else { + // Otherwise, find within the subtree query plans that should be broadcasted. + applyBroadcastHint(h.child, h.parameters.toSet) +} } } http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 06dd550..5f602dc 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1074,6 +1074,22 @@ class Dataset[T] private[sql]( def apply(colName: String): Column = col(colName) /** + * Specifies some hint on the current Dataset. 
As an example, the following code specifies
+   * that one of the plan can be broadcasted:
+   *
+   * {{{
+   *   df1.join(df2.hint("broadcast"))
+   * }}}
+   *
+   * @group basic
+   * @since 2.2.0
+   */
+  @scala.annotation.varargs
+  def hint(name: String, parameters: String*): Dataset[T] = withTypedPlan {
+    Hint(name, parameters, logicalPlan)
+  }
+
+  /**
    * Selects column based on the column name and return it as a [[Column]].
    *
    * @note The column name can also reference to a nested column like `a.b`.

http://git-wip-us.apache.org/repos/asf/spark/blob/f0e80aa2/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 541ffb5..4a52af6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@
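The `ResolveHints` half of the diff draws a useful distinction: a `broadcast` hint with no parameters turns the entire hinted subtree into a broadcast candidate, while a parameterized hint only marks the relations named in its parameters. A hedged sketch of the two forms; the `fact`/`dim` table names and the `active` column are illustrative:

```scala
// Parameterless form: the whole hinted subtree becomes a broadcast candidate.
val wholeSide = spark.table("dim").where("active").hint("broadcast")

// Parameterized form: only the relations named in the parameters are marked,
// wherever they occur inside the hinted plan.
val namedOnly = spark.table("fact").join(spark.table("dim"), "key").hint("broadcast", "dim")
```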
spark git commit: [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame
Repository: spark Updated Branches: refs/heads/master 27f543b15 -> 527fc5d0c [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame ## What changes were proposed in this pull request? We allow users to specify hints (currently only "broadcast" is supported) in SQL and DataFrame. However, while SQL has a standard hint format (/*+ ... */), DataFrame doesn't have one and sometimes users are confused that they can't find how to apply a broadcast hint. This ticket adds a generic hint function on DataFrame that allows using the same hint on DataFrames as well as SQL. As an example, after this patch, the following will apply a broadcast hint on a DataFrame using the new hint function: ``` df1.join(df2.hint("broadcast")) ``` ## How was this patch tested? Added a test case in DataFrameJoinSuite. Author: Reynold Xin <r...@databricks.com> Closes #17839 from rxin/SPARK-20576. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/527fc5d0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/527fc5d0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/527fc5d0 Branch: refs/heads/master Commit: 527fc5d0c990daaacad4740f62cfe6736609b77b Parents: 27f543b Author: Reynold Xin <r...@databricks.com> Authored: Wed May 3 09:22:25 2017 -0700 Committer: Reynold Xin <r...@databricks.com> Committed: Wed May 3 09:22:25 2017 -0700 -- .../sql/catalyst/analysis/ResolveHints.scala | 8 +++- .../main/scala/org/apache/spark/sql/Dataset.scala | 16 .../org/apache/spark/sql/DataFrameJoinSuite.scala | 18 +- 3 files changed, 40 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala index c4827b8..df688fa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala @@ -86,7 +86,13 @@ object ResolveHints { def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { case h: Hint if BROADCAST_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) => -applyBroadcastHint(h.child, h.parameters.toSet) +if (h.parameters.isEmpty) { + // If there is no table alias specified, turn the entire subtree into a BroadcastHint. + BroadcastHint(h.child) +} else { + // Otherwise, find within the subtree query plans that should be broadcasted. + applyBroadcastHint(h.child, h.parameters.toSet) +} } } http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 147e765..620c8bd 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -1161,6 +1161,22 @@ class Dataset[T] private[sql]( def apply(colName: String): Column = col(colName) /** + * Specifies some hint on the current Dataset. 
As an example, the following code specifies
+   * that one of the plan can be broadcasted:
+   *
+   * {{{
+   *   df1.join(df2.hint("broadcast"))
+   * }}}
+   *
+   * @group basic
+   * @since 2.2.0
+   */
+  @scala.annotation.varargs
+  def hint(name: String, parameters: String*): Dataset[T] = withTypedPlan {
+    Hint(name, parameters, logicalPlan)
+  }
+
+  /**
    * Selects column based on the column name and return it as a [[Column]].
    *
    * @note The column name can also reference to a nested column like `a.b`.

http://git-wip-us.apache.org/repos/asf/spark/blob/527fc5d0/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
index 541ffb5..4a52af6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala
@@ -151,7 +151,7 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
       Row(1, 1, 1, 1) :: Row(2, 1, 2, 2) :: Nil)
   }
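One detail visible in the `ResolveHints` excerpt: hint names are upper-cased before being matched, so the new API is case-insensitive, and (assuming the 2.2 contents of `BROADCAST_HINT_NAMES`) `BROADCASTJOIN` and `MAPJOIN` behave as aliases of `BROADCAST`. A short sketch, assuming an ambient `SparkSession` named `spark`:

```scala
import spark.implicits._

val small = Seq((1, "x"), (2, "y")).toDF("key", "v")
val big = spark.range(1000000).toDF("key")

// All three should produce the same BroadcastHashJoin physical plan.
big.join(small.hint("broadcast"), "key").explain()
big.join(small.hint("Broadcast"), "key").explain()
big.join(small.hint("MAPJOIN"), "key").explain()
```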
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/branch-2.2 6709bcf6e -> e278876ba [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal SzafranskiCloses #17773 from michal-databricks/spark-20474. (cherry picked from commit a277ae80a2836e6533b338d2b9c4e59ed8a1daae) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e278876b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e278876b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e278876b Branch: refs/heads/branch-2.2 Commit: e278876ba3d66d3fb249df59c3de8d78ca25c5f0 Parents: 6709bcf Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:50 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e278876b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, 
elementsAppended);
+        if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity);
         intData = newData;
       }
     } else if (type instanceof LongType || type instanceof TimestampType ||
         DecimalType.is64BitDecimalType(type)) {
       if (longData == null || longData.length < newCapacity) {
         long[] newData = new long[newCapacity];
-        if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended);
+        if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity);
         longData = newData;
       }
     } else if (type instanceof FloatType) {
       if (floatData == null || floatData.length < newCapacity) {
         float[] newData = new float[newCapacity];
-        if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended);
+        if (floatData != null) System.arraycopy(floatData, 0, newData, 0, capacity);
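The bug class here is easy to reproduce outside Spark: a growable buffer that tracks `elementsAppended` for its append API but also allows positional `put` writes must copy up to the old capacity, not up to `elementsAppended`, when it reallocates. A minimal Scala sketch of the failure mode; this is not Spark code, just the invariant:

```scala
final class IntVector(initialCapacity: Int) {
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0

  def put(i: Int, v: Int): Unit = data(i) = v  // positional write; does NOT bump the counter
  def append(v: Int): Unit = { data(elementsAppended) = v; elementsAppended += 1 }
  def get(i: Int): Int = data(i)

  def reserve(newCapacity: Int): Unit = if (newCapacity > data.length) {
    val newData = new Array[Int](newCapacity)
    // The buggy version copied only `elementsAppended` elements, silently
    // dropping anything written through put(); copying the old capacity
    // preserves both kinds of writes.
    System.arraycopy(data, 0, newData, 0, data.length)
    data = newData
  }
}

val v = new IntVector(4)
v.put(3, 42)            // written positionally, counter still 0
v.reserve(8)
assert(v.get(3) == 42)  // would fail if reserve copied only `elementsAppended` elements
```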
spark git commit: [SPARK-20474] Fixing OnHeapColumnVector reallocation
Repository: spark Updated Branches: refs/heads/master 99c6cf9ef -> a277ae80a [SPARK-20474] Fixing OnHeapColumnVector reallocation ## What changes were proposed in this pull request? OnHeapColumnVector reallocation copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the ColumnVector.appendX API, while ColumnVector.putX is more commonly used. ## How was this patch tested? Tested using existing unit tests. Author: Michal SzafranskiCloses #17773 from michal-databricks/spark-20474. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a277ae80 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a277ae80 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a277ae80 Branch: refs/heads/master Commit: a277ae80a2836e6533b338d2b9c4e59ed8a1daae Parents: 99c6cf9 Author: Michal Szafranski Authored: Wed Apr 26 12:47:37 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 12:47:37 2017 -0700 -- .../vectorized/OnHeapColumnVector.java | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a277ae80/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java index 9b410ba..94ed322 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java @@ -410,53 +410,53 @@ public final class OnHeapColumnVector extends ColumnVector { int[] newLengths = new int[newCapacity]; int[] newOffsets = new int[newCapacity]; if (this.arrayLengths != null) { -System.arraycopy(this.arrayLengths, 0, newLengths, 0, elementsAppended); -System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, elementsAppended); +System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity); +System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity); } arrayLengths = newLengths; arrayOffsets = newOffsets; } else if (type instanceof BooleanType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ByteType) { if (byteData == null || byteData.length < newCapacity) { byte[] newData = new byte[newCapacity]; -if (byteData != null) System.arraycopy(byteData, 0, newData, 0, elementsAppended); +if (byteData != null) System.arraycopy(byteData, 0, newData, 0, capacity); byteData = newData; } } else if (type instanceof ShortType) { if (shortData == null || shortData.length < newCapacity) { short[] newData = new short[newCapacity]; -if (shortData != null) System.arraycopy(shortData, 0, newData, 0, elementsAppended); +if (shortData != null) System.arraycopy(shortData, 0, newData, 0, capacity); shortData = newData; } } else if (type instanceof IntegerType || type instanceof DateType || DecimalType.is32BitDecimalType(type)) { if (intData == null || intData.length < newCapacity) { int[] newData = new int[newCapacity]; -if (intData != null) System.arraycopy(intData, 0, newData, 0, elementsAppended); +if (intData != null) System.arraycopy(intData, 0, newData, 0, capacity); intData = 
newData;
       }
     } else if (type instanceof LongType || type instanceof TimestampType ||
         DecimalType.is64BitDecimalType(type)) {
       if (longData == null || longData.length < newCapacity) {
         long[] newData = new long[newCapacity];
-        if (longData != null) System.arraycopy(longData, 0, newData, 0, elementsAppended);
+        if (longData != null) System.arraycopy(longData, 0, newData, 0, capacity);
         longData = newData;
       }
     } else if (type instanceof FloatType) {
       if (floatData == null || floatData.length < newCapacity) {
         float[] newData = new float[newCapacity];
-        if (floatData != null) System.arraycopy(floatData, 0, newData, 0, elementsAppended);
+        if (floatData != null) System.arraycopy(floatData, 0, newData, 0, capacity);
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/branch-2.2 b65858bb3 -> 6709bcf6e [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal SzafranskiCloses #17772 from michal-databricks/spark-20473. (cherry picked from commit 99c6cf9ef16bf8fae6edb23a62e46546a16bca80) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6709bcf6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6709bcf6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6709bcf6 Branch: refs/heads/branch-2.2 Commit: 6709bcf6e66e99e17ba2a3b1482df2dba1a15716 Parents: b65858b Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:57 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6709bcf6/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20473] Enabling missing types in ColumnVector.Array
Repository: spark Updated Branches: refs/heads/master 66dd5b83f -> 99c6cf9ef [SPARK-20473] Enabling missing types in ColumnVector.Array ## What changes were proposed in this pull request? ColumnVector implementations originally did not support some Catalyst types (float, short, and boolean). Now that they do, those types should be also added to the ColumnVector.Array. ## How was this patch tested? Tested using existing unit tests. Author: Michal SzafranskiCloses #17772 from michal-databricks/spark-20473. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99c6cf9e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99c6cf9e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99c6cf9e Branch: refs/heads/master Commit: 99c6cf9ef16bf8fae6edb23a62e46546a16bca80 Parents: 66dd5b8 Author: Michal Szafranski Authored: Wed Apr 26 11:21:25 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 26 11:21:25 2017 -0700 -- .../apache/spark/sql/execution/vectorized/ColumnVector.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/99c6cf9e/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java index 354c878..b105e60 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java @@ -180,7 +180,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public boolean getBoolean(int ordinal) { - throw new UnsupportedOperationException(); + return data.getBoolean(offset + ordinal); } @Override @@ -188,7 +188,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public short getShort(int ordinal) { - throw new UnsupportedOperationException(); + return data.getShort(offset + ordinal); } @Override @@ -199,7 +199,7 @@ public abstract class ColumnVector implements AutoCloseable { @Override public float getFloat(int ordinal) { - throw new UnsupportedOperationException(); + return data.getFloat(offset + ordinal); } @Override - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
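The pattern in the fix is the usual offset view: the `Array` wrapper resolves element `ordinal` against its backing vector at `offset + ordinal`, and the patch simply extends that delegation to boolean, short, and float. A hedged sketch of the idea in Scala; the class and field names are illustrative, not Spark's:

```scala
// Illustrative only: a read-only window over a backing store, mirroring how
// ColumnVector.Array delegates to `data` at `offset + ordinal`.
final class ArrayView(data: Array[Any], offset: Int, length: Int) {
  private def at(ordinal: Int): Any = {
    require(ordinal >= 0 && ordinal < length, s"ordinal $ordinal out of range")
    data(offset + ordinal)
  }
  // Before the patch the boolean/short/float accessors threw
  // UnsupportedOperationException; now they delegate like the other types.
  def getBoolean(ordinal: Int): Boolean = at(ordinal).asInstanceOf[Boolean]
  def getShort(ordinal: Int): Short = at(ordinal).asInstanceOf[Short]
  def getFloat(ordinal: Int): Float = at(ordinal).asInstanceOf[Float]
}

val view = new ArrayView(Array[Any](true, 1.toShort, 2.5f), offset = 0, length = 3)
assert(view.getBoolean(0) && view.getShort(1) == 1 && view.getFloat(2) == 2.5f)
```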
spark git commit: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT
Repository: spark Updated Branches: refs/heads/master 5280d93e6 -> f44c8a843 [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT This patch bumps the master branch version to `2.3.0-SNAPSHOT`. Author: Josh RosenCloses #17753 from JoshRosen/SPARK-20453. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f44c8a84 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f44c8a84 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f44c8a84 Branch: refs/heads/master Commit: f44c8a843ca512b319f099477415bc13eca2e373 Parents: 5280d93 Author: Josh Rosen Authored: Mon Apr 24 21:48:04 2017 -0700 Committer: Reynold Xin Committed: Mon Apr 24 21:48:04 2017 -0700 -- assembly/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml| 2 +- common/network-yarn/pom.xml | 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml | 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/flume-assembly/pom.xml | 2 +- external/flume-sink/pom.xml | 2 +- external/flume/pom.xml| 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml | 2 +- external/kafka-0-10/pom.xml | 2 +- external/kafka-0-8-assembly/pom.xml | 2 +- external/kafka-0-8/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml | 2 +- graphx/pom.xml| 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml | 2 +- mllib/pom.xml | 2 +- pom.xml | 2 +- project/MimaExcludes.scala| 5 + repl/pom.xml | 2 +- resource-managers/mesos/pom.xml | 2 +- resource-managers/yarn/pom.xml| 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 37 files changed, 42 insertions(+), 37 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/assembly/pom.xml -- diff --git a/assembly/pom.xml b/assembly/pom.xml index 9d8607d..742a4a1 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.3.0-SNAPSHOT ../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-common/pom.xml -- diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8657af7..066970f 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.3.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-shuffle/pom.xml -- diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 24c10fb..2de882a 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.3.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/network-yarn/pom.xml -- diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 5e5a80b..a8488d8 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.11 -2.2.0-SNAPSHOT +2.3.0-SNAPSHOT ../../pom.xml http://git-wip-us.apache.org/repos/asf/spark/blob/f44c8a84/common/sketch/pom.xml -- diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 1356c47..6b81fc2 100644 --- a/common/sketch/pom.xml +++ 
b/common/sketch/pom.xml @@ -22,7 +22,7 @@
spark git commit: [SPARK-20420][SQL] Add events to the external catalog
Repository: spark Updated Branches: refs/heads/master 48d760d02 -> e2b3d2367 [SPARK-20420][SQL] Add events to the external catalog ## What changes were proposed in this pull request? It is often useful to be able to track changes to the `ExternalCatalog`. This PR makes the `ExternalCatalog` emit events when a catalog object is changed. Events are fired before and after the change. The following events are fired per object: - Database - CreateDatabasePreEvent: event fired before the database is created. - CreateDatabaseEvent: event fired after the database has been created. - DropDatabasePreEvent: event fired before the database is dropped. - DropDatabaseEvent: event fired after the database has been dropped. - Table - CreateTablePreEvent: event fired before the table is created. - CreateTableEvent: event fired after the table has been created. - RenameTablePreEvent: event fired before the table is renamed. - RenameTableEvent: event fired after the table has been renamed. - DropTablePreEvent: event fired before the table is dropped. - DropTableEvent: event fired after the table has been dropped. - Function - CreateFunctionPreEvent: event fired before the function is created. - CreateFunctionEvent: event fired after the function has been created. - RenameFunctionPreEvent: event fired before the function is renamed. - RenameFunctionEvent: event fired after the function has been renamed. - DropFunctionPreEvent: event fired before the function is dropped. - DropFunctionPreEvent: event fired after the function has been dropped. The current events currently only contain the names of the object modified. We add more events, and more details at a later point. A user can monitor changes to the external catalog by adding a listener to the Spark listener bus checking for `ExternalCatalogEvent`s using the `SparkListener.onOtherEvent` hook. A more direct approach is add listener directly to the `ExternalCatalog`. ## How was this patch tested? Added the `ExternalCatalogEventSuite`. Author: Herman van HovellCloses #17710 from hvanhovell/SPARK-20420. 
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e2b3d236 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e2b3d236 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e2b3d236 Branch: refs/heads/master Commit: e2b3d2367a563d4600d8d87b5317e71135c362f0 Parents: 48d760d Author: Herman van Hovell Authored: Fri Apr 21 00:05:03 2017 -0700 Committer: Reynold Xin Committed: Fri Apr 21 00:05:03 2017 -0700 -- .../sql/catalyst/catalog/ExternalCatalog.scala | 85 - .../sql/catalyst/catalog/InMemoryCatalog.scala | 22 ++- .../spark/sql/catalyst/catalog/events.scala | 158 .../catalog/ExternalCatalogEventSuite.scala | 188 +++ .../apache/spark/sql/internal/SharedState.scala | 7 + .../spark/sql/hive/HiveExternalCatalog.scala| 22 ++- 6 files changed, 457 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e2b3d236/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala index 08a01e8..974ef90 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.catalog import org.apache.spark.sql.catalyst.analysis.{FunctionAlreadyExistsException, NoSuchDatabaseException, NoSuchFunctionException, NoSuchTableException} import org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.types.StructType +import org.apache.spark.util.ListenerBus /** * Interface for the system catalog (of functions, partitions, tables, and databases). @@ -30,7 +31,8 @@ import org.apache.spark.sql.types.StructType * * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist. */ -abstract class ExternalCatalog { +abstract class ExternalCatalog + extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] { import CatalogTypes.TablePartitionSpec protected def requireDbExists(db: String): Unit = { @@ -61,9 +63,22 @@ abstract class ExternalCatalog { // Databases // -- - def createDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: Boolean):
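Because the events are posted to the Spark listener bus, observing catalog changes from user code takes only a few lines. A minimal sketch using the `onOtherEvent` hook mentioned above, assuming an ambient `spark` session and the event field names introduced by this patch:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.catalyst.catalog.{CreateTableEvent, ExternalCatalogEvent}

spark.sparkContext.addSparkListener(new SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: CreateTableEvent     => println(s"table created: ${e.database}.${e.name}")
    case e: ExternalCatalogEvent => println(s"catalog changed: $e")
    case _                       => // not a catalog event; ignore
  }
})

spark.sql("CREATE TABLE demo_t(id INT) USING parquet")  // fires the pre/post create-table events
```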
spark git commit: [SPARK-20420][SQL] Add events to the external catalog
Repository: spark Updated Branches: refs/heads/branch-2.2 6cd2f16b1 -> cddb4b7db [SPARK-20420][SQL] Add events to the external catalog ## What changes were proposed in this pull request? It is often useful to be able to track changes to the `ExternalCatalog`. This PR makes the `ExternalCatalog` emit events when a catalog object is changed. Events are fired before and after the change. The following events are fired per object: - Database - CreateDatabasePreEvent: event fired before the database is created. - CreateDatabaseEvent: event fired after the database has been created. - DropDatabasePreEvent: event fired before the database is dropped. - DropDatabaseEvent: event fired after the database has been dropped. - Table - CreateTablePreEvent: event fired before the table is created. - CreateTableEvent: event fired after the table has been created. - RenameTablePreEvent: event fired before the table is renamed. - RenameTableEvent: event fired after the table has been renamed. - DropTablePreEvent: event fired before the table is dropped. - DropTableEvent: event fired after the table has been dropped. - Function - CreateFunctionPreEvent: event fired before the function is created. - CreateFunctionEvent: event fired after the function has been created. - RenameFunctionPreEvent: event fired before the function is renamed. - RenameFunctionEvent: event fired after the function has been renamed. - DropFunctionPreEvent: event fired before the function is dropped. - DropFunctionPreEvent: event fired after the function has been dropped. The current events currently only contain the names of the object modified. We add more events, and more details at a later point. A user can monitor changes to the external catalog by adding a listener to the Spark listener bus checking for `ExternalCatalogEvent`s using the `SparkListener.onOtherEvent` hook. A more direct approach is add listener directly to the `ExternalCatalog`. ## How was this patch tested? Added the `ExternalCatalogEventSuite`. Author: Herman van HovellCloses #17710 from hvanhovell/SPARK-20420. 
(cherry picked from commit e2b3d2367a563d4600d8d87b5317e71135c362f0) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cddb4b7d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cddb4b7d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cddb4b7d Branch: refs/heads/branch-2.2 Commit: cddb4b7db81b01b4abf2ab683aba97e4eabb9769 Parents: 6cd2f16 Author: Herman van Hovell Authored: Fri Apr 21 00:05:03 2017 -0700 Committer: Reynold Xin Committed: Fri Apr 21 00:05:10 2017 -0700 -- .../sql/catalyst/catalog/ExternalCatalog.scala | 85 - .../sql/catalyst/catalog/InMemoryCatalog.scala | 22 ++- .../spark/sql/catalyst/catalog/events.scala | 158 .../catalog/ExternalCatalogEventSuite.scala | 188 +++ .../apache/spark/sql/internal/SharedState.scala | 7 + .../spark/sql/hive/HiveExternalCatalog.scala| 22 ++- 6 files changed, 457 insertions(+), 25 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cddb4b7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala index 08a01e8..974ef90 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala @@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.catalog import org.apache.spark.sql.catalyst.analysis.{FunctionAlreadyExistsException, NoSuchDatabaseException, NoSuchFunctionException, NoSuchTableException} import org.apache.spark.sql.catalyst.expressions.Expression import org.apache.spark.sql.types.StructType +import org.apache.spark.util.ListenerBus /** * Interface for the system catalog (of functions, partitions, tables, and databases). @@ -30,7 +31,8 @@ import org.apache.spark.sql.types.StructType * * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist. */ -abstract class ExternalCatalog { +abstract class ExternalCatalog + extends ListenerBus[ExternalCatalogEventListener, ExternalCatalogEvent] { import CatalogTypes.TablePartitionSpec protected def requireDbExists(db: String): Unit = { @@ -61,9 +63,22 @@ abstract class ExternalCatalog { // Databases //
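For the "more direct approach" the description mentions, a listener can be registered on the catalog itself, the way the new `ExternalCatalogEventSuite` does. A hedged sketch against `InMemoryCatalog`; the constructor and event shapes are the ones this patch introduces, so treat the exact signatures as illustrative:

```scala
import java.net.URI
import scala.collection.mutable
import org.apache.spark.sql.catalyst.catalog._

val catalog: ExternalCatalog = new InMemoryCatalog
val seen = mutable.Buffer.empty[ExternalCatalogEvent]

catalog.addListener(new ExternalCatalogEventListener {
  override def onEvent(event: ExternalCatalogEvent): Unit = seen += event
})

catalog.createDatabase(
  CatalogDatabase("db1", description = "", locationUri = new URI("/tmp/db1"), properties = Map.empty),
  ignoreIfExists = false)
// `seen` should now hold the pre- and post-create database events for "db1".
```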
spark git commit: Fixed typos in docs
Repository: spark Updated Branches: refs/heads/master dd6d55d5d -> bdc605691 Fixed typos in docs ## What changes were proposed in this pull request? Typos at a couple of place in the docs. ## How was this patch tested? build including docs Please review http://spark.apache.org/contributing.html before opening a pull request. Author: ymahajanCloses #17690 from ymahajan/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bdc60569 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bdc60569 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bdc60569 Branch: refs/heads/master Commit: bdc60569196e9ae4e9086c3e514a406a9e8b23a6 Parents: dd6d55d Author: ymahajan Authored: Wed Apr 19 20:08:31 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 19 20:08:31 2017 -0700 -- docs/sql-programming-guide.md | 2 +- docs/structured-streaming-programming-guide.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bdc60569/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 28942b6..490c1ce 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -571,7 +571,7 @@ be created by calling the `table` method on a `SparkSession` with the name of th For file-based data source, e.g. text, parquet, json, etc. you can specify a custom table path via the `path` option, e.g. `df.write.option("path", "/some/path").saveAsTable("t")`. When the table is dropped, the custom table path will not be removed and the table data is still there. If no custom table path is -specifed, Spark will write data to a default table path under the warehouse directory. When the table is +specified, Spark will write data to a default table path under the warehouse directory. When the table is dropped, the default table path will be removed too. Starting from Spark 2.1, persistent datasource tables have per-partition metadata stored in the Hive metastore. This brings several benefits: http://git-wip-us.apache.org/repos/asf/spark/blob/bdc60569/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 3cf7151..5b18cf2 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -778,7 +778,7 @@ windowedCounts = words \ In this example, we are defining the watermark of the query on the value of the column "timestamp", and also defining "10 minutes" as the threshold of how late is the data allowed to be. If this query is run in Update output mode (discussed later in [Output Modes](#output-modes) section), -the engine will keep updating counts of a window in the Resule Table until the window is older +the engine will keep updating counts of a window in the Result Table until the window is older than the watermark, which lags behind the current event time in column "timestamp" by 10 minutes. Here is an illustration. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: Fixed typos in docs
Repository: spark Updated Branches: refs/heads/branch-2.2 e6bbdb0c5 -> 8d658b90b Fixed typos in docs ## What changes were proposed in this pull request? Typos at a couple of place in the docs. ## How was this patch tested? build including docs Please review http://spark.apache.org/contributing.html before opening a pull request. Author: ymahajanCloses #17690 from ymahajan/master. (cherry picked from commit bdc60569196e9ae4e9086c3e514a406a9e8b23a6) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d658b90 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d658b90 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d658b90 Branch: refs/heads/branch-2.2 Commit: 8d658b90b9f08ed4a3a899aad5d3ea77986b7302 Parents: e6bbdb0 Author: ymahajan Authored: Wed Apr 19 20:08:31 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 19 20:08:37 2017 -0700 -- docs/sql-programming-guide.md | 2 +- docs/structured-streaming-programming-guide.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8d658b90/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 28942b6..490c1ce 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -571,7 +571,7 @@ be created by calling the `table` method on a `SparkSession` with the name of th For file-based data source, e.g. text, parquet, json, etc. you can specify a custom table path via the `path` option, e.g. `df.write.option("path", "/some/path").saveAsTable("t")`. When the table is dropped, the custom table path will not be removed and the table data is still there. If no custom table path is -specifed, Spark will write data to a default table path under the warehouse directory. When the table is +specified, Spark will write data to a default table path under the warehouse directory. When the table is dropped, the default table path will be removed too. Starting from Spark 2.1, persistent datasource tables have per-partition metadata stored in the Hive metastore. This brings several benefits: http://git-wip-us.apache.org/repos/asf/spark/blob/8d658b90/docs/structured-streaming-programming-guide.md -- diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 3cf7151..5b18cf2 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -778,7 +778,7 @@ windowedCounts = words \ In this example, we are defining the watermark of the query on the value of the column "timestamp", and also defining "10 minutes" as the threshold of how late is the data allowed to be. If this query is run in Update output mode (discussed later in [Output Modes](#output-modes) section), -the engine will keep updating counts of a window in the Resule Table until the window is older +the engine will keep updating counts of a window in the Result Table until the window is older than the watermark, which lags behind the current event time in column "timestamp" by 10 minutes. Here is an illustration. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20398][SQL] range() operator should include cancellation reason when killed
Repository: spark Updated Branches: refs/heads/branch-2.2 af9f18c31 -> e6bbdb0c5 [SPARK-20398][SQL] range() operator should include cancellation reason when killed ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-19820 adds a reason field for why tasks were killed. However, for backwards compatibility it left the old TaskKilledException constructor which defaults to "unknown reason". The range() operator should use the constructor that fills in the reason rather than dropping it on task kill. ## How was this patch tested? Existing tests, and I tested this manually. Author: Eric LiangCloses #17692 from ericl/fix-kill-reason-in-range. (cherry picked from commit dd6d55d5de970662eccf024e5eae4e6821373d35) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e6bbdb0c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e6bbdb0c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e6bbdb0c Branch: refs/heads/branch-2.2 Commit: e6bbdb0c50657190192933f29b92278ea8f37704 Parents: af9f18c Author: Eric Liang Authored: Wed Apr 19 19:53:40 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 19 19:54:45 2017 -0700 -- .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e6bbdb0c/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala index 44278e3..233a105 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala @@ -463,9 +463,7 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range) | $number = $batchEnd; | } | - | if ($taskContext.isInterrupted()) { - | throw new TaskKilledException(); - | } + | $taskContext.killTaskIfInterrupted(); | | long $nextBatchTodo; | if ($numElementsTodo > ${batchSize}L) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20398][SQL] range() operator should include cancellation reason when killed
Repository: spark Updated Branches: refs/heads/master 39e303a8b -> dd6d55d5d [SPARK-20398][SQL] range() operator should include cancellation reason when killed ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-19820 adds a reason field for why tasks were killed. However, for backwards compatibility it left the old TaskKilledException constructor which defaults to "unknown reason". The range() operator should use the constructor that fills in the reason rather than dropping it on task kill. ## How was this patch tested? Existing tests, and I tested this manually. Author: Eric LiangCloses #17692 from ericl/fix-kill-reason-in-range. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dd6d55d5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dd6d55d5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dd6d55d5 Branch: refs/heads/master Commit: dd6d55d5de970662eccf024e5eae4e6821373d35 Parents: 39e303a Author: Eric Liang Authored: Wed Apr 19 19:53:40 2017 -0700 Committer: Reynold Xin Committed: Wed Apr 19 19:53:40 2017 -0700 -- .../org/apache/spark/sql/execution/basicPhysicalOperators.scala | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dd6d55d5/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala index 44278e3..233a105 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala @@ -463,9 +463,7 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range) | $number = $batchEnd; | } | - | if ($taskContext.isInterrupted()) { - | throw new TaskKilledException(); - | } + | $taskContext.killTaskIfInterrupted(); | | long $nextBatchTodo; | if ($numElementsTodo > ${batchSize}L) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
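The difference is only visible in the message attached to the `TaskKilledException` a user sees when a running `range()` stage is cancelled. A sketch of the before/after pattern the generated code follows, assuming a `taskContext` in scope; note that `killTaskIfInterrupted` is an internal `private[spark]` method on `TaskContext`, so this compiles only inside Spark itself:

```scala
import org.apache.spark.TaskKilledException

// Before: the kill reason recorded on the task is discarded, and the
// exception surfaces with the default "unknown reason".
if (taskContext.isInterrupted()) {
  throw new TaskKilledException()
}

// After: a single call that both checks for interruption and rethrows
// with the reason recorded by SPARK-19820.
taskContext.killTaskIfInterrupted()
```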
spark git commit: [TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite
Repository: spark
Updated Branches: refs/heads/master 0075562dd -> 33ea908af

[TEST][MINOR] Replace repartitionBy with distribute in CollapseRepartitionSuite

## What changes were proposed in this pull request?

Replace non-existent `repartitionBy` with `distribute` in `CollapseRepartitionSuite`.

## How was this patch tested?

Local build and `catalyst/testOnly *CollapseRepartitionSuite`

Author: Jacek Laskowski

Closes #17657 from jaceklaskowski/CollapseRepartitionSuite.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33ea908a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33ea908a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33ea908a
Branch: refs/heads/master
Commit: 33ea908af94152147e996a6dc8da41ada27d5af3
Parents: 0075562
Author: Jacek Laskowski
Authored: Mon Apr 17 17:58:10 2017 -0700
Committer: Reynold Xin
Committed: Mon Apr 17 17:58:10 2017 -0700

----------------------------------------------------------------------
 .../optimizer/CollapseRepartitionSuite.scala | 21 ++--
 1 file changed, 10 insertions(+), 11 deletions(-)
----------------------------------------------------------------------

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
index 59d2dc4..8cc8dec 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseRepartitionSuite.scala
@@ -106,8 +106,8 @@ class CollapseRepartitionSuite extends PlanTest {
     comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartitionBy above repartition") {
-    // Always respects the top repartitionBy amd removes useless repartition
+  test("distribute above repartition") {
+    // Always respects the top distribute and removes useless repartition
     val query1 = testRelation
       .repartition(10)
       .distribute('a)(20)
@@ -123,8 +123,8 @@ class CollapseRepartitionSuite extends PlanTest {
     comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartitionBy above coalesce") {
-    // Always respects the top repartitionBy amd removes useless coalesce below repartition
+  test("distribute above coalesce") {
+    // Always respects the top distribute and removes useless coalesce below repartition
     val query1 = testRelation
       .coalesce(10)
       .distribute('a)(20)
@@ -140,8 +140,8 @@ class CollapseRepartitionSuite extends PlanTest {
     comparePlans(optimized2, correctAnswer)
   }
 
-  test("repartition above repartitionBy") {
-    // Always respects the top repartition amd removes useless distribute below repartition
+  test("repartition above distribute") {
+    // Always respects the top repartition and removes useless distribute below repartition
     val query1 = testRelation
       .distribute('a)(10)
       .repartition(20)
@@ -155,11 +155,10 @@ class CollapseRepartitionSuite extends PlanTest {
     comparePlans(optimized1, correctAnswer)
     comparePlans(optimized2, correctAnswer)
-
   }
 
-  test("coalesce above repartitionBy") {
-    // Remove useless coalesce above repartition
+  test("coalesce above distribute") {
+    // Remove useless coalesce above distribute
     val query1 = testRelation
       .distribute('a)(10)
       .coalesce(20)
@@ -180,8 +179,8 @@ class CollapseRepartitionSuite extends PlanTest {
     comparePlans(optimized2, correctAnswer2)
   }
 
-  test("collapse two adjacent repartitionBys into one") {
-    // Always respects the top repartitionBy
+  test("collapse two adjacent distributes into one") {
+    // Always respects the top distribute
     val query1 = testRelation
       .distribute('b)(10)
       .distribute('a)(20)
spark git commit: [SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate functions after using persistent functions
Repository: spark
Updated Branches: refs/heads/branch-2.1 622d7a8bf -> 3808b4728

[SPARK-20349][SQL][REVERT-BRANCH2.1] ListFunctions returns duplicate functions after using persistent functions

Revert the changes of https://github.com/apache/spark/pull/17646 made in branch-2.1, because they break the build: the patch needs the parser interface, but SessionCatalog in branch-2.1 does not have it.

### What changes were proposed in this pull request?

The session catalog caches some persistent functions in the `FunctionRegistry`, so there can be duplicates. Our Catalog API `listFunctions` does not handle them. It would be better if the `SessionCatalog` API de-duplicated the records itself, instead of leaving that to each API caller. In `FunctionRegistry`, functions are identified by their unquoted string names. Thus, this PR tries to parse them using our parser interface and then de-duplicate the names.

### How was this patch tested?

Added test cases.

Author: Xiao Li

Closes #17661 from gatorsmile/compilationFix17646.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3808b472
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3808b472
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3808b472
Branch: refs/heads/branch-2.1
Commit: 3808b472813a2cdf560107787f6971e5202044a8
Parents: 622d7a8
Author: Xiao Li
Authored: Mon Apr 17 17:57:20 2017 -0700
Committer: Reynold Xin
Committed: Mon Apr 17 17:57:20 2017 -0700

----------------------------------------------------------------------
 .../sql/catalyst/catalog/SessionCatalog.scala   | 21 +---
 .../spark/sql/execution/command/functions.scala |  4 +++-
 .../spark/sql/hive/execution/HiveUDFSuite.scala | 17
 3 files changed, 8 insertions(+), 34 deletions(-)
----------------------------------------------------------------------

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 6f302d3..a5cf719 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -20,7 +20,6 @@ package org.apache.spark.sql.catalyst.catalog
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.mutable
-import scala.util.{Failure, Success, Try}
 
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
@@ -1099,25 +1098,15 @@ class SessionCatalog(
   def listFunctions(db: String, pattern: String): Seq[(FunctionIdentifier, String)] = {
     val dbName = formatDatabaseName(db)
     requireDbExists(dbName)
-    val dbFunctions = externalCatalog.listFunctions(dbName, pattern).map { f =>
-      FunctionIdentifier(f, Some(dbName)) }
-    val loadedFunctions =
-      StringUtils.filterPattern(functionRegistry.listFunction(), pattern).map { f =>
-        // In functionRegistry, function names are stored as an unquoted format.
-        Try(parser.parseFunctionIdentifier(f)) match {
-          case Success(e) => e
-          case Failure(_) =>
-            // The names of some built-in functions are not parsable by our parser, e.g., %
-            FunctionIdentifier(f)
-        }
-      }
+    val dbFunctions = externalCatalog.listFunctions(dbName, pattern)
+      .map { f => FunctionIdentifier(f, Some(dbName)) }
+    val loadedFunctions = StringUtils.filterPattern(functionRegistry.listFunction(), pattern)
+      .map { f => FunctionIdentifier(f) }
     val functions = dbFunctions ++ loadedFunctions
-    // The session catalog caches some persistent functions in the FunctionRegistry
-    // so there can be duplicates.
     functions.map {
       case f if FunctionRegistry.functionSet.contains(f.funcName) => (f, "SYSTEM")
       case f => (f, "USER")
-    }.distinct
+    }
   }

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
index 75272d2..ea53987 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
@@ -208,6 +208,8 @@ case class ShowFunctionsCommand(
       case (f, "USER") if showUserFunctions => f.unquotedString
       case (f,
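The reverted master-side change boils down to a parse-then-de-duplicate step over function names. Below is a standalone sketch of that idea with illustrative stand-ins: `FunctionIdentifier` here is a local case class, and `parse` models the `parseFunctionIdentifier` call that branch-2.1's `SessionCatalog` lacks.

```scala
import scala.util.{Failure, Success, Try}

object ListFunctionsDedupSketch {
  // Local stand-in for org.apache.spark.sql.catalyst.FunctionIdentifier.
  case class FunctionIdentifier(funcName: String, database: Option[String] = None)

  // Parse each registry name into a structured identifier, then de-duplicate,
  // so a function cached in both catalogs is reported only once.
  def normalizeNames(
      registered: Seq[String],
      parse: String => FunctionIdentifier): Seq[FunctionIdentifier] = {
    registered.map { raw =>
      Try(parse(raw)) match {
        case Success(ident) => ident
        // Some built-in names (e.g. "%") are not parsable; keep them verbatim.
        case Failure(_) => FunctionIdentifier(raw)
      }
    }.distinct
  }
}
```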
spark git commit: Typo fix: distitrbuted -> distributed
Repository: spark
Updated Branches: refs/heads/master e5fee3e4f -> 0075562dd

Typo fix: distitrbuted -> distributed

## What changes were proposed in this pull request?

Typo fix: distitrbuted -> distributed

## How was this patch tested?

Existing tests

Author: Andrew Ash

Closes #17664 from ash211/patch-1.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0075562d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0075562d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0075562d
Branch: refs/heads/master
Commit: 0075562dd2551a31c35ca26922d6bd73cdb78ea4
Parents: e5fee3e
Author: Andrew Ash
Authored: Mon Apr 17 17:56:33 2017 -0700
Committer: Reynold Xin
Committed: Mon Apr 17 17:56:33 2017 -0700

----------------------------------------------------------------------
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 424bbca..b817570 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -577,7 +577,7 @@ private[spark] class Client(
     ).foreach { case (flist, resType, addToClasspath) =>
       flist.foreach { file =>
         val (_, localizedPath) = distribute(file, resType = resType)
-        // If addToClassPath, we ignore adding jar multiple times to distitrbuted cache.
+        // If addToClassPath, we ignore adding jar multiple times to distributed cache.
         if (addToClasspath) {
           if (localizedPath != null) {
             cachedSecondaryJarLinks += localizedPath
spark git commit: [HOTFIX] Fix compilation.
Repository: spark
Updated Branches: refs/heads/branch-2.1 db9517c16 -> 622d7a8bf

[HOTFIX] Fix compilation.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/622d7a8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/622d7a8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/622d7a8b
Branch: refs/heads/branch-2.1
Commit: 622d7a8bf6be22e30db7ff38604ed86b44fcc87e
Parents: db9517c
Author: Reynold Xin
Authored: Mon Apr 17 12:57:58 2017 -0700
Committer: Reynold Xin
Committed: Mon Apr 17 12:57:58 2017 -0700

----------------------------------------------------------------------
 .../apache/spark/sql/catalyst/expressions/regexpExpressions.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index ad12177..0325d0e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -92,7 +92,8 @@ trait StringRegexExpression extends ImplicitCastInputTypes {
     See also:
       Use RLIKE to match with standard regular expressions.
 """)
-case class Like(left: Expression, right: Expression) extends StringRegexExpression {
+case class Like(left: Expression, right: Expression)
+  extends BinaryExpression with StringRegexExpression {
 
   override def escape(v: String): String = StringUtils.escapeLikeRegex(v)
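One plausible reading of why the one-line fix was needed, in deliberately toy form: if `StringRegexExpression` no longer brings in `BinaryExpression` by itself on this branch, every concrete expression has to list both parents explicitly, which is the shape the hotfix gives `Like`. The types below are simplified stand-ins, not the real catalyst classes; the self-type is a modeling assumption.

```scala
object LikeMixinSketch {
  // Toy stand-in for the real BinaryExpression.
  trait BinaryExpression {
    def left: Any
    def right: Any
  }

  // Modeled with a self-type: mixing this in requires also being a BinaryExpression.
  trait StringRegexExpression { self: BinaryExpression =>
    def escape(v: String): String
  }

  // Compiles only with both parents listed; `extends StringRegexExpression`
  // alone would fail to satisfy the self-type.
  case class Like(left: String, right: String)
    extends BinaryExpression with StringRegexExpression {
    // Placeholder escape; the real Like uses StringUtils.escapeLikeRegex.
    override def escape(v: String): String = v.replace("_", ".")
  }
}
```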