[jira] [Resolved] (SPARK-28343) PostgreSQL test should change some default config
[ https://issues.apache.org/jira/browse/SPARK-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28343. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25109 [https://github.com/apache/spark/pull/25109] > PostgreSQL test should change some default config > - > > Key: SPARK-28343 > URL: https://issues.apache.org/jira/browse/SPARK-28343 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > {noformat} > set spark.sql.crossJoin.enabled=true; > set spark.sql.parser.ansi.enabled=true; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28343) PostgreSQL test should change some default config
[ https://issues.apache.org/jira/browse/SPARK-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28343: - Assignee: Yuming Wang > PostgreSQL test should change some default config > - > > Key: SPARK-28343 > URL: https://issues.apache.org/jira/browse/SPARK-28343 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > {noformat} > set spark.sql.crossJoin.enabled=true; > set spark.sql.parser.ansi.enabled=true; > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28378) Remove usage of cgi.escape
[ https://issues.apache.org/jira/browse/SPARK-28378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28378: - Fix Version/s: 2.4.4 > Remove usage of cgi.escape > -- > > Key: SPARK-28378 > URL: https://issues.apache.org/jira/browse/SPARK-28378 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 2.4.4, 3.0.0 > > > {{cgi.escape}} is deprecated [1], and removed in 3.8 [2]. We had better > replace it. > [1] [https://docs.python.org/3/library/cgi.html#cgi.escape]. > [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28378) Remove usage of cgi.escape
[ https://issues.apache.org/jira/browse/SPARK-28378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28378: Assignee: Liang-Chi Hsieh > Remove usage of cgi.escape > -- > > Key: SPARK-28378 > URL: https://issues.apache.org/jira/browse/SPARK-28378 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > > {{cgi.escape}} is deprecated [1], and removed in 3.8 [2]. We had better > replace it. > [1] [https://docs.python.org/3/library/cgi.html#cgi.escape]. > [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28378) Remove usage of cgi.escape
[ https://issues.apache.org/jira/browse/SPARK-28378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28378. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25142 [https://github.com/apache/spark/pull/25142] > Remove usage of cgi.escape > -- > > Key: SPARK-28378 > URL: https://issues.apache.org/jira/browse/SPARK-28378 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 3.0.0 > > > {{cgi.escape}} is deprecated [1], and removed in 3.8 [2]. We had better > replace it. > [1] [https://docs.python.org/3/library/cgi.html#cgi.escape]. > [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
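For context on the change above: `html.escape` is the standard-library replacement for the removed `cgi.escape`. A minimal migration sketch (not the actual Spark patch); note that `cgi.escape` did not escape quote characters by default, while `html.escape` does:

```python
from html import escape

# cgi.escape(s) defaulted to quote=False, so the drop-in replacement is
# html.escape(s, quote=False); the bare html.escape(s) also escapes quotes.
print(escape("<b>&</b>", quote=False))  # &lt;b&gt;&amp;&lt;/b&gt;
print(escape('a "quoted" <tag>'))       # a &quot;quoted&quot; &lt;tag&gt;
```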
[jira] [Created] (SPARK-28382) Array Functions: unnest
Yuming Wang created SPARK-28382: --- Summary: Array Functions: unnest Key: SPARK-28382 URL: https://issues.apache.org/jira/browse/SPARK-28382 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang ||Function||Return Type||Description||Example||Result|| |{{unnest}}({{anyarray}})|set of anyelement|expand an array to a set of rows|unnest(ARRAY[1,2])|1 2 (2 rows)| https://www.postgresql.org/docs/11/functions-array.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
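A plain-Python sketch of the requested `unnest(anyarray)` semantics (helper name and row shape invented for illustration), to make the expand-array-to-rows behavior concrete:

```python
def unnest(rows, array_col):
    """Expand an array column into one output row per element,
    mirroring PostgreSQL's unnest(anyarray)."""
    for row in rows:
        for elem in row[array_col]:
            out = dict(row)        # copy the other columns unchanged
            out[array_col] = elem  # replace the array with one element
            yield out

print(list(unnest([{"xs": [1, 2]}], "xs")))  # [{'xs': 1}, {'xs': 2}]
```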
[jira] [Updated] (SPARK-28379) Correlated scalar subqueries must be aggregated
[ https://issues.apache.org/jira/browse/SPARK-28379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28379: Description: {code:sql} create or replace temporary view INT8_TBL as select * from (values (123, 456), (123, 4567890123456789), (4567890123456789, 123), (4567890123456789, 4567890123456789), (4567890123456789, -4567890123456789)) as v(q1, q2); select * from int8_tbl t1 left join (select q1 as x, 42 as y from int8_tbl t2) ss on t1.q2 = ss.x where 1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1) order by 1,2; {code} PostgreSQL: {noformat} postgres=# select * from postgres-# int8_tbl t1 left join postgres-# (select q1 as x, 42 as y from int8_tbl t2) ss postgres-# on t1.q2 = ss.x postgres-# where postgres-# 1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1) postgres-# order by 1,2; q1|q2|x | y --+--+--+ 123 | 4567890123456789 | 4567890123456789 | 42 123 | 4567890123456789 | 4567890123456789 | 42 123 | 4567890123456789 | 4567890123456789 | 42 4567890123456789 | 123 | 123 | 42 4567890123456789 | 123 | 123 | 42 4567890123456789 | 4567890123456789 | 4567890123456789 | 42 4567890123456789 | 4567890123456789 | 4567890123456789 | 42 4567890123456789 | 4567890123456789 | 4567890123456789 | 42 (8 rows) {noformat} Spark SQL: {noformat} spark-sql> select * from > int8_tbl t1 left join > (select q1 as x, 42 as y from int8_tbl t2) ss > on t1.q2 = ss.x > where > 1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1) > order by 1,2; Error in query: Correlated scalar subqueries must be aggregated: GlobalLimit 1 +- LocalLimit 1 +- Project [1 AS 1#169] +- Filter isnotnull(outer(y#167)) +- SubqueryAlias `t3` +- SubqueryAlias `int8_tbl` +- Project [q1#164L, q2#165L] +- Project [col1#162L AS q1#164L, col2#163L AS q2#165L] +- SubqueryAlias `v` +- LocalRelation [col1#162L, col2#163L] ;; {noformat} was: Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. 
This allows them to reference columns provided by preceding {{FROM}} items. (Without {{LATERAL}}, each subquery is evaluated independently and so cannot cross-reference any other {{FROM}} item.) Table functions appearing in {{FROM}} can also be preceded by the key word {{LATERAL}}, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding {{FROM}} items in any case. A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a {{JOIN}} tree. In the latter case it can also refer to any items that are on the left-hand side of a {{JOIN}} that it is on the right-hand side of. When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation proceeds as follows: for each row of the {{FROM}} item providing the cross-referenced column(s), or set of rows of multiple {{FROM}} items providing the columns, the {{LATERAL}} item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s). 
A trivial example of {{LATERAL}} is {code:sql} SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; {code} *Feature ID*: T491 https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM > Correlated scalar subqueries must be aggregated > --- > > Key: SPARK-28379 > URL: https://issues.apache.org/jira/browse/SPARK-28379 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > create or replace temporary view INT8_TBL as select * from > (values > (123, 456), > (123, 4567890123456789), > (4567890123456789, 123), > (4567890123456789, 4567890123456789), > (4567890123456789, -4567890123456789)) > as v(q1, q2); > select * from > int8_tbl t1 left join > (select q1 as x, 42 as y from int8_tbl t2) ss > on t1.q2 = ss.x > where > 1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1) > order by 1,2; > {code} > PostgreSQL: > {noformat} > postgres=# select * from > postgres-# int8_tbl t1 left join > postgres-# (select q1 as x, 42 as y from int8_tbl t2) ss > postgres-# on t1.q2 = ss.x > postgres-# where > postgres-# 1 = (select 1 from int8_tbl t3 where ss.y is not null limit 1) > postgres-# ord
[jira] [Commented] (SPARK-28381) Upgraded version of Pyrolite to 4.30
[ https://issues.apache.org/jira/browse/SPARK-28381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884568#comment-16884568 ] Apache Spark commented on SPARK-28381: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/25143 > Upgraded version of Pyrolite to 4.30 > > > Key: SPARK-28381 > URL: https://issues.apache.org/jira/browse/SPARK-28381 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Major > > This upgrades to a newer version of Pyrolite. Most updates in the newer > version are for dotnet. For Java, it includes a bug fix to Unpickler > regarding cleaning up the Unpickler memo, and support for protocol 5. > > After upgrading, we can remove the fix at SPARK-27629 for the bug in > Unpickler. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28381) Upgraded version of Pyrolite to 4.30
[ https://issues.apache.org/jira/browse/SPARK-28381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28381: Assignee: (was: Apache Spark) > Upgraded version of Pyrolite to 4.30 > > > Key: SPARK-28381 > URL: https://issues.apache.org/jira/browse/SPARK-28381 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Major > > This upgrades to a newer version of Pyrolite. Most updates in the newer > version are for dotnet. For Java, it includes a bug fix to Unpickler > regarding cleaning up the Unpickler memo, and support for protocol 5. > > After upgrading, we can remove the fix at SPARK-27629 for the bug in > Unpickler. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28381) Upgraded version of Pyrolite to 4.30
[ https://issues.apache.org/jira/browse/SPARK-28381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28381: Assignee: Apache Spark > Upgraded version of Pyrolite to 4.30 > > > Key: SPARK-28381 > URL: https://issues.apache.org/jira/browse/SPARK-28381 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark >Priority: Major > > This upgrades to a newer version of Pyrolite. Most updates in the newer > version are for dotnet. For Java, it includes a bug fix to Unpickler > regarding cleaning up the Unpickler memo, and support for protocol 5. > > After upgrading, we can remove the fix at SPARK-27629 for the bug in > Unpickler. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28381) Upgraded version of Pyrolite to 4.30
Liang-Chi Hsieh created SPARK-28381: --- Summary: Upgraded version of Pyrolite to 4.30 Key: SPARK-28381 URL: https://issues.apache.org/jira/browse/SPARK-28381 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Liang-Chi Hsieh This upgrades to a newer version of Pyrolite. Most updates in the newer version are for dotnet. For Java, it includes a bug fix to Unpickler regarding cleaning up the Unpickler memo, and support for protocol 5. After upgrading, we can remove the fix at SPARK-27629 for the bug in Unpickler. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28379) Correlated scalar subqueries must be aggregated
[ https://issues.apache.org/jira/browse/SPARK-28379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28379: Summary: Correlated scalar subqueries must be aggregated (was: ANSI SQL: LATERAL derived table(T491)) > Correlated scalar subqueries must be aggregated > --- > > Key: SPARK-28379 > URL: https://issues.apache.org/jira/browse/SPARK-28379 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. > (Without {{LATERAL}}, each subquery is evaluated independently and so cannot > cross-reference any other {{FROM}} item.) > Table functions appearing in {{FROM}} can also be preceded by the key word > {{LATERAL}}, but for functions the key word is optional; the function's > arguments can contain references to columns provided by preceding {{FROM}} > items in any case. > A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a > {{JOIN}} tree. In the latter case it can also refer to any items that are on > the left-hand side of a {{JOIN}} that it is on the right-hand side of. > When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation > proceeds as follows: for each row of the {{FROM}} item providing the > cross-referenced column(s), or set of rows of multiple {{FROM}} items > providing the columns, the {{LATERAL}} item is evaluated using that row or > row set's values of the columns. The resulting row(s) are joined as usual > with the rows they were computed from. This is repeated for each row or set > of rows from the column source table(s). 
> A trivial example of {{LATERAL}} is > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > *Feature ID*: T491 > https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28379) Correlated scalar subqueries must be aggregated
[ https://issues.apache.org/jira/browse/SPARK-28379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28379: Comment: was deleted (was: The lateral versus parent references case: {code:sql} create or replace temporary view INT8_TBL as select * from (values (123, 456), (123, 4567890123456789), (4567890123456789, 123), (4567890123456789, 4567890123456789), (4567890123456789, -4567890123456789)) as v(q1, q2); select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; {code} Spark SQL: {noformat} select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl -- !query 235 schema struct<> -- !query 235 output org.apache.spark.sql.AnalysisException Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: Project [outer(q1#xL) AS q2#xL] +- OneRowRelation ; {noformat} PostgreSQL: {noformat} postgres=# select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; q1|q2 | r --+---+--- 123 | 456 | 456 123 | 4567890123456789 | 4567890123456789 4567890123456789 | 123 | 123 4567890123456789 | 4567890123456789 | 4567890123456789 4567890123456789 | -4567890123456789 | -4567890123456789 (5 rows) {noformat} ) > Correlated scalar subqueries must be aggregated > --- > > Key: SPARK-28379 > URL: https://issues.apache.org/jira/browse/SPARK-28379 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. > (Without {{LATERAL}}, each subquery is evaluated independently and so cannot > cross-reference any other {{FROM}} item.) 
> Table functions appearing in {{FROM}} can also be preceded by the key word > {{LATERAL}}, but for functions the key word is optional; the function's > arguments can contain references to columns provided by preceding {{FROM}} > items in any case. > A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a > {{JOIN}} tree. In the latter case it can also refer to any items that are on > the left-hand side of a {{JOIN}} that it is on the right-hand side of. > When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation > proceeds as follows: for each row of the {{FROM}} item providing the > cross-referenced column(s), or set of rows of multiple {{FROM}} items > providing the columns, the {{LATERAL}} item is evaluated using that row or > row set's values of the columns. The resulting row(s) are joined as usual > with the rows they were computed from. This is repeated for each row or set > of rows from the column source table(s). > A trivial example of {{LATERAL}} is > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > *Feature ID*: T491 > https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28317) Built-in Mathematical Functions: SCALE
[ https://issues.apache.org/jira/browse/SPARK-28317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884566#comment-16884566 ] Shivu Sondur commented on SPARK-28317: -- I am working on this. > Built-in Mathematical Functions: SCALE > -- > > Key: SPARK-28317 > URL: https://issues.apache.org/jira/browse/SPARK-28317 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{scale(}}{{numeric}}{{)}}|{{integer}}|scale of the argument (the number of > decimal digits in the fractional part)|{{scale(8.41)}}|{{2}}| > https://www.postgresql.org/docs/11/functions-math.html#FUNCTIONS-MATH-FUNC-TABLE -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
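The semantics of `scale` in the table above can be pinned down with Python's `decimal` module; this is a sketch of the behavior, not the proposed Spark implementation:

```python
from decimal import Decimal

def scale(x: Decimal) -> int:
    # Number of decimal digits in the fractional part: a Decimal's
    # exponent is negative when there are digits after the point.
    return max(0, -x.as_tuple().exponent)

print(scale(Decimal("8.41")))  # 2
print(scale(Decimal("100")))   # 0
```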
[jira] [Commented] (SPARK-28377) Fully support correlation names in the FROM clause
[ https://issues.apache.org/jira/browse/SPARK-28377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884555#comment-16884555 ] Yuming Wang commented on SPARK-28377: - Postgres will fill in with underlying names: PostgreSQL: {noformat} -- currently, Postgres will fill in with underlying names SELECT '' AS "xxx", * FROM J1_TBL t1 (a, b) NATURAL JOIN J2_TBL t2 (a); xxx | a | b | t | k -+---+---+---+ | 0 | | zero | | 1 | 4 | one | -1 | 2 | 3 | two | 2 | 2 | 3 | two | 4 | 3 | 2 | three | -3 | 5 | 0 | five | -5 | 5 | 0 | five | -5 (7 rows){noformat} Spark SQL: {noformat} SELECT '' AS `xxx`, * FROM J1_TBL t1 (a, b) NATURAL JOIN J2_TBL t2 (a) -- !query 44 schema struct<> -- !query 44 output org.apache.spark.sql.AnalysisException Number of column aliases does not match number of columns. Number of column aliases: 2; number of columns: 3.; line 2 pos 7 {noformat} > Fully support correlation names in the FROM clause > -- > > Key: SPARK-28377 > URL: https://issues.apache.org/jira/browse/SPARK-28377 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Specifying a list of column names is not fully supported. 
Example: > {code:sql} > create or replace temporary view J1_TBL as select * from > (values (1, 4, 'one'), (2, 3, 'two')) > as v(i, j, t); > create or replace temporary view J2_TBL as select * from > (values (1, -1), (2, 2)) > as v(i, k); > SELECT '' AS xxx, t1.a, t2.e > FROM J1_TBL t1 (a, b, c), J2_TBL t2 (d, e) > WHERE t1.a = t2.d; > {code} > PostgreSQL: > {noformat} > postgres=# SELECT '' AS xxx, t1.a, t2.e > postgres-# FROM J1_TBL t1 (a, b, c), J2_TBL t2 (d, e) > postgres-# WHERE t1.a = t2.d; > xxx | a | e > -+---+ > | 1 | -1 > | 2 | 2 > (2 rows) > {noformat} > Spark SQL: > {noformat} > spark-sql> SELECT '' AS xxx, t1.a, t2.e > > FROM J1_TBL t1 (a, b, c), J2_TBL t2 (d, e) > > WHERE t1.a = t2.d; > Error in query: cannot resolve '`t1.a`' given input columns: [a, b, c, d, e]; > line 3 pos 8; > 'Project [ AS xxx#21, 't1.a, 't2.e] > +- 'Filter ('t1.a = 't2.d) >+- Join Inner > :- Project [i#14 AS a#22, j#15 AS b#23, t#16 AS c#24] > : +- SubqueryAlias `t1` > : +- SubqueryAlias `j1_tbl` > :+- Project [i#14, j#15, t#16] > : +- Project [col1#11 AS i#14, col2#12 AS j#15, col3#13 AS > t#16] > : +- SubqueryAlias `v` > : +- LocalRelation [col1#11, col2#12, col3#13] > +- Project [i#19 AS d#25, k#20 AS e#26] > +- SubqueryAlias `t2` > +- SubqueryAlias `j2_tbl` >+- Project [i#19, k#20] > +- Project [col1#17 AS i#19, col2#18 AS k#20] > +- SubqueryAlias `v` > +- LocalRelation [col1#17, col2#18] > {noformat} > > *Feature ID*: E051-08 > [https://www.postgresql.org/docs/11/sql-expressions.html] > [https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_correlationnames.html] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
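The Postgres fill-in behavior described in the comment above can be sketched as follows (helper name hypothetical): an alias list shorter than the column list renames only the leading columns, and the rest keep their underlying names.

```python
def apply_aliases(columns, aliases):
    # Postgres semantics: aliases rename the leading columns;
    # any remaining columns keep their underlying names.
    if len(aliases) > len(columns):
        raise ValueError("more aliases than columns")
    return list(aliases) + list(columns[len(aliases):])

# J2_TBL has columns (i, k); "J2_TBL t2 (a)" renames only the first.
print(apply_aliases(["i", "k"], ["a"]))  # ['a', 'k']
```

Spark instead rejects the short alias list outright, which is the gap this ticket tracks.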
[jira] [Commented] (SPARK-27877) ANSI SQL: LATERAL derived table(T491)
[ https://issues.apache.org/jira/browse/SPARK-27877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884552#comment-16884552 ] Yuming Wang commented on SPARK-27877: - The lateral versus parent references case: {code:sql} create or replace temporary view INT8_TBL as select * from (values (123, 456), (123, 4567890123456789), (4567890123456789, 123), (4567890123456789, 4567890123456789), (4567890123456789, -4567890123456789)) as v(q1, q2); select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; {code} Spark SQL: {noformat} select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl -- !query 235 schema struct<> -- !query 235 output org.apache.spark.sql.AnalysisException Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: Project [outer(q1#xL) AS q2#xL] +- OneRowRelation ; {noformat} PostgreSQL: {noformat} postgres=# select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; q1|q2 | r --+---+--- 123 | 456 | 456 123 | 4567890123456789 | 4567890123456789 4567890123456789 | 123 | 123 4567890123456789 | 4567890123456789 | 4567890123456789 4567890123456789 | -4567890123456789 | -4567890123456789 (5 rows) {noformat} > ANSI SQL: LATERAL derived table(T491) > - > > Key: SPARK-27877 > URL: https://issues.apache.org/jira/browse/SPARK-27877 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. > (Without {{LATERAL}}, each subquery is evaluated independently and so cannot > cross-reference any other {{FROM}} item.) 
> Table functions appearing in {{FROM}} can also be preceded by the key word > {{LATERAL}}, but for functions the key word is optional; the function's > arguments can contain references to columns provided by preceding {{FROM}} > items in any case. > A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a > {{JOIN}} tree. In the latter case it can also refer to any items that are on > the left-hand side of a {{JOIN}} that it is on the right-hand side of. > When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation > proceeds as follows: for each row of the {{FROM}} item providing the > cross-referenced column(s), or set of rows of multiple {{FROM}} items > providing the columns, the {{LATERAL}} item is evaluated using that row or > row set's values of the columns. The resulting row(s) are joined as usual > with the rows they were computed from. This is repeated for each row or set > of rows from the column source table(s). > A trivial example of {{LATERAL}} is > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > *Feature ID*: T491 > [https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM] > [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27877) ANSI SQL: LATERAL derived table(T491)
[ https://issues.apache.org/jira/browse/SPARK-27877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27877: Description: Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. This allows them to reference columns provided by preceding {{FROM}} items. (Without {{LATERAL}}, each subquery is evaluated independently and so cannot cross-reference any other {{FROM}} item.) Table functions appearing in {{FROM}} can also be preceded by the key word {{LATERAL}}, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding {{FROM}} items in any case. A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a {{JOIN}} tree. In the latter case it can also refer to any items that are on the left-hand side of a {{JOIN}} that it is on the right-hand side of. When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation proceeds as follows: for each row of the {{FROM}} item providing the cross-referenced column(s), or set of rows of multiple {{FROM}} items providing the columns, the {{LATERAL}} item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s). A trivial example of {{LATERAL}} is {code:sql} SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; {code} *Feature ID*: T491 [https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM] [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430] was: Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. This allows them to reference columns provided by preceding {{FROM}} items. 
A trivial example of {{LATERAL}} is: {code:sql} SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; {code} More details: [https://www.postgresql.org/docs/9.3/queries-table-expressions.html#QUERIES-LATERAL] [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430] > ANSI SQL: LATERAL derived table(T491) > - > > Key: SPARK-27877 > URL: https://issues.apache.org/jira/browse/SPARK-27877 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. > (Without {{LATERAL}}, each subquery is evaluated independently and so cannot > cross-reference any other {{FROM}} item.) > Table functions appearing in {{FROM}} can also be preceded by the key word > {{LATERAL}}, but for functions the key word is optional; the function's > arguments can contain references to columns provided by preceding {{FROM}} > items in any case. > A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a > {{JOIN}} tree. In the latter case it can also refer to any items that are on > the left-hand side of a {{JOIN}} that it is on the right-hand side of. > When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation > proceeds as follows: for each row of the {{FROM}} item providing the > cross-referenced column(s), or set of rows of multiple {{FROM}} items > providing the columns, the {{LATERAL}} item is evaluated using that row or > row set's values of the columns. The resulting row(s) are joined as usual > with the rows they were computed from. This is repeated for each row or set > of rows from the column source table(s). 
> A trivial example of {{LATERAL}} is > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > *Feature ID*: T491 > [https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM] > [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27877) ANSI SQL: LATERAL derived table(T491)
[ https://issues.apache.org/jira/browse/SPARK-27877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27877: Summary: ANSI SQL: LATERAL derived table(T491) (was: Implement SQL-standard LATERAL subqueries) > ANSI SQL: LATERAL derived table(T491) > - > > Key: SPARK-27877 > URL: https://issues.apache.org/jira/browse/SPARK-27877 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. A > trivial example of {{LATERAL}} is: > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > More details: > > [https://www.postgresql.org/docs/9.3/queries-table-expressions.html#QUERIES-LATERAL] > > [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
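The LATERAL evaluation rule described above can be sketched in plain Python (tables invented for illustration): the right-hand subquery is re-evaluated once per row of the preceding FROM item, with that row's columns in scope.

```python
foo = [{"id": 1, "bar_id": 10}, {"id": 2, "bar_id": 20}]
bar = [{"id": 10, "v": "a"}, {"id": 20, "v": "b"}]

# SELECT * FROM foo, LATERAL (SELECT v FROM bar WHERE bar.id = foo.bar_id) ss
result = [
    {**f, "v": b["v"]}
    for f in foo                            # one pass per row of foo ...
    for b in bar if b["id"] == f["bar_id"]  # ... with f visible to the subquery
]
print(result)
```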
[jira] [Assigned] (SPARK-28333) NULLS FIRST for DESC and NULLS LAST for ASC
[ https://issues.apache.org/jira/browse/SPARK-28333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28333: Assignee: (was: Apache Spark) > NULLS FIRST for DESC and NULLS LAST for ASC > --- > > Key: SPARK-28333 > URL: https://issues.apache.org/jira/browse/SPARK-28333 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > spark-sql> create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > spark-sql> select * from t1 order by val asc; > NULL > NULL > 1 > 2 > 3 > spark-sql> select * from t1 order by val desc; > 3 > 2 > 1 > NULL > NULL > {code} > {code:sql} > postgres=# create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > CREATE VIEW > postgres=# select * from t1 order by val asc; > val > - >1 >2 >3 > (5 rows) > postgres=# select * from t1 order by val desc; > val > - >3 >2 >1 > (5 rows) > {code} > https://www.postgresql.org/docs/11/queries-order.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28333) NULLS FIRST for DESC and NULLS LAST for ASC
[ https://issues.apache.org/jira/browse/SPARK-28333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28333: Assignee: Apache Spark > NULLS FIRST for DESC and NULLS LAST for ASC > --- > > Key: SPARK-28333 > URL: https://issues.apache.org/jira/browse/SPARK-28333 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {code:sql} > spark-sql> create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > spark-sql> select * from t1 order by val asc; > NULL > NULL > 1 > 2 > 3 > spark-sql> select * from t1 order by val desc; > 3 > 2 > 1 > NULL > NULL > {code} > {code:sql} > postgres=# create or replace temporary view t1 as select * from (values(1), > (2), (null), (3), (null)) as v (val); > CREATE VIEW > postgres=# select * from t1 order by val asc; > val > - >1 >2 >3 > (5 rows) > postgres=# select * from t1 order by val desc; > val > - >3 >2 >1 > (5 rows) > {code} > https://www.postgresql.org/docs/11/queries-order.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
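The two default policies contrasted above (Spark SQL: NULLS FIRST for ASC and NULLS LAST for DESC; PostgreSQL: NULLS LAST for ASC and NULLS FIRST for DESC) can be sketched in plain Python. Note that order_by below is an illustrative helper written for this sketch, not a Spark or PostgreSQL API:

```python
# Sketch of the two default NULL-ordering policies from the example above.
vals = [1, 2, None, 3, None]

def order_by(rows, desc=False, nulls_first=False):
    """Sort with an explicit NULL placement, like NULLS FIRST/LAST in SQL."""
    nulls = [v for v in rows if v is None]
    rest = sorted((v for v in rows if v is not None), reverse=desc)
    return nulls + rest if nulls_first else rest + nulls

# Spark SQL defaults: NULLS FIRST for ASC, NULLS LAST for DESC
spark_asc = order_by(vals, desc=False, nulls_first=True)   # [None, None, 1, 2, 3]
spark_desc = order_by(vals, desc=True, nulls_first=False)  # [3, 2, 1, None, None]

# PostgreSQL defaults: NULLS LAST for ASC, NULLS FIRST for DESC
pg_asc = order_by(vals, desc=False, nulls_first=False)     # [1, 2, 3, None, None]
pg_desc = order_by(vals, desc=True, nulls_first=True)      # [None, None, 3, 2, 1]
```

The issue title describes the PostgreSQL-style behavior (NULLS FIRST only for DESC, NULLS LAST only for ASC) that the sub-task asks Spark's test mode to match.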
[jira] [Commented] (SPARK-28379) ANSI SQL: LATERAL derived table(T491)
[ https://issues.apache.org/jira/browse/SPARK-28379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884539#comment-16884539 ] Yuming Wang commented on SPARK-28379: - The lateral versus parent references case: {code:sql} create or replace temporary view INT8_TBL as select * from (values (123, 456), (123, 4567890123456789), (4567890123456789, 123), (4567890123456789, 4567890123456789), (4567890123456789, -4567890123456789)) as v(q1, q2); select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; {code} Spark SQL: {noformat} select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl -- !query 235 schema struct<> -- !query 235 output org.apache.spark.sql.AnalysisException Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: Project [outer(q1#xL) AS q2#xL] +- OneRowRelation ; {noformat} PostgreSQL: {noformat} postgres=# select *, (select r from (select q1 as q2) x, (select q2 as r) y) from int8_tbl; q1|q2 | r --+---+--- 123 | 456 | 456 123 | 4567890123456789 | 4567890123456789 4567890123456789 | 123 | 123 4567890123456789 | 4567890123456789 | 4567890123456789 4567890123456789 | -4567890123456789 | -4567890123456789 (5 rows) {noformat} > ANSI SQL: LATERAL derived table(T491) > - > > Key: SPARK-28379 > URL: https://issues.apache.org/jira/browse/SPARK-28379 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. > This allows them to reference columns provided by preceding {{FROM}} items. > (Without {{LATERAL}}, each subquery is evaluated independently and so cannot > cross-reference any other {{FROM}} item.) 
> Table functions appearing in {{FROM}} can also be preceded by the key word > {{LATERAL}}, but for functions the key word is optional; the function's > arguments can contain references to columns provided by preceding {{FROM}} > items in any case. > A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a > {{JOIN}} tree. In the latter case it can also refer to any items that are on > the left-hand side of a {{JOIN}} that it is on the right-hand side of. > When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation > proceeds as follows: for each row of the {{FROM}} item providing the > cross-referenced column(s), or set of rows of multiple {{FROM}} items > providing the columns, the {{LATERAL}} item is evaluated using that row or > row set's values of the columns. The resulting row(s) are joined as usual > with the rows they were computed from. This is repeated for each row or set > of rows from the column source table(s). > A trivial example of {{LATERAL}} is > {code:sql} > SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; > {code} > *Feature ID*: T491 > https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
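The evaluation rule quoted above (the {{LATERAL}} item is re-evaluated for each row of the {{FROM}} item providing the cross-referenced columns, and the results are joined to that row) amounts to a correlated nested loop. A minimal Python sketch of that semantics, using hypothetical foo/bar data shaped like the trivial example; none of these names come from Spark or PostgreSQL:

```python
# Nested-loop sketch of LATERAL evaluation: for each outer row, the
# subquery is re-evaluated with that row's columns in scope, and its
# result rows are joined back to the outer row.
foo = [{"id": 1, "bar_id": 10}, {"id": 2, "bar_id": 20}]
bar = [{"bid": 10, "v": "a"}, {"bid": 20, "v": "b"}, {"bid": 30, "v": "c"}]

def lateral_join(outer_rows, subquery):
    """subquery(row) plays the role of the LATERAL derived table: it may
    reference columns of the current outer row."""
    out = []
    for row in outer_rows:
        for sub_row in subquery(row):   # re-evaluated once per outer row
            merged = dict(row)
            merged.update(sub_row)
            out.append(merged)
    return out

# SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.bid = foo.bar_id) ss
result = lateral_join(foo, lambda f: [b for b in bar if b["bid"] == f["bar_id"]])
```

Without {{LATERAL}}, the subquery could not mention f, and would have to be evaluated once, independently of foo.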
[jira] [Commented] (SPARK-28319) DataSourceV2: Support SHOW TABLES
[ https://issues.apache.org/jira/browse/SPARK-28319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884538#comment-16884538 ] Terry Kim commented on SPARK-28319: --- I will work on this. > DataSourceV2: Support SHOW TABLES > - > > Key: SPARK-28319 > URL: https://issues.apache.org/jira/browse/SPARK-28319 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Priority: Major > > SHOW TABLES needs to support v2 catalogs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28370) Upgrade Mockito to 2.28.2
[ https://issues.apache.org/jira/browse/SPARK-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28370. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25139 [https://github.com/apache/spark/pull/25139] > Upgrade Mockito to 2.28.2 > - > > Key: SPARK-28370 > URL: https://issues.apache.org/jira/browse/SPARK-28370 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > > This issue aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to > bring the latest bug fixes and to be up-to-date for JDK9+ support before > Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had > better wait and see for the stability. > **RELEASE NOTE** > https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md > **NOTABLE FIXES** > - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) > - When mock is called multiple times, and verify fails, the error message > reports only the first invocation (2.27.4) > - Memory leak in mockito-inline calling method on mock with at least a mock > as parameter (2.25.0) > - Cross-references and a single spy cause memory leak (2.25.0) > - Nested spies cause memory leaks (2.25.0) > - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) > - Return null instead of causing a CCE (2.24.9, 2.24.3) > - Issue with mocking type in "java.util.*", Java 12 (2.24.2) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28370) Upgrade Mockito to 2.28.2
[ https://issues.apache.org/jira/browse/SPARK-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28370: - Assignee: Dongjoon Hyun > Upgrade Mockito to 2.28.2 > - > > Key: SPARK-28370 > URL: https://issues.apache.org/jira/browse/SPARK-28370 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > This issue aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to > bring the latest bug fixes and to be up-to-date for JDK9+ support before > Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had > better wait and see for the stability. > **RELEASE NOTE** > https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md > **NOTABLE FIXES** > - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) > - When mock is called multiple times, and verify fails, the error message > reports only the first invocation (2.27.4) > - Memory leak in mockito-inline calling method on mock with at least a mock > as parameter (2.25.0) > - Cross-references and a single spy cause memory leak (2.25.0) > - Nested spies cause memory leaks (2.25.0) > - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) > - Return null instead of causing a CCE (2.24.9, 2.24.3) > - Issue with mocking type in "java.util.*", Java 12 (2.24.2) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28349) Add FALSE and SETMINUS to ansiNonReserved
[ https://issues.apache.org/jira/browse/SPARK-28349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28349. --- Resolution: Won't Do > Add FALSE and SETMINUS to ansiNonReserved > - > > Key: SPARK-28349 > URL: https://issues.apache.org/jira/browse/SPARK-28349 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28370) Upgrade Mockito to 2.28.2
[ https://issues.apache.org/jira/browse/SPARK-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28370: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-24417 > Upgrade Mockito to 2.28.2 > - > > Key: SPARK-28370 > URL: https://issues.apache.org/jira/browse/SPARK-28370 > Project: Spark > Issue Type: Sub-task > Components: Build, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to > bring the latest bug fixes and to be up-to-date for JDK9+ support before > Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had > better wait and see for the stability. > **RELEASE NOTE** > https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md > **NOTABLE FIXES** > - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) > - When mock is called multiple times, and verify fails, the error message > reports only the first invocation (2.27.4) > - Memory leak in mockito-inline calling method on mock with at least a mock > as parameter (2.25.0) > - Cross-references and a single spy cause memory leak (2.25.0) > - Nested spies cause memory leaks (2.25.0) > - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) > - Return null instead of causing a CCE (2.24.9, 2.24.3) > - Issue with mocking type in "java.util.*", Java 12 (2.24.2) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28370) Upgrade Mockito to 2.28.2
[ https://issues.apache.org/jira/browse/SPARK-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28370: -- Description: This issue aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to bring the latest bug fixes and to be up-to-date for JDK9+ support before Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had better wait and see for the stability. **RELEASE NOTE** https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md **NOTABLE FIXES** - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) - When mock is called multiple times, and verify fails, the error message reports only the first invocation (2.27.4) - Memory leak in mockito-inline calling method on mock with at least a mock as parameter (2.25.0) - Cross-references and a single spy cause memory leak (2.25.0) - Nested spies cause memory leaks (2.25.0) - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) - Return null instead of causing a CCE (2.24.9, 2.24.3) - Issue with mocking type in "java.util.*", Java 12 (2.24.2) was: ## What changes were proposed in this pull request? This PR aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to bring the latest bug fixes and to be up-to-date for JDK9+ support before Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had better wait and see for the stability. 
**RELEASE NOTE** https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md **NOTABLE FIXES** - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) - When mock is called multiple times, and verify fails, the error message reports only the first invocation (2.27.4) - Memory leak in mockito-inline calling method on mock with at least a mock as parameter (2.25.0) - Cross-references and a single spy cause memory leak (2.25.0) - Nested spies cause memory leaks (2.25.0) - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) - Return null instead of causing a CCE (2.24.9, 2.24.3) - Issue with mocking type in "java.util.*", Java 12 (2.24.2) Mainly, Maven (Hadoop-2.7/Hadoop-3.2) and SBT(Hadoop-2.7) Jenkins test passed. ## How was this patch tested? Pass the Jenkins with the exiting UTs. > Upgrade Mockito to 2.28.2 > - > > Key: SPARK-28370 > URL: https://issues.apache.org/jira/browse/SPARK-28370 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This issue aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to > bring the latest bug fixes and to be up-to-date for JDK9+ support before > Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had > better wait and see for the stability. 
> **RELEASE NOTE** > https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md > **NOTABLE FIXES** > - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) > - When mock is called multiple times, and verify fails, the error message > reports only the first invocation (2.27.4) > - Memory leak in mockito-inline calling method on mock with at least a mock > as parameter (2.25.0) > - Cross-references and a single spy cause memory leak (2.25.0) > - Nested spies cause memory leaks (2.25.0) > - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) > - Return null instead of causing a CCE (2.24.9, 2.24.3) > - Issue with mocking type in "java.util.*", Java 12 (2.24.2) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28370) Upgrade Mockito to 2.28.2
[ https://issues.apache.org/jira/browse/SPARK-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28370: -- Description: ## What changes were proposed in this pull request? This PR aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to bring the latest bug fixes and to be up-to-date for JDK9+ support before Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had better wait and see for the stability. **RELEASE NOTE** https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md **NOTABLE FIXES** - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) - When mock is called multiple times, and verify fails, the error message reports only the first invocation (2.27.4) - Memory leak in mockito-inline calling method on mock with at least a mock as parameter (2.25.0) - Cross-references and a single spy cause memory leak (2.25.0) - Nested spies cause memory leaks (2.25.0) - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) - Return null instead of causing a CCE (2.24.9, 2.24.3) - Issue with mocking type in "java.util.*", Java 12 (2.24.2) Mainly, Maven (Hadoop-2.7/Hadoop-3.2) and SBT(Hadoop-2.7) Jenkins test passed. ## How was this patch tested? Pass the Jenkins with the existing UTs. > Upgrade Mockito to 2.28.2 > - > > Key: SPARK-28370 > URL: https://issues.apache.org/jira/browse/SPARK-28370 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > ## What changes were proposed in this pull request? > This PR aims to upgrade Mockito from **2.23.4** to **2.28.2** in order to > bring the latest bug fixes and to be up-to-date for JDK9+ support before > Apache Spark 3.0.0. There is Mockito 3.0 released 4 days ago, but we had > better wait and see for the stability. 
> **RELEASE NOTE** > https://github.com/mockito/mockito/blob/release/2.x/doc/release-notes/official.md > **NOTABLE FIXES** > - Configure the MethodVisitor for Java 11+ compatibility (2.27.5) > - When mock is called multiple times, and verify fails, the error message > reports only the first invocation (2.27.4) > - Memory leak in mockito-inline calling method on mock with at least a mock > as parameter (2.25.0) > - Cross-references and a single spy cause memory leak (2.25.0) > - Nested spies cause memory leaks (2.25.0) > - [Java 9 support] ClassCastExceptions with JDK9 javac (2.24.9, 2.24.3) > - Return null instead of causing a CCE (2.24.9, 2.24.3) > - Issue with mocking type in "java.util.*", Java 12 (2.24.2) > Mainly, Maven (Hadoop-2.7/Hadoop-3.2) and SBT(Hadoop-2.7) Jenkins test passed. > ## How was this patch tested? > Pass the Jenkins with the existing UTs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly for read/write of SQLServer Tables
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28152: Assignee: Apache Spark > [JDBC Connector] ShortType and FloatTypes are not mapped correctly for > read/write of SQLServer Tables > - > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Assignee: Apache Spark >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark data frames being > created with unintended types. The issue was observed when validating against > SQLServer. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with column type > INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue in the read path. In the write path, the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But in > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly for read/write of SQLServer Tables
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28152: Assignee: (was: Apache Spark) > [JDBC Connector] ShortType and FloatTypes are not mapped correctly for > read/write of SQLServer Tables > - > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark data frames being > created with unintended types. The issue was observed when validating against > SQLServer. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with column type > INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue in the read path. In the write path, the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But in > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly for read/write of SQLServer Tables
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884460#comment-16884460 ] Shiv Prashant Sood commented on SPARK-28152: Pull request for the fix created: https://github.com/apache/spark/pull/25146 > [JDBC Connector] ShortType and FloatTypes are not mapped correctly for > read/write of SQLServer Tables > - > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark data frames being > created with unintended types. The issue was observed when validating against > SQLServer. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with column type > INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue in the read path. In the write path, the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But in > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly for read/write of SQLServer Tables
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28152: --- Summary: [JDBC Connector] ShortType and FloatTypes are not mapped correctly for read/write of SQLServer Tables (was: [JDBC Connector] ShortType and FloatTypes are not mapped correctly) > [JDBC Connector] ShortType and FloatTypes are not mapped correctly for > read/write of SQLServer Tables > - > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark data frames being > created with unintended types. The issue was observed when validating against > SQLServer. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with column type > INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue in the read path. In the write path, the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But in > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
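The read-path half of the bug described above can be pictured as a lookup from JDBC type names to Catalyst type names. The following is a simplified, dictionary-based Python sketch contrasting the reported behavior with the intended mapping; Spark's real getCatalystType and dialect code is Scala and structured differently, so every name here is illustrative only:

```python
# Contrast between the reported read-path mapping (which widens types)
# and the intended one, per this issue. Type names only, not Spark's API.
reported = {"SMALLINT": "IntegerType", "REAL": "DoubleType"}  # current, wrong
intended = {"SMALLINT": "ShortType", "REAL": "FloatType"}     # proposed fix

def get_catalyst_type(jdbc_type_name, mapping):
    """Toy stand-in for a JDBC dialect's type lookup."""
    return mapping[jdbc_type_name]

# A REAL column should come back as FloatType, not DoubleType:
assert get_catalyst_type("REAL", intended) == "FloatType"
```

The widening to IntegerType/DoubleType is not wrong in terms of values, but it costs extra bytes per row and changes the round-tripped schema, which is what this issue reports.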
[jira] [Commented] (SPARK-28151) [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer tables
[ https://issues.apache.org/jira/browse/SPARK-28151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884454#comment-16884454 ] Shiv Prashant Sood commented on SPARK-28151: Removed the FloatType and ShortType fix description, as that will be handled by a separate PR (SPARK-28152). > [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer > tables > > > Key: SPARK-28151 > URL: https://issues.apache.org/jira/browse/SPARK-28151 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ##ByteType issue > Writing a dataframe with column type BYTETYPE fails when using the JDBC connector > for SQL Server. Append and read of tables also fail. The problem is due to: > 1. (Write path) Incorrect mapping of BYTETYPE in getCommonJDBCType() in > jdbcutils.scala, where BYTETYPE gets mapped to the text BYTE. It should be mapped > to TINYINT. The current mapping is: > case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT)) > In getCatalystType() (JDBC to Catalyst type mapping), TINYINT is mapped to > INTEGER, while it should be mapped to BYTETYPE. Mapping to INTEGER is OK from > the point of view of upcasting, but will lead to a 4-byte allocation rather > than 1 byte for BYTETYPE. > 2. (Read path) The read path ends up calling makeGetter(dt: DataType, metadata: > Metadata). The function sets the value in the RDD row, per the data type. > There is no mapping for BYTETYPE here, and thus reads will result in an error > once getCatalystType() is fixed. > Note: these issues were found when reading/writing with SQLServer. A PR to fix > these mappings in MSSQLServerDialect will be submitted soon. > Error seen when writing a table: > (JDBC Write failed, com.microsoft.sqlserver.jdbc.SQLServerException: Column, > parameter, or variable #2: *Cannot find data type BYTE*.) 
> com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or > variable #2: Cannot find data type BYTE. > com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254) > com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608) > com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859) > .. > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28151) [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer tables
[ https://issues.apache.org/jira/browse/SPARK-28151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28151: --- Description: ##ByteType issue Writing a dataframe with column type BYTETYPE fails when using the JDBC connector for SQL Server. Append and read of tables also fail. The problem is due to: 1. (Write path) Incorrect mapping of BYTETYPE in getCommonJDBCType() in jdbcutils.scala, where BYTETYPE gets mapped to the text BYTE. It should be mapped to TINYINT. The current mapping is: case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT)) In getCatalystType() (JDBC to Catalyst type mapping), TINYINT is mapped to INTEGER, while it should be mapped to BYTETYPE. Mapping to INTEGER is OK from the point of view of upcasting, but will lead to a 4-byte allocation rather than 1 byte for BYTETYPE. 2. (Read path) The read path ends up calling makeGetter(dt: DataType, metadata: Metadata). The function sets the value in the RDD row, per the data type. There is no mapping for BYTETYPE here, and thus reads will result in an error once getCatalystType() is fixed. Note: these issues were found when reading/writing with SQLServer. A PR to fix these mappings in MSSQLServerDialect will be submitted soon. Error seen when writing a table: (JDBC Write failed, com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #2: *Cannot find data type BYTE*.) com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #2: Cannot find data type BYTE. com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254) com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608) com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859) .. 
was: ##ByteType issue Writing dataframe with column type BYTETYPE fails when using JDBC connector for SQL Server. Append and Read of tables also fail. The problem is due 1. (Write path) Incorrect mapping of BYTETYPE in getCommonJDBCType() in jdbcutils.scala where BYTETYPE gets mapped to BYTE text. It should be mapped to TINYINT {color:#cc7832}case {color}ByteType => Option(JdbcType({color:#6a8759}"BYTE"{color}{color:#cc7832}, {color}java.sql.Types.{color:#9876aa}TINYINT{color})) In getCatalystType() ( JDBC to Catalyst type mapping) TINYINT is mapped to INTEGER, while it should be mapped to BYTETYPE. Mapping to integer is ok from the point of view of upcasting, but will lead to 4 byte allocation rather than 1 byte for BYTETYPE. 2. (read path) Read path ends up calling makeGetter(dt: DataType, metadata: Metadata). The function sets the value in RDD row. The value is set per the data type. Here there is no mapping for BYTETYPE and thus results will result in an error when getCatalystType() is fixed. Note : These issues were found when reading/writing with SQLServer. Will be submitting a PR soon to fix these mappings in MSSQLServerDialect. Error seen when writing table (JDBC Write failed,com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #2: *Cannot find data type BYTE*.) com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #2: Cannot find data type BYTE. com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254) com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608) com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859) .. ##ShortType and FloatType issue ShortType and FloatTypes are not correctly mapped to right JDBC types when using JDBC connector. This results in tables and spark data frame being created with unintended types. 
Some example issue Write from df with column type results in a SQL table of with column type as INTEGER as opposed to SMALLINT. Thus a larger table that expected. read results in a dataframe with type INTEGER as opposed to ShortType FloatTypes have a issue with read path. In the write path Spark data type 'FloatType' is correctly mapped to JDBC equivalent data type 'Real'. But in the read path when JDBC data types need to be converted to Catalyst data types ( getCatalystType) 'Real' gets incorrectly gets mapped to 'DoubleType' rather than 'FloatType'. > [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer > tables > > > Key: SPARK-28151 > URL: https://issues.apache.org/jira/browse/SPARK-28151 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priori
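The corrected mappings described in the ticket can be sketched in plain Python. This is an illustrative stand-in, not the actual Scala code in JdbcUtils or MSSQLServerDialect; the dict and function names here are hypothetical:

```python
# Write path: Catalyst type -> JDBC type name understood by SQL Server.
catalyst_to_jdbc = {
    "ByteType": "TINYINT",   # was "BYTE", which SQL Server rejects
    "ShortType": "SMALLINT",
    "FloatType": "REAL",
}

# Read path: JDBC type -> Catalyst type.
jdbc_to_catalyst = {
    "TINYINT": "ByteType",   # was IntegerType: a 4-byte allocation for 1-byte data
    "SMALLINT": "ShortType",
    "REAL": "FloatType",     # was DoubleType
}

def round_trip(catalyst_type: str) -> str:
    """A type should survive a write followed by a read unchanged."""
    return jdbc_to_catalyst[catalyst_to_jdbc[catalyst_type]]
```

With these tables, a type written through the dialect reads back as itself, which is exactly the round-trip property the BYTE/TINYINT/INTEGER mappings violated.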
[jira] [Updated] (SPARK-28151) [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer tables
[ https://issues.apache.org/jira/browse/SPARK-28151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28151: --- Summary: [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer tables (was: ByteType, ShortType and FloatTypes are not correctly mapped for read/write of SQLServer tables) > [JDBC Connector] ByteType is not correctly mapped for read/write of SQLServer > tables > > > Key: SPARK-28151 > URL: https://issues.apache.org/jira/browse/SPARK-28151 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ##ByteType issue > Writing a dataframe with column type BYTETYPE fails when using the JDBC connector > for SQL Server. Append and read of tables also fail. The problem is due to: > 1. (Write path) Incorrect mapping of BYTETYPE in getCommonJDBCType() in > jdbcutils.scala, where BYTETYPE gets mapped to the text BYTE. It should be mapped > to TINYINT. > {color:#cc7832}case {color}ByteType => > Option(JdbcType({color:#6a8759}"BYTE"{color}{color:#cc7832}, > {color}java.sql.Types.{color:#9876aa}TINYINT{color})) > In getCatalystType() (JDBC to Catalyst type mapping), TINYINT is mapped to > INTEGER, while it should be mapped to BYTETYPE. Mapping to INTEGER is OK from > the point of view of upcasting, but will lead to a 4-byte allocation rather > than 1 byte for BYTETYPE. > 2. (Read path) The read path ends up calling makeGetter(dt: DataType, metadata: > Metadata), which sets the value in the RDD row per the > data type. Here there is no mapping for BYTETYPE, and thus reads will result > in an error once getCatalystType() is fixed. > Note: These issues were found when reading/writing with SQLServer. A PR to > fix these mappings in MSSQLServerDialect will be submitted soon. 
> Error seen when writing a table > (JDBC write failed, com.microsoft.sqlserver.jdbc.SQLServerException: Column, > parameter, or variable #2: *Cannot find data type BYTE*.) > com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or > variable #2: Cannot find data type BYTE. > com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254) > com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608) > com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859) > .. > ##ShortType and FloatType issue > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. > Some example issues: > A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > > > > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28152: --- Description: ShortType and FloatType are not correctly mapped to the right JDBC types when using the JDBC connector. This results in tables and Spark dataframes being created with unintended types. The issue was observed when validating against SQLServer. Some example issues: * A write from a df with a ShortType column results in a SQL table with the column typed as INTEGER as opposed to SMALLINT, and thus a larger table than expected. * A read results in a dataframe with type INTEGER as opposed to ShortType. FloatType has an issue on the read path. On the write path the Spark data type 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on the read path, when JDBC data types need to be converted to Catalyst data types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' rather than 'FloatType'. was: ShortType and FloatType are not correctly mapped to the right JDBC types when using the JDBC connector. This results in tables and Spark dataframes being created with unintended types. Some example issues: * A write from a df with a ShortType column results in a SQL table with the column typed as INTEGER as opposed to SMALLINT, and thus a larger table than expected. * A read results in a dataframe with type INTEGER as opposed to ShortType. FloatType has an issue on the read path. On the write path the Spark data type 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on the read path, when JDBC data types need to be converted to Catalyst data types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' rather than 'FloatType'. 
> [JDBC Connector] ShortType and FloatTypes are not mapped correctly > -- > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. The issue was observed when validating against > SQLServer. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
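The storage cost mentioned above ("a larger table than expected") follows directly from the fixed widths of the SQL types; a small, illustrative check using Python's struct module (little-endian packing chosen arbitrarily):

```python
import struct

# SMALLINT/ShortType is 2 bytes; INTEGER is 4. Storing short values as
# INTEGER doubles the per-value storage, hence the larger-than-expected table.
assert len(struct.pack('<h', 123)) == 2   # SMALLINT-sized value
assert len(struct.pack('<i', 123)) == 4   # INTEGER-sized value

# Similarly on the read path: REAL/FloatType is 4 bytes, DOUBLE is 8,
# so mapping 'Real' to DoubleType allocates twice the needed width.
assert len(struct.pack('<f', 1.5)) == 4
assert len(struct.pack('<d', 1.5)) == 8
```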
[jira] [Updated] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not mapped correctly
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28152: --- Summary: [JDBC Connector] ShortType and FloatTypes are not mapped correctly (was: [JDBC Connector] ShortType and FloatTypes are not correctly mapped correctly) > [JDBC Connector] ShortType and FloatTypes are not mapped correctly > -- > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28152) [JDBC Connector] ShortType and FloatTypes are not correctly mapped correctly
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28152: --- Summary: [JDBC Connector] ShortType and FloatTypes are not correctly mapped correctly (was: ShortType and FloatTypes are not correctly mapped correctly) > [JDBC Connector] ShortType and FloatTypes are not correctly mapped correctly > > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28152) ShortType and FloatTypes are not correctly mapped correctly
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-28152: --- Summary: ShortType and FloatTypes are not correctly mapped correctly (was: ShortType and FloatTypes are not correctly mapped to right JDBC types when using JDBC connector) > ShortType and FloatTypes are not correctly mapped correctly > --- > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-28152) ShortType and FloatTypes are not correctly mapped to right JDBC types when using JDBC connector
[ https://issues.apache.org/jira/browse/SPARK-28152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood reopened SPARK-28152: Reopening this issue to submit this change as a separate PR for clarity. Earlier this change was made part of the ByteType PR (SPARK-28151). > ShortType and FloatTypes are not correctly mapped to right JDBC types when > using JDBC connector > --- > > Key: SPARK-28152 > URL: https://issues.apache.org/jira/browse/SPARK-28152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 2.4.3 >Reporter: Shiv Prashant Sood >Priority: Minor > > ShortType and FloatType are not correctly mapped to the right JDBC types when > using the JDBC connector. This results in tables and Spark dataframes being > created with unintended types. > Some example issues: > * A write from a df with a ShortType column results in a SQL table with the column typed > as INTEGER as opposed to SMALLINT, and thus a larger table than expected. > * A read results in a dataframe with type INTEGER as opposed to ShortType. > FloatType has an issue on the read path. On the write path the Spark data type > 'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on > the read path, when JDBC data types need to be converted to Catalyst data > types (getCatalystType), 'Real' incorrectly gets mapped to 'DoubleType' > rather than 'FloatType'. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28380) DataSourceV2 API based JDBC connector
Shiv Prashant Sood created SPARK-28380: -- Summary: DataSourceV2 API based JDBC connector Key: SPARK-28380 URL: https://issues.apache.org/jira/browse/SPARK-28380 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Shiv Prashant Sood JIRA for a DataSourceV2 API based JDBC connector. Goals: - A generic connector based on JDBC that supports all databases (the minimum bar is support for all V1 databases). - A reference implementation and interface for any specialized JDBC connectors. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28371) Parquet "starts with" filter is not null-safe
[ https://issues.apache.org/jira/browse/SPARK-28371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28371. --- Resolution: Fixed Fix Version/s: 2.4.4 3.0.0 Issue resolved by pull request 25140 [https://github.com/apache/spark/pull/25140] > Parquet "starts with" filter is not null-safe > - > > Key: SPARK-28371 > URL: https://issues.apache.org/jira/browse/SPARK-28371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > Fix For: 3.0.0, 2.4.4 > > > I ran into this when running unit tests with Parquet 1.11. It seems that 1.10 > has the same behavior in a few places but Spark somehow doesn't trigger those > code paths. > Basically, {{UserDefinedPredicate.keep}} should be null-safe, and Spark's > implementation is not. This was clarified in Parquet's documentation in > PARQUET-1489. > Failure I was getting: > {noformat} > Job aborted due to stage failure: Task 0 in stage 1304.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 1304.0 (TID 2528, localhost, executor > driver): java.lang.NullPointerException > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:544) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:523) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:152) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56) > at > org.apache.parquet.filter2.predicate.Operators$UserDefined.accept(Operators.java:377) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:181) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56) > at > 
org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81) > at > org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81) > at > org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:954) > at > org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:759) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:207) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:439) > ... > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
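The null-safety contract the fix restores can be sketched minimally in Python rather than the actual Java UserDefinedPredicate API: keep() may be handed a null value (per PARQUET-1489) and must not dereference it. The function name below is a hypothetical stand-in:

```python
def keep_starts_with(value, prefix: bytes) -> bool:
    # The NullPointerException in ParquetFilters came from calling a method
    # on a null cell value; a null can never start with a non-empty prefix,
    # so the predicate should simply drop such rows instead of throwing.
    if value is None:
        return False
    return value.startswith(prefix)
```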
[jira] [Assigned] (SPARK-28371) Parquet "starts with" filter is not null-safe
[ https://issues.apache.org/jira/browse/SPARK-28371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28371: - Assignee: Marcelo Vanzin > Parquet "starts with" filter is not null-safe > - > > Key: SPARK-28371 > URL: https://issues.apache.org/jira/browse/SPARK-28371 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin >Priority: Major > > I ran into this when running unit tests with Parquet 1.11. It seems that 1.10 > has the same behavior in a few places but Spark somehow doesn't trigger those > code paths. > Basically, {{UserDefinedPredicate.keep}} should be null-safe, and Spark's > implementation is not. This was clarified in Parquet's documentation in > PARQUET-1489. > Failure I was getting: > {noformat} > Job aborted due to stage failure: Task 0 in stage 1304.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 1304.0 (TID 2528, localhost, executor > driver): java.lang.NullPointerException > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:544) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:523) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:152) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56) > at > org.apache.parquet.filter2.predicate.Operators$UserDefined.accept(Operators.java:377) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:181) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56) > at > org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309) > at > 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81) > at > org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137) > at > org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81) > at > org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:954) > at > org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:759) > at > org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:207) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:439) > ... > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28247) Flaky test: "query without test harness" in ContinuousSuite
[ https://issues.apache.org/jira/browse/SPARK-28247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28247. --- Resolution: Fixed Assignee: Jungtaek Lim Fix Version/s: 3.0.0 Resolved by https://github.com/apache/spark/pull/25048 > Flaky test: "query without test harness" in ContinuousSuite > --- > > Key: SPARK-28247 > URL: https://issues.apache.org/jira/browse/SPARK-28247 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > This test has failed a few times in some PRs, and is also easy to reproduce > locally. Example of a failure: > {noformat} > [info] - query without test harness *** FAILED *** (2 seconds, 931 > milliseconds) > [info] scala.Predef.Set.apply[Int](0, 1, 2, > 3).map[org.apache.spark.sql.Row, > scala.collection.immutable.Set[org.apache.spark.sql.Row]](((x$3: Int) => > org.apache.spark.sql.Row.apply(x$3)))(immutable.this.Set.canBuildFrom[org.apache.spark.sql.Row]).subsetOf(scala.Predef.refArrayOps[org.apache.spark.sql.Row](results).toSet[org.apache.spark.sql.Row]) > was false > (ContinuousSuite.scala:226){noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28222) Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version
[ https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884412#comment-16884412 ] Marco Gaido commented on SPARK-28222: - [~eneriwrt] do you have a simple repro for this? I can try and check it if I have an example to debug. > Feature importance outputs different values in GBT and Random Forest in 2.3.3 > and 2.4 pyspark version > - > > Key: SPARK-28222 > URL: https://issues.apache.org/jira/browse/SPARK-28222 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3 >Reporter: eneriwrt >Priority: Minor > > Feature importance values obtained in a binary classification project differ > depending on whether version 2.3.3 or 2.4.0 is used. It happens in Random Forest > and GBT. It turns out that the values that match the sklearn output are from > the 2.3.3 version. > As an example: > *SPARK 2.4* > MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, > 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, > 0.06324418636918638] > MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, > 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, > 0.055888697733790925] > MODEL GradientBoostingClassifier [0.0, 0.7556, > 0.24438, 0.0, 1.4602196686471875e-17, 0.0] > *SPARK 2.3.3* > MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, > 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, > 0.05991085303585305] > MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, > 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, > 0.055888697733790925] > MODEL GradientBoostingClassifier [0.0, 0.7555, > 0.24438, 0.0, 2.4326753518951276e-17, 0.0] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28379) ANSI SQL: LATERAL derived table(T491)
Yuming Wang created SPARK-28379: --- Summary: ANSI SQL: LATERAL derived table(T491) Key: SPARK-28379 URL: https://issues.apache.org/jira/browse/SPARK-28379 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. This allows them to reference columns provided by preceding {{FROM}} items. (Without {{LATERAL}}, each subquery is evaluated independently and so cannot cross-reference any other {{FROM}} item.) Table functions appearing in {{FROM}} can also be preceded by the key word {{LATERAL}}, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding {{FROM}} items in any case. A {{LATERAL}} item can appear at top level in the {{FROM}} list, or within a {{JOIN}} tree. In the latter case it can also refer to any items that are on the left-hand side of a {{JOIN}} that it is on the right-hand side of. When a {{FROM}} item contains {{LATERAL}} cross-references, evaluation proceeds as follows: for each row of the {{FROM}} item providing the cross-referenced column(s), or set of rows of multiple {{FROM}} items providing the columns, the {{LATERAL}} item is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual with the rows they were computed from. This is repeated for each row or set of rows from the column source table(s). A trivial example of {{LATERAL}} is {code:sql} SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss; {code} *Feature ID*: T491 https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-FROM -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
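The row-by-row evaluation described above can be sketched in Python, with hypothetical foo/bar tables mirroring the SQL example: for each row of foo, the subquery over bar is re-evaluated with that row's columns in scope, and the resulting rows are joined back as usual.

```python
# Hypothetical tables standing in for the foo/bar of the SQL example.
foo = [{"id": 1, "bar_id": 10}, {"id": 2, "bar_id": 20}]
bar = [{"id": 10, "v": "x"}, {"id": 20, "v": "y"}, {"id": 30, "v": "z"}]

def lateral_join(outer, subquery):
    rows = []
    for o in outer:            # one subquery evaluation per outer row
        for s in subquery(o):  # the subquery may reference columns of o
            rows.append({**o, **s})  # join result rows with the source row
    return rows

# LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id)
result = lateral_join(foo, lambda f: [b for b in bar if b["id"] == f["bar_id"]])
```

Without the LATERAL cross-reference, the inner list comprehension could not mention `f` at all; that dependence is exactly what forces the per-row evaluation order.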
[jira] [Assigned] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule
[ https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28375: Assignee: Apache Spark > Enforce idempotence on the PullupCorrelatedPredicates optimizer rule > > > Key: SPARK-28375 > URL: https://issues.apache.org/jira/browse/SPARK-28375 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Apache Spark >Priority: Major > > The current PullupCorrelatedPredicates implementation can accidentally remove > predicates when run multiple times. > For example, for the following logical plan, one more optimizer run can > remove the predicate in the SubqueryExpression. > {code:java} > # Optimized > Project [a#0] > +- Filter a#0 IN (list#4 [(b#1 < d#3)]) >: +- Project [c#2, d#3] >: +- LocalRelation , [c#2, d#3] >+- LocalRelation , [a#0, b#1] > # Double optimized > Project [a#0] > +- Filter a#0 IN (list#4 []) >: +- Project [c#2, d#3] >: +- LocalRelation , [c#2, d#3] >+- LocalRelation , [a#0, b#1] > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28375) Enforce idempotence on the PullupCorrelatedPredicates optimizer rule
[ https://issues.apache.org/jira/browse/SPARK-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28375: Assignee: (was: Apache Spark) > Enforce idempotence on the PullupCorrelatedPredicates optimizer rule > > > Key: SPARK-28375 > URL: https://issues.apache.org/jira/browse/SPARK-28375 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Major > > The current PullupCorrelatedPredicates implementation can accidentally remove > predicates when run multiple times. > For example, for the following logical plan, one more optimizer run can > remove the predicate in the SubqueryExpression. > {code:java} > # Optimized > Project [a#0] > +- Filter a#0 IN (list#4 [(b#1 < d#3)]) >: +- Project [c#2, d#3] >: +- LocalRelation , [c#2, d#3] >+- LocalRelation , [a#0, b#1] > # Double optimized > Project [a#0] > +- Filter a#0 IN (list#4 []) >: +- Project [c#2, d#3] >: +- LocalRelation , [c#2, d#3] >+- LocalRelation , [a#0, b#1] > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
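The idempotence property being enforced can be sketched in Python: a rule R is idempotent on a plan when R(R(plan)) == R(plan), and a check can flag rules that keep changing the plan on a second pass. The toy string "plan" and rule below are hypothetical stand-ins for logical plans and Catalyst optimizer rules:

```python
def enforce_idempotence(rule, plan):
    """Apply the rule once, then verify a second application is a no-op."""
    once = rule(plan)
    twice = rule(once)
    if twice != once:
        raise AssertionError("rule is not idempotent on this plan")
    return once

# A toy "rule" that collapses double spaces; deliberately NOT idempotent on
# inputs with runs of 3+ spaces, so the check above can catch it.
collapse = lambda plan: plan.replace("  ", " ")
```

An optimizer batch marked Once would wrap each rule this way, which is how the accidental predicate removal in the double-optimized plan above gets surfaced as a test failure rather than silent data corruption.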
[jira] [Resolved] (SPARK-28355) Use Spark conf for threshold at which UDF is compressed by broadcast
[ https://issues.apache.org/jira/browse/SPARK-28355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-28355. - Resolution: Fixed Fix Version/s: 3.0.0 > Use Spark conf for threshold at which UDF is compressed by broadcast > > > Key: SPARK-28355 > URL: https://issues.apache.org/jira/browse/SPARK-28355 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jesse Cai >Assignee: Jesse Cai >Priority: Blocker > Fix For: 3.0.0 > > > The _prepare_for_python_RDD method currently broadcasts a pickled command if > its length is greater than the hardcoded value 1 << 20 (1M). We would like to > set this value as a Spark conf instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28355) Use Spark conf for threshold at which UDF is compressed by broadcast
[ https://issues.apache.org/jira/browse/SPARK-28355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-28355: --- Assignee: Jesse Cai > Use Spark conf for threshold at which UDF is compressed by broadcast > > > Key: SPARK-28355 > URL: https://issues.apache.org/jira/browse/SPARK-28355 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jesse Cai >Assignee: Jesse Cai >Priority: Blocker > > The _prepare_for_python_RDD method currently broadcasts a pickled command if > its length is greater than the hardcoded value 1 << 20 (1M). We would like to > set this value as a Spark conf instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28355) Use Spark conf for threshold at which UDF is compressed by broadcast
[ https://issues.apache.org/jira/browse/SPARK-28355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28355: Priority: Minor (was: Blocker) > Use Spark conf for threshold at which UDF is compressed by broadcast > > > Key: SPARK-28355 > URL: https://issues.apache.org/jira/browse/SPARK-28355 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jesse Cai >Assignee: Jesse Cai >Priority: Minor > Fix For: 3.0.0 > > > The _prepare_for_python_RDD method currently broadcasts a pickled command if > its length is greater than the hardcoded value 1 << 20 (1M). We would like to > set this value as a Spark conf instead. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
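The change can be sketched in Python: the decision to broadcast the pickled command reads its threshold from the Spark conf instead of the hardcoded 1 << 20. The conf-key string below is an assumption based on this ticket's intent; check the merged PR for the exact name.

```python
DEFAULT_THRESHOLD = 1 << 20  # 1 MiB, the previously hardcoded value

def should_broadcast(pickled_command: bytes, conf: dict) -> bool:
    # Fall back to the old hardcoded value when the conf key is unset.
    threshold = int(conf.get("spark.broadcast.UDFCompressionThreshold",
                             DEFAULT_THRESHOLD))
    return len(pickled_command) > threshold
```

Making the threshold configurable lets jobs with many small UDFs skip broadcast overhead, or force broadcasting earlier for very large closures.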
[jira] [Reopened] (SPARK-20856) support statement using nested joins
[ https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-20856: - > support statement using nested joins > > > Key: SPARK-20856 > URL: https://issues.apache.org/jira/browse/SPARK-20856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Major > Labels: bulk-closed > > While DB2, ORACLE etc support a join expressed as follows, SPARK SQL does > not. > Not supported > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > versus written as shown > select * from > cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner > join cert.tbint tbint on tint.rnum = tbint.rnum > > ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', > 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, > 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', > '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', > '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5) > == SQL == > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > -^^^ > , Query: select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum. > SQLState: HY000 > ErrorCode: 500051 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
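[Editor's note] The rejected query in SPARK-20856 defers both ON clauses to the end of the join chain; the portable rewrite pairs each ON with its own join. A runnable sketch with SQLite standing in for the engines that accept the flattened form (the toy tables and their contents are made up for illustration):

```python
import sqlite3

# Three toy tables sharing an rnum column, standing in for cert.tsint,
# cert.tint and cert.tbint from the report.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
for t in ("tsint", "tint", "tbint"):
    cur.execute(f"CREATE TABLE {t} (rnum INTEGER)")
    cur.executemany(f"INSERT INTO {t} VALUES (?)", [(1,), (2,)])

# The flattened form: each ON clause immediately follows its join.
rows = cur.execute("""
    SELECT tsint.rnum, tint.rnum, tbint.rnum
    FROM tsint
    INNER JOIN tint  ON tsint.rnum = tint.rnum
    INNER JOIN tbint ON tint.rnum = tbint.rnum
    ORDER BY tsint.rnum
""").fetchall()
# rows -> [(1, 1, 1), (2, 2, 2)]
```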
[jira] [Commented] (SPARK-28316) Decimal precision issue
[ https://issues.apache.org/jira/browse/SPARK-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884398#comment-16884398 ] Marco Gaido commented on SPARK-28316: - Well, IIUC, this is just the result of Postgres having no limit on decimal precision, while Spark's Decimal max precision is 38. Our decimal implementation draws from SQLServer's (and Hive's, which follows SQLServer) one. > Decimal precision issue > --- > > Key: SPARK-28316 > URL: https://issues.apache.org/jira/browse/SPARK-28316 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Multiply check: > {code:sql} > -- Spark SQL > spark-sql> select cast(-34338492.215397047 as decimal(38, 10)) * > cast(-34338492.215397047 as decimal(38, 10)); > 1179132047626883.596862 > -- PostgreSQL > postgres=# select cast(-34338492.215397047 as numeric(38, 10)) * > cast(-34338492.215397047 as numeric(38, 10)); >?column? > --- > 1179132047626883.59686213585632020900 > (1 row) > {code} > Division check: > {code:sql} > -- Spark SQL > spark-sql> select cast(93901.57763026 as decimal(38, 10)) / cast(4.31 as > decimal(38, 10)); > 21786.908963 > -- PostgreSQL > postgres=# select cast(93901.57763026 as numeric(38, 10)) / cast(4.31 as > numeric(38, 10)); > ?column? 
> > 21786.908962937355 > (1 row) > {code} > POWER(10, LN(value)) check: > {code:sql} > -- Spark SQL > spark-sql> SELECT CAST(POWER(cast('10' as decimal(38, 18)), > LN(ABS(round(cast(-24926804.04504742 as decimal(38, 10)),200 AS > decimal(38, 10)); > 107511333880051856 > -- PostgreSQL > postgres=# SELECT CAST(POWER(cast('10' as numeric(38, 18)), > LN(ABS(round(cast(-24926804.04504742 as numeric(38, 10)),200 AS > numeric(38, 10)); > power > --- > 107511333880052007.0414112467 > (1 row) > {code} > AVG, STDDEV and VARIANCE returns double type: > {code:sql} > -- Spark SQL > spark-sql> create temporary view t1 as select * from values > > (cast(-24926804.04504742 as decimal(38, 10))), > > (cast(16397.038491 as decimal(38, 10))), > > (cast(7799461.4119 as decimal(38, 10))) > > as t1(t); > spark-sql> SELECT AVG(t), STDDEV(t), VARIANCE(t) FROM t1; > -5703648.53155214 1.7096528995154984E72.922913036821751E14 > -- PostgreSQL > postgres=# SELECT AVG(t), STDDEV(t), VARIANCE(t) from (values > (cast(-24926804.04504742 as decimal(38, 10))), (cast(16397.038491 as > decimal(38, 10))), (cast(7799461.4119 as decimal(38, 10 t1(t); > avg |stddev | > variance > ---+---+-- > -5703648.53155214 | 17096528.99515498420743029415 | > 292291303682175.094017569588 > (1 row) > {code} > EXP returns double type: > {code:sql} > -- Spark SQL > spark-sql> select exp(cast(1.0 as decimal(31,30))); > 2.718281828459045 > -- PostgreSQL > postgres=# select exp(cast(1.0 as decimal(31,30))); >exp > -- > 2.718281828459045235360287471353 > (1 row) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
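[Editor's note] The multiply check above can be reproduced with Python's decimal module: the exact product needs 34 significant digits, which Postgres's unbounded numeric keeps, while Spark caps precision at 38 and reduces the result scale of decimal(38,10) × decimal(38,10) to 6. A sketch in plain Python, not Spark code:

```python
from decimal import Decimal, getcontext

getcontext().prec = 38  # Spark's maximum decimal precision

x = Decimal("-34338492.215397047")
exact = x * x  # 34 significant digits, so it still fits within prec 38
# exact == Decimal("1179132047626883.596862135856320209")

# Spark's result type for multiplying two decimal(38,10) values keeps only
# 6 digits of scale, which is why spark-sql prints the shorter value:
spark_like = exact.quantize(Decimal("0.000001"))
# spark_like == Decimal("1179132047626883.596862")
```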
[jira] [Commented] (SPARK-28324) The LOG function using 10 as the base, but Spark using E
[ https://issues.apache.org/jira/browse/SPARK-28324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884397#comment-16884397 ] Marco Gaido commented on SPARK-28324: - +1 for [~srowen]'s opinion. I don't think it is a good idea to change the behavior here. > The LOG function using 10 as the base, but Spark using E > > > Key: SPARK-28324 > URL: https://issues.apache.org/jira/browse/SPARK-28324 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Spark SQL: > {code:sql} > spark-sql> select log(10); > 2.302585092994046 > {code} > PostgreSQL: > {code:sql} > postgres=# select log(10); > log > - >1 > (1 row) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
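[Editor's note] The discrepancy in SPARK-28324 is just natural log versus base-10 log, as Python's math module makes explicit:

```python
import math

# Spark's one-argument log() is the natural logarithm (base e):
spark_log = math.log(10)       # 2.302585092994046
# PostgreSQL's one-argument log() is base 10:
postgres_log = math.log10(10)  # 1.0
# The engines agree once the base is spelled out, e.g. Spark's
# two-argument LOG(base, expr).
```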
[jira] [Assigned] (SPARK-28369) Check overflow in decimal UDF
[ https://issues.apache.org/jira/browse/SPARK-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28369: Assignee: Apache Spark > Check overflow in decimal UDF > - > > Key: SPARK-28369 > URL: https://issues.apache.org/jira/browse/SPARK-28369 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mick Jermsurawong >Assignee: Apache Spark >Priority: Minor > > Udf resulting in overflowing BigDecimal currently returns null. This is > inconsistent with new behavior allow option to check and throw overflow > introduced in https://issues.apache.org/jira/browse/SPARK-23179 > {code:java} > import spark.implicits._ > val tenFold: java.math.BigDecimal => java.math.BigDecimal = > _.multiply(new java.math.BigDecimal("10")) > val tenFoldUdf = udf(tenFold) > val ds = spark > .createDataset(Seq(BigDecimal("12345678901234567890.123"))) > .select(tenFoldUdf(col("value"))) > .as[BigDecimal] > ds.collect shouldEqual Seq(null){code} > The problem is at the {{CatalystTypeConverters}} where {{toPrecision}} gets > converted to null > [https://github.com/apache/spark/blob/13ae9ebb38ba357aeb3f1e3fe497b322dff8eb35/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L344-L356] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28369) Check overflow in decimal UDF
[ https://issues.apache.org/jira/browse/SPARK-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28369: Assignee: (was: Apache Spark) > Check overflow in decimal UDF > - > > Key: SPARK-28369 > URL: https://issues.apache.org/jira/browse/SPARK-28369 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Mick Jermsurawong >Priority: Minor > > Udf resulting in overflowing BigDecimal currently returns null. This is > inconsistent with new behavior allow option to check and throw overflow > introduced in https://issues.apache.org/jira/browse/SPARK-23179 > {code:java} > import spark.implicits._ > val tenFold: java.math.BigDecimal => java.math.BigDecimal = > _.multiply(new java.math.BigDecimal("10")) > val tenFoldUdf = udf(tenFold) > val ds = spark > .createDataset(Seq(BigDecimal("12345678901234567890.123"))) > .select(tenFoldUdf(col("value"))) > .as[BigDecimal] > ds.collect shouldEqual Seq(null){code} > The problem is at the {{CatalystTypeConverters}} where {{toPrecision}} gets > converted to null > [https://github.com/apache/spark/blob/13ae9ebb38ba357aeb3f1e3fe497b322dff8eb35/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L344-L356] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
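[Editor's note] The inconsistency in SPARK-28369 is in where the overflow check happens: the conversion at `toPrecision` silently yields null instead of honoring the ANSI-style check-and-throw option from SPARK-23179. A hypothetical Python sketch of the two behaviors (a model of the semantics, not Spark's actual code; the function name and signature are illustrative):

```python
from decimal import Decimal, localcontext

def to_precision(value, precision, scale, null_on_overflow=True):
    """Hypothetical model of Catalyst's toPrecision: round to `scale`,
    then overflow if the total number of digits exceeds `precision`."""
    with localcontext() as ctx:
        ctx.prec = precision + scale + 10  # room for exact rounding
        rounded = value.quantize(Decimal(1).scaleb(-scale))
    if len(rounded.as_tuple().digits) > precision:
        if null_on_overflow:
            return None  # legacy behavior: silent null
        raise ArithmeticError(
            f"{value} cannot be represented as Decimal({precision}, {scale})")
    return rounded

# The UDF result 12345678901234567890.123 * 10 needs 21 integer digits,
# which no longer fits decimal(38, 18):
v = Decimal("12345678901234567890.123") * 10
# to_precision(v, 38, 18)                          -> None
# to_precision(v, 38, 18, null_on_overflow=False)  -> raises ArithmeticError
```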
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 2:24 PM: -- Yes, needs debugging (build Spark with extra log statements, one way to do it), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] Another question is why there is no PythonRunner thread, has that exited? was (Author: skonto): Yes, needs debugging (build Spark with extra log statements), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] Another question is why there is no PythonRunner thread, has that exited? 
> driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
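[Editor's note] The EventLoop shutdown being discussed hinges on the loop thread blocking on its queue until another thread interrupts it; if that interrupt never arrives, the join in stop() hangs forever. A minimal Python analogue (a sketch, not Spark's Scala code) in which stop() wakes the loop with a sentinel instead of a JVM thread interrupt:

```python
import queue
import threading

_SENTINEL = object()  # stands in for the JVM thread interrupt

class EventLoop:
    """Toy analogue of org.apache.spark.util.EventLoop: a daemon thread
    draining a queue, which can only exit when stop() wakes it up."""

    def __init__(self):
        self._queue = queue.Queue()
        self.processed = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            event = self._queue.get()  # blocks, like take() in EventLoop
            if event is _SENTINEL:     # the "interrupt" that lets us exit
                return
            self.processed.append(event)

    def start(self):
        self._thread.start()

    def post(self, event):
        self._queue.put(event)

    def stop(self):
        self._queue.put(_SENTINEL)  # without this wake-up, join() hangs
        self._thread.join()

loop = EventLoop()
loop.start()
loop.post("event-1")
loop.post("event-2")
loop.stop()
# loop.processed -> ["event-1", "event-2"]
```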
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 2:20 PM: -- Yes, needs debugging (build Spark with extra log statements), but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] Another question is why there is no PythonRunner thread, has that exited? was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] Another question is why there is no PythonRunner thread, has that exited? 
> driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 2:13 PM: -- Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] Another question is why there is no PythonRunner thread, has that exited? was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? 
Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:56 PM: -- Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. 
Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:53 PM: -- Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: [https://bugs.openjdk.java.net/browse/JDK-8154017)?|https://bugs.openjdk.java.net/browse/JDK-8154017)] was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: https://bugs.openjdk.java.net/browse/JDK-8154017)? 
> driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:52 PM: -- Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. Are you using btw the same jdk (we need to make sure behavior has not changed as in this one: https://bugs.openjdk.java.net/browse/JDK-8154017)? was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. Are you using btw the same jdk? 
> driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes the driver and executer are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple python with a print works fine on 2.4.3 and 3.0.0 > * same setup on 2.4.0 > * 2.4.3 spark-submit with the above pyspark > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 1:51 PM: -- Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? Btw since DestroyJavaVM is there as a thread in your dump the shutdown process has started but blocked. Are you using btw the same jdk? was (Author: skonto): Yes, needs debugging, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Without the interrupt the EventLoop thread cannot exit. Does this ever happen (logging would help but i suspect it never happens)? Stop is called there by the shutdownhook when sparkcontext is stopped. So the diff with the working version will be why in the working case the shutdown happens? > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs > and never calls the shutdown hook. 
> {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes, the driver and executor are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple Python script with a print works fine on 2.4.3 and 3.0.0 > * the same setup on 2.4.0 > * a 2.4.3 spark-submit with the above pyspark script > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
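The shutdown pattern the comment describes can be sketched in Python. This is a hypothetical analogue, not Spark's actual Scala EventLoop: a consumer thread blocks on a queue, and stop() can only unblock it by delivering a wake-up (a sentinel here, standing in for the Thread.interrupt() call in EventLoop.scala). Without that wake-up the thread never exits, the join never returns, and shutdown hangs, which matches the symptom in the thread dump.

```python
import queue
import threading

class EventLoop:
    """Minimal analogue of an event loop: a background thread drains a queue.

    Python threads cannot be interrupted, so stop() posts a sentinel where
    the Scala version would call Thread.interrupt() on the blocked thread.
    """
    _STOP = object()  # sentinel that wakes the blocked consumer

    def __init__(self):
        self._events = queue.Queue()
        self.handled = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            event = self._events.get()  # blocks until an event (or sentinel) arrives
            if event is EventLoop._STOP:
                return  # without this wake-up, the thread would block here forever
            self.handled.append(event)

    def start(self):
        self._thread.start()

    def post(self, event):
        self._events.put(event)

    def stop(self, timeout=5.0):
        # Deliver the wake-up, then join the consumer, mirroring the
        # interrupt-then-join sequence the comment points at.
        self._events.put(EventLoop._STOP)
        self._thread.join(timeout)
        return not self._thread.is_alive()

loop = EventLoop()
loop.start()
loop.post("job-start")
loop.post("job-end")
stopped = loop.stop()
```

If stop() were never called (or the wake-up never delivered), join() would time out and the process would hang on exit, analogous to the stuck driver pod.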
[jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884374#comment-16884374 ] Stavros Kontopoulos commented on SPARK-27927: - Yes, this needs debugging; I am not sure the commit itself is the issue, but if you check the code there, there is an interrupt call by the other thread that joins the EventLoop one: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L78] Does this ever happen? Stop is called there by the shutdown hook when the SparkContext is stopped. So the difference comes down to why shutdown completes in the working case. > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark job on spark 2.4.3 or 3.0.0, the driver pod hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? > from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes, the driver and executor are just hanging. We > see the output of this python script. 
> {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple Python script with a print works fine on 2.4.3 and 3.0.0 > * the same setup on 2.4.0 > * a 2.4.3 spark-submit with the above pyspark script > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884339#comment-16884339 ] Edwin Biemond edited comment on SPARK-27927 at 7/13/19 10:24 AM: - Thanks again. Indeed, it looks like this commit could be the issue: [https://github.com/apache/spark/commit/03e90f65bfdad376400a4ae4df31a82c05ed4d4b#diff-2952082eba54dc17cd6f73a3260e8f2d] It is related to the blocked dag-scheduler-event-loop thread and matches my SparkContext/SparkSession issue. But this was committed a year ago, and Spark 2.4.0, which already includes this change, works fine for us. The hunt goes on. was (Author: ebiemond): thanks again, indeed it looks like this commit can be the issue [https://github.com/apache/spark/commit/03e90f65bfdad376400a4ae4df31a82c05ed4d4b#diff-2952082eba54dc17cd6f73a3260e8f2d] it is related to dag-scheduler-event-loop blocked thread and hitting my sparkcontext, spark session issue. > driver pod hangs with pyspark 2.4.3 and master on kubenetes > --- > > Key: SPARK-27927 > URL: https://issues.apache.org/jira/browse/SPARK-27927 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.0.0, 2.4.3 > Environment: k8s 1.11.9 > spark 2.4.3 and master branch. >Reporter: Edwin Biemond >Priority: Major > Attachments: driver_threads.log, executor_threads.log > > > When we run a simple pyspark job on spark 2.4.3 or 3.0.0, the driver pod hangs > and never calls the shutdown hook. > {code:java} > #!/usr/bin/env python > from __future__ import print_function > import os > import os.path > import sys > # Are we really in Spark? 
> from pyspark.sql import SparkSession > spark = SparkSession.builder.appName('hello_world').getOrCreate() > print('Our Spark version is {}'.format(spark.version)) > print('Spark context information: {} parallelism={} python version={}'.format( > str(spark.sparkContext), > spark.sparkContext.defaultParallelism, > spark.sparkContext.pythonVer > )) > {code} > When we run this on kubernetes, the driver and executor are just hanging. We > see the output of this python script. > {noformat} > bash-4.2# cat stdout.log > Our Spark version is 2.4.3 > Spark context information: master=k8s://https://kubernetes.default.svc:443 appName=hello_world> > parallelism=2 python version=3.6{noformat} > What works > * a simple Python script with a print works fine on 2.4.3 and 3.0.0 > * the same setup on 2.4.0 > * a 2.4.3 spark-submit with the above pyspark script > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20856) support statement using nested joins
[ https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884332#comment-16884332 ] Yuming Wang commented on SPARK-20856: - Could we reopen this issue? I encountered this case when porting [join.sql#L1170-L1243|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/join.sql#L1170-L1243]. > support statement using nested joins > > > Key: SPARK-20856 > URL: https://issues.apache.org/jira/browse/SPARK-20856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Major > Labels: bulk-closed > > While DB2, Oracle, etc. support a join expressed as follows, Spark SQL does > not. > Not supported: > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > versus the same query written as shown: > select * from > cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner > join cert.tbint tbint on tint.rnum = tbint.rnum > > ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', > 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, > 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', > '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', > '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5) > == SQL == > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > -^^^ > , Query: select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum. 
> SQLState: HY000 > ErrorCode: 500051 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
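The workaround the issue already mentions is the left-deep rewrite, where each ON clause immediately follows its join instead of being stacked at the end. A small sketch of that rewritten form, run against SQLite via the standard library (the three tiny tables are made up for illustration; they stand in for cert.tsint, cert.tint, and cert.tbint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-ins for cert.tsint / cert.tint / cert.tbint: each has a
# rnum key 0..2 so every join matches one row.
for name in ("tsint", "tint", "tbint"):
    cur.execute(f"CREATE TABLE {name} (rnum INTEGER, c INTEGER)")
    cur.executemany(f"INSERT INTO {name} VALUES (?, ?)",
                    [(i, i * 10) for i in range(3)])

# The left-deep form that Spark SQL accepts: ON clauses attach to their joins.
flattened = """
SELECT tsint.rnum, tint.rnum, tbint.rnum
FROM tsint
INNER JOIN tint  ON tsint.rnum = tint.rnum
INNER JOIN tbint ON tint.rnum  = tbint.rnum
ORDER BY tsint.rnum
"""
rows = cur.execute(flattened).fetchall()
```

The nested form with both ON clauses trailing the last join is semantically the same query; only the parse tree differs, which is why flattening is a pure syntactic rewrite.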
[jira] [Commented] (SPARK-28269) ArrowStreamPandasSerializer get stack
[ https://issues.apache.org/jira/browse/SPARK-28269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884330#comment-16884330 ] Modi Tamam commented on SPARK-28269: [~hyukjin.kwon] I think the diagnosis is wrong; you haven't reached the problematic action. The problem is in this row: {code:java} full_spark_df.withColumn(grouped_col,F.lit('0')).groupBy(grouped_col).apply(very_simpl_udf).show() {code} It seems you haven't reached it yet. > ArrowStreamPandasSerializer get stack > - > > Key: SPARK-28269 > URL: https://issues.apache.org/jira/browse/SPARK-28269 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Modi Tamam >Priority: Major > Attachments: Untitled.xcf > > > I'm working with PySpark version 2.4.3. > I have a big data frame: > * ~15M rows > * ~130 columns > * ~2.5 GB - I converted it to a Pandas data frame; pickling it > (pandas_df.to_pickle()) resulted in a file of size 2.5 GB. 
> I have some code that groups this data frame and applies a Pandas UDF: > > {code:java} > from pyspark.sql import Row > from pyspark.sql.functions import lit, pandas_udf, PandasUDFType, to_json > from pyspark.sql.types import * > from pyspark.sql import functions as F > initial_list = range(4500) > rdd = sc.parallelize(initial_list) > rdd = rdd.map(lambda x: Row(val=x)) > initial_spark_df = spark.createDataFrame(rdd) > cols_count = 132 > rows = 1000 > # --- Start generating the big data frame --- > # Generating the schema > schema = StructType([StructField(str(i), IntegerType()) for i in > range(cols_count)]) > @pandas_udf(returnType=schema,functionType=PandasUDFType.GROUPED_MAP) > def random_pd_df_generator(df): > import numpy as np > import pandas as pd > return pd.DataFrame(np.random.randint(0, 100, size=(rows, cols_count)), > columns=range(cols_count)) > full_spark_df = initial_spark_df.groupBy("val").apply(random_pd_df_generator) > # --- End generating the big data frame --- > # --- Start the bug reproduction --- > grouped_col = "col_0" > @pandas_udf("%s string" %grouped_col, PandasUDFType.GROUPED_MAP) > def very_simpl_udf(pdf): > import pandas as pd > ret_val = pd.DataFrame({grouped_col: [str(pdf[grouped_col].iloc[0])]}) > return ret_val > # In order to create a huge dataset, I've set all of the grouped_col values to > a single value, then grouped them into a single group. > # Here is where the program gets stuck > full_spark_df.withColumn(grouped_col,F.lit('0')).groupBy(grouped_col).apply(very_simpl_udf).show() > assert False, "If we got here, it means the issue wasn't reproduced" > {code} > > The above code gets stuck in the ArrowStreamPandasSerializer (on the first > line, when reading a batch from the reader): > > {code:java} > for batch in reader: > yield [self.arrow_to_pandas(c) for c in > pa.Table.from_batches([batch]).itercolumns()]{code} > > You can just run the first code snippet and it will reproduce. 
> Open a Pyspark shell with this configuration: > {code:java} > pyspark --conf "spark.python.worker.memory=3G" --conf > "spark.executor.memory=20G" --conf > "spark.executor.extraJavaOptions=-XX:+UseG1GC" --conf > "spark.driver.memory=10G"{code} > > Versions: > * pandas - 0.24.2 > * pyarrow - 0.13.0 > * Spark - 2.4.2 > * Python - 2.7.16 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
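A quick way to see why the reproduction funnels everything through one Arrow batch: grouping on a column filled with a constant collapses the whole frame into a single group, so the grouped-map UDF receives one giant in-memory DataFrame. A minimal pure-Python sketch of that grouping behaviour (a toy stand-in for Spark's groupBy, not Spark itself):

```python
from collections import defaultdict

def group_by(rows, key):
    """Group rows (dicts) by the value of `key`, like a toy groupBy."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

# Mirrors the repro's initial frame: 4500 distinct values of "val".
rows = [{"val": i} for i in range(4500)]

# Distinct keys: 4500 groups of one row each, so each UDF call is tiny.
by_val = group_by(rows, "val")

# Constant key (the F.lit('0') trick in the repro): a single group holding
# every row, which a grouped-map pandas UDF must receive as one batch.
for row in rows:
    row["col_0"] = "0"
by_const = group_by(rows, "col_0")
```

With ~15M rows and ~130 columns, that single group is the whole 2.5 GB frame, which is consistent with the serializer stalling while reading the batch.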
[jira] [Assigned] (SPARK-28378) Remove usage of cgi.escape
[ https://issues.apache.org/jira/browse/SPARK-28378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28378: Assignee: Apache Spark > Remove usage of cgi.escape > -- > > Key: SPARK-28378 > URL: https://issues.apache.org/jira/browse/SPARK-28378 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark >Priority: Minor > > {{cgi.escape}} is deprecated [1] and removed in 3.8 [2]. We'd better > replace it. > [1] [https://docs.python.org/3/library/cgi.html#cgi.escape] > [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28378) Remove usage of cgi.escape
[ https://issues.apache.org/jira/browse/SPARK-28378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28378: Assignee: (was: Apache Spark) > Remove usage of cgi.escape > -- > > Key: SPARK-28378 > URL: https://issues.apache.org/jira/browse/SPARK-28378 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Minor > > {{cgi.escape}} is deprecated [1] and removed in 3.8 [2]. We'd better > replace it. > [1] [https://docs.python.org/3/library/cgi.html#cgi.escape] > [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28378) Remove usage of cgi.escape
Liang-Chi Hsieh created SPARK-28378: --- Summary: Remove usage of cgi.escape Key: SPARK-28378 URL: https://issues.apache.org/jira/browse/SPARK-28378 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.0.0 Reporter: Liang-Chi Hsieh {{cgi.escape}} is deprecated [1] and removed in 3.8 [2]. We'd better replace it. [1] [https://docs.python.org/3/library/cgi.html#cgi.escape] [2] [https://docs.python.org/3.8/whatsnew/3.8.html#api-and-feature-removals] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
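The standard-library replacement for the deprecated call is html.escape. One behavioural difference worth noting: cgi.escape left quotes alone unless quote=True was passed, while html.escape escapes quotes by default, so a byte-for-byte drop-in passes quote=False:

```python
from html import escape  # replaces cgi.escape, which was removed in Python 3.8

s = '<a href="x">&</a>'

# Exact drop-in for the old cgi.escape(s): &, <, > escaped; quotes untouched.
legacy_compatible = escape(s, quote=False)

# Default html.escape additionally escapes " (and '), which is usually what
# you want when the result lands inside an HTML attribute.
full = escape(s)
```

Whether to keep the legacy quote behaviour or adopt the safer default depends on where each escaped string ends up in the generated HTML.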