[jira] [Commented] (SPARK-28506) not handling usage of group function and window function at some conditions
[ https://issues.apache.org/jira/browse/SPARK-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240650#comment-17240650 ] Dylan Guedes commented on SPARK-28506: -- I took this from their golden files; maybe they changed the behavior in recent versions. If you ran both queries in PgSQL and SparkSQL and the output was the same, I think you can close this. > not handling usage of group function and window function at some conditions > --- > > Key: SPARK-28506 > URL: https://issues.apache.org/jira/browse/SPARK-28506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Dylan Guedes >Priority: Major > > Hi, > looks like SparkSQL is unable to handle this query: > {code:sql}SELECT rank() OVER (ORDER BY 1), count(*) FROM empsalary GROUP BY > 1;{code} > PgSQL, on the other hand, does. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
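A possible workaround, sketched here without verification against either engine, is to compute the aggregate in a subquery first and apply the window function on top; the subquery alias `t` and column alias `cnt` are hypothetical names introduced only for illustration:

{code:sql}
-- Untested sketch: aggregate first, then window over the aggregated rows.
SELECT rank() OVER (ORDER BY 1), cnt
FROM (SELECT count(*) AS cnt FROM empsalary GROUP BY 1) t;
{code}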
[jira] [Commented] (SPARK-29638) Spark handles 'NaN' as 0 in sums
[ https://issues.apache.org/jira/browse/SPARK-29638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240649#comment-17240649 ] Dylan Guedes commented on SPARK-29638: -- Hmmm, I wasn't expecting this :( I gathered that PgSQL would return `NaN` from reading the outputs of their golden tests. Maybe they changed this in recent versions? Anyway, if Spark is summing `NaN` as zero, that is still inconsistent with PgSQL. > Spark handles 'NaN' as 0 in sums > > > Key: SPARK-29638 > URL: https://issues.apache.org/jira/browse/SPARK-29638 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. > PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = > 'NaN' > I experienced this with the query below: > {code:sql} > SELECT a, b, >SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) > FROM (VALUES(1,1),(2,2),(3,(cast('nan' as int))),(4,3),(5,4)) t(a,b); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
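Since integer types have no NaN value, a variant of the repro using a `double` column (where a real NaN can exist) may show the discrepancy more directly. This is a sketch, not a verified run in either engine:

{code:sql}
-- Sketch: same running sum, but with a double column so a real NaN
-- is produced; PgSQL-style semantics would propagate NaN through the sum.
SELECT a, b,
       SUM(b) OVER (ORDER BY a ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM (VALUES (1, 1.0), (2, 2.0),
             (3, cast('NaN' as double)),
             (4, 3.0), (5, 4.0)) t(a, b);
{code}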
[jira] [Commented] (SPARK-28064) Order by does not accept a call to rank()
[ https://issues.apache.org/jira/browse/SPARK-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238662#comment-17238662 ] Dylan Guedes commented on SPARK-28064: -- Sorry, my only intention was to help map the differences between the PostgreSQL and SparkSQL APIs. > Order by does not accept a call to rank() > - > > Key: SPARK-28064 > URL: https://issues.apache.org/jira/browse/SPARK-28064 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Dylan Guedes >Priority: Major > > Currently in Spark, we can't use a call to `rank()` in an ORDER BY; we need to > first alias the rank column as, for instance, `r` and then use `order by > r`. For example: > This does not work: > {code:sql} > SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS > (PARTITION BY depname ORDER BY salary) ORDER BY rank() OVER w; > {code} > However, this one does: > {code:sql} > SELECT depname, empno, salary, rank() OVER w as r FROM empsalary WINDOW w AS > (PARTITION BY depname ORDER BY salary) ORDER BY r; > {code} > By the way, I took this one from Postgres behavior: Postgres accepts both forms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31539) Backport SPARK-27138 Remove AdminUtils calls (fixes deprecation)
[ https://issues.apache.org/jira/browse/SPARK-31539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091705#comment-17091705 ] Dylan Guedes commented on SPARK-31539: -- Agreed, I think it is not worth it. > Backport SPARK-27138 Remove AdminUtils calls (fixes deprecation) > -- > > Key: SPARK-31539 > URL: https://issues.apache.org/jira/browse/SPARK-31539 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.4.6 >Reporter: Holden Karau >Priority: Major > > SPARK-27138 Remove AdminUtils calls (fixes deprecation) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29638) Spark handles 'NaN' as 0 in sums
[ https://issues.apache.org/jira/browse/SPARK-29638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-29638: - Description: Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = 'NaN' I experienced this with the query below: {code:sql} SELECT a, b, SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM (VALUES(1,1),(2,2),(3,(cast('nan' as int))),(4,3),(5,4)) t(a,b); {code} was:Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = 'NaN' > Spark handles 'NaN' as 0 in sums > > > Key: SPARK-29638 > URL: https://issues.apache.org/jira/browse/SPARK-29638 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. > PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = > 'NaN' > I experienced this with the query below: > {code:sql} > SELECT a, b, >SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) > FROM (VALUES(1,1),(2,2),(3,(cast('nan' as int))),(4,3),(5,4)) t(a,b); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29638) Spark handles 'NaN' as 0 in sums
Dylan Guedes created SPARK-29638: Summary: Spark handles 'NaN' as 0 in sums Key: SPARK-29638 URL: https://issues.apache.org/jira/browse/SPARK-29638 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = 'NaN' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
Dylan Guedes created SPARK-29636: Summary: Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp Key: SPARK-29636 URL: https://issues.apache.org/jira/browse/SPARK-29636 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 10:23:54+01' to timestamp: {code:sql} spark-sql> select cast ('11:00 BST' as timestamp); NULL Time taken: 2.248 seconds, Fetched 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
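For contrast, an illustrative check (not a verified transcript): a timestamp string without a zone abbreviation or short offset is generally accepted by the same cast, which narrows the failure down to the `BST` / `+01` suffix handling:

{code:sql}
-- Illustrative: an offset-free timestamp string parses, while the
-- zone-abbreviation and short-offset forms above return NULL.
select cast('2000-10-19 10:23:54' as timestamp);
{code}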
[jira] [Created] (SPARK-29540) Thrift in some cases can't parse string to date
Dylan Guedes created SPARK-29540: Summary: Thrift in some cases can't parse string to date Key: SPARK-29540 URL: https://issues.apache.org/jira/browse/SPARK-29540 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes I'm porting tests from PostgreSQL window.sql but anything related to casting a string to datetime seems to fail on Thrift. For instance, the following does not work: {code:sql} CREATE TABLE empsalary ( depname string, empno integer, salary int, enroll_date date ) USING parquet; INSERT INTO empsalary VALUES ('develop', 10, 5200, '2007-08-01'); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
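A possible workaround, offered here as an untested sketch, is to bypass the string-to-date cast entirely with a typed date literal:

{code:sql}
-- Untested sketch: use a date literal instead of relying on the
-- implicit string-to-date cast that fails on Thrift.
INSERT INTO empsalary VALUES ('develop', 10, 5200, date '2007-08-01');
{code}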
[jira] [Reopened] (SPARK-29107) Add window.sql - Part 1
[ https://issues.apache.org/jira/browse/SPARK-29107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes reopened SPARK-29107: -- We reverted the initial PR. > Add window.sql - Part 1 > --- > > Key: SPARK-29107 > URL: https://issues.apache.org/jira/browse/SPARK-29107 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29451) Some queries with divisions in windows are failing in Thrift
Dylan Guedes created SPARK-29451: Summary: Some queries with divisions in windows are failing in Thrift Key: SPARK-29451 URL: https://issues.apache.org/jira/browse/SPARK-29451 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Hello, the following queries are not working properly on Thrift. The only difference between them and other queries that work fine is the numeric divisions, I think. {code:sql} SELECT four, ten/4 as two, sum(ten/4) over (partition by four order by ten/4 rows between unbounded preceding and current row), last(ten/4) over (partition by four order by ten/4 rows between unbounded preceding and current row) FROM (select distinct ten, four from tenk1) ss; {code} {code:sql} SELECT four, ten/4 as two, sum(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row), last(ten/4) over (partition by four order by ten/4 range between unbounded preceding and current row) FROM (select distinct ten, four from tenk1) ss; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29451) Some queries with divisions in SQL windows are failing in Thrift
[ https://issues.apache.org/jira/browse/SPARK-29451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-29451: - Summary: Some queries with divisions in SQL windows are failing in Thrift (was: Some queries with divisions in windows are failing in Thrift) > Some queries with divisions in SQL windows are failing in Thrift > - > > Key: SPARK-29451 > URL: https://issues.apache.org/jira/browse/SPARK-29451 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Hello, > the following queries are not working properly on Thrift. The only difference > between them and other queries that work fine is the numeric > divisions, I think. > {code:sql} > SELECT four, ten/4 as two, > sum(ten/4) over (partition by four order by ten/4 rows between unbounded > preceding and current row), > last(ten/4) over (partition by four order by ten/4 rows between unbounded > preceding and current row) > FROM (select distinct ten, four from tenk1) ss; > {code} > {code:sql} > SELECT four, ten/4 as two, > sum(ten/4) over (partition by four order by ten/4 range between unbounded > preceding and current row), > last(ten/4) over (partition by four order by ten/4 range between unbounded > preceding and current row) > FROM (select distinct ten, four from tenk1) ss; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29108) Add window.sql - Part 2
[ https://issues.apache.org/jira/browse/SPARK-29108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-29108: - Description: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562] (was: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319]) > Add window.sql - Part 2 > --- > > Key: SPARK-29108 > URL: https://issues.apache.org/jira/browse/SPARK-29108 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29109) Add window.sql - Part 3
[ https://issues.apache.org/jira/browse/SPARK-29109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-29109: - Description: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911] (was: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319]) > Add window.sql - Part 3 > --- > > Key: SPARK-29109 > URL: https://issues.apache.org/jira/browse/SPARK-29109 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29110) Add window.sql - Part 4
Dylan Guedes created SPARK-29110: Summary: Add window.sql - Part 4 Key: SPARK-29110 URL: https://issues.apache.org/jira/browse/SPARK-29110 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes Fix For: 3.0.0 In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29110) Add window.sql - Part 4
[ https://issues.apache.org/jira/browse/SPARK-29110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-29110: - Description: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259] (was: In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319]) > Add window.sql - Part 4 > --- > > Key: SPARK-29110 > URL: https://issues.apache.org/jira/browse/SPARK-29110 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L912-L1259] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29109) Add window.sql - Part 3
Dylan Guedes created SPARK-29109: Summary: Add window.sql - Part 3 Key: SPARK-29109 URL: https://issues.apache.org/jira/browse/SPARK-29109 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes Fix For: 3.0.0 In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L553-L911|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29108) Add window.sql - Part 2
Dylan Guedes created SPARK-29108: Summary: Add window.sql - Part 2 Key: SPARK-29108 URL: https://issues.apache.org/jira/browse/SPARK-29108 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes Fix For: 3.0.0 In this ticket, we plan to add the regression test cases of [https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L320-L562|https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29107) Add window.sql - Part 1
Dylan Guedes created SPARK-29107: Summary: Add window.sql - Part 1 Key: SPARK-29107 URL: https://issues.apache.org/jira/browse/SPARK-29107 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes Fix For: 3.0.0 In this ticket, we plan to add the regression test cases of https://github.com/postgres/postgres/blob/REL_12_BETA3/src/test/regress/sql/window.sql#L1-L319 -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28648) Adds support to `groups` unit type in window clauses
[ https://issues.apache.org/jira/browse/SPARK-28648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28648: - Summary: Adds support to `groups` unit type in window clauses (was: Adds support to `groups` in window clauses) > Adds support to `groups` unit type in window clauses > > > Key: SPARK-28648 > URL: https://issues.apache.org/jira/browse/SPARK-28648 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Spark currently supports the two most common window frame unit types: rows > and ranges. However, PgSQL added a new type: `groups`. > According to [this > source|https://blog.jooq.org/2018/07/05/postgresql-11s-support-for-sql-standard-groups-and-exclude-window-function-clauses/], > the difference is: > """ROWS counts the exact number of rows in the frame. > RANGE performs logical windowing where we don’t count the number of rows, but > look for a value offset. > GROUPS counts all groups of tied rows within the window.""" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28648) Adds support to `groups` in window clauses
Dylan Guedes created SPARK-28648: Summary: Adds support to `groups` in window clauses Key: SPARK-28648 URL: https://issues.apache.org/jira/browse/SPARK-28648 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Spark currently supports the two most common window frame unit types: rows and ranges. However, PgSQL added a new type: `groups`. According to [this source|https://blog.jooq.org/2018/07/05/postgresql-11s-support-for-sql-standard-groups-and-exclude-window-function-clauses/], the difference is: """ROWS counts the exact number of rows in the frame. RANGE performs logical windowing where we don’t count the number of rows, but look for a value offset. GROUPS counts all groups of tied rows within the window.""" -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
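To make the difference concrete, here is a GROUPS frame in the PostgreSQL 11 syntax (which Spark does not accept today); the query against `empsalary` is an illustrative sketch, not taken from the regression suite:

{code:sql}
-- PostgreSQL 11 syntax sketch: with ORDER BY salary, rows that tie on
-- salary form one peer group; the frame spans one group before and
-- one group after the current row's group.
SELECT depname, salary,
       sum(salary) OVER (ORDER BY salary
                         GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM empsalary;
{code}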
[jira] [Created] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function
Dylan Guedes created SPARK-28646: Summary: Allow usage of `count` only for parameterless aggregate function Key: SPARK-28646 URL: https://issues.apache.org/jira/browse/SPARK-28646 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark allows calls to `count` with no arguments, even though `count` is not a parameterless aggregate function. For example, the following query actually works: {code:sql}SELECT count() OVER () FROM tenk1;{code} In PgSQL, on the other hand, the following error is thrown: {code:sql}ERROR: count(*) must be used to call a parameterless aggregate function{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28645) Throw an error on window redefinition
[ https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28645: - Summary: Throw an error on window redefinition (was: Block redefinition of window) > Throw an error on window redefinition > - > > Key: SPARK-28645 > URL: https://issues.apache.org/jira/browse/SPARK-28645 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently in Spark, one can redefine a window. For instance: > {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w > AS (ORDER BY unique1);{code} > The window `w` is defined twice. In PgSQL, on the other hand, an error is > thrown: > {code:sql}ERROR: window "w" is already defined{code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28645) Block redefinition of window
Dylan Guedes created SPARK-28645: Summary: Block redefinition of window Key: SPARK-28645 URL: https://issues.apache.org/jira/browse/SPARK-28645 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently in Spark, one can redefine a window. For instance: {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w AS (ORDER BY unique1);{code} The window `w` is defined twice. In PgSQL, on the other hand, an error is thrown: {code:sql}ERROR: window "w" is already defined{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28602) Recognize interval as a numeric type
Dylan Guedes created SPARK-28602: Summary: Recognize interval as a numeric type Key: SPARK-28602 URL: https://issues.apache.org/jira/browse/SPARK-28602 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Hello, Spark does not recognize the `interval` type as a `numeric` one, which means we can't use `interval` columns in aggregate functions. For instance, the following query works on PgSQL but does not work on Spark: {code:sql}SELECT i,AVG(cast(v as interval)) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM (VALUES(1,'1 sec'),(2,'2 sec'),(3,NULL),(4,NULL)) t(i,v);{code} {code:sql}cannot resolve 'avg(CAST(`v` AS INTERVAL))' due to data type mismatch: function average requires numeric types, not interval; line 1 pos 9{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
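A possible workaround, sketched here without verification, is to sidestep the interval entirely and average a plain number of seconds, since Spark's `avg` requires numeric input:

{code:sql}
-- Untested sketch: represent the durations as seconds (1, 2) rather
-- than as intervals, so the numeric average resolves in Spark.
SELECT i,
       AVG(v) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM (VALUES (1, 1), (2, 2), (3, NULL), (4, NULL)) t(i, v);
{code}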
[jira] [Created] (SPARK-28566) window functions should not be allowed in window definitions
Dylan Guedes created SPARK-28566: Summary: window functions should not be allowed in window definitions Key: SPARK-28566 URL: https://issues.apache.org/jira/browse/SPARK-28566 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark allows the usage of window functions inside window definitions, such as: {code:sql} SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random()));{code} However, in PgSQL such behavior is not allowed: {code:sql} postgres=# SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random())); ERROR: window functions are not allowed in window definitions LINE 1: SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random())...{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28553) subqueries must be aggregated before hand
[ https://issues.apache.org/jira/browse/SPARK-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes resolved SPARK-28553. -- Resolution: Duplicate This is a duplicate of SPARK-28379. > subqueries must be aggregated before hand > - > > Key: SPARK-28553 > URL: https://issues.apache.org/jira/browse/SPARK-28553 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Looks like Spark subqueries do not work well with variables and values from > outside the subquery. For instance, this query works on PgSQL: > {code:sql} > SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER > (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10;{code} > However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28553) subqueries must be aggregated before hand
[ https://issues.apache.org/jira/browse/SPARK-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896064#comment-16896064 ] Dylan Guedes commented on SPARK-28553: -- Oh, my bad. I even looked for an already-created JIRA; I don't know why I didn't find the one you created (maybe I used the wrong terms). I'll edit my PR to use your JIRA instead and close this one. Thank you [~yumwang] > subqueries must be aggregated before hand > - > > Key: SPARK-28553 > URL: https://issues.apache.org/jira/browse/SPARK-28553 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Looks like Spark subqueries do not work well with variables and values from > outside the subquery. For instance, this query works on PgSQL: > {code:sql} > SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER > (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10;{code} > However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28553) subqueries must be aggregated before hand
[ https://issues.apache.org/jira/browse/SPARK-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28553: - Summary: subqueries must be aggregated before hand (was: subqueries always must be aggregated before hand) > subqueries must be aggregated before hand > - > > Key: SPARK-28553 > URL: https://issues.apache.org/jira/browse/SPARK-28553 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Looks like Spark subqueries do not work well with variables and values from > outside the subquery. For instance, this query works on PgSQL: > {code:sql} > SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER > (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10;{code} > However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28553) subqueries always must be aggregated before hand
[ https://issues.apache.org/jira/browse/SPARK-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28553: - Description: Looks like Spark subqueries do not work well with variables and values from outside the subquery. For instance, this query works on PgSQL: {code:sql} SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10;{code} However, it does not work in Spark. was: Looks like Spark subqueries do not work well with variables and values from outside the subquery. For instance, this query works on PgSQL: {code:sql} -- SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER (PARTITION BY four ORDER BY ten) -- FROM tenk1 s WHERE unique2 < 10;{code} However, it does not work in Spark. > subqueries always must be aggregated before hand > > > Key: SPARK-28553 > URL: https://issues.apache.org/jira/browse/SPARK-28553 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Looks like Spark subqueries do not work well with variables and values from > outside the subquery. For instance, this query works on PgSQL: > {code:sql} > SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER > (PARTITION BY four ORDER BY ten) FROM tenk1 s WHERE unique2 < 10;{code} > However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28553) subqueries always must be aggregated before hand
[ https://issues.apache.org/jira/browse/SPARK-28553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28553: - Summary: subqueries always must be aggregated before hand (was: subqueries always must be aggregated before) > subqueries always must be aggregated before hand > > > Key: SPARK-28553 > URL: https://issues.apache.org/jira/browse/SPARK-28553 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Looks like Spark subqueries do not work well with variables and values from > outside of the subquery. For instance, this query works on PgSQL: > {code:sql} > -- SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER > (PARTITION BY four ORDER BY ten) > -- FROM tenk1 s WHERE unique2 < 10;{code} > However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28553) subqueries always must be aggregated before
Dylan Guedes created SPARK-28553: Summary: subqueries always must be aggregated before Key: SPARK-28553 URL: https://issues.apache.org/jira/browse/SPARK-28553 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Looks like Spark subqueries do not work well with variables and values from outside of the subquery. For instance, this query works on PgSQL: {code:sql} -- SELECT lead(ten, (SELECT two FROM tenk1 WHERE s.unique2 = unique2)) OVER (PARTITION BY four ORDER BY ten) -- FROM tenk1 s WHERE unique2 < 10;{code} However, it does not work in Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28516) adds `to_char`
Dylan Guedes created SPARK-28516: Summary: adds `to_char` Key: SPARK-28516 URL: https://issues.apache.org/jira/browse/SPARK-28516 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark does not have support for `to_char`. PgSQL, however, [does|https://www.postgresql.org/docs/9.6/functions-formatting.html]: Query example: {code:sql} SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING),'9D9') {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28508) Support for range frame+row frame in the same query
Dylan Guedes created SPARK-28508: Summary: Support for range frame+row frame in the same query Key: SPARK-28508 URL: https://issues.apache.org/jira/browse/SPARK-28508 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, it looks like some queries do not work if both a range frame and a row frame are given. However, PgSQL is able to handle them: {code:sql} select last(salary) over(order by enroll_date range between 1 preceding and 1 following), lag(salary) over(order by enroll_date range between 1 preceding and 1 following), salary, enroll_date from empsalary; {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
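For reference, mixing a RANGE frame and a ROWS frame in one query also works in other engines; below is a minimal sketch using Python's bundled sqlite3 (SQLite 3.28+ is needed for RANGE with numeric offsets, and the table and values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE empsalary(salary INTEGER, enroll INTEGER)")
con.executemany("INSERT INTO empsalary VALUES (?, ?)",
                [(100, 1), (200, 2), (300, 4)])

# One query with both frame types, as in the PgSQL example above:
# RANGE groups peers by the enroll value, ROWS by physical neighbours.
rows = con.execute("""
    SELECT sum(salary) OVER (ORDER BY enroll
                             RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS by_range,
           sum(salary) OVER (ORDER BY enroll
                             ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS by_rows
    FROM empsalary ORDER BY enroll
""").fetchall()
```

For enroll values 1, 2, 4, the RANGE frame of the last row covers only itself (no enroll in [3, 5] besides 4), while its ROWS frame still reaches back one physical row.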
[jira] [Updated] (SPARK-28506) not handling usage of group function and window function at some conditions
[ https://issues.apache.org/jira/browse/SPARK-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28506: - Summary: not handling usage of group function and window function at some conditions (was: now handling usage of group function and window function at some conditions) > not handling usage of group function and window function at some conditions > --- > > Key: SPARK-28506 > URL: https://issues.apache.org/jira/browse/SPARK-28506 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Hi, > looks like SparkSQL is not able to handle this query: > {code:sql}SELECT rank() OVER (ORDER BY 1), count(*) FROM empsalary GROUP BY > 1;{code} > PgSQL, on the other hand, does. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28506) now handling usage of group function and window function at some conditions
Dylan Guedes created SPARK-28506: Summary: now handling usage of group function and window function at some conditions Key: SPARK-28506 URL: https://issues.apache.org/jira/browse/SPARK-28506 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Hi, looks like SparkSQL is not able to handle this query: {code:sql}SELECT rank() OVER (ORDER BY 1), count(*) FROM empsalary GROUP BY 1;{code} PgSQL, on the other hand, does. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20856) support statement using nested joins
[ https://issues.apache.org/jira/browse/SPARK-20856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-20856: - Description: While DB2, ORACLE etc support a join expressed as follows, SPARK SQL does not. Not supported {code:sql} select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum {code} versus written as shown {code:sql} select * from cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner join cert.tbint tbint on tint.rnum = tbint.rnum {code} {code:text} ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5) == SQL == select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum -^^^ , Query: select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum. SQLState: HY000 ErrorCode: 500051 {code} was: While DB2, ORACLE etc support a join expressed as follows, SPARK SQL does not. 
Not supported select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum versus written as shown select * from cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner join cert.tbint tbint on tint.rnum = tbint.rnum ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5) == SQL == select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum -^^^ , Query: select * from cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint on tbint.rnum = tint.rnum on tint.rnum = tsint.rnum. SQLState: HY000 ErrorCode: 500051 > support statement using nested joins > > > Key: SPARK-20856 > URL: https://issues.apache.org/jira/browse/SPARK-20856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: N Campbell >Priority: Major > Labels: bulk-closed > > While DB2, ORACLE etc support a join expressed as follows, SPARK SQL does > not. 
> Not supported > {code:sql} > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > {code} > versus written as shown > {code:sql} > select * from > cert.tsint tsint inner join cert.tint tint on tsint.rnum = tint.rnum inner > join cert.tbint tbint on tint.rnum = tbint.rnum > {code} > {code:text} > ERROR_STATE, SQL state: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'on' expecting {, ',', '.', '[', 'WHERE', 'GROUP', > 'ORDER', 'HAVING', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, > 'IS', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', EQ, '<=>', > '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', > '^', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 4, pos 5) > == SQL == > select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum > -^^^ > , Query: select * from > cert.tsint tsint inner join cert.tint tint inner join cert.tbint tbint > on tbint.rnum = tint.rnum > on tint.rnum = tsint.rnum. > SQLState: HY000 > ErrorCode: 500051 > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28500) adds support for `filter` clause
[ https://issues.apache.org/jira/browse/SPARK-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891833#comment-16891833 ] Dylan Guedes commented on SPARK-28500: -- Hmm, you are correct, there's some overlap there. However, at some point the JIRA will be fragmented into smaller ones, like one for filter, another for distinct, right? > adds support for `filter` clause > > > Key: SPARK-28500 > URL: https://issues.apache.org/jira/browse/SPARK-28500 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Definition: "The {{filter}} clause extends aggregate functions ({{sum}}, > {{avg}}, {{count}}, …) by an additional {{where}} clause. The result of the > aggregate is built from only the rows that satisfy the additional {{where}} > clause too." [source|https://modern-sql.com/feature/filter] > Also, PgSQL currently supports `filter` while Spark doesn't. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28501) frame bound must be a literal
Dylan Guedes created SPARK-28501: Summary: frame bound must be a literal Key: SPARK-28501 URL: https://issues.apache.org/jira/browse/SPARK-28501 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Spark window frame bounds currently only support literals, so this query (accepted by PgSQL) fails: {code:sql} SELECT sum(unique1) over (order by unique1 rows (SELECT unique1 FROM tenk1 ORDER BY unique1 LIMIT 1) + 1 PRECEDING), unique1 FROM tenk1 WHERE unique1 < 10;{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28065) ntile only accepting positive (>0) values
[ https://issues.apache.org/jira/browse/SPARK-28065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28065: - Description: Currently, Spark does not accept null or zero as an input for `ntile`; Postgres, however, supports both. Example: {code:sql} SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; {code} was: Currently, Spark does not accept null as an input for `ntile`, however Postgres supports it. Example: {code:sql} SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; {code} > ntile only accepting positive (>0) values > - > > Key: SPARK-28065 > URL: https://issues.apache.org/jira/browse/SPARK-28065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not accept null or zero as an input for `ntile`; > Postgres, however, supports both. > Example: > {code:sql} > SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28065) ntile only accepting posivile (>0) values
[ https://issues.apache.org/jira/browse/SPARK-28065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28065: - Summary: ntile only accepting posivile (>0) values (was: ntile does not accept NULL as input) > ntile only accepting posivile (>0) values > - > > Key: SPARK-28065 > URL: https://issues.apache.org/jira/browse/SPARK-28065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not accept null as an input for `ntile`, however > Postgres supports it. > Example: > {code:sql} > SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28065) ntile only accepting positive (>0) values
[ https://issues.apache.org/jira/browse/SPARK-28065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-28065: - Summary: ntile only accepting positive (>0) values (was: ntile only accepting posivile (>0) values) > ntile only accepting positive (>0) values > - > > Key: SPARK-28065 > URL: https://issues.apache.org/jira/browse/SPARK-28065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not accept null as an input for `ntile`, however > Postgres supports it. > Example: > {code:sql} > SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28500) adds support for `filter` clause
Dylan Guedes created SPARK-28500: Summary: adds support for `filter` clause Key: SPARK-28500 URL: https://issues.apache.org/jira/browse/SPARK-28500 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Definition: "The {{filter}} clause extends aggregate functions ({{sum}}, {{avg}}, {{count}}, …) by an additional {{where}} clause. The result of the aggregate is built from only the rows that satisfy the additional {{where}} clause too." [source|https://modern-sql.com/feature/filter] Also, PgSQL currently supports `filter` while Spark doesn't. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
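Until native support lands, the usual workaround is rewriting the `filter` clause as a conditional aggregate, since `sum()` skips NULLs. A small sketch using Python's sqlite3 (table and values invented for the example) — `sum(v) FILTER (WHERE v % 2 = 0)` is equivalent to:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(v INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,), (4,)])

# The CASE expression yields NULL for non-matching rows,
# and sum() ignores NULLs, so only even values are summed.
total, evens = con.execute(
    "SELECT sum(v), sum(CASE WHEN v % 2 = 0 THEN v END) FROM t"
).fetchone()
# total sums all rows; evens sums only the filtered subset
```

The same CASE rewrite works in SparkSQL today.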
[jira] [Created] (SPARK-28429) SQL Datetime util function being casted to double instead of timestamp
Dylan Guedes created SPARK-28429: Summary: SQL Datetime util function being casted to double instead of timestamp Key: SPARK-28429 URL: https://issues.apache.org/jira/browse/SPARK-28429 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes In the code below, `now()+'100 days'` is cast to double and then an error is thrown: {code:sql} CREATE TEMP VIEW v_window AS SELECT i, min(i) over (order by i range between '1 day' preceding and '10 days' following) as min_i FROM range(now(), now()+'100 days', '1 hour') i; {code} Error: {code:text} cannot resolve '(current_timestamp() + CAST('100 days' AS DOUBLE))' due to data type mismatch: differing types in '(current_timestamp() + CAST('100 days' AS DOUBLE))' (timestamp and double).;{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
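The intent of `now() + '100 days'` is interval arithmetic on timestamps, not numeric addition; a sketch of the expected semantics in plain Python (the start date is invented for illustration):

```python
from datetime import datetime, timedelta

start = datetime(2019, 7, 17)
stop = start + timedelta(days=100)   # what now() + '100 days' should mean
step = timedelta(hours=1)

# Number of one-hour steps in the range, analogous to
# range(now(), now() + '100 days', '1 hour') in the query above.
n_steps = int((stop - start) / step)
# 100 days of hourly steps -> 2400
```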
[jira] [Created] (SPARK-28428) Spark `exclude` always expecting `()`
Dylan Guedes created SPARK-28428: Summary: Spark `exclude` always expecting `()` Key: SPARK-28428 URL: https://issues.apache.org/jira/browse/SPARK-28428 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Dylan Guedes SparkSQL's `exclude` always expects to be followed by `()`; PgSQL's `exclude` does not. Example: {code:sql} SELECT sum(unique1) over (rows between 2 preceding and 2 following exclude no others), unique1, four FROM tenk1 WHERE unique1 < 10; {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
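For comparison, SQLite (3.28+) follows the PgSQL grammar here and takes the `exclude` keywords with no parentheses; a small sketch via Python's sqlite3 (values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(v INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# EXCLUDE CURRENT ROW drops the current row from each frame; no `()` needed.
rows = con.execute("""
    SELECT sum(v) OVER (ORDER BY v
                        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                        EXCLUDE CURRENT ROW)
    FROM t ORDER BY v
""").fetchall()
# Each row sums its neighbours only: {2}, {1,3}, {2}
```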
[jira] [Commented] (SPARK-28086) Adds `random()` sql function
[ https://issues.apache.org/jira/browse/SPARK-28086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886382#comment-16886382 ] Dylan Guedes commented on SPARK-28086: -- Well, to be fair I created the JIRA because `rand()` looks like a number generator, while `random()` (available in PgSQL) seems like a "pick any available value". For instance: you may use `order by random()` in PgSQL; however, in Spark `order by rand()` is not valid. But I'm probably wrong: maybe it is related to PgSQL's `order by` accepting literal values while Spark's does not. > Adds `random()` sql function > > > Key: SPARK-28086 > URL: https://issues.apache.org/jira/browse/SPARK-28086 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark does not have a `random()` function. Postgres, however, does. > For instance, this one is not valid: > {code:sql} > SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random())) > {code} > Because of the `random()` call. On the other hand, [Postgres has > it.|https://www.postgresql.org/docs/8.2/functions-math.html] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27767) Built-in function: generate_series
[ https://issues.apache.org/jira/browse/SPARK-27767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865912#comment-16865912 ] Dylan Guedes edited comment on SPARK-27767 at 6/17/19 8:02 PM: --- [~smilegator] by the way, I just checked and there is a (minor) difference: when you use `range()` and define it as a sub-query called `x`, for instance, the default name for the column becomes `x.id` instead of just `x`, which is the behaviour in Postgres. For instance: {code:sql} from range(-32766, -32764) x; {code} In Spark, it looks like you should refer to these values as `x.id`. Meanwhile, in Postgres you can refer to them as just `x`. EDIT: Btw, this call also does not work: {code:sql} SELECT range(1, 100) OVER () FROM empsalary {code} was (Author: dylanguedes): [~smilegator] by the way, I just checked and there is a (minor) difference: when you use `range()` and define it as a sub-query called `x`, for instance, the default name for the column became `x.id`, instead of just `x`, that is the behaviour in Postgres. For instance: {code:sql} from range(-32766, -32764) x; {code} In Spark, looks like you should reference to these values as `x.id`. Meanwhile, in Postgres you can call them through just `x`. > Built-in function: generate_series > -- > > Key: SPARK-27767 > URL: https://issues.apache.org/jira/browse/SPARK-27767 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > [https://www.postgresql.org/docs/9.1/functions-srf.html] > generate_series(start, stop): Generate a series of values, from start to stop > with a step size of one > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
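The Postgres semantics being compared against can be sketched in a few lines of Python — note that, unlike Spark's `range(start, end)`, `generate_series` is inclusive of the stop value (the function below is an illustration mirroring the Postgres name, not Spark API):

```python
def generate_series(start, stop, step=1):
    """Inclusive series, like Postgres generate_series(start, stop, step)."""
    out, v = [], start
    while (step > 0 and v <= stop) or (step < 0 and v >= stop):
        out.append(v)
        v += step
    return out

# The example from the comment: three values, stop included.
generate_series(-32766, -32764)
```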
[jira] [Commented] (SPARK-27767) Built-in function: generate_series
[ https://issues.apache.org/jira/browse/SPARK-27767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865912#comment-16865912 ] Dylan Guedes commented on SPARK-27767: -- [~smilegator] by the way, I just checked and there is a (minor) difference: when you use `range()` and define it as a sub-query called `x`, for instance, the default name for the column becomes `x.id` instead of just `x`, which is the behaviour in Postgres. For instance: {code:sql} from range(-32766, -32764) x; {code} In Spark, it looks like you should refer to these values as `x.id`. Meanwhile, in Postgres you can refer to them as just `x`. > Built-in function: generate_series > -- > > Key: SPARK-27767 > URL: https://issues.apache.org/jira/browse/SPARK-27767 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > [https://www.postgresql.org/docs/9.1/functions-srf.html] > generate_series(start, stop): Generate a series of values, from start to stop > with a step size of one > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28086) Adds `random()` sql function
Dylan Guedes created SPARK-28086: Summary: Adds `random()` sql function Key: SPARK-28086 URL: https://issues.apache.org/jira/browse/SPARK-28086 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark does not have a `random()` function. Postgres, however, does. For instance, this one is not valid: {code:sql} SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY random())) {code} Because of the `random()` call. On the other hand, [Postgres has it.|https://www.postgresql.org/docs/8.2/functions-math.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28068) `lag` second argument must be a literal
Dylan Guedes created SPARK-28068: Summary: `lag` second argument must be a literal Key: SPARK-28068 URL: https://issues.apache.org/jira/browse/SPARK-28068 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently in Spark, `lag` (and, possibly, some other window functions) requires the 2nd argument to be a literal. For example, this is not allowed: {code:sql} SELECT lag(ten, four) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; {code} However, this one works: {code:sql} SELECT lag(ten, 2) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; {code} In comparison, Postgres also accepts a non-literal (such as a column reference) as the 2nd argument. I found this issue while porting `window.sql` tests from Postgres to Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
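What the rejected query asks for — an offset read from another column, evaluated per row — can be sketched in plain Python (function name and values are invented for illustration):

```python
def lag_with_column_offset(values, offsets, default=None):
    """lag(values, offsets[i]) evaluated row by row within one partition."""
    result = []
    for i, off in enumerate(offsets):
        j = i - off
        # lag only looks backwards; out-of-range rows yield the default (NULL)
        result.append(values[j] if 0 <= j <= i else default)
    return result

# Each row looks back by its own offset, not a single shared literal.
lag_with_column_offset([10, 20, 30, 40], [1, 1, 2, 3])
```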
[jira] [Created] (SPARK-28065) ntile does not accept NULL as input
Dylan Guedes created SPARK-28065: Summary: ntile does not accept NULL as input Key: SPARK-28065 URL: https://issues.apache.org/jira/browse/SPARK-28065 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently, Spark does not accept null as an input for `ntile`, however Postgres supports it. Example: {code:sql} SELECT ntile(NULL) OVER (ORDER BY ten, four), ten, four FROM tenk1 LIMIT 2; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28064) Order by does not accept a call to rank()
Dylan Guedes created SPARK-28064: Summary: Order by does not accept a call to rank() Key: SPARK-28064 URL: https://issues.apache.org/jira/browse/SPARK-28064 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.0.0 Reporter: Dylan Guedes Currently in Spark, we can't use a call to `rank()` in an order by; we need to first alias the rank column as, for instance, `r` and then use `order by r`. For example: This does not work: {code:sql} SELECT depname, empno, salary, rank() OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY rank() OVER w; {code} However, this one does: {code:sql} SELECT depname, empno, salary, rank() OVER w as r FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary) ORDER BY r; {code} By the way, I took this one from Postgres behavior: Postgres accepts both ways. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
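The alias workaround also runs on other engines; a quick check with Python's sqlite3 (SQLite 3.25+ for the WINDOW clause; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE empsalary(depname TEXT, salary INTEGER)")
con.executemany("INSERT INTO empsalary VALUES (?, ?)",
                [("dev", 5000), ("dev", 4000), ("sales", 3000)])

# Alias the window function in the SELECT list, then order by the alias.
rows = con.execute("""
    SELECT depname, salary, rank() OVER w AS r
    FROM empsalary
    WINDOW w AS (PARTITION BY depname ORDER BY salary)
    ORDER BY r, depname
""").fetchall()
# Both per-department rank-1 rows come first, then the rank-2 row.
```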
[jira] [Commented] (SPARK-23160) Add window.sql
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864692#comment-16864692 ] Dylan Guedes commented on SPARK-23160: -- Thank you! I'll be working on this, then. > Add window.sql > -- > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Minor > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23098) Migrate Kafka batch source to v2
[ https://issues.apache.org/jira/browse/SPARK-23098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835693#comment-16835693 ] Dylan Guedes commented on SPARK-23098: -- [~gsomogyi] No, I didn't pick this one. Anyway, happy to see you're interested in it! > Migrate Kafka batch source to v2 > > > Key: SPARK-23098 > URL: https://issues.apache.org/jira/browse/SPARK-23098 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Jose Torres >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27138) Remove AdminUtils calls
[ https://issues.apache.org/jira/browse/SPARK-27138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Guedes updated SPARK-27138: - Description: KafkaTestUtils (from kafka010) currently uses AdminUtils to create and delete topics for test suites (which is currently deprecated). Since it will stop working at some point, I think that it is a good opportunity to change the API calls. (was: KafkaTestUtils (from kafka010) currently uses AdminUtils to create and delete topics for test suites (what is currently deprecated). Since it will stop to work at some point, I think that it is a good opportunity.) > Remove AdminUtils calls > --- > > Key: SPARK-27138 > URL: https://issues.apache.org/jira/browse/SPARK-27138 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.4.0 >Reporter: Dylan Guedes >Priority: Minor > > KafkaTestUtils (from kafka010) currently uses AdminUtils to create and delete > topics for test suites (which is currently deprecated). Since it will stop > working at some point, I think that it is a good opportunity to change the API > calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27138) Remove AdminUtils calls
Dylan Guedes created SPARK-27138: Summary: Remove AdminUtils calls Key: SPARK-27138 URL: https://issues.apache.org/jira/browse/SPARK-27138 Project: Spark Issue Type: Task Components: Tests Affects Versions: 2.4.0 Reporter: Dylan Guedes KafkaTestUtils (from kafka010) currently uses AdminUtils to create and delete topics for test suites (which is currently deprecated). Since it will stop working at some point, I think that it is a good opportunity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23098) Migrate Kafka batch source to v2
[ https://issues.apache.org/jira/browse/SPARK-23098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790585#comment-16790585 ] Dylan Guedes commented on SPARK-23098: -- Hi, I would like to work on this one. [~joseph.torres] would you mind helping me with a few suggestions if I get really stuck? Also, is this one similar to the CSVReader/JSONReader? > Migrate Kafka batch source to v2 > > > Key: SPARK-23098 > URL: https://issues.apache.org/jira/browse/SPARK-23098 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Jose Torres >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23160) Add more window sql tests
[ https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789877#comment-16789877 ] Dylan Guedes commented on SPARK-23160: -- Hi, I would like to work on this one, but to be fair I didn't get the meaning of "tests in other major databases". [~jiangxb1987] do you remember what scenarios you had in mind? > Add more window sql tests > - > > Key: SPARK-23160 > URL: https://issues.apache.org/jira/browse/SPARK-23160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xingbo Jiang >Priority: Minor > > We should also cover the window sql interface, example in > `sql/core/src/test/resources/sql-tests/inputs/window.sql`, it should also be > funny to see whether we can generate consistent results for window tests in > other major databases. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472514#comment-16472514 ] Dylan Guedes commented on SPARK-23931: -- [~mn-mikke] I updated with a working version! Would you mind giving feedback/suggestions? I've decided to use an array of structs since Java doesn't handle Scala Tuple2s well, but to be fair I'm not sure it is the best choice. > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
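The Presto semantics quoted in the ticket — element-wise merge with NULL padding for uneven lengths — match Python's `itertools.zip_longest`, which makes a handy reference point for the expected output:

```python
from itertools import zip_longest

# Uneven lengths: missing positions are filled with None (SQL NULL),
# mirroring zip(ARRAY[1, 2], ARRAY['1b', null, '3b']) in the ticket.
list(zip_longest([1, 2], ["1b", None, "3b"]))
```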
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472161#comment-16472161 ] Dylan Guedes commented on SPARK-23931: -- Hi Marek! I finally made some progress; I think a few more hours and I can complete this. Thank you! > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436184#comment-16436184 ] Dylan Guedes commented on SPARK-23931: -- I'm having some trouble so I asked for help in the PR - suggestions/feedback are welcome. > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23931) High-order function: zip(array1, array2[, ...]) → array
[ https://issues.apache.org/jira/browse/SPARK-23931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432850#comment-16432850 ] Dylan Guedes commented on SPARK-23931: -- I would like to try this one. > High-order function: zip(array1, array2[, ...]) → array > > > Key: SPARK-23931 > URL: https://issues.apache.org/jira/browse/SPARK-23931 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Priority: Major > > Ref: https://prestodb.io/docs/current/functions/array.html > Merges the given arrays, element-wise, into a single array of rows. The M-th > element of the N-th argument will be the N-th field of the M-th output > element. If the arguments have an uneven length, missing values are filled > with NULL. > {noformat} > SELECT zip(ARRAY[1, 2], ARRAY['1b', null, '3b']); -- [ROW(1, '1b'), ROW(2, > null), ROW(null, '3b')] > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20169) Groupby Bug with Sparksql
[ https://issues.apache.org/jira/browse/SPARK-20169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401029#comment-16401029 ] Dylan Guedes commented on SPARK-20169: -- Hi, I also reproduced it in v2.3 and master. I think it is related to the String type: if I cast the jr dataframe column to long it works fine; however, if I cast it to String, the bug still happens. I don't know the catalyst codebase well (never touched it, actually). Do you have a suggestion on where to start looking after I call _jdf? I don't know how to follow the trace after converting to the JVM. Thank you!
> Groupby Bug with Sparksql
> -
>
> Key: SPARK-20169
> URL: https://issues.apache.org/jira/browse/SPARK-20169
> Project: Spark
> Issue Type: Bug
> Components: SQL
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Bin Wu
>Priority: Major
>
> We find a potential bug in the Catalyst optimizer which cannot correctly process "groupby". You can reproduce it with the following simple example:
> =
> from pyspark.sql.functions import *
> #e=sc.parallelize([(1,2),(1,3),(1,4),(2,1),(3,1),(4,1)]).toDF(["src","dst"])
> e = spark.read.csv("graph.csv", header=True)
> r = sc.parallelize([(1,),(2,),(3,),(4,)]).toDF(['src'])
> r1 = e.join(r, 'src').groupBy('dst').count().withColumnRenamed('dst','src')
> jr = e.join(r1, 'src')
> jr.show()
> r2 = jr.groupBy('dst').count()
> r2.show()
> =
> FYI, "graph.csv" contains exactly the same data as the commented line.
> You can find that jr is:
> |src|dst|count|
> | 3| 1|1|
> | 1| 4|3|
> | 1| 3|3|
> | 1| 2|3|
> | 4| 1|1|
> | 2| 1|1|
> But, after the last groupBy, the 3 rows with dst = 1 are not grouped together:
> |dst|count|
> | 1|1|
> | 4|1|
> | 3|1|
> | 2|1|
> | 1|1|
> | 1|1|
> If we build jr directly from raw data (commented line), this error will not show up. So we suspect that there is a bug in the Catalyst optimizer when multiple joins and groupBy's are being optimized.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
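[Editor's note: to make the expected result concrete, here is a plain-Python sketch, hypothetical and outside Spark, of what `groupBy('dst').count()` should produce from the jr rows in the report: the three rows with dst = 1 collapse into a single group of count 3.]

```python
from collections import Counter

# The jr rows from the report, as (src, dst, count) tuples.
jr = [(3, 1, 1), (1, 4, 3), (1, 3, 3), (1, 2, 3), (4, 1, 1), (2, 1, 1)]

# groupBy('dst').count() counts the rows sharing each dst value.
r2 = Counter(dst for _src, dst, _count in jr)
print(sorted(r2.items()))
# -> [(1, 3), (2, 1), (3, 1), (4, 1)]
```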
[jira] [Created] (SPARK-23648) extend hint syntax to support any expression for R
Dylan Guedes created SPARK-23648: Summary: extend hint syntax to support any expression for R Key: SPARK-23648 URL: https://issues.apache.org/jira/browse/SPARK-23648 Project: Spark Issue Type: Sub-task Components: SparkR, SQL Affects Versions: 2.3.0, 2.2.0 Reporter: Dylan Guedes Relax checks in [https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23647) extend hint syntax to support any expression for Python
Dylan Guedes created SPARK-23647: Summary: extend hint syntax to support any expression for Python Key: SPARK-23647 URL: https://issues.apache.org/jira/browse/SPARK-23647 Project: Spark Issue Type: Sub-task Components: PySpark, SQL Affects Versions: 2.2.0 Reporter: Dylan Guedes Relax checks in [https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21030) extend hint syntax to support any expression for Python and R
[ https://issues.apache.org/jira/browse/SPARK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392049#comment-16392049 ] Dylan Guedes commented on SPARK-21030: -- So, I started, and here is my progress: [https://github.com/DylanGuedes/spark/commit/433c622ae987f2b6e2a9a5bc97a0addc0d938d4b] Could anyone give me feedback/hints before I open the PR? > extend hint syntax to support any expression for Python and R > - > > Key: SPARK-21030 > URL: https://issues.apache.org/jira/browse/SPARK-21030 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Major > > See SPARK-20854 > we need to relax checks in > https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422 > and > https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
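[Editor's note: the kind of relaxation being discussed can be sketched as follows. This is a hypothetical standalone check, not the actual `dataframe.py` code, which at the time accepted string parameters only.]

```python
def check_hint_parameters(parameters):
    # Relaxed check: besides str, numeric literals (and lists of
    # primitives) are accepted as hint parameters.
    allowed = (str, int, float, list)
    for p in parameters:
        if not isinstance(p, allowed):
            raise TypeError(
                "all parameters should be str, int, float or list, got %r" % (p,))
    return list(parameters)

print(check_hint_parameters(["broadcast", 10, 2.5]))
# -> ['broadcast', 10, 2.5]
```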
[jira] [Comment Edited] (SPARK-21030) extend hint syntax to support any expression for Python and R
[ https://issues.apache.org/jira/browse/SPARK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392049#comment-16392049 ] Dylan Guedes edited comment on SPARK-21030 at 3/8/18 10:50 PM: --- So, I started, and here is my progress: [https://github.com/DylanGuedes/spark/commit/433c622ae987f2b6e2a9a5bc97a0addc0d938d4b] Could anyone give me a feedback/hints before I open the PR? I'm not sure if my approach is correct. was (Author: dylanguedes): So, I started, and here is my progress: [https://github.com/DylanGuedes/spark/commit/433c622ae987f2b6e2a9a5bc97a0addc0d938d4b] Could anyone give me a feedback/hints before I open the PR? > extend hint syntax to support any expression for Python and R > - > > Key: SPARK-21030 > URL: https://issues.apache.org/jira/browse/SPARK-21030 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Major > > See SPARK-20854 > we need to relax checks in > https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422 > and > https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21030) extend hint syntax to support any expression for Python and R
[ https://issues.apache.org/jira/browse/SPARK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390321#comment-16390321 ] Dylan Guedes edited comment on SPARK-21030 at 3/7/18 10:11 PM: --- Hi, I would like to try this one (for Python). Do you guys think that this is a good one for a newcomer? Thank you! was (Author: dylanguedes): Hi, I would like to try this one. Do you guys think that this is a good one for a newcomer? Thank you! > extend hint syntax to support any expression for Python and R > - > > Key: SPARK-21030 > URL: https://issues.apache.org/jira/browse/SPARK-21030 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Major > > See SPARK-20854 > we need to relax checks in > https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422 > and > https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21030) extend hint syntax to support any expression for Python and R
[ https://issues.apache.org/jira/browse/SPARK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390321#comment-16390321 ] Dylan Guedes edited comment on SPARK-21030 at 3/7/18 10:11 PM: --- Hi, I would like to try this one (in Python). Do you guys think that this is a good one for a newcomer? Thank you! was (Author: dylanguedes): Hi, I would like to try this one (for Python). Do you guys think that this is a good one for a newcomer? Thank you! > extend hint syntax to support any expression for Python and R > - > > Key: SPARK-21030 > URL: https://issues.apache.org/jira/browse/SPARK-21030 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Major > > See SPARK-20854 > we need to relax checks in > https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422 > and > https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21030) extend hint syntax to support any expression for Python and R
[ https://issues.apache.org/jira/browse/SPARK-21030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390321#comment-16390321 ] Dylan Guedes commented on SPARK-21030: -- Hi, I would like to try this one. Do you guys think that this is a good one for a newcomer? Thank you! > extend hint syntax to support any expression for Python and R > - > > Key: SPARK-21030 > URL: https://issues.apache.org/jira/browse/SPARK-21030 > Project: Spark > Issue Type: Improvement > Components: PySpark, SparkR, SQL >Affects Versions: 2.2.0 >Reporter: Felix Cheung >Priority: Major > > See SPARK-20854 > we need to relax checks in > https://github.com/apache/spark/blob/6cbc61d1070584ffbc34b1f53df352c9162f414a/python/pyspark/sql/dataframe.py#L422 > and > https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23595) Add interpreted execution for ValidateExternalType expression
[ https://issues.apache.org/jira/browse/SPARK-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388136#comment-16388136 ] Dylan Guedes commented on SPARK-23595: -- [~maropu] I checked your progress, and it looks like you are almost finished, so it is fine. Anyway, your solution was very enlightening, thank you! > Add interpreted execution for ValidateExternalType expression > - > > Key: SPARK-23595 > URL: https://issues.apache.org/jira/browse/SPARK-23595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23595) Add interpreted execution for ValidateExternalType expression
[ https://issues.apache.org/jira/browse/SPARK-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387853#comment-16387853 ] Dylan Guedes commented on SPARK-23595: -- Hi, I would like to help with this issue, but since I am a newcomer I am not sure if it is a good way to start (maybe it is too hard and I don't want to be a bottleneck). I started reading the code of the related issues; is this one similar? What do you guys think? Thank you! > Add interpreted execution for ValidateExternalType expression > - > > Key: SPARK-23595 > URL: https://issues.apache.org/jira/browse/SPARK-23595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23595) Add interpreted execution for ValidateExternalType expression
[ https://issues.apache.org/jira/browse/SPARK-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387853#comment-16387853 ] Dylan Guedes edited comment on SPARK-23595 at 3/6/18 2:24 PM: -- Hi, I would like to help with this issue, but since I am a newcomer I am not sure if it is a good way to start (maybe it is too hard and I don't want to be a bottleneck). I started reading code of the related issues, is this one similar? What do you guys think? Thank you! was (Author: dylanguedes): Hi, I would like to help with this issue, but since I am a newcomer I am not sure if it is a good way to start (maybe it is too hard and I don't want to be a bottleneck). I started reading code of the related issues, it is similar? What do you guys think? Thank you! > Add interpreted execution for ValidateExternalType expression > - > > Key: SPARK-23595 > URL: https://issues.apache.org/jira/browse/SPARK-23595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
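[Editor's note: as a rough illustration of what "interpreted execution" means for an expression like ValidateExternalType, here is a hypothetical Python sketch. Instead of generating and compiling code, the check runs directly on the incoming object; the real implementation is Scala code inside Catalyst, and this helper name is invented.]

```python
def validate_external_type(value, expected_type):
    # Interpreted-mode check: inspect the runtime type of the external
    # object directly instead of emitting generated code for the test.
    if value is not None and not isinstance(value, expected_type):
        raise TypeError("%r is not a valid external type for %s"
                        % (value, expected_type.__name__))
    return value

print(validate_external_type(3, int))  # -> 3
validate_external_type(None, int)      # nulls pass through unchecked
```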