[jira] [Created] (SPARK-36082) when the right side is small enough to use SingleColumn Null Aware Anti Join

2021-07-10 Thread mcdull_zhang (Jira)
mcdull_zhang created SPARK-36082:


 Summary: when the right side is small enough to use SingleColumn 
Null Aware Anti Join
 Key: SPARK-36082
 URL: https://issues.apache.org/jira/browse/SPARK-36082
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0, 3.1.3
Reporter: mcdull_zhang
 Fix For: 3.2.0


NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) builds 
the right side into a HashMap.

Code in SparkStrategy:

 
{code:java}
case j @ ExtractSingleColumnNullAwareAntiJoin(leftKeys, rightKeys) =>
  Seq(joins.BroadcastHashJoinExec(leftKeys, rightKeys, LeftAnti, BuildRight,
    None, planLater(j.left), planLater(j.right), isNullAwareAntiJoin = true))
{code}
We should add a condition so that this optimization is used only when the 
right side is small enough.
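A minimal sketch of the idea, assuming plain Java collections in place of Spark's physical operators: the right side is built into a hash set and probed with SQL NOT IN semantics, and a size guard decides whether that path is taken at all. The class, method, and parameter names (including the byte threshold) are hypothetical and are not Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Spark.
class NullAwareAntiJoinSketch {

    // Evaluates `leftKey NOT IN (rightKeys)` with SQL NULL semantics via a hash set.
    static List<Integer> notIn(List<Integer> left, List<Integer> right) {
        if (right.isEmpty()) {
            // NOT IN over an empty set is true for every left row, NULL keys included.
            return new ArrayList<>(left);
        }
        Set<Integer> built = new HashSet<>();
        for (Integer key : right) {
            if (key == null) {
                // A NULL on the right makes NOT IN unknown for every left row.
                return new ArrayList<>();
            }
            built.add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (Integer key : left) {
            // A NULL left key compares as unknown, so that row is dropped.
            if (key != null && !built.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    // Assumed planner-style guard: take the hash path only for a small right side.
    static boolean isSmallEnough(long estimatedRightSizeBytes, long thresholdBytes) {
        return estimatedRightSizeBytes >= 0 && estimatedRightSizeBytes <= thresholdBytes;
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, 2, null, 4);
        List<Integer> right = Arrays.asList(2, 3);
        long assumedRightSizeBytes = 8L * right.size();  // made-up size estimate
        long assumedThresholdBytes = 10L * 1024 * 1024;  // made-up threshold
        if (isSmallEnough(assumedRightSizeBytes, assumedThresholdBytes)) {
            System.out.println(notIn(left, right));
        }
    }
}
```

The guard is kept separate from the join logic so that the threshold can come from whatever configuration the planner already uses for deciding what is small enough to build in memory.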



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378522#comment-17378522
 ] 

Apache Spark commented on SPARK-36081:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33287

> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Assigned] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36081:


Assignee: Kousuke Saruta  (was: Apache Spark)

> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Updated] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36081:
---
Description: 
sql-migration-guide.md mentions about the behavior of cast like as follows.
{code}
In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
int and bigint), datetime types(date, timestamp and interval) and boolean type, 
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before 
converted to these type values, for example, `cast(' 1\t' as int)` results `1`, 
`cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as date)` results 
the date value `2019-10-10`. In Spark version 2.4 and below, when casting 
string to integrals and booleans, it does not trim the whitespaces from both 
ends; the foregoing results is `null`, while to datetimes, only the trailing 
spaces (= ASCII 32) are removed.
{code}

In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 in Spark 
3.0.0. 
But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
the specification.

  was:
sql-migration-guide.md mentions about the behavior of cast like as follows.
{code}
In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
int and bigint), datetime types(date, timestamp and interval) and boolean type, 
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before 
converted to these type values, for example, `cast(' 1\t' as int)` results `1`, 
`cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as date)` results 
the date value `2019-10-10`. In Spark version 2.4 and below, when casting 
string to integrals and booleans, it does not trim the whitespaces from both 
ends; the foregoing results is `null`, while to datetimes, only the trailing 
spaces (= ASCII 32) are removed.
{code}

In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
3.0.0. 
But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
the specification.

The root cause seems to be 


> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 in Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.






[jira] [Commented] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378521#comment-17378521
 ] 

Apache Spark commented on SPARK-36081:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33287

> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Assigned] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36081:


Assignee: Apache Spark  (was: Kousuke Saruta)

> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Updated] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36081:
---
Description: 
sql-migration-guide.md mentions about the behavior of cast like as follows.
{code}
In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
int and bigint), datetime types(date, timestamp and interval) and boolean type, 
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before 
converted to these type values, for example, `cast(' 1\t' as int)` results `1`, 
`cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as date)` results 
the date value `2019-10-10`. In Spark version 2.4 and below, when casting 
string to integrals and booleans, it does not trim the whitespaces from both 
ends; the foregoing results is `null`, while to datetimes, only the trailing 
spaces (= ASCII 32) are removed.
{code}

In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
3.0.0. 
But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
the specification.

The root cause seems to be 

  was:
sql-migration-guide.md mentions about the behavior of cast like as follows.
{code}
In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
int and bigint), datetime types(date, timestamp and interval) and boolean type, 
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before 
converted to these type values, for example, `cast(' 1\t' as int)` results `1`, 
`cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as date)` results 
the date value `2019-10-10`. In Spark version 2.4 and below, when casting 
string to integrals and booleans, it does not trim the whitespaces from both 
ends; the foregoing results is `null`, while to datetimes, only the trailing 
spaces (= ASCII 32) are removed.
{code}

In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
3.0.0. 
But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
the specification.


> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Updated] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36081:
---
Component/s: (was: Spark Core)

> The implementation of cast doesn't comply with its specification
> 
>
> Key: SPARK-36081
> URL: https://issues.apache.org/jira/browse/SPARK-36081
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> sql-migration-guide.md mentions about the behavior of cast like as follows.
> {code}
> In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
> int and bigint), datetime types(date, timestamp and interval) and boolean 
> type, the leading and trailing whitespaces (<= ASCII 32) will be trimmed 
> before converted to these type values, for example, `cast(' 1\t' as int)` 
> results `1`, `cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as 
> date)` results the date value `2019-10-10`. In Spark version 2.4 and below, 
> when casting string to integrals and booleans, it does not trim the 
> whitespaces from both ends; the foregoing results is `null`, while to 
> datetimes, only the trailing spaces (= ASCII 32) are removed.
> {code}
> In fact,  select cast('2019-10-10\b' as date); returns 2019-10-10 with Spark 
> 3.0.0. 
> But after 3.0.1, the query returns NULL and this behavior doesn't comply with 
> the specification.
> The root cause seems to be 






[jira] [Created] (SPARK-36081) The implementation of cast doesn't comply with its specification

2021-07-10 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-36081:
--

 Summary: The implementation of cast doesn't comply with its 
specification
 Key: SPARK-36081
 URL: https://issues.apache.org/jira/browse/SPARK-36081
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.1.2, 3.0.3, 3.2.0, 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


sql-migration-guide.md describes the behavior of cast as follows.
{code}
In Spark 3.0, when casting string value to integral types(tinyint, smallint, 
int and bigint), datetime types(date, timestamp and interval) and boolean type, 
the leading and trailing whitespaces (<= ASCII 32) will be trimmed before 
converted to these type values, for example, `cast(' 1\t' as int)` results `1`, 
`cast(' 1\t' as boolean)` results `true`, `cast('2019-10-10\t as date)` results 
the date value `2019-10-10`. In Spark version 2.4 and below, when casting 
string to integrals and booleans, it does not trim the whitespaces from both 
ends; the foregoing results is `null`, while to datetimes, only the trailing 
spaces (= ASCII 32) are removed.
{code}

In fact, `select cast('2019-10-10\b' as date);` returns 2019-10-10 in Spark 
3.0.0. 
But after 3.0.1, the query returns NULL, which does not comply with 
the specification.
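To make the discrepancy concrete, here is a small sketch that contrasts two trimming rules: trimming every character with code <= 32, as the quoted guide describes, and trimming only characters that Java classifies as whitespace. It only illustrates how the two rules differ for '\b' (code 8); it does not claim to reproduce Spark's cast code or to identify the root cause. All class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical illustration only; this is not Spark's cast implementation.
class CastTrimSketch {

    // Trims leading and trailing characters whose code is <= 32.
    static String trimUpToAscii32(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= 32) start++;
        while (end > start && s.charAt(end - 1) <= 32) end--;
        return s.substring(start, end);
    }

    // Trims only characters for which Character.isWhitespace is true.
    // '\b' (code 8) is not whitespace by that definition, so it is kept.
    static String trimJavaWhitespace(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    // Returns null on a parse failure, mimicking a cast that yields NULL.
    static LocalDate parseDateOrNull(String s) {
        try {
            return LocalDate.parse(s);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String input = "2019-10-10\b";
        System.out.println(parseDateOrNull(trimUpToAscii32(input)));
        System.out.println(parseDateOrNull(trimJavaWhitespace(input)));
    }
}
```

Under the first rule the input parses as a date; under the second the trailing backspace survives and parsing fails, which matches the shape of the 3.0.0 versus later behavior reported above.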






[jira] [Commented] (SPARK-35508) job group and description do not apply on broadcasts

2021-07-10 Thread Shockang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378483#comment-17378483
 ] 

Shockang commented on SPARK-35508:
--

It seems that this bug comes from this PR: 
[https://github.com/apache/spark/pull/24595], which overrides the job group 
and job description set in user code. Let me fix this issue.

> job group and description do not apply on broadcasts
> 
>
> Key: SPARK-35508
> URL: https://issues.apache.org/jira/browse/SPARK-35508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Lior Chaga
>Priority: Minor
> Attachments: spark2-image.png, spark3-image.png
>
>
> Given the following code:
> {code:java}
> SparkContext context = new SparkContext("local", "test");
> SparkSession session = new SparkSession(context);
> List<String> strings = Lists.newArrayList("a", "b", "c");
> List<String> otherString = Lists.newArrayList("b", "c", "d");
> Dataset<Row> broadcastedDf =
>     session.createDataset(strings, Encoders.STRING()).toDF();
> Dataset<Row> dataframe =
>     session.createDataset(otherString, Encoders.STRING()).toDF();
> context.setJobGroup("my group", "my job", false);
> dataframe.join(broadcast(broadcastedDf), "value").count();
> {code}
> The job group and description are not applied to the broadcast DataFrame. 
> With Spark 2.x, broadcast creation is given the same job description as the 
> query itself. 
> This seems to be broken in Spark 3.x.
> See the attached images:
>  !spark3-image.png!  !spark2-image.png! 






[jira] [Commented] (SPARK-36080) Broadcast join outer join stream side

2021-07-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378463#comment-17378463
 ] 

Apache Spark commented on SPARK-36080:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/33288

> Broadcast join outer join stream side
> -
>
> Key: SPARK-36080
> URL: https://issues.apache.org/jira/browse/SPARK-36080
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-36080) Broadcast join outer join stream side

2021-07-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378462#comment-17378462
 ] 

Apache Spark commented on SPARK-36080:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/33288

> Broadcast join outer join stream side
> -
>
> Key: SPARK-36080
> URL: https://issues.apache.org/jira/browse/SPARK-36080
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-36080) Broadcast join outer join stream side

2021-07-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36080:


Assignee: (was: Apache Spark)

> Broadcast join outer join stream side
> -
>
> Key: SPARK-36080
> URL: https://issues.apache.org/jira/browse/SPARK-36080
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-36080) Broadcast join outer join stream side

2021-07-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36080:


Assignee: Apache Spark

> Broadcast join outer join stream side
> -
>
> Key: SPARK-36080
> URL: https://issues.apache.org/jira/browse/SPARK-36080
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-36047) Replace the handwriting compare methods with static compare methods in Java code

2021-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-36047:


Assignee: Yang Jie

> Replace the handwriting compare methods with static compare methods in Java 
> code
> 
>
> Key: SPARK-36047
> URL: https://issues.apache.org/jira/browse/SPARK-36047
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
>
> There are some handwritten compare methods, such as 
> `ShuffleInMemorySorter.SortComparator`:
> {code:java}
> private static final class SortComparator implements 
>     Comparator<PackedRecordPointer> {
>   @Override
>   public int compare(PackedRecordPointer left, PackedRecordPointer right) {
>     int leftId = left.getPartitionId();
>     int rightId = right.getPartitionId();
>     return Integer.compare(leftId, rightId);
>   }
> }
> {code}
> Such handwritten compare methods can be replaced with `Integer.compare()` 
> and similar static methods, which are available since Java 1.7.
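As a rough illustration of the kind of replacement described, the sketch below puts a handwritten ternary comparison next to `Integer.compare` and `Comparator.comparingInt`, and also shows why the subtraction shortcut is unsafe. The `Rec` class is a hypothetical stand-in for a record that exposes a partition id; it is not Spark's `PackedRecordPointer`.

```java
import java.util.Comparator;

// Hypothetical illustration only; Rec is not a Spark class.
class CompareSketch {

    static final class Rec {
        final int partitionId;
        Rec(int partitionId) { this.partitionId = partitionId; }
    }

    // Handwritten three-way comparison.
    static final Comparator<Rec> HANDWRITTEN = (l, r) ->
        l.partitionId < r.partitionId ? -1 : (l.partitionId > r.partitionId ? 1 : 0);

    // Same ordering via the static helper added in Java 1.7.
    static final Comparator<Rec> STATIC_COMPARE = (l, r) ->
        Integer.compare(l.partitionId, r.partitionId);

    // Same ordering via a key extractor (Java 8 and later).
    static final Comparator<Rec> KEY_EXTRACTOR = Comparator.comparingInt(r -> r.partitionId);

    // Subtraction shortcut: overflows for operands that are far apart.
    static int subtractionCompare(int a, int b) {
        return a - b;
    }

    // Sign of the comparator's result for two partition ids.
    static int sign(Comparator<Rec> c, int a, int b) {
        return Integer.signum(c.compare(new Rec(a), new Rec(b)));
    }

    public static void main(String[] args) {
        System.out.println(sign(HANDWRITTEN, 1, 2) == sign(STATIC_COMPARE, 1, 2));
        System.out.println(sign(KEY_EXTRACTOR, 2, 1) == sign(STATIC_COMPARE, 2, 1));
        // Overflow makes the subtraction result positive even though MIN_VALUE < 1.
        System.out.println(subtractionCompare(Integer.MIN_VALUE, 1) > 0);
    }
}
```

The static helpers keep the ordering identical to the ternary form while removing the room for sign or overflow mistakes.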






[jira] [Resolved] (SPARK-36047) Replace the handwriting compare methods with static compare methods in Java code

2021-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-36047.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33260
[https://github.com/apache/spark/pull/33260]

> Replace the handwriting compare methods with static compare methods in Java 
> code
> 
>
> Key: SPARK-36047
> URL: https://issues.apache.org/jira/browse/SPARK-36047
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.3.0
>
>
> There are some handwritten compare methods, such as 
> `ShuffleInMemorySorter.SortComparator`:
> {code:java}
> private static final class SortComparator implements 
>     Comparator<PackedRecordPointer> {
>   @Override
>   public int compare(PackedRecordPointer left, PackedRecordPointer right) {
>     int leftId = left.getPartitionId();
>     int rightId = right.getPartitionId();
>     return Integer.compare(leftId, rightId);
>   }
> }
> {code}
> Such handwritten compare methods can be replaced with `Integer.compare()` 
> and similar static methods, which are available since Java 1.7.






[jira] [Updated] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-07-10 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-36073:
---
Description: 
Currently `EquivalentExpressions` has 2 issues:
- identifying common expressions in conditional expressions is not correct in 
all cases
- transparently canonicalized expressions (like `PromotePrecision`) are 
considered common subexpressions

  was:

Fixes an issue with identifying common expressions in conditional expressions 
(a side effect of the above).
Fixes the issue of transparently canonicalized expressions (like 
PromotePrecision) are considered common subexpressions.


> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Minor
>
> Currently `EquivalentExpressions` has 2 issues:
> - identifying common expressions in conditional expressions is not correct in 
> all cases
> - transparently canonicalized expressions (like `PromotePrecision`) are 
> considered common subexpressions
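One way to picture the first point, as a sketch under stated assumptions rather than a description of Spark's code: if a subexpression is to be evaluated once up front for a conditional, it is only safe to pick those that occur in every branch, since a subexpression found in a single branch might never need to run. The example models expressions as plain strings and intersects the per-branch sets; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Spark's EquivalentExpressions.
class ConditionalCommonExprSketch {

    // Convenience builder for a set of subexpressions written as strings.
    static Set<String> exprs(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    // Returns the subexpressions that occur in every branch of a conditional.
    static Set<String> commonToAllBranches(List<? extends Set<String>> branches) {
        if (branches.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> common = new LinkedHashSet<>(branches.get(0));
        for (Set<String> branch : branches.subList(1, branches.size())) {
            // Keep only what this branch also contains.
            common.retainAll(branch);
        }
        return common;
    }

    public static void main(String[] args) {
        Set<String> thenBranch = exprs("a + b", "c * 2");
        Set<String> elseBranch = exprs("a + b", "d");
        System.out.println(commonToAllBranches(Arrays.asList(thenBranch, elseBranch)));
    }
}
```

Here only `a + b` qualifies, because `c * 2` and `d` each appear in a single branch.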






[jira] [Updated] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-07-10 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-36073:
---
Issue Type: Bug  (was: Improvement)

> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Major
>
> Currently `EquivalentExpressions` has 2 issues:
> - identifying common expressions in conditional expressions is not correct in 
> all cases
> - transparently canonicalized expressions (like `PromotePrecision`) are 
> considered common subexpressions






[jira] [Updated] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-07-10 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-36073:
---
Priority: Major  (was: Minor)

> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Major
>
> Currently `EquivalentExpressions` has 2 issues:
> - identifying common expressions in conditional expressions is not correct in 
> all cases
> - transparently canonicalized expressions (like `PromotePrecision`) are 
> considered common subexpressions






[jira] [Updated] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-07-10 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-36073:
---
Description: 

Fixes an issue with identifying common expressions in conditional expressions 
(a side effect of the above).
Fixes the issue of transparently canonicalized expressions (like 
PromotePrecision) are considered common subexpressions.

  was:SPARK-35410 
(https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112)
 filters out all child expressions, but in some cases that is not necessary.


> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Minor
>
> Fixes an issue with identifying common expressions in conditional expressions 
> (a side effect of the above).
> Fixes the issue of transparently canonicalized expressions (like 
> PromotePrecision) are considered common subexpressions.






[jira] [Updated] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-07-10 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-36073:
---
Summary: EquivalentExpressions fixes and improvements  (was: SubExpr 
elimination should include common child exprs of conditional expressions)

> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Priority: Minor
>
> SPARK-35410 
> (https://github.com/apache/spark/commit/9e1b204bcce4a8fe24c1edd8271197277b5017f4#diff-4d8c210a38fc808fef3e5c966b438591f225daa3c9fd69359446b94c351aa11eR106-R112)
>  filters out all child expressions, but in some cases that is not necessary.


