[jira] [Resolved] (SPARK-43126) mark two Hive UDF expressions as stateful

2023-04-13 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43126. -- Fix Version/s: 3.4.1 3.5.0 Resolution: Fixed > mark two Hive UDF expressions

[jira] [Assigned] (SPARK-43126) mark two Hive UDF expressions as stateful

2023-04-13 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43126: Assignee: Wenchen Fan > mark two Hive UDF expressions as stateful > -

[jira] [Assigned] (SPARK-43121) Use `BytesWritable.copyBytes` instead of manual copy in `HiveInspectors`

2023-04-13 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43121: Assignee: Yang Jie > Use `BytesWritable.copyBytes` instead of manual copy in `HiveInspectors` > -

[jira] [Resolved] (SPARK-43121) Use `BytesWritable.copyBytes` instead of manual copy in `HiveInspectors`

2023-04-13 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-43121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43121. -- Fix Version/s: 3.5.0 Resolution: Fixed > Use `BytesWritable.copyBytes` instead of manual copy i

[jira] [Updated] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-42480: - Fix Version/s: 3.4.0 > Improve the performance of drop partitions >

[jira] [Updated] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-42480: - Fix Version/s: (was: 3.5.0) > Improve the performance of drop partitions > -

[jira] [Assigned] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-42480: Assignee: Wechar > Improve the performance of drop partitions > -

[jira] [Resolved] (SPARK-42480) Improve the performance of drop partitions

2023-03-08 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-42480. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40069 [https://github.com

[jira] [Updated] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-27 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-42539: - Fix Version/s: 3.4.0 > User-provided JARs can override Spark's Hive metadata client JARs when using > "

[jira] [Resolved] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-27 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-42539. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40144 [https://github.com

[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"

2023-02-27 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-42539: Assignee: Erik Krogen > User-provided JARs can override Spark's Hive metadata client JARs when us

[jira] [Resolved] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-20 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41952. -- Fix Version/s: 3.2.4 3.4.0 3.3.3 Resolution: Fixed > Upgr

[jira] [Assigned] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-20 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41952: Assignee: Cheng Pan > Upgrade Parquet to fix off-heap memory leaks in Zstd codec > --

[jira] [Created] (SPARK-42454) SPJ: encapsulate all SPJ related parameters in BatchScanExec

2023-02-15 Thread Chao Sun (Jira)
Chao Sun created SPARK-42454: Summary: SPJ: encapsulate all SPJ related parameters in BatchScanExec Key: SPARK-42454 URL: https://issues.apache.org/jira/browse/SPARK-42454 Project: Spark Issue T

[jira] [Commented] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685674#comment-17685674 ] Chao Sun commented on SPARK-33807: -- This is actually already resolved as part of SPARK-

[jira] [Assigned] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-33807: Assignee: (was: Chao Sun) > Data Source V2: Remove read specific distributions >

[jira] [Assigned] (SPARK-33807) Data Source V2: Remove read specific distributions

2023-02-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-33807: Assignee: Chao Sun > Data Source V2: Remove read specific distributions > ---

[jira] [Updated] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41470: - Fix Version/s: 3.4.0 (was: 3.5.0) > SPJ: Spark shouldn't assume InternalRow imple

[jira] [Assigned] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41470: Assignee: Mars > SPJ: Spark shouldn't assume InternalRow implements equals and hashCode > ---

[jira] [Resolved] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2023-02-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41470. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39687 [https://github.com

[jira] [Created] (SPARK-42040) SPJ: Introduce a new API for V2 input partition to report partition size

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42040: Summary: SPJ: Introduce a new API for V2 input partition to report partition size Key: SPARK-42040 URL: https://issues.apache.org/jira/browse/SPARK-42040 Project: Spark

[jira] [Created] (SPARK-42039) SPJ: Remove Option in KeyGroupedPartitioning#partitionValues

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42039: Summary: SPJ: Remove Option in KeyGroupedPartitioning#partitionValues Key: SPARK-42039 URL: https://issues.apache.org/jira/browse/SPARK-42039 Project: Spark Issue T

[jira] [Created] (SPARK-42038) SPJ: Support partially clustered distribution

2023-01-12 Thread Chao Sun (Jira)
Chao Sun created SPARK-42038: Summary: SPJ: Support partially clustered distribution Key: SPARK-42038 URL: https://issues.apache.org/jira/browse/SPARK-42038 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-36529) Decouple CPU with IO work in vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36529: - Parent: (was: SPARK-35743) Issue Type: Bug (was: Sub-task) > Decouple CPU with IO work in v

[jira] [Updated] (SPARK-36529) Decouple CPU with IO work in vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36529: - Issue Type: Improvement (was: Bug) > Decouple CPU with IO work in vectorized Parquet reader > -

[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36528: - Parent: (was: SPARK-35743) Issue Type: Bug (was: Sub-task) > Implement lazy decoding for th

[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36528: - Issue Type: New Feature (was: Bug) > Implement lazy decoding for the vectorized Parquet reader > --

[jira] [Resolved] (SPARK-35743) Improve Parquet vectorized reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-35743. -- Fix Version/s: 3.4.0 Resolution: Fixed > Improve Parquet vectorized reader > --

[jira] [Updated] (SPARK-36527) Implement lazy materialization for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-36527: - Parent: (was: SPARK-35743) Issue Type: Improvement (was: Sub-task) > Implement lazy materia

[jira] [Assigned] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41413: Assignee: Chao Sun > SPJ: Avoid shuffle when partition keys mismatch, but join expressions are >

[jira] [Resolved] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41413. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38950 [https://github.com

[jira] [Updated] (SPARK-41470) SPJ: Spark shouldn't assume InternalRow implements equals and hashCode

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41470: - Summary: SPJ: Spark shouldn't assume InternalRow implements equals and hashCode (was: SPJ shouldn't ass

[jira] [Updated] (SPARK-41471) SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41471: - Summary: SPJ: Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning (was: SPJ: re

[jira] [Updated] (SPARK-40946) SPJ: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40946: - Summary: SPJ: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys (was: Introduce a new

[jira] [Updated] (SPARK-41398) SPJ: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41398: - Summary: SPJ: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering

[jira] [Updated] (SPARK-37375) Umbrella: Storage Partitioned Join (SPJ)

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37375: - Summary: Umbrella: Storage Partitioned Join (SPJ) (was: Umbrella: Storage Partitioned Join) > Umbrella

[jira] [Updated] (SPARK-41413) SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41413: - Summary: SPJ: Spark should avoid shuffle when partition keys mismatch, but join expressions are compatib

[jira] [Updated] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-41413: - Summary: SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible (was: SPJ

[jira] [Updated] (SPARK-37377) SPJ: Initial implementation of Storage-Partitioned Join

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37377: - Summary: SPJ: Initial implementation of Storage-Partitioned Join (was: Initial implementation of Storag

[jira] [Created] (SPARK-41471) SPJ: reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning

2022-12-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41471: Summary: SPJ: reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning Key: SPARK-41471 URL: https://issues.apache.org/jira/browse/SPARK-41471 Project: Spa

[jira] [Updated] (SPARK-37378) SPJ: Convert V2 Transform expressions into catalyst expressions and load their associated functions from V2 FunctionCatalog

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37378: - Summary: SPJ: Convert V2 Transform expressions into catalyst expressions and load their associated funct

[jira] [Updated] (SPARK-37376) SPJ: Introduce a new DataSource V2 interface HasPartitionKey

2022-12-09 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-37376: - Summary: SPJ: Introduce a new DataSource V2 interface HasPartitionKey (was: Introduce a new DataSource

[jira] [Created] (SPARK-41470) SPJ shouldn't assume InternalRow implements equals and hashCode

2022-12-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41470: Summary: SPJ shouldn't assume InternalRow implements equals and hashCode Key: SPARK-41470 URL: https://issues.apache.org/jira/browse/SPARK-41470 Project: Spark Issu

[jira] [Created] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible

2022-12-06 Thread Chao Sun (Jira)
Chao Sun created SPARK-41413: Summary: Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible Key: SPARK-41413 URL: https://issues.apache.org/jira/browse/SPARK-

[jira] [Created] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match

2022-12-05 Thread Chao Sun (Jira)
Chao Sun created SPARK-41398: Summary: Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match Key: SPARK-41398 URL: https://issues.apache.org/jira/browse/SPARK-41398

[jira] [Assigned] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-41096: Assignee: Kazuyuki Tanimura > Support reading parquet FIXED_LEN_BYTE_ARRAY type > ---

[jira] [Resolved] (SPARK-41096) Support reading parquet FIXED_LEN_BYTE_ARRAY type

2022-11-14 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-41096. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38628 [https://github.com

[jira] [Created] (SPARK-41091) Fix Docker release tool for branch-3.2

2022-11-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-41091: Summary: Fix Docker release tool for branch-3.2 Key: SPARK-41091 URL: https://issues.apache.org/jira/browse/SPARK-41091 Project: Spark Issue Type: Improvement

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post.  +[https://www.igmguru.com/digital-mar

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-31 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER294516): Great job. [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-traini

[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295111): Very informative and effective post.  [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/sales

[jira] [Commented] (SPARK-40876) Spark's Vectorized ParquetReader should support type promotions

2022-10-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622514#comment-17622514 ] Chao Sun commented on SPARK-40876: -- Yes, Spark doesn't support int -> long for Parquet.

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-10 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615255#comment-17615255 ] Chao Sun commented on SPARK-40703: -- Thanks [~bryanck] . Now I see where the issue is.

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614303#comment-17614303 ] Chao Sun commented on SPARK-40703: -- Hmm somehow in the unit test I was able to see that

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614298#comment-17614298 ] Chao Sun commented on SPARK-40703: -- (one idea is that {{SinglePartitionSpec#canCreatePa

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614296#comment-17614296 ] Chao Sun commented on SPARK-40703: -- Hmm interesting. Let me try to come up with a unit

[jira] [Commented] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614289#comment-17614289 ] Chao Sun commented on SPARK-40703: -- I see. The reason HashPartitioning is not picked as

[jira] [Updated] (SPARK-40703) Performance regression for joins in Spark 3.3 vs Spark 3.2

2022-10-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40703: - Component/s: SQL (was: Spark Core) > Performance regression for joins in Spark 3.3

[jira] [Commented] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607869#comment-17607869 ] Chao Sun commented on SPARK-40508: -- [~dongjoon][~viirya] could you add [~yuzhih...@gmai

[jira] [Resolved] (SPARK-40508) Treat unknown partitioning as UnknownPartitioning

2022-09-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40508. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37952 [https://github.com

[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40169: - Fix Version/s: 3.2.3 > Fix the issue with Parquet column index and predicate pushdown in Data source >

[jira] [Assigned] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40169: Assignee: Chao Sun > Fix the issue with Parquet column index and predicate pushdown in Data sourc

[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-40169: - Fix Version/s: 3.3.1 > Fix the issue with Parquet column index and predicate pushdown in Data source >

[jira] [Resolved] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-09-16 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40169. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37881 [https://github.com

[jira] [Resolved] (SPARK-40295) Allow v2 functions with literal args in write distribution and ordering

2022-09-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40295. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37749 [https://github.com

[jira] [Assigned] (SPARK-40295) Allow v2 functions with literal args in write distribution and ordering

2022-09-07 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40295: Assignee: Anton Okolnychyi > Allow v2 functions with literal args in write distribution and order

[jira] [Commented] (SPARK-40128) Add DELTA_LENGTH_BYTE_ARRAY as a recognized standalone encoding in VectorizedColumnReader

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581073#comment-17581073 ] Chao Sun commented on SPARK-40128: -- Seems we need to add [~dennishuo] as Spark contribu

[jira] [Resolved] (SPARK-40128) Add DELTA_LENGTH_BYTE_ARRAY as a recognized standalone encoding in VectorizedColumnReader

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40128. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37557 [https://github.com

[jira] [Assigned] (SPARK-40110) Add JDBCWithAQESuite

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40110: Assignee: Kazuyuki Tanimura > Add JDBCWithAQESuite > > > Key

[jira] [Resolved] (SPARK-40110) Add JDBCWithAQESuite

2022-08-17 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40110. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37544 [https://github.com

[jira] [Assigned] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-40052: Assignee: Ivan Sadikov > Handle direct byte buffers in VectorizedDeltaBinaryPackedReader > --

[jira] [Resolved] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-40052. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37485 [https://github.com

[jira] [Updated] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39833: - Affects Version/s: 3.3.0 > Filtered parquet data frame count() and show() produce inconsistent results

[jira] [Commented] (SPARK-39863) Upgrade Hadoop to 3.3.4

2022-08-03 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574770#comment-17574770 ] Chao Sun commented on SPARK-39863: -- Thanks [~ste...@apache.org], noted > Upgrade Hadoo

[jira] [Updated] (SPARK-39951) Support columnar batches with nested fields in Parquet V2

2022-08-02 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39951: - Fix Version/s: 3.3.1 > Support columnar batches with nested fields in Parquet V2 > -

[jira] [Resolved] (SPARK-39951) Support columnar batches with nested fields in Parquet V2

2022-08-02 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39951. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37379 [https://github.com

[jira] [Created] (SPARK-39863) Upgrade Hadoop to 3.3.4

2022-07-25 Thread Chao Sun (Jira)
Chao Sun created SPARK-39863: Summary: Upgrade Hadoop to 3.3.4 Key: SPARK-39863 URL: https://issues.apache.org/jira/browse/SPARK-39863 Project: Spark Issue Type: Improvement Components:

[jira] [Created] (SPARK-39657) YARN AM client should call the non-static setTokensConf method

2022-07-01 Thread Chao Sun (Jira)
Chao Sun created SPARK-39657: Summary: YARN AM client should call the non-static setTokensConf method Key: SPARK-39657 URL: https://issues.apache.org/jira/browse/SPARK-39657 Project: Spark Issue

[jira] [Resolved] (SPARK-39638) Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`

2022-06-30 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39638. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37029 [https://github.com

[jira] [Assigned] (SPARK-39638) Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`

2022-06-30 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39638: Assignee: Yang Jie > Change to use `ConstantColumnVector` to store partition columns in > `OrcCo

[jira] [Commented] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561196#comment-17561196 ] Chao Sun commented on SPARK-39644: -- Thanks. Following this JIRA now. > Add RangePartit

[jira] [Assigned] (SPARK-39231) Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-06-29 Thread Chao Sun (Jira)
Chao Sun assigned an

[jira] [Resolved] (SPARK-39231) Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-06-29 Thread Chao Sun (Jira)
Chao Sun resolved as

[jira] [Updated] (SPARK-34863) Support nested column in Spark Parquet vectorized readers

2022-06-27 Thread Chao Sun (Jira)
Chao Sun updated an i

[jira] [Resolved] (SPARK-38647) Add SupportsReportOrdering mix in interface for Scan

2022-06-21 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38647. -- Fix Version/s: 3.4.0 Assignee: Enrico Minack Resolution: Fixed > Add SupportsReportOrd

[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545268#comment-17545268 ] Chao Sun commented on SPARK-29260: -- Thanks [~yumwang]. Spark currently throw exception

[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545166#comment-17545166 ] Chao Sun commented on SPARK-29260: -- [~yumwang] Looks like HIVE-8472 is for the server s

[jira] [Updated] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39313: - Fix Version/s: 3.3.0 > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be > tr

[jira] [Assigned] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39313: Assignee: Cheng Pan > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be

[jira] [Resolved] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39313. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36697 [https://github.com

[jira] [Updated] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-05-27 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-39313: - Priority: Blocker (was: Critical) > V2ExpressionUtils.toCatalystOrdering should fail if V2Expression ca

[jira] [Resolved] (SPARK-39086) Support UDT in Spark Parquet vectorized reader

2022-05-11 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-39086. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36427 [https://github.com

[jira] [Assigned] (SPARK-39086) Support UDT in Spark Parquet vectorized reader

2022-05-11 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-39086: Assignee: Ivan Sadikov > Support UDT in Spark Parquet vectorized reader > ---

[jira] [Created] (SPARK-39119) Upgrade to Hadoop 3.3.3

2022-05-06 Thread Chao Sun (Jira)
Chao Sun created SPARK-39119: Summary: Upgrade to Hadoop 3.3.3 Key: SPARK-39119 URL: https://issues.apache.org/jira/browse/SPARK-39119 Project: Spark Issue Type: Improvement Components:

[jira] [Updated] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-38891: - Fix Version/s: 3.3.0 > Skipping allocating vector for repetition & definition levels when possible > ---

[jira] [Resolved] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38891. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36202 [https://github.com

[jira] [Assigned] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-05-04 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-38891: Assignee: Chao Sun > Skipping allocating vector for repetition & definition levels when possible

[jira] [Assigned] (SPARK-38573) Support Auto Partition Statistics Collection

2022-04-15 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-38573: Assignee: Kazuyuki Tanimura > Support Auto Partition Statistics Collection >

[jira] [Resolved] (SPARK-38573) Support Auto Partition Statistics Collection

2022-04-15 Thread Chao Sun (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-38573. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36067 [https://github.com

[jira] [Created] (SPARK-38891) Skipping allocating vector for repetition & definition levels when possible

2022-04-13 Thread Chao Sun (Jira)
Chao Sun created SPARK-38891: Summary: Skipping allocating vector for repetition & definition levels when possible Key: SPARK-38891 URL: https://issues.apache.org/jira/browse/SPARK-38891 Project: Spark
