[jira] [Created] (SPARK-29176) Optimization should change join type to CROSS

2019-09-19 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-29176: - Summary: Optimization should change join type to CROSS Key: SPARK-29176 URL: https://issues.apache.org/jira/browse/SPARK-29176 Project: Spark Issue Type:

[jira] [Commented] (SPARK-18748) UDF multiple evaluations causes very poor performance

2019-10-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951672#comment-16951672 ] Enrico Minack commented on SPARK-18748: --- I think the behaviour of {{asNondeterministic()}} is

[jira] [Comment Edited] (SPARK-18748) UDF multiple evaluations causes very poor performance

2019-10-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951672#comment-16951672 ] Enrico Minack edited comment on SPARK-18748 at 10/15/19 7:16 AM: - I

[jira] [Commented] (SPARK-29176) Optimization should change join type to CROSS

2019-11-25 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981388#comment-16981388 ] Enrico Minack commented on SPARK-29176: --- This has been discussed on the dev mailing list:

[jira] [Resolved] (SPARK-29176) Optimization should change join type to CROSS

2019-11-25 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack resolved SPARK-29176. --- Resolution: Not A Problem > Optimization should change join type to CROSS >

[jira] [Created] (SPARK-30296) Dataset diffing transformation

2019-12-18 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30296: - Summary: Dataset diffing transformation Key: SPARK-30296 URL: https://issues.apache.org/jira/browse/SPARK-30296 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-30319) Adds a stricter version of as[T]

2019-12-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30319: - Summary: Adds a stricter version of as[T] Key: SPARK-30319 URL: https://issues.apache.org/jira/browse/SPARK-30319 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-30815) Function to format timestamp with time zone

2020-02-13 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30815: - Summary: Function to format timestamp with time zone Key: SPARK-30815 URL: https://issues.apache.org/jira/browse/SPARK-30815 Project: Spark Issue Type:

[jira] [Updated] (SPARK-30319) Adds a stricter version of as[T]

2020-02-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30319: -- Description: The behaviour of as[T] is not intuitive when you read code like

[jira] [Created] (SPARK-30957) Null-safe variant of Dataset.join(Dataset[_], Seq[String])

2020-02-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30957: - Summary: Null-safe variant of Dataset.join(Dataset[_], Seq[String]) Key: SPARK-30957 URL: https://issues.apache.org/jira/browse/SPARK-30957 Project: Spark

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-02-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Description: This proposes a pragmatic improvement to allow for reliable single-stage

[jira] [Updated] (SPARK-30531) Duplicate query plan on Spark UI SQL page

2020-01-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30531: -- Affects Version/s: 2.4.4 > Duplicate query plan on Spark UI SQL page >

[jira] [Resolved] (SPARK-30296) Dataset diffing transformation

2020-01-21 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack resolved SPARK-30296. --- Resolution: Won't Do > Dataset diffing transformation > -- > >

[jira] [Updated] (SPARK-30296) Dataset diffing transformation

2020-01-21 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30296: -- Priority: Minor (was: Major) > Dataset diffing transformation >

[jira] [Created] (SPARK-31056) Add CalendarIntervals division

2020-03-05 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-31056: - Summary: Add CalendarIntervals division Key: SPARK-31056 URL: https://issues.apache.org/jira/browse/SPARK-31056 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Attachment: image-2020-01-28-16-13-36-174.png > Add more metrics to the all stages page >

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Attachment: image-2020-01-28-16-12-49-807.png > Add more metrics to the all stages page >

[jira] [Created] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30664: - Summary: Add more metrics to the all stages page Key: SPARK-30664 URL: https://issues.apache.org/jira/browse/SPARK-30664 Project: Spark Issue Type:

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Attachment: Show Additional Metrics.png > Add more metrics to the all stages page >

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Attachment: image-2020-01-28-16-15-20-258.png > Add more metrics to the all stages page >

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Description: The web UI page for individual stages has many useful metrics to diagnose

[jira] [Updated] (SPARK-30664) Add more metrics to the all stages page

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30664: -- Attachment: (was: Show Additional Metrics.png) > Add more metrics to the all stages page

[jira] [Created] (SPARK-30666) Reliable single-stage accumulators

2020-01-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30666: - Summary: Reliable single-stage accumulators Key: SPARK-30666 URL: https://issues.apache.org/jira/browse/SPARK-30666 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Description: This proposes a pragmatic improvement to allow for reliable single-stage

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-01-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Component/s: (was: SQL) Spark Core > Reliable single-stage accumulators

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-02-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Description: This proposes a pragmatic improvement to allow for reliable single-stage

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-02-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Description: This proposes a pragmatic improvement to allow for reliable single-stage

[jira] [Updated] (SPARK-30666) Reliable single-stage accumulators

2020-02-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30666: -- Description: This proposes a pragmatic improvement to allow for reliable single-stage

[jira] [Created] (SPARK-30531) Duplicate query plan on Spark UI SQL page

2020-01-16 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-30531: - Summary: Duplicate query plan on Spark UI SQL page Key: SPARK-30531 URL: https://issues.apache.org/jira/browse/SPARK-30531 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-30319) Adds a stricter version of as[T]

2020-01-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30319: -- Affects Version/s: (was: 2.4.4) 3.0.0 > Adds a stricter version of

[jira] [Updated] (SPARK-30319) Adds a stricter version of as[T]

2020-01-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-30319: -- Fix Version/s: (was: 3.0.0) > Adds a stricter version of as[T] >

[jira] [Created] (SPARK-31853) Mention removal of params mixins setter in migration guide

2020-05-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-31853: - Summary: Mention removal of params mixins setter in migration guide Key: SPARK-31853 URL: https://issues.apache.org/jira/browse/SPARK-31853 Project: Spark

[jira] [Updated] (SPARK-31853) Mention removal of params mixins setter in migration guide

2020-05-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-31853: -- Description: In SPARK-29093, all setters have been removed from `Params` mixins in

[jira] [Created] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-32120: - Summary: Single GPU is allocated multiple times Key: SPARK-32120 URL: https://issues.apache.org/jira/browse/SPARK-32120 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-32120: -- Attachment: screenshot-1.png > Single GPU is allocated multiple times >

[jira] [Updated] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-32120: -- Attachment: screenshot-2.png > Single GPU is allocated multiple times >

[jira] [Updated] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-32120: -- Attachment: screenshot-3.png > Single GPU is allocated multiple times >

[jira] [Updated] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-32120: -- Description: I am running Spark in a {{local-cluster[2,1,1024]}} with one GPU per worker,

[jira] [Updated] (SPARK-32120) Single GPU is allocated multiple times

2020-06-28 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-32120: -- Attachment: (was: screenshot-1.png) > Single GPU is allocated multiple times >

[jira] [Commented] (SPARK-34806) Helper class for batch Dataset.observe()

2021-06-09 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360325#comment-17360325 ] Enrico Minack commented on SPARK-34806: --- Thanks for the input [~cloud_fan] , I have moved the

[jira] [Comment Edited] (SPARK-34806) Helper class for batch Dataset.observe()

2021-05-02 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312403#comment-17312403 ] Enrico Minack edited comment on SPARK-34806 at 5/2/21, 8:15 PM: The

[jira] [Commented] (SPARK-34806) Helper class for batch Dataset.observe()

2021-05-02 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338112#comment-17338112 ] Enrico Minack commented on SPARK-34806: --- [~cloud_fan] [~kabhwan]  [~hvanhovell] [~hyukjin.kwon]

[jira] [Resolved] (SPARK-32120) Single GPU is allocated multiple times

2021-01-27 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack resolved SPARK-32120. --- Resolution: Not A Bug It's documented, so it is not a bug. Thanks! > Single GPU is

[jira] [Commented] (SPARK-34806) Helper class for batch Dataset.observe()

2021-03-31 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-34806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312403#comment-17312403 ] Enrico Minack commented on SPARK-34806: --- The interaction with {{Observation}} can be split into

[jira] [Created] (SPARK-34806) Helper class for batch Dataset.observe()

2021-03-19 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-34806: - Summary: Helper class for batch Dataset.observe() Key: SPARK-34806 URL: https://issues.apache.org/jira/browse/SPARK-34806 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark

2021-07-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-36263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-36263: -- Description: With SPARK-34806 we now have a way to use the `Dataset.observe` method without

[jira] [Created] (SPARK-36263) Add Dataset.observe(Observation, Column, Column*) to PySpark

2021-07-22 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-36263: - Summary: Add Dataset.observe(Observation, Column, Column*) to PySpark Key: SPARK-36263 URL: https://issues.apache.org/jira/browse/SPARK-36263 Project: Spark

[jira] [Created] (SPARK-36319) Have Observation return Map instead of Row

2021-07-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-36319: - Summary: Have Observation return Map instead of Row Key: SPARK-36319 URL: https://issues.apache.org/jira/browse/SPARK-36319 Project: Spark Issue Type:

[jira] [Created] (SPARK-38591) Add flatMapSortedGroups to KeyValueGroupedDataset

2022-03-17 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38591: - Summary: Add flatMapSortedGroups to KeyValueGroupedDataset Key: SPARK-38591 URL: https://issues.apache.org/jira/browse/SPARK-38591 Project: Spark Issue

[jira] [Updated] (SPARK-38647) Add SupportsReportOrdering mix in interface for Scan

2022-03-24 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38647: -- Description: As {{SupportsReportPartitioning}} allows implementations of {{Scan}} provide

[jira] [Created] (SPARK-38647) Add SupportsReportOrdering mix in interface for Scan

2022-03-24 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38647: - Summary: Add SupportsReportOrdering mix in interface for Scan Key: SPARK-38647 URL: https://issues.apache.org/jira/browse/SPARK-38647 Project: Spark Issue

[jira] [Created] (SPARK-38864) Melt function for Dataset API

2022-04-11 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38864: - Summary: Melt function for Dataset API Key: SPARK-38864 URL: https://issues.apache.org/jira/browse/SPARK-38864 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-38864) Melt function for Dataset API

2022-04-11 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38864: -- Description: As pointed out in SPARK-30273 and SPARK-37799, the Dataset API provides the

[jira] [Created] (SPARK-45708) Retry mvn deploy failures

2023-10-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-45708: - Summary: Retry mvn deploy failures Key: SPARK-45708 URL: https://issues.apache.org/jira/browse/SPARK-45708 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-09-19 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766881#comment-17766881 ] Enrico Minack commented on SPARK-38200: --- Sadly, still no feedback from reviewers. > [SQL] Spark

[jira] [Created] (SPARK-45651) Snapshots of some packages are not published any more

2023-10-24 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-45651: - Summary: Snapshots of some packages are not published any more Key: SPARK-45651 URL: https://issues.apache.org/jira/browse/SPARK-45651 Project: Spark

[jira] [Updated] (SPARK-38970) Skip build-and-test workflow on forks when scheduled

2022-04-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38970: -- Summary: Skip build-and-test workflow on forks when scheduled (was: Check for changes only

[jira] [Created] (SPARK-38966) Fix CI for fork branches in-sync with upstream master

2022-04-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38966: - Summary: Fix CI for fork branches in-sync with upstream master Key: SPARK-38966 URL: https://issues.apache.org/jira/browse/SPARK-38966 Project: Spark

[jira] [Created] (SPARK-38970) Check for changes only if changes are being built / tested (not on forks)

2022-04-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38970: - Summary: Check for changes only if changes are being built / tested (not on forks) Key: SPARK-38970 URL: https://issues.apache.org/jira/browse/SPARK-38970 Project:

[jira] [Updated] (SPARK-38833) PySpark applyInPandas should allow to return empty DataFrame without columns

2022-04-08 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38833: -- Summary: PySpark applyInPandas should allow to return empty DataFrame without columns (was:

[jira] [Created] (SPARK-38833) PySpark allows applyInPandas return empty DataFrame without columns

2022-04-08 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-38833: - Summary: PySpark allows applyInPandas return empty DataFrame without columns Key: SPARK-38833 URL: https://issues.apache.org/jira/browse/SPARK-38833 Project: Spark

[jira] [Created] (SPARK-39038) Skip reporting test results if triggering workflow was skipped

2022-04-27 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39038: - Summary: Skip reporting test results if triggering workflow was skipped Key: SPARK-39038 URL: https://issues.apache.org/jira/browse/SPARK-39038 Project: Spark

[jira] [Created] (SPARK-39292) Make Dataset.melt work with struct fields

2022-05-25 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39292: - Summary: Make Dataset.melt work with struct fields Key: SPARK-39292 URL: https://issues.apache.org/jira/browse/SPARK-39292 Project: Spark Issue Type:

[jira] [Updated] (SPARK-39292) Make Dataset.melt work with struct fields

2022-05-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39292: -- Description: In SPARK-38864, the melt function was added to Dataset. It would be nice if

[jira] [Created] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39644: - Summary: Add RangePartitioning to DataSource V2 Key: SPARK-39644 URL: https://issues.apache.org/jira/browse/SPARK-39644 Project: Spark Issue Type: New

[jira] [Commented] (SPARK-39644) Add RangePartitioning to DataSource V2

2022-06-30 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561132#comment-17561132 ] Enrico Minack commented on SPARK-39644: --- [~csun] As discussed, I'll be working on this. > Add

[jira] [Commented] (SPARK-39292) Make Dataset.melt work with struct fields

2022-06-03 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17546058#comment-17546058 ] Enrico Minack commented on SPARK-39292: --- This is being fixed as part of

[jira] [Updated] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39532: -- Parent: SPARK-39515 Issue Type: Sub-task (was: Improvement) > Move checkout and sync

[jira] [Created] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39532: - Summary: Move checkout and sync steps into re-usable composite action Key: SPARK-39532 URL: https://issues.apache.org/jira/browse/SPARK-39532 Project: Spark

[jira] [Commented] (SPARK-39529) Refactor and merge all related job selection logic into precondition

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556323#comment-17556323 ] Enrico Minack commented on SPARK-39529: --- A first good step could be to merge {{precondition}} into

[jira] [Commented] (SPARK-39532) Move checkout and sync steps into re-usable composite action

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556316#comment-17556316 ] Enrico Minack commented on SPARK-39532: --- This composite action first becomes available once merged

[jira] [Commented] (SPARK-39515) Improve/recover scheduled jobs in GitHub Actions

2022-06-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556265#comment-17556265 ] Enrico Minack commented on SPARK-39515: --- I can give [https://github.com/apache/spark/pull/36888]

[jira] [Resolved] (SPARK-39292) Make Dataset.melt work with struct fields

2022-07-16 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack resolved SPARK-39292. --- Resolution: Fixed > Make Dataset.melt work with struct fields >

[jira] [Updated] (SPARK-38864) Unpivot / melt function for Dataset API

2022-07-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38864: -- Summary: Unpivot / melt function for Dataset API (was: Melt function for Dataset API) >

[jira] [Created] (SPARK-39876) Unpivot / melt function for SQL

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39876: - Summary: Unpivot / melt function for SQL Key: SPARK-39876 URL: https://issues.apache.org/jira/browse/SPARK-39876 Project: Spark Issue Type: New Feature

[jira] [Created] (SPARK-39877) Unpivot / melt function for PySpark

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39877: - Summary: Unpivot / melt function for PySpark Key: SPARK-39877 URL: https://issues.apache.org/jira/browse/SPARK-39877 Project: Spark Issue Type: New

[jira] [Created] (SPARK-39878) Migrate melt function in Pandas API to PySpark / Scala unpivot

2022-07-26 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39878: - Summary: Migrate melt function in Pandas API to PySpark / Scala unpivot Key: SPARK-39878 URL: https://issues.apache.org/jira/browse/SPARK-39878 Project: Spark

[jira] [Updated] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-15 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39783: -- Description: The following code references a nested value {{{}`the`.`id`{}}}, that does not

[jira] [Commented] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-14 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566998#comment-17566998 ] Enrico Minack commented on SPARK-39783: --- [~srielau] [~cloud_fan]  > Wrong column backticks in

[jira] [Created] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-07-14 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39783: - Summary: Wrong column backticks in UNRESOLVED_COLUMN error Key: SPARK-39783 URL: https://issues.apache.org/jira/browse/SPARK-39783 Project: Spark Issue

[jira] [Created] (SPARK-39074) Fail on uploading test files, not when downloading them

2022-04-29 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-39074: - Summary: Fail on uploading test files, not when downloading them Key: SPARK-39074 URL: https://issues.apache.org/jira/browse/SPARK-39074 Project: Spark

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling {{DataFrame.groupby(...).applyInPandas(...)}} for very small groups in

[jira] [Updated] (SPARK-39931) Improve performance of applyInPandas for very small groups

2022-08-26 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-39931: -- Description: Calling `DataFrame.groupby(...).applyInPandas(...)` for very small groups in

[jira] [Created] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40601: - Summary: Improve error when cogrouping groups with mismatching key sizes Key: SPARK-40601 URL: https://issues.apache.org/jira/browse/SPARK-40601 Project: Spark

[jira] [Updated] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40830: -- Description: Calling {{Dataset.groupBy(...).as[K, T]}} should be preferred over calling

[jira] [Updated] (SPARK-38591) Add sortWithinGroups to KeyValueGroupedDataset

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38591: -- Summary: Add sortWithinGroups to KeyValueGroupedDataset (was: Add flatMapSortedGroups and

[jira] [Created] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40830: - Summary: Dataset.groupBy.as should be preferred over Dataset.groupByKey Key: SPARK-40830 URL: https://issues.apache.org/jira/browse/SPARK-40830 Project: Spark

[jira] [Updated] (SPARK-40830) Dataset.groupBy.as should be preferred over Dataset.groupByKey

2022-10-18 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40830: -- Priority: Minor (was: Trivial) > Dataset.groupBy.as should be preferred over

[jira] [Updated] (SPARK-38591) Add flatMapSortedGroups and cogroupSorted to KeyValueGroupedDataset

2022-10-17 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-38591: -- Description: The existing methods {{KeyValueGroupedDataset.flatMapGroups}} and

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/20/22 4:23 PM: - Here

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/20/22 4:23 PM: - Here

[jira] [Updated] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40819: -- Affects Version/s: 3.4.0 3.3.1 3.2.3

[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-20 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack commented on SPARK-40588: --- Here is a more concise and complete example to reproduce

[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/22/22 1:02 PM: - Here

[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17622620#comment-17622620 ] Enrico Minack commented on SPARK-40588: --- Even with AQE enabled (pre Spark 3.4.0), the written

[jira] [Updated] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-40588: -- Summary: Sorting issue with partitioned-writing and AQE turned on (was: Sorting issue with

[jira] [Comment Edited] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/23/22 4:55 PM: - Here

[jira] [Comment Edited] (SPARK-40588) Sorting issue with partitioned-writing and AQE turned on

2022-10-23 Thread Enrico Minack (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/23/22 4:55 PM: - Here
