[jira] [Commented] (SPARK-40048) Partitions are traversed multiple times invalidating Accumulator consistency

2022-08-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579214#comment-17579214
 ] 

Hyukjin Kwon commented on SPARK-40048:
--

Spark 2.4 is EOL. Mind trying Spark 3.1+?

> Partitions are traversed multiple times invalidating Accumulator consistency
> 
>
> Key: SPARK-40048
> URL: https://issues.apache.org/jira/browse/SPARK-40048
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: sam
>Priority: Major
>
> We are trying to use Accumulators to count the records in RDDs without having 
> to force `.count()` on them, for efficiency reasons.  We are aware that tasks 
> can fail and re-run, which would invalidate the value of the accumulator, so we 
> also count the number of times each partition has been traversed in order to 
> detect this.
> The problem is that partitions are being traversed multiple times even though
>  - We cache the RDD in memory _after we have applied the logic below_
>  - No tasks are failing, no executors are dying.
>  - There is plenty of memory (no RDD eviction)
> The code we use:
> ```
> val count: LongAccumulator
> val partitionTraverseCounts: List[LongAccumulator]
> def incrementTimesCalled(partitionIndex: Int): Unit =
>   partitionTraverseCounts(partitionIndex).add(1)
> def incrementForPartition[T](index: Int, it: Iterator[T]): Iterator[T] = {
>   incrementTimesCalled(index)
>   it.map { x =>
>     increment()
>     x
>   }
> }
> ```
> How we use the above:
> ```
> rdd.mapPartitionsWithIndex(safeCounter.incrementForPartition)
> ```
> We have a 50 partition RDD, and we frequently see odd traverse counts:
> ```
> traverseCounts: List(2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
> 2, 2, 2, 1, 2)
> ```
> As you can see, some partitions are traversed twice, while others are 
> traversed only once.
> To confirm no task failures:
> ```
> cat job.log | grep -i task | grep -i fail
> ```
> To confirm no memory issues:
> ```
> cat job.log | grep -i memory
> ```
> We see every log line has multiple GB memory free.
> We also don't see any errors or exceptions.
> Questions:
> 1. Why is Spark traversing a cached RDD multiple times?
> 2. Is there any way to disable this?
> Many thanks,
> Sam
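For reference, here is a minimal, self-contained sketch of the pattern described above (this is not the reporter's actual code; the dataset, partition count, and accumulator names are illustrative):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.LongAccumulator

object TraverseCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("traverse-count").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(1 to 1000, numSlices = 50)
    val rowCount: LongAccumulator = sc.longAccumulator("rowCount")
    val traverseCounts: IndexedSeq[LongAccumulator] =
      (0 until rdd.getNumPartitions).map(i => sc.longAccumulator(s"traverse-$i"))

    val counted = rdd.mapPartitionsWithIndex { (index, it) =>
      traverseCounts(index).add(1)        // one increment per traversal of this partition
      it.map { x => rowCount.add(1); x }  // one increment per row actually consumed
    }.cache()                             // cached after the counting logic, as in the report

    counted.count()                       // first action materializes and caches the RDD
    counted.count()                       // second action should be served from the cache

    println(traverseCounts.map(_.value))  // any value > 1 means a partition was re-traversed
    spark.stop()
  }
}
```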



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579213#comment-17579213
 ] 

Hyukjin Kwon commented on SPARK-40063:
--

{quote}
 it ends up mixing the column's rows ordering.
{quote}

Can you show the expected/actual output? What's column's rows ordering?

> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
> Environment: Databricks Runtime 11.1
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>  Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1) {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.
> Setting one column as index also didn't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40068) Extend new heartbeat mechanism to YARN

2022-08-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40068:
-
Component/s: YARN
 (was: Spark Core)

> Extend new heartbeat mechanism to YARN
> --
>
> Key: SPARK-40068
> URL: https://issues.apache.org/jira/browse/SPARK-40068
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.4.0
>Reporter: Kai-Hsun Chen
>Priority: Major
>
> Extend the new heartbeat mechanism in SPARK-39984 to YARN.
>  
> SPARK-39984 issue:
> [https://issues.apache.org/jira/projects/SPARK/issues/SPARK-39984?filter=allopenissues]
>  
> SPARK-39984 PR:
> [https://github.com/apache/spark/pull/37411]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40069) Extend the new heartbeat mechanism to Kubernetes

2022-08-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40069:
-
Component/s: Kubernetes
 (was: Spark Core)

> Extend the new heartbeat mechanism to Kubernetes
> 
>
> Key: SPARK-40069
> URL: https://issues.apache.org/jira/browse/SPARK-40069
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Kai-Hsun Chen
>Priority: Major
>
> Extend the new heartbeat mechanism in SPARK-39984 to Kubernetes.
>  
> SPARK-39984 issue:
> [https://issues.apache.org/jira/projects/SPARK/issues/SPARK-39984?filter=allopenissues]
>  
> SPARK-39984 PR:
> [https://github.com/apache/spark/pull/37411]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40061) Document cast of ANSI intervals

2022-08-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40061.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37495
[https://github.com/apache/spark/pull/37495]

> Document cast of ANSI intervals
> ---
>
> Key: SPARK-40061
> URL: https://issues.apache.org/jira/browse/SPARK-40061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Update the doc page 
> https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#cast 
> regarding cast of ANSI intervals to/from decimals/integrals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40069) Extend the new heartbeat mechanism to Kubernetes

2022-08-12 Thread Kai-Hsun Chen (Jira)
Kai-Hsun Chen created SPARK-40069:
-

 Summary: Extend the new heartbeat mechanism to Kubernetes
 Key: SPARK-40069
 URL: https://issues.apache.org/jira/browse/SPARK-40069
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Kai-Hsun Chen


Extend the new heartbeat mechanism in SPARK-39984 to Kubernetes.

 

SPARK-39984 issue:

[https://issues.apache.org/jira/projects/SPARK/issues/SPARK-39984?filter=allopenissues]

 

SPARK-39984 PR:

[https://github.com/apache/spark/pull/37411]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40067) Add table name to Spark plan node in SparkUI

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579209#comment-17579209
 ] 

Apache Spark commented on SPARK-40067:
--

User 'sumeetgajjar' has created a pull request for this issue:
https://github.com/apache/spark/pull/37505

> Add table name to Spark plan node in SparkUI
> 
>
> Key: SPARK-40067
> URL: https://issues.apache.org/jira/browse/SPARK-40067
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.4.0
>Reporter: Sumeet
>Priority: Major
>
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the 
> `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node 
> in SparkUI.
> However, a better suggestion was to use `Table#name()`. Furthermore, we can 
> also extract other useful information from `Table`; thus, revert 
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use 
> `Table` to fetch the relevant information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40067) Add table name to Spark plan node in SparkUI

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40067:


Assignee: (was: Apache Spark)

> Add table name to Spark plan node in SparkUI
> 
>
> Key: SPARK-40067
> URL: https://issues.apache.org/jira/browse/SPARK-40067
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.4.0
>Reporter: Sumeet
>Priority: Major
>
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the 
> `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node 
> in SparkUI.
> However, a better suggestion was to use `Table#name()`. Furthermore, we can 
> also extract other useful information from `Table`; thus, revert 
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use 
> `Table` to fetch the relevant information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40067) Add table name to Spark plan node in SparkUI

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40067:


Assignee: Apache Spark

> Add table name to Spark plan node in SparkUI
> 
>
> Key: SPARK-40067
> URL: https://issues.apache.org/jira/browse/SPARK-40067
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.4.0
>Reporter: Sumeet
>Assignee: Apache Spark
>Priority: Major
>
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the 
> `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node 
> in SparkUI.
> However, a better suggestion was to use `Table#name()`. Furthermore, we can 
> also extract other useful information from `Table`; thus, revert 
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use 
> `Table` to fetch the relevant information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40068) Extend new heartbeat mechanism to YARN

2022-08-12 Thread Kai-Hsun Chen (Jira)
Kai-Hsun Chen created SPARK-40068:
-

 Summary: Extend new heartbeat mechanism to YARN
 Key: SPARK-40068
 URL: https://issues.apache.org/jira/browse/SPARK-40068
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Kai-Hsun Chen


Extend the new heartbeat mechanism in SPARK-39984 to YARN.

 

SPARK-39984 issue:

[https://issues.apache.org/jira/projects/SPARK/issues/SPARK-39984?filter=allopenissues]

 

SPARK-39984 PR:

[https://github.com/apache/spark/pull/37411]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40067) Add table name to Spark plan node in SparkUI

2022-08-12 Thread Sumeet (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumeet updated SPARK-40067:
---
Description: 
[SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the 
`Scan#name()` API to expose the name of the TableScan in the `BatchScan` node 
in SparkUI.
However, a better suggestion was to use `Table#name()`. Furthermore, we can 
also extract other useful information from `Table`; thus, revert 
[SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use `Table` 
to fetch the relevant information.
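As a rough illustration (not taken from the ticket; the class name, table name, and schema below are hypothetical), `Table#name()` is the DSv2 hook whose value would be surfaced in the `BatchScan` node:

{code:scala}
import java.util.Collections

import org.apache.spark.sql.connector.catalog.{Table, TableCapability}
import org.apache.spark.sql.types.StructType

// Minimal DSv2 table: the string returned by name() is what the plan node would show.
class MyTable extends Table {
  override def name(): String = "my_catalog.my_db.my_table"
  override def schema(): StructType = new StructType().add("id", "long")
  override def capabilities(): java.util.Set[TableCapability] =
    Collections.singleton(TableCapability.BATCH_READ)
}
{code}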

> Add table name to Spark plan node in SparkUI
> 
>
> Key: SPARK-40067
> URL: https://issues.apache.org/jira/browse/SPARK-40067
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 3.4.0
>Reporter: Sumeet
>Priority: Major
>
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the 
> `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node 
> in SparkUI.
> However, a better suggestion was to use `Table#name()`. Furthermore, we can 
> also extract other useful information from `Table`; thus, revert 
> [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use 
> `Table` to fetch the relevant information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40067) Add table name to Spark plan node in SparkUI

2022-08-12 Thread Sumeet (Jira)
Sumeet created SPARK-40067:
--

 Summary: Add table name to Spark plan node in SparkUI
 Key: SPARK-40067
 URL: https://issues.apache.org/jira/browse/SPARK-40067
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.4.0
Reporter: Sumeet






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579202#comment-17579202
 ] 

Apache Spark commented on SPARK-40065:
--

User 'nsuke' has created a pull request for this issue:
https://github.com/apache/spark/pull/37504

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it is not mounted if the resource profile is non-default.
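For context, a minimal sketch of the reported scenario (illustrative only; the resource amounts and the job itself are placeholders), where executors requested through a non-default ResourceProfile should still get the ConfigMap mounted:

{code:scala}
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder}
import org.apache.spark.sql.SparkSession

object ConfigMapProfileSketch {
  def main(args: Array[String]): Unit = {
    // ConfigMap mounting is explicitly left enabled.
    val spark = SparkSession.builder()
      .appName("configmap-profile-sketch")
      .config("spark.kubernetes.executor.disableConfigMap", "false")
      .getOrCreate()

    // Non-default resource profile; executors launched for it are the ones
    // that reportedly come up without the ConfigMap volume.
    val requests = new ExecutorResourceRequests().cores(2).memory("4g")
    val profile = new ResourceProfileBuilder().require(requests).build()

    val rdd = spark.sparkContext.parallelize(1 to 100, 10).withResources(profile)
    rdd.count()

    spark.stop()
  }
}
{code}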



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40065:


Assignee: (was: Apache Spark)

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it is not mounted if the resource profile is non-default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579201#comment-17579201
 ] 

Apache Spark commented on SPARK-40065:
--

User 'nsuke' has created a pull request for this issue:
https://github.com/apache/spark/pull/37504

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it is not mounted if the resource profile is non-default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40065:


Assignee: Apache Spark

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Assignee: Apache Spark
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it is not mounted if the resource profile is non-default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Description: 
When the executor ConfigMap was made optional in SPARK-34316, volume mounting was 
erroneously disabled unconditionally whenever a non-default profile is used.

When spark.kubernetes.executor.disableConfigMap is false, the expected behavior is 
that the ConfigMap is mounted regardless of the executor's resource profile. 
However, it is not mounted if the resource profile is non-default.

  was:
When executor config map is made optional in SPARK-34316, mount volume is 
unconditionally disabled erroneously when non-default profile is used.

When spark.kubernetes.executor.disableConfigMap is false, expected behavior is 
that the ConfigMap is mounted regardless of executor's resource profile. 
However, it was not mounted if the resource profile is non-default.


> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it is not mounted if the resource profile is non-default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Description: 
When the executor ConfigMap was made optional in SPARK-34316, volume mounting was 
erroneously disabled unconditionally whenever a non-default profile is used.

When spark.kubernetes.executor.disableConfigMap is false, the expected behavior is 
that the ConfigMap is mounted regardless of the executor's resource profile. 
However, it was not mounted if the resource profile is non-default.

  was:When the resource profile is non-default, executor configmap is not 
mounted even if spark.kubernetes.executor.disableConfigMap is false.


> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the executor ConfigMap was made optional in SPARK-34316, volume mounting 
> was erroneously disabled unconditionally whenever a non-default profile is used.
> When spark.kubernetes.executor.disableConfigMap is false, the expected behavior 
> is that the ConfigMap is mounted regardless of the executor's resource profile. 
> However, it was not mounted if the resource profile is non-default.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40066:


Assignee: Gengliang Wang  (was: Apache Spark)

> ANSI mode: always return null on invalid access to map column
> -
>
> Key: SPARK-40066
> URL: https://issues.apache.org/jira/browse/SPARK-40066
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Since https://github.com/apache/spark/pull/30386, Spark always throws an 
> error on invalid access to a map column. There is no such syntax in the ANSI 
> SQL standard, since it has no Map type; the closest type, `multiset`, returns 
> null on access to a non-existing element.
> Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return 
> null when a map (JSON) key does not exist.
> I suggest loosening the behavior here. When users hit the error, most of them 
> will just switch to `try_element_at()` to get the null-returning behavior, or 
> simply turn off ANSI SQL mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40066:


Assignee: Apache Spark  (was: Gengliang Wang)

> ANSI mode: always return null on invalid access to map column
> -
>
> Key: SPARK-40066
> URL: https://issues.apache.org/jira/browse/SPARK-40066
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Since https://github.com/apache/spark/pull/30386, Spark always throws an 
> error on invalid access to a map column. There is no such syntax in the ANSI 
> SQL standard, since it has no Map type; the closest type, `multiset`, returns 
> null on access to a non-existing element.
> Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return 
> null when a map (JSON) key does not exist.
> I suggest loosening the behavior here. When users hit the error, most of them 
> will just switch to `try_element_at()` to get the null-returning behavior, or 
> simply turn off ANSI SQL mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579200#comment-17579200
 ] 

Apache Spark commented on SPARK-40066:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37503

> ANSI mode: always return null on invalid access to map column
> -
>
> Key: SPARK-40066
> URL: https://issues.apache.org/jira/browse/SPARK-40066
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Since https://github.com/apache/spark/pull/30386, Spark always throws an 
> error on invalid access to a map column. There is no such syntax in the ANSI 
> SQL standard, since it has no Map type; the closest type, `multiset`, returns 
> null on access to a non-existing element.
> Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return 
> null when a map (JSON) key does not exist.
> I suggest loosening the behavior here. When users hit the error, most of them 
> will just switch to `try_element_at()` to get the null-returning behavior, or 
> simply turn off ANSI SQL mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-12 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-40066:
---
Description: 
Since https://github.com/apache/spark/pull/30386, Spark always throws an error 
on invalid access to a map column. There is no such syntax in the ANSI SQL 
standard, since it has no Map type; the closest type, `multiset`, returns null 
on access to a non-existing element.
Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return null 
when a map (JSON) key does not exist.
I suggest loosening the behavior here. When users hit the error, most of them 
will just switch to `try_element_at()` to get the null-returning behavior, or 
simply turn off ANSI SQL mode.
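For illustration (a sketch, not from the ticket; the map literal and keys are made up), the current contrast between `element_at` and `try_element_at` under ANSI mode:

{code:scala}
import org.apache.spark.sql.SparkSession

object AnsiMapAccessSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ansi-map-access").getOrCreate()
    spark.conf.set("spark.sql.ansi.enabled", "true")

    // try_element_at returns NULL for a missing key, with or without ANSI mode.
    spark.sql("SELECT try_element_at(map('a', 1), 'b') AS v").show()

    // element_at on a missing key currently throws under ANSI mode; the proposal
    // is to return NULL instead, in line with multiset-like behavior.
    // spark.sql("SELECT element_at(map('a', 1), 'b') AS v").show()

    spark.stop()
  }
}
{code}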

> ANSI mode: always return null on invalid access to map column
> -
>
> Key: SPARK-40066
> URL: https://issues.apache.org/jira/browse/SPARK-40066
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Since https://github.com/apache/spark/pull/30386, Spark always throws an 
> error on invalid access to a map column. There is no such syntax in the ANSI 
> SQL standard, since it has no Map type; the closest type, `multiset`, returns 
> null on access to a non-existing element.
> Also, I investigated PostgreSQL/Snowflake/BigQuery, and all of them return 
> null when a map (JSON) key does not exist.
> I suggest loosening the behavior here. When users hit the error, most of them 
> will just switch to `try_element_at()` to get the null-returning behavior, or 
> simply turn off ANSI SQL mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Description: When the resource profile is non-default, executor configmap 
is not mounted even if spark.kubernetes.executor.disableConfigMap is false.  
(was: When the resource profile is non-default, executor configmap is not 
created even if spark.kubernetes.executor.disableConfigMap is false.)

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the resource profile is non-default, executor configmap is not mounted 
> even if spark.kubernetes.executor.disableConfigMap is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not mounted if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Summary: Executor ConfigMap is not mounted if profile is not default  (was: 
Executor ConfigMap is not created if profile is not default)

> Executor ConfigMap is not mounted if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the resource profile is non-default, executor configmap is not created 
> even if spark.kubernetes.executor.disableConfigMap is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-40049.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37500
[https://github.com/apache/spark/pull/37500]

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.
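A rough sketch of what such a case could exercise (illustrative only; the query below is just one shape the ReplaceNullWithFalseInPredicate rule rewrites and is not taken from the suite):

{code:scala}
import org.apache.spark.sql.SparkSession

object AqeForceApplySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("aqe-force-apply").getOrCreate()

    // Force adaptive query execution even where it would normally not kick in.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.forceApply", "true")

    // A predicate containing a null literal, the pattern the rule replaces with false.
    spark.range(10).createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE IF(id > 5, null, false)").show()

    spark.stop()
  }
}
{code}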



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-40049:
-

Assignee: Kazuyuki Tanimura

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not created if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Affects Version/s: 3.2.1
   3.2.0

> Executor ConfigMap is not created if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the resource profile is non-default, executor configmap is not created 
> even if spark.kubernetes.executor.disableConfigMap is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40065) Executor ConfigMap is not created if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nobuaki Sukegawa updated SPARK-40065:
-
Affects Version/s: 3.2.2

> Executor ConfigMap is not created if profile is not default
> ---
>
> Key: SPARK-40065
> URL: https://issues.apache.org/jira/browse/SPARK-40065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0, 3.2.2
>Reporter: Nobuaki Sukegawa
>Priority: Minor
>
> When the resource profile is non-default, executor configmap is not created 
> even if spark.kubernetes.executor.disableConfigMap is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40066) ANSI mode: always return null on invalid access to map column

2022-08-12 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-40066:
--

 Summary: ANSI mode: always return null on invalid access to map 
column
 Key: SPARK-40066
 URL: https://issues.apache.org/jira/browse/SPARK-40066
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40037) Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40037.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/37473

> Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0
> ---
>
> Key: SPARK-40037
> URL: https://issues.apache.org/jira/browse/SPARK-40037
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [CVE-2022-25647|https://www.cve.org/CVERecord?id=CVE-2022-25647]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLECODEGSON-1730327]
> [CVE-2021-22569|https://www.cve.org/CVERecord?id=CVE-2021-22569]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-2331703]
> [releases log|https://github.com/google/tink/releases/tag/v1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40037) Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40037:
-
Priority: Minor  (was: Major)

> Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0
> ---
>
> Key: SPARK-40037
> URL: https://issues.apache.org/jira/browse/SPARK-40037
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Minor
>
> [CVE-2022-25647|https://www.cve.org/CVERecord?id=CVE-2022-25647]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLECODEGSON-1730327]
> [CVE-2021-22569|https://www.cve.org/CVERecord?id=CVE-2021-22569]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-2331703]
> [releases log|https://github.com/google/tink/releases/tag/v1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40037) Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40037:


Assignee: Bjørn Jørgensen

> Upgrade com.google.crypto.tink:tink from 1.6.1 to 1.7.0
> ---
>
> Key: SPARK-40037
> URL: https://issues.apache.org/jira/browse/SPARK-40037
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [CVE-2022-25647|https://www.cve.org/CVERecord?id=CVE-2022-25647]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLECODEGSON-1730327]
> [CVE-2021-22569|https://www.cve.org/CVERecord?id=CVE-2021-22569]
> [Info at 
> SNYK|https://security.snyk.io/vuln/SNYK-JAVA-COMGOOGLEPROTOBUF-2331703]
> [releases log|https://github.com/google/tink/releases/tag/v1.7.0]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38969) Graceful decommissioning on Kubernetes fails / decom script error

2022-08-12 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-38969.
--
Fix Version/s: 3.4.0
 Assignee: Holden Karau
   Resolution: Fixed

Updated the decommissioning script to be more resilient and to block for as long 
as it takes the executor to exit. K8s will still kill the pod if it exceeds the 
graceful shutdown time limit, so we don't have to worry too much about blocking 
forever there.

 

Also updated how we tag executor loss reasons for executors which decommission 
too "quickly"

 

See https://github.com/apache/spark/pull/36434/files

> Graceful decommissioning on Kubernetes fails / decom script error
> -
>
> Key: SPARK-38969
> URL: https://issues.apache.org/jira/browse/SPARK-38969
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
> Environment: Running spark-thriftserver (3.2.0) on Kubernetes (GKE 
> 1.20.15-gke.2500). 
>  
>Reporter: Yeachan Park
>Assignee: Holden Karau
>Priority: Minor
> Fix For: 3.4.0
>
>
> Hello, we are running into an issue while attempting graceful 
> decommissioning of executors. We enabled:
>  * spark.decommission.enabled 
>  * spark.storage.decommission.rddBlocks.enabled
>  * spark.storage.decommission.shuffleBlocks.enabled
>  * spark.storage.decommission.enabled
> and set spark.storage.decommission.fallbackStorage.path to a path in our 
> bucket.
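(For reference, a sketch of the settings just listed, set programmatically; the fallback path below is a placeholder, not the reporter's actual bucket.)

{code:scala}
import org.apache.spark.sql.SparkSession

// Decommissioning configuration as described in this report.
val spark = SparkSession.builder()
  .appName("decommission-sketch")
  .config("spark.decommission.enabled", "true")
  .config("spark.storage.decommission.enabled", "true")
  .config("spark.storage.decommission.rddBlocks.enabled", "true")
  .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .config("spark.storage.decommission.fallbackStorage.path", "gs://example-bucket/spark-fallback/")
  .getOrCreate()
{code}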
>  
> The logs from the driver seems to suggest the decommissioning process started 
> but then unexpectedly exited and failed:
>  
> ```
> 22/04/20 15:09:09 WARN 
> KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Received executor 
> 3 decommissioned message
> 22/04/20 15:09:09 INFO KubernetesClusterSchedulerBackend: Decommission 
> executors: 3
> 22/04/20 15:09:09 INFO BlockManagerMasterEndpoint: Mark BlockManagers 
> (BlockManagerId(3, 100.96.1.130, 44789, None)) as being decommissioning.
> 22/04/20 15:09:10 ERROR TaskSchedulerImpl: Lost executor 3 on 100.96.1.130: 
> Executor decommission.
> 22/04/20 15:09:10 INFO DAGScheduler: Executor lost: 3 (epoch 2)
> 22/04/20 15:09:10 INFO ExecutorMonitor: Executor 3 is removed. Remove reason 
> statistics: (gracefully decommissioned: 0, decommision unfinished: 0, driver 
> killed: 0, unexpectedly exited: 3).
> 22/04/20 15:09:10 INFO BlockManagerMasterEndpoint: Trying to remove executor 
> 3 from BlockManagerMaster.
> 22/04/20 15:09:10 INFO BlockManagerMasterEndpoint: Removing block manager 
> BlockManagerId(3, 100.96.1.130, 44789, None)
> 22/04/20 15:09:10 INFO BlockManagerMaster: Removed 3 successfully in 
> removeExecutor
> 22/04/20 15:09:10 INFO DAGScheduler: Shuffle files lost for executor: 3 
> (epoch 2)
> ```
>  
> However, the executor logs seem to suggest that decommissioning was 
> successful:
>  
> ```
> 22/04/20 15:09:09 INFO CoarseGrainedExecutorBackend: Decommission executor 3.
> 22/04/20 15:09:09 INFO CoarseGrainedExecutorBackend: Will exit when finished 
> decommissioning
> 22/04/20 15:09:09 INFO BlockManager: Starting block manager decommissioning 
> process...
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Starting block migration
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Attempting to migrate all 
> RDD blocks
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Attempting to migrate all 
> shuffle blocks
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Start refreshing 
> migratable shuffle blocks
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: 0 of 0 local shuffles are 
> added. In total, 0 shuffles are remained.
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Attempting to migrate all 
> cached RDD blocks
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Starting shuffle block 
> migration thread for BlockManagerId(4, 100.96.1.131, 35607, None)
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Starting shuffle block 
> migration thread for BlockManagerId(fallback, remote, 7337, None)
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Finished current round 
> refreshing migratable shuffle blocks, waiting for 3ms before the next 
> round refreshing.
> 22/04/20 15:09:10 WARN BlockManagerDecommissioner: Asked to decommission RDD 
> cache blocks, but no blocks to migrate
> 22/04/20 15:09:10 INFO BlockManagerDecommissioner: Finished current round RDD 
> blocks migration, waiting for 3ms before the next round migration.
> 22/04/20 15:09:10 INFO CoarseGrainedExecutorBackend: Checking to see if we 
> can shutdown.
> 22/04/20 15:09:10 INFO CoarseGrainedExecutorBackend: No running tasks, 
> checking migrations
> 22/04/20 15:09:10 INFO CoarseGrainedExecutorBackend: No running tasks, all 
> blocks migrated, stopping.
> 22/04/20 15:09:10 ERROR 

[jira] [Created] (SPARK-40065) Executor ConfigMap is not created if profile is not default

2022-08-12 Thread Nobuaki Sukegawa (Jira)
Nobuaki Sukegawa created SPARK-40065:


 Summary: Executor ConfigMap is not created if profile is not 
default
 Key: SPARK-40065
 URL: https://issues.apache.org/jira/browse/SPARK-40065
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Nobuaki Sukegawa


When the resource profile is non-default, executor configmap is not created 
even if spark.kubernetes.executor.disableConfigMap is false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-40052:


Assignee: Ivan Sadikov

> Handle direct byte buffers in VectorizedDeltaBinaryPackedReader
> ---
>
> Key: SPARK-40052
> URL: https://issues.apache.org/jira/browse/SPARK-40052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-40052.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37485
[https://github.com/apache/spark/pull/37485]

> Handle direct byte buffers in VectorizedDeltaBinaryPackedReader
> ---
>
> Key: SPARK-40052
> URL: https://issues.apache.org/jira/browse/SPARK-40052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40064:


Assignee: (was: Apache Spark)

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Priority: Major
>
>  Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40064:


Assignee: Apache Spark

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>
>  Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579150#comment-17579150
 ] 

Apache Spark commented on SPARK-40064:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/37502

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Priority: Major
>
>  Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579149#comment-17579149
 ] 

Apache Spark commented on SPARK-40064:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/37502

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Priority: Major
>
>  Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40064:
--

 Summary: Use V2 Filter in SupportsOverwrite
 Key: SPARK-40064
 URL: https://issues.apache.org/jira/browse/SPARK-40064
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


 Add V2 Filter support in SupportsOverwrite



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39528) Use V2 Filter in SupportsRuntimeFiltering

2022-08-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39528:
---
Parent: SPARK-36555
Issue Type: Sub-task  (was: Improvement)

> Use V2 Filter in SupportsRuntimeFiltering
> -
>
> Key: SPARK-39528
> URL: https://issues.apache.org/jira/browse/SPARK-39528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, SupportsRuntimeFiltering uses v1 filter. We should use v2 filter 
> instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39966) Use V2 Filter in SupportsDelete

2022-08-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39966:
---
Parent: SPARK-36555
Issue Type: Sub-task  (was: Improvement)

> Use V2 Filter in SupportsDelete
> ---
>
> Key: SPARK-39966
> URL: https://issues.apache.org/jira/browse/SPARK-39966
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.

Setting one column as index also didn't work.

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.


> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
> Environment: Databricks Runtime 11.1
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>  Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1) {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.
> Setting one column as index also didn't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
   Language: Python
Environment: Databricks Runtime 11.1
 Labels: Pandas PySpark  (was: )

> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
> Environment: Databricks Runtime 11.1
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>  Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1) {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39926) Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39926:


Assignee: (was: Apache Spark)

> Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans
> ---
>
> Key: SPARK-39926
> URL: https://issues.apache.org/jira/browse/SPARK-39926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> How to reproduce:
> {code:sql}
> set spark.sql.parquet.enableVectorizedReader=false;
> create table t(a int) using parquet;
> insert into t values (42);
> alter table t add column b int default 42;
> insert into t values (43, null);
> select * from t;
> {code}
> This should return two rows:
> (42, 42) and (43, NULL)
> But instead the scan misses the inserted NULL value, and returns the 
> existence DEFAULT value of "42" instead:
> (42, 42) and (43, 42).
>  
> This bug happens because the Parquet API calls one of these set* methods in 
> ParquetRowConverter.scala whenever it finds a non-NULL value:
> {code:scala}
> private class RowUpdater(row: InternalRow, ordinal: Int)
> extends ParentContainerUpdater {
>   override def set(value: Any): Unit = row(ordinal) = value
>   override def setBoolean(value: Boolean): Unit = row.setBoolean(ordinal, 
> value)
>   override def setByte(value: Byte): Unit = row.setByte(ordinal, value)
>   override def setShort(value: Short): Unit = row.setShort(ordinal, value)
>   override def setInt(value: Int): Unit = row.setInt(ordinal, value)
>   override def setLong(value: Long): Unit = row.setLong(ordinal, value)
>   override def setDouble(value: Double): Unit = row.setDouble(ordinal, value)
>   override def setFloat(value: Float): Unit = row.setFloat(ordinal, value)
> }
>  {code}
>  
> But it never calls anything like "setNull()" when encountering a NULL value.
> To fix the bug, we need to know how many columns of data were present in each 
> row of the Parquet data, so we can differentiate between a NULL value and a 
> missing column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39926) Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39926:


Assignee: Apache Spark

> Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans
> ---
>
> Key: SPARK-39926
> URL: https://issues.apache.org/jira/browse/SPARK-39926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>
> How to reproduce:
> {code:sql}
> set spark.sql.parquet.enableVectorizedReader=false;
> create table t(a int) using parquet;
> insert into t values (42);
> alter table t add column b int default 42;
> insert into t values (43, null);
> select * from t;
> {code}
> This should return two rows:
> (42, 42) and (43, NULL)
> But instead the scan misses the inserted NULL value, and returns the 
> existence DEFAULT value of "42" instead:
> (42, 42) and (43, 42).
>  
> This bug happens because the Parquet API calls one of these set* methods in 
> ParquetRowConverter.scala whenever it finds a non-NULL value:
> {code:scala}
> private class RowUpdater(row: InternalRow, ordinal: Int)
> extends ParentContainerUpdater {
>   override def set(value: Any): Unit = row(ordinal) = value
>   override def setBoolean(value: Boolean): Unit = row.setBoolean(ordinal, 
> value)
>   override def setByte(value: Byte): Unit = row.setByte(ordinal, value)
>   override def setShort(value: Short): Unit = row.setShort(ordinal, value)
>   override def setInt(value: Int): Unit = row.setInt(ordinal, value)
>   override def setLong(value: Long): Unit = row.setLong(ordinal, value)
>   override def setDouble(value: Double): Unit = row.setDouble(ordinal, value)
>   override def setFloat(value: Float): Unit = row.setFloat(ordinal, value)
> }
>  {code}
>  
> But it never calls anything like "setNull()" when encountering a NULL value.
> To fix the bug, we need to know how many columns of data were present in each 
> row of the Parquet data, so we can differentiate between a NULL value and a 
> missing column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39926) Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579142#comment-17579142
 ] 

Apache Spark commented on SPARK-39926:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/37501

> Fix bug in existence DEFAULT value lookups for non-vectorized Parquet scans
> ---
>
> Key: SPARK-39926
> URL: https://issues.apache.org/jira/browse/SPARK-39926
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
> How to reproduce:
> {code:sql}
> set spark.sql.parquet.enableVectorizedReader=false;
> create table t(a int) using parquet;
> insert into t values (42);
> alter table t add column b int default 42;
> insert into t values (43, null);
> select * from t;
> {code}
> This should return two rows:
> (42, 42) and (43, NULL)
> But instead the scan misses the inserted NULL value, and returns the 
> existence DEFAULT value of "42" instead:
> (42, 42) and (43, 42).
>  
> This bug happens because the Parquet API calls one of these set* methods in 
> ParquetRowConverter.scala whenever it finds a non-NULL value:
> {code:scala}
> private class RowUpdater(row: InternalRow, ordinal: Int)
> extends ParentContainerUpdater {
>   override def set(value: Any): Unit = row(ordinal) = value
>   override def setBoolean(value: Boolean): Unit = row.setBoolean(ordinal, 
> value)
>   override def setByte(value: Byte): Unit = row.setByte(ordinal, value)
>   override def setShort(value: Short): Unit = row.setShort(ordinal, value)
>   override def setInt(value: Int): Unit = row.setInt(ordinal, value)
>   override def setLong(value: Long): Unit = row.setLong(ordinal, value)
>   override def setDouble(value: Double): Unit = row.setDouble(ordinal, value)
>   override def setFloat(value: Float): Unit = row.setFloat(ordinal, value)
> }
>  {code}
>  
> But it never calls anything like "setNull()" when encountering a NULL value.
> To fix the bug, we need to know how many columns of data were present in each 
> row of the Parquet data, so we can differentiate between a NULL value and a 
> missing column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
 

A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.


> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1) {code}
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2 

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1) {code}
 

A workaround is to assign the results to a new column instead of the same one, 
but if the old column is dropped, the same error is produced.

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1){code}


> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2 
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1) {code}
>  
> A workaround is to assign the results to a new column instead of the same 
> one, but if the old column is dropped, the same error is produced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Summary: pyspark.pandas .apply() changing rows ordering  (was: 
pyspark.pandas .apply() chaging rows ordering)

> pyspark.pandas .apply() changing rows ordering
> --
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() chaging rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
def example_func(df_col):
  return df_col ** 2

df['row_to_apply_function'] = df.apply(lambda row: 
example_func(row['row_to_apply_function']), axis=1){code}

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
df['row_to_apply_function'] = df.apply(lambda row: 
func(row['row_to_apply_function']), axis=1){code}


> pyspark.pandas .apply() chaging rows ordering
> -
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> def example_func(df_col):
>   return df_col ** 2
> df['row_to_apply_function'] = df.apply(lambda row: 
> example_func(row['row_to_apply_function']), axis=1){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() chaging rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

 

A command like this:


{code:java}
df['row_to_apply_function'] = df.apply(lambda row: 
func(row['row_to_apply_function']), axis=1){code}

> pyspark.pandas .apply() chaging rows ordering
> -
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
>  
> A command like this:
> {code:java}
> df['row_to_apply_function'] = df.apply(lambda row: 
> func(row['row_to_apply_function']), axis=1){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() chaging rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Description: 
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

A command like this:
{code:java}
df['row_to_apply_function'] = df.apply(lambda row: 
func(row['row_to_apply_function']), axis=1){code}

  was:
When using the apply function to apply a function to a DataFrame column, it 
ends up mixing the column's rows ordering.

 

A command like this:


{code:java}
df['row_to_apply_function'] = df.apply(lambda row: 
func(row['row_to_apply_function']), axis=1){code}


> pyspark.pandas .apply() chaging rows ordering
> -
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>
> When using the apply function to apply a function to a DataFrame column, it 
> ends up mixing the column's rows ordering.
> A command like this:
> {code:java}
> df['row_to_apply_function'] = df.apply(lambda row: 
> func(row['row_to_apply_function']), axis=1){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40049:


Assignee: Apache Spark

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Apache Spark
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579138#comment-17579138
 ] 

Apache Spark commented on SPARK-40049:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/37500

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40049) Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40049:


Assignee: (was: Apache Spark)

> Add adaptive plan case in ReplaceNullWithFalseInPredicateEndToEndSuite
> --
>
> Key: SPARK-40049
> URL: https://issues.apache.org/jira/browse/SPARK-40049
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Currently `ReplaceNullWithFalseInPredicateEndToEndSuite` assumes that 
> adaptive query execution is turned off. We should add cases with 
> `spark.sql.adaptive.forceApply=true`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40063) pyspark.pandas .apply() chaging rows ordering

2022-08-12 Thread Marcelo Rossini Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Rossini Castro updated SPARK-40063:
---
Summary: pyspark.pandas .apply() chaging rows ordering  (was: 
pyspark.pandas .apply() chaging rows order)

> pyspark.pandas .apply() chaging rows ordering
> -
>
> Key: SPARK-40063
> URL: https://issues.apache.org/jira/browse/SPARK-40063
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.3.0
>Reporter: Marcelo Rossini Castro
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40063) pyspark.pandas .apply() chaging rows order

2022-08-12 Thread Marcelo Rossini Castro (Jira)
Marcelo Rossini Castro created SPARK-40063:
--

 Summary: pyspark.pandas .apply() chaging rows order
 Key: SPARK-40063
 URL: https://issues.apache.org/jira/browse/SPARK-40063
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark
Affects Versions: 3.3.0
Reporter: Marcelo Rossini Castro






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40062) Spark - Creating Sub Folder while writing to Partitioned Hive Table

2022-08-12 Thread dinesh (Jira)
dinesh created SPARK-40062:
--

 Summary: Spark - Creating Sub Folder while writing to Partitioned 
Hive Table
 Key: SPARK-40062
 URL: https://issues.apache.org/jira/browse/SPARK-40062
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.4.7
Reporter: dinesh


We had been writing to a Partitioned Hive Table and realized that the data being 
written ends up in an extra sub-folder.

For ex- Refer Table definition as below - 

_Create table T1 ( name string, address string) Partitioned by (process_date 
string) stored as parquet location '/mytable/a/b/c/org=employee';_

 

While writing to the table, the HDFS path being written looks something like this - 

/mytable/a/b/c/org=employee/process_date=20220812/_org=employee_

 

The unnecessary addition of _org=employee_ after the process_date partition is 
because the Hive table location contains an "=" character, which Hive uses as 
syntax to determine partition columns.

Re-defining the table resolves the above problem - 

_Create table T1 ( name string, address string) Partitioned by (process_date 
string) stored as parquet location '/mytable/a/b/c/employee';_



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40061) Document cast of ANSI intervals

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40061:


Assignee: Max Gekk  (was: Apache Spark)

> Document cast of ANSI intervals
> ---
>
> Key: SPARK-40061
> URL: https://issues.apache.org/jira/browse/SPARK-40061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Update the doc page 
> https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#cast 
> regarding cast of ANSI intervals to/from decimals/integrals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40061) Document cast of ANSI intervals

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579096#comment-17579096
 ] 

Apache Spark commented on SPARK-40061:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/37495

> Document cast of ANSI intervals
> ---
>
> Key: SPARK-40061
> URL: https://issues.apache.org/jira/browse/SPARK-40061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Update the doc page 
> https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#cast 
> regarding cast of ANSI intervals to/from decimals/integrals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40061) Document cast of ANSI intervals

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40061:


Assignee: Apache Spark  (was: Max Gekk)

> Document cast of ANSI intervals
> ---
>
> Key: SPARK-40061
> URL: https://issues.apache.org/jira/browse/SPARK-40061
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Update the doc page 
> https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#cast 
> regarding cast of ANSI intervals to/from decimals/integrals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40061) Document cast of ANSI intervals

2022-08-12 Thread Max Gekk (Jira)
Max Gekk created SPARK-40061:


 Summary: Document cast of ANSI intervals
 Key: SPARK-40061
 URL: https://issues.apache.org/jira/browse/SPARK-40061
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


Update the doc page 
https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#cast 
regarding cast of ANSI intervals to/from decimals/integrals.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread ZiyueGuan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZiyueGuan updated SPARK-40058:
--
Affects Version/s: 3.4.0
   (was: 3.2.2)

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: ZiyueGuan
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40060:


Assignee: Apache Spark

> Add numberDecommissioningExecutors metric
> -
>
> Key: SPARK-40060
> URL: https://issues.apache.org/jira/browse/SPARK-40060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Assignee: Apache Spark
>Priority: Minor
>
> The number of decommissioning executors should be exposed as a metric



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40060:


Assignee: (was: Apache Spark)

> Add numberDecommissioningExecutors metric
> -
>
> Key: SPARK-40060
> URL: https://issues.apache.org/jira/browse/SPARK-40060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> The number of decommissioning executors should be exposed as a metric



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579058#comment-17579058
 ] 

Apache Spark commented on SPARK-40060:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/37499

> Add numberDecommissioningExecutors metric
> -
>
> Key: SPARK-40060
> URL: https://issues.apache.org/jira/browse/SPARK-40060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> The number of decommissioning executors should be exposed as a metric



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40060) Add numberDecommissioningExecutors metric

2022-08-12 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-40060:


 Summary: Add numberDecommissioningExecutors metric
 Key: SPARK-40060
 URL: https://issues.apache.org/jira/browse/SPARK-40060
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Zhongwei Zhu


The number of decommissioning executors should be exposed as a metric



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40054) Restore the error handling syntax of try_cast()

2022-08-12 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-40054.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37486
[https://github.com/apache/spark/pull/37486]

> Restore the error handling syntax of try_cast()
> ---
>
> Key: SPARK-40054
> URL: https://issues.apache.org/jira/browse/SPARK-40054
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> For the following query
> {code:java}
> SET spark.sql.ansi.enabled=true;
> SELECT try_cast(1/0 AS string); {code}
> Spark 3.3 will throw an exception for the division by zero error. In the current 
> master branch, it returns null after the refactoring PR 
> https://github.com/apache/spark/pull/36703
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579047#comment-17579047
 ] 

Apache Spark commented on SPARK-40058:
--

User 'guanziyue' has created a pull request for this issue:
https://github.com/apache/spark/pull/37498

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: ZiyueGuan
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40058:


Assignee: (was: Apache Spark)

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: ZiyueGuan
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40058:


Assignee: Apache Spark

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: ZiyueGuan
>Assignee: Apache Spark
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579048#comment-17579048
 ] 

Apache Spark commented on SPARK-40058:
--

User 'guanziyue' has created a pull request for this issue:
https://github.com/apache/spark/pull/37498

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: ZiyueGuan
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40056) Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40056:


Assignee: BingKun Pan

> Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9
> -
>
> Key: SPARK-40056
> URL: https://issues.apache.org/jira/browse/SPARK-40056
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40056) Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40056:
-
Priority: Trivial  (was: Minor)

> Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9
> -
>
> Key: SPARK-40056
> URL: https://issues.apache.org/jira/browse/SPARK-40056
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40056) Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9

2022-08-12 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40056.
--
Resolution: Fixed

Issue resolved by pull request 37489
[https://github.com/apache/spark/pull/37489]

> Upgrade mvn-scalafmt from 1.0.4 to 1.1.1640084764.9f463a9
> -
>
> Key: SPARK-40056
> URL: https://issues.apache.org/jira/browse/SPARK-40056
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40020) centralize the code of qualifying identifiers in SessionCatalog

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579013#comment-17579013
 ] 

Apache Spark commented on SPARK-40020:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37497

> centralize the code of qualifying identifiers in SessionCatalog
> ---
>
> Key: SPARK-40020
> URL: https://issues.apache.org/jira/browse/SPARK-40020
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40020) centralize the code of qualifying identifiers in SessionCatalog

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579014#comment-17579014
 ] 

Apache Spark commented on SPARK-40020:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/37497

> centralize the code of qualifying identifiers in SessionCatalog
> ---
>
> Key: SPARK-40020
> URL: https://issues.apache.org/jira/browse/SPARK-40020
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40059) Row indexes can overshadow user-created data

2022-08-12 Thread Ala Luszczak (Jira)
Ala Luszczak created SPARK-40059:


 Summary: Row indexes can overshadow user-created data
 Key: SPARK-40059
 URL: https://issues.apache.org/jira/browse/SPARK-40059
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Ala Luszczak


https://github.com/apache/spark/pull/37228 introduces the ability to compute row 
indexes, which users can access through the `_metadata.row_index` column. 
Internally this is achieved with the help of an extra column 
`_tmp_metadata_row_index`. When present in the schema sent to the Parquet reader, 
the reader populates it with row indexes, and the values are later placed in 
the `_metadata` struct. 

While relatively unlikely, it's still possible that a user might want to 
include a column named `_tmp_metadata_row_index` in their data. In such a 
scenario, the column will be populated with row indexes rather than data read from the file.

For repro, search `FileMetadataStructRowIndexSuite.scala` for this Jira ticket 
number.

We could introduce some kind of countermeasure to handle this scenario.
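
For reference, a short sketch of how the user-facing row index from the 
description can be selected (the file path is hypothetical, and this assumes a 
build that already contains the linked PR):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Project the hidden metadata struct's row index next to the regular columns.
df = spark.read.parquet("/tmp/example_parquet_table")
df.select("*", "_metadata.row_index").show()
{code}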



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread ZiyueGuan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZiyueGuan updated SPARK-40058:
--
Component/s: Spark Core
 (was: SQL)

> Avoid filter twice in HadoopFSUtils
> ---
>
> Key: SPARK-40058
> URL: https://issues.apache.org/jira/browse/SPARK-40058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.2
>Reporter: ZiyueGuan
>Priority: Minor
>
> In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
> method call. This may waste more time when filter logic is heavy. Would like 
> to have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40057.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37492
[https://github.com/apache/spark/pull/37492]

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40057:


Assignee: Yikun Jiang

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39887) Expression transform error

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578973#comment-17578973
 ] 

Apache Spark commented on SPARK-39887:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/37496

> Expression transform error
> --
>
> Key: SPARK-39887
> URL: https://issues.apache.org/jira/browse/SPARK-39887
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: zhuml
>Priority: Major
>
> {code:java}
> spark.sql(
>   """
> |select to_date(a) a, to_date(b) b from
> |(select  a, a as b from
> |(select to_date(a) a from
> | values ('2020-02-01') as t1(a)
> | group by to_date(a)) t3
> |union all
> |select a, b from
> |(select to_date(a) a, to_date(b) b from
> |values ('2020-01-01','2020-01-02') as t1(a, b)
> | group by to_date(a), to_date(b)) t4) t5
> |group by to_date(a), to_date(b)
> |""".stripMargin).show(){code}
> result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-01)
> expected (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-02)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40013) DS V2 expressions should have the default toString

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40013:


Assignee: Apache Spark

> DS V2 expressions should have the default toString
> --
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40013) DS V2 expressions should have the default toString

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40013:


Assignee: (was: Apache Spark)

> DS V2 expressions should have the default toString
> --
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40013) DS V2 expressions should have the default toString

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40013:


Assignee: Apache Spark

> DS V2 expressions should have the default toString
> --
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40013) DS V2 expressions should have the default toString

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578936#comment-17578936
 ] 

Apache Spark commented on SPARK-40013:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/37494

> DS V2 expressions should have the default toString
> --
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40013) DS V2 expressions should have the default toString

2022-08-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-40013:
---
Summary: DS V2 expressions should have the default toString  (was: DS V2 
expressions should have the default implementation of toString)

> DS V2 expressions should have the default toString
> --
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40013) DS V2 expressions should have the default implementation of toString

2022-08-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng reopened SPARK-40013:


> DS V2 expressions should have the default implementation of toString
> 
>
> Key: SPARK-40013
> URL: https://issues.apache.org/jira/browse/SPARK-40013
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, V2 expressions are missing the default toString, which leads to 
> unexpected results.
> We should add a default implementation in the base class Expression using 
> ToStringSQLBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40014) Support cast of decimals to ANSI intervals

2022-08-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40014.
--
Resolution: Fixed

Issue resolved by pull request 37466
[https://github.com/apache/spark/pull/37466]

> Support cast of decimals to ANSI intervals
> --
>
> Key: SPARK-40014
> URL: https://issues.apache.org/jira/browse/SPARK-40014
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Support casts of decimal to ANSI intervals, and preserve the fractional parts 
> of seconds in the casts.
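
A hedged, self-contained sketch of the behaviour this adds (it assumes a build 
that already includes this change; the literal value is illustrative only):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Casting a decimal to an ANSI interval should keep the fractional part of the
# seconds (1.5 seconds rather than just 1).
spark.sql("SELECT CAST(1.5 AS INTERVAL SECOND) AS iv").show(truncate=False)
{code}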



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40014) Support cast of decimals to ANSI intervals

2022-08-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40014:


Assignee: Max Gekk

> Support cast of decimals to ANSI intervals
> --
>
> Key: SPARK-40014
> URL: https://issues.apache.org/jira/browse/SPARK-40014
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Support casts of decimal to ANSI intervals, and preserve the fractional parts 
> of seconds in the casts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40058) Avoid filter twice in HadoopFSUtils

2022-08-12 Thread ZiyueGuan (Jira)
ZiyueGuan created SPARK-40058:
-

 Summary: Avoid filter twice in HadoopFSUtils
 Key: SPARK-40058
 URL: https://issues.apache.org/jira/browse/SPARK-40058
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.2
Reporter: ZiyueGuan


In HadoopFSUtils, listLeafFiles will apply filter more than once in recursive 
method call. This may waste more time when filter logic is heavy. Would like to 
have a refactor on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37690) Recursive view `df` detected (cycle: `df` -> `df`)

2022-08-12 Thread Daniel Darabos (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578901#comment-17578901
 ] 

Daniel Darabos commented on SPARK-37690:


It's fixed in Spark 3.3.0. 
(https://github.com/apache/spark/commit/1d068cef38f2323967be83045118cef0e537e8dc)
 Does upgrading count as a workaround?

Or on 3.2 you can avoid the cycle error by saving the new table under a new 
name. 
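
A minimal sketch of that 3.2 workaround (the view names are hypothetical): 
register each intermediate result under a fresh name instead of replacing `df` 
with itself.

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.sql("SELECT id AS col_1, rand() AS col_2 FROM RANGE(10)")
df.createOrReplaceTempView("df_step1")

df = spark.sql("SELECT * FROM df_step1")
df.createOrReplaceTempView("df_step2")

df = spark.sql("SELECT * FROM df_step2")
{code}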

> Recursive view `df` detected (cycle: `df` -> `df`)
> --
>
> Key: SPARK-37690
> URL: https://issues.apache.org/jira/browse/SPARK-37690
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Robin
>Priority: Major
>
> In Spark 3.2.0, you can no longer reuse the same name for a temporary view.  
> This change is backwards incompatible, and means a common way of running 
> pipelines of SQL queries no longer works.   The following is a simple 
> reproducible example that works in Spark 2.x and 3.1.2, but not in 3.2.0: 
> {code:python}from pyspark.context import SparkContext 
> from pyspark.sql import SparkSession 
> sc = SparkContext.getOrCreate() 
> spark = SparkSession(sc) 
> sql = """ SELECT id as col_1, rand() AS col_2 FROM RANGE(10); """ 
> df = spark.sql(sql) 
> df.createOrReplaceTempView("df") 
> sql = """ SELECT * FROM df """ 
> df = spark.sql(sql) 
> df.createOrReplaceTempView("df") 
> sql = """ SELECT * FROM df """ 
> df = spark.sql(sql)
> {code}
> The following error is now produced:   
> {code:python}
> AnalysisException: Recursive view `df` detected (cycle: `df` -> `df`)
> {code}
> I'm reasonably sure this change is unintentional in 3.2.0 since it breaks a 
> lot of legacy code, and the `createOrReplaceTempView` method is named 
> explicitly such that replacing an existing view should be allowed.   An 
> internet search suggests other users have run into similar problems, e.g. 
> [here|https://community.databricks.com/s/question/0D53f1Qugr7CAB/upgrading-from-spark-24-to-32-recursive-view-errors-when-using]
>   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40057:


Assignee: Apache Spark

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578886#comment-17578886
 ] 

Apache Spark commented on SPARK-40057:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37492

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578885#comment-17578885
 ] 

Apache Spark commented on SPARK-40057:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/37492

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40057:


Assignee: (was: Apache Spark)

> Cleanup "" in doctest
> 
>
> Key: SPARK-40057
> URL: https://issues.apache.org/jira/browse/SPARK-40057
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40057) Cleanup "" in doctest

2022-08-12 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40057:
---

 Summary: Cleanup "" in doctest
 Key: SPARK-40057
 URL: https://issues.apache.org/jira/browse/SPARK-40057
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PySpark
Affects Versions: 3.4.0
Reporter: Yikun Jiang


https://github.com/apache/spark/pull/37465#discussion_r943080421



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39887) Expression transform error

2022-08-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578855#comment-17578855
 ] 

Apache Spark commented on SPARK-39887:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/37491

> Expression transform error
> --
>
> Key: SPARK-39887
> URL: https://issues.apache.org/jira/browse/SPARK-39887
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.2.2
>Reporter: zhuml
>Priority: Major
>
> {code:java}
> spark.sql(
>   """
> |select to_date(a) a, to_date(b) b from
> |(select  a, a as b from
> |(select to_date(a) a from
> | values ('2020-02-01') as t1(a)
> | group by to_date(a)) t3
> |union all
> |select a, b from
> |(select to_date(a) a, to_date(b) b from
> |values ('2020-01-01','2020-01-02') as t1(a, b)
> | group by to_date(a), to_date(b)) t4) t5
> |group by to_date(a), to_date(b)
> |""".stripMargin).show()
> {code}
> The actual result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-01); the
> expected result is (2020-02-01, 2020-02-01), (2020-01-01, 2020-01-02).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


