[jira] [Resolved] (SPARK-44555) Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44555.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42169
[https://github.com/apache/spark/pull/42169]

> Use checkError() to check Exception in command Suite & assign some error 
> class names
> 
>
> Key: SPARK-44555
> URL: https://issues.apache.org/jira/browse/SPARK-44555
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44555) Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44555:


Assignee: BingKun Pan

> Use checkError() to check Exception in command Suite & assign some error 
> class names
> 
>
> Key: SPARK-44555
> URL: https://issues.apache.org/jira/browse/SPARK-44555
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-44632) DiskBlockManager should check and be able to handle stale directories

2023-08-01 Thread Kent Yao (Jira)
Kent Yao created SPARK-44632:


 Summary: DiskBlockManager should check and be able to handle stale 
directories
 Key: SPARK-44632
 URL: https://issues.apache.org/jira/browse/SPARK-44632
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.1, 3.5.0
Reporter: Kent Yao


The subDir in the memory cache can become stale, for example after a damaged 
disk is repaired or replaced, and the stale entry may subsequently be accessed 
by other operations. In particular, the `filename` generated by `RDDBlockId` is 
unchanged between task retries, so a retry will likely attempt to access the 
same subDir repeatedly. Therefore, it is necessary to check whether the subDir 
exists. If it is stale and the hardware has been recovered without its data and 
directories, we should recreate the subDir to prevent a FileNotFoundException 
during writing.
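The check-and-recreate step can be sketched in a few lines. This is a minimal Python illustration only; `ensure_sub_dir` is a hypothetical helper, not the actual `DiskBlockManager` API:

```python
import os

def ensure_sub_dir(root: str, sub_dir: str) -> str:
    """Return the sub-directory path, recreating it if the cached
    entry has gone stale (e.g. after a disk repair or replacement),
    so a later write does not hit FileNotFoundError."""
    path = os.path.join(root, sub_dir)
    if not os.path.isdir(path):
        # Stale cache entry: the directory vanished behind our back.
        os.makedirs(path, exist_ok=True)
    return path
```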






[jira] [Created] (SPARK-44631) Remove session-based directory when the isolated session cache is evicted

2023-08-01 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-44631:


 Summary: Remove session-based directory when the isolated session 
cache is evicted
 Key: SPARK-44631
 URL: https://issues.apache.org/jira/browse/SPARK-44631
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Hyukjin Kwon


SPARK-44078 added the cache for isolated sessions, and SPARK-44348 added the 
session-based directory for isolation.

 

When an entry is evicted from the isolated session cache, we should remove its 
session-based directory so that reusing the same session does not fail; see also 
https://github.com/apache/spark/pull/41625#discussion_r1251427466
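The eviction hook can be illustrated with a tiny LRU sketch. Names such as `SessionCache` and `dir_for_session` are illustrative only, not Spark Connect's actual API:

```python
import shutil
from collections import OrderedDict

class SessionCache:
    """LRU cache of isolated sessions; evicting an entry also deletes
    its session-based directory so that reusing the same session id
    later starts from a clean slate."""
    def __init__(self, capacity, dir_for_session):
        self.capacity = capacity
        self.dir_for_session = dir_for_session  # session_id -> path
        self._entries = OrderedDict()

    def put(self, session_id, session):
        self._entries[session_id] = session
        self._entries.move_to_end(session_id)
        while len(self._entries) > self.capacity:
            # Evict least-recently-used entry and its directory.
            evicted_id, _ = self._entries.popitem(last=False)
            shutil.rmtree(self.dir_for_session(evicted_id),
                          ignore_errors=True)
```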






[jira] [Updated] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44588:
--
Fix Version/s: 3.3.3

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Assignee: Henry Mai
>Priority: Critical
> Fix For: 3.3.3, 3.4.2, 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver later becomes a sender), then even if we assume the shuffle blocks 
> are initially unencrypted*, the bytes in the file accumulate more and more 
> layers of encryption each time they are migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with; this 
> is just a thought exercise.
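The thought exercise reduces to: the write path wraps whatever bytes it receives, decrypted or not. A toy Python model, with a stand-in `wrap` function in place of real `serializerManager.wrapStream()` encryption, shows the layers accumulating:

```python
def wrap(data: bytes) -> bytes:
    # Stand-in for the encrypting writable channel on the write path.
    return b"ENC(" + data + b")"

block = b"shuffle-bytes"      # assume initially unencrypted
for migration in range(3):    # each migration re-wraps on write
    block = wrap(block)

assert block == b"ENC(ENC(ENC(shuffle-bytes)))"
```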






[jira] [Updated] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44588:
--
Fix Version/s: 3.4.2






[jira] [Updated] (SPARK-44600) Make `repl` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44600:
-
Description: 
[https://github.com/apache/spark/actions/runs/5727123477/job/15518895421]

 
{code:java}
- SPARK-15236: use Hive catalog *** FAILED ***
  isContain was true Interpreter output contained 'Exception':
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
        /_/

  Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 1.8.0_372)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  scala> java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheBuilder
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:197)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:153)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog(BaseSessionStateBuilder.scala:152)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog$lzycompute(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.v2SessionCatalog(BaseSessionStateBuilder.scala:166)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager$lzycompute(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalogManager(BaseSessionStateBuilder.scala:168)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anon$1.<init>(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer(BaseSessionStateBuilder.scala:185)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.$anonfun$build$2(BaseSessionStateBuilder.scala:374)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:92)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
    ... 100 elided
  Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 130 more

  scala>  | 
  scala> :quit (ReplSuite.scala:83) {code}

> Make `repl` module daily test pass
> --
>
> Key: SPARK-44600
> URL: https://issues.apache.org/jira/browse/SPARK-44600
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/5727123477/job/15518895421]
>  
> {code:java}
> - SPARK-15236: use Hive catalog *** FAILED ***
> 18137  isContain was true Interpreter output contained 'Exception':
> 18138  Welcome to
> 18139 

[jira] [Resolved] (SPARK-44607) Remove unused function `containsNestedColumn` from `Filter`

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44607.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42239
[https://github.com/apache/spark/pull/42239]

> Remove unused function `containsNestedColumn` from `Filter`
> ---
>
> Key: SPARK-44607
> URL: https://issues.apache.org/jira/browse/SPARK-44607
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-44607) Remove unused function `containsNestedColumn` from `Filter`

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44607:


Assignee: Yang Jie

> Remove unused function `containsNestedColumn` from `Filter`
> ---
>
> Key: SPARK-44607
> URL: https://issues.apache.org/jira/browse/SPARK-44607
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>







[jira] [Created] (SPARK-44630) Revert SPARK-43043 Improve the performance of MapOutputTracker.updateMapOutput

2023-08-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44630:
-

 Summary: Revert SPARK-43043 Improve the performance of 
MapOutputTracker.updateMapOutput
 Key: SPARK-44630
 URL: https://issues.apache.org/jira/browse/SPARK-44630
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-44629) Publish PySpark Test Guidelines webpage

2023-08-01 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44629:
--

 Summary: Publish PySpark Test Guidelines webpage
 Key: SPARK-44629
 URL: https://issues.apache.org/jira/browse/SPARK-44629
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu









[jira] [Updated] (SPARK-43241) MultiIndex.append not checking names for equality

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43241:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> MultiIndex.append not checking names for equality
> -
>
> Key: SPARK-43241
> URL: https://issues.apache.org/jira/browse/SPARK-43241
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> To match the behavior with pandas: 
> https://github.com/pandas-dev/pandas/pull/48288
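For reference, a short sketch of the pandas semantics being matched: equal names survive `MultiIndex.append`, while pandas 2.0 drops the names when the two indexes disagree (behavior assumed from the linked pandas change):

```python
import pandas as pd

left = pd.MultiIndex.from_tuples([(1, "a")], names=["x", "y"])
right = pd.MultiIndex.from_tuples([(2, "b")], names=["x", "y"])

# Names match here, so the appended index keeps them; with unequal
# names, pandas >= 2.0 checks equality and falls back to None.
appended = left.append(right)
assert list(appended.names) == ["x", "y"]
assert len(appended) == 2
```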






[jira] [Updated] (SPARK-42621) Add `inclusive` parameter for date_range

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42621:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `inclusive` parameter for date_range
> 
>
> Key: SPARK-42621
> URL: https://issues.apache.org/jira/browse/SPARK-42621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/issues/40245
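In pandas (1.4 and later), `inclusive` controls which endpoints `date_range` keeps; this is the behavior the Pandas-API-on-Spark parameter would mirror:

```python
import pandas as pd

# inclusive= accepts "both", "neither", "left", "right".
both = pd.date_range("2023-01-01", "2023-01-04", inclusive="both")
left = pd.date_range("2023-01-01", "2023-01-04", inclusive="left")

assert len(both) == 4   # both endpoints kept
assert len(left) == 3   # right endpoint dropped
```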






[jira] [Updated] (SPARK-42620) Add `inclusive` parameter for (DataFrame|Series).between_time

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42620:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `inclusive` parameter for (DataFrame|Series).between_time
> -
>
> Key: SPARK-42620
> URL: https://issues.apache.org/jira/browse/SPARK-42620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/43248
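The pandas behavior to match: `between_time` (pandas 1.4 and later) takes `inclusive` to decide whether the start and end times themselves are kept:

```python
import pandas as pd

idx = pd.date_range("2023-01-01 09:00", periods=5, freq="60min")
df = pd.DataFrame({"v": range(5)}, index=idx)

# inclusive="left" keeps rows at the start time but not the end time.
out = df.between_time("09:00", "11:00", inclusive="left")
assert len(out) == 2    # 09:00 and 10:00 only
```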






[jira] [Updated] (SPARK-43194) PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43194:

Affects Version/s: 4.0.0
   (was: 3.4.0)

> PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0
> --
>
> Key: SPARK-43194
> URL: https://issues.apache.org/jira/browse/SPARK-43194
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
> Environment: {code}
> In [4]: import pandas as pd
> In [5]: pd.__version__
> Out[5]: '2.0.0'
> In [6]: import pyspark as ps
> In [7]: ps.__version__
> Out[7]: '3.4.0'
> {code}
>Reporter: Phillip Cloud
>Priority: Major
>
> {code}
> In [1]: from pyspark.sql import SparkSession
> In [2]: session = SparkSession.builder.appName("test").getOrCreate()
> 23/04/19 09:21:42 WARN Utils: Your hostname, albatross resolves to a loopback 
> address: 127.0.0.2; using 192.168.1.170 instead (on interface enp5s0)
> 23/04/19 09:21:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/04/19 09:21:42 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> In [3]: session.sql("select now()").toPandas()
> {code}
> Results in:
> {code}
> ...
> TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass 
> e.g. 'datetime64[ns]' instead.
> {code}
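The fix pandas 2.0 asks for is an explicit unit on the target dtype. A minimal illustration outside Spark:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2023-04-19 09:21:42"]))

# Under pandas 2.0, s.astype("datetime64") raises TypeError
# ("Casting to unit-less dtype 'datetime64' is not supported");
# spelling out the unit works:
ns = s.astype("datetime64[ns]")
assert str(ns.dtype) == "datetime64[ns]"
```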






[jira] [Updated] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42619:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999
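The pandas parameter being mirrored: `DataFrame.info(show_counts=...)` forces the non-null counts column on or off regardless of frame size:

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0]})
buf = io.StringIO()
df.info(buf=buf, show_counts=True)  # always print non-null counts

assert "2 non-null" in buf.getvalue()   # column "a" has 2 of 3 non-null
```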






[jira] [Updated] (SPARK-42617) Support `isocalendar`

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42617:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
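The target pandas behavior: `Series.dt.isocalendar()` returns a DataFrame of ISO 8601 year, week, and weekday:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2023-01-02"]))   # a Monday
iso = s.dt.isocalendar()    # DataFrame with year/week/day columns

assert list(iso.columns) == ["year", "week", "day"]
assert int(iso.loc[0, "day"]) == 1   # ISO weekday 1 == Monday
```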






[jira] [Updated] (SPARK-43271) Match behavior with DataFrame.reindex with specifying `index`.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43271:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match behavior with DataFrame.reindex with specifying `index`.
> --
>
> Key: SPARK-43271
> URL: https://issues.apache.org/jira/browse/SPARK-43271
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Re-enable the pandas 2.0.0 test in DataFrameTests.test_reindex in a proper way.






[jira] [Updated] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43451:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable RollingTests.test_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43451
> URL: https://issues.apache.org/jira/browse/SPARK-43451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable RollingTests.test_rolling_count for pandas 2.0.0.






[jira] [Updated] (SPARK-43282) Investigate DataFrame.sort_values with pandas behavior.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43282:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Investigate DataFrame.sort_values with pandas behavior.
> ---
>
> Key: SPARK-43282
> URL: https://issues.apache.org/jira/browse/SPARK-43282
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> {code:java}
> import pandas as pd
> pdf = pd.DataFrame(
>     {
>         "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
>         "b": pd.Categorical(
>             ["b", "a", "c", "c", "b", "a"], categories=["c", "b", "d", "a"]
>         ),
>     },
> )
> pdf.groupby("a").apply(lambda x: x).sort_values(["a"])
> Traceback (most recent call last):
> ...
> ValueError: 'a' is both an index level and a column label, which is 
> ambiguous. {code}
> We should investigate whether this is intended behavior or just a bug in 
> pandas.






[jira] [Updated] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43245:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of 
> Index.
> -
>
> Key: SPARK-43245
> URL: https://issues.apache.org/jira/browse/SPARK-43245
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#index-can-now-hold-numpy-numeric-dtypes
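For reference, the value side of the behavior (pandas 2.0 returns an int32 Index for these datetime field accessors rather than int64, per the linked what's-new entry):

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01 00:00:00.000123"])
micro = idx.microsecond     # int32 Index under pandas >= 2.0

assert micro[0] == 123      # .000123 seconds == 123 microseconds
```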






[jira] [Updated] (SPARK-43433) Match `GroupBy.nth` behavior with new pandas behavior

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43433:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match `GroupBy.nth` behavior with new pandas behavior
> -
>
> Key: SPARK-43433
> URL: https://issues.apache.org/jira/browse/SPARK-43433
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Match behavior with 
> https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations
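A short sketch of the new pandas behavior (pandas 2.0 and later) being matched: `GroupBy.nth` now acts as a filtration, keeping the original row index instead of indexing by group keys:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})
nth = df.groupby("a").nth(0)

# As a filtration, the first row of each group is returned with its
# original positional index (0 and 2), not with "a" as the index.
assert list(nth.index) == [0, 2]
```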






[jira] [Updated] (SPARK-43432) Fix `min_periods` for Rolling to work same as pandas

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43432:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Fix `min_periods` for Rolling to work same as pandas 
> -
>
> Key: SPARK-43432
> URL: https://issues.apache.org/jira/browse/SPARK-43432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Fix `min_periods` for Rolling to work the same as pandas:
> https://github.com/pandas-dev/pandas/issues/31302
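The pandas reference behavior: `min_periods` sets how many observations a window needs before producing a value, so leading windows are partial rather than missing:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
counts = s.rolling(window=3, min_periods=1).count()

# Partial leading windows are counted, then the full window of 3.
assert list(counts) == [1.0, 2.0, 3.0, 3.0]
```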






[jira] [Updated] (SPARK-43291) Match behavior for DataFrame.cov on string DataFrame

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43291:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Match behavior for DataFrame.cov on string DataFrame
> 
>
> Key: SPARK-43291
> URL: https://issues.apache.org/jira/browse/SPARK-43291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Should enable the test below:
> {code:java}
> pdf = pd.DataFrame([("1", "2"), ("0", "3"), ("2", "0"), ("1", "1")], 
> columns=["a", "b"])
> psdf = ps.from_pandas(pdf)
> self.assert_eq(pdf.cov(), psdf.cov()) {code}






[jira] [Updated] (SPARK-43295) Make DataFrameGroupBy.sum support for string type columns

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43295:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Make DataFrameGroupBy.sum support for string type columns
> -
>
> Key: SPARK-43295
> URL: https://issues.apache.org/jira/browse/SPARK-43295
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> From pandas 2.0.0, DataFrameGroupBy.sum also works for string type columns:
> {code:java}
> >>> psdf
>    A    B  C      D
> 0  1  3.1  a   True
> 1  2  4.1  b  False
> 2  1  4.1  b  False
> 3  2  3.1  a   True
> >>> psdf.groupby("A").sum().sort_index()
>      B  D
> A
> 1  7.2  1
> 2  7.2  1
> >>> psdf.to_pandas().groupby("A").sum().sort_index()
>      B   C  D
> A
> 1  7.2  ab  1
> 2  7.2  ba  1 {code}






[jira] [Created] (SPARK-44628) Clear some unused codes in "***Errors" and extract some common logic

2023-08-01 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44628:
---

 Summary: Clear some unused codes in "***Errors" and extract some 
common logic
 Key: SPARK-44628
 URL: https://issues.apache.org/jira/browse/SPARK-44628
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-43460) Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43460:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 
> 2.0.0.
> ---
>
> Key: SPARK-43460
> URL: https://issues.apache.org/jira/browse/SPARK-43460
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_different_lengths for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43453) Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43453:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.
> 
>
> Key: SPARK-43453
> URL: https://issues.apache.org/jira/browse/SPARK-43453
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis for pandas 2.0.0.






[jira] [Updated] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43459:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.
> 
>
> Key: SPARK-43459
> URL: https://issues.apache.org/jira/browse/SPARK-43459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43476) Enable SeriesStringTests.test_string_replace for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43476:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_replace for pandas 2.0.0.
> --
>
> Key: SPARK-43476
> URL: https://issues.apache.org/jira/browse/SPARK-43476
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_replace for pandas 2.0.0.






[jira] [Updated] (SPARK-43458) Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43458:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.
> 
>
> Key: SPARK-43458
> URL: https://issues.apache.org/jira/browse/SPARK-43458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesConversionTests.test_to_latex for pandas 2.0.0.






[jira] [Updated] (SPARK-43452) Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43452:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43452
> URL: https://issues.apache.org/jira/browse/SPARK-43452
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable RollingTests.test_groupby_rolling_count for pandas 2.0.0.






[jira] [Updated] (SPARK-43462) Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43462:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.
> --
>
> Key: SPARK-43462
> URL: https://issues.apache.org/jira/browse/SPARK-43462
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesDateTimeTests.test_date_subtraction for pandas 2.0.0.






[jira] [Updated] (SPARK-43477) Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43477:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.
> -
>
> Key: SPARK-43477
> URL: https://issues.apache.org/jira/browse/SPARK-43477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_rsplit for pandas 2.0.0.






[jira] [Updated] (SPARK-43497) Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43497:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.
> --
>
> Key: SPARK-43497
> URL: https://issues.apache.org/jira/browse/SPARK-43497
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_cov_corr_meta for pandas 2.0.0.






[jira] [Updated] (SPARK-43498) Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43498:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.
> --
>
> Key: SPARK-43498
> URL: https://issues.apache.org/jira/browse/SPARK-43498
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_axis_on_dataframe for pandas 2.0.0.






[jira] [Updated] (SPARK-43478) Enable SeriesStringTests.test_string_split for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43478:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesStringTests.test_string_split for pandas 2.0.0.
> 
>
> Key: SPARK-43478
> URL: https://issues.apache.org/jira/browse/SPARK-43478
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SeriesStringTests.test_string_split for pandas 2.0.0.






[jira] [Updated] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43506:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
> ---
>
> Key: SPARK-43506
> URL: https://issues.apache.org/jira/browse/SPARK-43506
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.






[jira] [Updated] (SPARK-43499) Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43499:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 
> 2.0.0.
> ---
>
> Key: SPARK-43499
> URL: https://issues.apache.org/jira/browse/SPARK-43499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable StatsTests.test_stat_functions_with_no_numeric_columns for pandas 
> 2.0.0.






[jira] [Updated] (SPARK-43561) Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43561:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.
> ---
>
> Key: SPARK-43561
> URL: https://issues.apache.org/jira/browse/SPARK-43561
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DataFrameConversionTests.test_to_latex for pandas 2.0.0.






[jira] [Updated] (SPARK-43562) Enable DataFrameTests.test_append for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43562:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_append for pandas 2.0.0.
> ---
>
> Key: SPARK-43562
> URL: https://issues.apache.org/jira/browse/SPARK-43562
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DataFrameTests.test_append for pandas 2.0.0.






[jira] [Updated] (SPARK-43533) Enable MultiIndex test for IndexesTests.test_difference

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43533:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable MultiIndex test for IndexesTests.test_difference
> ---
>
> Key: SPARK-43533
> URL: https://issues.apache.org/jira/browse/SPARK-43533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable MultiIndex test for IndexesTests.test_difference






[jira] [Updated] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43563:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
> 
>
> Key: SPARK-43563
> URL: https://issues.apache.org/jira/browse/SPARK-43563
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.






[jira] [Updated] (SPARK-43570) Enable DateOpsTests.test_rsub for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43570:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DateOpsTests.test_rsub for pandas 2.0.0.
> ---
>
> Key: SPARK-43570
> URL: https://issues.apache.org/jira/browse/SPARK-43570
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DateOpsTests.test_rsub for pandas 2.0.0.






[jira] [Updated] (SPARK-43608) Enable IndexesTests.test_union for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43608:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_union for pandas 2.0.0.
> 
>
> Key: SPARK-43608
> URL: https://issues.apache.org/jira/browse/SPARK-43608
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_union for pandas 2.0.0.






[jira] [Updated] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43705:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.
> 
>
> Key: SPARK-43705
> URL: https://issues.apache.org/jira/browse/SPARK-43705
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.






[jira] [Updated] (SPARK-43644) Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43644:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.
> -
>
> Key: SPARK-43644
> URL: https://issues.apache.org/jira/browse/SPARK-43644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DatetimeIndexTests.test_indexer_between_time for pandas 2.0.0.






[jira] [Updated] (SPARK-43606) Enable IndexesTests.test_index_basic for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43606:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_index_basic for pandas 2.0.0.
> --
>
> Key: SPARK-43606
> URL: https://issues.apache.org/jira/browse/SPARK-43606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_index_basic for pandas 2.0.0.






[jira] [Updated] (SPARK-43567) Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43567:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.
> -
>
> Key: SPARK-43567
> URL: https://issues.apache.org/jira/browse/SPARK-43567
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_factorize for pandas 2.0.0.






[jira] [Updated] (SPARK-43571) Enable DateOpsTests.test_sub for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43571:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DateOpsTests.test_sub for pandas 2.0.0.
> --
>
> Key: SPARK-43571
> URL: https://issues.apache.org/jira/browse/SPARK-43571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable DateOpsTests.test_sub for pandas 2.0.0.






[jira] [Updated] (SPARK-43607) Enable IndexesTests.test_intersection for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43607:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable IndexesTests.test_intersection for pandas 2.0.0.
> ---
>
> Key: SPARK-43607
> URL: https://issues.apache.org/jira/browse/SPARK-43607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_intersection for pandas 2.0.0.






[jira] [Updated] (SPARK-43568) Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43568:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.
> -
>
> Key: SPARK-43568
> URL: https://issues.apache.org/jira/browse/SPARK-43568
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_categories_setter for pandas 2.0.0.






[jira] [Updated] (SPARK-43633) Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43633:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.
> -
>
> Key: SPARK-43633
> URL: https://issues.apache.org/jira/browse/SPARK-43633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable CategoricalIndexTests.test_remove_categories for pandas 2.0.0.






[jira] [Updated] (SPARK-43811) Enable DataFrameTests.test_reindex for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43811:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_reindex for pandas 2.0.0.
> 
>
> Key: SPARK-43811
> URL: https://issues.apache.org/jira/browse/SPARK-43811
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43869) Enable GroupBySlowTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43869:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable GroupBySlowTests for pandas 2.0.0.
> -
>
> Key: SPARK-43869
> URL: https://issues.apache.org/jira/browse/SPARK-43869
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_value_counts
>  * test_split_apply_combine_on_series






[jira] [Updated] (SPARK-43709) Enable NamespaceTests.test_date_range for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43709:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable NamespaceTests.test_date_range for pandas 2.0.0.
> ---
>
> Key: SPARK-43709
> URL: https://issues.apache.org/jira/browse/SPARK-43709
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43812) Enable DataFrameTests.test_all for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43812:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameTests.test_all for pandas 2.0.0.
> 
>
> Key: SPARK-43812
> URL: https://issues.apache.org/jira/browse/SPARK-43812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>







[jira] [Updated] (SPARK-43871) Enable SeriesDateTimeTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43871:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesDateTimeTests for pandas 2.0.0.
> 
>
> Key: SPARK-43871
> URL: https://issues.apache.org/jira/browse/SPARK-43871
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_day
>  * test_dayofweek
>  * test_dayofyear
>  * test_days_in_month
>  * test_daysinmonth
>  * test_hour
>  * test_microsecond
>  * test_minute
>  * test_month
>  * test_quarter
>  * test_second
>  * test_weekday
>  * test_year






[jira] [Updated] (SPARK-43872) Enable DataFramePlotMatplotlibTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43872:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFramePlotMatplotlibTests for pandas 2.0.0.
> -
>
> Key: SPARK-43872
> URL: https://issues.apache.org/jira/browse/SPARK-43872
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_area_plot
>  * test_area_plot_stacked_false
>  * test_area_plot_y
>  * test_bar_plot
>  * test_bar_with_x_y
>  * test_barh_plot_with_x_y
>  * test_barh_plot
>  * test_line_plot
>  * test_pie_plot
>  * test_scatter_plot
>  * test_hist_plot
>  * test_kde_plot






[jira] [Updated] (SPARK-43873) Enable DataFrameSlowTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43873:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable DataFrameSlowTests for pandas 2.0.0.
> ---
>
> Key: SPARK-43873
> URL: https://issues.apache.org/jira/browse/SPARK-43873
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_describe
>  * test_between_time
>  * test_product
>  * test_iteritems
>  * test_mad
>  * test_cov
>  * test_quantile






[jira] [Updated] (SPARK-43870) Enable SeriesTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43870:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable SeriesTests for pandas 2.0.0.
> 
>
> Key: SPARK-43870
> URL: https://issues.apache.org/jira/browse/SPARK-43870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_value_counts
>  * test_append
>  * test_astype
>  * test_between
>  * test_mad
>  * test_quantile
>  * test_rank
>  * test_between_time
>  * test_iteritems
>  * test_product
>  * test_factorize






[jira] [Updated] (SPARK-43874) Enable GroupByTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43874:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable GroupByTests for pandas 2.0.0.
> -
>
> Key: SPARK-43874
> URL: https://issues.apache.org/jira/browse/SPARK-43874
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_prod
>  * test_nth
>  * test_mad
>  * test_basic_stat_funcs
>  * test_groupby_multiindex_columns
>  * test_apply_without_shortcut
>  * test_mean
>  * test_apply






[jira] [Updated] (SPARK-43875) Enable CategoricalTests for pandas 2.0.0.

2023-08-01 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43875:

Affects Version/s: 4.0.0
   (was: 3.5.0)

> Enable CategoricalTests for pandas 2.0.0.
> -
>
> Key: SPARK-43875
> URL: https://issues.apache.org/jira/browse/SPARK-43875
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> test list:
>  * test_factorize
>  * test_as_ordered_unordered
>  * test_categories_setter
>  * test_remove_categories
>  * test_groupby_apply_without_shortcut






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Epic Link: SPARK-43754

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.
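The fallback described above can be sketched as a small client-side loop. Everything here is illustrative: the function names and the exception are stand-ins for the real Spark Connect client and its INVALID_HANDLE.OPERATION_NOT_FOUND error, not the actual API.

```python
class OperationNotFound(Exception):
    """Stand-in for the server's INVALID_HANDLE.OPERATION_NOT_FOUND error."""

def reattach_or_restart(reattach_execute, execute_plan):
    """Try to reattach to a running execution; if the server never saw the
    original ExecutePlan, there is nothing to reattach to, so send
    ExecutePlan again from scratch."""
    try:
        return reattach_execute()
    except OperationNotFound:
        # Initial ExecutePlan never reached the server -- re-execute.
        return execute_plan()

# Usage: the server "lost" the operation, so the client falls back.
def failing_reattach():
    raise OperationNotFound()

result = reattach_or_restart(failing_reattach, lambda: "rows from ExecutePlan")
print(result)  # rows from ExecutePlan
```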






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Description: If the ExecutePlan never reached the server, a ReattachExecute 
will fail with INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try 
to send ExecutePlan again.  (was: Even though we empirically observed that the 
error is thrown only from the first next() or hasNext() of the response 
StreamObserver, wrap the initial call in retries as well, so as not to depend 
on that behavior in case it is just a quirk that is not fully dependable.)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> If the ExecutePlan never reached the server, a ReattachExecute will fail with 
> INVALID_HANDLE.OPERATION_NOT_FOUND. In that case, we could try to send 
> ExecutePlan again.






[jira] [Updated] (SPARK-44624) Spark Connect reattachable Execute when initial ExecutePlan didn't reach server

2023-08-01 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-44624:
--
Summary: Spark Connect reattachable Execute when initial ExecutePlan didn't 
reach server  (was: Wrap retries around initial streaming GRPC call in connect)

> Spark Connect reattachable Execute when initial ExecutePlan didn't reach 
> server
> ---
>
> Key: SPARK-44624
> URL: https://issues.apache.org/jira/browse/SPARK-44624
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Even though we empirically observed that the error is thrown only from the 
> first next() or hasNext() of the response StreamObserver, wrap the initial 
> call in retries as well, so as not to depend on that behavior in case it is 
> just a quirk that is not fully dependable.






[jira] [Created] (SPARK-44627) org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows produces wrong data

2023-08-01 Thread Min Zhao (Jira)
Min Zhao created SPARK-44627:


 Summary: 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#resultSetToRows 
produces wrong data
 Key: SPARK-44627
 URL: https://issues.apache.org/jira/browse/SPARK-44627
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.1, 2.3.2
Reporter: Min Zhao


When the resultSet contains a timestamp column whose value is null, the row it 
generates reuses the value of that column from the previous row. 

example:

the value of resultSet

1, 2023-01-01 12:00:00

2, null

 

the value of row

1, 2023-01-01 12:00:00

2, 2023-01-01 12:00:00
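The behavior reported above is what one gets when rows are materialized through a single reused mutable buffer whose fields are written only for non-null JDBC values. The sketch below is a miniature illustration of that failure mode, not Spark's actual code:

```python
def rows_buggy(result_set):
    """Illustration of the bug: one mutable row buffer reused for every
    record, and NULL columns skip the write, keeping the stale value."""
    num_cols = len(result_set[0]) if result_set else 0
    row = [None] * num_cols          # single buffer reused across rows
    out = []
    for rec in result_set:
        for i, value in enumerate(rec):
            if value is not None:    # bug: None leaves the old value in place
                row[i] = value
        out.append(list(row))
    return out

def rows_fixed(result_set):
    """Fix: always write every column (or use a fresh buffer per row)."""
    out = []
    for rec in result_set:
        out.append([value for value in rec])
    return out

data = [(1, "2023-01-01 12:00:00"), (2, None)]
print(rows_buggy(data))  # [[1, '2023-01-01 12:00:00'], [2, '2023-01-01 12:00:00']]
print(rows_fixed(data))  # [[1, '2023-01-01 12:00:00'], [2, None]]
```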

 






[jira] [Resolved] (SPARK-42941) Add support for streaming listener in Python

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42941.
--
Resolution: Fixed

Issue resolved by pull request 42250
[https://github.com/apache/spark/pull/42250]

> Add support for streaming listener in Python
> 
>
> Key: SPARK-42941
> URL: https://issues.apache.org/jira/browse/SPARK-42941
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Raghu Angadi
>Assignee: Wei Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> Add support of streaming listener in Python. 
> This likely requires a design doc to hash out the details. 






[jira] [Commented] (SPARK-42730) Update Spark Standalone Mode - Starting a Cluster Manually

2023-08-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750044#comment-17750044
 ] 

Hyukjin Kwon commented on SPARK-42730:
--

Please go ahead. Refs: [https://spark.apache.org/contributing.html] , 
[https://spark.apache.org/developer-tools.html]

> Update Spark Standalone Mode - Starting a Cluster Manually
> --
>
> Key: SPARK-42730
> URL: https://issues.apache.org/jira/browse/SPARK-42730
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/spark-standalone.html
> Add start-connect-server.sh to this list and cover Spark Connect sessions - 
> other changes needed here.






[jira] [Assigned] (SPARK-44218) Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44218:


Assignee: Amanda Liu

> Customize diff log in assertDataFrameEqual error message format
> ---
>
> Key: SPARK-44218
> URL: https://issues.apache.org/jira/browse/SPARK-44218
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Resolved] (SPARK-44218) Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44218.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42196
[https://github.com/apache/spark/pull/42196]

> Customize diff log in assertDataFrameEqual error message format
> ---
>
> Key: SPARK-44218
> URL: https://issues.apache.org/jira/browse/SPARK-44218
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44626) Followup on streaming query termination when client session is timed out for Spark Connect

2023-08-01 Thread Bo Gao (Jira)
Bo Gao created SPARK-44626:
--

 Summary: Followup on streaming query termination when client 
session is timed out for Spark Connect
 Key: SPARK-44626
 URL: https://issues.apache.org/jira/browse/SPARK-44626
 Project: Spark
  Issue Type: Task
  Components: Connect, Structured Streaming
Affects Versions: 3.5.0
Reporter: Bo Gao









[jira] [Assigned] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44588:
-

Assignee: Henry Mai

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Assignee: Henry Mai
>Priority: Critical
> Fix For: 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender), even if we assume that the shuffle blocks are 
> initially unencrypted*, then the bytes in the file will just have more and 
> more layers of encryption applied to them each time the file gets migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with, this 
> is just a thought exercise
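The layering in the thought exercise above can be demonstrated with a toy byte-shift standing in for the encrypting channel. This is purely illustrative (not Spark's SerializerManager, and not real cryptography): each migration reads the file bytes as-is and writes them through the "encrypting" channel, so every hop adds one more layer.

```python
def encrypt(data: bytes, key: int) -> bytes:
    """Toy stand-in for the encrypting channel (NOT real crypto)."""
    return bytes((b + key) % 256 for b in data)

def decrypt(data: bytes, key: int) -> bytes:
    return bytes((b - key) % 256 for b in data)

def migrate(block_on_disk: bytes, key: int) -> bytes:
    # Bug pattern from the description: the sender reads the on-disk bytes
    # without decrypting, and the receiver writes them through an encrypting
    # channel -- so each migration stacks another layer.
    return encrypt(block_on_disk, key)

plaintext = b"shuffle bytes"
on_disk = encrypt(plaintext, 7)                   # blocks start encrypted on disk
after_two_hops = migrate(migrate(on_disk, 7), 7)  # now three layers deep

print(decrypt(after_two_hops, 7) == plaintext)                          # False
print(decrypt(decrypt(decrypt(after_two_hops, 7), 7), 7) == plaintext)  # True
```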






[jira] [Resolved] (SPARK-44588) Migrated shuffle blocks are encrypted multiple times when io.encryption is enabled

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44588.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Migrated shuffle blocks are encrypted multiple times when io.encryption is 
> enabled 
> ---
>
> Key: SPARK-44588
> URL: https://issues.apache.org/jira/browse/SPARK-44588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 
> 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>Reporter: Henry Mai
>Priority: Critical
> Fix For: 3.5.0
>
>
> Shuffle blocks upon migration are wrapped for encryption again when being 
> written out to a file on the receiver side.
>  
> Pull request to fix this: https://github.com/apache/spark/pull/42214
>  
> Details:
> Sender/Read side:
> BlockManagerDecommissioner:run()
>     blocks = bm.migratableResolver.getMigrationBlocks()
>     *dataFile = IndexShuffleBlockResolver:getDataFile(...)*
>    buffer = FileSegmentManagedBuffer(..., dataFile)
>    *^ This reads straight from disk without decryption*
>     blocks.foreach((blockId, buffer) => 
> bm.blockTransferService.uploadBlockSync(..., buffer, ...))
>     -> uploadBlockSync() -> uploadBlock(..., buffer, ...)
>     -> client.uploadStream(UploadBlockStream, buffer, ...)
>  - Notice that there is no decryption here on the sender/read side.
> Receiver/Write side:
> NettyBlockRpcServer:receiveStream() <--- This is the UploadBlockStream handler
>     putBlockDataAsStream()
>     migratableResolver.putShuffleBlockAsStream()
>     *-> file = IndexShuffleBlockResolver:getDataFile(...)*
>     -> tmpFile = (file + . extension)
>     *-> Creates an encrypting writable channel to a tmpFile using 
> serializerManager.wrapStream()*
>     -> onData() writes the data into the channel
>     -> onComplete() renames the tmpFile to the file
>  - Notice:
>  * Both getMigrationBlocks()[read] and putShuffleBlockAsStream()[write] 
> target IndexShuffleBlockResolver:getDataFile()
>  * The read path does not decrypt but the write path encrypts.
>  * As a thought exercise: if this cycle happens more than once (where this 
> receiver is now a sender), even if we assume that the shuffle blocks are 
> initially unencrypted*, then the bytes in the file will just have more and 
> more layers of encryption applied to them each time the file gets migrated.
>  * *In practice, the shuffle blocks are encrypted on disk to begin with, this 
> is just a thought exercise






[jira] [Resolved] (SPARK-44563) Upgrade Apache Arrow to 13.0.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44563.
---
Resolution: Duplicate

> Upgrade Apache Arrow to 13.0.0
> --
>
> Key: SPARK-44563
> URL: https://issues.apache.org/jira/browse/SPARK-44563
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Closed] (SPARK-44563) Upgrade Apache Arrow to 13.0.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-44563.
-

> Upgrade Apache Arrow to 13.0.0
> --
>
> Key: SPARK-44563
> URL: https://issues.apache.org/jira/browse/SPARK-44563
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Created] (SPARK-44625) Spark Connect clean up abandoned executions

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44625:
-

 Summary: Spark Connect clean up abandoned executions
 Key: SPARK-44625
 URL: https://issues.apache.org/jira/browse/SPARK-44625
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


With reattachable executions, some executions might get orphaned when 
ReattachExecute and ReleaseExecute never come. Add a mechanism to track such 
executions and to clean them up.






[jira] [Assigned] (SPARK-44601) Make `hive-thriftserver` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44601:


Assignee: Yang Jie

> Make `hive-thriftserver` module daily test pass
> ---
>
> Key: SPARK-44601
> URL: https://issues.apache.org/jira/browse/SPARK-44601
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> [https://github.com/LuciferYang/spark/actions/runs/5694334367/job/15435297305]
>  
> {code:java}
> *** RUN ABORTED ***
> 20159  java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/map/type/TypeFactory
> 20160  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20161  at java.lang.Class.forName0(Native Method)
> 20162  at java.lang.Class.forName(Class.java:348)
> 20163  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20164  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20165  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20166  at 
> org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
> 20167  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
> 20168  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
> 20169  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
> 20170  ...
> 20171  Cause: java.lang.ClassNotFoundException: 
> org.codehaus.jackson.map.type.TypeFactory
> 20172  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> 20173  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 20174  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> 20175  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> 20176  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20177  at java.lang.Class.forName0(Native Method)
> 20178  at java.lang.Class.forName(Class.java:348)
> 20179  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20180  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20181  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20182  ... {code}






[jira] [Resolved] (SPARK-44601) Make `hive-thriftserver` module daily test pass

2023-08-01 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44601.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42260
[https://github.com/apache/spark/pull/42260]

> Make `hive-thriftserver` module daily test pass
> ---
>
> Key: SPARK-44601
> URL: https://issues.apache.org/jira/browse/SPARK-44601
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> [https://github.com/LuciferYang/spark/actions/runs/5694334367/job/15435297305]
>  
> {code:java}
> *** RUN ABORTED ***
> 20159  java.lang.NoClassDefFoundError: 
> org/codehaus/jackson/map/type/TypeFactory
> 20160  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20161  at java.lang.Class.forName0(Native Method)
> 20162  at java.lang.Class.forName(Class.java:348)
> 20163  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20164  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20165  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20166  at 
> org.apache.hadoop.hive.ql.exec.Registry.addFunction(Registry.java:519)
> 20167  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:163)
> 20168  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:154)
> 20169  at 
> org.apache.hadoop.hive.ql.exec.Registry.registerUDF(Registry.java:147)
> 20170  ...
> 20171  Cause: java.lang.ClassNotFoundException: 
> org.codehaus.jackson.map.type.TypeFactory
> 20172  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> 20173  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 20174  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> 20175  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> 20176  at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64)
> 20177  at java.lang.Class.forName0(Native Method)
> 20178  at java.lang.Class.forName(Class.java:348)
> 20179  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClassInternal(GenericUDFBridge.java:142)
> 20180  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:132)
> 20181  at 
> org.apache.hadoop.hive.ql.exec.FunctionInfo.getFunctionClass(FunctionInfo.java:151)
> 20182  ... {code}






[jira] [Created] (SPARK-44624) Wrap retries around initial streaming GRPC call in connect

2023-08-01 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-44624:
-

 Summary: Wrap retries around initial streaming GRPC call in connect
 Key: SPARK-44624
 URL: https://issues.apache.org/jira/browse/SPARK-44624
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Juliusz Sompolski


Even though we empirically observed that the error is thrown only from the first 
next() or hasNext() of the response StreamObserver, wrap the initial call in 
retries as well, so as not to depend on that behavior in case it is just a quirk 
that is not fully dependable.
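A minimal sketch of such a retry wrapper, assuming a generic callable stands in for the initial gRPC call; the names and retry policy here are hypothetical, not the Spark Connect client API.

```python
import time

def with_retries(make_call, max_attempts=3, is_retryable=lambda e: True,
                 backoff_s=0.0):
    """Run make_call, retrying retryable failures raised by the call itself,
    not only errors surfaced later from the response stream."""
    for attempt in range(1, max_attempts + 1):
        try:
            return make_call()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            time.sleep(backoff_s)  # fixed backoff keeps the sketch simple

# Usage: a flaky "initial streaming call" that fails twice, then succeeds.
calls = {"n": 0}
def flaky_execute_plan():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "response stream"

print(with_retries(flaky_execute_plan))  # response stream
```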






[jira] [Assigned] (SPARK-44480) Add option for thread pool to perform maintenance for RocksDB/HDFS State Store Providers

2023-08-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-44480:


Assignee: Eric Marnadi

> Add option for thread pool to perform maintenance for RocksDB/HDFS State 
> Store Providers
> 
>
> Key: SPARK-44480
> URL: https://issues.apache.org/jira/browse/SPARK-44480
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Eric Marnadi
>Assignee: Eric Marnadi
>Priority: Major
>
> Maintenance tasks on StateStore were being done by a single background thread, 
> which is prone to straggling. With this change, the single background thread 
> instead schedules maintenance tasks to a thread pool.
> Introduce 
> {{spark.sql.streaming.stateStore.enableStateStoreMaintenanceThreadPool}} 
> config so that the user can enable a thread pool for maintenance manually.
> Introduce {{spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads}} 
> config so the thread pool size is configurable.
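The scheduling change described above (one coordinator handing maintenance tasks to a bounded pool instead of running them serially) can be sketched as follows; the code is an illustrative skeleton, not Spark's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def run_maintenance(providers, maintenance_fn, num_threads=4):
    """Illustrative skeleton: rather than one background thread running each
    provider's maintenance in sequence (prone to straggling), the coordinator
    submits every task to a bounded thread pool and collects the results."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(maintenance_fn, p) for p in providers]
        return [f.result() for f in futures]  # preserves submission order

# Usage: three state store providers maintained concurrently.
print(run_maintenance(["p1", "p2", "p3"], lambda p: "maintained " + p))
```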






[jira] [Resolved] (SPARK-44480) Add option for thread pool to perform maintenance for RocksDB/HDFS State Store Providers

2023-08-01 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-44480.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42066
[https://github.com/apache/spark/pull/42066]

> Add option for thread pool to perform maintenance for RocksDB/HDFS State 
> Store Providers
> 
>
> Key: SPARK-44480
> URL: https://issues.apache.org/jira/browse/SPARK-44480
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Eric Marnadi
>Assignee: Eric Marnadi
>Priority: Major
> Fix For: 4.0.0
>
>
> Maintenance tasks on StateStore were being done by a single background thread, 
> which is prone to straggling. With this change, the single background thread 
> instead schedules maintenance tasks to a thread pool.
> Introduce 
> {{spark.sql.streaming.stateStore.enableStateStoreMaintenanceThreadPool}} 
> config so that the user can enable a thread pool for maintenance manually.
> Introduce {{spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads}} 
> config so the thread pool size is configurable.






[jira] [Assigned] (SPARK-44623) Upgrade commons-lang3 to 3.13.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44623:
-

Assignee: Dongjoon Hyun

> Upgrade commons-lang3 to 3.13.0
> ---
>
> Key: SPARK-44623
> URL: https://issues.apache.org/jira/browse/SPARK-44623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-44623) Upgrade commons-lang3 to 3.13.0

2023-08-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44623.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42269
[https://github.com/apache/spark/pull/42269]

> Upgrade commons-lang3 to 3.13.0
> ---
>
> Key: SPARK-44623
> URL: https://issues.apache.org/jira/browse/SPARK-44623
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-29497) Cannot assign instance of java.lang.invoke.SerializedLambda to field

2023-08-01 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-29497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749991#comment-17749991
 ] 

Herman van Hövell commented on SPARK-29497:
---

I have added a check for this to Spark Connect. If someone is brave enough they 
can do the same thing for other UDFs.

> Cannot assign instance of java.lang.invoke.SerializedLambda to field
> 
>
> Key: SPARK-29497
> URL: https://issues.apache.org/jira/browse/SPARK-29497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3, 3.0.1, 3.2.0
> Environment: Spark 2.4.3 Scala 2.12
> Spark 3.2.0 Scala 2.13.5 (Java 11.0.12)
>Reporter: Rob Russo
>Priority: Major
>
> Note this is for scala 2.12:
> There seems to be an issue in spark with serializing a udf that is created 
> from a function assigned to a class member that references another function 
> assigned to a class member. This is similar to 
> https://issues.apache.org/jira/browse/SPARK-25047 but it looks like the 
> resolution has an issue with this case. After trimming it down to the base 
> issue I came up with the following to reproduce:
>  
>  
> {code:java}
> object TestLambdaShell extends Serializable {
>   val hello: String => String = s => s"hello $s!"  
>   val lambdaTest: String => String = hello( _ )  
>   def functionTest: String => String = hello( _ )
> }
> val hello = udf( TestLambdaShell.hello )
> val functionTest = udf( TestLambdaShell.functionTest )
> val lambdaTest = udf( TestLambdaShell.lambdaTest )
> sc.parallelize(Seq("world"),1).toDF("test").select(hello($"test")).show(1)
> sc.parallelize(Seq("world"),1).toDF("test").select(functionTest($"test")).show(1)
> sc.parallelize(Seq("world"),1).toDF("test").select(lambdaTest($"test")).show(1)
> {code}
>  
> All of which work except the last line, which results in an exception on the 
> executors:
>  
> {code:java}
> Caused by: java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> $$$82b5b23cea489b2712a1db46c77e458w$TestLambdaShell$.lambdaTest of type 
> scala.Function1 in instance of 
> $$$82b5b23cea489b2712a1db46c77e458w$TestLambdaShell$
>   at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
>   at 

[jira] [Resolved] (SPARK-44613) Add Encoders.scala to Spark Connect Scala Client

2023-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44613.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Add Encoders.scala to Spark Connect Scala Client
> 
>
> Key: SPARK-44613
> URL: https://issues.apache.org/jira/browse/SPARK-44613
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-08-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-44616:
---
Description: 
PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{code:java}
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
{code}

 

  was:
PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{{{
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
}}}


> Hive Generic UDF support no longer supports short-circuiting of argument 
> evaluation
> ---
>
> Key: SPARK-44616
> URL: https://issues.apache.org/jira/browse/SPARK-44616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
> longer contain a function, and instead contains a value. This removes the 
> deferred evaluation capability and means that HiveGenericUDF implementations 
> can no longer short-circuit the evaluation of their arguments, which could be 
> a performance issue for some users.
> Here is a relevant javadoc comment from the Hive source for DeferredObject:
> {code:java}
>   /**
>* A Defered Object allows us to do lazy-evaluation and short-circuiting.
>* GenericUDF use DeferedObject to pass arguments.
>*/
>   public static interface DeferredObject {
> {code}
>  
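
The short-circuiting described above can be sketched without Hive on the classpath. The interfaces below are simplified stand-ins for Hive's DeferredObject (illustrative assumptions, not the real org.apache.hadoop.hive.ql.udf.generic API): a deferred argument wraps a Supplier, so an IF-like UDF evaluates only the branch it actually needs.

```java
import java.util.function.Supplier;

// Simplified stand-in for Hive's DeferredObject (assumption for illustration).
interface DeferredObject {
    Object get();
}

// A deferred argument backed by a Supplier: evaluation happens only on get().
class LazyDeferredObject implements DeferredObject {
    private final Supplier<Object> eval;
    LazyDeferredObject(Supplier<Object> eval) { this.eval = eval; }
    public Object get() { return eval.get(); }
}

public class ShortCircuitUdf {
    static int evaluations = 0;

    // An IF-like UDF: only one of the two branch arguments is ever evaluated.
    static Object evaluate(DeferredObject cond, DeferredObject thenArg, DeferredObject elseArg) {
        return ((Boolean) cond.get()) ? thenArg.get() : elseArg.get();
    }

    public static void main(String[] args) {
        DeferredObject cond = new LazyDeferredObject(() -> Boolean.TRUE);
        DeferredObject cheap = new LazyDeferredObject(() -> { evaluations++; return "cheap"; });
        DeferredObject expensive = new LazyDeferredObject(() -> { evaluations++; return "expensive"; });
        Object result = evaluate(cond, cheap, expensive);
        System.out.println(result);       // cheap
        System.out.println(evaluations);  // 1: the "expensive" branch was never computed
    }
}
```

If DeferredObject instead carries a pre-computed value, as after the PR referenced above, both branches are evaluated before the UDF runs, which is exactly the regression this ticket describes.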






[jira] [Updated] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-08-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-44616:
---
Description: 
PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{{{
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
}}}

  was:PR https://github.com/apache/spark/pull/39555 changed DeferredObject to 
no longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.


> Hive Generic UDF support no longer supports short-circuiting of argument 
> evaluation
> ---
>
> Key: SPARK-44616
> URL: https://issues.apache.org/jira/browse/SPARK-44616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
> longer contain a function, and instead contains a value. This removes the 
> deferred evaluation capability and means that HiveGenericUDF implementations 
> can no longer short-circuit the evaluation of their arguments, which could be 
> a performance issue for some users.
> Here is a relevant javadoc comment from the Hive source for DeferredObject:
> {{{
>   /**
>* A Defered Object allows us to do lazy-evaluation and short-circuiting.
>* GenericUDF use DeferedObject to pass arguments.
>*/
>   public static interface DeferredObject {
> }}}






[jira] [Commented] (SPARK-42730) Update Spark Standalone Mode - Starting a Cluster Manually

2023-08-01 Thread Junyao Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749926#comment-17749926
 ] 

Junyao Huang commented on SPARK-42730:
--

Hi, [~gurwls223], I'd like to work on this subtask.

Is there any guideline for getting started locally so I can verify this?

> Update Spark Standalone Mode - Starting a Cluster Manually
> --
>
> Key: SPARK-42730
> URL: https://issues.apache.org/jira/browse/SPARK-42730
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/spark-standalone.html
> Add start-connect-server.sh to this list and cover Spark Connect sessions - 
> other changes needed here.






[jira] [Created] (SPARK-44623) Upgrade commons-lang3 to 3.13.0

2023-08-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44623:
-

 Summary: Upgrade commons-lang3 to 3.13.0
 Key: SPARK-44623
 URL: https://issues.apache.org/jira/browse/SPARK-44623
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-44622) Add GetDebugInfo RPC

2023-08-01 Thread Yihong He (Jira)
Yihong He created SPARK-44622:
-

 Summary: Add GetDebugInfo RPC
 Key: SPARK-44622
 URL: https://issues.apache.org/jira/browse/SPARK-44622
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Yihong He









[jira] [Commented] (SPARK-33864) How can we submit or initiate multiple spark application with single or few JVM

2023-08-01 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749898#comment-17749898
 ] 

Laurenceau Julien commented on SPARK-33864:
---

Hi,

I think you missed something with Livy.

A Livy batch is effectively the same as an HTTP/REST spark-submit.

However, a Livy session opens a Spark session that stays idle until you submit 
code statements to it for execution.

Be aware of possible security issues (data leaks) when you share a Spark session 
between tasks of different projects!
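
The session flow described above, as a hedged sketch: POST /sessions and POST /sessions/{id}/statements are Livy's documented REST API, while the host, port, and statement code are assumptions. The actual curl calls are left commented out, so the script only prints what it would send.

```shell
#!/bin/sh
# Sketch of the Livy session flow (host/port are placeholders).
LIVY_URL="http://livy-host:8998"

# 1. Open a long-lived session once; it stays idle between submissions.
CREATE_SESSION='{"kind": "spark"}'
echo "POST $LIVY_URL/sessions        body: $CREATE_SESSION"
# curl -s -H 'Content-Type: application/json' -d "$CREATE_SESSION" "$LIVY_URL/sessions"

# 2. Submit code statements to the idle session instead of one spark-submit per job.
STATEMENT='{"code": "spark.range(10).count()"}'
echo "POST $LIVY_URL/sessions/0/statements  body: $STATEMENT"
# curl -s -H 'Content-Type: application/json' -d "$STATEMENT" "$LIVY_URL/sessions/0/statements"
```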

> How can we submit or initiate multiple spark application with single or few 
> JVM
> ---
>
> Key: SPARK-33864
> URL: https://issues.apache.org/jira/browse/SPARK-33864
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.4.5
>Reporter: Ramesha Bhatta
>Priority: Major
>
> How can a single JVM, or a few JVM processes, submit multiple applications to 
> the cluster?
> It is observed that each spark-submit opens up to 400 JARs of >1GB size, 
> creates _spark_conf_.zip in /tmp, and copies it under an application-specific 
> .staging directory. When run concurrently, the number of JVMs a server can 
> support is limited, and the CPU sits at 100% from job submission until the 
> client Java processes start exiting.
> Initially we thought that creating zip files and distributing them to HDFS for 
> each application was the source of the issue. However, reducing the zip file 
> size by 50% made little difference, which indicates the main source of the 
> issue is the number of Java processes on the client side.
> The direct impact is that any submission with concurrency >40 (the number of 
> hyperthreaded cores) leads to failure and CPU overload on the gateway. We 
> tried Livy, but noticed that in the background this solution also does a 
> spark-submit, so the same problem persists; we get "response code 404" and 
> observe the same CPU overload on the server running Livy. The concurrency 
> comes from mini-batches over REST, and we are trying to support 2000+ 
> concurrent requests as long as the cluster has the resources. For this, 
> spark-submit is the major bottleneck because of the situation explained 
> above. For JAR submission we have more than one workaround: 1) pre-distribute 
> the JARs to a specified folder and refer to them with the local keyword, or 
> 2) stage the JARs in an HDFS location and use the HDFS reference, so there is 
> no file copy per application.
> Is there a way to create a service (or services) that stays running and 
> submits jobs to the cluster? For running an application in client mode it 
> makes sense to open 400+ JARs, but just for submitting an application to the 
> cluster we could have a simple, lightweight process that runs as a service.
> Regards,
> -Ramesh






[jira] [Updated] (SPARK-44614) Add missing packages in setup.py

2023-08-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44614:
--
Fix Version/s: 3.5.0

> Add missing packages in setup.py
> 
>
> Key: SPARK-44614
> URL: https://issues.apache.org/jira/browse/SPARK-44614
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Some packages for SQL module are missing in {{setup.py}} file.






[jira] [Commented] (SPARK-44571) Eliminate the Join by combine multiple Aggregates

2023-08-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749895#comment-17749895
 ] 

Hudson commented on SPARK-44571:


User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/42223

> Eliminate the Join by combine multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated test case q28, which belongs to the TPC-DS queries.
> The query contains multiple scalar subqueries with aggregation, connected 
> with inner joins.
> If we can merge the filters and aggregates, we can scan the data source only 
> once and eliminate the join, avoiding a shuffle. Obviously, this change will 
> improve the performance.
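
Conceptually, the rewrite replaces N scans joined on their single-row results with one scan that feeds every (filter, aggregate) pair. A minimal sketch of the idea in plain Python, with illustrative data and helpers rather than Spark's optimizer code:

```python
# q28-style queries compute several scalar aggregates over the same table,
# each with its own filter, then join the single-row results. Merging them
# lets one scan produce all aggregates at once.

rows = [
    {"quantity": 1, "price": 10.0},
    {"quantity": 5, "price": 40.0},
    {"quantity": 9, "price": 75.0},
]

def avg(values):
    values = list(values)
    return sum(values) / len(values) if values else None

# Naive form: one scan per scalar subquery (what the join-based plan does).
avg_low  = avg(r["price"] for r in rows if 0 <= r["quantity"] <= 5)
avg_high = avg(r["price"] for r in rows if 6 <= r["quantity"] <= 10)

# Merged form: a single pass accumulates every (filter, aggregate) pair.
def merged_aggregates(rows, filters):
    sums = [0.0] * len(filters)
    counts = [0] * len(filters)
    for r in rows:                      # one scan of the source
        for i, pred in enumerate(filters):
            if pred(r):
                sums[i] += r["price"]
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

merged = merged_aggregates(rows, [
    lambda r: 0 <= r["quantity"] <= 5,
    lambda r: 6 <= r["quantity"] <= 10,
])
assert merged == [avg_low, avg_high]    # same results, no join, one scan
```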






[jira] [Commented] (SPARK-44571) Eliminate the Join by combine multiple Aggregates

2023-08-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749893#comment-17749893
 ] 

Hudson commented on SPARK-44571:


User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/42223

> Eliminate the Join by combine multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated test case q28, which belongs to the TPC-DS queries.
> The query contains multiple scalar subqueries with aggregation, connected 
> with inner joins.
> If we can merge the filters and aggregates, we can scan the data source only 
> once and eliminate the join, avoiding a shuffle. Obviously, this change will 
> improve the performance.






[jira] [Commented] (SPARK-32744) request executor cores with decimal when spark on k8s

2023-08-01 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749892#comment-17749892
 ] 

Laurenceau Julien commented on SPARK-32744:
---

This would be especially useful for small jobs, and should also be extended to 
the driver's number of cores!

> request executor cores with decimal when spark on k8s
> -
>
> Key: SPARK-32744
> URL: https://issues.apache.org/jira/browse/SPARK-32744
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Yu Wang
>Priority: Minor
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
>   The current Spark version does not support requesting executor cores with a 
> decimal value when running Spark on Kubernetes, because cores is an Int in 
> the CoarseGrainedExecutorBackend class.
>  !screenshot-1.png! 
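
For context, the Kubernetes resource *request* can already be fractional through separate configuration keys; the limitation in this ticket is the executor's own core count, which stays an Int. A hedged sketch of the relevant properties (values are examples):

```
spark.kubernetes.executor.request.cores=0.5
spark.kubernetes.driver.request.cores=500m
spark.executor.cores=1
```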






[jira] [Created] (SPARK-44621) Connect Scala Client should support value classes

2023-08-01 Thread Jira
Herman van Hövell created SPARK-44621:
-

 Summary: Connect Scala Client should support value classes
 Key: SPARK-44621
 URL: https://issues.apache.org/jira/browse/SPARK-44621
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Herman van Hövell









[jira] [Assigned] (SPARK-44311) UDF should support function taking value classes

2023-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-44311:
-

Assignee: Emil Ejbyfeldt

> UDF should support function taking value classes
> 
>
> Key: SPARK-44311
> URL: https://issues.apache.org/jira/browse/SPARK-44311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>
> Running the following code in a Spark shell
> ```
> final case class ValueClass(a: Int) extends AnyVal
> final case class Wrapper(v: ValueClass)
> val f = udf((a: ValueClass) => a.a > 0)
> spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
> ```
> fails with
> ```
> java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ 
> cannot be cast to class org.apache.spark.sql.types.StructType 
> (org.apache.spark.sql.types.IntegerType$ and 
> org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
>   at scala.Option.map(Option.scala:242)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
>   at scala.collection.immutable.List.map(List.scala:246)
>   at scala.collection.immutable.List.map(List.scala:79)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
> ```






[jira] [Resolved] (SPARK-44311) UDF should support function taking value classes

2023-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44311.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> UDF should support function taking value classes
> 
>
> Key: SPARK-44311
> URL: https://issues.apache.org/jira/browse/SPARK-44311
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
> Fix For: 3.5.0
>
>
> Running the following code in a Spark shell
> ```
> final case class ValueClass(a: Int) extends AnyVal
> final case class Wrapper(v: ValueClass)
> val f = udf((a: ValueClass) => a.a > 0)
> spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
> ```
> fails with
> ```
> java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ 
> cannot be cast to class org.apache.spark.sql.types.StructType 
> (org.apache.spark.sql.types.IntegerType$ and 
> org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
>   at scala.Option.map(Option.scala:242)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
>   at scala.collection.immutable.List.map(List.scala:246)
>   at scala.collection.immutable.List.map(List.scala:79)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
> ```






[jira] [Commented] (SPARK-43606) Enable IndexesTests.test_index_basic for pandas 2.0.0.

2023-08-01 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749848#comment-17749848
 ] 

GridGain Integration commented on SPARK-43606:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/42267

> Enable IndexesTests.test_index_basic for pandas 2.0.0.
> --
>
> Key: SPARK-43606
> URL: https://issues.apache.org/jira/browse/SPARK-43606
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable IndexesTests.test_index_basic for pandas 2.0.0.





