[jira] [Commented] (SPARK-48380) AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit

2024-05-29 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850486#comment-17850486 ] Zheng Shao commented on SPARK-48380: A second look at the stacktrace shows that the current issue is

[jira] [Commented] (SPARK-48380) AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit

2024-05-21 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848375#comment-17848375 ] Zheng Shao commented on SPARK-48380: A draft PR can be found here:

[jira] [Updated] (SPARK-48380) AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit

2024-05-21 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated SPARK-48380: --- Description: AutoBatchedPickler assumes that the row sizes are more or less uniform. That's of
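
Here is a simplified model of why the uniform-row-size assumption matters. Suppose a batcher grows its batch size while batches are small and sizes the next batch only from the last one. A run of much larger rows that arrives right after the batch has grown can then produce a single batch far past the 2 GB ceiling. This is illustrative Python, not Spark's implementation. The `simulate` helper, the 1 MiB / 10 MiB thresholds, and the row sizes are all assumptions.

```python
# Hypothetical simulation of a size-adaptive batcher; not Spark's code.
# Thresholds below are illustrative assumptions, not values from the issue.
MIN_BATCH_BYTES = 1 << 20        # assumed lower target: 1 MiB
MAX_BATCH_BYTES = 10 << 20       # assumed upper target: 10 MiB
JVM_ARRAY_LIMIT = (1 << 31) - 1  # Integer.MAX_VALUE, the nominal Java byte[] length cap


def simulate(row_sizes):
    """Yield (rows_in_batch, batch_bytes) for each batch, given per-row byte sizes."""
    batch = 1
    i = 0
    while i < len(row_sizes):
        chunk = row_sizes[i:i + batch]
        i += len(chunk)
        size = sum(chunk)
        yield len(chunk), size
        # The next batch size depends only on the batch just produced, so it
        # implicitly assumes upcoming rows look like the previous ones.
        if size < MIN_BATCH_BYTES:
            batch *= 2
        elif size > MAX_BATCH_BYTES and batch > 1:
            batch //= 2


if __name__ == "__main__":
    # Many tiny rows let the batch size double up to 131072, then large rows arrive.
    rows = [10] * 131071 + [65536] * 131072
    n, size = list(simulate(rows))[-1]
    print(n, size, size > JVM_ARRAY_LIMIT)  # 131072 8589934592 True
```

In this model, 131,071 rows of 10 bytes push the batch size to 131,072. The following 131,072 rows of 64 KiB then land in one 8 GiB batch, which no single Java byte array can hold. With uniform rows, the same loop settles at about 1.3 MB per batch.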

[jira] [Updated] (SPARK-48380) AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit

2024-05-21 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-48380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated SPARK-48380: --- Description: The stacktrace: Py4JJavaError: An error occurred while calling

[jira] [Created] (SPARK-48380) AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit

2024-05-21 Thread Zheng Shao (Jira)
Zheng Shao created SPARK-48380: -- Summary: AutoBatchedPickler caused Unsafe allocate to fail due to 2GB limit Key: SPARK-48380 URL: https://issues.apache.org/jira/browse/SPARK-48380 Project: Spark

[jira] [Updated] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-26 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated SPARK-47959: --- Description: We have a Spark executor that is running 32 workers in parallel. The query is a

[jira] [Updated] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-23 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-47959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated SPARK-47959: --- Description: We have a Spark executor that is running 32 workers in parallel. The query is a

[jira] [Created] (SPARK-47959) Improve GET_JSON_OBJECT performance on executors running multiple tasks

2024-04-23 Thread Zheng Shao (Jira)
Zheng Shao created SPARK-47959: -- Summary: Improve GET_JSON_OBJECT performance on executors running multiple tasks Key: SPARK-47959 URL: https://issues.apache.org/jira/browse/SPARK-47959 Project: Spark

[jira] [Created] (SPARK-47801) Use simdjson-java in JSON related UDFs

2024-04-10 Thread Zheng Shao (Jira)
Zheng Shao created SPARK-47801: -- Summary: Use simdjson-java in JSON related UDFs Key: SPARK-47801 URL: https://issues.apache.org/jira/browse/SPARK-47801 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-47670) Multiple calls to GET_JSON_OBJECT with the same JSON str should parse it just one time

2024-04-01 Thread Zheng Shao (Jira)
Zheng Shao created SPARK-47670: -- Summary: Multiple calls to GET_JSON_OBJECT with the same JSON str should parse it just one time Key: SPARK-47670 URL: https://issues.apache.org/jira/browse/SPARK-47670

[jira] [Commented] (SPARK-26764) [SPIP] Spark Relational Cache

2021-06-28 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370770#comment-17370770 ] Zheng Shao commented on SPARK-26764: [~adrian-wang] It has been over 2 years since this issue was

[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2021-06-28 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370768#comment-17370768 ] Zheng Shao commented on SPARK-29038: [~cltlfcjin] and [~AidenZhang]. I also recently started to look

[jira] [Commented] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

2019-09-07 Thread Zheng Shao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924940#comment-16924940 ] Zheng Shao commented on SPARK-1529: --- For huge Spark jobs, we are seeing frequent failures due to the

[jira] [Commented] (SPARK-6951) History server slow startup if the event log directory is large

2017-02-28 Thread Zheng Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889588#comment-15889588 ] Zheng Shao commented on SPARK-6951: --- Did we consider using a distributed store to solve the scalability