[jira] [Updated] (SPARK-46981) Driver OOM happens in query planning phase with empty tables

2024-02-05 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-46981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-46981: -- Description: We have observed that Driver OOM happens in query planning phase with

[jira] [Updated] (SPARK-46981) Driver OOM happens in query planning phase with empty tables

2024-02-05 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-46981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-46981: -- Attachment: test_and_twodays_simplified.sql > Driver OOM happens in query planning

[jira] [Created] (SPARK-46981) Driver OOM happens in query planning phase with empty tables

2024-02-05 Thread Noritaka Sekiyama (Jira)
Noritaka Sekiyama created SPARK-46981: - Summary: Driver OOM happens in query planning phase with empty tables Key: SPARK-46981 URL: https://issues.apache.org/jira/browse/SPARK-46981 Project:

[jira] [Updated] (SPARK-46981) Driver OOM happens in query planning phase with empty tables

2024-02-05 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-46981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-46981: -- Attachment: create_sanitized_tables.py > Driver OOM happens in query planning phase

[jira] [Updated] (SPARK-33266) Add total duration, read duration, and write duration as task level metrics

2020-10-27 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-33266: -- Description: Sometimes we need to identify performance bottlenecks, for example, how

[jira] [Created] (SPARK-33266) Add total duration, read duration, and write duration as task level metrics

2020-10-27 Thread Noritaka Sekiyama (Jira)
Noritaka Sekiyama created SPARK-33266: - Summary: Add total duration, read duration, and write duration as task level metrics Key: SPARK-33266 URL: https://issues.apache.org/jira/browse/SPARK-33266

[jira] [Updated] (SPARK-32432) Add support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-07-28 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32432: -- Description: Hive style symlink (SymlinkTextInputFormat) is commonly used in

[jira] [Created] (SPARK-32432) Add support for reading ORC/Parquet files with SymlinkTextInputFormat

2020-07-24 Thread Noritaka Sekiyama (Jira)
Noritaka Sekiyama created SPARK-32432: - Summary: Add support for reading ORC/Parquet files with SymlinkTextInputFormat Key: SPARK-32432 URL: https://issues.apache.org/jira/browse/SPARK-32432

[jira] [Updated] (SPARK-32112) Easier way to repartition/coalesce DataFrames based on the number of parallel tasks that Spark can process at the same time

2020-06-28 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32112: -- Description: Repartition/coalesce is very important to optimize Spark application's

[jira] [Updated] (SPARK-32112) Easier way to repartition/coalesce DataFrames based on the number of parallel tasks that Spark can process at the same time

2020-06-27 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32112: -- Description: Repartition/coalesce is very important to optimize Spark application's

[jira] [Updated] (SPARK-32112) Easier way to repartition/coalesce DataFrames based on the number of parallel tasks that Spark can process at the same time

2020-06-27 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32112: -- Summary: Easier way to repartition/coalesce DataFrames based on the number of

[jira] [Created] (SPARK-32112) Add a method to calculate the number of parallel tasks that Spark can process at the same time

2020-06-26 Thread Noritaka Sekiyama (Jira)
Noritaka Sekiyama created SPARK-32112: - Summary: Add a method to calculate the number of parallel tasks that Spark can process at the same time Key: SPARK-32112 URL:

[jira] [Updated] (SPARK-32013) Support query execution before/after reading/writing over JDBC

2020-06-22 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32013: -- Description: For ETL workload, there is a common requirement to perform SQL statement

[jira] [Updated] (SPARK-32013) Support query execution before/after reading/writing over JDBC

2020-06-17 Thread Noritaka Sekiyama (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-32013: -- Description: For ETL workload, there is a common requirement to perform SQL statement

[jira] [Created] (SPARK-32013) Support query execution before/after reading/writing over JDBC

2020-06-17 Thread Noritaka Sekiyama (Jira)
Noritaka Sekiyama created SPARK-32013: - Summary: Support query execution before/after reading/writing over JDBC Key: SPARK-32013 URL: https://issues.apache.org/jira/browse/SPARK-32013 Project:

[jira] [Updated] (SPARK-28069) Switch log directory from Spark UI without restarting history server

2019-06-16 Thread Noritaka Sekiyama (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noritaka Sekiyama updated SPARK-28069: -- Description: History server polls the directory specified in

[jira] [Created] (SPARK-28069) Switch log directory from Spark UI without restarting history server

2019-06-16 Thread Noritaka Sekiyama (JIRA)
Noritaka Sekiyama created SPARK-28069: - Summary: Switch log directory from Spark UI without restarting history server Key: SPARK-28069 URL: https://issues.apache.org/jira/browse/SPARK-28069

[jira] [Commented] (SPARK-21514) Hive has updated with new support for S3 and InsertIntoHiveTable.scala should update also

2019-06-16 Thread Noritaka Sekiyama (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865228#comment-16865228 ] Noritaka Sekiyama commented on SPARK-21514: --- To move data from S3 (s3a) to HDFS, there is a

[jira] [Commented] (SPARK-21514) Hive has updated with new support for S3 and InsertIntoHiveTable.scala should update also

2018-12-18 Thread Noritaka Sekiyama (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724763#comment-16724763 ] Noritaka Sekiyama commented on SPARK-21514: --- I'm working on fixing this. Will update once I

[jira] [Created] (SPARK-18432) Fix HDFS block size in programming guide

2016-11-13 Thread Noritaka Sekiyama (JIRA)
Noritaka Sekiyama created SPARK-18432: - Summary: Fix HDFS block size in programming guide Key: SPARK-18432 URL: https://issues.apache.org/jira/browse/SPARK-18432 Project: Spark Issue