[jira] [Updated] (SPARK-37722) Escape dots in partition names

2021-12-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37722: - Description: Some file systems (for example, ABFS) do not support file names/paths ending with

[jira] [Updated] (SPARK-37722) Escape dots in partition names

2021-12-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37722: - Description: Some file systems (for example, ABFS) do not support file names/paths ending with

[jira] [Updated] (SPARK-37722) Escape dots in partition names

2021-12-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37722: - Description: Some file systems (for example, ABFS) do not support file names/paths ending with

[jira] [Created] (SPARK-37722) Escape dots in partition names

2021-12-22 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-37722: Summary: Escape dots in partition names Key: SPARK-37722 URL: https://issues.apache.org/jira/browse/SPARK-37722 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-37722) Escape dot character in partition names

2021-12-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37722: - Summary: Escape dot character in partition names (was: Escape dots in partition names) >

[jira] [Updated] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException

2021-12-28 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37771: - Issue Type: Bug (was: Improvement) > Race condition in withHiveState and limited logic in

[jira] [Created] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException

2021-12-28 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-37771: Summary: Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException Key: SPARK-37771 URL:

[jira] [Updated] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException

2021-12-28 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37771: - Description: There is a race condition between creating a Hive client and loading classes that

[jira] [Updated] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException

2021-12-28 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-37771: - Description: There is a race condition between creating a Hive client and loading classes that

[jira] [Created] (SPARK-37385) Add tests for TimestampNTZ and TimestampLTZ for Parquet data source

2021-11-18 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-37385: Summary: Add tests for TimestampNTZ and TimestampLTZ for Parquet data source Key: SPARK-37385 URL: https://issues.apache.org/jira/browse/SPARK-37385 Project: Spark

[jira] [Created] (SPARK-37360) Support TimestampNTZ in JSON data source

2021-11-17 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-37360: Summary: Support TimestampNTZ in JSON data source Key: SPARK-37360 URL: https://issues.apache.org/jira/browse/SPARK-37360 Project: Spark Issue Type:

[jira] [Created] (SPARK-37326) Support TimestampNTZ in CSV data source

2021-11-14 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-37326: Summary: Support TimestampNTZ in CSV data source Key: SPARK-37326 URL: https://issues.apache.org/jira/browse/SPARK-37326 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException

2022-02-02 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17486107#comment-17486107 ] Ivan Sadikov commented on SPARK-37771: -- I could not manage to work around the issue with Hadoop

[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet

2022-04-10 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520296#comment-17520296 ] Ivan Sadikov commented on SPARK-38829: -- I opened [https://github.com/apache/spark/pull/36137] to

[jira] [Commented] (SPARK-38829) New configuration for controlling timestamp inference of Parquet

2022-04-11 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520870#comment-17520870 ] Ivan Sadikov commented on SPARK-38829: -- [~Gengliang.Wang] Do you still want to merge the PR for 3.3

[jira] [Created] (SPARK-45139) Add DatabricksDialect to handle SQL type conversion

2023-09-12 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-45139: Summary: Add DatabricksDialect to handle SQL type conversion Key: SPARK-45139 URL: https://issues.apache.org/jira/browse/SPARK-45139 Project: Spark Issue

[jira] [Updated] (SPARK-45139) Add DatabricksDialect to handle SQL type conversion

2023-09-12 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-45139: - Description: Databricks SQL dialect is needed to refine type conversion when connecting to a

[jira] [Created] (SPARK-45194) Parquet reads fail with "RuntimeException: Unable to create Parquet converter for data type "timestamp_ntz" due to incorrect schema inference

2023-09-17 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-45194: Summary: Parquet reads fail with "RuntimeException: Unable to create Parquet converter for data type "timestamp_ntz" due to incorrect schema inference Key: SPARK-45194 URL:

[jira] [Commented] (SPARK-45194) Parquet reads fail with "RuntimeException: Unable to create Parquet converter for data type "timestamp_ntz" due to incorrect schema inference

2023-09-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766198#comment-17766198 ] Ivan Sadikov commented on SPARK-45194: -- cc [~gengliang] [~cloud_fan] > Parquet reads fail with

[jira] [Updated] (SPARK-45194) Parquet reads fail with "RuntimeException: Unable to create Parquet converter for data type "timestamp_ntz" due to incorrect schema inference

2023-09-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-45194: - Description: I found that Parquet reads could fail due to incorrect schema inference with two

[jira] [Updated] (SPARK-45194) Parquet reads fail with "RuntimeException: Unable to create Parquet converter for data type "timestamp_ntz" due to incorrect schema inference

2023-09-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-45194: - Description: I found that Parquet reads could fail due to incorrect schema inference with two

[jira] [Created] (SPARK-44940) Improve performance of JSON parsing when partial results are enabled

2023-08-24 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-44940: Summary: Improve performance of JSON parsing when partial results are enabled Key: SPARK-44940 URL: https://issues.apache.org/jira/browse/SPARK-44940 Project: Spark

[jira] [Updated] (SPARK-44940) Improve performance of JSON parsing when partial results are enabled

2023-08-24 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-44940: - Description: Follow-up on https://issues.apache.org/jira/browse/SPARK-40646. I found that JSON

[jira] [Commented] (SPARK-44940) Improve performance of JSON parsing when partial results are enabled

2023-08-24 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758394#comment-17758394 ] Ivan Sadikov commented on SPARK-44940: -- I have prototyped the fix and will open a PR shortly. >

[jira] [Updated] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-24 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-44940: - Summary: Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is

[jira] [Commented] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-08-24 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758792#comment-17758792 ] Ivan Sadikov commented on SPARK-44940: -- Opened https://github.com/apache/spark/pull/42667. >

[jira] [Created] (SPARK-39339) Support TimestampNTZ in JDBC data source

2022-05-30 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39339: Summary: Support TimestampNTZ in JDBC data source Key: SPARK-39339 URL: https://issues.apache.org/jira/browse/SPARK-39339 Project: Spark Issue Type:

[jira] [Created] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV

2022-07-10 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39731: Summary: Correctness issue when parsing dates with yyyyMMdd format in CSV Key: SPARK-39731 URL: https://issues.apache.org/jira/browse/SPARK-39731 Project: Spark

[jira] [Updated] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV

2022-07-10 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39731: - Description: In Spark 3.x, when reading CSV data like this: {code:java} name,mydate 1,2020011

[jira] [Updated] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV

2022-07-10 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39731: - Description: In Spark 3.x, when reading CSV data like this: {code:java} name,mydate 1,2020011

[jira] [Updated] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV

2022-07-10 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39731: - Description: In Spark 3.x, when reading CSV data like this: {code:java} name,mydate 1,2020011

[jira] [Commented] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-07-27 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571768#comment-17571768 ] Ivan Sadikov commented on SPARK-39833: -- Interesting, I will take a look. > Filtered parquet data

[jira] [Updated] (SPARK-39904) Rename inferDate to preferDate and fix an issue when inferring schema

2022-07-27 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39904: - Description: Follow-up for https://issues.apache.org/jira/browse/SPARK-39469. > Rename

[jira] [Created] (SPARK-39904) Rename inferDate to preferDate and fix an issue when inferring schema

2022-07-27 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39904: Summary: Rename inferDate to preferDate and fix an issue when inferring schema Key: SPARK-39904 URL: https://issues.apache.org/jira/browse/SPARK-39904 Project: Spark

[jira] [Updated] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV and JSON

2022-07-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39731: - Description: In Spark 3.x, when reading CSV data like this: {code:java} name,mydate 1,2020011

[jira] [Updated] (SPARK-39731) Correctness issue when parsing dates with yyyyMMdd format in CSV and JSON

2022-07-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39731: - Summary: Correctness issue when parsing dates with yyyyMMdd format in CSV and JSON (was:

[jira] [Created] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39802: Summary: Support Avro recursive schemas in Spark Key: SPARK-39802 URL: https://issues.apache.org/jira/browse/SPARK-39802 Project: Spark Issue Type:

[jira] [Commented] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567759#comment-17567759 ] Ivan Sadikov commented on SPARK-39802: -- [~Gengliang.Wang] Would you be able to comment on this

[jira] [Updated] (SPARK-39802) Support recursive Avro schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Summary: Support recursive Avro schemas in Spark (was: Support Avro recursive schemas in

[jira] [Updated] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-25718.  It

[jira] [Updated] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-25718.  It

[jira] [Updated] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-25718.  It

[jira] [Updated] (SPARK-39802) Support Avro recursive schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-25718.  It

[jira] [Updated] (SPARK-39802) Support recursive references in Avro schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-25718.  It

[jira] [Updated] (SPARK-39802) Support recursive references in Avro schemas in Spark

2022-07-17 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39802: - Summary: Support recursive references in Avro schemas in Spark (was: Support recursive Avro

[jira] [Commented] (SPARK-39084) df.rdd.isEmpty() results in unexpected executor failure and JVM crash

2022-05-01 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530600#comment-17530600 ] Ivan Sadikov commented on SPARK-39084: -- I am going to open a PR to fix this shortly. >

[jira] [Updated] (SPARK-39084) df.rdd.isEmpty() results in unexpected executor failure and JVM crash

2022-05-01 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39084: - Description: It was discovered that a particular data distribution in a DataFrame with groupBy

[jira] [Created] (SPARK-39084) df.rdd.isEmpty() results in unexpected executor failure and JVM crash

2022-05-01 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39084: Summary: df.rdd.isEmpty() results in unexpected executor failure and JVM crash Key: SPARK-39084 URL: https://issues.apache.org/jira/browse/SPARK-39084 Project: Spark

[jira] [Created] (SPARK-39086) Support UDT in Parquet OSS vectorised reader

2022-05-02 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-39086: Summary: Support UDT in Parquet OSS vectorised reader Key: SPARK-39086 URL: https://issues.apache.org/jira/browse/SPARK-39086 Project: Spark Issue Type:

[jira] [Updated] (SPARK-39086) Support UDT in Spark Parquet vectorized reader

2022-05-02 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39086: - Summary: Support UDT in Spark Parquet vectorized reader (was: Support UDT in Parquet OSS

[jira] [Created] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-24 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40215: Summary: Add SQL configs to control CSV/JSON date and timestamp parsing behaviour Key: SPARK-40215 URL: https://issues.apache.org/jira/browse/SPARK-40215 Project:

[jira] [Commented] (SPARK-40215) Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-24 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584612#comment-17584612 ] Ivan Sadikov commented on SPARK-40215: -- Follow-up. > Add SQL configs to control CSV/JSON date and

[jira] [Commented] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582664#comment-17582664 ] Ivan Sadikov commented on SPARK-40169: -- I would like to work on it as it was my responsibility to

[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40169: - Description: This is a follow-up for SPARK-39833. In

[jira] [Created] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1

2022-08-21 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40169: Summary: Fix the issue with Parquet column index and predicate pushdown in Data source V1 Key: SPARK-40169 URL: https://issues.apache.org/jira/browse/SPARK-40169

[jira] [Commented] (SPARK-40292) arrays_zip output unexpected alias column names

2022-09-04 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600198#comment-17600198 ] Ivan Sadikov commented on SPARK-40292: -- I will take a look. > arrays_zip output unexpected alias

[jira] [Created] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits

2022-10-16 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40815: Summary: SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits Key: SPARK-40815 URL: https://issues.apache.org/jira/browse/SPARK-40815

[jira] [Created] (SPARK-40496) Configs to control "enableDateTimeParsingFallback" are incorrectly swapped

2022-09-20 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40496: Summary: Configs to control "enableDateTimeParsingFallback" are incorrectly swapped Key: SPARK-40496 URL: https://issues.apache.org/jira/browse/SPARK-40496 Project:

[jira] [Updated] (SPARK-40527) Keep struct field names or map keys in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Summary: Keep struct field names or map keys in CreateNamedStruct (was: Keep struct field

[jira] [Updated] (SPARK-40527) Keep struct field names or map keys for UnresolvedExtractValue in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Summary: Keep struct field names or map keys for UnresolvedExtractValue in CreateNamedStruct

[jira] [Updated] (SPARK-40527) Keep struct/map field names for UnresolvedExtractValue in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Summary: Keep struct/map field names for UnresolvedExtractValue in CreateNamedStruct (was:

[jira] [Updated] (SPARK-40527) Generate field names for UnresolvedExtractValue in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Description: Using index-like notation when extracting columns in a struct produces generated

[jira] [Updated] (SPARK-40527) Generate field names for UnresolvedExtractValue in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Summary: Generate field names for UnresolvedExtractValue in CreateNamedStruct (was: Generate

[jira] [Created] (SPARK-40527) Generate names for UnresolvedExtractValue in CreateNamedStruct

2022-09-22 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40527: Summary: Generate names for UnresolvedExtractValue in CreateNamedStruct Key: SPARK-40527 URL: https://issues.apache.org/jira/browse/SPARK-40527 Project: Spark

[jira] [Updated] (SPARK-40527) Keep struct field names or map keys in CreateStruct

2022-09-22 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40527: - Summary: Keep struct field names or map keys in CreateStruct (was: Keep struct field names or

[jira] [Updated] (SPARK-40470) arrays_zip output unexpected alias column names when using Map

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40470: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-40292.  I

[jira] [Updated] (SPARK-40470) arrays_zip output unexpected alias column names when using Map

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40470: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-40292.  I

[jira] [Created] (SPARK-40470) arrays_zip output unexpected alias column names when using Map

2022-09-15 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40470: Summary: arrays_zip output unexpected alias column names when using Map Key: SPARK-40470 URL: https://issues.apache.org/jira/browse/SPARK-40470 Project: Spark

[jira] [Updated] (SPARK-40470) arrays_zip output unexpected alias column names when using Map

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40470: - Description: This is a follow-up for https://issues.apache.org/jira/browse/SPARK-40292.  I

[jira] [Updated] (SPARK-40470) arrays_zip output unexpected alias column names when using GetMapValue and GetArrayStructFields

2022-09-16 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40470: - Summary: arrays_zip output unexpected alias column names when using GetMapValue and

[jira] [Updated] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40468: - Description: I have found that depending on the name of the corrupt record in CSV, the field

[jira] [Created] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-15 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40468: Summary: Column pruning is not handled correctly in CSV when _corrupt_record is used Key: SPARK-40468 URL: https://issues.apache.org/jira/browse/SPARK-40468 Project:

[jira] [Updated] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40468: - Description: I have found that depending on the name of the corrupt record in CSV, the field

[jira] [Updated] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40468: - Description: I have found that depending on the name of the corrupt record in CSV, the field

[jira] [Updated] (SPARK-40468) Column pruning is not handled correctly in CSV when _corrupt_record is used

2022-09-15 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40468: - Description: I have found that depending on the name of the corrupt record in CSV, the field

[jira] [Updated] (SPARK-40646) Fix returning partial results in JSON data source and JSON functions

2022-10-03 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40646: - Description: I recently found an issue when parsing the following JSON file: {code:java} {"a":

[jira] [Updated] (SPARK-40646) Fix returning partial results in JSON data source and JSON functions

2022-10-03 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40646: - Description: I recently found an issue when parsing the following JSON file: {code:java} {"a":

[jira] [Created] (SPARK-40646) Fix returning partial results in JSON data source and JSON functions

2022-10-03 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40646: Summary: Fix returning partial results in JSON data source and JSON functions Key: SPARK-40646 URL: https://issues.apache.org/jira/browse/SPARK-40646 Project: Spark

[jira] [Commented] (SPARK-40584) Incorrect Count when reading CSV file

2022-10-05 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613304#comment-17613304 ] Ivan Sadikov commented on SPARK-40584: -- Disabling "multiLine" also fixes the issue. Seems to be an

[jira] [Commented] (SPARK-40541) NullPointerException with UTF8String.getBaseObject() when UDF

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617914#comment-17617914 ] Ivan Sadikov commented on SPARK-40541: -- I was asking about the actual problem. It is not clear what

[jira] [Comment Edited] (SPARK-40541) NullPointerException with UTF8String.getBaseObject() when UDF

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617914#comment-17617914 ] Ivan Sadikov edited comment on SPARK-40541 at 10/14/22 6:42 PM: I was

[jira] [Comment Edited] (SPARK-40541) NullPointerException with UTF8String.getBaseObject() when UDF

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617914#comment-17617914 ] Ivan Sadikov edited comment on SPARK-40541 at 10/14/22 6:43 PM: I was

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the erroWrong column backticks in UNRESOLVED_COLUMN error

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39783: - Summary: Column backticks are misplaced in the erroWrong column backticks in UNRESOLVED_COLUMN

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39783: - Summary: Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39783: - Description: AnalysisException [UNRESOLVED_COLUMN]   The following code references a nested

[jira] [Updated] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-39783: - Description: AnalysisException [UNRESOLVED_COLUMN] shows the wrong suggestion when a field

[jira] [Commented] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617509#comment-17617509 ] Ivan Sadikov commented on SPARK-39783: -- It is not clear from the ticket, you should update the

[jira] [Comment Edited] (SPARK-39783) Column backticks are misplaced in the AnalysisException [UNRESOLVED_COLUMN] error message when using field with "."

2022-10-14 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617509#comment-17617509 ] Ivan Sadikov edited comment on SPARK-39783 at 10/14/22 7:14 AM: It is

[jira] [Commented] (SPARK-40637) DataFrame can correctly encode BINARY type but SparkSQL cannot

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617435#comment-17617435 ] Ivan Sadikov commented on SPARK-40637: -- You are not writing to the table in the first example but

[jira] [Commented] (SPARK-40541) NullPointerException with UTF8String.getBaseObject() when UDF

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617439#comment-17617439 ] Ivan Sadikov commented on SPARK-40541: -- What is the question here? Does marking column as nullable

[jira] [Commented] (SPARK-40430) Spark session does not update number of files for partition

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617441#comment-17617441 ] Ivan Sadikov commented on SPARK-40430: -- Can you try FSCK REPAIR TABLE command on your table if you

[jira] [Commented] (SPARK-39783) Wrong column backticks in UNRESOLVED_COLUMN error

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617460#comment-17617460 ] Ivan Sadikov commented on SPARK-39783: -- This is by design if I am not mistaken. Such columns need

[jira] [Commented] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17617463#comment-17617463 ] Ivan Sadikov commented on SPARK-39257: -- I have had a similar issue before, you would need to do

[jira] (SPARK-39257) use spark.read.jdbc() to read data from SQL databse into dataframe, it fails silently, when the session is killed from SQL server side

2022-10-13 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39257 ] Ivan Sadikov deleted comment on SPARK-39257: -- was (Author: ivan.sadikov): I have had a similar issue before, you would need to do packet capture to figure out what the underlying issue is.

[jira] [Commented] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575510#comment-17575510 ] Ivan Sadikov commented on SPARK-39833: -- It appears to be a bug in Parquet-Mr. There is a

[jira] [Comment Edited] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575032#comment-17575032 ] Ivan Sadikov edited comment on SPARK-39833 at 8/5/22 1:48 AM: -- This is

[jira] [Commented] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575570#comment-17575570 ] Ivan Sadikov commented on SPARK-39833: -- I opened a PR to quickly fix it:

[jira] [Comment Edited] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-04 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575032#comment-17575032 ] Ivan Sadikov edited comment on SPARK-39833 at 8/5/22 5:07 AM: -- Your example

[jira] [Created] (SPARK-40052) Handle direct byte buffers in VectorizedDeltaBinaryPackedReader

2022-08-11 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-40052: Summary: Handle direct byte buffers in VectorizedDeltaBinaryPackedReader Key: SPARK-40052 URL: https://issues.apache.org/jira/browse/SPARK-40052 Project: Spark

[jira] [Comment Edited] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true

2022-08-03 Thread Ivan Sadikov (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575032#comment-17575032 ] Ivan Sadikov edited comment on SPARK-39833 at 8/4/22 5:51 AM: -- This is
