[jira] [Commented] (SPARK-46981) Driver OOM happens in query planning phase with empty tables

2024-04-14 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-46981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837039#comment-17837039 ] Jarred Li commented on SPARK-46981: --- I used the default driver memory setting (1GB), and an OOM was thrown. It

[jira] [Updated] (SPARK-42069) Data duplicate or data lost with non-deterministic function

2023-01-14 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-42069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li updated SPARK-42069: -- Description: When writing a table with shuffle data and a non-deterministic function, data may be

[jira] [Created] (SPARK-42069) Data duplicate or data lost with non-deterministic function

2023-01-14 Thread Jarred Li (Jira)
Jarred Li created SPARK-42069: - Summary: Data duplicate or data lost with non-deterministic function Key: SPARK-42069 URL: https://issues.apache.org/jira/browse/SPARK-42069 Project: Spark Issue

[jira] [Commented] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-16 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178445#comment-17178445 ] Jarred Li commented on SPARK-32582: --- ??I am not sure it would be helpful since there is no API in

[jira] [Comment Edited] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175428#comment-17175428 ] Jarred Li edited comment on SPARK-32582 at 8/11/20, 10:07 AM: -- I think this

[jira] [Commented] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175428#comment-17175428 ] Jarred Li commented on SPARK-32582: --- I think this is a limitation of ORC file schema inference.

[jira] [Comment Edited] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175165#comment-17175165 ] Jarred Li edited comment on SPARK-32582 at 8/11/20, 7:06 AM: - The

[jira] [Comment Edited] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175165#comment-17175165 ] Jarred Li edited comment on SPARK-32582 at 8/11/20, 7:05 AM: - The

[jira] [Comment Edited] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175165#comment-17175165 ] Jarred Li edited comment on SPARK-32582 at 8/11/20, 6:59 AM: - The

[jira] [Updated] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-11 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li updated SPARK-32582: -- Description: When infer schema is enabled, it tries to list all the files in the table, however only

[jira] [Commented] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-10 Thread Jarred Li (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175165#comment-17175165 ] Jarred Li commented on SPARK-32582: --- The performance issue I mentioned here is not about reading files, but "LIST"

[jira] [Created] (SPARK-32582) Spark SQL Infer Schema Performance

2020-08-10 Thread Jarred Li (Jira)
Jarred Li created SPARK-32582: - Summary: Spark SQL Infer Schema Performance Key: SPARK-32582 URL: https://issues.apache.org/jira/browse/SPARK-32582 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-4164) spark.kryo.registrator shall use comma separated value to support multiple registrator

2014-10-31 Thread Jarred Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li updated SPARK-4164: - Remaining Estimate: 2h Original Estimate: 2h spark.kryo.registrator shall use comma separated value

[jira] [Commented] (SPARK-4164) spark.kryo.registrator shall use comma separated value to support multiple registrator

2014-10-31 Thread Jarred Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191475#comment-14191475 ] Jarred Li commented on SPARK-4164: -- I can work on this issue. Could somebody assign this

[jira] [Resolved] (SPARK-3980) GraphX Performance Issue

2014-10-19 Thread Jarred Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li resolved SPARK-3980. -- Resolution: Not a Problem Resolved the issue by running the job with a big cluster. GraphX Performance

[jira] [Created] (SPARK-3980) GraphX Performance Issue

2014-10-16 Thread Jarred Li (JIRA)
Jarred Li created SPARK-3980: Summary: GraphX Performance Issue Key: SPARK-3980 URL: https://issues.apache.org/jira/browse/SPARK-3980 Project: Spark Issue Type: Bug Components: GraphX

[jira] [Updated] (SPARK-3980) GraphX Performance Issue

2014-10-16 Thread Jarred Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li updated SPARK-3980: - Description: I run 4 workers in AWS (c3.xlarge), 4g memory for executor, 85,331,846 edges

[jira] [Updated] (SPARK-3980) GraphX Performance Issue

2014-10-16 Thread Jarred Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarred Li updated SPARK-3980: - Description: I run 4 workers in AWS (c3.xlarge), 4g memory for executor, 85,331,846 edges