[jira] [Commented] (SPARK-24906) Adaptively set split size for columnar file to ensure the task read data size fit expectation

2020-01-01 Thread Jason Guo (Jira)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006571#comment-17006571 ] Jason Guo commented on SPARK-24906: --- [~lio...@taboola.com] Yes, estimating with sampl

[jira] [Commented] (SPARK-29031) Materialized column to accelerate queries

2019-09-15 Thread Jason Guo (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930230#comment-16930230 ] Jason Guo commented on SPARK-29031: --- [~lishuming] `Materialized column` is supported i

[jira] [Updated] (SPARK-29031) Materialized column to accelerate queries

2019-09-10 Thread Jason Guo (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-29031: -- Description: Goals * Add a new SQL grammar of Materialized column * Implicitly rewrite SQL queries o

[jira] [Created] (SPARK-29031) Materialized column to accelerate queries

2019-09-09 Thread Jason Guo (Jira)
Jason Guo created SPARK-29031: - Summary: Materialized column to accelerate queries Key: SPARK-29031 URL: https://issues.apache.org/jira/browse/SPARK-29031 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-06-01 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Shepherd: (was: Dongjoon Hyun) > SkewJoin--handle only skewed keys with broadcastjoin and other keys

[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-05-29 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Shepherd: Dongjoon Hyun (was: Liang-Chi Hsieh) > SkewJoin--handle only skewed keys with broadcastjoin

[jira] [Updated] (SPARK-27865) Spark SQL support 1:N sort merge bucket join without shuffle

2019-05-29 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27865: -- Shepherd: Dongjoon Hyun > Spark SQL support 1:N sort merge bucket join without shuffle > -

[jira] [Updated] (SPARK-27865) Spark SQL support 1:N sort merge bucket join without shuffle

2019-05-28 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27865: -- Summary: Spark SQL support 1:N sort merge bucket join without shuffle (was: Spark SQL support 1:N sor

[jira] [Created] (SPARK-27865) Spark SQL support 1:N sort merge bucket join

2019-05-28 Thread Jason Guo (JIRA)
Jason Guo created SPARK-27865: - Summary: Spark SQL support 1:N sort merge bucket join Key: SPARK-27865 URL: https://issues.apache.org/jira/browse/SPARK-27865 Project: Spark Issue Type: New Featur

[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (big_sk

[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Shepherd: Liang-Chi Hsieh > SkewJoin--handle only skewed keys with broadcastjoin and other keys with

[jira] [Updated] (SPARK-27792) SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Summary: SkewJoin--handle only skewed keys with broadcastjoin and other keys with normal join (was: S

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Attachment: sql.png > SkewJoin hint > - > > Key: SPARK-27792 >

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (big_sk

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (big_sk

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (tableA

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Attachment: time.png skew join DAG.png > SkewJoin hint > - > >

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (tableA

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Attachment: SMJ tasks.png > SkewJoin hint > - > > Key: SPARK-27792 >

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Attachment: SMJ DAG.png > SkewJoin hint > - > > Key: SPARK-27792 >

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (tableA

[jira] [Updated] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-27792: -- Description: This feature is designed to handle data skew in Join *Scenario* * A big table (tableA

[jira] [Created] (SPARK-27792) SkewJoin hint

2019-05-21 Thread Jason Guo (JIRA)
Jason Guo created SPARK-27792: - Summary: SkewJoin hint Key: SPARK-27792 URL: https://issues.apache.org/jira/browse/SPARK-27792 Project: Spark Issue Type: New Feature Components: SQL

[jira] [Commented] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571027#comment-16571027 ] Jason Guo commented on SPARK-25038: --- [~hyukjin.kwon] Gotcha I will create a PR for th

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Description: When Spark SQL reads a large amount of data, it takes a long time (more than 10 minutes) to

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: (was: job start original.png) > Accelerate Spark Plan generation when Spark SQL read l

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: (was: issue sql original.png) > Accelerate Spark Plan generation when Spark SQL read l

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: job start original.png job start optimized.png issue sql or

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: issue sql original.png > Accelerate Spark Plan generation when Spark SQL read large amount

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Description: When Spark SQL reads a large amount of data, it takes a long time (more than 10 minutes) to

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: job start original.png > Accelerate Spark Plan generation when Spark SQL read large amount

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: start.png issue.png > Accelerate Spark Plan generation when Spark SQL read

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: (was: start.png) > Accelerate Spark Plan generation when Spark SQL read large amount o

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Attachment: (was: issue.png) > Accelerate Spark Plan generation when Spark SQL read large amount o

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Description: When Spark SQL reads a large amount of data, it takes a long time (more than 10 minutes) to

[jira] [Updated] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-25038: -- Description: When Spark SQL reads a large amount of data, it takes a long time (more than 10 minutes) to

[jira] [Created] (SPARK-25038) Accelerate Spark Plan generation when Spark SQL read large amount of data

2018-08-06 Thread Jason Guo (JIRA)
Jason Guo created SPARK-25038: - Summary: Accelerate Spark Plan generation when Spark SQL read large amount of data Key: SPARK-25038 URL: https://issues.apache.org/jira/browse/SPARK-25038 Project: Spark

[jira] [Updated] (SPARK-24906) Adaptively set split size for columnar file to ensure the task read data size fit expectation

2018-08-06 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Summary: Adaptively set split size for columnar file to ensure the task read data size fit expectation

[jira] [Commented] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-25 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556413#comment-16556413 ] Jason Guo commented on SPARK-24906: --- [~maropu]  [~viirya]  What do you think about thi

[jira] [Comment Edited] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972 ] Jason Guo edited comment on SPARK-24906 at 7/25/18 6:09 AM:

[jira] [Comment Edited] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972 ] Jason Guo edited comment on SPARK-24906 at 7/25/18 1:03 AM:

[jira] [Commented] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554972#comment-16554972 ] Jason Guo commented on SPARK-24906: --- Thanks [~maropu] and [~viirya] for your comments.

[jira] [Updated] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Description: For columnar file, such as, when spark sql read the table, each split will be 128 MB by

[jira] [Updated] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Attachment: image-2018-07-24-20-30-24-552.png > Enlarge split size for columnar file to ensure the tas

[jira] [Updated] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Attachment: image-2018-07-24-20-29-24-797.png > Enlarge split size for columnar file to ensure the tas

[jira] [Updated] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Attachment: image-2018-07-24-20-28-06-269.png > Enlarge split size for columnar file to ensure the tas

[jira] [Updated] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Guo updated SPARK-24906: -- Attachment: image-2018-07-24-20-26-32-441.png > Enlarge split size for columnar file to ensure the tas

[jira] [Created] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Jason Guo (JIRA)
Jason Guo created SPARK-24906: - Summary: Enlarge split size for columnar file to ensure the task read enough data Key: SPARK-24906 URL: https://issues.apache.org/jira/browse/SPARK-24906 Project: Spark