[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-12-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels: query-eng sev:high user-support-issues  (was: sev:high user-support-issues)

> Efficiently reading hudi tables through spark-shell
> ---
>
> Key: HUDI-829
> URL: https://issues.apache.org/jira/browse/HUDI-829
> Project: Apache Hudi
> Issue Type: Task
> Components: Spark Integration
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Major
> Labels: query-eng, sev:high, user-support-issues
>
> [~uditme] Created this ticket to track the discussion on the read/query path
> of Spark with Hudi tables.
> My understanding is that when you read Hudi tables through spark-shell, some
> of your queries are slower because of sequential work that Spark performs
> when interacting with Hudi tables (even with
> spark.sql.hive.convertMetastoreParquet, which can give you the same data
> reading speed and all the vectorization benefits). Is this slowness observed
> during Spark query planning? Can you please elaborate on this?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-02-06 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels: sev:high user-support-issues  (was: user-support-issues)






[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels:   (was: user-support-issues)






[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels: user-support-issues  (was: )






[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-26 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels: user-support-issues  (was: )






[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-829:
-
Summary: Efficiently reading hudi tables through spark-shell  (was: Reading
Hudi tables through spark-shell is slow (even with
spark.sql.hive.convertMetastoreParquet))

> Efficiently reading hudi tables through spark-shell
> ---
>
> Key: HUDI-829
> URL: https://issues.apache.org/jira/browse/HUDI-829
> Project: Apache Hudi (incubating)
> Issue Type: Task
> Components: Spark Integration
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Major
>
> [~uditme] Created this ticket to track the discussion on the read/query path
> of Spark with Hudi tables.
> My understanding is that when you read Hudi tables through spark-shell, some
> of your queries are slower because of sequential work that Spark performs
> when interacting with Hudi tables. Can you please elaborate on this?





[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-829:
-
Description: 
[~uditme] Created this ticket to track the discussion on the read/query path of
Spark with Hudi tables.

My understanding is that when you read Hudi tables through spark-shell, some of
your queries are slower because of sequential work that Spark performs when
interacting with Hudi tables (even with spark.sql.hive.convertMetastoreParquet,
which can give you the same data reading speed and all the vectorization
benefits). Is this slowness observed during Spark query planning? Can you
please elaborate on this?

  was:
[~uditme] Created this ticket to track the discussion on the read/query path of
Spark with Hudi tables.

My understanding is that when you read Hudi tables through spark-shell, some of
your queries are slower because of sequential work that Spark performs when
interacting with Hudi tables. Can you please elaborate on this?




