[jira] [Commented] (SPARK-44116) Utilize Hadoop vectorized APIs

2023-07-31 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749297#comment-17749297
 ] 

Dongjoon Hyun commented on SPARK-44116:
---

Thank you, [~ste...@apache.org].

> Utilize Hadoop vectorized APIs
> --
>
> Key: SPARK-44116
> URL: https://issues.apache.org/jira/browse/SPARK-44116
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> Apache Hadoop 3.3.5+ supports vectorized APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44116) Utilize Hadoop vectorized APIs

2023-07-31 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17749266#comment-17749266
 ] 

Steve Loughran commented on SPARK-44116:


If this gets into the libraries, you don't need explicit support in Spark 
unless you really want to do your own.

What could be good is replacing FileSystem.open() with the openFile() builder, 
passing in your read policy and any file status/file length you have. That saves 
HEAD requests and tunes GET/prefetching based on expected use.
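
As a rough illustration of the openFile() suggestion, here is a minimal sketch 
that only assembles the options a caller might pass to the builder. The class 
and method names (OpenFileOptionsSketch, openFileOptions) are hypothetical, and 
the two option keys are, to my understanding, the standard ones defined for 
openFile() in Hadoop 3.3.5+. The Hadoop calls themselves appear only in comments 
so the sketch runs without hadoop-common on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only: collects the options one might hand to Hadoop's openFile() builder.
 * The class and method names are hypothetical, not part of Hadoop or Spark.
 */
final class OpenFileOptionsSketch {

    // Option keys assumed to match Hadoop's standard openFile() options (3.3.5+).
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String FILE_LENGTH = "fs.option.openfile.length";

    /**
     * @param readPolicy  expected access pattern, e.g. "random" or "sequential"
     * @param knownLength file length if already known, or a negative value if not
     */
    static Map<String, String> openFileOptions(String readPolicy, long knownLength) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put(READ_POLICY, readPolicy);
        if (knownLength >= 0) {
            // Declaring the length lets an object store connector skip its own lookup.
            opts.put(FILE_LENGTH, Long.toString(knownLength));
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = openFileOptions("random", 1024L);
        opts.forEach((k, v) -> System.out.println(k + "=" + v));

        // With hadoop-common available, the options would be applied roughly as:
        //   FutureDataInputStreamBuilder builder = fs.openFile(path);
        //   opts.forEach((k, v) -> builder.opt(k, v));
        //   // If a FileStatus is already in hand: builder.withFileStatus(status);
        //   FSDataInputStream in = builder.build().get();
    }
}
```

Using opt() rather than must() keeps the options advisory, so a filesystem that 
does not recognize a key can ignore it instead of failing the open.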



