[jira] [Work started] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-16 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8490 started by Alex Rodoni.
---
> Impala Doc: the file handle cache now supports S3
> -
>
> Key: IMPALA-8490
> URL: https://issues.apache.org/jira/browse/IMPALA-8490
> Project: IMPALA
> Issue Type: Sub-task
> Components: Docs
> Reporter: Sahil Takiar
> Assignee: Alex Rodoni
> Priority: Major
> Labels: future_release_doc, in_33
>
> The documentation at 
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html 
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to 
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their 
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3 
> files.
> We should add a section to the docs similar to what we added when support for 
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS 
> file handles. This is controlled by the cache_remote_file_handles flag for an 
> impalad. It is recommended that you use the default value of true as this 
> caching prevents your NameNode from overloading when your cluster has many 
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has 
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode. The benefit for S3 is that caching 
> eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents 
> your NameNode from overloading when your cluster has many remote HDFS reads" 
> should be changed to something like "avoids an unnecessary call to 
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to 
> S3."
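
To illustrate the benefit described above, here is a minimal Python sketch of why reusing cached file handles reduces metadata requests. It is a toy model, not Impala's implementation: FakeS3, FileHandleCache, count_status_calls, the (path, mtime) cache key, and the capacity value are all hypothetical names and assumptions chosen for illustration. The only idea taken from the description is that a cached handle avoids a repeated getFileStatus()-style call on the same file.

```python
from collections import OrderedDict


class FakeS3:
    """Hypothetical stand-in for a remote store; counts metadata calls."""

    def __init__(self):
        self.status_calls = 0

    def get_file_status(self, path):
        # Models the per-open metadata request that a cached handle avoids.
        self.status_calls += 1
        return {"path": path}

    def open(self, path):
        # In this toy model, every open issues one status call.
        self.get_file_status(path)
        return {"path": path}  # toy "file handle"


class FileHandleCache:
    """Toy LRU cache of open handles keyed by (path, mtime) (an assumption)."""

    def __init__(self, store, capacity=2):
        self.store = store
        self.capacity = capacity
        self.handles = OrderedDict()

    def get(self, path, mtime):
        key = (path, mtime)
        if key in self.handles:
            self.handles.move_to_end(key)  # mark as most recently used
            return self.handles[key]
        handle = self.store.open(path)  # cache miss: pays the status call
        self.handles[key] = handle
        if len(self.handles) > self.capacity:
            self.handles.popitem(last=False)  # evict least recently used
        return handle


def count_status_calls(reads, use_cache):
    """Replay a list of (path, mtime) reads and return the status-call count."""
    store = FakeS3()
    cache = FileHandleCache(store) if use_cache else None
    for path, mtime in reads:
        if cache is not None:
            cache.get(path, mtime)
        else:
            store.open(path)
    return store.status_calls
```

In this sketch, replaying several reads of the same file with the cache enabled costs one status call per distinct file rather than one per read, which is the kind of reduction in S3 API calls the proposed doc wording refers to.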



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8490) Impala Doc: the file handle cache now supports S3

2019-05-03 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8490 started by Alex Rodoni.


