[jira] [Commented] (HDFS-15392) DistrbutedFileSystem#concat api can create large number of small blocks

2020-07-09 Thread jianghua zhu (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154422#comment-17154422 ]

jianghua zhu commented on HDFS-15392:
-

[~weichiu], I very much agree with your suggestion.

 

> DistrbutedFileSystem#concat api can create large number of small blocks
> ---
>
> Key: HDFS-15392
> URL: https://issues.apache.org/jira/browse/HDFS-15392
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lokesh Jain
>Priority: Major
>
> DistributedFileSystem#concat moves blocks from the source files to the target 
> file. If the API is repeatedly used on small files, it can create a large 
> number of small blocks in the target file. This Jira aims to optimize the API 
> to avoid the small-blocks issue.
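
For context, DistributedFileSystem#concat(Path target, Path[] srcs) moves the 
source files' blocks onto the target without copying data, so each call on a 
small source adds that source's small block(s) to the target. A minimal sketch 
of the repeated-small-file pattern described above; the paths, loop bound, and 
the assumption that fs.defaultFS points at an HDFS cluster are illustrative, 
not taken from the report:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ConcatSmallFiles {
  public static void main(String[] args) throws Exception {
    // Assumes /tmp/target and the /tmp/part-* files already exist with
    // compatible block size and replication (a precondition of concat).
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    Path target = new Path("/tmp/target");
    for (int i = 0; i < 1000; i++) {
      // concat moves the source's blocks onto the target; a small source
      // contributes a small block, so the target's block count keeps growing.
      dfs.concat(target, new Path[] { new Path("/tmp/part-" + i) });
    }
  }
}
{code}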






[jira] [Commented] (HDFS-15392) DistrbutedFileSystem#concat api can create large number of small blocks

2020-06-08 Thread Wei-Chiu Chuang (Jira)


[ https://issues.apache.org/jira/browse/HDFS-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128732#comment-17128732 ]

Wei-Chiu Chuang commented on HDFS-15392:


A file with lots of blocks is known to cause slowdowns. Hadoop 2.x allowed up to 
a million blocks per file; it is for this reason that the default limit was 
reduced to 10k:

{quote}
dfs.namenode.fs-limits.max-blocks-per-file = 10000
Maximum number of blocks per file, enforced by the Namenode on
write. This prevents the creation of extremely large files which can
degrade performance.
{quote}
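
For reference, the same key can be read programmatically. A minimal sketch, 
assuming the DFSConfigKeys constants DFS_NAMENODE_MAX_BLOCKS_PER_FILE_KEY and 
DFS_NAMENODE_MAX_BLOCKS_PER_FILE_DEFAULT map to this property; the class name 
and output format are illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ShowMaxBlocksPerFile {
  public static void main(String[] args) {
    // HdfsConfiguration pulls in hdfs-default.xml / hdfs-site.xml.
    Configuration conf = new HdfsConfiguration();
    long maxBlocksPerFile = conf.getLong(
        DFSConfigKeys.DFS_NAMENODE_MAX_BLOCKS_PER_FILE_KEY,
        DFSConfigKeys.DFS_NAMENODE_MAX_BLOCKS_PER_FILE_DEFAULT);
    System.out.println("dfs.namenode.fs-limits.max-blocks-per-file = "
        + maxBlocksPerFile);
  }
}
{code}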

Some suggestions for supportability:

Maybe this check could be added to fsck: if the number of blocks in a file 
exceeds a certain threshold, emit a warning. This is pretty easy to do.
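
A rough illustration of that kind of check, done here client-side against the 
public FileSystem API rather than inside fsck itself; the threshold, class name, 
and argument handling are made up:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WarnOnManyBlocks {
  // Illustrative threshold; a real check would presumably make this configurable.
  private static final int WARN_THRESHOLD = 1000;

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path(args[0]);
    FileStatus status = fs.getFileStatus(path);
    // One BlockLocation is returned per block in the requested range.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    if (blocks.length > WARN_THRESHOLD) {
      System.err.println("WARNING: " + path + " has " + blocks.length
          + " blocks; consider rewriting it into fewer, larger blocks.");
    }
  }
}
{code}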

Also note that the append() API has a NEW_BLOCK flag which forces a new block to 
be started even if the current block is not full. You can end up with lots of 
small blocks with this flag too.
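
For reference, that behavior is reachable through the EnumSet-based append 
overload on DistributedFileSystem. A minimal sketch; the path, payload, and 
buffer size are illustrative:

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class AppendNewBlock {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // NEW_BLOCK asks the NameNode to start a fresh block even though the
    // file's last block is not full, so every such append adds one more block.
    try (FSDataOutputStream out = dfs.append(new Path("/tmp/target"),
        EnumSet.of(CreateFlag.APPEND, CreateFlag.NEW_BLOCK), 4096, null)) {
      out.writeBytes("one more small record\n");
    }
  }
}
{code}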

It should also be pretty easy to log a warning in the NameNode log when concat() 
or append() finds that the file already has more than a certain number of blocks.

A more involved improvement could add a metric that tracks the count of such 
badly behaved files.
