[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-20 Thread He Xiaoqiao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao updated HDFS-14771:
---
Release Note: 
This change allows the inode and inode directory sections of the fsimage to be 
loaded in parallel. Tests on large images have shown this change to reduce the 
image load time to about 50% of the pre-change run time.

It works by writing sub-section entries to the image index, effectively 
splitting each image section into many sub-sections which can be processed in 
parallel. By default 12 sub-sections per image section are created when the 
image is saved, and 4 threads are used to load the image at startup.
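The sub-section idea can be illustrated with a small, self-contained Java sketch. It is not the HDFS implementation. The class SubSectionSketch, its SubSection index entries and the split/loadInParallel helpers are hypothetical. Each task only counts entries where a real loader would decode its own slice of the image. The sketch divides one section into at most 12 contiguous sub-sections and processes them on a fixed pool of 4 threads, mirroring the defaults described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: these classes are hypothetical, not the HDFS fsimage loader.
public class SubSectionSketch {

  /** Hypothetical index entry: first entry offset and entry count of one sub-section. */
  static final class SubSection {
    final int offset;
    final int length;

    SubSection(int offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Splits a section of `entries` items into at most `targetSections` contiguous pieces. */
  static List<SubSection> split(int entries, int targetSections) {
    // Ceiling division, so no more than targetSections pieces are produced.
    int perSection = Math.max(1, (entries + targetSections - 1) / targetSections);
    List<SubSection> index = new ArrayList<>();
    for (int start = 0; start < entries; start += perSection) {
      index.add(new SubSection(start, Math.min(perSection, entries - start)));
    }
    return index;
  }

  /** Processes every sub-section on a fixed pool of `threads` workers; returns entries seen. */
  static long loadInParallel(List<SubSection> index, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicLong loaded = new AtomicLong();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (SubSection s : index) {
        // A real loader would decode only the bytes of its own sub-section here.
        futures.add(pool.submit(() -> loaded.addAndGet(s.length)));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for each task and surface any failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while loading", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("sub-section load failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return loaded.get();
  }

  public static void main(String[] args) {
    List<SubSection> index = split(1_000_000, 12);
    System.out.println(index.size() + " sub-sections, "
        + loadInParallel(index, 4) + " entries loaded with 4 threads");
  }
}
```

Running main reports 12 sub-sections and 1000000 entries loaded.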

This feature is disabled by default. It can be enabled by setting 
dfs.image.parallel.load to true. When enabled, it applies only to images with 
more than 1M inodes (dfs.image.parallel.inode.threshold). After the feature is 
enabled, the next HDFS checkpoint will write the image sub-sections, and 
subsequent NameNode restarts can load the image in parallel.
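The default behavior described above reduces to a single condition. The following is a minimal Java sketch that assumes the 1M-inode threshold mentioned in this paragraph. ParallelImageGate and useParallelSections are hypothetical names, not HDFS APIs.

```java
// Hypothetical sketch of the rule described in the release note; not HDFS code.
public class ParallelImageGate {

  /**
   * Sub-sections are written at checkpoint time (and can later be loaded in
   * parallel) only when the feature is switched on AND the image holds more
   * inodes than the threshold.
   */
  static boolean useParallelSections(boolean parallelLoad, long inodeCount, long inodeThreshold) {
    return parallelLoad && inodeCount > inodeThreshold;
  }

  public static void main(String[] args) {
    long threshold = 1_000_000L; // the "1M inodes" figure from the release note
    System.out.println(useParallelSections(false, 5_000_000L, threshold)); // false: off by default
    System.out.println(useParallelSections(true, 5_000_000L, threshold));  // true
    System.out.println(useParallelSections(true, 800_000L, threshold));    // false: below threshold
  }
}
```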

An image with parallel sections can still be read when the feature is disabled, 
but HDFS versions that do not include this change cannot load such an image. 
The Offline Image Viewer (OIV) can process a parallel-enabled image without issues.

Key configuration parameters are:

dfs.image.parallel.load=false - enable or disable the feature

dfs.image.parallel.target.sections = 12 - The target number of subsections. Aim 
for 2 to 3 times the number of dfs.image.parallel.threads.

dfs.image.parallel.inode.threshold = 1000000 - Only save and load in parallel 
if the image has more than this number of inodes.

dfs.image.parallel.threads = 4 - The number of threads used to load the image. 
Testing has shown 4 to be optimal, but this may depend on the environment.
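The following Java sketch shows how these four keys fit together. It reads each key with a fallback to the documented default and checks the guideline of 2 to 3 target sections per thread. The example uses java.util.Properties as a stand-in for Hadoop's Configuration, so it runs without Hadoop on the classpath. ParallelImageSettings is a hypothetical class. The threshold fallback assumes the 1M-inode figure stated earlier in this note.

```java
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for Hadoop's Configuration class.
public class ParallelImageSettings {
  final boolean parallelLoad;
  final int targetSections;
  final long inodeThreshold;
  final int threads;

  ParallelImageSettings(Properties p) {
    // Fallback values mirror the defaults listed in this release note
    // (threshold taken from the "1M inodes" figure).
    parallelLoad = Boolean.parseBoolean(p.getProperty("dfs.image.parallel.load", "false"));
    targetSections = Integer.parseInt(p.getProperty("dfs.image.parallel.target.sections", "12"));
    inodeThreshold = Long.parseLong(p.getProperty("dfs.image.parallel.inode.threshold", "1000000"));
    threads = Integer.parseInt(p.getProperty("dfs.image.parallel.threads", "4"));
  }

  /** Guideline from the note: target sub-sections should be 2 to 3 times the thread count. */
  boolean followsSectionGuideline() {
    return targetSections >= 2 * threads && targetSections <= 3 * threads;
  }

  public static void main(String[] args) {
    Properties p = new Properties();
    p.setProperty("dfs.image.parallel.load", "true"); // opt in; the feature is off by default
    ParallelImageSettings s = new ParallelImageSettings(p);
    System.out.println(s.parallelLoad + " " + s.targetSections + " " + s.threads
        + " guideline ok: " + s.followsSectionGuideline());
  }
}
```

With the defaults, 12 target sections for 4 threads sits at the upper end of the suggested range. Raising the thread count to 8 without also raising the section count would fall outside it.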

UPGRADE WARNING: 
1. Upgrading from 2.10 to 3.* is seamless as long as this feature has never 
been enabled.
2. If the fsimage parallel loading feature has been enabled, the only supported 
upgrade path from 2.10 is currently to 3.3.
3. To upgrade from 2.10 to an earlier 3.* release (3.1.*/3.2.*), make sure that 
at least one fsimage file has been saved after disabling this feature. This 
requires setting dfs.image.parallel.load=false first and restarting the 
NameNode before the upgrade.
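The three upgrade rules can be summarized as a decision function. This is a hypothetical Java sketch. UpgradePlanner is not an HDFS tool. The sketch treats 3.x releases after 3.3 the same as 3.3, which is an assumption the note does not state.

```java
// Hypothetical encoding of the UPGRADE WARNING rules; not an HDFS tool.
public class UpgradePlanner {

  enum Plan { UPGRADE_DIRECTLY, DISABLE_SAVE_IMAGE_THEN_UPGRADE }

  /**
   * @param featureEverEnabled whether dfs.image.parallel.load was ever true on this 2.10 cluster
   * @param targetMinor        minor version of the 3.x target release (0, 1, 2, 3, ...)
   */
  static Plan planFrom210(boolean featureEverEnabled, int targetMinor) {
    if (!featureEverEnabled) {
      return Plan.UPGRADE_DIRECTLY; // rule 1: feature never enabled
    }
    if (targetMinor >= 3) {
      // rule 2: 3.3 reads parallel images; treating later 3.x the same is an assumption
      return Plan.UPGRADE_DIRECTLY;
    }
    // rule 3: set dfs.image.parallel.load=false, restart the NameNode and make sure
    // a new fsimage is saved before upgrading to 3.0/3.1/3.2
    return Plan.DISABLE_SAVE_IMAGE_THEN_UPGRADE;
  }

  public static void main(String[] args) {
    System.out.println(planFrom210(false, 2)); // UPGRADE_DIRECTLY
    System.out.println(planFrom210(true, 3));  // UPGRADE_DIRECTLY
    System.out.println(planFrom210(true, 2));  // DISABLE_SAVE_IMAGE_THEN_UPGRADE
  }
}
```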

  was:
This change allows the inode and inode directory sections of the fsimage to be 
loaded in parallel. Tests on large images have shown this change to reduce the 
image load time to about 50% of the pre-change run time.

It works by writing sub-section entries to the image index, effectively 
splitting each image section into many sub-sections which can be processed in 
parallel. By default 12 sub-sections per image section are created when the 
image is saved, and 4 threads are used to load the image at startup.

This is enabled by default for any image with more than 1M inodes 
(dfs.image.parallel.inode.threshold) and can be disabled by setting 
dfs.image.parallel.load to false. When the feature is enabled, the next HDFS 
checkpoint will write the image sub-sections and subsequent namenode restarts 
can load the image in parallel.

An image with parallel sections can still be read when the feature is disabled, 
but HDFS versions that do not include this change cannot load such an image. 
The Offline Image Viewer (OIV) can process a parallel-enabled image without issues.

Key configuration parameters are:

dfs.image.parallel.load=true - enable or disable the feature

dfs.image.parallel.target.sections = 12 - The target number of subsections. Aim 
for 2 to 3 times the number of dfs.image.parallel.threads.

dfs.image.parallel.inode.threshold = 1000000 - Only save and load in parallel 
if the image has more than this number of inodes.

dfs.image.parallel.threads = 4 - The number of threads used to load the image. 
Testing has shown 4 to be optimal, but this may depend on the environment.

UPGRADE WARNING: 
1. Upgrading from 2.10 to 3.* is seamless as long as this feature has never 
been enabled.
2. If the fsimage parallel loading feature has been enabled, the only supported 
upgrade path from 2.10 is currently to 3.3.
3. To upgrade from 2.10 to a 3.* release prior to 3.3, make sure that at least 
one fsimage file has been saved after disabling this feature. This requires 
changing the configuration parameter first and restarting the NameNode before 
the upgrade.


> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0
>
> Attachments: HDFS-14771.branch-2.001.patch, 
> 

[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-20 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Release Note: 
This change allows the inode and inode directory sections of the fsimage to be 
loaded in parallel. Tests on large images have shown this change to reduce the 
image load time to about 50% of the pre-change run time.

It works by writing sub-section entries to the image index, effectively 
splitting each image section into many sub-sections which can be processed in 
parallel. By default 12 sub-sections per image section are created when the 
image is saved, and 4 threads are used to load the image at startup.

This is enabled by default for any image with more than 1M inodes 
(dfs.image.parallel.inode.threshold) and can be disabled by setting 
dfs.image.parallel.load to false. When the feature is enabled, the next HDFS 
checkpoint will write the image sub-sections and subsequent namenode restarts 
can load the image in parallel.

An image with parallel sections can still be read when the feature is disabled, 
but HDFS versions that do not include this change cannot load such an image. 
The Offline Image Viewer (OIV) can process a parallel-enabled image without issues.

Key configuration parameters are:

dfs.image.parallel.load=true - enable or disable the feature

dfs.image.parallel.target.sections = 12 - The target number of subsections. Aim 
for 2 to 3 times the number of dfs.image.parallel.threads.

dfs.image.parallel.inode.threshold = 1000000 - Only save and load in parallel 
if the image has more than this number of inodes.

dfs.image.parallel.threads = 4 - The number of threads used to load the image. 
Testing has shown 4 to be optimal, but this may depend on the environment.

UPGRADE WARNING: 
1. Upgrading from 2.10 to 3.* is seamless as long as this feature has never 
been enabled.
2. If the fsimage parallel loading feature has been enabled, the only supported 
upgrade path from 2.10 is currently to 3.3.
3. To upgrade from 2.10 to a 3.* release prior to 3.3, make sure that at least 
one fsimage file has been saved after disabling this feature. This requires 
changing the configuration parameter first and restarting the NameNode before 
the upgrade.

Provided a release note similar to the HDFS-14617 note, with added upgrade 
warnings. [~xkrogen], [~sodonnell], [~jojochuang], please take a look. Thanks.

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0
>
> Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch, HDFS-14771.branch-2.003.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-17 Thread Wei-Chiu Chuang (Jira)


Wei-Chiu Chuang updated HDFS-14771:
---
Fix Version/s: 2.10.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to branch-2. Thanks!

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0
>
> Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch, HDFS-14771.branch-2.003.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-04 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Attachment: HDFS-14771.branch-2.003.patch

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch, HDFS-14771.branch-2.003.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-09-03 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Attachment: HDFS-14771.branch-2.002.patch

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-08-27 Thread Wei-Chiu Chuang (Jira)


Wei-Chiu Chuang updated HDFS-14771:
---
Labels: release-blocker  (was: )

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
>  Labels: release-blocker
> Attachments: HDFS-14771.branch-2.001.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-08-25 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Attachment: HDFS-14771.branch-2.001.patch
Status: Patch Available  (was: Open)

Submitted a demo patch following HDFS-14617; waiting for the Jenkins results.

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-14771.branch-2.001.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-08-23 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Description: This JIRA aims to backport HDFS-14617 to branch-2: improve 
fsimage load time by writing sub-sections to the fsimage index.  (was: This 
JIRA aims to backport HDFS-12914 to branch-2: improve fsimage load time by 
writing sub-sections to the fsimage index.)

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Priority: Major
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



[jira] [Updated] (HDFS-14771) Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing sub-sections to the fsimage index)

2019-08-23 Thread He Xiaoqiao (Jira)


He Xiaoqiao updated HDFS-14771:
---
Summary: Backport HDFS-14617 to branch-2 (Improve fsimage load time by 
writing sub-sections to the fsimage index)  (was: Backport HDFS-12914 to 
branch-2 (Improve fsimage load time by writing sub-sections to the fsimage 
index))

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> 
>
> Key: HDFS-14771
> URL: https://issues.apache.org/jira/browse/HDFS-14771
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: He Xiaoqiao
>Priority: Major
>
> This JIRA aims to backport HDFS-12914 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.


