[jira] [Commented] (HDFS-7784) load fsimage in parallel

2019-07-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889199#comment-16889199
 ] 

Hadoop QA commented on HDFS-7784:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} HDFS-7784 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-7784 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27267/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> load fsimage in parallel
> 
>
> Key: HDFS-7784
> URL: https://issues.apache.org/jira/browse/HDFS-7784
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode holds a huge number of files and federation is not in 
> use, startup/restart is slow. The fsimage loading step takes most of the 
> time. fsimage loading can be separated into two parts: deserialization and 
> object construction (mostly map insertion). Deserialization takes most of the 
> CPU time, so we can do deserialization in parallel and add to the hashmap 
> serially. This will significantly reduce the NN start time.
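
A minimal sketch of the two-stage idea in the issue description: decode records on a thread pool, then insert into a plain HashMap from a single thread. This is not the HDFS-7784 patch. The Entry class, the "id:name" record format and the decode() helper are hypothetical stand-ins; real fsimage sections are protobuf messages.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {
    // Hypothetical stand-in for an inode record; not an HDFS class.
    static final class Entry {
        final long id;
        final String name;

        Entry(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical decoder for "id:name" records (assumed format).
    static Entry decode(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        return new Entry(Long.parseLong(s.substring(0, sep)), s.substring(sep + 1));
    }

    static Map<Long, Entry> load(List<byte[]> records, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Stage 1: deserialization runs in parallel on the pool.
            List<Future<Entry>> pending = new ArrayList<>(records.size());
            for (byte[] raw : records) {
                Callable<Entry> task = () -> decode(raw);
                pending.add(pool.submit(task));
            }
            // Stage 2: only the calling thread touches the map, so a plain
            // HashMap needs no locking.
            Map<Long, Entry> map = new HashMap<>();
            for (Future<Entry> f : pending) {
                Entry e = f.get();
                map.put(e.id, e);
            }
            return map;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add((i + ":file-" + i).getBytes(StandardCharsets.UTF_8));
        }
        Map<Long, Entry> map = load(records, 4);
        System.out.println(map.size() + " entries loaded");
    }
}
```

One task per record keeps the sketch short; a real loader would likely batch records per task to reduce scheduling overhead and the number of live Future objects.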



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2017-01-03 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796824#comment-15796824
 ] 

Gang Xie commented on HDFS-7784:


Regarding GC activity, the following is the jstat output. It did cause some 
long GC pauses, but compared to the GC seen during full block report 
processing, it looks OK. 


jstat -gcutil 10885 5000 1000
    S0     S1      E      O      P   YGC     YGCT  FGC    FGCT      GCT
  0.00 100.00  67.94  89.32  69.63   313  188.870    3   3.130  192.000
  0.00 100.00  67.94  89.32  69.63   313  188.870    3   3.130  192.000
  0.00 100.00  67.95  89.32  69.63   313  188.870    3   3.130  192.000
  0.00 100.00  81.32  89.32  70.61   313  188.870    3   3.130  192.000
100.00   0.00  19.44  89.68  70.62   314  192.495    3   3.130  195.626
  0.00  64.43  60.41  90.04  70.62   315  192.938    3   3.130  196.068
 56.75   7.26 100.00  90.27  70.62   317  193.167    3   3.130  196.297
  2.27   0.00  43.16  90.38  70.62   318  193.653    3   3.130  196.783
  0.00   0.68  91.15  90.38  70.62   319  193.729    3   3.130  196.859
  0.00   0.05  38.53  90.38  70.62   321  193.875    3   3.130  197.005
  0.01   0.00  82.04  90.38  70.62   322  193.951    3   3.130  197.081
  0.00   0.00  19.95  90.38  70.62   324  194.084    3   3.130  197.214
  0.00   0.00   0.00  90.38  70.62   326  194.235    4   3.130  197.365
  0.00   0.00  98.27  90.33  70.62   326  194.235    4   5.240  199.475
  0.00   0.00  40.11  90.27  70.62   328  194.372    4   5.240  199.612
  0.00   0.00  90.25  90.20  70.62   329  194.449    4   5.240  199.689
  0.00   0.00  30.08  90.13  70.62   331  194.605    4   5.240  199.845
  0.00   0.00  74.21  90.05  70.62   332  194.676    4   5.240  199.916
  0.00   0.00  14.04  89.95  70.62   334  194.819    4   5.240  200.059
  0.00   0.00  62.17  89.85  70.62   335  194.894    4   5.240  200.134
  0.00   0.00   4.01  89.79  70.62   337  195.042    4   5.240  200.282
  0.00   0.00  48.13  89.74  60.00   338  195.116    4   5.240  200.356
  0.00   0.00  80.22  89.74  60.00   339  195.192    5   5.241  200.433
  0.00   0.00   4.01  89.74  60.00   341  195.349    5   5.241  200.590
  0.00   0.00  24.07  89.74  60.00   342  195.423    5   5.241  200.664
  0.00   0.00  50.14  89.74  60.00   343  195.498    5   5.241  200.739
  0.00   0.00  96.27  89.74  60.00   344  195.571    5   5.241  200.813
  0.00   0.00  38.11  89.74  60.00   346  195.708    5   5.241  200.949
  0.00   0.00  86.24  89.74  60.00   347  195.785    5   5.241  201.026


Total time for which application threads were stopped: 1.6167710 seconds
Total time for which application threads were stopped: 9.6578530 seconds
Total time for which application threads were stopped: 1.0820690 seconds
Total time for which application threads were stopped: 1.1189530 seconds
Total time for which application threads were stopped: 1.2096840 seconds
Total time for which application threads were stopped: 8.6128080 seconds
Total time for which application threads were stopped: 7.5763860 seconds
Total time for which application threads were stopped: 2.1393520 seconds
Total time for which application threads were stopped: 1.9607400 seconds
Total time for which application threads were stopped: 3.0785030 seconds
Total time for which application threads were stopped: 2.7774960 seconds
Total time for which application threads were stopped: 4.5180250 seconds
Total time for which application threads were stopped: 1.9637590 seconds
Total time for which application threads were stopped: 1.8422970 seconds
Total time for which application threads were stopped: 1.9868880 seconds
Total time for which application threads were stopped: 2.2927440 seconds
Total time for which application threads were stopped: 2.7141160 seconds
Total time for which application threads were stopped: 2.9030460 seconds
Total time for which application threads were stopped: 5.2282350 seconds
Total time for which application threads were stopped: 3.6261510 seconds
Total time for which application threads were stopped: 2.1100760 seconds
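
For comparing runs, pause lines like the ones above can be summarized mechanically. The following is a rough sketch, assuming the log contains lines in exactly this "Total time for which application threads were stopped" form; the class and method names are made up for illustration. Note that these lines cover all safepoint pauses, not only GC.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StoppedTimeSummary {
    // Matches the line format shown in the comment above.
    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Returns {pause count, total seconds, longest pause in seconds}. */
    static double[] summarize(List<String> lines) {
        int count = 0;
        double total = 0;
        double max = 0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double seconds = Double.parseDouble(m.group(1));
                count++;
                total += seconds;
                max = Math.max(max, seconds);
            }
        }
        return new double[] {count, total, max};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Total time for which application threads were stopped: 1.6167710 seconds",
            "Total time for which application threads were stopped: 9.6578530 seconds",
            "Total time for which application threads were stopped: 1.0820690 seconds");
        double[] r = summarize(sample);
        System.out.printf("pauses=%d total=%.3fs max=%.3fs%n", (int) r[0], r[1], r[2]);
    }
}
```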



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2017-01-03 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796809#comment-15796809
 ] 

Gang Xie commented on HDFS-7784:


The hardware info:
CPU:
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with 24 Cores

Mem:
cat /proc/meminfo
MemTotal:   131749888 kB
MemFree: 9390596 kB
Buffers:  171080 kB
Cached: 23657816 kB
SwapCached:0 kB
Active: 119711620 kB
Inactive: 381236 kB
Active(anon):   96186924 kB
Inactive(anon):81452 kB
Active(file):   23524696 kB
Inactive(file):   299784 kB
Unevictable:   0 kB
Mlocked:   0 kB
SwapTotal: 0 kB
SwapFree:  0 kB
Dirty:   108 kB
Writeback: 0 kB
AnonPages:  96264056 kB
Mapped:26604 kB
Shmem:  4412 kB
Slab: 728272 kB
SReclaimable: 673344 kB
SUnreclaim:54928 kB
KernelStack:5392 kB
PageTables:   192256 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:65874944 kB
Committed_AS:   107921484 kB
VmallocTotal:   34359738367 kB
VmallocUsed:  488704 kB
VmallocChunk:   34289747040 kB
HardwareCorrupted: 4 kB
AnonHugePages:  90095616 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:8192 kB
DirectMap2M: 2015232 kB
DirectMap1G:132120576 kB

Storage is HDD.

 




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2017-01-03 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796804#comment-15796804
 ] 

Kai Zheng commented on HDFS-7784:
-

OOO today for customer visit, please expect delayed response. Thanks.






[jira] [Commented] (HDFS-7784) load fsimage in parallel

2017-01-03 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796797#comment-15796797
 ] 

Gang Xie commented on HDFS-7784:


The JVM setting:
  -Xmx102400m
  -Xms102400m
  -Xmn5508m
  -XX:MaxDirectMemorySize=3686m
  -XX:MaxPermSize=1024m
  -XX:+PrintGCApplicationStoppedTime
  -XX:+UseConcMarkSweepGC
  -verbose:gc
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:SurvivorRatio=6
  -XX:+UseCMSCompactAtFullCollection
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:+CMSParallelRemarkEnabled
  -XX:+UseNUMA
  -XX:+CMSClassUnloadingEnabled
  -XX:CMSMaxAbortablePrecleanTime=1
  -XX:TargetSurvivorRatio=80
  -XX:+UseGCLogFileRotation
  -XX:NumberOfGCLogFiles=100
  -XX:GCLogFileSize=128m
  -XX:CMSWaitDuration=8000
  -XX:+CMSScavengeBeforeRemark
  -XX:ConcGCThreads=16
  -XX:ParallelGCThreads=16
  -XX:+CMSConcurrentMTEnabled
  -XX:+SafepointTimeout
  -XX:MonitorBound=16384
  -XX:-UseBiasedLocking
  -XX:MaxTenuringThreshold=3
  -XX:+ParallelRefProcEnabled
  -XX:-OmitStackTraceInFastThrow




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2017-01-03 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795396#comment-15795396
 ] 

Kihwal Lee commented on HDFS-7784:
--

[~xiegang112], when you have a chance to test the performance, please also 
share the jvm GC setting and the hardware spec (e.g. how many cores, as it 
affects the GC performance). It will be even better if you can measure the GC 
activities before and after.  If everything looks positive, people will 
certainly be interested.




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2016-12-28 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784660#comment-15784660
 ] 

Gang Xie commented on HDFS-7784:


After making AclStorage synchronized, it works. With 20 threads, the loading 
time drops from 29 minutes to about 12 minutes. We still need to check whether 
similar issues exist elsewhere.
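
A minimal sketch of the kind of fix described above: a shared de-duplication map whose methods are synchronized so that several loader threads can update it safely. This is not the HDFS AclStorage code. The SharedFeatureCache class, its methods and the string keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedFeatureCache {
    // Hypothetical analogue of a map shared by all files; a plain HashMap
    // must not be mutated by several threads without synchronization.
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Adds one reference to the feature and returns the new count. */
    public synchronized int addReference(String feature) {
        return refCounts.merge(feature, 1, Integer::sum);
    }

    public synchronized int distinctFeatures() {
        return refCounts.size();
    }

    public synchronized int references(String feature) {
        return refCounts.getOrDefault(feature, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        SharedFeatureCache cache = new SharedFeatureCache();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.execute(() -> {
                for (int i = 0; i < 10_000; i++) {
                    cache.addReference("acl-" + (i % 16));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(cache.distinctFeatures() + " distinct features");
    }
}
```

A single lock serializes every insertion, so it limits scaling if most inodes carry ACLs. Whether a concurrent map would help in the real loader is untested here.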




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2016-12-28 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784285#comment-15784285
 ] 

Gang Xie commented on HDFS-7784:


Found a potential issue while trying to backport this patch to 2.4. Please 
correct me if I'm wrong:

When an ACL is enabled on a file, loading calls addAclFeature to add the 
AclFeature to UNIQUE_ACL_FEATURES, which is a hashmap shared by all files. 
Since we introduced multithreading, I think this could be a problem.

In my test, loading a 22G fsimage with 200M inodes did not finish within 
several hours (without the patch, it finishes in about 30 minutes). The jstack 
output shows the threads busy in UNIQUE_ACL_FEATURES. I'm not sure whether the 
cache is corrupted. Because the image is huge and profiling it needs 100G of 
memory, the dump is hard to open, so I'm not 100% sure about this.

Did you hit a similar issue during your tests?




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2016-12-26 Thread Gang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777928#comment-15777928
 ] 

Gang Xie commented on HDFS-7784:


Hello,
Is there any update on this improvement? Loading a huge image really takes 
time, so this improvement seems quite necessary.




[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-12-21 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067073#comment-15067073
 ] 

Kihwal Lee commented on HDFS-7784:
--

bq.  protobuf seems to generate a lot of garbage during startup, causing many 
full GCs which really consume a lot of time.
One of our large NNs used to do multiple full GCs during start-up, but that was 
mainly due to initial full block report processing. Since the young gen size 
was increased, it has stopped doing that.  We initially feared the minor 
collection time would increase dramatically, but that wasn't the case.  Along 
with the increased YG size, we set {{-XX:ParGCCardsPerStrideChunk=32768}}.

We will look into the javanano version. Thanks for the pointer.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-12-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066949#comment-15066949
 ] 

Colin Patrick McCabe commented on HDFS-7784:


Thanks, [~kihwal].  Unfortunately, that's what we've seen as well... protobuf 
seems to generate a lot of garbage during startup, causing many full GCs which 
really consume a lot of time.  It used to be you could ignore temporary objects 
as long as you didn't create tenured objects, but it turns out that if there 
are too many temporaries, HotSpot promotes them into the old generation.  At 
this point, it's not clear that parallelization is a win for fsimage loading 
unless we can mitigate that GC problem.

Have you guys looked into using the "javanano" version of protocol buffers?  
See here: https://github.com/google/protobuf/tree/master/javanano

It seems like this would generate a lot less garbage than the "official" PB 
library because it avoids builders in favor of mutable state, uses ints instead 
of enums, uses arrays instead of ArrayList, etc. etc.  I think we should 
probably adopt this on the server-side, even if we keep the client-side with 
the existing PB library.  This would help with RPC as well, of course.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-12-18 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064274#comment-15064274
 ] 

Kihwal Lee commented on HDFS-7784:
--

bq. ... find out that the bottleneck is deserialization taking too much cpu 
time, not disk I/O.
That's exactly what we see. Disk is never the bottleneck for loading fsimage. 
It's the decoding of protobuf that is slow and creates a lot of garbage.  
Parallelizing will increase the garbage generation rate, and if the GC cannot 
keep up, loading can get even slower by incurring full GCs.

As for time reduction, I would still say yes to even a 50% speedup.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-24 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335929#comment-14335929
 ] 

Walter Su commented on HDFS-7784:
-

I used VisualVM to profile the loading process and found that the bottleneck 
is deserialization taking too much CPU time, not disk I/O. The 
test (test-20150213.pdf) uses three 7200rpm hard disks in RAID 0. I tried 
single-threaded starts with and without clearing the buffer cache, and the 
difference is very small.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325162#comment-14325162
 ] 

Colin Patrick McCabe commented on HDFS-7784:


At the end of the day, there are situations where you have to restart both 
NameNodes.  For example, you might have hit a bug that causes both the standby 
and the active to crash.  We've had bugs like that in the past.  So I do think 
this is an important improvement.

I think the discussion here has been a little too dismissive.  Some people are 
regularly spending 10 minutes to load their big fsimages... I don't think those 
people would write off a 2x (or 2.5x) speedup as not good enough.

I do think [~wheat9]'s point about avoiding complexity is good.  Can we get 
some benefit just by doing a really large amount of readahead?   For example, 
if we had a background thread that ran concurrently and did nothing but 
read the FSImage from start to end, it would warm up the buffer cache for 
the other thread.  This would mean that our single-threaded loading process 
would spend less time waiting for disk I/O.  Maybe try that out and see what 
the numbers look like on a really big fsimage (something like 5-7 GB).
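
The readahead idea above can be sketched as a daemon thread that reads the image file sequentially and throws the bytes away, leaving them in the OS buffer cache for the real loader. This is only an illustration, not code from any patch. The class name, thread name and 1 MiB buffer size are arbitrary assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadaheadSketch {
    /** Reads the file from start to end, discarding the data; returns bytes read. */
    static long warm(Path file) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MiB chunks (arbitrary choice)
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Starts the warm-up on a daemon thread so it never blocks JVM shutdown. */
    static Thread startWarmer(Path file) {
        Thread t = new Thread(() -> {
            try {
                warm(file);
            } catch (IOException e) {
                // Best effort only: the real loader will surface I/O errors.
            }
        }, "fsimage-readahead");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ReadaheadSketch <file>");
            return;
        }
        Thread warmer = startWarmer(Paths.get(args[0]));
        // The single-threaded loader would run here, concurrently with the warmer.
        warmer.join();
    }
}
```

This only helps if loading actually waits on disk. It does nothing for a loader that is CPU-bound in deserialization.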



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319717#comment-14319717
 ] 

Hadoop QA commented on HDFS-7784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698660/HDFS-7784.001.patch
  against trunk revision ba3c80a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs 

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9572//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9572//console

This message is automatically generated.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319840#comment-14319840
 ] 

Hadoop QA commented on HDFS-7784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12698691/test-20150213.pdf
  against trunk revision ba3c80a.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9573//console

This message is automatically generated.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320201#comment-14320201
 ] 

Walter Su commented on HDFS-7784:
-

I agree with you. A single NameNode with 64GB of memory can hold about 100m 
files (maybe a little more). In this situation, the startup time drops from 
371s to 159s, which is not a big enough gain. We usually don't restart the 
NameNode often, so I think it's OK to wait another 2 minutes for a restart. 
If people store 10x or 100x more than 100m files, they should consider 
federation.
So I changed the priority to Minor. I'll still upload the patch; maybe it will 
help someone.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320210#comment-14320210
 ] 

Walter Su commented on HDFS-7784:
-

I mean the fsimage loading time drops from 371s to 159s, and processing block 
reports takes much more time than that.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320261#comment-14320261
 ] 

Walter Su commented on HDFS-7784:
-

In my testing, memory usage doesn't grow and GC doesn't get worse. I do use a 
small buffer to avoid frequent lock()/unlock() calls. How would a small buffer 
affect GC? Deserialization still creates the same amount of garbage; it's a 
matter of speed.
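
The small-buffer idea can be sketched roughly as follows. This is only an illustration under assumptions, not the patch itself: the class name, BATCH_SIZE, and the "inode-" + id stand-in for deserialization are all hypothetical. Each worker collects entries in its own small list and takes the shared lock once per batch instead of once per entry.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class BatchedInsertSketch {
    static final int BATCH_SIZE = 64; // assumed value, chosen only for illustration

    private final Map<Long, String> map = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();
    private int lockAcquisitions = 0; // guarded by lock

    // Worker body: "deserialize" ids in [from, to) and insert them in batches.
    void loadRange(long from, long to) {
        List<Map.Entry<Long, String>> buffer = new ArrayList<>(BATCH_SIZE);
        for (long id = from; id < to; id++) {
            // Stand-in for real deserialization work done outside the lock.
            buffer.add(new AbstractMap.SimpleImmutableEntry<>(Long.valueOf(id), "inode-" + id));
            if (buffer.size() == BATCH_SIZE) {
                flush(buffer);
            }
        }
        flush(buffer); // insert whatever is left
    }

    // One lock()/unlock() pair covers the whole batch.
    private void flush(List<Map.Entry<Long, String>> buffer) {
        if (buffer.isEmpty()) {
            return;
        }
        lock.lock();
        try {
            for (Map.Entry<Long, String> e : buffer) {
                map.put(e.getKey(), e.getValue());
            }
            lockAcquisitions++;
        } finally {
            lock.unlock();
        }
        buffer.clear();
    }

    int size() { return map.size(); }
    int lockAcquisitions() { return lockAcquisitions; }

    // Runs `threads` workers over disjoint id ranges and waits for them to finish.
    static BatchedInsertSketch run(int threads, long perThread) {
        BatchedInsertSketch sketch = new BatchedInsertSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final long from = i * perThread;
            workers[i] = new Thread(() -> sketch.loadRange(from, from + perThread));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch;
    }

    public static void main(String[] args) {
        BatchedInsertSketch s = run(4, 1000);
        System.out.println(s.size() + " entries, " + s.lockAcquisitions() + " lock acquisitions");
    }
}
```

Each buffer holds at most BATCH_SIZE entries, so the extra live memory is bounded by threads x BATCH_SIZE entries regardless of image size.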



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321075#comment-14321075
 ] 

Kai Zheng commented on HDFS-7784:
-

Hi [~walter.k.su],

It's interesting, thanks!
bq. So I changed the priority to minor
I don't think it's minor. It does make sense, and I think it's a good discussion.
bq. One thing we might consider is a two-thread system, where one thread does 
deserialization and puts the results into a BlockingQueue read by the other FSN 
loading thread. 
I think it's a good idea. We might consider it as well and give it a try?

So we have the current approach, the parallel approach proposed here, and the 
one above suggested by [~cmccabe]. Is it possible to make the fsimage loading 
approach pluggable? By default it would use the current method.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319115#comment-14319115
 ] 

Haohui Mai commented on HDFS-7784:
--

I've done some experiments in HDFS-5698. Parallelism does improve 
performance; however, my feeling is that the improvement is not significant 
enough to justify the complexity, especially since a single race or bug here 
could easily lead to data loss.



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319097#comment-14319097
 ] 

Colin Patrick McCabe commented on HDFS-7784:


Hi Walter, this is an interesting idea.

We have found that GC is a major part of NN startup time.  Have you tested with 
FSImages larger than 3 GB?

If we are doing a lot of buffering, my concern would be that GC could get worse.

One thing we might consider is a two-thread system, where one thread does 
deserialization and puts the results into a BlockingQueue read by the other FSN 
loading thread.  This would avoid buffering an enormous amount of data, but 
still get 2x parallelism.
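
A rough sketch of such a two-thread pipeline, under stated assumptions: the Inode class, the "id:name" record format, the END sentinel, and the queue capacity are hypothetical and do not come from the HDFS code base. A bounded ArrayBlockingQueue caps how many deserialized objects are buffered at once, which is the point of the suggestion.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueuePipelineSketch {
    // Hypothetical stand-in for an inode record; not the real HDFS type.
    static final class Inode {
        final long id;
        final String name;
        Inode(long id, String name) { this.id = id; this.name = name; }
    }

    // Sentinel marking the end of the stream (compared by identity).
    private static final Inode END = new Inode(-1, "");

    // Hypothetical deserializer: parses one "id:name" record.
    static Inode deserialize(String record) {
        int sep = record.indexOf(':');
        return new Inode(Long.parseLong(record.substring(0, sep)), record.substring(sep + 1));
    }

    static Map<Long, Inode> load(List<String> records, int capacity) {
        // put() blocks when the queue is full, so buffering stays bounded.
        BlockingQueue<Inode> queue = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                try {
                    for (String record : records) {
                        queue.put(deserialize(record));
                    }
                } finally {
                    queue.put(END); // always unblock the consumer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // The calling thread plays the loading thread and owns the map exclusively.
        Map<Long, Inode> map = new HashMap<>();
        try {
            for (Inode inode = queue.take(); inode != END; inode = queue.take()) {
                map.put(inode.id, inode);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(load(Arrays.asList("1:a", "2:b", "3:c"), 2).size());
    }
}
```

In this sketch a deserialization failure only ends the stream early; a real loader would also have to report the error to the consuming thread.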



[jira] [Commented] (HDFS-7784) load fsimage in parallel

2015-02-12 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319684#comment-14319684
 ] 

Walter Su commented on HDFS-7784:
-

I'll upload performance test results in 4 hours.
