[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2021-03-25 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308697#comment-17308697
 ] 

Stephen O'Donnell commented on HDFS-13639:
--

Thanks [~leosun08]. This was a clean cherry-pick to branch-3.3, so I backported it there.

> SlotReleaser is not fast enough
> ---
>
> Key: HDFS-13639
> URL: https://issues.apache.org/jira/browse/HDFS-13639
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
> Environment: 1. YCSB:
> {color:#00} recordcount=20
>  fieldcount=1
>  fieldlength=1000
>  operationcount=1000
>  
>  workload=com.yahoo.ycsb.workloads.CoreWorkload
>  
>  table=ycsb-test
>  columnfamily=C
>  readproportion=1
>  updateproportion=0
>  insertproportion=0
>  scanproportion=0
>  
>  maxscanlength=0
>  requestdistribution=zipfian
>  
>  # default 
>  readallfields=true
>  writeallfields=true
>  scanlengthdistribution=constan{color}
> {color:#00}2. datanode:{color}
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m 
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log 
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled 
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure 
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> {color:#00}3. regionserver:{color}
> {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g 
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc 
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 
> -XX:G1OldCSetRegionThresholdPercent=5{color}
> {color:#00}block cache is disabled:{color}{color:#00} 
>  hbase.bucketcache.size
>  0.9
>  {color}
>  
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, 
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, 
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we 
> found that the SlotReleaser of the ShortCircuitCache has a performance 
> issue: the QPS of slot releasing only reaches 1000+, while the QPS of slot 
> allocating is ~3000. As a result, the replica info on the datanode cannot 
> be released in time, which causes a lot of GCs and finally full GCs.
>  
> The flame graph shows that the SlotReleaser spends a lot of time connecting 
> to the domain socket and throwing/catching exceptions when closing the 
> domain socket and its streams. It makes no sense to connect and close on 
> every release: each time we connect to the domain socket, the Datanode 
> allocates a new thread to free the slot, and that initialization work is 
> costly. We need to reuse the domain socket. 
>  
> After switching to reusing the domain socket (see the attached diff), we 
> get a great improvement (see the perf attachments):
>  # Without reusing the domain socket, the YCSB get QPS gets worse and 
> worse, and after about 45 minutes full GC starts. With the domain socket 
> reused, no full GC is seen, the stress test finishes smoothly, and the 
> allocating and releasing QPS match.
>  # Due to datanode young GC, the YCSB get QPS without the improvement is 
> lower than with it: ~3700 vs ~4200.
> The diff is against 2.4, and I think this issue exists up to the latest 
> version. I don't have a test environment with 2.7 or later.
>  
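For readers skimming the thread, here is a minimal sketch of the reuse idea described above, assuming a hypothetical ReusingSlotReleaser helper. This is not the committed patch: only org.apache.hadoop.net.unix.DomainSocket and its connect/isOpen/close calls are real Hadoop API; every other name here is illustrative.

{code:java}
import java.io.IOException;
import org.apache.hadoop.net.unix.DomainSocket;

/**
 * Illustrative only: a slot releaser that keeps one DomainSocket open per
 * DataNode path instead of connecting and closing on every release.
 */
class ReusingSlotReleaser {
  private final String domainPath; // UNIX domain socket path of the DataNode
  private DomainSocket sock;       // cached connection, reused across releases

  ReusingSlotReleaser(String domainPath) {
    this.domainPath = domainPath;
  }

  /** Connect once; reconnect only if the cached socket has gone bad. */
  private DomainSocket getSocket() throws IOException {
    if (sock == null || !sock.isOpen()) {
      sock = DomainSocket.connect(domainPath);
    }
    return sock;
  }

  /** Release one slot, dropping the cached socket on any I/O failure. */
  void releaseSlot() throws IOException {
    try {
      sendReleaseRequest(getSocket());
    } catch (IOException e) {
      DomainSocket bad = sock;
      sock = null; // the next release will reconnect
      if (bad != null) {
        try {
          bad.close();
        } catch (IOException ignored) {
          // best effort cleanup
        }
      }
      throw e;
    }
  }

  /** Placeholder for the actual release RPC over the socket's streams. */
  private void sendReleaseRequest(DomainSocket s) throws IOException {
    // ... write the release request to s.getOutputStream() ...
  }
}
{code}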




[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2021-03-24 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307586#comment-17307586
 ] 

Lisheng Sun commented on HDFS-13639:


Hi [~sodonnell],

This patch has been running in production for a long time.

The concurrency of short-circuit reads has improved greatly in the HBase 
scenario.

 





[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2021-03-23 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307027#comment-17307027
 ] 

Stephen O'Donnell commented on HDFS-13639:
--

[~leosun08] Are you running this change in production? Have you encountered any 
issues with it, and did you see a noticeable improvement on your production 
cluster when it was deployed?






[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-05-21 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113526#comment-17113526
 ] 

Hudson commented on HDFS-13639:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18285 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18285/])
HDFS-13639. SlotReleaser is not fast enough (#1885) (github: rev 
be374faf429d28561dd9c582f5c55451213d89a4)
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShmManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java



[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-05-19 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111267#comment-17111267
 ] 

Lisheng Sun commented on HDFS-13639:


 
||Case||Attachment||
|Before|perf_before_improve_SlotReleaser.png|
|After|perf_after_improve_SlotReleaser.png|

We can clearly see that, after the patch, the YCSB get QPS throughput 
increased by about 13.5% ((4200 qps - 3700 qps) / 3700 qps ≈ 13.5%); see 
CPS-1-min/name=hbase/ycsb-test/get on the data graph. The stress test 
finishes smoothly, and the allocating and releasing QPS match.

The x axis indicates time and the y axis indicates the latency of 
percentile/name=hbase/ycsb-test/get on the data graph.

After the patch, the latency no longer grows as the stress test runs.
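A quick sanity check of the figures quoted above (plain Java; the QPS numbers are the ones from this thread):

{code:java}
public class QpsGain {
  public static void main(String[] args) {
    double before = 3700;  // YCSB get QPS without the patch
    double after = 4200;   // YCSB get QPS with the patch
    // (4200 - 3700) / 3700 is about 13.5%
    System.out.printf("throughput gain = %.1f%%%n",
        (after - before) / before * 100);
  }
}
{code}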


[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-05-15 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108453#comment-17108453
 ] 

Wei-Chiu Chuang commented on HDFS-13639:


It would be really great if you could explain the charts: what are the x and 
y axes?






[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-05-11 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105030#comment-17105030
 ] 

Lisheng Sun commented on HDFS-13639:


Hi [~sodonnell] [~ayushtkn], could you find time to review this? Thank you.





[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-03-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059720#comment-17059720
 ] 

Hadoop QA commented on HDFS-13639:
--

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 56s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 27s | trunk passed |
| +1 | compile | 3m 32s | trunk passed |
| +1 | checkstyle | 0m 55s | trunk passed |
| +1 | mvnsite | 2m 5s | trunk passed |
| +1 | shadedclient | 17m 33s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 4s | trunk passed |
| +1 | javadoc | 1m 9s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 41s | the patch passed |
| +1 | compile | 3m 11s | the patch passed |
| +1 | javac | 3m 11s | the patch passed |
| +1 | checkstyle | 0m 49s | the patch passed |
| +1 | mvnsite | 1m 44s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 14m 22s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 6m 10s | the patch passed |
| +1 | javadoc | 1m 14s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 2s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 112m 3s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 195m 32s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-13639 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996764/HDFS-13639.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1658552647b3 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 74780c2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_242 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-03-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055520#comment-17055520
 ] 

Lisheng Sun commented on HDFS-13639:


[~elgoiri] [~ayushtkn], could you find time to review this? Thank you.






[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2020-03-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055519#comment-17055519
 ] 

Lisheng Sun commented on HDFS-13639:


I submitted a PR for this issue.





[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2018-07-28 Thread Gang Xie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560942#comment-16560942
 ] 

Gang Xie commented on HDFS-13639:
-

Attached the diff against 2.4.





[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2018-05-30 Thread Gang Xie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495014#comment-16495014
 ] 

Gang Xie commented on HDFS-13639:
-

I used YCSB on HBase to issue local reads on the HDFS datanode. Please see 
the environment section for the details.
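For context, short-circuit local reads of the kind being stressed here are enabled with settings along these lines in hdfs-site.xml (a minimal sketch; the socket path is the commonly documented example, not necessarily this cluster's):

{code}
dfs.client.read.shortcircuit = true
dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
{code}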



