[jira] [Commented] (HDFS-14297) Add cache for getContentSummary() result

2019-02-20 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772780#comment-16772780
 ] 

Tao Jie commented on HDFS-14297:


Thank you [~xkrogen]. In our environment, {{getContentSummary}} is invoked from 
several peripheral systems, not only for monitoring quotas. We can replace 
{{getContentSummary}} with {{getQuotaUsage}} in some places, but I still think 
we should make some improvement on the server side. If a new user calls 
{{getContentSummary}} very frequently, it will put a lot of load on the 
namenode RPC server.
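
As a point of reference, a minimal client-side sketch of the substitution 
mentioned above; for directories with quotas enabled, {{getQuotaUsage}} can 
return counts the namenode already tracks, while {{getContentSummary}} walks 
the whole subtree. The path used here is illustrative only.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.QuotaUsage;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class DirUsageProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("/user/example");  // illustrative path
    DistributedFileSystem dfs =
        (DistributedFileSystem) dir.getFileSystem(conf);

    // Cheaper call: quota and consumed space, without a full subtree walk
    // when quota counts are already maintained for the directory.
    QuotaUsage quota = dfs.getQuotaUsage(dir);
    System.out.println("files+dirs=" + quota.getFileAndDirectoryCount()
        + ", spaceConsumed=" + quota.getSpaceConsumed());

    // Expensive call: recursively summarizes the subtree on the namenode.
    ContentSummary summary = dfs.getContentSummary(dir);
    System.out.println("length=" + summary.getLength()
        + ", fileCount=" + summary.getFileCount());
  }
}
{code}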

> Add cache for getContentSummary() result
> 
>
> Key: HDFS-14297
> URL: https://issues.apache.org/jira/browse/HDFS-14297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Jie
>Priority: Major
>
> In a large HDFS cluster, calling {{getContentSummary}} for a directory with a 
> large number of files is very expensive. In one cluster with more than 
> 100 million files, calling {{getContentSummary}} may take more than 10s, and 
> it holds the fsnamesystem lock for that whole time.
> In our cluster, several peripheral systems call {{getContentSummary}} 
> periodically to monitor the status of dirs, and in most cases we do not need 
> a perfectly accurate result. We could keep a cache of contentSummary results 
> in the namenode, which would let us avoid repeated heavy requests within a 
> short span. We should also add restrictions to this cache: 1) its size should 
> be limited and it should use LRU eviction; 2) only results of heavy requests, 
> e.g. those with an RPC time over 1000ms, would be added to the cache.
> We may create a new RPC method or add a flag to the current method so that 
> the current behavior is not modified and callers can choose between an 
> accurate but expensive method and a fast but approximate one. 
> Any thought?
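
A rough sketch of the kind of cache described above (not an attached patch): a 
size-bounded, access-ordered map keyed by path, populated only when a call 
turns out to be heavy. The class name, bounds, and expiry value are 
illustrative.
{code}
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.fs.ContentSummary;

// Hypothetical sketch of the proposed namenode-side cache; not an actual patch.
public class ContentSummaryCache {
  private static final int MAX_ENTRIES = 1024;           // bound the cache size
  private static final long HEAVY_RPC_MS = 1000;         // only cache heavy calls
  private static final long EXPIRE_MS = 5 * 60 * 1000L;  // tolerated staleness

  private static class CachedSummary {
    final ContentSummary summary;
    final long createdMs;
    CachedSummary(ContentSummary summary, long createdMs) {
      this.summary = summary;
      this.createdMs = createdMs;
    }
  }

  // An access-ordered LinkedHashMap gives simple LRU eviction.
  private final Map<String, CachedSummary> cache =
      new LinkedHashMap<String, CachedSummary>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(
            Map.Entry<String, CachedSummary> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  /** Returns a cached result, or null if the caller must compute a fresh one. */
  public synchronized ContentSummary getIfFresh(String path, long nowMs) {
    CachedSummary c = cache.get(path);
    return (c == null || nowMs - c.createdMs > EXPIRE_MS) ? null : c.summary;
  }

  /** Caches the result only if the request was heavy, e.g. RPC time over 1000ms. */
  public synchronized void maybePut(String path, ContentSummary summary,
      long rpcTimeMs, long nowMs) {
    if (rpcTimeMs >= HEAVY_RPC_MS) {
      cache.put(path, new CachedSummary(summary, nowMs));
    }
  }
}
{code}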






[jira] [Commented] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-19 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772535#comment-16772535
 ] 

Tao Jie commented on HDFS-14219:


Thank you [~vagarychen] for your comments. I've run the failed unit tests in my 
local environment; all of them passed.


> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: HDFS-14129-branch-2.8.001.patch, HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (HDFS-14297) Add cache for getContentSummary() result

2019-02-19 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-14297:
---
Description: 
In a large HDFS cluster, calling {{getContentSummary}} for a directory with a 
large number of files is very expensive. In one cluster with more than 
100 million files, calling {{getContentSummary}} may take more than 10s, and it 
holds the fsnamesystem lock for that whole time.
In our cluster, several peripheral systems call {{getContentSummary}} 
periodically to monitor the status of dirs, and in most cases we do not need a 
perfectly accurate result. We could keep a cache of contentSummary results in 
the namenode, which would let us avoid repeated heavy requests within a short 
span. We should also add restrictions to this cache: 1) its size should be 
limited and it should use LRU eviction; 2) only results of heavy requests, e.g. 
those with an RPC time over 1000ms, would be added to the cache.
We may create a new RPC method or add a flag to the current method so that the 
current behavior is not modified and callers can choose between an accurate but 
expensive method and a fast but approximate one.
Any thought?

  was:
In a large HDFS cluster, calling {{getContentSummary}} for a directory with a 
large number of files is very expensive. In one cluster with more than 
100 million files, calling {{getContentSummary}} may take more than 10s, and it 
holds the fsnamesystem lock for that whole time.
In our cluster, several peripheral systems call {{getContentSummary}} 
periodically to monitor the status of dirs, and in most cases we do not need a 
perfectly accurate result. We could keep a cache of contentSummary results in 
the namenode, which would let us avoid repeated heavy requests within a short 
span. We should also add restrictions to this cache: 1) its size should be 
limited and it should use LRU eviction; 2) only results of heavy requests, e.g. 
those with an RPC time over 1000ms, would be added to the cache.
Any thought?


> Add cache for getContentSummary() result
> 
>
> Key: HDFS-14297
> URL: https://issues.apache.org/jira/browse/HDFS-14297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Jie
>Priority: Major
>
> In a large HDFS cluster, calling {{getContentSummary}} for a directory with a 
> large number of files is very expensive. In one cluster with more than 
> 100 million files, calling {{getContentSummary}} may take more than 10s, and 
> it holds the fsnamesystem lock for that whole time.
> In our cluster, several peripheral systems call {{getContentSummary}} 
> periodically to monitor the status of dirs, and in most cases we do not need 
> a perfectly accurate result. We could keep a cache of contentSummary results 
> in the namenode, which would let us avoid repeated heavy requests within a 
> short span. We should also add restrictions to this cache: 1) its size should 
> be limited and it should use LRU eviction; 2) only results of heavy requests, 
> e.g. those with an RPC time over 1000ms, would be added to the cache.
> We may create a new RPC method or add a flag to the current method so that 
> the current behavior is not modified and callers can choose between an 
> accurate but expensive method and a fast but approximate one. 
> Any thought?






[jira] [Created] (HDFS-14297) Add cache for getContentSummary() result

2019-02-19 Thread Tao Jie (JIRA)
Tao Jie created HDFS-14297:
--

 Summary: Add cache for getContentSummary() result
 Key: HDFS-14297
 URL: https://issues.apache.org/jira/browse/HDFS-14297
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tao Jie


In a large HDFS cluster, calling {{getContentSummary}} for a directory with a 
large number of files is very expensive. In one cluster with more than 
100 million files, calling {{getContentSummary}} may take more than 10s, and it 
holds the fsnamesystem lock for that whole time.
In our cluster, several peripheral systems call {{getContentSummary}} 
periodically to monitor the status of dirs, and in most cases we do not need a 
perfectly accurate result. We could keep a cache of contentSummary results in 
the namenode, which would let us avoid repeated heavy requests within a short 
span. We should also add restrictions to this cache: 1) its size should be 
limited and it should use LRU eviction; 2) only results of heavy requests, e.g. 
those with an RPC time over 1000ms, would be added to the cache.
Any thought?






[jira] [Updated] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-18 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-14219:
---
Attachment: HDFS-14129-branch-2.8.001.patch

> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: HDFS-14129-branch-2.8.001.patch, HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-17 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-14219:
---
Status: Patch Available  (was: Open)

> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Assigned] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-17 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie reassigned HDFS-14219:
--

Assignee: Tao Jie

> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Minor
> Attachments: HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Commented] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-17 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770796#comment-16770796
 ] 

Tao Jie commented on HDFS-14219:


I checked the code in FsDatasetImpl.java; it seems one line change from 
HDFS-10682 was missed when it was merged into branch-2.8.x:
{code}
 List<FsVolumeImpl> curVolumes = null;
-synchronized(this) {
+try (AutoCloseableLock lock = datasetLock.acquire()) {
   curVolumes = volumes.getVolumes();
   for (FsVolumeSpi v : curVolumes) {
     builders.put(v.getStorageID(),
         BlockListAsLongs.builder(maxDataLength));
{code}
[~vagarychen], [~arpitagarwal], would you have a look at this patch?
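
For readers unfamiliar with the failure mode: the iterator in 
{{getBlockReports}} only excludes writers if both sides take the same lock, so 
the missed conversion above lets modifications race with the iteration. A 
generic Java illustration (not Hadoop code) of a reader and a writer that each 
lock, but not with the same lock:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Generic illustration only: the reader still synchronizes on the map while
// the writer has migrated to a new lock, so they no longer exclude each other
// and the iterator's modification-count check can fail.
public class MismatchedLocks {
  private static final Map<Integer, Integer> map = new HashMap<>();
  private static final ReentrantLock newLock = new ReentrantLock();

  public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 100_000; i++) {
      map.put(i, i);
    }
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        newLock.lock();                    // writer uses the new lock
        try {
          map.put(100_000 + i, i);
        } finally {
          newLock.unlock();
        }
      }
    });
    writer.start();
    synchronized (map) {                   // reader still uses the old lock
      long sum = 0;
      for (Integer v : map.values()) {     // may throw ConcurrentModificationException
        sum += v;
      }
      System.out.println(sum);
    }
    writer.join();
  }
}
{code}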

> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Priority: Minor
> Attachments: HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Updated] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-02-17 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-14219:
---
Attachment: HDFS-14219.001.patch

> ConcurrentModificationException occurs in datanode occasionally
> ---
>
> Key: HDFS-14219
> URL: https://issues.apache.org/jira/browse/HDFS-14219
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Priority: Minor
> Attachments: HDFS-14219.001.patch
>
>
> ERROR occasionally occurs in datanode log:
>  ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
> BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
> c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
>  java.util.ConcurrentModificationException: modification=62852685 != 
> iterModification = 62852684
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
>  at 
> org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
>  at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
>  at java.lang.Thread.run(Thread.java:745)






[jira] [Created] (HDFS-14219) ConcurrentModificationException occurs in datanode occasionally

2019-01-21 Thread Tao Jie (JIRA)
Tao Jie created HDFS-14219:
--

 Summary: ConcurrentModificationException occurs in datanode 
occasionally
 Key: HDFS-14219
 URL: https://issues.apache.org/jira/browse/HDFS-14219
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.2
Reporter: Tao Jie


ERROR occasionally occurs in datanode log:

 ERROR BPServiceActor.java:792 - Exception in BPOfferService for Block pool 
BP-1687106048-10.14.9.17-1535259994856 (Datanode Uuid 
c17e635a-912f-4488-b4e8-93a58a27b5db) service to osscmh-9-21/10.14.9.21:8070
 java.util.ConcurrentModificationException: modification=62852685 != 
iterModification = 62852684
 at 
org.apache.hadoop.util.LightWeightGSet$SetIterator.ensureNext(LightWeightGSet.java:305)
 at 
org.apache.hadoop.util.LightWeightGSet$SetIterator.hasNext(LightWeightGSet.java:322)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1872)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:349)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:648)
 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:790)
 at java.lang.Thread.run(Thread.java:745)






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-31 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563185#comment-16563185
 ] 

Tao Jie commented on HDFS-13769:


Added documentation and fixed the findbugs warnings in the latest patch.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch, HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-31 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13769:
---
Attachment: HDFS-13769.004.patch

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch, HDFS-13769.004.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-30 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13769:
---
Attachment: HDFS-13769.003.patch

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch, 
> HDFS-13769.003.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-27 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559568#comment-16559568
 ] 

Tao Jie commented on HDFS-13769:


Updated the patch with [~linyiqun]'s suggestion. With this update, the number 
of {{getContentSummary}} invocations drops from 29 to 18 in one test case and 
from 21 to 10 in the other.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-27 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13769:
---
Attachment: HDFS-13769.002.patch

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch, HDFS-13769.002.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559125#comment-16559125
 ] 

Tao Jie commented on HDFS-13769:


[~csun], I agree with [~kihwal]. We cannot use this logic in the default delete 
operation, since it would break the existing delete semantics. However, we can 
use it in trash deletion, which has fewer side effects. Also, clearing a 
checkpoint in the trash is a typical case of deleting a large dir, since the 
trash checkpoint dir accumulates several hours of deleted files.

[~jojochuang], agreed! {{getContentSummary}} is a recursive method and may take 
several seconds if the dir is very large, but it holds the read lock in 
{{FSNameSystem}} rather than the write lock. We also need some way to know 
whether a dir is large, and it need not be very accurate. If there is a better 
solution that I am not aware of, please tell me.
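
To make the trash-policy idea concrete, a rough sketch under the assumptions 
discussed above: {{getContentSummary}} (read lock only) is used just to decide 
whether a directory is large, and large directories are then deleted child by 
child instead of in one huge delete RPC. The class name, helper name, and 
threshold are illustrative only.
{code}
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IncrementalTrashDelete {
  // Illustrative threshold: directories above this are deleted piecewise.
  private static final long LARGE_DIR_ITEM_COUNT = 100_000;

  // Delete a trash checkpoint without issuing one huge delete RPC.
  static void deleteCheckpoint(FileSystem fs, Path checkpoint) throws Exception {
    ContentSummary summary = fs.getContentSummary(checkpoint); // read lock only
    if (summary.getFileCount() + summary.getDirectoryCount()
        < LARGE_DIR_ITEM_COUNT) {
      fs.delete(checkpoint, true);               // small dir: one delete RPC
      return;
    }
    // Large dir: delete children one by one, so no single RPC holds the
    // FSNamesystem write lock for too long.
    for (FileStatus child : fs.listStatus(checkpoint)) {
      deleteCheckpoint(fs, child.getPath());
    }
    fs.delete(checkpoint, true);                 // finally remove the empty dir
  }
}
{code}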

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Commented] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558292#comment-16558292
 ] 

Tao Jie commented on HDFS-13769:


Hi [~jojochuang], the version of our cluster is 2.8.2, and this patch is based 
on trunk. However, I found that the trash policy logic is almost the same in 
2.8.2 and 3.x.

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Updated] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13769:
---
Attachment: HDFS-13769.001.patch

> Namenode gets stuck when deleting large dir in trash
> 
>
> Key: HDFS-13769
> URL: https://issues.apache.org/jira/browse/HDFS-13769
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13769.001.patch
>
>
> Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for 
> a long time when deleting a trash dir with a large amount of data. We found 
> the following log in the namenode:
> {quote}
> 2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
> (FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
> 23018 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)
> {quote}
> One simple solution is to avoid deleting a large amount of data in one delete 
> RPC call. We implement a trashPolicy that divides the delete operation into 
> several delete RPCs, so that each single deletion does not delete too many files.
> Any thought? [~linyiqun]






[jira] [Created] (HDFS-13769) Namenode gets stuck when deleting large dir in trash

2018-07-26 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13769:
--

 Summary: Namenode gets stuck when deleting large dir in trash
 Key: HDFS-13769
 URL: https://issues.apache.org/jira/browse/HDFS-13769
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.1.0, 2.8.2
Reporter: Tao Jie
Assignee: Tao Jie


Similar to the situation discussed in HDFS-13671, the Namenode gets stuck for a 
long time when deleting a trash dir with a large amount of data. We found the 
following log in the namenode:

{quote}

2018-06-08 20:00:59,042 INFO namenode.FSNamesystem 
(FSNamesystemLock.java:writeUnlock(252)) - FSNamesystem write lock held for 
23018 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:254)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1567)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2820)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1047)

{quote}

One simple solution is to avoid deleting a large amount of data in one delete 
RPC call. We implement a trashPolicy that divides the delete operation into 
several delete RPCs, so that each single deletion does not delete too many files.

Any thought? [~linyiqun]






[jira] [Assigned] (HDFS-13032) Make AvailableSpaceBlockPlacementPolicy more adaptive

2018-06-19 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie reassigned HDFS-13032:
--

Assignee: Tao Jie

> Make AvailableSpaceBlockPlacementPolicy more adaptive
> -
>
> Key: HDFS-13032
> URL: https://issues.apache.org/jira/browse/HDFS-13032
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13032.001.patch, HDFS-13032.002.patch
>
>
> In a heterogeneous HDFS cluster, datanode capacity and usage vary widely.
> We can already use HDFS-8131, a usage-aware block placement policy, to deal 
> with the problem. However, this policy could be more flexible:
> 1) The probability of a node with high usage being chosen is fixed once the 
> parameter is set. That is, the probability is always the same whether its 
> usage is 90% or 70%. When the usage of a node is close to full, its 
> probability of being chosen should be lower.
> 2) When the difference in usage is below 5% (hard-coded), the two nodes are 
> considered to have the same usage. That is fine when the usages are 30% and 
> 35%, but when they are 93% and 98% the nodes should not be treated equally. 
> The correction of the probability could be smoother.
> In my opinion, when we choose one node from two candidates (A: usage 30%, B: 
> usage 60%), we can calculate the probability according to the available 
> storage: p(A) = 70%/(70% + 40%), p(B) = 40%/(70% + 40%). When a node is close 
> to full, its probability would be very small.
> We could also have another factor to weaken this correction and make the 
> modification less aggressive.
> Any thought? [~liushaohui]
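
The proposed calculation is easy to state as code; a standalone sketch of the 
two-candidate example above (the usage values are from the description, not 
from any patch):
{code}
public class AvailableSpaceWeight {
  public static void main(String[] args) {
    double usageA = 0.30, usageB = 0.60;   // example from the description
    double freeA = 1.0 - usageA;           // 70% available
    double freeB = 1.0 - usageB;           // 40% available
    double pA = freeA / (freeA + freeB);   // 0.7 / 1.1, about 0.636
    double pB = freeB / (freeA + freeB);   // 0.4 / 1.1, about 0.364
    System.out.printf("p(A)=%.3f p(B)=%.3f%n", pA, pB);
    // A node that is nearly full (e.g. usage 98%) has free=0.02 and thus a
    // very small probability, instead of the fixed probability used today.
  }
}
{code}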






[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-16 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439024#comment-16439024
 ] 

Tao Jie commented on HDFS-13279:


Updated the patch. With the commit in HDFS-13418, we no longer have to change 
the current logic in the latest attached patch.

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch, 
> HDFS-13279.006.patch
>
>
> In a Hadoop cluster, the number of nodes on each rack can be different. For 
> example, if we have 50 datanodes in all and 15 datanodes per rack, 5 nodes 
> remain on the last rack. In this situation, we find that storage usage on the 
> last 5 nodes is much higher than on the other nodes.
>  With the default block placement policy, for each block the first replica 
> has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes is much 
> higher than for the other nodes. 
>  Consider writing 50 blocks to these 50 datanodes. The first replicas of the 
> 50 blocks are distributed to the 50 nodes equally. The 2nd replicas of the 15 
> blocks whose 1st replica is on rack1 are spread equally over the other 35 
> nodes, so each of those nodes receives about 0.43 replicas; the same holds 
> for blocks on rack2 and rack3. As a result, each node on rack4 (5 nodes) 
> receives about 1.29 replicas in all, while the other nodes receive about 0.97 
> replicas each.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|
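
The table can be reproduced with a short calculation. The sketch below models 
only the off-rack replica of each block, which is the simplification used in 
the description; the rack sizes are those of the example above:
{code}
public class RackLoad {
  public static void main(String[] args) {
    int[] rackSizes = {15, 15, 15, 5};   // nodes per rack, 50 datanodes total
    int total = 50;                      // 50 blocks: one 1st replica per node
    for (int target = 0; target < rackSizes.length; target++) {
      double perNode = 0.0;
      for (int source = 0; source < rackSizes.length; source++) {
        if (source == target) {
          continue;                      // off-rack replicas only
        }
        // rackSizes[source] blocks have their 1st replica on the source rack;
        // their next replica lands on one of (total - rackSizes[source])
        // nodes outside that rack, chosen uniformly.
        perNode += (double) rackSizes[source] / (total - rackSizes[source]);
      }
      System.out.printf("rack%d: %.2f replicas per node%n", target + 1, perNode);
    }
    // Prints about 0.97 for rack1..rack3 and 1.29 for rack4, matching the table.
  }
}
{code}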






[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-16 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.006.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch, 
> HDFS-13279.006.patch
>
>
> In a Hadoop cluster, the number of nodes on each rack can be different. For 
> example, if we have 50 datanodes in all and 15 datanodes per rack, 5 nodes 
> remain on the last rack. In this situation, we find that storage usage on the 
> last 5 nodes is much higher than on the other nodes.
>  With the default block placement policy, for each block the first replica 
> has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes is much 
> higher than for the other nodes. 
>  Consider writing 50 blocks to these 50 datanodes. The first replicas of the 
> 50 blocks are distributed to the 50 nodes equally. The 2nd replicas of the 15 
> blocks whose 1st replica is on rack1 are spread equally over the other 35 
> nodes, so each of those nodes receives about 0.43 replicas; the same holds 
> for blocks on rack2 and rack3. As a result, each node on rack4 (5 nodes) 
> receives about 1.29 replicas in all, while the other nodes receive about 0.97 
> replicas each.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|






[jira] [Commented] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-13 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437059#comment-16437059
 ] 

Tao Jie commented on HDFS-13411:


This patch is quite simple. [~arpitagarwal], [~hanishakoneru], would you give 
it a quick review?

> Should log more information when FSNameSystem is locked for a long time
> ---
>
> Key: HDFS-13411
> URL: https://issues.apache.org/jira/browse/HDFS-13411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13411.001.patch, HDFS-13411.002.patch
>
>
> Today, when one RPC to the namenode holds the FSNameSystem lock for a long 
> time, the stacktrace is printed to the namenode log. However, all we know 
> from this log is the name of the operation (such as create or delete). 
> It should print more information about the RPC call that blocks the namenode 
> (such as caller name, caller IP, and operation src), so that we can tune HDFS 
> better. 
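
A minimal sketch of the kind of log line proposed here. {{Server.getRemoteIp()}} 
and {{UserGroupInformation.getCurrentUser()}} are existing Hadoop APIs; the 
surrounding helper class, method, and parameters are illustrative only:
{code}
import java.io.IOException;
import java.net.InetAddress;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.security.UserGroupInformation;

public class LockHoldLogger {
  private static final Log LOG = LogFactory.getLog(LockHoldLogger.class);

  // Illustrative helper: log who issued the RPC that held the lock too long.
  static void logLongLockHold(String op, String src, long heldMs)
      throws IOException {
    InetAddress callerIp = Server.getRemoteIp(); // null if not in an RPC handler
    String caller = UserGroupInformation.getCurrentUser().getShortUserName();
    LOG.info("FSNamesystem write lock held for " + heldMs + " ms by op=" + op
        + " src=" + src + " caller=" + caller + " ip=" + callerIp);
  }
}
{code}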






[jira] [Updated] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-13 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13418:
---
Attachment: HDFS-13418-branch2.001.patch

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418-branch2.001.patch, HDFS-13418.001.patch, 
> HDFS-13418.002.patch, HDFS-13418.003.patch, HDFS-13418.004.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thought? [~vagarychen] 
> [~linyiqun]
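
A minimal sketch of the configurable initialization being discussed, using the 
usual Hadoop {{conf.getClass}} plus {{ReflectionUtils.newInstance}} pattern; the 
key {{dfs.net.topology.impl}} is the property proposed in this issue, not an 
existing configuration key:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.net.DFSNetworkTopology;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.util.ReflectionUtils;

public class TopologyFactory {
  // Proposed key from this issue; not an existing configuration property.
  static final String DFS_NET_TOPOLOGY_IMPL_KEY = "dfs.net.topology.impl";

  static NetworkTopology create(Configuration conf,
      boolean useDfsNetworkTopology) {
    if (useDfsNetworkTopology) {
      // Look up the implementation instead of hard-coding DFSNetworkTopology.
      Class<? extends DFSNetworkTopology> clazz = conf.getClass(
          DFS_NET_TOPOLOGY_IMPL_KEY, DFSNetworkTopology.class,
          DFSNetworkTopology.class);
      return ReflectionUtils.newInstance(clazz, conf);
    }
    return NetworkTopology.getInstance(conf);
  }
}
{code}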






[jira] [Updated] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-11 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13418:
---
Attachment: HDFS-13418.004.patch

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch, HDFS-13418.002.patch, 
> HDFS-13418.003.patch, HDFS-13418.004.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thought? [~vagarychen] 
> [~linyiqun]






[jira] [Updated] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-11 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13418:
---
Attachment: HDFS-13418.003.patch

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch, HDFS-13418.002.patch, 
> HDFS-13418.003.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thought? [~vagarychen] 
> [~linyiqun]






[jira] [Commented] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-11 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433678#comment-16433678
 ] 

Tao Jie commented on HDFS-13418:


Thank you [~linyiqun] for your comments.
For the UT, I think we need one additional case, which is what I actually need: 
switch {{dfs.use.dfs.network.topology}} to true, then set the value of 
{{dfs.net.topology.impl}} to a class that extends {{DFSNetworkTopology}}.
Actually, I am trying to add a new NetworkTopology and expect it to work well 
with the current {{DFSNetworkTopology}}, so I need this test case to ensure it 
does work.
I will update the patch per your suggestion soon.

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch, HDFS-13418.002.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thought? [~vagarychen] 
> [~linyiqun]






[jira] [Commented] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-11 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433553#comment-16433553
 ] 

Tao Jie commented on HDFS-13418:


Updated the patch, adding the default property to hdfs-default.xml and a test 
case to TestDatanodeManager.

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch, HDFS-13418.002.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thought? [~vagarychen] 
> [~linyiqun]






[jira] [Updated] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-11 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13418:
---
Attachment: HDFS-13418.002.patch

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch, HDFS-13418.002.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-site.default. Actually this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thoughts? [~vagarychen] 
> [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-10 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432482#comment-16432482
 ] 

Tao Jie commented on HDFS-13279:


Thank you [~ajayydv] for the reply!
1, I set up another jira, HDFS-13418, and I will try not to modify any current 
logic.
2, Let me try to make my point clear. There are 2 approaches to choosing the 
right node:
# As you mentioned, first choose the right rack with weight, then choose a node 
from the selected rack.
# As in the code in the patch, first choose a node as before, then check whether 
another choice is needed according to the weight of the rack (a sketch follows 
below).

I understand approach 1 could be more accurate; approach 2 also works well, as I 
have tested. I prefer approach 2 because it reuses rather than rewrites the 
current logic of choosing nodes.
Regarding performance, approach 2 does have some overhead: in a certain 
proportion of choices we may choose twice, but I don't think the chance will be 
over 10% in a typical case. For approach 1, I think we should test the 
performance more, since the choosing logic is totally changed and it seems to 
be heavier than the former logic.
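
To make approach 2 concrete, here is a minimal, self-contained sketch (not the 
code in the patch; the class name, the weight map, and the acceptance rule are 
assumptions for illustration only):
{code}
import java.util.List;
import java.util.Map;
import java.util.Random;

class WeightedReChooser {
  private final Random random = new Random();
  // weight > 1.0 means the rack is smaller than average and should keep
  // fewer of the replicas that land on it by the plain random choice.
  private final Map<String, Double> rackWeight;
  private final Map<String, String> nodeToRack;

  WeightedReChooser(Map<String, Double> rackWeight, Map<String, String> nodeToRack) {
    this.rackWeight = rackWeight;
    this.nodeToRack = nodeToRack;
  }

  /** Choose a node as usual, then possibly choose once more based on rack weight. */
  String chooseNode(List<String> candidates) {
    String node = candidates.get(random.nextInt(candidates.size()));
    double weight = rackWeight.getOrDefault(nodeToRack.get(node), 1.0);
    // Keep the first choice with probability 1/weight; otherwise re-choose once.
    if (weight > 1.0 && random.nextDouble() > 1.0 / weight) {
      node = candidates.get(random.nextInt(candidates.size()));
    }
    return node;
  }
}
{code}
The second draw goes through the same unmodified candidate list, which is the 
reuse of the existing logic I mean above.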


> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-10 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432029#comment-16432029
 ] 

Tao Jie commented on HDFS-13418:


Thank you [~linyiqun] for your patient explanation.
I attached a patch in which I add a new option, {{dfs.net.topology.impl}}, to 
configure the NetworkTopology implementation class when 
{{dfs.use.dfs.network.topology}} is true (a rough sketch of the instantiation 
follows below).
Now if we need to add a new NetworkTopology impl, we can:
1, Define {{NewNetworkTopology}} extends {{DFSNetworkTopology}} and configure 
{{dfs.net.topology.impl=NewNetworkTopology}} in hdfs-site.xml when it is used 
in the HDFS case.
2, Define {{NewNetworkTopology}} extends {{NetworkTopology}}, configure 
{{net.topology.impl=NewNetworkTopology}} in core-site.xml, and set 
{{dfs.use.dfs.network.topology}} to false when not used in the HDFS case.
Does this make sense? [~linyiqun], I would like to hear your comments:)
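
A rough sketch of what the instantiation could look like with the new option 
(assumptions: the key names, the helper class, and that both implementations 
can be created by {{ReflectionUtils}}; this is not the patch itself):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.net.DFSNetworkTopology;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.util.ReflectionUtils;

class TopologyFactory {
  /** Instantiate the topology class named in the configuration (sketch only). */
  static NetworkTopology newTopology(Configuration conf, boolean useDfsTopology) {
    Class<? extends NetworkTopology> clazz = useDfsTopology
        // new key proposed in this comment, defaulting to DFSNetworkTopology
        ? conf.getClass("dfs.net.topology.impl",
            DFSNetworkTopology.class, NetworkTopology.class)
        // existing key from core-default.xml
        : conf.getClass("net.topology.impl",
            NetworkTopology.class, NetworkTopology.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}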

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-default.xml. Actually, this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thoughts? [~vagarychen] 
> [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-10 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13418:
---
Attachment: HDFS-13418.001.patch

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13418.001.patch
>
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-default.xml. Actually, this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thoughts? [~vagarychen] 
> [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-09 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431728#comment-16431728
 ] 

Tao Jie commented on HDFS-13418:


[~linyiqun] Thank you for your comment. It feels a little tricky that 
{{net.topology.impl}} is configured but does not work when 
{{dfs.use.dfs.network.topology}} is true.
 1, Can we just remove 
{{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} from 
core-default.xml? Since we have a default impl in the code.
 2, Can we set {{net.topology.impl=org.apache.hadoop.net.DFSNetworkTopology}} 
in hdfs-default.xml? That would override the value set in core-default.xml when 
we use HDFS.
 3, If we want to add a new NetworkTopology impl that works with 
{{DFSNetworkTopology}}, we may define the new NetworkTopology to extend 
{{DFSNetworkTopology}}. But that does not work, since {{DFSNetworkTopology}} is 
hardcoded here once {{dfs.use.dfs.network.topology}} is true. Can we use 
reflection to instantiate the NetworkTopology impl when 
{{dfs.use.dfs.network.topology}} is set to true?

>  NetworkTopology should be configurable when enable DFSNetworkTopology
> --
>
> Key: HDFS-13418
> URL: https://issues.apache.org/jira/browse/HDFS-13418
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.1
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
>
> In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
> DFSNetworkTopology as the default implementation.
> We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
> core-default.xml. Actually, this property does not take effect once 
> {{dfs.use.dfs.network.topology}} is true. 
> In {{DatanodeManager}}, networkTopology is initialized as 
> {code}
> if (useDfsNetworkTopology) {
>   networktopology = DFSNetworkTopology.getInstance(conf);
> } else {
>   networktopology = NetworkTopology.getInstance(conf);
> }
> {code}
> I think we should still make the NetworkTopology configurable rather than 
> hard-code the implementation, since we may need another NetworkTopology impl.
> I am not sure if there are other considerations. Any thoughts? [~vagarychen] 
> [~linyiqun]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13418) NetworkTopology should be configurable when enable DFSNetworkTopology

2018-04-09 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13418:
--

 Summary:  NetworkTopology should be configurable when enable 
DFSNetworkTopology
 Key: HDFS-13418
 URL: https://issues.apache.org/jira/browse/HDFS-13418
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.1
Reporter: Tao Jie
Assignee: Tao Jie


In HDFS-11530 we introduce DFSNetworkTopology and in HDFS-11998 we set 
DFSNetworkTopology as the default implementation.

We still have {{net.topology.impl=org.apache.hadoop.net.NetworkTopology}} in 
core-default.xml. Actually, this property does not take effect once 
{{dfs.use.dfs.network.topology}} is true. 
In {{DatanodeManager}}, networkTopology is initialized as 
{code}
if (useDfsNetworkTopology) {
  networktopology = DFSNetworkTopology.getInstance(conf);
} else {
  networktopology = NetworkTopology.getInstance(conf);
}
{code}
I think we should still make the NetworkTopology configurable rather than 
hard-code the implementation, since we may need another NetworkTopology impl.
I am not sure if there are other considerations. Any thoughts? [~vagarychen] 
[~linyiqun]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-09 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430193#comment-16430193
 ] 

Tao Jie commented on HDFS-13411:


Fixed the failed test and the checkstyle issues.

> Should log more information when FSNameSystem is locked for a long time
> ---
>
> Key: HDFS-13411
> URL: https://issues.apache.org/jira/browse/HDFS-13411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13411.001.patch, HDFS-13411.002.patch
>
>
> Today, when one RPC to the namenode locks FSNameSystem for a long time, it 
> prints the stacktrace to the namenode log. However, all we know from this log 
> is the name of the operation on the Namenode (such as create, delete). 
> It should print more information about the RPC call that blocks the namenode 
> (like caller name, caller IP, operation src), so that we can tune HDFS 
> better. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-09 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13411:
---
Attachment: HDFS-13411.002.patch

> Should log more information when FSNameSystem is locked for a long time
> ---
>
> Key: HDFS-13411
> URL: https://issues.apache.org/jira/browse/HDFS-13411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13411.001.patch, HDFS-13411.002.patch
>
>
> Today, when one RPC to the namenode locks FSNameSystem for a long time, it 
> prints the stacktrace to the namenode log. However, all we know from this log 
> is the name of the operation on the Namenode (such as create, delete). 
> It should print more information about the RPC call that blocks the namenode 
> (like caller name, caller IP, operation src), so that we can tune HDFS 
> better. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-08 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13411:
---
Attachment: HDFS-13411.001.patch

> Should log more information when FSNameSystem is locked for a long time
> ---
>
> Key: HDFS-13411
> URL: https://issues.apache.org/jira/browse/HDFS-13411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13411.001.patch
>
>
> Today, when one RPC to the namenode locks FSNameSystem for a long time, it 
> prints the stacktrace to the namenode log. However, all we know from this log 
> is the name of the operation on the Namenode (such as create, delete). 
> It should print more information about the RPC call that blocks the namenode 
> (like caller name, caller IP, operation src), so that we can tune HDFS 
> better. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-08 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13411:
---
Status: Patch Available  (was: Open)

> Should log more information when FSNameSystem is locked for a long time
> ---
>
> Key: HDFS-13411
> URL: https://issues.apache.org/jira/browse/HDFS-13411
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13411.001.patch
>
>
> Today, when one RPC to the namenode locks FSNameSystem for a long time, it 
> prints the stacktrace to the namenode log. However, all we know from this log 
> is the name of the operation on the Namenode (such as create, delete). 
> It should print more information about the RPC call that blocks the namenode 
> (like caller name, caller IP, operation src), so that we can tune HDFS 
> better. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13411) Should log more information when FSNameSystem is locked for a long time

2018-04-07 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13411:
--

 Summary: Should log more information when FSNameSystem is locked 
for a long time
 Key: HDFS-13411
 URL: https://issues.apache.org/jira/browse/HDFS-13411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.8.2
Reporter: Tao Jie
Assignee: Tao Jie


Today, when one RPC to the namenode locks FSNameSystem for a long time, it 
prints the stacktrace to the namenode log. However, all we know from this log is 
the name of the operation on the Namenode (such as create, delete). 

It should print more information about the RPC call that blocks the namenode 
(like caller name, caller IP, operation src), so that we can tune HDFS better. 
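
As a minimal sketch of the kind of message this is asking for (the class, field 
names, and threshold below are assumptions for illustration, not the eventual 
patch):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LockHoldReporter {
  private static final Logger LOG = LoggerFactory.getLogger(LockHoldReporter.class);
  private static final long THRESHOLD_MS = 1000;  // assumed reporting threshold

  /** Log who held the lock, on what, and for how long, not just the op name. */
  static void report(String op, String src, String callerName, String callerIp,
      long heldMs) {
    if (heldMs > THRESHOLD_MS) {
      LOG.warn("FSNamesystem write lock held for {} ms: op={} src={} caller={} ip={}",
          heldMs, op, src, callerName, callerIp);
    }
  }
}
{code}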



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-05 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427895#comment-16427895
 ] 

Tao Jie commented on HDFS-13279:


[~ajayydv] Thank you for your comments.
{quote}
Don't remove the default impl entry from core-default.xml.
Similarly we can do this with almost no change in  Network Topology, 
DFSNetworkTopology and TestBalancerWithNodeGroup. We can  update 
DatanodeManager#init to instantiate according to value of "net.topology.impl".
{quote}
Actually, in the original patch based on 2.8.2, we don't need to modify 
core-default.xml, NetworkTopology, or TestBalancerWithNodeGroup. It is a little 
tricky that HDFS-11998 set DFSNetworkTopology as the default topology 
implementation even though {{net.topology.impl}} is set to NetworkTopology. In 
HDFS-11530, once {{dfs.use.dfs.network.topology}} is true, the implementation 
is hard-coded to {{DFSNetworkTopology}} no matter what {{net.topology.impl}} 
is, so we have to modify that behavior if we need to add a new topology 
implementation and let it work. Maybe we could fix it in another Jira?
{quote}
L43, chooseDataNode: Instead of choosing datanode twice we can just call 
super.chooseRandom if we overide chooseRandom in 
NetworkTopologyWithWeightedRack. This way we can avoid calling chooseRandom 
twice.
{quote}
It is OK if we use the {{first choose a rack, then choose a node}} logic in 
{{chooseRandom}}; a rough sketch is below. The purpose of choosing twice is 
mostly to reuse the current choosing logic, which makes the code easier:)
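
For comparison, a tiny sketch of the {{first choose a rack, then choose a 
node}} idea, where a rack's weight is simply its node count so every candidate 
node ends up equally likely (names and structure are assumptions for 
illustration, not the patch):
{code}
import java.util.List;
import java.util.Map;
import java.util.Random;

class RackThenNodeChooser {
  private final Random random = new Random();

  /** racks maps each rack name to the candidate datanodes it contains. */
  String chooseNode(Map<String, List<String>> racks) {
    int totalNodes = racks.values().stream().mapToInt(List::size).sum();
    int pick = random.nextInt(totalNodes);  // rack chosen with weight = node count
    for (Map.Entry<String, List<String>> entry : racks.entrySet()) {
      List<String> nodes = entry.getValue();
      if (pick < nodes.size()) {
        return nodes.get(random.nextInt(nodes.size()));  // uniform within the rack
      }
      pick -= nodes.size();
    }
    throw new IllegalStateException("no candidate nodes");
  }
}
{code}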

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-04-04 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.005.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch, HDFS-13279.005.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.004.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: (was: HDFS-13279.004.patch)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-29 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.004.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch, HDFS-13279.004.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-26 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413612#comment-16413612
 ] 

Tao Jie commented on HDFS-13279:


Thank you [~ajayydv] for your comments.
{quote}
Instead if modifying current default placement policy we should create new one 
by extending DefaultBlockPlacementPolicy/BlockPlacementPolicy.
{quote}
Agree! I will try to add a new blockPlacementPolicy in the next patch. However, 
we may still need to modify {{NetworkTopology}}, in which we calculate a weight 
for each rack according to the number of nodes on it. We'd better do that 
calculation when adding/removing nodes rather than when choosing nodes for 
blocks; a sketch follows below.
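
A small sketch of what I mean by keeping the weights up to date on add/remove 
instead of recomputing them during placement (all names are illustrative, not 
from the patch):
{code}
import java.util.HashMap;
import java.util.Map;

class RackWeights {
  private final Map<String, Integer> nodesPerRack = new HashMap<>();
  private int totalNodes = 0;

  synchronized void nodeAdded(String rack) {
    nodesPerRack.merge(rack, 1, Integer::sum);
    totalNodes++;
  }

  synchronized void nodeRemoved(String rack) {
    nodesPerRack.computeIfPresent(rack, (r, n) -> n > 1 ? n - 1 : null);
    totalNodes--;
  }

  /** Rack size relative to the average rack; values above 1 mean a small rack. */
  synchronized double weight(String rack) {
    int nodes = nodesPerRack.getOrDefault(rack, 0);
    if (nodes == 0) {
      return 1.0;
    }
    double averageRackSize = (double) totalNodes / nodesPerRack.size();
    return averageRackSize / nodes;
  }
}
{code}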
 

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-22 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.003.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch, 
> HDFS-13279.003.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.002.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch, HDFS-13279.002.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Status: Patch Available  (was: Open)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.8.3
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407728#comment-16407728
 ] 

Tao Jie commented on HDFS-13279:


Updated the patch and added a flag in the 
configuration: {{net.topology.modify.small.rack}}. With this option on, it adds 
a modifier to the probability of choosing nodes in a small rack.
I have run some tests on this modifier:
||TotalNodes||Racks||NodesPerNormalRack||NodesPerSmallRack||Probability to 
Small Rack without the modifier||Probability to Small Rack with the 
modifier (closer to 1 is better)||
|35|3|15|5|1.334|1.103|
|50|4|15|5|1.189|1.023|
|65|5|15|5|1.114|1.035|
|140|10|15|5|1.087|1.014|
|95|4|30|5|1.247|1.030|
|155|4|50|5|1.288|1.030|
|455|10|50|5|1.108|1.014|


> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Attachment: HDFS-13279.001.patch

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13279.001.patch
>
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-15 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16400065#comment-16400065
 ] 

Tao Jie commented on HDFS-13279:


It does not seem easy to make the probability exactly equal for each node, but 
we can apply a modifier to the nodes in the smaller rack to keep them from being 
written to too frequently.

Any thoughts?

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Description: 
In a Hadoop cluster, number of nodes on a rack could be different. For example, 
we have 50 Datanodes in all and 15 datanodes per rack, it would remain 5 nodes 
on the last rack. In this situation, we find that storage usage on the last 5 
nodes would be much higher than other nodes.
 With the default blockplacement policy, for each block, the first replica has 
the same probability of being written to each datanode, but the probability of 
the 2nd/3rd replica being written to the last 5 nodes would be much higher than 
for other nodes. 
 Consider we write 50 blocks to these 50 datanodes. The first reps would be 
distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is on 
rack1 (15 reps) would be sent equally to the other 35 nodes, and each of those 
nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. As a 
result, each node on rack4 (5 nodes) would receive 1.29 replications in all, 
while every other node would receive 0.97 reps.
||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|

  was:
In a Hadoop cluster, number of nodes on a rack could be different. For example, 
we have 50 Datanodes in all and 15 datanodes per rack, it would remain 5 nodes 
on the last rack. In this situation, we find that storage usage on the last 5 
nodes would be much higher than other nodes.
With the default blockplacement policy, for each block, the first replication 
has the same probability to write to each datanode, but the probability for the 
2nd/3rd replication to write to the last 5 nodes would much higher than to 
other nodes. 
Consider we write 100 blocks to such 50 datanodes. The first rep of 100 block 
would distirbuted to 50 node equally. The 2rd rep of blocks which the 1st rep 
is on rack1(15 reps) would send equally to other 35 nodes and each nodes 
receive 0.428 rep. So does blocks on rack2 and rack3. As a result, node on 
rack4(5 nodes) would receive 1.29 replications in all, while other node would 
receive 0.97 reps.


||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|


> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
>  With the default blockplacement policy, for each block, the first 
> replica has the same probability of being written to each datanode, but the 
> probability of the 2nd/3rd replica being written to the last 5 nodes would be 
> much higher than for other nodes. 
>  Consider we write 50 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Description: 
In a Hadoop cluster, number of nodes on a rack could be different. For example, 
we have 50 Datanodes in all and 15 datanodes per rack, it would remain 5 nodes 
on the last rack. In this situation, we find that storage usage on the last 5 
nodes would be much higher than other nodes.
With the default blockplacement policy, for each block, the first replica has 
the same probability of being written to each datanode, but the probability of 
the 2nd/3rd replica being written to the last 5 nodes would be much higher than 
for other nodes. 
Consider we write 100 blocks to these 50 datanodes. The first reps would be 
distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is on 
rack1 (15 reps) would be sent equally to the other 35 nodes, and each of those 
nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. As a 
result, each node on rack4 (5 nodes) would receive 1.29 replications in all, 
while every other node would receive 0.97 reps.


||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
|From rack1|-|15/35=0.43|0.43|0.43|
|From rack2|0.43|-|0.43|0.43|
|From rack3|0.43|0.43|-|0.43|
|From rack4|5/45=0.11|0.11|0.11|-|
|Total|0.97|0.97|0.97|1.29|

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
> With the default blockplacement policy, for each block, the first replica has 
> the same probability of being written to each datanode, but the probability 
> of the 2nd/3rd replica being written to the last 5 nodes would be much higher 
> than for other nodes. 
> Consider we write 100 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Affects Version/s: 2.8.3
   3.0.0

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Tao Jie
>Priority: Major
>
> In a Hadoop cluster, number of nodes on a rack could be different. For 
> example, we have 50 Datanodes in all and 15 datanodes per rack, it would 
> remain 5 nodes on the last rack. In this situation, we find that storage 
> usage on the last 5 nodes would be much higher than other nodes.
> With the default blockplacement policy, for each block, the first replica has 
> the same probability of being written to each datanode, but the probability 
> of the 2nd/3rd replica being written to the last 5 nodes would be much higher 
> than for other nodes. 
> Consider we write 100 blocks to these 50 datanodes. The first reps would be 
> distributed to the 50 nodes equally. The 2nd reps of blocks whose 1st rep is 
> on rack1 (15 reps) would be sent equally to the other 35 nodes, and each of 
> those nodes receives 0.428 reps. The same holds for blocks on rack2 and rack3. 
> As a result, each node on rack4 (5 nodes) would receive 1.29 replications in 
> all, while every other node would receive 0.97 reps.
> ||-||Rack1(15 nodes)||Rack2(15 nodes)||Rack3(15 nodes)||Rack4(5 nodes)||
> |From rack1|-|15/35=0.43|0.43|0.43|
> |From rack2|0.43|-|0.43|0.43|
> |From rack3|0.43|0.43|-|0.43|
> |From rack4|5/45=0.11|0.11|0.11|-|
> |Total|0.97|0.97|0.97|1.29|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13279) Datanodes usage is imbalanced if number of nodes per rack is not equal

2018-03-14 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13279:
---
Summary: Datanodes usage is imbalanced if number of nodes per rack is not 
equal  (was: Datanodes usage is imbalanced if node)

> Datanodes usage is imbalanced if number of nodes per rack is not equal
> --
>
> Key: HDFS-13279
> URL: https://issues.apache.org/jira/browse/HDFS-13279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Jie
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13279) Datanodes usage is imbalanced if node

2018-03-14 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13279:
--

 Summary: Datanodes usage is imbalanced if node
 Key: HDFS-13279
 URL: https://issues.apache.org/jira/browse/HDFS-13279
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tao Jie






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13214) RBF: Configuration on Router conflicts with client side configuration

2018-03-06 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389026#comment-16389026
 ] 

Tao Jie edited comment on HDFS-13214 at 3/7/18 4:56 AM:


Sorry for replying late, and thank you [~linyiqun] [~elgoiri] for working on 
this JIRA.
It is clear to me now:)
Some other minor suggestions about the document:
1, Rebalancing data across subclusters, mentioned in the document for 
2.9.0/3.0.0 GA, is not ready today, right? We'd better avoid misleading users 
when the function is not available (I have tried to find out how to do 
rebalancing for a while :) ). 
2, The Architecture diagram implies that the subclusters are independent HDFS 
clusters. Actually, subclusters could also be a federation cluster, or a mix of 
federation and independent clusters. We could mention this explicitly in the 
document.
I am OK with handling this in another jira.
+1 for the current patch.


was (Author: tao jie):
Sorry for the late reply, and thank you [~linyiqun] [~elgoiri] for working on 
this JIRA.
It is clear to me now :)
Some other minor suggestions about the document:
1, Rebalancing data across subclusters, mentioned in the 2.9.0/3.0.0 GA 
documentation, is not ready today, right? We'd better avoid misleading users 
while the feature is not available (I spent a while trying to find the 
rebalancing procedure :) ). 
2, The Architecture diagram implies that the subclusters are independent HDFS 
clusters. Actually, a subcluster could also be a federated cluster, or a mix 
of federated and independent clusters. We could state this explicitly in the 
document.
I am OK with handling this in another JIRA.
+1 for the current patch.

> RBF: Configuration on Router conflicts with client side configuration
> -
>
> Key: HDFS-13214
> URL: https://issues.apache.org/jira/browse/HDFS-13214
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Tao Jie
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13214.001.patch, HDFS-13214.002.patch, 
> HDFS-13214.003.patch, HDFS-13214.004.patch
>
>
> In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>ns1,ns2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1,r2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1</name>
>   <value>host1:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns2</name>
>   <value>host2:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>host1:</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r2</name>
>   <value>host2:</value>
> </property>
> {code}
> {{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
> However, with this configuration on a server node, the Router fails to start 
> with the following error:
> {code}
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
> at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
> at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
> at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> 2018-03-01 18:05:56,208 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
> router
> {code}
> Then, when the Router tries to find the local namenode, multiple properties 
> ({{dfs.namenode.rpc-address.ns1}} and {{dfs.namenode.rpc-address.ns-fed.r1}}) 
> match the local address.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13214) RBF: Configuration on Router conflicts with client side configuration

2018-03-06 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389026#comment-16389026
 ] 

Tao Jie commented on HDFS-13214:


Sorry for the late reply, and thank you [~linyiqun] [~elgoiri] for working on 
this JIRA.
It is clear to me now :)
Some other minor suggestions about the document:
1, Rebalancing data across subclusters, mentioned in the 2.9.0/3.0.0 GA 
documentation, is not ready today, right? We'd better avoid misleading users 
while the feature is not available (I spent a while trying to find the 
rebalancing procedure :) ). 
2, The Architecture diagram implies that the subclusters are independent HDFS 
clusters. Actually, a subcluster could also be a federated cluster, or a mix 
of federated and independent clusters. We could state this explicitly in the 
document.
I am OK with handling this in another JIRA.
+1 for the current patch.

> RBF: Configuration on Router conflicts with client side configuration
> -
>
> Key: HDFS-13214
> URL: https://issues.apache.org/jira/browse/HDFS-13214
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Tao Jie
>Assignee: Yiqun Lin
>Priority: Major
> Attachments: HDFS-13214.001.patch, HDFS-13214.002.patch, 
> HDFS-13214.003.patch, HDFS-13214.004.patch
>
>
> In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>ns1,ns2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1,r2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1</name>
>   <value>host1:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns2</name>
>   <value>host2:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>host1:</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r2</name>
>   <value>host2:</value>
> </property>
> {code}
> {{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
> However, with this configuration on a server node, the Router fails to start 
> with the following error:
> {code}
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
> at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
> at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
> at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> 2018-03-01 18:05:56,208 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
> router
> {code}
> Then, when the Router tries to find the local namenode, multiple properties 
> ({{dfs.namenode.rpc-address.ns1}} and {{dfs.namenode.rpc-address.ns-fed.r1}}) 
> match the local address.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13219) RBF:Cluster information on Router is not correct when the Federation shares datanodes.

2018-03-02 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13219:
---
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-12615

> RBF:Cluster information on Router is not correct when the Federation shares 
> datanodes.
> --
>
> Key: HDFS-13219
> URL: https://issues.apache.org/jira/browse/HDFS-13219
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Tao Jie
>Priority: Major
>
> Currently the summary information on the Router web UI aggregates the 
> summaries of all nameservices. However, in a typical federation deployment, 
> datanodes are shared among nameservices. Consider a cluster with 2 namespaces 
> and 100 datanodes: all 100 datanodes are available to each namespace, yet the 
> Router web UI shows 200 datanodes. The same applies to other figures such as 
> {{Total capacity}} and {{Remaining capacity}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13219) RBF:Cluster information on Router is not correct when the Federation shares datanodes.

2018-03-02 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13219:
--

 Summary: RBF:Cluster information on Router is not correct when the 
Federation shares datanodes.
 Key: HDFS-13219
 URL: https://issues.apache.org/jira/browse/HDFS-13219
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Tao Jie


Currently the summary information on the Router web UI aggregates the 
summaries of all nameservices. However, in a typical federation deployment, 
datanodes are shared among nameservices. Consider a cluster with 2 namespaces 
and 100 datanodes: all 100 datanodes are available to each namespace, yet the 
Router web UI shows 200 datanodes. The same applies to other figures such as 
{{Total capacity}} and {{Remaining capacity}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13214) RBF: Configuration on Router conflicts with client side configuration

2018-03-01 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383236#comment-16383236
 ] 

Tao Jie commented on HDFS-13214:


[~elgoiri] [~ywskycn] [~linyiqun] thank you for your responses.
{quote}
In our internal setup, we configure dfs.nameservice.id.
{quote}
I have run a brief test with both HA and non-HA configurations: whenever we do 
not specify {{dfs.nameservice.id}}, the exception occurs regardless of whether 
HA is enabled. So I don't think HA/non-HA mode is directly related to this 
issue.
In the current code logic, we try to find the local namenode host from the 
configuration. So I think we should set {{dfs.nameservice.id}} to {{ns1}} or 
{{ns2}} rather than {{ns-fed}}; otherwise the Router would mistake itself for 
the local namenode.
Today the property {{dfs.nameservice.id}} is not mandatory in a federation 
cluster (HA or non-HA), right? We could either:
1, Complete the documentation and require that {{dfs.nameservice.id}} be 
specified on Router nodes.
2, Improve the logic for finding the local namenode address when 
{{dfs.nameservice.id}} is not specified.
Please correct me if I am wrong.
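
As an illustration of option 1 above, a minimal sketch of setting 
{{dfs.nameservice.id}} programmatically; the value {{ns1}} is an example only 
(it assumes the Router runs on the same host as the ns1 namenode), and in 
practice the property would simply go into the Router node's hdfs-site.xml:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class RouterNameserviceIdExample {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Hypothetical value: the nameservice whose namenode is local to this
    // Router host, so DFSUtil no longer finds multiple matching addresses.
    conf.set("dfs.nameservice.id", "ns1");
    System.out.println("dfs.nameservice.id = " + conf.get("dfs.nameservice.id"));
  }
}
{code}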

> RBF: Configuration on Router conflicts with client side configuration
> -
>
> Key: HDFS-13214
> URL: https://issues.apache.org/jira/browse/HDFS-13214
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Tao Jie
>Priority: Major
>
> In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>ns1,ns2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1,r2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1</name>
>   <value>host1:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns2</name>
>   <value>host2:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>host1:</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r2</name>
>   <value>host2:</value>
> </property>
> {code}
> {{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
> However, with this configuration on a server node, the Router fails to start 
> with the following error:
> {code}
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
> at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
> at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
> at 
> org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> 2018-03-01 18:05:56,208 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
> router
> {code}
> Then, when the Router tries to find the local namenode, multiple properties 
> ({{dfs.namenode.rpc-address.ns1}} and {{dfs.namenode.rpc-address.ns-fed.r1}}) 
> match the local address.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13214) RBF: Configuration on Router conflicts with client side configuration

2018-03-01 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13214:
---
Description: 
In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
{code}
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2,ns-fed</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns-fed</name>
  <value>r1,r2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>host2:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r1</name>
  <value>host1:</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r2</name>
  <value>host2:</value>
</property>
{code}
{{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
However, with this configuration on a server node, the Router fails to start 
with the following error:
{code}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
addresses that match local node's address. Please configure the system with 
dfs.nameservice.id and dfs.ha.namenode.id
at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
at 
org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
2018-03-01 18:05:56,208 ERROR 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
router
{code}
Then, when the Router tries to find the local namenode, multiple properties 
({{dfs.namenode.rpc-address.ns1}} and {{dfs.namenode.rpc-address.ns-fed.r1}}) 
match the local address.


  was:
In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
{code}
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2,ns-fed</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns-fed</name>
  <value>r1,r2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>host2:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r1</name>
  <value>host1:</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r2</name>
  <value>host2:</value>
</property>
{code}
{{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
However, with this configuration on a server node, the Router fails to start 
with the following error:
{code}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
addresses that match local node's address. Please configure the system with 
dfs.nameservice.id and dfs.ha.namenode.id
at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
at 
org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
2018-03-01 18:05:56,208 ERROR 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
router
{code}


> RBF: Configuration on Router conflicts with client side configuration
> -
>
> Key: HDFS-13214
> URL: https://issues.apache.org/jira/browse/HDFS-13214
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Tao Jie
>Priority: Major
>
> In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
> {code}
> <property>
>   <name>dfs.nameservices</name>
>   <value>ns1,ns2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1,r2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns1</name>
>   <value>host1:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns2</name>
>   <value>host2:8020</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>host1:</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r2</name>
>   <value>host2:</value>
> </property>
> {code}
> {{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
> However, with this configuration on a server node, the Router fails to start 
> with the following error:
> {code}
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
> addresses that match local node's address. Please configure the system with 
> dfs.nameservice.id and dfs.ha.namenode.id
> at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
> at 

[jira] [Created] (HDFS-13214) RBF: Configuration on Router conflicts with client side configuration

2018-03-01 Thread Tao Jie (JIRA)
Tao Jie created HDFS-13214:
--

 Summary: RBF: Configuration on Router conflicts with client side 
configuration
 Key: HDFS-13214
 URL: https://issues.apache.org/jira/browse/HDFS-13214
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Tao Jie


In a typical router-based federation cluster, hdfs-site.xml is supposed to be:
{code}
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2,ns-fed</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns-fed</name>
  <value>r1,r2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>host2:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r1</name>
  <value>host1:</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns-fed.r2</name>
  <value>host2:</value>
</property>
{code}
{{dfs.ha.namenodes.ns-fed}} here is used by clients to access the Router. 
However, with this configuration on a server node, the Router fails to start 
with the following error:
{code}
org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple 
addresses that match local node's address. Please configure the system with 
dfs.nameservice.id and dfs.ha.namenode.id
at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1198)
at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1131)
at 
org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1086)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createLocalNamenodeHearbeatService(Router.java:466)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.createNamenodeHearbeatServices(Router.java:423)
at 
org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:199)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
2018-03-01 18:05:56,208 ERROR 
org.apache.hadoop.hdfs.server.federation.router.DFSRouter: Failed to start 
router
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13032) Make AvailableSpaceBlockPlacementPolicy more adaptive

2018-01-22 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13032:
---
Attachment: HDFS-13032.002.patch

> Make AvailableSpaceBlockPlacementPolicy more adaptive
> -
>
> Key: HDFS-13032
> URL: https://issues.apache.org/jira/browse/HDFS-13032
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13032.001.patch, HDFS-13032.002.patch
>
>
> In a heterogeneous HDFS cluster, datanode capacities and usages differ 
> widely.
> We can use HDFS-8131, a usage-aware block placement policy, to deal with the 
> problem. However, this policy could be more flexible:
> 1, The probability of a highly used node being chosen is fixed once the 
> parameter is set; it is the same whether the node's usage is 90% or 70%. 
> When a node is close to full, its probability of being chosen should be 
> lower.
> 2, When the difference in usage is below 5% (hard-coded), the two nodes are 
> treated as having the same usage. That is fine for usages of 30% and 35%, 
> but nodes at 93% and 98% should not be treated equally. The probability 
> correction could be smoother.
> In my opinion, when choosing one node from two candidates (A: usage 30%, B: 
> usage 60%), we can compute the probability from the available space: 
> p(A) = 70% / (70% + 40%), p(B) = 40% / (70% + 40%). When a node is close to 
> full, its probability becomes very small.
> We could also add a factor to soften this correction and make the change 
> less aggressive.
> Any thought? [~liushaohui]
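
For illustration only, a minimal sketch of the weighting described above. It is 
not the actual AvailableSpaceBlockPlacementPolicy code, and the {{dampening}} 
knob is a hypothetical way to soften the correction:
{code}
import java.util.Random;

public class AvailableSpaceChooserSketch {
  private static final Random RAND = new Random();

  /** Returns 0 to pick candidate A, 1 to pick candidate B. */
  static int choose(double remainingRatioA, double remainingRatioB, double dampening) {
    // Space-aware probability, blended with a uniform 50/50 choice.
    double pA = remainingRatioA / (remainingRatioA + remainingRatioB);
    double blended = dampening * pA + (1.0 - dampening) * 0.5;
    return RAND.nextDouble() < blended ? 0 : 1;
  }

  public static void main(String[] args) {
    // A: usage 30% -> 0.7 remaining, B: usage 60% -> 0.4 remaining.
    // With dampening = 1.0, p(A) = 0.7 / (0.7 + 0.4), roughly 0.64.
    int picksOfA = 0;
    for (int i = 0; i < 100000; i++) {
      if (choose(0.7, 0.4, 1.0) == 0) {
        picksOfA++;
      }
    }
    System.out.println("A chosen fraction: " + picksOfA / 100000.0);
  }
}
{code}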



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13032) Make AvailableSpaceBlockPlacementPolicy more adaptive

2018-01-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-13032:
---
Status: Patch Available  (was: Open)

> Make AvailableSpaceBlockPlacementPolicy more adaptive
> -
>
> Key: HDFS-13032
> URL: https://issues.apache.org/jira/browse/HDFS-13032
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Priority: Major
> Attachments: HDFS-13032.001.patch
>
>
> In a heterogeneous HDFS cluster, datanode capacities and usages differ 
> widely.
> We can use HDFS-8131, a usage-aware block placement policy, to deal with the 
> problem. However, this policy could be more flexible:
> 1, The probability of a highly used node being chosen is fixed once the 
> parameter is set; it is the same whether the node's usage is 90% or 70%. 
> When a node is close to full, its probability of being chosen should be 
> lower.
> 2, When the difference in usage is below 5% (hard-coded), the two nodes are 
> treated as having the same usage. That is fine for usages of 30% and 35%, 
> but nodes at 93% and 98% should not be treated equally. The probability 
> correction could be smoother.
> In my opinion, when choosing one node from two candidates (A: usage 30%, B: 
> usage 60%), we can compute the probability from the available space: 
> p(A) = 70% / (70% + 40%), p(B) = 40% / (70% + 40%). When a node is close to 
> full, its probability becomes very small.
> We could also add a factor to soften this correction and make the change 
> less aggressive.
> Any thought? [~liushaohui]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11638) Support marking a datanode dead by DFSAdmin

2017-04-17 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971129#comment-15971129
 ] 

Tao Jie commented on HDFS-11638:


In the attached patch, we add an option to dfsadmin:
{{hdfs dfsadmin -shutdownDatanode  force}} 
When the "force" option is set, dfsadmin sends a request to the namenode to 
mark the datanode as dead immediately.

> Support marking a datanode dead by DFSAdmin
> ---
>
> Key: HDFS-11638
> URL: https://issues.apache.org/jira/browse/HDFS-11638
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Jie
> Attachments: HDFS-11638-001.patch
>
>
> We have met the following circumstance:
> A kernel error occurred on one slave node, with messages like
> {code}
> Apr 1 08:48:05 xxhdn033 kernel: BUG: soft lockup - CPU#0 stuck for 67s! 
> [java:19096]
> Apr 1 08:48:05 xxhdn033 kernel: Modules linked in: bridge stp llc fuse 
> autofs4 bonding ipv6 uinput iTCO_wdt iTCO_vendor_support microcode 
> power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core joydev 
> i2c_i801 i2c_core lpc_ich mfd_core sg ses enclosure ixgbe dca ptp pps_core 
> mdio ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas dm_mirror 
> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> {code}
> The datanode process was still alive and continued to send heartbeats to the 
> namenode, but the node no longer responded to any command, and reading or 
> writing blocks on this datanode would fail. As a result, requests to HDFS 
> became slower because of the many read/write timeouts.
> We try to work around this case by adding a dfsadmin command that marks such 
> an abnormal datanode as dead by force until it gets restarted. When this case 
> happens again, it would keep clients from accessing the faulty datanode.
> Any thought?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11638) Support marking a datanode dead by DFSAdmin

2017-04-17 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated HDFS-11638:
---
Attachment: HDFS-11638-001.patch

> Support marking a datanode dead by DFSAdmin
> ---
>
> Key: HDFS-11638
> URL: https://issues.apache.org/jira/browse/HDFS-11638
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Jie
> Attachments: HDFS-11638-001.patch
>
>
> We have met the following circumstance:
> A kernel error occurred on one slave node, with messages like
> {code}
> Apr 1 08:48:05 xxhdn033 kernel: BUG: soft lockup - CPU#0 stuck for 67s! 
> [java:19096]
> Apr 1 08:48:05 xxhdn033 kernel: Modules linked in: bridge stp llc fuse 
> autofs4 bonding ipv6 uinput iTCO_wdt iTCO_vendor_support microcode 
> power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core joydev 
> i2c_i801 i2c_core lpc_ich mfd_core sg ses enclosure ixgbe dca ptp pps_core 
> mdio ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas dm_mirror 
> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> {code}
> The datanode process was still alive and continued to send heartbeats to the 
> namenode, but the node no longer responded to any command, and reading or 
> writing blocks on this datanode would fail. As a result, requests to HDFS 
> became slower because of the many read/write timeouts.
> We try to work around this case by adding a dfsadmin command that marks such 
> an abnormal datanode as dead by force until it gets restarted. When this case 
> happens again, it would keep clients from accessing the faulty datanode.
> Any thought?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11638) Support marking a datanode dead by DFSAdmin

2017-04-10 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963715#comment-15963715
 ] 

Tao Jie commented on HDFS-11638:


[~shahrs87], yes, in this case, when the datanode began to panic (possibly a 
kernel bug or hardware failure), the node lost its connection to Ambari and we 
could not log in to it. 
Everything returned to normal once we restarted the bad node. We are trying to 
handle this case automatically, so we want the monitoring system (Ambari) to 
mark the bad datanode as dead when this case happens again.

> Support marking a datanode dead by DFSAdmin
> ---
>
> Key: HDFS-11638
> URL: https://issues.apache.org/jira/browse/HDFS-11638
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Tao Jie
>
> We have met the following circumstance:
> A kernel error occurred on one slave node, with messages like
> {code}
> Apr 1 08:48:05 xxhdn033 kernel: BUG: soft lockup - CPU#0 stuck for 67s! 
> [java:19096]
> Apr 1 08:48:05 xxhdn033 kernel: Modules linked in: bridge stp llc fuse 
> autofs4 bonding ipv6 uinput iTCO_wdt iTCO_vendor_support microcode 
> power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core joydev 
> i2c_i801 i2c_core lpc_ich mfd_core sg ses enclosure ixgbe dca ptp pps_core 
> mdio ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas dm_mirror 
> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> {code}
> The datanode process was still alive and continued to send heartbeats to the 
> namenode, but the node no longer responded to any command, and reading or 
> writing blocks on this datanode would fail. As a result, requests to HDFS 
> became slower because of the many read/write timeouts.
> We try to work around this case by adding a dfsadmin command that marks such 
> an abnormal datanode as dead by force until it gets restarted. When this case 
> happens again, it would keep clients from accessing the faulty datanode.
> Any thought?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11638) Support marking a datanode dead by DFSAdmin

2017-04-10 Thread Tao Jie (JIRA)
Tao Jie created HDFS-11638:
--

 Summary: Support marking a datanode dead by DFSAdmin
 Key: HDFS-11638
 URL: https://issues.apache.org/jira/browse/HDFS-11638
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tao Jie


We have met the following circumstance:
A kernel error occurred on one slave node, with messages like
{code}
Apr 1 08:48:05 xxhdn033 kernel: BUG: soft lockup - CPU#0 stuck for 67s! 
[java:19096]
Apr 1 08:48:05 xxhdn033 kernel: Modules linked in: bridge stp llc fuse autofs4 
bonding ipv6 uinput iTCO_wdt iTCO_vendor_support microcode power_meter 
acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core joydev i2c_i801 i2c_core 
lpc_ich mfd_core sg ses enclosure ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache 
sd_mod crc_t10dif ahci megaraid_sas dm_mirror dm_region_hash dm_log dm_mod 
[last unloaded: speedstep_lib]
{code}
The datanode process was still alive and continued to send heartbeats to the 
namenode, but the node no longer responded to any command, and reading or 
writing blocks on this datanode would fail. As a result, requests to HDFS 
became slower because of the many read/write timeouts.
We try to work around this case by adding a dfsadmin command that marks such 
an abnormal datanode as dead by force until it gets restarted. When this case 
happens again, it would keep clients from accessing the faulty datanode.
Any thought?




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9149) Consider multi datacenter when sortByDistance

2017-01-06 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804190#comment-15804190
 ] 

Tao Jie commented on HDFS-9149:
---

Hi, [~hexiaoqiao],
We have a similar situation, and I wonder whether *NetworkTopologyWithNodeGroup* 
could satisfy your requirement here, since it also provides a 4-layer 
hierarchical network topology.

> Consider multi datacenter when sortByDistance
> -
>
> Key: HDFS-9149
> URL: https://issues.apache.org/jira/browse/HDFS-9149
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: He Xiaoqiao
>Assignee: He Tianyi
>
> {{sortByDistance}} doesn't consider multiple datacenters when reading data, so 
> data may be read via another datacenter when Hadoop is deployed across 
> multiple IDCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10359) Allow trigger block report from all datanodes

2016-05-03 Thread Tao Jie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270024#comment-15270024
 ] 

Tao Jie commented on HDFS-10359:


[~kihwal], [~cnauroth], [~arpitagarwal], [~linyiqun] Thank you for the replies!
I understand that triggering a block report from all datanodes would put too 
much pressure on the namenode.
Our scenario is that we sometimes use the *setrep -w* command to verify the 
replication of blocks across all datanodes. Blocks on a datanode may be lost 
somehow, but the namenode will not notice the missing blocks until the next 
block report, up to 6 hours later. In this case, we would like to trigger a 
block report from all datanodes before running *setrep -w*. Furthermore, if we 
set the replication of blocks down to 1, some blocks may become corrupt.
It is OK to use a script to trigger block reports from all datanodes, or just 
restart the namenode.
I am not very familiar with this logic; if I am wrong or there is a better way, 
please correct me.
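
As a rough sketch of the kind of script mentioned above (not an official tool; 
it assumes {{fs.defaultFS}} points at the target HDFS cluster and simply reuses 
the existing single-datanode command from HDFS-7278 for each live datanode):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class TriggerAllBlockReports {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {
      // Assumes fs.defaultFS is an HDFS URI, so this cast is safe.
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      for (DatanodeInfo dn : dfs.getDataNodeStats()) {
        String ipcAddr = dn.getIpAddr() + ":" + dn.getIpcPort();
        // Reuse the per-datanode command added by HDFS-7278.
        new ProcessBuilder("hdfs", "dfsadmin", "-triggerBlockReport", ipcAddr)
            .inheritIO()
            .start()
            .waitFor();
      }
    }
  }
}
{code}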


> Allow trigger block report from all datanodes
> -
>
> Key: HDFS-10359
> URL: https://issues.apache.org/jira/browse/HDFS-10359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Tao Jie
>
> Since HDFS-7278 allows triggering a block report from one specific datanode, 
> it would be helpful to add an option to this command to trigger block 
> reports from all datanodes.
> The command may look like this:
> *hdfs dfsadmin -triggerBlockReport \[-incremental\] 
> *



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10359) Allow trigger block report from all datanodes

2016-05-03 Thread Tao Jie (JIRA)
Tao Jie created HDFS-10359:
--

 Summary: Allow trigger block report from all datanodes
 Key: HDFS-10359
 URL: https://issues.apache.org/jira/browse/HDFS-10359
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.6.1, 2.7.0
Reporter: Tao Jie


Since HDFS-7278 allows triggering a block report from one specific datanode, 
it would be helpful to add an option to this command to trigger block reports 
from all datanodes.
The command may look like this:
*hdfs dfsadmin -triggerBlockReport \[-incremental\] 
*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org